Improved user interface
This version has an improved user interface that takes the genome FASTA file and associated organism name directly on the command line. For example, to annotate a Vibrio cholerae genome sequence in the file Vchol.fasta:
pgap.py -r -g Vchol.fasta -s 'Vibrio cholerae' -o Vchol.annot
For more details, visit our Quick Start page.
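You may prefer to launch PGAP from a Python script rather than an interactive shell. The minimal sketch below builds the same command as an argument list. The helper name `build_pgap_command` is hypothetical. The flags are exactly those in the example above, and nothing else about `pgap.py`'s options is assumed. A list of arguments sidesteps shell quoting of the two-word organism name, and `shlex.join` shows the equivalent shell command.

```python
import shlex


def build_pgap_command(genome_fasta, organism, output_dir):
    """Hypothetical helper: build the pgap.py argument list.

    Uses only the flags from the release-note example:
    -r, -g <genome FASTA>, -s <organism name>, -o <output directory>.
    """
    return ["pgap.py", "-r", "-g", genome_fasta, "-s", organism, "-o", output_dir]


cmd = build_pgap_command("Vchol.fasta", "Vibrio cholerae", "Vchol.annot")
# shlex.join quotes the organism name because it contains a space.
print(shlex.join(cmd))  # prints: pgap.py -r -g Vchol.fasta -s 'Vibrio cholerae' -o Vchol.annot
```

To start the run itself, `subprocess.run(cmd, check=True)` should work, assuming `pgap.py` is on your `PATH` (an assumption about your installation, not something this release note specifies).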
Additional output files for better interoperability
In addition to the GFF, GenBank, and protein FASTA annotation files that PGAP has always produced, it now provides:
- annot_cds_from_genomic.fna: FASTA-format nucleotide sequences of all coding sequence (CDS) features annotated on the assembly, extracted from the genome sequence.
- annot_translated_cds.faa: FASTA-format protein sequences of the CDS features annotated on the genomic records. Each sequence is the conceptual translation of the corresponding nucleotide sequence in annot_cds_from_genomic.fna.
- annot_with_genomic_fasta.gff: the GFF annotation followed by the ##FASTA directive and the genomic sequence(s) in FASTA format. This makes the file directly usable by Roary.
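Because annotation and sequence travel together in annot_with_genomic_fasta.gff, a script can read both from one file. The sketch below assumes only the standard GFF3 layout: tab-separated feature rows, then a `##FASTA` line, then FASTA records. `split_gff_with_fasta` is a hypothetical helper, not part of PGAP, and the record in the example is a toy, not real PGAP output.

```python
def split_gff_with_fasta(text):
    """Split GFF3 text that ends with a ##FASTA section.

    Returns (feature_lines, sequences): the tab-separated feature rows,
    and a dict mapping each FASTA record ID (first word of its header)
    to its concatenated sequence.
    """
    feature_lines = []
    chunks = {}  # record ID -> list of sequence lines
    current_id = None
    in_fasta = False
    for line in text.splitlines():
        if not in_fasta:
            if line.strip() == "##FASTA":
                in_fasta = True  # everything after this line is sequence
            elif line and not line.startswith("#"):
                feature_lines.append(line)  # skip directives and comments
        elif line.startswith(">"):
            current_id = line[1:].split()[0]
            chunks[current_id] = []
        elif current_id is not None:
            chunks[current_id].append(line.strip())
    sequences = {rid: "".join(parts) for rid, parts in chunks.items()}
    return feature_lines, sequences


# Toy input, not real PGAP output.
example = (
    "##gff-version 3\n"
    "contig1\t.\tgene\t1\t9\t.\t+\t.\tID=gene-1\n"
    "##FASTA\n"
    ">contig1 toy sequence\n"
    "ATGAAA\n"
    "TAA\n"
)
features, seqs = split_gff_with_fasta(example)
seqid, _, _, start, end = features[0].split("\t")[:5]
# GFF3 coordinates are 1-based and inclusive, hence int(start) - 1.
print(seqs[seqid][int(start) - 1:int(end)])  # prints ATGAAATAA
```

The slice only recovers plus-strand features. A minus-strand feature (strand column `-`) would also need to be reverse-complemented.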
More Gene Ontology (GO) terms in the annotation
PGAP assigns function to predicted proteins based on evidence from Protein Family Models, such as protein profile HMM hits, BLAST hits, and domain architectures. New in this release, annotated proteins inherit the GO terms and Enzyme Commission (EC) numbers associated with their domain architectures. On average, 50% of the proteins annotated on a genome now carry at least one GO term.
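To see how this coverage looks for your own genome, you could count the CDS features that carry a GO identifier. The sketch below is a minimal version that assumes GO terms appear in the standard GFF3 `Ontology_term` attribute as `GO:` identifiers. PGAP's GFF may record them differently or in additional attributes, so check your file first. The function names are hypothetical. Features split across several rows share one `ID`, so the sketch counts IDs rather than rows.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Parse a GFF3 attribute column into a dict of tag -> list of values."""
    attrs = {}
    for field in column9.strip().split(";"):
        if not field:
            continue
        tag, _, value = field.partition("=")
        # Split multi-valued tags on commas first, then undo %-escaping
        # (an escaped comma, %2C, stays inside its value).
        attrs[tag] = [unquote(v) for v in value.split(",")]
    return attrs


def fraction_cds_with_go(gff_lines):
    """Fraction of CDS features with at least one GO ID in Ontology_term.

    Assumption: GO terms are stored as Ontology_term=GO:nnnnnnn,...
    Rows that are not 9-column CDS rows (directives, FASTA) are ignored.
    """
    has_go_by_id = {}
    for line in gff_lines:
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = parse_attributes(cols[8])
        feature_id = attrs.get("ID", [line])[0]  # fall back to the row itself
        has_go = any(t.startswith("GO:") for t in attrs.get("Ontology_term", []))
        # A multi-row feature counts as annotated if any of its rows has GO.
        has_go_by_id[feature_id] = has_go_by_id.get(feature_id, False) or has_go
    if not has_go_by_id:
        return 0.0
    return sum(has_go_by_id.values()) / len(has_go_by_id)


# Toy rows, not real PGAP output.
toy = [
    "##gff-version 3",
    "c1\t.\tCDS\t1\t9\t.\t+\t0\tID=cds-A;Ontology_term=GO:0005524,GO:0003677",
    "c1\t.\tCDS\t20\t40\t.\t+\t0\tID=cds-B;product=hypothetical protein",
    "c1\t.\tgene\t1\t9\t.\t+\t.\tID=gene-A",
]
print(fraction_cds_with_go(toy))  # prints 0.5
```

For a real run, pass the lines of your PGAP GFF file (for example, `open(path).read().splitlines()`) instead of the toy list.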
As with every previous release, this release includes incremental improvements by expert curators to the Protein Family Model collection, which drives the precision of PGAP’s structural and functional annotation.
We want to hear from you!
Please try this new version and share your experience with us.