You can now download the Prokaryotic Genome Annotation Pipeline (PGAP) from GitHub and run it on your own machine, a compute farm, or the cloud, on any public or privately owned genome. PGAP predicts genes on bacterial and archaeal genomes using the same inputs and applications used inside NCBI. We encourage you to try it now and send us comments (please use GitHub issues).
How does it work? Provide some basic information and the FASTA files for your genome of interest, and voilà! PGAP will produce an annotation that conforms to what NCBI's internal pipeline would generate.
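Since the FASTA files are the main input, it can help to sanity-check them before starting a run. The sketch below is not part of PGAP and does not reflect any check the pipeline performs itself; it is a minimal standalone Python helper, with a hypothetical function name (`fasta_lengths`) and made-up example records, that reports the length of each sequence and rejects obviously malformed input.

```python
from typing import Dict, Iterable, Optional


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence_id: length} for FASTA-formatted lines.

    Hypothetical helper for illustration only; it is not part of PGAP.
    """
    lengths: Dict[str, int] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header with no identifier")
            current = fields[0]  # the identifier is the first token after '>'
            if current in lengths:
                raise ValueError(f"duplicate sequence id: {current}")
            lengths[current] = 0
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            lengths[current] += len(line)
    return lengths


# Made-up records, used only to demonstrate the helper.
example = [">contig1 hypothetical example", "ACGTACGT", "ACGT", ">contig2", "GGCC"]
summary = fasta_lengths(example)
print(summary)
```

To check a real file, pass an open file handle instead of the example list; the function only iterates over lines, so it never loads the whole genome into memory at once.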
The pipeline is written in the Common Workflow Language (CWL) and is packaged in a Docker container with the necessary binaries and cwltool, the CWL reference implementation. Datasets curated at NCBI for prokaryotic annotation, such as proteins representing homology clusters, Hidden Markov Models, and other annotation rules, are also distributed with the tool.
This version of the software does not yet provide submission-ready files for GenBank, but this is scheduled for release next month. We are also working on including taxonomic verification of the input genomes to PGAP, so stay tuned!
Look for continuous updates to PGAP on GitHub, as we improve it based upon your feedback. More information about PGAP can be found here.
2 thoughts on “Run the Prokaryotic Genome Annotation Pipeline (PGAP) on your own machine”
Can I annotate metagenome binned genomes with the pipeline? We cannot assign the taxon names to the binned genomes.