You can now download PGAP from GitHub and run it on your own machine, a compute farm, or the cloud, on any public or privately owned genome. PGAP predicts genes on bacterial and archaeal genomes using the same inputs and applications used inside NCBI. We encourage you to try it now and send us your comments through GitHub issues.
How does it work? Provide some basic information and the FASTA files for your genome of interest, and voilà! PGAP will produce an annotation consistent with what NCBI's internal pipeline would generate.
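Since the genome is supplied as FASTA, it can help to sanity-check the file before starting a long annotation run. The following is a minimal sketch in Python, not part of PGAP itself. The function names (read_fasta, check_fasta) and the specific checks (non-empty records, unique sequence IDs, IUPAC nucleotide characters only) are assumptions made for illustration and are not PGAP's documented input requirements.

```python
from typing import Iterable, Iterator, List, Tuple

# IUPAC nucleotide codes; treating anything else as suspicious is an assumption.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence_id, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            fields = line[1:].split()
            header = fields[0] if fields else ""  # ID is the first token
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def check_fasta(lines: Iterable[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: List[str] = []
    seen = set()
    count = 0
    for seq_id, seq in read_fasta(lines):
        count += 1
        if not seq_id:
            problems.append("record with an empty sequence ID")
        elif seq_id in seen:
            problems.append(f"duplicate sequence ID: {seq_id}")
        seen.add(seq_id)
        if not seq:
            problems.append(f"{seq_id}: empty sequence")
        unexpected = set(seq) - IUPAC_NUCLEOTIDES
        if unexpected:
            problems.append(f"{seq_id}: unexpected characters {sorted(unexpected)}")
    if count == 0:
        problems.append("no FASTA records found")
    return problems
```

To check a file on disk, pass an open file handle, for example `check_fasta(open("genome.fasta"))`, where the file name is a placeholder.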
The pipeline is written in the Common Workflow Language (CWL) and is packaged in a Docker container with the necessary binaries and cwltool, the CWL reference implementation. Datasets curated at NCBI for prokaryotic annotation, such as proteins representing homology clusters, hidden Markov models, and other annotation rules, are also distributed with the tool.
This version of the software does not yet produce submission-ready files for GenBank, but that feature is scheduled for release next month. We are also working on adding taxonomic verification of input genomes to PGAP, so stay tuned!