NCBI staff to present 3 posters at Advances in Genome Biology and Technology (AGBT), February 2020

Next week, NCBI staff will attend AGBT in Marco Island, Florida. On Tuesday, February 25, 2020, three posters from NCBI staff will be on display from 4:40 p.m. to 6:10 p.m. during the Poster Session and Wine Reception in the Banyan and Calusa Ballroom Foyers, Levels 1 and 3. Read on to learn a little bit about what we’ll be presenting.


New PGAP release with Singularity, no-internet option, and Taxonomy Check

A new version of the Prokaryotic Genome Annotation Pipeline (PGAP) with several important features is now available on GitHub.

  • In response to several requests, we have added the option of running PGAP with Singularity, Podman, or any other Docker-compatible executable you wish to use.
  • We have also lifted the requirement for internet access in case you have privacy concerns. To run the pipeline without internet access, set the flag
  • Are you unsure about the identity of the organism you sequenced? We’ve added the Taxonomy-Check module to help you. This module will confirm the organism name or suggest a new taxonomic assignment through average nucleotide identity comparison with type material assemblies from GenBank. The check is currently an optional validation step prior to PGAP.

Try these new features and let us know what you think! Or submit your PGAP-annotated assembly to GenBank. And remember that if you are still improving the assembly and your genome doesn’t pass the pre-annotation validation, you can use the --ignore-all-errors flag to get a preliminary annotation.
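As a rough illustration of the preliminary-annotation option mentioned above, the sketch below assembles a command line that includes the --ignore-all-errors flag. Only that flag comes from this post. The helper name build_pgap_command, the launcher path ./pgap.py, the -o output option, and the input.yaml file name are assumptions made for the example, so check the PGAP documentation for the exact invocation before relying on it.

```python
import shlex


def build_pgap_command(input_yaml, output_dir, ignore_all_errors=False,
                       launcher="./pgap.py"):
    """Assemble an argument list for a PGAP run (hypothetical helper)."""
    # Assumption: the launcher is ./pgap.py, it accepts -o for the output
    # directory, and it takes the input YAML file as a positional argument.
    cmd = [launcher, "-o", output_dir]
    if ignore_all_errors:
        # Flag named in the post: request a preliminary annotation even if
        # the genome fails pre-annotation validation.
        cmd.append("--ignore-all-errors")
    cmd.append(input_yaml)
    return cmd


if __name__ == "__main__":
    cmd = build_pgap_command("input.yaml", "draft_results",
                             ignore_all_errors=True)
    # Print a shell-safe version of the command instead of running it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Building the command as a list keeps each argument separate, so it could later be passed to subprocess.run without shell quoting problems.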

Important changes coming to prokaryotic Reference and Representative genome assemblies

We are making changes to the set of bacterial and archaeal RefSeq Reference and Representative assemblies in February 2020.

  • We will reduce the set of Reference assemblies to the 15 that have annotation provided by outside experts (Table 1) and re-annotate the other 105 current Reference assemblies using the latest Prokaryotic Genome Annotation Pipeline (PGAP) software. The re-annotated assemblies will lose Reference status.
  • We will reassess and revise the set of Representative assemblies so that there is one assembly per species to better reflect the taxonomic diversity of the RefSeq bacterial and archaeal assemblies.


NCBI on YouTube: Get the most out of NCBI resources with these videos

Check out the latest videos on YouTube to learn how to best use NCBI graphical viewers, SRA, PGAP, and other resources.

Genome Data Viewer: Analyzing Remote BAM Alignment Files and Other Tips

This video shows you how to upload remote BAM files, and succinctly demonstrates handy viewer settings, such as Pileup display options, and highlights the very helpful tooltips in the Genome Data Viewer (GDV). There’s also a brief blog post on the same topic.


December 11 Webinar: Running the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) on your own data

On Wednesday, December 11, 2019 at 12 PM, NCBI staff will present a webinar that will show you how to use NCBI’s PGAP on your own data to predict genes in bacterial and archaeal genomes using the same inputs and applications used inside NCBI. You can run PGAP on your own machine, on a compute farm, or in the Cloud. Plus, you can now submit genome sequences annotated by your copy of PGAP to GenBank. Attend the webinar to learn more!

  • Date and time: Wed, Dec 11, 2019 12:00 PM – 12:45 PM EDT
  • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

New release of the Prokaryotic Genome Annotation Pipeline with updated tRNAscan and protein models

A new version of the Prokaryotic Genome Annotation Pipeline (PGAP) is now available on GitHub. This release uses a new and improved version of tRNAscan (tRNAscan-SE:2.0.4) and includes our most up-to-date Hidden Markov Model and BlastRule collections for naming proteins.

Remember that you can submit the results of PGAP to GenBank. Or, if you are still improving the assembly and your genome doesn’t pass the pre-annotation validation, you can use the --ignore-all-errors mode to get a preliminary annotation.

See our previous post and our documentation for details on how to set up and run PGAP yourself.

Try PGAP and let us know how you like it!

New release of the Prokaryotic Genome Annotation Pipeline now available

We have released a new version of the Prokaryotic Genome Annotation Pipeline (PGAP), available on GitHub. The new release includes the ability to ignore pre-annotation validation errors (--ignore-all-errors). This new feature allows you to produce a preliminary annotation for a draft version of the genome, even one that contains vector and adapter sequences or that is outside of the size range for the species. This draft annotation should be helpful with your ongoing work on the genome assembly. Please keep in mind that these pre-annotations and assemblies with contaminants or other errors are not suitable for submission to GenBank.

Another new feature allows you to provide the name of the consortium that generated the assembly and annotation so that this information appears in the final GenBank records. For more details, consult our guidelines on input files.

See our previous post and our documentation for details on how to obtain and run PGAP yourself.

Next on our to-do list is a module for calculating Average Nucleotide Identity (ANI) to confirm the assembly’s taxonomic assignment. Stay tuned!