On Wednesday, December 11, 2019 at 12 PM, NCBI staff will present a webinar that will show you how to use NCBI’s PGAP (https://github.com/ncbi/pgap) on your own data to predict genes on bacterial and archaeal genomes using the same inputs and applications used inside NCBI. You can run PGAP on your own machine, on a compute farm, or in the Cloud. Plus, you can now submit genome sequences annotated by your copy of PGAP to GenBank. Attend the webinar to learn more!
Date and time: Wed, Dec 11, 2019 12:00 PM – 12:45 PM EST
After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
Read the recent publication (PMID: 31427293) on the AMRFinder, a tool that identifies antimicrobial resistance (AMR) genes in bacterial genome sequences using a high-quality curated AMR gene reference database. We use the AMRFinder to identify AMR genes in the hundreds of bacterial genomes that NCBI receives every day, and the results of AMRFinder are used in NCBI’s Isolates Browser to provide accurate assessments of AMR gene content. You can install AMRFinder locally and run it yourself. Follow the instructions on our GitHub site.
Since the publication, we have upgraded AMRFinder to AMRFinderPlus. The enhanced tool now:
supports searches based on protein annotations, nucleotide sequences, or both for best results
identifies point mutations in Campylobacter, E. coli, Shigella, and Salmonella
optionally identifies many genes involved in biocide, heat, metal, and stress resistance, as well as many antigenicity and virulence genes
provides information about gene function, including resistance to individual antibiotics and other phenotypes
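The search modes listed above can be sketched as a small helper that assembles an AMRFinderPlus command line. This is a minimal illustration, not NCBI's own wrapper: the function name build_amrfinder_command is hypothetical, the file names are placeholders, and the option names (--protein, --nucleotide, --gff, --organism, --plus) and the requirement for a GFF file in combined searches are taken from the tool's documentation as we understand it. Check `amrfinder --help` for your installed version before relying on them.

```python
from typing import List, Optional


def build_amrfinder_command(
    protein: Optional[str] = None,
    nucleotide: Optional[str] = None,
    gff: Optional[str] = None,
    organism: Optional[str] = None,
    plus: bool = False,
) -> List[str]:
    """Assemble an argument list for the amrfinder executable (hypothetical helper)."""
    if protein is None and nucleotide is None:
        raise ValueError("provide a protein FASTA, a nucleotide FASTA, or both")
    # Assumption: a combined protein + nucleotide search needs a GFF file
    # so that protein hits can be placed on the nucleotide sequence.
    if protein is not None and nucleotide is not None and gff is None:
        raise ValueError("combined searches need a GFF annotation file")

    cmd = ["amrfinder"]
    if protein is not None:
        cmd += ["--protein", protein]
    if nucleotide is not None:
        cmd += ["--nucleotide", nucleotide]
    if gff is not None:
        cmd += ["--gff", gff]
    if organism is not None:
        # Enables organism-specific point-mutation screening.
        cmd += ["--organism", organism]
    if plus:
        # Adds the optional stress-resistance and virulence genes.
        cmd.append("--plus")
    return cmd


if __name__ == "__main__":
    # Placeholder file names; pass the list to subprocess.run to execute it.
    print(" ".join(build_amrfinder_command(
        protein="proteins.faa",
        nucleotide="genome.fna",
        gff="annotation.gff",
        organism="Salmonella",
        plus=True,
    )))
```

Returning the arguments as a list rather than a single string lets you hand them to subprocess.run without shell quoting problems.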
The latest improvement in the NCBI search experience is designed to help you quickly find microbial proteins. Now when you search for a prokaryotic protein name such as recombinase RecA in NCBI’s sequence databases or in the All databases search, a high-quality representative protein sequence is highlighted in a panel at the top of the results page (Figure 1).
The result panel also allows you to quickly link to related resources such as NCBI’s new pages for protein family models, Identical Protein Groups, and SPARCLE, NCBI’s protein domain architecture resource. We also provide as-you-type suggestions so you don’t have to type out some of the long names.
Figure 1. The result for a search with recombinase RecA. The panel provides access to analysis tools, downloads, and relevant links to the protein family, the RefSeq protein, the identical protein group, and citations in PubMed.
Try these protein name searches, or your own, and use the as-you-type suggestions to assist your searches.
We are now showing the curated evidence used for assigning names and, if possible, gene symbols, publications, and Enzyme Commission numbers for nearly 70% (83 million) of microbial RefSeq proteins. This evidence includes a hierarchical collection of curated Hidden Markov Model (HMM)-based and BLAST-based protein families, and conserved domain architectures.
As of March 2018, there were 141,000 prokaryotic genomes in the Assembly database. As this database grows, misassigned prokaryotic genomes become a serious problem. Taxonomy misassignment can occur through simple submission error or can accumulate as new information adds greater specificity to the taxonomic tree.
A paper in the International Journal of Systematic and Evolutionary Microbiology presents the method NCBI scientists used to verify taxonomic identities in prokaryotic genomes. The authors used an Average Nucleotide Identity (ANI) method with optimum threshold ranges for prokaryotic taxa to review all prokaryotic genome assemblies in GenBank. This method relies on type strain information and is one outcome of a 2015 workshop involving several important parties in the bacteriology community.
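To make the idea of threshold-based verification concrete, here is a simplified sketch of how a declared species could be checked against ANI values to type-strain assemblies. It is not the procedure from the paper: the function name check_taxonomy, the status labels, the example ANI values, and the 96% fallback threshold are all assumptions for illustration, and real per-taxon thresholds would come from the published ranges.

```python
from typing import Dict, Optional

# Assumption: fallback cutoff (percent ANI) used when a species has no
# curated threshold of its own. Illustrative value only.
DEFAULT_THRESHOLD = 96.0


def check_taxonomy(
    declared_species: str,
    ani_to_type_strains: Dict[str, float],
    thresholds: Optional[Dict[str, float]] = None,
) -> str:
    """Classify a genome's declared species using ANI to type-strain assemblies.

    ani_to_type_strains maps species name -> percent ANI between the query
    genome and that species' type-strain assembly (precomputed elsewhere).
    """
    thresholds = thresholds or {}

    def passes(species: str) -> bool:
        cutoff = thresholds.get(species, DEFAULT_THRESHOLD)
        return ani_to_type_strains[species] >= cutoff

    # Declared species is supported by its own type strain.
    if declared_species in ani_to_type_strains and passes(declared_species):
        return "confirmed"

    # Otherwise look for a different species whose type strain matches.
    matches = [s for s in ani_to_type_strains
               if s != declared_species and passes(s)]
    if matches:
        best = max(matches, key=lambda s: ani_to_type_strains[s])
        return f"mismatch: best match is {best}"

    # No type strain is close enough to decide either way.
    return "inconclusive"


if __name__ == "__main__":
    # Made-up ANI values for demonstration.
    example = {"Escherichia coli": 98.7, "Salmonella enterica": 81.2}
    print(check_taxonomy("Salmonella enterica", example))
```

Keeping the thresholds in a per-species dictionary mirrors the idea that different taxa may need different cutoffs, while the fallback handles species without a curated value.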