Tag: NCBI Taxonomy

NCBI Taxonomy to include phylum rank in taxonomic names

NCBI Taxonomy will append a list of 42 names of prokaryote phyla published for validation purposes as required under the International Code of Nomenclature of Prokaryotes (ICNP). You can still search for the previous informal names, and any informal phylum rank names not addressed in the validation list will remain unchanged.

The largest named groups affected by this change are:

Current name   | New name
Firmicutes     | Bacillota
Proteobacteria | Pseudomonadota
Actinobacteria | Actinomycetota
Bacteroidetes  | Bacteroidota
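
If you maintain local data keyed on the informal names, a small lookup can translate them. The following is a minimal sketch that covers only the four pairs listed above, not the full validation list of 42 names; the function name `update_phylum_name` is made up for illustration.

```python
# The four largest renamed groups, taken from the list above.
# The full validation list has 42 names; this sketch covers only these four.
RENAMED_PHYLA = {
    "Firmicutes": "Bacillota",
    "Proteobacteria": "Pseudomonadota",
    "Actinobacteria": "Actinomycetota",
    "Bacteroidetes": "Bacteroidota",
}


def update_phylum_name(name: str) -> str:
    """Return the new phylum name if `name` was renamed, else the name unchanged."""
    cleaned = name.strip()
    return RENAMED_PHYLA.get(cleaned, cleaned)


if __name__ == "__main__":
    for old in ("Firmicutes", "Cyanobacteria"):
        print(old, "->", update_phylum_name(old))
```

Names that are not in the mapping pass through untouched, which matches the statement that informal names not addressed in the validation list remain unchanged.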

In the first half of 2021, the International Committee on Systematics of Prokaryotes (ICSP) voted to include the rank of phylum among the taxonomic names covered by the International Code of Nomenclature of Prokaryotes (ICNP) (2008 Revision). The rank of phylum was already widely used in the literature for prokaryotic names and was included in NCBI Taxonomy, but it was not formally recognized in the ICNP. Currently, this rank is assigned to 167 bacterial and 39 archaeal informal names in NCBI Taxonomy. The newly adjusted rule (Rule 8) in the ICNP requires all formal names at this rank to be formed by adding the suffix “-ota” to the stem of the name of the designated type genus. NCBI Taxonomy adheres to the rules stipulated in several codes of nomenclature, which means that several names in long-standing use will change accordingly.

NCBI Taxonomy is a curated classification and nomenclature for all of the organisms in public sequence databases. This currently represents about 10% of the described life on the planet.

March 10 Webinar: Where to find data for your research organism!

Do you work with data from organisms outside the traditional set of model organisms? Join us on March 10, 2021 to learn how to use NCBI resources, including NCBI’s Taxonomy and BLAST, to find information for your organism and closely related taxa. You will see an example that shows you how to retrieve and download gene sequences for a set of species, generate multiple sequence alignments, and design primers using Primer-BLAST.

  • Date and time: Wed, March 10, 2021 12:00 PM – 12:45 PM EST
  • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

Enhanced prokaryote type strain report now with details on needed type strain data

The Prokaryote type strain report provides information on type strains for over 18,000 species. We revised and expanded the report to make it easier to identify cases where sequencing or establishing type material would have the biggest impact on improving prokaryote taxonomy and accurate identification. These cases include species with designated type strains but without a sequenced type strain assembly, and species without designated type material. We hope that the community will prioritize sequencing type strains for the former set of species (Table 1) and establishing a neotype or reftype, where applicable (as defined in Ciufo et al. 2018), for the latter set (Table 2).

Other changes from the old format file are detailed in a recent genomes announce post.

Scientific Name | Type material/co-identical strains | Assemblies
Burkholderia ubonensis | CCUG:48852, CIP:1070, … | 308
Escherichia albertii | Albert 19982, BCCM/LMG:20976, … | 181
Xanthomonas perforans | AATCC:BAA-983, DSM:18975, … | 153
Listeria innocua | ATCC:33090, BCCM/LMG:11387, … | 106
Streptococcus iniae | ATCC:29178, BCCM/LMG:14520, … | 94
Vibrio lentus | CECT:5110, CIP:107166, … | 87
Vibrio cyclitrophicus | ATCC:700982, BCCM/LMG:21359, … | 83
Pseudomonas coronafaciens | BCCM/LMG:5060, CFPB:2216, … | 77
Aliivibrio fischeri | ATCC:7744, BCCM/LMG:4414, … | 66
Xanthomonas fragariae | ATCC:33239, BCCM/LMG:708, … | 61

Table 1. The top 10 candidate species for sequencing type strains, sorted by the number of assemblies. These species have designated type strains but no type strain assembly. We generated the list by sorting by “number of assemblies from type materials per species”, then by decreasing “number of assemblies per taxon”, then filtering out rows where “type materials and coidentical strains” = “na”.

Table 2. The top 10 candidates for proposing a reftype assembly, or a neotype where applicable, sorted by the number of assemblies. These species have no designated type strain. We generated the list by selecting rows where “type materials and coidentical strains” = “na” and “number of assemblies from type materials per species” = 0, sorting by decreasing “number of assemblies per taxon”, then filtering out Candidatus.
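
The two selections described in the table captions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the report is tab-delimited with a header row that uses the quoted column names exactly as written above, and the `scientific name` column name is a guess. For Table 1 it filters on a count of zero rather than sorting by the count, which should select the same species.

```python
import csv
import io

# Column names quoted in the captions above; NAME_COL is an assumed name.
TYPE_COL = "type materials and coidentical strains"
TYPE_ASM_COL = "number of assemblies from type materials per species"
ASM_COL = "number of assemblies per taxon"
NAME_COL = "scientific name"


def load_rows(handle):
    """Read a tab-delimited report with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def sequencing_candidates(rows, top=10):
    """Table 1: designated type strain, but no type strain assembly."""
    kept = [r for r in rows
            if r[TYPE_COL] != "na" and int(r[TYPE_ASM_COL]) == 0]
    return sorted(kept, key=lambda r: -int(r[ASM_COL]))[:top]


def reftype_candidates(rows, top=10):
    """Table 2: no designated type strain, excluding Candidatus names."""
    kept = [r for r in rows
            if r[TYPE_COL] == "na"
            and int(r[TYPE_ASM_COL]) == 0
            and not r[NAME_COL].startswith("Candidatus")]
    return sorted(kept, key=lambda r: -int(r[ASM_COL]))[:top]


if __name__ == "__main__":
    # Invented rows, for illustration only.
    header = "\t".join([NAME_COL, TYPE_COL, TYPE_ASM_COL, ASM_COL])
    demo = header + "\nSpecies A\tATCC:1\t0\t5\nSpecies B\tna\t0\t9\n"
    rows = load_rows(io.StringIO(demo))
    print([r[NAME_COL] for r in sequencing_candidates(rows)])
    print([r[NAME_COL] for r in reftype_candidates(rows)])
```

Check the column names against the header of the report you download before relying on this.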

Please contact info@ncbi.nlm.nih.gov if you want to provide information about missing type strains.

Expanded average nucleotide identity analysis now available for prokaryotic genome assemblies

As we described in an earlier post, GenBank uses average nucleotide identity (ANI) analysis to find and correct misidentified prokaryotic genome assemblies. You can now access ANI data for the more than 600,000 GenBank bacterial and archaeal genome assemblies through a downloadable report (ANI_report_prokaryotes.txt) available from the genomes/ASSEMBLY_REPORTS area of the FTP site. The README describes the contents of the report in detail. You can use the ANI data to evaluate the taxonomic identity of genome assemblies of interest for yourself.
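
As a starting point for working with the downloaded report, here is a minimal reader sketch. It assumes the report is tab-delimited and that its first line is a header that may begin with `#`. It does not assume any particular column names, so the names in the demo are invented; consult the README for the real ones.

```python
import io


def read_report(handle):
    """Yield one dict per data row of a tab-delimited report.

    Assumes the first line is a header, optionally prefixed with '#'.
    """
    header = handle.readline().rstrip("\n").lstrip("#").strip()
    columns = header.split("\t")
    for line in handle:
        line = line.rstrip("\n")
        if line:
            yield dict(zip(columns, line.split("\t")))


def rows_where(rows, column, value):
    """Keep the rows whose `column` equals `value`."""
    return [row for row in rows if row.get(column) == value]


if __name__ == "__main__":
    # Invented columns and values, for illustration only.
    demo = "# accession\tstatus\nASM_1\tOK\nASM_2\tFailed\n"
    rows = list(read_report(io.StringIO(demo)))
    print(rows_where(rows, "status", "Failed"))
```

With more than 600,000 assemblies in the report, iterating row by row as above avoids holding the whole file in memory until you have filtered it.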

The new ANI_report_prokaryotes.txt replaces the older ANI_report_bacteria.txt in the same directory. We are no longer updating the ANI_report_bacteria.txt file and will remove it after 31st May 2020.

Improving the Display of Type Material in the NCBI TaxBrowser

Have you ever been confused by multiple taxonomic names for a single organism? You’re not alone! It’s one of the challenges in maintaining any biological database. Recently we updated the NCBI TaxBrowser to assist with this.

Let’s start with a brief word about how investigators name species in the first place. For any new species, the reporting author declares a “type.” They then deposit a specimen, or “type material,” in a publicly available biorepository. This type material is tied to the new species name and serves as a reference for future comparisons. Researchers can then use DNA sequences obtained from type material to identify other samples from the same species. NCBI currently uses such an approach to verify the taxonomic assignment of prokaryotic genomes.

Our Taxonomy group has been curating type material records in the Taxonomy database since 2013 using a common vocabulary accepted by our international partners (the INSDC). For example, the Entrez query “type material[prop]” in the Taxonomy database will return all type material at NCBI.
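
The same query can also be sent programmatically. The sketch below only builds a request URL for the E-utilities ESearch endpoint and does not contact NCBI. Using E-utilities here is an assumption on our part, not something this post describes, and the function name and `retmax` default are illustrative.

```python
from urllib.parse import urlencode

# Public E-utilities ESearch endpoint (an assumption; this post only
# mentions the Entrez query itself).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def type_material_search_url(retmax: int = 20) -> str:
    """Build an ESearch URL for the Taxonomy query "type material[prop]"."""
    params = {
        "db": "taxonomy",
        "term": "type material[prop]",
        "retmax": retmax,  # number of identifiers to return
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(type_material_search_url())
```

Fetching the resulting URL should return the matching Taxonomy identifiers; see the E-utilities documentation for usage guidelines before running this at scale.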

So what are the improvements to the TaxBrowser?

Continue reading “Improving the Display of Type Material in the NCBI TaxBrowser”

May 16 webinar: Improved Standalone BLAST database and programs: now with taxonomic information

Next Wednesday, May 16, 2018, we’ll show you how to download and use the latest standalone BLAST databases, dbv5. You’ll learn how to use BLASTdbv5 and the new BLAST programs to limit searches to taxonomic groups and to retrieve sequences from the database by taxonomy.

Date and time: Wed, May 16, 2018 12:00 PM – 12:30 PM EDT

Register here: https://bit.ly/2qW7LLy

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

BLAST+ database improved

We’ve made some recent enhancements to the BLAST+ applications that allow you to:

  1. Limit your search by taxonomy using information built into the BLAST databases
  2. Search sequences by accession faster
  3. Use blastdbcmd to retrieve sequences by taxonomy from a BLAST database

The new version of the BLAST databases (version 5, release notes) supports the items listed above. You can access the new executables on FTP. Sample version 5 databases are also available.
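
To give a sense of what items 1 and 3 look like in practice, the sketch below assembles the command lines without running them. It assumes the `-taxids` option as found in later BLAST+ releases; the option names in the alpha release may differ, so check the release notes. The database name `nt_v5` and the query file name are placeholders.

```python
import shlex


def taxid_limited_search(program, db, query, taxids):
    """Command for a BLAST search limited to the given taxonomy IDs."""
    ids = ",".join(str(t) for t in taxids)
    return [program, "-db", db, "-query", query, "-taxids", ids]


def sequences_by_taxid(db, taxids):
    """Command for retrieving sequences by taxonomy ID with blastdbcmd."""
    ids = ",".join(str(t) for t in taxids)
    return ["blastdbcmd", "-db", db, "-taxids", ids]


if __name__ == "__main__":
    # 9606 is the taxonomy ID for Homo sapiens; names below are placeholders.
    print(shlex.join(taxid_limited_search("blastn", "nt_v5", "query.fa", [9606])))
    print(shlex.join(sequences_by_taxid("nt_v5", [9606])))
```

Building the command as a list keeps it safe to pass to `subprocess.run` once you have confirmed the options against your installed version.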

Note: This is an alpha release to allow users to test and comment on new features.

Problems/Feedback

Please send problem reports and feedback to blast-help@ncbi.nlm.nih.gov or write to the Help Desk.

New taxonomy files available with lineage, type, and host information

NCBI is now producing a new set of taxonomy files that include the taxonomic lineage of taxa, information on type strains and material, and host information. These files are particularly helpful for people maintaining local installations of NCBI data.

You can download the new archive (new_taxdump.tar.gz) from the taxonomy directory on the FTP site (ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/). The new files are typematerial.dmp, typeoftype.dmp, rankedlineage.dmp, fullnamelineage.dmp, taxidlineage.dmp, and host.dmp. Please see the readme file for details of the file contents.
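
For local installations, the .dmp files can be parsed with a few lines of code. The sketch below assumes the usual taxdump conventions: fields separated by a tab, a vertical bar, and a tab, with each line ending in a tab and a vertical bar. The field order shown for rankedlineage.dmp is our recollection of the readme and may differ between releases, so verify it against the readme shipped with your download.

```python
# Assumed field order for rankedlineage.dmp; verify against the readme.
RANKED_LINEAGE_FIELDS = [
    "tax_id", "tax_name", "species", "genus", "family",
    "order", "class", "phylum", "kingdom", "superkingdom",
]


def parse_dmp_line(line):
    """Split one taxdump line into its fields, dropping the row terminator."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return [field.strip() for field in line.split("\t|\t")]


def parse_ranked_lineage(line):
    """Map one rankedlineage.dmp line onto the assumed field names."""
    return dict(zip(RANKED_LINEAGE_FIELDS, parse_dmp_line(line)))


if __name__ == "__main__":
    # Invented record, for illustration only.
    values = ["1", "Example species", "", "Examplus", "", "", "", "", "", ""]
    demo = "\t|\t".join(values) + "\t|\n"
    print(parse_ranked_lineage(demo))
```

Empty fields come back as empty strings, which is how ranks that do not apply to a taxon appear in these files.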

The original taxonomy file archive without the new content will remain available under its original name, taxdump.tar.gz. The section below shows the entries for the monkey species Cercopithecus lomamiensis from the new ranked lineage and type material files.

Continue reading “New taxonomy files available with lineage, type, and host information”