Tag: NCBI Taxonomy

RefSeq Release 217

RefSeq release 217 is now available online and from the FTP site. You can access RefSeq data through NCBI Datasets.

What’s included in this release?

As of March 8, 2023, this full release incorporates genomic, transcript, and protein data, containing:

  • 348,351,219 records
  • 254,500,694 proteins
  • 50,975,429 RNAs
  • sequences from 130,837 organisms

The release is provided in several directories, both as a complete dataset and divided into logical groupings.

New & Improved NCBI Datasets Genome and Assembly Pages

Legacy pages will be redirected effective June 2023

In June 2023, NCBI’s Assembly and Genome record pages will be redirected to new Datasets pages as part of our ongoing effort to modernize and improve your user experience. NCBI Datasets is a new resource that makes it easier to find and download genome data.

We will update the following pages:
  • The NCBI Assembly pages will be redirected to the new Datasets Genome pages that describe assembled genomes and provide links to related NCBI tools such as Genome Data Viewer and BLAST.
  • The NCBI Genome pages will be redirected to the Datasets Taxonomy pages that provide a taxonomy-focused portal to genes, genomes, and additional NCBI resources.
  • During this transition, you will have the option to return to the legacy Genome and Assembly pages. 

Upcoming changes to influenza virus names in NCBI Taxonomy

To reflect changes to the International Code of Virus Classification and Nomenclature (ICVCN) made by the International Committee on Taxonomy of Viruses (ICTV), NCBI will introduce new binomial influenza species names such as ‘Alphainfluenzavirus influenzae’. Changes are expected to be in place around summer 2023.

We recognize that traditional influenza virus names like ‘Influenza A virus’ and ‘Influenza B virus’ are broadly used in public health, educational institutions, and research. To minimize the impact of this change on those who use NCBI resources, the taxonomy schema will keep the former names in the lineages for each species; however, they will be moved below the (new) species taxa in the hierarchy. See example below.

Updated bacterial and archaeal reference genomes collection now available!

An updated bacterial and archaeal reference genome collection is available! This collection of 17,163 genomes was built by selecting exactly one genome assembly for each species among the 272,000+ prokaryotic genomes in RefSeq, except for E. coli, for which two assemblies were selected as references.

A total of 497 species are included in this collection for the first time. In addition, compared with the October 2022 set, 174 species are represented by a better assembly and 15 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment. The criteria for selecting one assembly for a given species from all assemblies available in RefSeq for that species include assembly contiguity, completeness, and the quality of the RefSeq annotation. See the documentation for details.
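To make the "one assembly per species" idea concrete, here is a minimal Python sketch. It is not NCBI's selection pipeline: the `Assembly` fields, the accessions, the species names, and the order in which the criteria are compared are all assumptions made for illustration. The actual ranked criteria are described in the documentation mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assembly:
    accession: str           # hypothetical placeholder, not a real accession
    species: str
    completeness: float      # percent; higher is better
    contig_n50: int          # contiguity proxy; higher is better
    annotation_score: float  # hypothetical annotation-quality score


def pick_references(assemblies):
    """Return the top-ranked assembly for each species.

    The tuple order (completeness, contiguity, annotation) is an assumed
    ranking used only for this sketch.
    """
    best = {}
    for asm in assemblies:
        key = (asm.completeness, asm.contig_n50, asm.annotation_score)
        current = best.get(asm.species)
        if current is None or key > current[0]:
            best[asm.species] = (key, asm)
    return {species: asm for species, (_, asm) in best.items()}


candidates = [
    Assembly("ASM_A1", "Species alpha", 98.5, 1_200_000, 0.90),
    Assembly("ASM_A2", "Species alpha", 99.1, 800_000, 0.95),
    Assembly("ASM_B1", "Species beta", 97.0, 2_500_000, 0.88),
]
references = pick_references(candidates)
for species, asm in sorted(references.items()):
    print(species, asm.accession)
```

Comparing tuples keeps the ranking explicit: the first criterion decides, and later ones only break ties. A real selection would also need a rule for exceptions such as the two E. coli references.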

We have updated the nucleotide BLAST RefSeq reference genomes database (fourth in the menu) as well as the database on the Microbial Nucleotide BLAST page to reflect these changes. You can also run BLAST searches against the proteins annotated on these reference genomes (RefSeq Select proteins database, second in the menu).

Prokaryotic phylum name changes coming soon!

Beginning in the first week of January 2023, NCBI Taxonomy will initiate changes to prokaryote phylum names in accordance with the recent inclusion of the rank ‘phylum’ in the International Code of Nomenclature for Prokaryotes (ICNP). We first announced this update, which involves changes to 42 NCBI taxa, about a year ago. We will change several names that have long been in use (e.g., Firmicutes, Proteobacteria) to newly formalized names (e.g., Bacillota, Pseudomonadota) that may be unfamiliar to some.

You will still see the previous names on records and can search using them, but they will not be displayed as prominently as before. The organism names on Entrez records will not change (e.g., Bacillus subtilis). However, we will update the phylum names on the displayed lineages for ~276 million records (see an example in Figure 1 below).
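If you keep local copies of lineages, a small lookup table is one way to stay in step with the renamed phyla. The Python sketch below includes only the two renames named in this post (the full update covers 42 taxa), and it assumes a semicolon-delimited lineage string; both the function names and that format are illustrative choices, not an NCBI interface.

```python
# Only the two renames mentioned in the announcement; the full update
# covers 42 taxa, which are not listed here.
PHYLUM_RENAMES = {
    "Firmicutes": "Bacillota",
    "Proteobacteria": "Pseudomonadota",
}


def update_lineage(lineage):
    """Swap old phylum names for new ones in a ';'-delimited lineage."""
    names = [name.strip() for name in lineage.split(";")]
    return "; ".join(PHYLUM_RENAMES.get(name, name) for name in names)


def normalize_query(name):
    """Map a previous phylum name to its new form; leave others unchanged."""
    return PHYLUM_RENAMES.get(name, name)


old = "Bacteria; Firmicutes; Bacilli; Bacillales; Bacillaceae; Bacillus"
new = update_lineage(old)
print(new)
```

Names that are not in the table pass through untouched, which matches the statement above that organism names such as Bacillus subtilis do not change.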

Now available: Updated prokaryote representative genomes collection

An updated bacterial and archaeal representative genomes collection is available! We selected a total of 16,665 of the 262,000 prokaryotic assemblies in RefSeq to represent their respective species. For the first time, more complete assemblies (as calculated by CheckM) were ranked higher than less complete assemblies. See the ranked list of criteria for selecting representative assemblies here.

Fungal species identification using DNA: an NCBI and USDA-APHIS collaboration with a focus on Colletotrichum

As reported in the journal Plant Disease, a recent collaboration between the National Library of Medicine’s NCBI and the U.S. Department of Agriculture’s Animal and Plant Health Inspection Service (APHIS) analyzed public sequence records for the fungal genus Colletotrichum, an important group of fungal plant pathogens that are a significant threat to food production. Colletotrichum species are challenging to identify accurately, and public sequences may contain out-of-date taxonomic information. The study improved the accuracy of species names assigned to Colletotrichum database sequences, verified a comprehensive set of reliable reference markers for the genus, and produced a multi-marker tree as well as the genome-based interactive tree shown in Figure 1.

Figure 1. Views from a genome-assembly-derived multi-protein distance tree that shows the analysis of publicly available Colletotrichum genomes. The interactive tree is available online. You can browse, search, download, and export the tree. As an example search, you can demonstrate that assembly GCA_002901105.1 was incorrectly labeled as Colletotrichum gloeosporioides. Searching the tree for the name “Colletotrichum gloeosporioides” highlights two clades. Clicking the node for the Truncatum species complex and clicking “Show descendants” expands the clade and shows that assembly GCA_002901105.1, which was labeled as Colletotrichum gloeosporioides, clusters with the Truncatum species complex. You can find more details on the tree-building process in the supplementary material for the publication and on GitHub.

ASM Microbe 2022 was a success!

NCBI had the pleasure of attending and participating in this year’s American Society for Microbiology (ASM) Microbe conference, June 9-13 in Washington, D.C. NCBI staff participated in activities and events throughout the conference. Over 4,500 attendees gathered in the exhibit hall and joined a variety of poster presentations and talks!

Reflections from a few of our NCBI experts

“It was a great honor for me to receive the ASM Elizabeth O. King Lecturer Award. Thank you to my colleagues, without whom so much of my work would not have been possible, and to all of those who attended my presentation on Making Genomics Accessible to Aid Public Health and Research.”

~Michael Feldgarden, Ph.D.

Announcing an updated prokaryotic representative genomes collection with 706 new species!

An updated bacterial and archaeal representative genomes collection is available! A total of 16,105 assemblies among the 249,000 prokaryotic assemblies in RefSeq were selected to represent their respective species. The collection has grown by 3.7% since January 2022. A total of 706 species are represented for the first time. In addition, 186 species are represented by a better assembly, and 124 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment.
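The figures in this paragraph can be cross-checked with a little arithmetic. The Python sketch below assumes that the collection size changes only through newly represented and removed species, and that swapping in a better assembly leaves the count unchanged. The January 2022 size it derives is implied by those assumptions, not stated in the post.

```python
current_size = 16_105  # assemblies in the updated collection
added = 706            # species represented for the first time
removed = 124          # species removed from the collection

# Assumption: additions and removals are the only changes in size.
net_change = added - removed
previous_size = current_size - net_change
growth_pct = 100 * net_change / previous_size

print(f"net change: {net_change}")
print(f"implied January 2022 size: {previous_size}")
print(f"growth: {growth_pct:.2f}%")
```

Under these assumptions the net gain is 582 assemblies on an implied base of 15,523, which is consistent with the reported growth of 3.7%.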

We updated the database on the Microbial Nucleotide BLAST page as well as the basic nucleotide BLAST RefSeq Representative genomes database (fourth in the menu) to reflect these changes. Finally, remember that you can now run BLAST searches against the proteins annotated on representative genomes (second in the menu). See more info here.

Come see NCBI at the ASM Microbe Conference 2022

The American Society for Microbiology (ASM) Microbe conference is back and is scheduled to take place in person, June 9-13, in Washington, D.C.

NCBI staff member Dr. Michael Feldgarden will be recognized by ASM with an award for his research. Other NCBI staff will present posters on NCBI resources and will also be available at our booth (#1128) to address your questions. Drop by to see what’s new and provide your feedback. We hope to see you there! Check out NCBI’s schedule of activities: