What’s included in this release?
As of March 8, 2023, this full release incorporates genomic, transcript, and protein data, containing:
- 348,351,219 records
- 254,500,694 proteins
- 50,975,429 RNAs
- sequences from 130,837 organisms
The release is available in several directories, both as a complete dataset and divided into logical groupings.
Updates & announcements
Prokaryote phylum names
As previously announced, NCBI Taxonomy began updating phylum names for prokaryotes in January 2023. Informal phylum names in long use (e.g., Firmicutes, Proteobacteria) were replaced with newly formalized names (Bacillota and Pseudomonadota, respectively). This update affected over 40 NCBI TaxIDs at phylum rank. The rollout of new phylum names is now complete! The flatfiles in this release contain the new phylum names.
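If your own scripts or tables still use the older phylum names, a small lookup can bring them in line with the flatfiles. The following is a minimal sketch, not an NCBI tool: the mapping holds only the two name pairs mentioned above (the full update covers over 40 TaxIDs), the function names are hypothetical, and the "; " separator is an assumption about how your lineage strings are delimited.

```python
# Old-to-new phylum names. Only the two pairs named in this announcement
# are listed; extend the mapping from NCBI Taxonomy for the other phyla.
PHYLUM_RENAMES = {
    "Firmicutes": "Bacillota",
    "Proteobacteria": "Pseudomonadota",
}


def update_phylum(name: str) -> str:
    """Return the formalized name, or the input unchanged if it is not listed."""
    return PHYLUM_RENAMES.get(name, name)


def update_lineage(lineage: str, sep: str = "; ") -> str:
    """Apply update_phylum to every element of a delimited lineage string.

    The separator is an assumption; adjust it to match your data.
    """
    return sep.join(update_phylum(part) for part in lineage.split(sep))
```

For example, `update_lineage("Bacteria; Firmicutes; Bacilli")` would rewrite only the middle element and leave names that are not in the mapping untouched.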
New eukaryotic genome annotations
This release includes new annotations generated by NCBI’s eukaryotic genome annotation pipeline for 33 species, including:
- Slow loris, based on new assembly mNycCou1.pri (GCF_027406575.1-RS_2023_03)
- Common vampire bat, based on new assembly HLdesRot8A (GCF_022682495.1-RS_2023_02)
- Plains spadefoot toad, based on new assembly aSpeBom1.2.pri (GCF_027358695.1-RS_2023_03)
- Blue catfish, based on new assembly Billie_1.0 (GCF_023375685.1-RS_2023_03)
- Arachis duranensis, based on updated assembly aradu.V14167.gnm2.J7QH (GCF_000817695.3-RS_2023_02)
- California two-spot octopus, based on updated assembly ASM119413v2 (GCF_001194135.2-RS_2023_01)
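The identifiers in parentheses above combine an assembly accession with an annotation label. If you need to split them programmatically, the sketch below shows one way to do it. The pattern is inferred only from the six examples in this list, the reading of the `RS_YYYY_MM` suffix as a year and month is an assumption, and the `AnnotationName` type and `parse_annotation_name` function are hypothetical names used for illustration.

```python
import re
from typing import NamedTuple


class AnnotationName(NamedTuple):
    accession: str  # assembly accession without version, e.g. "GCF_027406575"
    version: int    # assembly version, e.g. 1
    year: int       # assumed: year from the RS_YYYY_MM suffix
    month: int      # assumed: month from the RS_YYYY_MM suffix


# Pattern inferred from the identifiers in this announcement only.
_PATTERN = re.compile(r"^(GC[AF]_\d{9})\.(\d+)-RS_(\d{4})_(\d{2})$")


def parse_annotation_name(name: str) -> AnnotationName:
    """Split a string such as 'GCF_027406575.1-RS_2023_03' into its parts."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized annotation name: {name!r}")
    accession, version, year, month = match.groups()
    return AnnotationName(accession, int(version), int(year), int(month))
```

Calling `parse_annotation_name("GCF_000817695.3-RS_2023_02")` would yield accession `GCF_000817695`, version 3, and the suffix fields 2023 and 2. Strings that do not fit the inferred pattern raise `ValueError` rather than being guessed at.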