RefSeq release 217 is now available online and from the FTP site. You can access RefSeq data through NCBI Datasets.
What’s included in this release?
As of March 8, 2023, this full release incorporates genomic, transcript, and protein data, containing:
- 348,351,219 records
- 254,500,694 proteins
- 50,975,429 RNAs
- sequences from 130,837 organisms
The release is provided in several directories as a complete dataset and divided by logical groupings.
Updates & announcements
Prokaryote phylum names
As previously announced, NCBI Taxonomy began updating phylum names for prokaryotes in January 2023. Informal phylum names long in use (e.g., Firmicutes and Proteobacteria) were changed to newly formalized names (e.g., Bacillota and Pseudomonadota, respectively). This update affected over 40 NCBI TaxIDs at phylum rank. The rollout of new phylum names is now complete! The flatfiles in this release contain the new phylum names.
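If you have scripts that filter or group records by the old phylum names, they may need a small translation step. The following is a minimal sketch, not an NCBI tool: the `PHYLUM_RENAMES` table and the `update_lineage` helper are hypothetical names, the table holds only the two renames mentioned above, and the semicolon-separated lineage format is an assumption about how your own data is stored.

```python
# Hypothetical, partial mapping: only the two renames named in this
# announcement. Over 40 phylum-rank TaxIDs were affected in total, so
# extend this table from NCBI Taxonomy before relying on it.
PHYLUM_RENAMES = {
    "Firmicutes": "Bacillota",
    "Proteobacteria": "Pseudomonadota",
}


def update_lineage(lineage: str) -> str:
    """Replace old phylum names in a semicolon-separated lineage string.

    Each taxon is matched as a whole name, so names that merely contain
    an old phylum name (e.g. Gammaproteobacteria) are left untouched.
    """
    trailing = "." if lineage.endswith(".") else ""
    taxa = [t.strip() for t in lineage.rstrip(".").split(";")]
    return "; ".join(PHYLUM_RENAMES.get(t, t) for t in taxa) + trailing


old = "Bacteria; Firmicutes; Bacilli; Bacillales; Bacillaceae; Bacillus."
print(update_lineage(old))
```

Matching whole taxon names rather than substrings is a deliberate choice here, since lower-rank names were not part of the phylum update.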
New eukaryotic genome annotations
This release includes new annotations generated by NCBI’s eukaryotic genome annotation pipeline for 33 species, including:
- Slow loris, based on new assembly mNycCou1.pri (GCF_027406575.1-RS_2023_03)
- Common vampire bat, based on new assembly HLdesRot8A (GCF_022682495.1-RS_2023_02)
- Plains spadefoot toad, based on new assembly aSpeBom1.2.pri (GCF_027358695.1-RS_2023_03)
- Blue catfish, based on new assembly Billie_1.0 (GCF_023375685.1-RS_2023_03)
- Arachis duranensis, based on updated assembly aradu.V14167.gnm2.J7QH (GCF_000817695.3-RS_2023_02)
- California two-spot octopus, based on updated assembly ASM119413v2 (GCF_001194135.2-RS_2023_01)
Stay up to date
RefSeq supports the NIH Comparative Genomics Resource (CGR), an NLM project to establish an ecosystem to facilitate reliable comparative genomics analyses for all eukaryotic organisms. Join our mailing list to keep up to date with RefSeq and other CGR news. Follow us on Twitter @NCBI.
Questions?
If you have questions or would like to provide feedback, please reach out to us at info@ncbi.nlm.nih.gov.
Additional information
The total volume of ASN.1 files in the ‘Complete’ node grew by 24%. This increase is an artifact of a change in the ordering of records, which reduces gzip compression efficiency.
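To illustrate why record order alone can change compressed size, here is a small sketch using synthetic data. It does not use RefSeq files or reproduce the actual ordering change; the record layout, family sizes, and function names (`make_records`, `gzip_size`) are all assumptions made for the demonstration. gzip finds repeats only within a limited sliding window, so near-identical records compress well when adjacent and poorly when scattered.

```python
import gzip
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_records(n_families=200, members=20, length=400, seed=0):
    """Build synthetic records grouped by family.

    Each family shares one random base sequence; each member differs
    from the base by a single substituted residue and its own header.
    """
    rng = random.Random(seed)
    records = []
    for fam in range(n_families):
        base = [rng.choice(AMINO_ACIDS) for _ in range(length)]
        for member in range(members):
            seq = base[:]
            seq[rng.randrange(length)] = rng.choice(AMINO_ACIDS)
            records.append(f">fam{fam}_m{member}\n{''.join(seq)}\n")
    return records


def gzip_size(records):
    """Return the gzip-compressed size in bytes of the concatenated records."""
    return len(gzip.compress("".join(records).encode("ascii")))


grouped = make_records()                 # similar records are adjacent
shuffled = grouped[:]
random.Random(1).shuffle(shuffled)       # same records, scattered order

grouped_size = gzip_size(grouped)
shuffled_size = gzip_size(shuffled)
print(f"grouped:  {grouped_size} bytes")
print(f"shuffled: {shuffled_size} bytes")
print(f"shuffled / grouped = {shuffled_size / grouped_size:.2f}")
```

Both inputs contain exactly the same records, so any difference in the printed sizes comes from ordering alone.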