RefSeq Release 221

RefSeq release 221 is now available online and from the FTP site. You can access RefSeq data through NCBI Datasets.

What’s included in this release?

As of November 6, 2023, this full release incorporates genomic, transcript, and protein data containing:

  • 404,657,610 records
  • 300,054,945 proteins
  • 57,882,313 RNAs
  • sequences from 143,819 organisms 

The release is provided in several directories as a complete dataset and divided by logical groupings.

Human genome annotation update
Assembly GRCh38.p14
  • Annotation Release GCF_000001405.40-RS_2023_10 is an update of NCBI Homo sapiens Annotation Release 110, incorporating the latest set of curated RefSeq transcript changes.
  • The annotation products are available in the sequence databases and on the FTP site.
Assembly T2T-CHM13v2.0
  • Annotation Release GCF_009914755.1-RS_2023_10 is an update of NCBI Homo sapiens Annotation Release 110, incorporating the latest set of curated RefSeq transcript changes.
  • The annotation products are available in the sequence databases and on the FTP site.
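
The two annotation release names above appear to share one layout: a RefSeq assembly accession with its version, followed by an "RS" tag and what looks like a year and month. As a rough sketch under that assumption (the pattern is inferred from these two names only, not from an official specification, and `parse_annotation_name` is a hypothetical helper), the names could be split like this:

```python
import re

# Assumed layout: <assembly accession>.<version>-RS_<year>_<month>
# Inferred from the two names in this post; not an official specification.
ANNOTATION_NAME = re.compile(
    r"^(?P<accession>GCF_\d{9})\.(?P<version>\d+)"
    r"-RS_(?P<year>\d{4})_(?P<month>\d{2})$"
)


def parse_annotation_name(name):
    """Split an annotation release name into its parts, or raise ValueError."""
    match = ANNOTATION_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized annotation release name: {name!r}")
    parts = match.groupdict()
    return {
        "assembly": f"{parts['accession']}.{parts['version']}",
        "year": int(parts["year"]),
        "month": int(parts["month"]),
    }


for name in ("GCF_000001405.40-RS_2023_10", "GCF_009914755.1-RS_2023_10"):
    print(name, "->", parse_annotation_name(name))
```

A name that does not fit the assumed layout raises `ValueError` instead of being parsed silently, which seems the safer choice if the naming scheme changes in a later release.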

New eukaryotic genome annotations

This release includes new annotations generated by NCBI’s eukaryotic genome annotation pipeline for 36 additional species.

Future changes

We plan to rename the *.nonredundant_protein* files to clarify that they are specific to the prokaryote WP protein dataset. To obtain a complete set of RefSeq proteins across all represented taxa, continue to use all complete.*protein.* files.
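
To illustrate how the two glob patterns relate, here is a minimal sketch that applies them to a short list of file names. The file names are hypothetical examples chosen to fit the patterns, not a copy of the FTP listing, so check the release directory for the actual names.

```python
from fnmatch import fnmatchcase

# Hypothetical file names for illustration only; the real FTP listing
# may differ, especially after the planned rename.
listing = [
    "complete.1.protein.faa.gz",
    "complete.2.protein.gpff.gz",
    "complete.nonredundant_protein.1.protein.faa.gz",
    "complete.1.rna.fna.gz",
]

# Every protein file: the pattern the post recommends for a complete set.
full_set = [n for n in listing if fnmatchcase(n, "complete.*protein.*")]

# The subset specific to the prokaryote WP protein dataset.
wp_only = [n for n in listing if fnmatchcase(n, "*.nonredundant_protein*")]

print("complete protein set:", full_set)
print("WP-only subset:", wp_only)
```

In this sketch the WP-only file also matches `complete.*protein.*`, so selecting by that broader pattern picks up both groups of files.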

Stay up to date

RefSeq is part of the NIH Comparative Genomics Resource (CGR). CGR facilitates reliable comparative genomics analyses for all eukaryotic organisms through an NCBI Toolkit and community collaboration. Follow us on social media @NCBI and join our mailing list to keep up to date with RefSeq and other CGR news.

Questions?

If you have questions or would like to provide feedback, please reach out to us! 
