RefSeq Release 219

RefSeq release 219 is now available online and from the FTP site. You can access RefSeq data through NCBI Datasets.

What’s included in this release?

As of July 18, 2023, this full release incorporates genomic, transcript, and protein data containing:

  • 371,291,248 records
  • 3,752,372,037,103 nucleotide bases
  • 106,842,615,422 amino acids
  • sequences from 138,491 organisms

The release is provided in several directories, both as a complete dataset and divided into logical groupings.

Updates & announcements

Rat genome annotation update

Annotation Release GCF_015227675.2-RS_2023_06 is an update of NCBI Rattus norvegicus Annotation Release 108. The updated annotation reflects the curation of more than 5,000 genes since our last annotation in 2021.
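If you track annotation releases in scripts, the release names in this post can be split into their parts. The sketch below assumes the pattern seen in the two names quoted here (a GCF assembly accession with version, then "-RS_", then a four-digit year and two-digit month); the function and class names are hypothetical and are not part of any NCBI tool.

```python
import re
from typing import NamedTuple


class AnnotationRelease(NamedTuple):
    """Parts of an annotation release name (hypothetical helper type)."""
    assembly: str  # versioned assembly accession, e.g. "GCF_015227675.2"
    year: int
    month: int


# Assumed pattern, inferred from the names quoted in this post:
#   <GCF accession>.<version>-RS_<YYYY>_<MM>
_RELEASE_NAME = re.compile(r"^(GCF_\d{9}\.\d+)-RS_(\d{4})_(\d{2})$")


def parse_annotation_release(name: str) -> AnnotationRelease:
    """Split a release name into assembly accession, year, and month."""
    match = _RELEASE_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized annotation release name: {name!r}")
    return AnnotationRelease(match.group(1), int(match.group(2)), int(match.group(3)))


rat = parse_annotation_release("GCF_015227675.2-RS_2023_06")
print(rat.assembly, rat.year, rat.month)
```

Names that do not follow the assumed pattern raise a ValueError rather than being guessed at.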

Access to historical human transcript alignments is now available!

We are providing a collection of RefSeq transcript alignments that includes both the latest versions in the GCF_000001405.40-RS_2023_03 annotation release and older transcripts going back to 1999. If you work with variant data mapped to historical human RefSeq transcript versions, you can now map your data to the current GRCh38 reference genome and MANE transcripts. The data are available for download from the FTP site.

Three new representative fungal RefSeq assemblies

The representative RefSeq assembly has changed for three fungal species: Talaromyces marneffei is now GCF_009556855.1, Fusarium oxysporum is now GCF_013085055.1, and [Candida] auris is now GCF_003013715.1. All three new RefSeq representatives have complete genomes that improve on the previous RefSeq assemblies.

New eukaryotic genome annotations

This release includes new annotations generated by NCBI’s eukaryotic genome annotation pipeline for 58 species, including:

