RefSeq release 200 is public

RefSeq release 200 is accessible online, via FTP and through NCBI’s Entrez programming utilities, E-utilities.
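As a rough illustration of the E-utilities route, the sketch below builds an EFetch request for a single RefSeq protein and returns it as FASTA. The helper names are ours, and the accession used in the usage note (NP_000509.1, human hemoglobin subunit beta) is only an example; this is a minimal sketch, not an official client.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the EFetch utility within NCBI E-utilities.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str, db: str = "protein") -> str:
    """Build an EFetch URL that requests one record as plain-text FASTA.

    `build_efetch_url` is a hypothetical helper name used for illustration.
    """
    params = {
        "db": db,             # Entrez database, e.g. "protein" or "nuccore"
        "id": accession,      # RefSeq accession, optionally with version
        "rettype": "fasta",   # record format
        "retmode": "text",    # plain text rather than XML
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"


def fetch_fasta(accession: str, db: str = "protein") -> str:
    """Download the record over HTTPS (requires network access)."""
    with urlopen(build_efetch_url(accession, db)) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_fasta("NP_000509.1")` should return the FASTA record for that accession. For anything beyond occasional requests, NCBI's usage guidelines on request rates and API keys apply.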

This full release incorporates genomic, transcript, and protein data available as of May 4, 2020, and contains 237,381,664 records, including 171,643,729 proteins, 31,244,247 RNAs, and sequences from 100,605 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.

Other announcements:

The number of organisms in RefSeq crosses 100,000!
The current RefSeq release contains 100,605 distinct species or taxa, a net increase of 763 species since Release 99. This milestone coincides with the 100th release, although the current release number is 200 (see below). Note that the number of prokaryotic species (bacteria and archaea) has decreased because of a clean-up that mainly removed unclassified bacteria and assemblies from metagenome-assembled genomes (MAGs).

The FTP release number has skipped to 200
As previously announced, NCBI’s Reference Sequence (RefSeq) FTP release number has jumped to 200 for this release, skipping the numbers 100-199. The previous release, in March 2020, was release 99. The change avoids overlap with the independently numbered RefSeq annotation releases for the eukaryotic genomes we annotate, which are currently in the range 100-109 (for example, Mus musculus Annotation Release 108).

NCBI Protein Families
A new release of the NCBI protein family profiles used by PGAP (the Prokaryotic Genome Annotation Pipeline) is now available. You can use HMMER to search this collection of hidden Markov models (HMMs) against your favorite prokaryotic proteins and identify their function.
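One way such a search might be scripted is sketched below: it assembles an `hmmsearch` command line that writes a per-target table (`--tblout`) and then parses that table. The function names, file names, and the E-value cutoff of 1e-5 are assumptions made for illustration and are not NCBI recommendations; HMMER 3 must be installed for the search itself to run.

```python
import subprocess
from typing import Dict, List


def hmmsearch_command(hmm_file: str, protein_fasta: str,
                      tblout: str, evalue: float = 1e-5) -> List[str]:
    """Assemble an hmmsearch invocation (HMMER 3).

    --tblout writes one line per target sequence; -E sets the reporting
    E-value threshold. The 1e-5 default is an arbitrary illustrative cutoff.
    """
    return ["hmmsearch", "--tblout", tblout, "-E", str(evalue),
            hmm_file, protein_fasta]


def run_search(hmm_file: str, protein_fasta: str, tblout: str) -> None:
    """Run the search; raises CalledProcessError if hmmsearch fails."""
    subprocess.run(hmmsearch_command(hmm_file, protein_fasta, tblout),
                   check=True, stdout=subprocess.DEVNULL)


def parse_tblout(text: str) -> List[Dict[str, object]]:
    """Parse the contents of an hmmsearch --tblout file.

    Each data line has 18 whitespace-separated columns followed by a
    free-text description of the target; lines starting with '#' are comments.
    """
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(maxsplit=18)
        hits.append({
            "protein": fields[0],          # target (protein) name
            "model": fields[2],            # query (HMM) name
            "model_accession": fields[3],  # query (HMM) accession
            "evalue": float(fields[4]),    # full-sequence E-value
            "score": float(fields[5]),     # full-sequence bit score
            "description": fields[18] if len(fields) > 18 else "",
        })
    return hits
```

A typical flow would be `run_search("families.hmm", "proteins.faa", "hits.tbl")` followed by `parse_tblout(open("hits.tbl").read())`, where all three file names are placeholders. The best-scoring model per protein is then a reasonable first guess at its family.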

Recalculation of Prokaryotic Reference and Representative Genome Assemblies
We have updated the collection of reference and representative assemblies for Bacteria and Archaea to better reflect the taxonomic breadth of the prokaryotes in RefSeq. We have selected one reference or representative assembly for every species based on several criteria including contiguity, completeness, and whether the assembly is from type material.

Future change: Mouse Reference Assembly Update
The Genome Reference Consortium (GRC) is expected to release a full update of the mouse GRCm38.p6 reference assembly in 2020. We anticipate updating the mouse RefSeq annotation to the new GRCm39 assembly this summer, for either RefSeq FTP Release 201 or 202.

