NCBI Insights

RefSeq release 208 is available!

RefSeq release 208 is now available online from the FTP site and through NCBI’s Entrez Programming Utilities (E-utilities).
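As a rough illustration of the E-utilities route, the sketch below builds an EFetch request URL for a single RefSeq record. The helper name `efetch_url` and the example accession `NP_000509.1` are chosen here for illustration only and do not come from the release notes. The code only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(accession, db="protein", rettype="fasta"):
    """Build an EFetch URL for one record (hypothetical helper for illustration)."""
    params = {
        "db": db,            # Entrez database to query, e.g. "protein" or "nuccore"
        "id": accession,     # accession.version of the record
        "rettype": rettype,  # requested format, e.g. "fasta"
        "retmode": "text",   # plain-text response
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Example accession picked for illustration; any RefSeq accession could be used.
    print(efetch_url("NP_000509.1"))
```

The resulting URL can be opened in a browser or passed to any HTTP client. For bulk downloads of the whole release, the FTP site is likely the better fit.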

This full release incorporates genomic, transcript, and protein data available as of September 7, 2021, and contains 288,903,207 records, including 210,703,648 proteins, 40,213,945 RNAs, and sequences from 113,002 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.
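To put these counts in proportion, here is a small sketch that derives the share of proteins and RNAs from the figures quoted above. The variable names are chosen here for illustration. The remainder is simply the total minus the two stated categories, and the announcement does not break it down further.

```python
# Record counts quoted in the release announcement.
total_records = 288_903_207
proteins = 210_703_648
rnas = 40_213_945

# Records that are neither proteins nor RNAs (not itemized in the announcement).
other = total_records - proteins - rnas

protein_share = proteins / total_records
rna_share = rnas / total_records

print(f"proteins: {protein_share:.1%} of records")
print(f"RNAs:     {rna_share:.1%} of records")
print(f"other:    {other:,} records")
```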

New eukaryotic genome annotations
This release includes new annotations generated by NCBI’s eukaryotic genome annotation pipeline for 31 species.

NCBI Hidden Markov models (HMM) 6.0 release
The NCBI Hidden Markov models (HMM) 6.0 release, available on our FTP site, has 15,247 models supported at NCBI. We created 80 new HMMs and consolidated the collection by removing 2,151 HMMs that were nearly identical to another model. Release 6.0 also incorporates 12,656 Pfam models from release 34 that apply to prokaryotic proteins.

RefSeq assembly information
We are considering adding information to the RefSeq FTP release catalog about the RefSeq assembly for each sequence. We welcome your comments on information that would be useful to you.

Plasmid sequences
We are looking at revising the set of sequences included in the plasmid bin to add plasmids from WGS sequences.
