This full release incorporates genomic, transcript, and protein data available as of September 7, 2021. It contains 288,903,207 records, including 210,703,648 proteins and 40,213,945 RNAs, with sequences from 113,002 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.
New eukaryotic genome annotations
This release includes new annotations generated by NCBI’s eukaryotic genome annotation pipeline for 31 species, including:
- lion annotation release 100, based on new assembly P.leo_Ple1_pat1.1 (GCF_018350215.1)
- tiger annotation release 102, based on new assembly P.tigris_Pti1_mat1.1 (GCF_018350195.1)
- American lobster annotation release 100, based on new assembly GMGI_Hamer_2.0 (GCF_018991925.1)
- Drosophila yakuba (flies) annotation release 102, based on new assembly Prin_Dyak_Tai18E2_2.1 (GCF_016746365.2)
- Drosophila bipectinata (flies) annotation release 102, based on new assembly ASM1815384v1 (GCF_018153845.1)
- Drosophila ficusphila (flies) annotation release 102, based on new assembly ASM1815226v1 (GCF_018152265.1)
- Drosophila grimshawi (flies) annotation release 103, based on new assembly ASM1815329v1 (GCF_018153295.1)
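Each annotation above is tied to an assembly accession. These follow the NCBI assembly accession pattern: a `GCF_` prefix for RefSeq assemblies (`GCA_` for GenBank), a nine-digit number, and a version suffix. The Python below is a minimal sketch that assumes this pattern and splits an accession into its parts. The names `AssemblyAccession` and `parse_assembly_accession` are hypothetical and do not belong to any NCBI tool.

```python
import re
from typing import NamedTuple

# Hypothetical helper: assumes accessions look like GCF_018350215.1, i.e.
# "GCF" (RefSeq) or "GCA" (GenBank), an underscore, nine digits, and a version.
ACCESSION_RE = re.compile(r"^(GC[AF])_(\d{9})\.(\d+)$")


class AssemblyAccession(NamedTuple):
    prefix: str   # "GCF" for RefSeq, "GCA" for GenBank
    number: str   # nine-digit identifier, kept as a string to preserve leading zeros
    version: int  # a higher number denotes a later revision of the same assembly

    @property
    def is_refseq(self) -> bool:
        return self.prefix == "GCF"


def parse_assembly_accession(accession: str) -> AssemblyAccession:
    """Split an assembly accession such as 'GCF_016746365.2' into its parts."""
    match = ACCESSION_RE.match(accession)
    if match is None:
        raise ValueError(f"not an assembly accession: {accession!r}")
    prefix, number, version = match.groups()
    return AssemblyAccession(prefix, number, int(version))


if __name__ == "__main__":
    # Accessions copied from the list of new annotations.
    for acc in ["GCF_018350215.1", "GCF_016746365.2", "GCF_018991925.1"]:
        parsed = parse_assembly_accession(acc)
        print(acc, parsed.prefix, parsed.number, parsed.version, parsed.is_refseq)
```

The number stays a string so that leading zeros survive. The version is converted to an integer so that revisions of the same assembly, such as the version 2 D. yakuba assembly, compare numerically.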
NCBI Hidden Markov models (HMM) 6.0 release
The NCBI Hidden Markov models (HMM) 6.0 release, available on our FTP site, contains 15,247 models supported at NCBI. We created 80 new HMMs and consolidated the collection by removing 2,151 HMMs that were nearly identical to another HMM in the collection. Release 6.0 also incorporates 12,656 Pfam models from Pfam release 34 that apply to prokaryotic proteins.
RefSeq assembly information
We are considering adding information about the RefSeq assembly for each sequence to the RefSeq FTP release catalog. We welcome your comments on which information would be useful to you.
We are also considering revising the set of sequences included in the plasmid bin to add plasmids from WGS sequences.