This full release incorporates genomic, transcript, and protein data available as of May 2, 2022, and contains 314,915,153 records, including 229,417,182 proteins, 44,805,833 RNAs, and sequences from 119,373 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.
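As a quick sanity check on the figures above, the following minimal sketch computes how many records are neither proteins nor RNAs and what share of the release each category makes up. The counts are taken from the announcement; the variable names and the "other" label are illustrative assumptions, since the announcement does not break the remainder down further (it presumably covers genomic and other record types).

```python
# Record counts quoted in the release announcement.
TOTAL_RECORDS = 314_915_153
PROTEINS = 229_417_182
RNAS = 44_805_833

# Records that are neither proteins nor RNAs. The announcement does not
# itemize these, so "other" is an assumed label for the remainder.
other = TOTAL_RECORDS - PROTEINS - RNAS


def share(count: int, total: int = TOTAL_RECORDS) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


if __name__ == "__main__":
    for label, count in [("proteins", PROTEINS), ("RNAs", RNAS), ("other", other)]:
        print(f"{label:>8}: {count:>12,} ({share(count):.2f}%)")
```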
Human genome Annotation Release 110
Annotation Release 110 is the first new annotation of the human genome in four years. It incorporates the latest curated RefSeqs and recalculates models using more than 80M long reads and 9B Illumina RNA-seq reads. AR 110 includes annotation of two human assemblies:
- Reference assembly GRCh38.p14 (Genome Reference Consortium Human Build 38 patch release 14)
- Alternate assembly T2T-CHM13v2.0 (Telomere-to-Telomere assembly of the CHM13 cell line)
The annotation products are available in the sequence databases and on the FTP site.
New eukaryotic genome annotations
In addition to human, this release includes new annotations generated by NCBI’s eukaryotic genome annotation pipeline for 29 species, including:
- American grasshopper annotation release 100, based on new assembly iqSchAmer2.1 (GCF_021461395.2)
- Impatiens glandulifera annotation release 100, based on new assembly dImpGla2.1 (GCF_907164915.1)
- Gray squirrel (pictured) annotation release 100, based on new assembly mSciCar1.2 (GCF_902686445.1)
- Flathead mullet annotation release 100, based on new assembly CIBA_Mcephalus_1.1 (GCF_022458985.1)
- Fishing cat annotation release 100, based on new assembly UM_Priviv_1.0 (GCF_022837055.1)
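The assembly accessions in the list above (for example, GCF_021461395.2) follow a regular pattern: a GCF prefix for RefSeq assemblies, nine digits, and a version number. The sketch below, offered as an illustration rather than an official NCBI tool, splits an accession into those parts and derives the directory on the NCBI genomes FTP site where such an assembly is conventionally stored. The function and class names are hypothetical, and the URL layout (`genomes/all/<prefix>/<3 digits>/<3 digits>/<3 digits>/`) is an assumption about the FTP site's organization that should be checked against the live site before relying on it.

```python
import re
from typing import NamedTuple


class AssemblyAccession(NamedTuple):
    prefix: str   # "GCF" (RefSeq) or "GCA" (GenBank)
    number: str   # nine-digit identifier, kept as text to preserve leading zeros
    version: int  # assembly version after the dot


# Pattern for accessions such as GCF_021461395.2.
_ACCESSION_RE = re.compile(r"^(GC[AF])_(\d{9})\.(\d+)$")


def parse_assembly_accession(text: str) -> AssemblyAccession:
    """Split an assembly accession into prefix, number, and version."""
    match = _ACCESSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not an assembly accession: {text!r}")
    prefix, number, version = match.groups()
    return AssemblyAccession(prefix, number, int(version))


def genomes_ftp_dir(acc: AssemblyAccession) -> str:
    """Build the parent directory URL for an assembly.

    Assumed layout: the nine digits are split into three groups of three.
    """
    n = acc.number
    return (
        "https://ftp.ncbi.nlm.nih.gov/genomes/all/"
        f"{acc.prefix}/{n[0:3]}/{n[3:6]}/{n[6:9]}/"
    )


# Accessions quoted in the announcement.
ACCESSIONS = [
    "GCF_021461395.2",  # American grasshopper
    "GCF_907164915.1",  # Impatiens glandulifera
    "GCF_902686445.1",  # Gray squirrel
    "GCF_022458985.1",  # Flathead mullet
    "GCF_022837055.1",  # Fishing cat
]

if __name__ == "__main__":
    for text in ACCESSIONS:
        acc = parse_assembly_accession(text)
        print(text, "version", acc.version, "->", genomes_ftp_dir(acc))
```

The identifier is kept as a string rather than an integer so that leading zeros survive, which matters when the digits are reused to build a path.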