This full release incorporates genomic, transcript, and protein data available as of March 1, 2021, and contains 269,975,565 records, including 197,232,209 proteins, 36,514,168 RNAs, and sequences from 108,257 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.
Updated human genome Annotation Release 109.20210226
Updated Annotation Release 109.20210226 is an update of NCBI Homo sapiens Annotation Release 109. The annotation products are available in the sequence databases and on the FTP site.
New rat assembly and annotation
RefSeq is the first resource to release an annotation for the new rat reference assembly, mRatBN7.2, referred to as Rattus norvegicus Annotation Release 108 (AR108). The new assembly was recently released by the Darwin Tree of Life Project at the Wellcome Sanger Institute. This is the first coordinate-changing update to the rat reference since the Rat Genome Sequencing Consortium released Rnor_6.0 in 2014. It brings the rat assembly into the modern age with a nearly 300-fold increase in contig N50 and a 9-fold increase in scaffold N50. It’s a major improvement! The new assembly and annotation greatly reduce the number of artificially duplicated genes.
Details of this annotation, including statistics on the annotation products, the input data used in the pipeline and intermediate alignment results, can be found here. The annotation products are available in the sequence databases and on the FTP site.
New dog assemblies and annotation
Canis lupus familiaris Annotation Release 106 includes annotations for two new dog RefSeq assemblies: ROS_Cfam_1.0 (GCF_014441545.1), derived from a Labrador retriever, and Dog10K_Boxer_Tasha (GCF_000002285.5), an update of the previous assembly derived from a Boxer. Three additional assemblies were also annotated:
- UMICH_Zoey_3.1 (GCF_005444595.1) derived from a Great Dane
- UU_Cfam_GSD_1.0 (GCF_011100685.1) derived from a German Shepherd
- UNSW_CanFamBas_1.0 (GCF_013276365.1) derived from a Basenji
All five assemblies were annotated jointly, allowing common genes to be identified across the set; these genes are all available in NCBI Gene. The RefSeq versions of the three additional assemblies have been suppressed from our primary resources and are not included in this FTP release, but they are available from the Genomes FTP site.
Other new eukaryotic genome annotations
This release includes new annotations generated by NCBI’s eukaryotic genome annotation pipeline for 30 additional species, including:
- Bolivian squirrel monkey annotation release 102, based on the new assembly BCM_Sbol_2.0 (GCF_016699345.1)
- Indian flying fox annotation release 100, based on the assembly Ma_sr-lr_union100 (GCF_902729225.1)
- Drosophila simulans annotation release 102, based on the new assembly Prin_Dsim_3.0 (GCF_016746395.1)
- Drosophila yakuba annotation release 101, based on the new assembly Prin_Dyak_Tai18E2_2.0 (GCF_016746365.1)
- Drosophila santomea annotation release 100, based on the assembly Prin_Dsan_1.0 (GCF_016746245.1)
- northern house mosquito annotation release 100, based on the assembly TS_Cpip_V1 (GCF_016801865.1)
- Medicago truncatula annotation release 102, based on the assembly MtrunA17r5.0-ANR (GCF_003473485.1)
- date palm annotation release 103, based on the assembly palm_55x_up_171113_PBpolish2nd_filt_p (GCF_009389715.1)
RefSeq assembly information
We are considering adding information to the RefSeq FTP release catalog about the RefSeq assembly for each sequence. We welcome your comments on information that would be useful to you.
We are also considering revising the set of sequences included in the plasmid bin to add plasmids from WGS sequences.