NCBI Insights

RefSeq Release 205 is available!

RefSeq release 205 is now available online, from the FTP site and through NCBI’s Entrez programming utilities, E-utilities.
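For readers who want to reach RefSeq records programmatically, here is a minimal sketch of how an E-utilities ESearch request URL could be assembled. The helper name `build_esearch_url`, the choice of the `nuccore` database, and the example query term are illustrative assumptions, not part of the release announcement; the code only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, db="nuccore", retmax=5):
    """Build an ESearch URL (hypothetical helper, for illustration only)."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Assumed example query: rat nucleotide records limited to RefSeq.
url = build_esearch_url("Rattus norvegicus[Organism] AND srcdb_refseq[PROP]")
print(url)
```

The resulting URL can be fetched with any HTTP client. NCBI's usage guidelines for E-utilities (request rate limits, API keys) apply to real queries.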

This full release incorporates genomic, transcript, and protein data available as of March 1, 2021, and contains 269,975,565 records, including 197,232,209 proteins, 36,514,168 RNAs, and sequences from 108,257 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.

Updated human genome Annotation Release 109.20210226
Updated Annotation Release 109.20210226 is an update of NCBI Homo sapiens Annotation Release 109. The annotation products are available in the sequence databases and on the FTP site.

New rat assembly and annotation
RefSeq is the first resource to release an annotation for the new rat reference assembly, mRatBN7.2, referred to as Rattus norvegicus Annotation Release 108 (AR108). The new assembly was recently released by the Darwin Tree of Life Project at the Wellcome Sanger Institute. This is the first coordinate-changing update to the rat reference since the 2014 release of Rnor_6.0 from the Rat Genome Sequencing Consortium, and brings the rat assembly into the modern age with a nearly 300x increase in contig N50 and a 9x increase in scaffold N50 lengths. It’s a major improvement! The new assembly and annotation greatly reduce the number of artificially duplicated genes.
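For readers unfamiliar with the N50 metric mentioned above: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal sketch of the calculation follows. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the rat assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy example with made-up contig lengths (total 32, half is 16).
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))  # the two longest contigs (8 + 8) reach 16, so N50 is 8
```

A higher N50 means that more of the assembly sits in long, contiguous pieces, which is why large jumps in contig and scaffold N50 indicate a more contiguous assembly.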

Details of this annotation, including statistics on the annotation products, the input data used in the pipeline and intermediate alignment results, can be found here. The annotation products are available in the sequence databases and on the FTP site.

New dog assemblies and annotation
Canis lupus familiaris Annotation Release 106 includes annotations for two new dog RefSeq assemblies: ROS_Cfam_1.0 (GCF_014441545.1), which is derived from a Labrador retriever, and Dog10K_Boxer_Tasha (GCF_000002285.5), which updates the previous assembly derived from a Boxer. Three additional assemblies were also annotated:

All five assemblies were annotated jointly, allowing common genes to be identified across the set, which are all available in NCBI Gene. The RefSeq versions of the three additional assemblies have been suppressed from our primary resources and are not included in this FTP release, but they are available from the Genomes FTP site.

The annotation report is available here. The annotation products are available in the sequence databases and on the FTP site.

Other new eukaryotic genome annotations
This release includes new annotations generated by NCBI’s eukaryotic genome annotation pipeline for 30 additional species, including:

RefSeq assembly information
We are considering adding information to the RefSeq FTP release catalog about the RefSeq assembly for each sequence. We welcome your comments on information that would be useful to you.

Plasmid sequences
We are looking at revising the set of sequences included in the plasmid bin to add in plasmids from WGS sequences.