RefSeq Release 99 is public

RefSeq release 99 is accessible online, via FTP and through NCBI’s Entrez programming utilities, E-utilities.
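For programmatic access through E-utilities, a query can be sketched as below. This is a minimal illustration, not an official NCBI client: the helper name `build_esearch_url`, the example search term, and the `retmax` value are assumptions chosen for the example. It only constructs the ESearch request URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, retmax=20):
    """Return an ESearch URL for the given Entrez database and query term.

    Hypothetical helper for illustration; it builds the URL only.
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example (assumed query): RefSeq protein records for Drosophila.
url = build_esearch_url("protein", "srcdb_refseq[PROP] AND Drosophila[ORGN]", retmax=5)
print(url)
```

The resulting URL can be fetched with any HTTP client; NCBI asks users to respect its E-utilities usage guidelines when issuing requests.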

This full release incorporates genomic, transcript, and protein data available as of March 2, 2020, and contains 231,402,293 records, including 167,278,920 proteins, 29,869,155 RNAs, and sequences from 99,842 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.

Other announcements:

Updated human genome Annotation Release 109.20200228

Annotation Release 109.20200228 is a quarterly update of the human annotation incorporating the latest set of curated RefSeq transcript changes.

The annotation report for 109.20200228 is available here: https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Homo_sapiens/109.20200228/

The annotation products are available in the sequence databases and on the FTP site: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/9606/109.20200228/
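For scripts that fetch annotation products, the directory address above can be assembled from its parts. The sketch below assumes the path pattern seen in this one URL (NCBI Taxonomy ID, then annotation release name) applies generally; the function name `annotation_release_url` is hypothetical.

```python
def annotation_release_url(taxid, release, scheme="ftp"):
    """Build the annotation release directory URL.

    Assumption: the layout is .../annotation_releases/<taxid>/<release>/,
    inferred from the human example in this announcement.
    """
    return (
        f"{scheme}://ftp.ncbi.nlm.nih.gov/genomes/all/"
        f"annotation_releases/{taxid}/{release}/"
    )


# 9606 is the NCBI Taxonomy ID for Homo sapiens.
print(annotation_release_url(9606, "109.20200228"))
```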

Important changes to the genomes FTP site

We have added the latest NCBI Eukaryotic Genome Annotation Pipeline results for the more than 580 species that we annotate to the genomes/refseq directory on the genomes FTP area: https://ftp.ncbi.nlm.nih.gov/genomes/refseq/

As we announced in December, we will stop publishing annotation results to the genus_species directories (example: genomes/Xenopus_tropicalis) on the genomes FTP site effective February 1, 2020. We will also move existing genus_species directories to genomes/archive/old_refseq during the month of March 2020. Useful link: https://ncbiinsights.ncbi.nlm.nih.gov/2020/02/07/genomes-ftp/

Drosophila assemblies and annotation

In coordination with FlyBase, NCBI is transitioning almost all of the Drosophila assemblies included in the RefSeq collection to annotation produced and distributed primarily through NCBI, using NCBI’s eukaryotic genomic annotation pipeline. The one exception is for Drosophila melanogaster, which will continue to use the reference annotation produced and submitted by FlyBase. This will allow us to provide consistent, high-quality annotation across the full spectrum of Drosophila species and rapidly provide annotation as new high-quality assemblies become available. Data is available in NCBI nucleotide, protein, BLAST, Gene, Genome Data Viewer, Genomes, Assembly, and via FTP in both the RefSeq FTP release and on the NCBI genomes FTP site at ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/invertebrate/.

More information about our annotation process is available at https://www.ncbi.nlm.nih.gov/genome/annotation_euk/process/.

Twenty-three Drosophila species are now included in the RefSeq dataset. Of the original twelve published by the Drosophila 12 Genomes Consortium in 2007 (PMID:17994087), updates for five are included in RefSeq Release 99, and updates for six others are either underway or planned. Most use new high-contiguity assemblies that have been publicly released in the last few years. In accordance with the Fort Lauderdale Agreement (https://en.wikipedia.org/wiki/Fort_Lauderdale_Agreement), please check the publication status of the genomes/assemblies before publishing any genome-wide analyses using these data. The annotation produced for RefSeq is in the public domain; we request that NLM/NCBI be cited (PMID:26553804) if you use these data.

Future change: Mouse Reference Assembly Update

The GRC expects to release a full update of the mouse GRCm38.p6 reference assembly in early 2020. We anticipate updating the mouse RefSeq annotation to the new GRCm39 assembly this spring, for either RefSeq FTP Release 100 or 101.
