RefSeq release 86 is now accessible online, via FTP and through NCBI’s programming utilities. This full release incorporates genomic, transcript, and protein data available, as of January 8, 2018 and contains 149,493,466 records, including 102,133,844 proteins, 21,370,778 RNAs, and sequences from 75,218 organisms. The release is provided in several directories as a complete dataset and as divided by logical groupings.
Two important notes follow; please see the RefSeq release notes for more information.
Non-human SNP data dropped
Non-human SNPs were dropped from all RefSeq FTP files in the daily FTP files starting in December 2017, and in this full release (January 2018).
HPRD features removed
We have dropped a set of features, originally imported from HPRD, from human transcript and protein RefSeq records.
The 2018 Nucleic Acids Research database issue features several papers from NCBI staff that cover the status and future of databases including CCDS, ClinVar, GenBank and RefSeq. These papers are also available on PubMed. To read an article, click on the PMID number listed below.
RefSeq release 85 is now accessible online, via FTP and through NCBI’s programming utilities. This full release incorporates genomic, transcript, and protein data available, as of November 6, 2017, and contains 146,710,309 records, including 100,043,962 proteins, 20,905,608 RNAs, and sequences from 73,996 organisms. The release is provided in several directories as a complete dataset and as divided by logical groupings. See the RefSeq release notes for more information.
Starting in March 2018, SNP variation features will no longer be in RefSeq genome assembly records – chromosome and contig records with NC_, NT_, NW_ and AC_ accession prefixes. This change affects both the ASN.1 and flatfile records. Because the number of variants is already enormous and still growing, removing SNP features from these large genomic records will significantly reduce the size of RefSeq FTP files and make downloading and processing easier. We will continue to include SNPs on NG_-prefixed genomic records, and transcript (NM_, NR_, XM_, XR_) and protein (NP_, XP_, YP_) sequences.
On Wednesday, November 1, 2017, we will present a webinar on GDV, NCBI’s full-featured genome browser. In this webinar, you’ll learn how to explore and analyze sequences and annotations for eukaryotic RefSeq genome assemblies. We’ll show you how to:
Search across the entire assembly for genes, products and other markers or jump to a specific position or range
Display any of seven preselected track sets highlighting various aspects of the assembly or create and load your own custom track sets from your NCBI account.
Load and display submitted alignment data from NCBI’s GEO or SRA.
Upload your own annotation and variant data
Display BLAST or Primer-BLAST results on the assembly in the browser.
Date and time: Wednesday, November 1, 2017 12:00-12:30PM EDT
On October 11, 2017, NCBI will present a webinar on RefSeq Functional Elements. This NCBI Minute will introduce you to this project and its scope, describe how functional elements are curated and displayed, demonstrate how to access the data, and provide information on the current progress of the project.
Date and time: Wed, Oct 11, 2017 12:00 PM – 12:30 PM EDT
The new RefSeq Functional Elements project is an expansion of the NCBI RefSeq project to include non-genic functional genomic regions in human and mouse that have been experimentally validated and described in the scientific literature.