RefSeq release 86 is now public


RefSeq release 86 is now accessible online, via FTP, and through NCBI’s programming utilities. This full release incorporates genomic, transcript, and protein data available as of January 8, 2018, and contains 149,493,466 records, including 102,133,844 proteins, 21,370,778 RNAs, and sequences from 75,218 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.

Two important notes follow; please see the RefSeq release notes for more information.

Non-human SNP data dropped

Non-human SNPs have been dropped from the daily RefSeq FTP files since December 2017, and from this full release (January 2018).

HPRD features removed

We have dropped a set of features, originally imported from HPRD, from human transcript and protein RefSeq records.
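The announcement above mentions access through NCBI’s programming utilities (E-utilities). As a minimal sketch, assuming the public `efetch` endpoint and a FASTA text download, the following builds a request URL for a single RefSeq accession. The helper name `build_efetch_url` is hypothetical, and the accession in the usage line is only an illustrative example; nothing here is fetched over the network.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nuccore", rettype="fasta"):
    """Build an efetch URL for one accession (hypothetical helper).

    db      -- Entrez database, e.g. "nuccore" for nucleotide records
               or "protein" for protein records
    rettype -- record format to request, e.g. "fasta" or "gb"
    """
    params = {
        "db": db,
        "id": accession,
        "rettype": rettype,
        "retmode": "text",  # plain-text output
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Illustrative transcript accession; substitute any RefSeq accession.
    print(build_efetch_url("NM_000546"))
```

The resulting URL can be passed to any HTTP client. For bulk access to a whole release, the FTP directories mentioned above are likely the better fit.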

Five NCBI articles in 2018 Nucleic Acids Research database issue


The 2018 Nucleic Acids Research database issue features several papers from NCBI staff that cover the status and future of databases including CCDS, ClinVar, GenBank and RefSeq. These papers are also available on PubMed. To read an article, click on the PMID number listed below.


Seventeen new NCBI annotations in RefSeq for cat, maize, clownfish, and more


In November and December, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

  • Amphiprion ocellaris (clown anemonefish)
  • Centruroides sculpturatus (bark scorpion)
  • Ceratitis capitata (Mediterranean fruit fly)
  • Cucurbita maxima (winter squash)
  • Cucurbita moschata (crookneck pumpkin)
  • Drosophila hydei (fly)
  • Drosophila willistoni (fly)
  • Felis catus (domestic cat)
  • Leptinotarsa decemlineata (Colorado potato beetle)
  • Maylandia zebra (zebra mbuna)
  • Olea europaea sylvestris (wild olive)
  • Onthophagus taurus (beetle)
  • Piliocolobus tephrosceles (Ugandan red Colobus)
  • Seriola lalandi dorsalis (yellowtail amberjack)
  • Spodoptera litura (moth)
  • Xiphophorus maculatus (southern platyfish)
  • Zea mays (maize)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.

RefSeq release 85 is now public


RefSeq release 85 is now accessible online, via FTP, and through NCBI’s programming utilities. This full release incorporates genomic, transcript, and protein data available as of November 6, 2017, and contains 146,710,309 records, including 100,043,962 proteins, 20,905,608 RNAs, and sequences from 73,996 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings. See the RefSeq release notes for more information.


Variation feature changes in NCBI Reference Sequences coming in 2018


Starting in March 2018, SNP variation features will no longer be in RefSeq genome assembly records – chromosome and contig records with NC_, NT_, NW_ and AC_ accession prefixes. This change affects both the ASN.1 and flatfile records. Because the number of variants is already enormous and still growing, removing SNP features from these large genomic records will significantly reduce the size of RefSeq FTP files and make downloading and processing easier. We will continue to include SNPs on NG_-prefixed genomic records, and transcript (NM_, NR_, XM_, XR_) and protein (NP_, XP_, YP_) sequences.
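For pipelines that rely on SNP features in RefSeq records, the prefix rules above can be sketched as a small lookup. This is an illustration only: the function name `keeps_snp_features` is hypothetical, the prefix groups are copied from the paragraph above, and any prefix not named there is reported as unknown rather than guessed.

```python
# Prefix groups as listed in the announcement.
SNP_REMOVED_PREFIXES = ("NC_", "NT_", "NW_", "AC_")   # genome assembly records
SNP_RETAINED_PREFIXES = (
    "NG_",                        # NG_-prefixed genomic records
    "NM_", "NR_", "XM_", "XR_",   # transcripts
    "NP_", "XP_", "YP_",          # proteins
)


def keeps_snp_features(accession):
    """Return True if the record type keeps SNP features after the change,
    False if they are removed, and None if the prefix is not covered
    by the announcement."""
    acc = accession.strip().upper()
    if acc.startswith(SNP_REMOVED_PREFIXES):
        return False
    if acc.startswith(SNP_RETAINED_PREFIXES):
        return True
    return None


if __name__ == "__main__":
    # Example accessions chosen only to exercise each branch.
    for acc in ("NC_000001.11", "NM_000546.6", "WP_000000001.1"):
        print(acc, keeps_snp_features(acc))
```

Returning `None` for unlisted prefixes keeps the sketch from asserting anything the announcement does not state.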

Reminder: As of September 2017, NCBI has stopped accepting submissions for non-human SNPs in dbSNP and dbVar. RefSeq flatfiles will stop presenting non-human variant data in November 2017.

Subscribe to the refseq-announce listserv for regular updates on RefSeq.

November 1 webinar: Introducing the Genome Data Viewer (GDV)


On Wednesday, November 1, 2017, we will present a webinar on GDV, NCBI’s full-featured genome browser. In this webinar, you’ll learn how to explore and analyze sequences and annotations for eukaryotic RefSeq genome assemblies. We’ll show you how to:

  • Search across the entire assembly for genes, products, and other markers, or jump to a specific position or range
  • Display any of seven preselected track sets highlighting various aspects of the assembly, or create and load your own custom track sets from your NCBI account
  • Load and display submitted alignment data from NCBI’s GEO or SRA
  • Upload your own annotation and variant data
  • Display BLAST or Primer-BLAST results on the assembly in the browser

Date and time: Wednesday, November 1, 2017 12:00 PM – 12:30 PM EDT

After registering, you will receive a confirmation email with information about attending the webinar. After the live presentation, the webinar will be uploaded to the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

Updated HIV-1 interaction datasets in Gene


We recently updated the HIV-1 interaction datasets in Gene with data provided by the Southern Research Institute (SRI).

The protein interactions dataset now has:

  • 8,005 interactions,
  • 16,215 interaction descriptions,
  • 3,859 proteins encoded by 3,757 human genes,
  • and 6,822 publications.

The replication interactions dataset now has:

  • 1,595 interactions,
  • 1,854 interaction descriptions,
  • 1,583 proteins encoded by 1,583 human genes,
  • and 229 publications.

Data are also available at the RefSeq HIV-1 website and the GeneRIF FTP site.

October 11 NCBI Minute: Introducing the New RefSeq Functional Elements Project


On October 11, 2017, NCBI will present a webinar on RefSeq Functional Elements. This NCBI Minute will introduce you to this project and its scope, describe how functional elements are curated and displayed, demonstrate how to access the data, and provide information on the current progress of the project.

Date and time: Wed, Oct 11, 2017 12:00 PM – 12:30 PM EDT

After registering, you will receive a confirmation email with information about attending the webinar. After the live presentation, the webinar will be uploaded to the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

The new RefSeq Functional Elements project is an expansion of the NCBI RefSeq project to include non-genic functional genomic regions in human and mouse that have been experimentally validated and described in the scientific literature.