RefSeq release 86 is now public


RefSeq release 86 is now accessible online, via FTP, and through NCBI’s programming utilities. This full release incorporates genomic, transcript, and protein data available as of January 8, 2018, and contains 149,493,466 records, including 102,133,844 proteins and 21,370,778 RNAs, with sequences from 75,218 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.
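
For readers who want to try the programmatic route, here is a minimal sketch of building a request URL for the E-utilities `efetch` endpoint. The helper name `efetch_url` is hypothetical, and the accession `NP_000509.1` is only an illustrative RefSeq protein identifier chosen for this example; it is not taken from the release notes, and the current version number of that record may differ. The sketch only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(db, accession, rettype="fasta", retmode="text"):
    """Build an efetch URL for one record (hypothetical helper)."""
    params = {
        "db": db,            # Entrez database, e.g. "protein" or "nuccore"
        "id": accession,     # accession.version of the record to fetch
        "rettype": rettype,  # record format to return
        "retmode": retmode,  # plain text rather than XML
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# Illustrative accession only; substitute any RefSeq accession of interest.
url = efetch_url("protein", "NP_000509.1")
print(url)
```

Opening the resulting URL in a browser, or passing it to `urllib.request.urlopen`, should return the record in FASTA format, provided the accession exists.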

Two important notes follow; please see the RefSeq release notes for more information.

Non-human SNP data dropped

Non-human SNPs have been dropped from all RefSeq FTP files: from the daily FTP files starting in December 2017, and from this full release (January 2018).

HPRD features removed

We have dropped a set of features, originally imported from HPRD, from human transcript and protein RefSeq records.

Five NCBI articles in 2018 Nucleic Acids Research database issue


The 2018 Nucleic Acids Research database issue features several papers from NCBI staff that cover the status and future of databases including CCDS, ClinVar, GenBank and RefSeq. These papers are also available on PubMed. To read an article, click on the PMID number listed below.

GenBank release 223.0 is available via FTP, Entrez and BLAST


GenBank release 223.0 (12/15/2017) has 206,293,625 traditional records (including non-bulk-oriented TSA) containing 249,722,163,594 base pairs of sequence data. In addition, there are 551,063,065 WGS records containing 2,466,098,053,327 base pairs of sequence data, 201,559,502 TSA records containing 181,394,660,188 base pairs of sequence data, and 12,695,198 TLS records containing 4,458,042,616 base pairs of sequence data.
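
To put the four figures above side by side, the following sketch adds up the record and base-pair counts quoted for release 223.0. It assumes the four categories (traditional, WGS, TSA, TLS) do not overlap, which the announcement implies but does not state outright; the dictionary and function names are made up for this example.

```python
# Counts quoted in the GenBank release 223.0 announcement:
# category -> (number of records, number of base pairs)
release_223 = {
    "traditional": (206_293_625, 249_722_163_594),
    "WGS": (551_063_065, 2_466_098_053_327),
    "TSA": (201_559_502, 181_394_660_188),
    "TLS": (12_695_198, 4_458_042_616),
}


def totals(categories):
    """Sum records and base pairs, assuming the categories are disjoint."""
    records = sum(r for r, _ in categories.values())
    bases = sum(b for _, b in categories.values())
    return records, bases


total_records, total_bases = totals(release_223)
print(f"{total_records:,} records, {total_bases:,} base pairs")

# Share of all base pairs contributed by each category.
for name, (_, bases) in release_223.items():
    print(f"{name}: {bases / total_bases:.1%} of base pairs")
```

Under that assumption, WGS accounts for the large majority of the base pairs in the release, while traditional records make up only a small share.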

Seventeen new NCBI annotations in RefSeq for cat, maize, clownfish, and more


In November and December, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

  • Amphiprion ocellaris (clown anemonefish)
  • Centruroides sculpturatus (bark scorpion)
  • Ceratitis capitata (Mediterranean fruit fly)
  • Cucurbita maxima (winter squash)
  • Cucurbita moschata (crookneck pumpkin)
  • Drosophila hydei (fly)
  • Drosophila willistoni (fly)
  • Felis catus (domestic cat)
  • Leptinotarsa decemlineata (Colorado potato beetle)
  • Maylandia zebra (zebra mbuna)
  • Olea europaea sylvestris (wild olive)
  • Onthophagus taurus (beetle)
  • Piliocolobus tephrosceles (Ugandan red Colobus)
  • Seriola lalandi dorsalis (yellowtail amberjack)
  • Spodoptera litura (moth)
  • Xiphophorus maculatus (southern platyfish)
  • Zea mays (maize)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.