RefSeq release 83 now public


RefSeq release 83 is now accessible online, via FTP, and through NCBI’s programming utilities. This full release incorporates genomic, transcript, and protein data available as of July 17, 2017, and contains 132,052,465 records, including 88,385,530 proteins, 19,634,664 RNAs, and sequences from 71,356 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings. More information about RefSeq release 83 is available in the release notes.

Future changes

NCBI will phase out support for non-human organisms in the dbSNP and dbVar databases. These databases will stop accepting submissions of non-human variation data in September 2017. The interactive websites for these databases and related NCBI services, including RefSeq flatfiles, will stop presenting non-human variant data in November 2017.

dbSNP architecture redesign supports future human variation data expansion; changes to be introduced over the next year


To continue providing efficient and timely processing, annotation, and dissemination of data, dbSNP’s architecture and process flow have been redesigned. The technical redesign prepares the database to handle increasing data volumes and to provide timely, effective, and trustworthy Reference SNP results as submission rates continue to grow.

Highlights of the new system include:

  • Use of data objects instead of a relational database
  • Improved algorithms for clustering data into unique Reference SNPs
  • Automation of the entire process to provide timely releases
  • Guaranteed data consistency across dbSNP data accessed using web-based products or downloaded content, such as VCF and FTP files


Phasing out support for non-human genome organism data in dbSNP and dbVar


This blog post is intended for people who use dbSNP and dbVar, particularly those who submit non-human data to these two databases.

dbSNP and dbVar archive, process, display, and report information related to germline and somatic variation from multiple species. Both databases have grown rapidly as sequencing and other discovery technologies have evolved, and they now contain nearly two billion variants from over 360 species.

Based on projected growth and the resources required to archive and distribute the data, continued support for all organisms will become unsustainable for NCBI in the near future. Therefore, NCBI will phase out support for all non-human organisms in dbSNP and dbVar, and will support only human variation.
