To continue providing efficient and timely processing, annotation, and dissemination of data, dbSNP’s architecture and process flow have been redesigned. The technical redesign prepares the database for growing data volumes, so that it can keep delivering timely, effective, and trustworthy reference SNP results as submission rates continue to increase.
Highlights of the new system include:
- Use of data objects instead of a relational database
- Improved algorithms for clustering data into unique Reference SNPs
- Automation of the entire process to provide timely releases
- Guaranteed data consistency across dbSNP data accessed using web-based products or downloaded content, such as VCF and FTP files
How does this affect dbSNP users?
The new system results in changes that will be introduced in staggered releases (see Timeline). These changes are:
- VCF FTP files are provided for the new build data for the current and previous human assemblies (GRCh38 and GRCh37, respectively).
- New JSON FTP file includes all rs records for the current human assembly (GRCh38). The new JSON format is much more amenable to programmatic approaches. However, you may need to adjust your local workflows, especially if they relied on SQL table dumps.
- Separate dbSNP FTP download site for new products. During the transition to the new dbSNP build system, previously released build files for human will remain available through the dbSNP FTP download site, in parallel with the new products. FTP files for dbSNP Human Build 151, the last build based on the old system, will be available on the FTP site until June 1, 2018. As previously announced, non-human data will no longer be supported by NCBI after Sept 1, 2017; however, these data continue to be available at the European Variation Archive (EBI/EVA).
- Newly designed RefSNP Report (rs) Web Page. The redesigned page provides online access to individual Reference SNP (rs) records, with enhanced performance, improved ease of use, and a refined and updated presentation.
- Sequence Viewer data tracks. Data tracks generated using the new build process will be available in NCBI graphical displays. Data tracks that were generated for previous dbSNP releases, using the older build logic, will remain available. You may observe differences with the old data, since the improved variant clustering and remapping algorithms may generate more precise results and reduce redundant rs annotations at a given genomic location.
- Discontinued products: The following dbSNP products and file formats will be discontinued. If these products are part of your workflow, please contact us if you need help adjusting your workflow to the new products.
  - Batch Query
  - FTP files
    - Chromosome report
    - SQL database files
    - Genotype XML by chromosome
    - Genotype XML by gene
    - RS ASN.1 Docsum (flat and binary)
    - RS XML Docsum
    - RS FASTA
    - BED
Please note: The redesigned dbSNP build system will not affect dbSNP submissions of human data. These submissions will continue to be accepted in the same form as they are now.
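For workflows that previously loaded SQL table dumps, moving to the JSON file could start from something like the sketch below. It is only an illustration under stated assumptions: it assumes the file holds one JSON object per line and that each record carries its identifier under a key named `refsnp_id`. Neither detail is confirmed in this announcement, so check the actual file before relying on it. The helper names `iter_records` and `rs_ids` are hypothetical.

```python
import json
from typing import Iterable, Iterator, List


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line (assumed one-object-per-line layout)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def rs_ids(lines: Iterable[str], id_field: str = "refsnp_id") -> List[str]:
    """Collect rs identifiers; `id_field` is an assumed key name, not a documented one."""
    return [
        "rs" + str(record[id_field])
        for record in iter_records(lines)
        if id_field in record
    ]


# Made-up two-record sample standing in for a downloaded file.
sample = [
    '{"refsnp_id": "123", "note": "illustrative only"}',
    "",
    '{"refsnp_id": "456"}',
]
print(rs_ids(sample))  # ['rs123', 'rs456']
```

Because the functions accept any iterable of lines, a compressed download can be streamed without loading it fully into memory, for example by passing the file object returned by `gzip.open(path, "rt")` or `bz2.open(path, "rt")` from the Python standard library.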
Timeline
The transition plan to the new architecture and build system for human variation data includes alpha and beta releases of some products for early testing and to give you the opportunity to provide feedback.
Release schedule (dates subject to change)
Release | Date | Product | Description |
Pre-Alpha (Available) | April 2017 | JSON FTP File | Pre-Alpha release of the new JSON format file for all Reference SNP records, by chromosome. Other dbSNP FTP files continue to be available in parallel. |
Pre-Alpha (Available) | April 2017 | VCF FTP Files | Pre-Alpha release of the two VCF files for the previous and latest human assemblies (GRCh37 and GRCh38, respectively). |
Alpha | July 2017 | JSON FTP File; VCF FTP Files | Alpha versions of the new JSON file format for all Reference SNP records. Two VCF files from the redesigned build based on dbSNP Human build b150. Other dbSNP FTP files continue to be available in parallel. |
Alpha | July 2017 | Reference SNP (rs) Web Report | Alpha version of the redesigned Reference SNP (rs) Report Web page. Other dbSNP tools remain unchanged. |
Build 151 | Q3 2017 | FTP files for Human Build 151 | Human FTP files for dbSNP Build 151, the last build based on the old system, will be released in Fall 2017 and will remain available on the FTP site until June 1, 2018. |
Beta | Q4 2017 | JSON and VCF FTP Files; RefSNP Report | Beta versions of the products will include frequency data from 1000 Genomes, GO-ESP, and TOPMED. Products will be available on the dbSNP FTP download site in parallel to the existing builds. Some other dbSNP access tools may cease to return results or return older data. |
Public | Q1 2018 | All new products public | Complete transition to the new dbSNP build system for all the supported products. Unsupported products will no longer be provided. See the list of discontinued products above. |
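If a workflow depended on one of the discontinued products, such as the BED files, one possible substitute is to derive intervals from the new VCF files. The sketch below is not an NCBI-provided tool; it relies only on the standard VCF column order (CHROM, POS, ID, REF, ...) and on the usual coordinate conventions (VCF positions are 1-based, BED intervals are 0-based and end-exclusive). The function name `vcf_line_to_bed` and the sample records are hypothetical.

```python
from typing import Optional, Tuple


def vcf_line_to_bed(line: str) -> Optional[Tuple[str, int, int, str]]:
    """Convert one VCF data line to a (chrom, start, end, name) BED-style tuple.

    Returns None for blank lines and for header lines starting with '#'.
    """
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref = fields[0], int(fields[1]), fields[2], fields[3]
    start = pos - 1          # VCF POS is 1-based; BED start is 0-based
    end = start + len(ref)   # BED end is exclusive; the span covers the REF allele
    return chrom, start, end, variant_id


# Made-up records for illustration only; these are not real dbSNP data.
sample = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\trs0000001\tA\tG\t.\t.\t.",
    "1\t2000\trs0000002\tCTT\tC\t.\t.\t.",
]
for row in filter(None, map(vcf_line_to_bed, sample)):
    print("\t".join(map(str, row)))
```

The sequence name is copied verbatim from the CHROM column, so any mapping to the chromosome names a downstream tool expects would need to be added separately.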