dbSNP Enhances Scalability, Data Diversity, and Accessibility

As part of the Human Genome Project, NCBI, part of the National Library of Medicine, and the National Human Genome Research Institute (NHGRI) established the Single Nucleotide Polymorphism database (dbSNP) in 1998. Over the last 25 years, dbSNP has evolved into a reliable central public repository for genetic variation data. It is a community-accepted reference data set for genetic research, analysis pipelines, and both open-source and commercial tools, and it is an essential part of genetic research and discovery. For example, dbSNP data are used in nearly all human genetic variation research workflows and serve as the foundation for commercially available ancestry testing products.

Current dbSNP statistics include:
  • 3,800 submitters from all over the world 
  • 3.3 billion submitted SNP records
  • 1.1 billion Reference SNP records 
  • 1.0 billion Reference SNP records with population frequency 
  • Over 65K publications citing dbSNP accessions

What’s new?

We have made numerous improvements to make molecular variation data more accessible for physical mapping, population genetics, investigations into evolutionary relationships, genome-wide association, and quick quantification of the amount of variation at a given site of interest.

We have fundamentally improved our infrastructure, the underlying technology, and our data release processes. These changes make dbSNP more reliable and efficient at handling the large volumes of data and the exponential growth of the last few years.

We created the Allele Frequency Aggregator (ALFA) to provide more granular allele frequencies for populations, derived from 198K subjects in dbGaP controlled-access studies, with a goal of 1M subjects. ALFA will improve the discovery of common and uncommon variations that have biological effects or contribute to disease. These data include chip array, exome, and genomic sequencing data from 12 distinct populations, including European, African, Asian, and Latin American subjects. We include these data in regular dbSNP build releases and add ALFA data to RefSeq. ALFA can be accessed via the browser, FTP, API, and TrackHub. On GitHub, we have tutorials and code examples to help with programming.
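
To illustrate what a per-population allele frequency means, here is a minimal sketch in Python. It is not ALFA's pipeline or its API response format: the function name `allele_frequencies`, the input layout, and the counts are all hypothetical and chosen only for illustration. It assumes each population is described by a simple mapping from allele to observed allele count, and computes each frequency as that count divided by the population's total.

```python
from typing import Dict


def allele_frequencies(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert allele counts for one population into allele frequencies.

    Frequency of an allele = its count / total count of all alleles observed
    at the site in that population.
    """
    total = sum(counts.values())
    if total == 0:
        # No observations: report 0.0 for every allele rather than divide by zero.
        return {allele: 0.0 for allele in counts}
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical counts for illustration only (not real dbSNP or ALFA data).
example_counts = {
    "European": {"A": 900, "G": 100},
    "African": {"A": 150, "G": 50},
}

for population, counts in example_counts.items():
    freqs = allele_frequencies(counts)
    print(population, {allele: round(f, 3) for allele, f in freqs.items()})
```

Keeping counts separate per population, instead of pooling them first, is what makes the frequencies "granular": the same allele can be rare in one population and common in another. For real data, the tutorials on GitHub mentioned above are the place to start.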

Next steps
  • The data will continue to grow by leaps and bounds over the next 25 years. 
  • We intend to diversify the data by including genotype frequencies. 
  • We will make the data more useful to you by making it easier to find and packaging it in consumable formats. 

Thank you to our submitters who’ve shared data and to everyone who’s contributed to the success of dbSNP.  

Stay up to date

Follow us on Twitter @NCBI and join our mailing list to keep up to date with dbSNP and other NCBI news.    

We want to hear from you!

If you have questions or would like to provide feedback, please reach out to us at snp-admin@ncbi.nlm.nih.gov. Let us know how dbSNP has aided you in your work!  
