As part of the Human Genome Project, NCBI (part of the National Library of Medicine) and the National Human Genome Research Institute (NHGRI) established the Single Nucleotide Polymorphism database (dbSNP) in 1998. Over the last 25 years, dbSNP has evolved into a reliable central public repository for genetic variation data. dbSNP is a community-accepted reference data set for genetic research, for analysis pipelines, and for both open-source and commercial tools, and it is essential to genetic discovery. For example, dbSNP data are used in nearly all human genetic variation research workflows, and they serve as the foundation for commercially available ancestry testing products.
Current dbSNP statistics include:
3,800 submitters from all over the world
3.3 billion submitted SNP records
1.1 billion Reference SNP records
1.0 billion Reference SNP records with population frequency
dbSNP accessions are cited in over 65K publications
Join NCBI at the Bio-IT World 2022 Hackathon on May 4-5, 2022, to learn about and work with data from our ALFA project! The primary goal of this hackathon project is to develop a novel tool, app, or approach for exploring and visualizing NCBI ALFA variants and allele frequencies across 12 human populations. We aspire to create a helpful new variant interpretation resource for the clinical and research communities.
NCBI offers a portfolio of medical genetics resources to help you research, diagnose, and treat diseases and conditions. You can easily access our data and tools through the Medical Genetics and Human Variation page of the NCBI website. We also encourage you to join our community of thousands of submitters and share your germline and/or somatic data to advance discovery and optimize clinical care.
How and why should you use our resources? Consider the example below.
Your patient is a 40-year-old mother of two presenting with changes in bowel habits, bleeding, and abdominal pain. She has a medical history of colonic polyps. Her family history reveals that her maternal grandmother, mother, and uncle had several types of cancer, including colon, breast, and endometrial cancer.
If you’ve ever tried searching for a genomic location in NCBI’s Genome Data Viewer (GDV) or Variation Viewer and found that your search term didn’t work, it’s time to try again! We recently expanded support for searches in our genome browsers using non-NCBI identifiers such as HGVS patterns (e.g. NM_001318787.2:c.2258G>A) and Ensembl IDs. You can also search by chromosome coordinates, cytogenetic band, assembly scaffold/component, disease/phenotype, dbSNP identifier, or RefSeq transcript/protein accession. We’ve gathered example searches in the table below.
When you search by a single coordinate, a dbSNP or dbVar ID, or an HGVS expression, the browser zooms to the location of the search result and automatically places a marker at the searched position. For HGVS searches, the marker is labeled with the corresponding rsID, if one exists.
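To show what an HGVS search term encodes, here is a minimal sketch that splits a simple coding-DNA substitution such as NM_001318787.2:c.2258G>A into its accession, position, reference base, and alternate base. It is an illustration only, not how GDV or Variation Viewer parse searches: the regular expression assumes a single-base c. substitution on a versioned RefSeq accession, and the names `parse_hgvs_substitution` and `CdnaSubstitution` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Matches only simple coding-DNA substitutions such as NM_001318787.2:c.2258G>A.
# Full HGVS also covers deletions, insertions, intronic offsets, genomic (g.)
# and protein (p.) descriptions, and more; those are out of scope here.
HGVS_CDNA_SUB = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+\.\d+)"  # versioned RefSeq accession
    r":c\.(?P<position>\d+)"              # coding-DNA position
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"   # reference base > alternate base
)


class CdnaSubstitution(NamedTuple):
    """Parts of a simple c. substitution (hypothetical helper type)."""
    accession: str
    position: int
    ref: str
    alt: str


def parse_hgvs_substitution(term: str) -> Optional[CdnaSubstitution]:
    """Return the parts of a simple c. substitution, or None if the term doesn't match."""
    match = HGVS_CDNA_SUB.match(term.strip())
    if match is None:
        return None
    return CdnaSubstitution(
        accession=match["accession"],
        position=int(match["position"]),
        ref=match["ref"],
        alt=match["alt"],
    )
```

For example, `parse_hgvs_substitution("NM_001318787.2:c.2258G>A")` returns `CdnaSubstitution(accession='NM_001318787.2', position=2258, ref='G', alt='A')`, while an rsID such as `"rs4988235"` returns `None`, so a caller could fall back to another kind of lookup.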
As always, please contact us if you have additional questions or suggestions about this or any other feature in GDV or Variation Viewer. You can use the Feedback button on the page or write to the NCBI Help Desk directly.
Did you know that you can see epigenomic or other experimental data in NCBI’s Genome Data Viewer (GDV)?
You can easily add aligned study results from GEO, SRA, and dbGaP as data tracks to the GDV browser view. Just click the Tracks button on the toolbar and select Configure Tracks from the menu. Then navigate to the ‘Find Tracks’ tab of the pop-up Configure panel (Figure 1).
Two up-and-coming NCBI resources will be featured in videos, surveys, and live events at the American Society of Human Genetics (ASHG) 2020 Annual Meeting. Come and watch on-demand videos in the CoLab Theater. Then let us know what you think and how you use or might use these resources, either by taking an online survey or by joining us for the CoLab Live! Events on Thursday, October 29, 2020.
dbSNP human build 154, now available, includes new ALFA (Allele Frequency Aggregator) variants and allele frequencies. This build contains over two billion Submitted SNP (ss) records and 730 million Reference SNP (rs) records.
On Wednesday, April 22, 2020, at 12 PM, join NCBI staff to learn how results from the Allele Frequency Aggregator (ALFA) project will help you interpret the biological impact of common and rare sequence variants. ALFA’s initial release includes analysis of genotype data from ~100K unrestricted dbGaP subjects and provides high-quality allele frequency data now displayed on relevant dbSNP records. In this webinar, you will learn about the data in the recent ALFA release, see how to access the data from the web and FTP, and learn how to retrieve data programmatically by position, gene, and other attributes using E-utilities and the Variation Services API in Python.
Date and time: Wed, Apr 22, 2020 12:00 PM – 12:45 PM EDT
After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
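As a rough idea of what programmatic access can look like before the webinar, the minimal sketch below builds request URLs for a single dbSNP record using only the Python standard library. The E-utilities ESummary call (`db=snp`, `retmode=json`) is a standard E-utilities request; the Variation Services path `/variation/v0/refsnp/{id}` is an assumption to check against the current API documentation, and the helper names (`rs_number`, `esummary_url`, `refsnp_url`, `fetch_json`) are hypothetical.

```python
import json
import urllib.parse
import urllib.request

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
# Assumed Variation Services base path; verify against the API documentation.
VARIATION_BASE = "https://api.ncbi.nlm.nih.gov/variation/v0"


def rs_number(rsid: str) -> str:
    """Strip an optional 'rs' prefix, e.g. 'rs4988235' -> '4988235'."""
    cleaned = rsid.strip().lower()
    digits = cleaned[2:] if cleaned.startswith("rs") else cleaned
    if not digits.isdigit():
        raise ValueError(f"not an rsID: {rsid!r}")
    return digits


def esummary_url(rsid: str) -> str:
    """E-utilities ESummary request for one dbSNP record, JSON output."""
    query = urllib.parse.urlencode(
        {"db": "snp", "id": rs_number(rsid), "retmode": "json"}
    )
    return f"{EUTILS_BASE}/esummary.fcgi?{query}"


def refsnp_url(rsid: str) -> str:
    """Variation Services RefSNP record URL (path layout is an assumption)."""
    return f"{VARIATION_BASE}/refsnp/{rs_number(rsid)}"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """GET a URL and decode its JSON body (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With network access, `fetch_json(esummary_url("rs4988235"))` would retrieve the ESummary document for the lactase-region variant discussed below. Keep NCBI's request limits in mind when looping over many IDs: E-utilities allows 3 requests per second without an API key.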
NIH’s data sharing policy now allows unrestricted access to genomic summary results for data from NCBI’s Database of Genotypes and Phenotypes (dbGaP). Pooled allele frequency data from dbSNP and the dbGaP summary results are available as the new Allele Frequency Aggregator (ALFA) dataset. The ALFA dataset includes aggregated and harmonized array chip genotyping, exome, and genome sequencing data. The ALFA data are open access and freely available for you to incorporate into your workflows and applications from the dbSNP web pages (Figure 1), through FTP, and through the Variation Services API. dbGaP currently has data for more than 2 million study subjects, approximately 1 million of whom have genotype data suitable for input into the ALFA dataset. The first release of ALFA contains data on about 100,000 subjects, and we hope to complete processing of data on the other 925,000 subjects within the next year. This volume and variety of data promise unprecedented opportunities to identify genetic factors that influence health and disease. Register to attend our April 22 webinar and read on to learn more.
Figure 1. ALFA allele frequencies for a variant (rs4988235) in the promoter of the lactase gene, showing frequency differences across populations.
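Pooling allele frequency data means combining allele counts across studies rather than averaging per-study frequencies. The minimal sketch below shows only that arithmetic for one variant, grouped by population; it is not ALFA’s implementation, which also harmonizes genotypes across array, exome, and genome data. The record layout (`StudyCounts`) and function name (`pooled_alt_frequency`) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class StudyCounts:
    """Allele counts for one variant from one study (hypothetical layout)."""
    population: str
    alt_count: int      # copies of the alternate allele observed
    allele_number: int  # total alleles genotyped (2 per diploid subject)


def pooled_alt_frequency(records: Iterable[StudyCounts]) -> Dict[str, float]:
    """Sum counts per population, then divide: sum(alt) / sum(allele number)."""
    alt_totals: Dict[str, int] = defaultdict(int)
    allele_totals: Dict[str, int] = defaultdict(int)
    for record in records:
        alt_totals[record.population] += record.alt_count
        allele_totals[record.population] += record.allele_number
    # Skip populations with no genotyped alleles to avoid dividing by zero.
    return {
        population: alt_totals[population] / total
        for population, total in allele_totals.items()
        if total > 0
    }
```

Pooling counts weights each study by the number of alleles it genotyped. With made-up inputs where two studies of one population report 30 of 100 and 10 of 300 alternate alleles, the pooled frequency is 40/400 = 0.1, whereas a plain average of the two study frequencies would give about 0.17.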
Check out the latest videos on YouTube to learn how to best use NCBI graphical viewers, SRA, PGAP, and other resources.
Genome Data Viewer: Analyzing Remote BAM Alignment Files and Other Tips
This video shows you how to load remote BAM files, succinctly demonstrates handy viewer settings such as Pileup display options, and highlights the helpful tooltips in the Genome Data Viewer (GDV). There’s also a brief blog post on the same topic.