With a growing database of over 2,300 studies with billions of demographic, phenotypic, and exposure measurements, we want to ensure you can easily access publicly available information for data submitted to us.
Do you have human genetic data from a large-scale study? Submit your data to NCBI’s Database of Genotypes and Phenotypes (dbGaP) to contribute to meaningful discoveries about health. dbGaP contains data from more than 2.8 million study participants who have provided over 3.3 million molecular samples.
Are you familiar with the well-known Framingham Heart Study, a multi-generation study of residents of Framingham, Massachusetts begun in 1948? Much of what is now known about the impact of genetics, lifestyle, and diet on cardiovascular health and disease has come from this research study. (See PMC4159698 for a historical perspective.) Did you know that data from this study and over 2,000 other studies that demonstrate the relationship between genetic and medical outcomes and other phenotypes are available from NCBI’s Database of Genotypes and Phenotypes (dbGaP)?
dbGaP was established in 2007 as a repository of human data from large-scale studies. You can access data from more than 2.8 million study participants who have provided over 3.3 million molecular samples. You can retrieve patient-level phenotypic (e.g., demographic, clinical, exposure) data and molecular (e.g., called genotypes, omics, sequence) data, as well as the results of association analyses from genome-scale case-control and longitudinal studies of heritable diseases.
What types of studies and data are available in dbGaP?
dbGaP contains a wide range of studies and types of data, all relating to human genetic and phenotypic measurements. Most dbGaP data are from NIH-funded research, but recently we have expanded to include non-NIH funded studies. An easy way to find dbGaP Studies, Phenotype and Molecular Datasets, Variables, Analyses and Documents is through the dbGaP Advanced Search (Figure 1). The interface allows you to filter results by different characteristics depending on the tab you choose.
Figure 1. The dbGaP Advanced Search interface. Tabs that appear at the top of the web interface allow you to select the studies, datasets, analyses, etc. of interest. Filters (facets) appear on the left (see inset). Click on filters to select the values to search for. Links on the study summary pages provide direct access to data. Top panel: Studies tab and the corresponding filter categories. Bottom panel: Molecular data tab results with Study (Framingham SHARe) and Markerset Source (Affymetrix) filters applied.
We will present a variety of talks and posters featuring our clinical and human genetic resources, as well as genome products and tools. We are excited to introduce the NIH Comparative Genomics Resource (CGR), a multi-year National Library of Medicine (NLM) project to maximize the impact of eukaryotic research organisms and their genomic data resources to biomedical research. If you’re interested in providing feedback that will be used to help drive CGR forward, consider joining our round table discussion.
Check out NCBI’s schedule of activities and events:
NCBI offers a portfolio of medical genetics resources to help you research, diagnose, and treat diseases and conditions. You can easily access our data and tools through the Medical Genetics and Human Variation page of the NCBI website. We also encourage you to join our community of thousands of submitters and share your germline and/or somatic data to advance discovery and optimize clinical care.
How and why should you use our resources? Consider the example below.
Your patient is a 40-year-old mother of two presenting with changes in bathroom habits, bleeding, and belly pain. She has a medical history of colonic polyps. Her family history reveals that her maternal grandmother, mother, and uncle had several forms of cancer, including colon, breast, and endometrial cancer.
The Genome Data Viewer (GDV) is now the comprehensive NCBI genome browser. The development of GDV led to a few different types of genome browsers along the way, each one originally delivering visual displays for particular datasets. We developed the 1000 Genomes Browser for variation data from the 1000 Genomes project, the dbGaP Data Browser for controlled-access sequence read alignment data, and the GeT-RM browser for Genome in a Bottle (GIAB) data.
The data displayed in these three browsers are now either obsolete or largely accessible from the GDV browser or other NCBI resources. Moreover, unlike GDV, these older browsers are no longer under active development, and their data have not been updated to meet the changing needs of the communities they were developed to serve. For these reasons, we will retire these browsers in April 2022. Please see the details below for more information on the data displayed in these browsers and how to access and display these data now through GDV and other means.
Attention dbGaP submitters! Join us on November 3, 2021 at 12 PM US Eastern Time to learn about data submission and processing improvements to dbGaP, NIH’s Database of Genotypes and Phenotypes, which contains individual-level data associated with human research studies. You will see how we have made submission easier through the Submission Portal using automated preliminary validation and how you can run GaPTools, a stand-alone data validation tool, on your own submission to expedite the submission process. Join us to discover how dbGaP ensures the integrity and high quality of the genomic data that scientists can access to further their research.
After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI webinars playlist on the NLM YouTube channel. You can learn about future webinars on the Webinars and Courses page.
Did you know that you can see epigenomic or other experimental data in NCBI’s Genome Data Viewer (GDV)?
You can easily add aligned study results from GEO, SRA, and dbGaP as data tracks to the GDV browser view. Just go to the Tracks button on the toolbar and select the Configure Tracks menu option. Navigate to the ‘Find Tracks’ tab on the pop-up Configure panel (Figure 1).
Figure 1. Go to the ‘Tracks’ menu on the browser toolbar and select the ‘Configure Tracks’ option. This will launch a panel where you can add, configure, remove, and search for data tracks. Go to the ‘Find Tracks’ tab to search for tracks to add to your browser view. Note: spaces act as AND operators in the search, and wildcards are accepted.
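To make the search note concrete, here is a minimal Python sketch of what "spaces act as AND operators, and wildcards are accepted" could look like. It is an illustration only, not GDV's actual search code: the function name `track_matches`, the sample track names, the `*` wildcard syntax, and the choice to match case-insensitively against whole words are all assumptions.

```python
from fnmatch import fnmatchcase


def track_matches(query: str, track_name: str) -> bool:
    """Return True if every space-separated term matches some word in the name.

    Assumptions (not taken from GDV): matching is case-insensitive, each term
    is compared against whole words, and terms use shell-style wildcards.
    """
    terms = query.lower().split()
    words = track_name.lower().split()
    # Spaces act as AND: every term must match at least one word.
    return all(any(fnmatchcase(word, term) for word in words) for term in terms)


# Hypothetical track names, used only for illustration.
tracks = [
    "H3K27ac ChIP-seq liver",
    "H3K4me3 ChIP-seq liver",
    "RNA-seq heart",
]

# Both terms must match, so only the two liver histone tracks are kept.
hits = [name for name in tracks if track_matches("H3K* liver", name)]
```

Under these assumptions, the query `H3K* liver` keeps the two liver histone-mark tracks and drops the heart track, because a track must satisfy every term.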
We have just launched GaPTools, a stand-alone data validation tool for submissions to NCBI’s Database of Genotypes and Phenotypes (dbGaP). You can use GaPTools to validate your dbGaP submissions or submissions to other genomic data repositories. GaPTools checks for common data inconsistency and integrity issues and validates subject-sample ID mapping, subject consents, data dictionaries, and phenotype and genotype data. GaPTools is available as a Docker image on Docker Hub.
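To give a feel for the kind of consistency check described above, here is a small Python sketch of a subject-sample mapping and consent check. It is not GaPTools and does not reflect its file formats or rules: the function name, the dictionary inputs, the IDs, and the rule that a consent value must be present are hypothetical and chosen only for illustration.

```python
def check_subject_sample_mapping(subject_consents, sample_to_subject):
    """Return a list of problems; an empty list means the inputs are consistent.

    subject_consents:  dict of subject ID -> consent value (None if missing).
    sample_to_subject: dict of sample ID -> subject ID.
    Both structures are hypothetical stand-ins for submission files.
    """
    problems = []

    # Every sample must point at a subject that appears in the consent table.
    for sample_id, subject_id in sorted(sample_to_subject.items()):
        if subject_id not in subject_consents:
            problems.append(
                f"sample {sample_id} maps to unknown subject {subject_id}"
            )

    # Every subject must carry a consent value.
    for subject_id, consent in sorted(subject_consents.items()):
        if consent is None:
            problems.append(f"subject {subject_id} has no consent value")

    return problems


# Made-up example data.
subject_consents = {"SUBJ1": 1, "SUBJ2": 2, "SUBJ3": None}
sample_to_subject = {"SAMP_A": "SUBJ1", "SAMP_B": "SUBJ2", "SAMP_C": "SUBJ9"}

problems = check_subject_sample_mapping(subject_consents, sample_to_subject)
for message in problems:
    print(message)
```

Collecting every problem in a list, rather than stopping at the first one, lets a submitter fix all reported issues in a single pass before submitting.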
Why Use GaPTools?
GaPTools will validate files before you submit (see Figure 1). This means that by the time you formally submit, some of the pre-validation steps are already addressed. This tool allows you to prepare your data quickly and ensures a faster processing cycle and a faster release of your individual-level research data.
Figure 1: Flow chart depicting data submission and GaPTools validation
dbGaP has recently released a new feature to simplify submissions and provide study accessions faster. This video provides a quick overview of the new feature.
Our new study config webform enables a study submitter to enter important study summary information online, including the study description, inclusion/exclusion criteria, history, attribution, and associated publications, and to instantly preview the study config content and study accession on their dbGaP study report page. The fields for study design and type, PMIDs, genes, MeSH terms, and associated clinical trials have built-in help and validation to ensure that the information provided is complete and searchable by users looking for that data.
The Database of Genotypes and Phenotypes (dbGaP) provides controlled access to the data and results from studies that have investigated the interaction of genotype and phenotype in humans. dbGaP assigns stable, unique identifiers to studies and subsets of information from those studies, including documents, individual phenotypic variables, tables of trait data, sets of genotype data, computed phenotype-genotype associations, and groups of study subjects who have given similar consents for use of their data.
Figure 1. dbGaP summary statistics
The submissions made to dbGaP represent the best and latest research in topic areas such as cardiovascular diseases, diabetes, autism spectrum disorders, precision medicine and many more. Submitters are central to the success of dbGaP and sharing of genomic research across the broader scientific community. Our submission portal serves as a central place to collect multiple components of a research study, including the metadata/summary and associated phenotype, genotype, and sequence data.