NCBI offers a portfolio of medical genetics resources to help you research, diagnose, and treat diseases and conditions. You can easily access our data and tools through the Medical Genetics and Human Variation page of the NCBI website. We also encourage you to join our community of thousands of submitters and share your germline and/or somatic data to advance discovery and optimize clinical care.
How and why should you use our resources? Consider the example below.
Your patient is a 40-year-old mother of two presenting with changes in bathroom habits, bleeding, and belly pain. She has a medical history of colonic polyps. Her family history reveals that her maternal grandmother, mother, and uncle had several forms of cancer, including colon, breast, and endometrial cancer.
The Genome Data Viewer (GDV) is now the comprehensive NCBI genome browser. The development of GDV led to a few different types of genome browsers along the way, each one originally delivering visual displays for particular datasets. We developed the 1000 Genomes Browser for variation data from the 1000 Genomes project, the dbGaP Data Browser for controlled-access sequence read alignment data, and the GeT-RM browser for Genome in a Bottle (GIAB) data.
The data displayed in these three browsers are now either obsolete or largely accessible from GDV or other NCBI resources. Moreover, unlike GDV, these older browsers are no longer under active development, and their data have not been updated to meet the changing needs of the communities they were developed to serve. For these reasons, we will retire these browsers in April 2022. Please see the details below for more information on the data displayed in these browsers and how to access and display these data now through GDV and other means.
Attention dbGaP submitters! Join us on November 3, 2021 at 12 PM US Eastern Time to learn about data submission and processing improvements to dbGaP, NIH’s database of Genotypes and Phenotypes, which contains individual-level data associated with human research studies. You will see how we have made submission easier through the Submission Portal using automated preliminary validation, and how you can run GaPTools, a stand-alone data validation tool, on your own submission to expedite the submission process. Join us to discover how dbGaP ensures the integrity and high quality of the genomic data that scientists can access to further their research.
Did you know that you can see epigenomic or other experimental data in NCBI’s Genome Data Viewer (GDV)?
You can easily add aligned study results from GEO, SRA, and dbGaP as data tracks to the GDV browser view. Just go to the Tracks button on the toolbar and select the menu option to Configure Tracks. Navigate to the ‘Find Tracks’ tab on the pop-up Configure panel (Figure 1).
We have just launched GaPTools, a stand-alone data validation tool for submissions to NCBI’s database of Genotypes and Phenotypes (dbGaP). You can use GaPTools to validate your dbGaP submissions or submissions to other genomic data repositories. GaPTools checks for common data inconsistency and integrity issues and validates subject-sample ID mapping, subject consents, data dictionaries, and phenotype and genotype data. GaPTools is available as a Docker image on Docker Hub.
Why Use GaPTools?
GaPTools will validate files before you submit (see Figure 1). This means that by the time you formally submit, some of the pre-validation steps are already addressed. This tool allows you to prepare your data quickly and ensures a faster processing cycle and a faster release of your individual-level research data.
Figure 1: Flow chart depicting data submission and GaPTools validation
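To give a feel for the kind of subject-sample ID mapping and consent checks mentioned above, here is a minimal Python sketch. It is not GaPTools code and does not reflect how GaPTools is implemented; the function name, the input shapes, and the messages are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def check_subject_sample_mapping(mapping_rows, consented_subjects):
    """Return a list of problems found in a subject-sample mapping.

    Hypothetical helper for illustration only (not part of GaPTools).
    mapping_rows: iterable of (subject_id, sample_id) pairs.
    consented_subjects: set of subject IDs that have a consent record.
    """
    problems = []
    subjects_by_sample = defaultdict(set)
    unconsented = set()

    for subject_id, sample_id in mapping_rows:
        # A row with a missing ID cannot be checked any further.
        if not subject_id or not sample_id:
            problems.append(f"blank ID in row ({subject_id!r}, {sample_id!r})")
            continue
        subjects_by_sample[sample_id].add(subject_id)
        if subject_id not in consented_subjects:
            unconsented.add(subject_id)

    # Report each subject without a consent record once.
    for subject_id in sorted(unconsented):
        problems.append(f"subject {subject_id} has no consent record")

    # A sample should belong to exactly one subject.
    for sample_id in sorted(subjects_by_sample):
        subjects = sorted(subjects_by_sample[sample_id])
        if len(subjects) > 1:
            problems.append(f"sample {sample_id} maps to multiple subjects: {subjects}")

    return problems


if __name__ == "__main__":
    rows = [("S1", "A1"), ("S2", "A2"), ("S3", "A2"), ("", "A4")]
    for problem in check_subject_sample_mapping(rows, {"S1", "S2"}):
        print(problem)
```

Running checks of this kind locally before a formal submission is the general idea behind pre-validation: inconsistencies surface while they are still cheap to fix.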
dbGaP has recently released a new feature to simplify submissions and provide study accessions faster. This video provides a quick overview of the new feature.
Our new study config webform enables a study submitter to enter important study summary information online, including study description, inclusion/exclusion criteria, history, attribution, and associated publications, and to instantly preview the study config content and study accession on their dbGaP study report page. The fields for study design and type, PMIDs, genes, MeSH terms, and associated clinical trials have built-in help and validation to ensure that the information provided is complete and searchable by users looking for that data.
The database of Genotypes and Phenotypes (dbGaP) provides controlled access to the data and results from studies that have investigated the interaction of genotype and phenotype in humans. dbGaP assigns stable, unique identifiers to studies and subsets of information from those studies, including documents, individual phenotypic variables, tables of trait data, sets of genotype data, computed phenotype-genotype associations, and groups of study subjects who have given similar consents for use of their data.
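As a small illustration of working with such identifiers, the Python sketch below splits a dbGaP study accession into its parts. It assumes the commonly seen study accession form "phs" plus six digits, followed by a version (".v") and a participant set (".p"), as in phs000007.v32.p13. That pattern is an assumption on our part rather than something stated in this post, and the function and class names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class StudyAccession(NamedTuple):
    """Parts of a study accession (hypothetical helper type)."""
    study: str            # e.g. "phs000007"
    version: int          # number after ".v"
    participant_set: int  # number after ".p"


# Assumed pattern: phs + 6 digits, then .v<version>.p<participant set>.
_STUDY_PATTERN = re.compile(r"^(phs\d{6})\.v(\d+)\.p(\d+)$")


def parse_study_accession(text: str) -> Optional[StudyAccession]:
    """Return the parsed accession, or None if the text does not match."""
    match = _STUDY_PATTERN.match(text.strip())
    if match is None:
        return None
    return StudyAccession(match.group(1), int(match.group(2)), int(match.group(3)))


if __name__ == "__main__":
    print(parse_study_accession("phs000007.v32.p13"))
    print(parse_study_accession("not-an-accession"))
```

Keeping the study ID separate from the version and participant set makes it easy to tell whether two accessions refer to the same study at different releases.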
The submissions made to dbGaP represent the best and latest research in topic areas such as cardiovascular diseases, diabetes, autism spectrum disorders, precision medicine and many more. Submitters are central to the success of dbGaP and sharing of genomic research across the broader scientific community. Our submission portal serves as a central place to collect multiple components of a research study, including the metadata/summary and associated phenotype, genotype, and sequence data.
Two up-and-coming NCBI resources will be featured in videos, surveys, and live events at the American Society of Human Genetics (ASHG) 2020 Annual Meeting. Come and watch on-demand videos in the CoLab Theater. Then, let us know what you think and how you use, or might use, these resources by either taking an online survey or joining us for the CoLab Live! events on Thursday, October 29, 2020.
The National Library of Medicine (NLM) is pleased to announce that all controlled-access and publicly available data in SRA is now available through Google Cloud Platform (GCP) and Amazon Web Services (AWS). To access the data please visit our SRA in the Cloud webpage where you will find links to our new SRA Toolkit and other access methods.
The SRA data available in the two clouds currently totals more than 14 petabytes and consists of all data in the SRA format as well as some data in its original submission format. Since May 2019, NCBI has been putting all submitted SRA data on the GCP and AWS clouds in both the submitted format and our converted SRA format. We have also been moving previously submitted original format data to the clouds and expect to complete that process in 2021.
Check out the latest videos on YouTube to learn how to best use NCBI graphical viewers, SRA, PGAP, and other resources.
Genome Data Viewer: Analyzing Remote BAM Alignment Files and Other Tips
This video shows you how to upload remote BAM files, and succinctly demonstrates handy viewer settings, such as Pileup display options, and highlights the very helpful tooltips in the Genome Data Viewer (GDV). There’s also a brief blog post on the same topic.