June 27 NCBI Minute: dbGaP’s New Ancestry Composition Visualization tool and GRAF Software

Next Wednesday, June 27, 2018, we’ll introduce you to the Genetic Relationship and Fingerprinting (GRAF) software package. GRAF is a quality assurance tool that finds duplicates and closely related subjects in your data using SNP genotypes. We’ll also introduce the GRAF-pop feature, which computes subject ancestries and plots data for export as a .png or .txt file.

Date and time: Wed, June 27 12:00 PM – 12:30 PM EDT

Register here: https://bit.ly/2LjCaML

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

GRAF, a new tool for finding duplicates and closely related samples in large genomic datasets

Genome-wide association studies (GWAS) usually rely on the assumption that different samples aren’t from closely related individuals. If you’re using combined datasets that have been genotyped on different platforms, though, how do you detect duplicates and close relatives?

The dbGaP team at NCBI developed a new software tool and rapid statistical method called Genetic Relationship and Fingerprinting (GRAF) to do exactly that. At NCBI, we use GRAF as a quality assurance tool in dbGaP data processing. We’re presenting this tool publicly so any researcher can check the quality of their own data.

GRAF uses two statistical metrics to determine subject relationships directly from the observed genotypes, without estimating probabilities of identity by descent (IBD) or kinship coefficients. It then compares the predicted relationships with those reported in the pedigree files. Please see the PLOS ONE article published in July 2017 for a detailed description of GRAF.
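To give a feel for what "directly from the observed genotypes" can mean, here is a minimal Python sketch. It is not GRAF's code. This post does not name the two metrics, so the sketch assumes they are pairwise mismatch rates of the kind described in the GRAF paper: a homozygous genotype mismatch rate (HGMR) and an all genotype mismatch rate (AGMR). The function name, the 0/1/2 genotype encoding, and the simplified definitions are all illustrative assumptions.

```python
from typing import Optional, Sequence, Tuple

# Assumed encoding: 0, 1, 2 = number of alternate alleles; None = missing call.
Genotype = Optional[int]


def mismatch_rates(a: Sequence[Genotype], b: Sequence[Genotype]) -> Tuple[float, float]:
    """Return (hgmr, agmr) for two samples genotyped at the same SNPs.

    hgmr: among SNPs where both samples are homozygous, the share that differ.
    agmr: among SNPs where both samples are called, the share that differ.
    These are simplified, assumed definitions, not GRAF's implementation.
    """
    both_called = any_mismatch = both_hom = hom_mismatch = 0
    for ga, gb in zip(a, b):
        if ga is None or gb is None:
            continue  # skip SNPs missing in either sample
        both_called += 1
        if ga != gb:
            any_mismatch += 1
        if ga != 1 and gb != 1:  # both homozygous
            both_hom += 1
            if ga != gb:
                hom_mismatch += 1
    hgmr = hom_mismatch / both_hom if both_hom else float("nan")
    agmr = any_mismatch / both_called if both_called else float("nan")
    return hgmr, agmr


sample_a = [0, 1, 2, 0, None, 2]
sample_b = [0, 1, 2, 2, 1, 1]
hgmr, agmr = mismatch_rates(sample_a, sample_b)
```

Under these assumed definitions, two copies of the same sample give rates near zero on both metrics, and the rates rise as the relationship becomes more distant. The actual metrics, thresholds, and relationship calls are defined in the PLOS ONE article.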

A recent update to GRAF adds the ability to determine subject ancestries. For more information on this addition, visit Poster #1322T, “Quickly determining subject ancestries in large datasets using genotypes of dbGaP fingerprint SNPs”, on Thursday, October 19th from 3-4 in the Exhibit Hall at ASHG.

dbGaP 10th Anniversary Symposium June 9, 2017

dbGaP (the NIH database of Genotypes and Phenotypes) is celebrating its 10th Anniversary this year! We are proud to support over 850 studies and 1.6 million samples.

We invite you to join us at the dbGaP 10th Anniversary Symposium, to be held on June 9, 2017, from 1:30 to 3:00 PM in Wilson Hall, Building 1, on the NIH Bethesda campus. For information on campus access and security, the NIH Visitor Center, parking, and directions to NIH, see the NIH Visitor Information page.


NCBI’s Open Data – A Source of Experimental Data for Important Discoveries

On a typical day, researchers download about 30 terabytes of data from NCBI in an effort to make discoveries. NCBI began providing online access to data in the early 1990s, starting with the GenBank database of DNA sequences. Over the years we’ve greatly expanded the types and quantity of data available. You can now find on our site descriptions and data from experimental studies such as next-generation sequencing projects, bioactivity assays for small molecules, microarray datasets and genome-wide association studies.

The White House recently recognized these efforts by presenting NCBI Director David J. Lipman with the “Open Science” Champion of Change Award [1]. The scientific community has also recognized the benefits of open data. Access to this information serves as a source of both original and supplemental data for exploration and validation [2-4], which improves the power of experimental data [5] while increasing the speed and decreasing the cost of discovery [6].

In this post, we summarize three recent cases where researchers used data from an NCBI resource/database to make significant discoveries.
