GRAF, a new tool for finding duplicates and closely related samples in large genomic datasets


Genome-wide association studies (GWAS) usually rely on the assumption that different samples aren’t from closely related individuals. If you’re using combined datasets that have been genotyped on different platforms, though, how do you detect duplicates and close relatives?

The dbGaP team at NCBI developed a new software tool and rapid statistical method called Genetic Relationship and Fingerprinting (GRAF) to do exactly that. At NCBI, we use GRAF as a quality assurance tool in dbGaP data processing. We’re presenting this tool publicly so any researcher can check the quality of their own data.

GRAF uses two statistical metrics to determine subject relationships directly from the observed genotypes, without estimating identity-by-descent (IBD) probabilities or kinship coefficients. It then compares the predicted relationships with those reported in the pedigree files. Please see the PLOS ONE article published in July 2017 for a detailed description of GRAF.
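
This post doesn't define the two metrics, so here is only a rough sketch of the general idea of comparing two samples directly from their genotypes. It assumes the metrics are pairwise genotype mismatch rates: one over all SNPs called in both samples, and one restricted to SNPs where both samples are homozygous. The function name `mismatch_rates`, the 0/1/2 genotype coding, and these definitions are illustrative assumptions, not GRAF's actual code or interface.

```python
from typing import Optional, Sequence, Tuple

# Assumed coding: 0, 1, 2 = number of alternate alleles; None = missing call.
Genotype = Optional[int]


def mismatch_rates(
    a: Sequence[Genotype], b: Sequence[Genotype]
) -> Tuple[Optional[float], Optional[float]]:
    """Return (homozygous_mismatch_rate, all_mismatch_rate) for two samples.

    Hypothetical helper: both rates are computed only over SNPs that are
    called in both samples. A rate is None when no SNPs qualify for it.
    """
    if len(a) != len(b):
        raise ValueError("samples must be genotyped on the same SNP list")

    both_called = all_mismatch = 0
    both_hom = hom_mismatch = 0
    for ga, gb in zip(a, b):
        if ga is None or gb is None:
            continue  # skip SNPs missing in either sample
        both_called += 1
        if ga != gb:
            all_mismatch += 1
        if ga in (0, 2) and gb in (0, 2):
            both_hom += 1
            if ga != gb:
                hom_mismatch += 1  # opposite homozygotes, e.g. AA vs BB

    hom_rate = hom_mismatch / both_hom if both_hom else None
    all_rate = all_mismatch / both_called if both_called else None
    return hom_rate, all_rate


if __name__ == "__main__":
    sample_1 = [0, 1, 2, 2, None, 0]
    sample_2 = [0, 1, 2, 0, 1, 1]
    print(mismatch_rates(sample_1, sample_2))
```

Under these assumptions, duplicate samples should score near zero on both rates apart from genotyping errors. A parent and child always share at least one allele at each SNP, so they should show almost no opposite-homozygote mismatches while still differing at many SNPs overall.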

A recent update to GRAF adds the ability to determine subject ancestries. For more information on this addition, visit Poster #1322T, “Quickly determining subject ancestries in large datasets using genotypes of dbGaP fingerprint SNPs”, on Thursday, October 19th from 3-4 in the Exhibit Hall at ASHG.
