GRAF, a tool for finding duplicates and closely related samples in large genomic datasets


NCBI’s Genetic Relationship and Fingerprinting (GRAF) tool is a quality assurance tool that can quickly find duplicates and closely related subjects in your data using SNP genotypes.

GRAF-pop, the population tool included in GRAF, computes subject ancestries from genotypes and normalizes ancestry prediction across large datasets collected on different genotyping platforms, making it possible to generate population frequency data from more than a million dbGaP samples.

Who can use this?

GRAF is a tool for researchers; it is not designed to assess an individual’s ancestry or to find relatives.

You can run this tool on your own large datasets and get results within minutes or hours, even when the genotype missing rate is very high, on the order of 99%. The tool can also check genotype datasets obtained using different chips or platforms and plot them in the same figure for comparison.
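To illustrate why duplicates can still be found when most genotypes are missing, here is a minimal Python sketch that compares two samples only at the SNPs called in both. This is not GRAF's implementation or its actual statistic. The function name `mismatch_rate`, the 0/1/2 genotype coding, and the use of `None` for a missing call are all assumptions made for this example.

```python
from typing import Optional, Sequence

# Assumed encoding (hypothetical, not GRAF's file format):
# 0, 1, 2 = count of alternate alleles; None = missing genotype call.
Genotype = Optional[int]


def mismatch_rate(a: Sequence[Genotype], b: Sequence[Genotype]) -> Optional[float]:
    """Fraction of differing genotypes among SNPs called in both samples.

    Returns None when the two samples share no called SNPs.
    """
    # Keep only positions where both samples have a genotype call.
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    mismatches = sum(1 for x, y in shared if x != y)
    return mismatches / len(shared)


# Toy data: sample_2 matches sample_1 wherever both are called.
sample_1 = [0, 1, None, 2, None, 1, 0, None]
sample_2 = [0, 1, 2, 2, None, None, 0, 1]
sample_3 = [2, 0, None, 0, 1, 1, 2, None]

print(mismatch_rate(sample_1, sample_2))  # likely duplicate: rate near 0
print(mismatch_rate(sample_1, sample_3))  # unrelated: much higher rate
```

Because the rate is computed only over shared calls, two sparse samples can still be compared as long as they overlap at enough SNPs. A real tool would also need to account for genotyping error and for how many shared SNPs are required for a reliable call.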

NCBI’s database of Genotypes and Phenotypes (dbGaP) uses GRAF-pop for quality control and to compute population frequency data from dbGaP studies comprising more than a million samples. This frequency data will cover a larger set of populations and many more novel variants than has been available historically. Clinicians and researchers can use it for rare variant identification, variant interpretation, assay design, and many other applications.

What is the underlying technology?

GRAF is a downloadable C++ application, compiled for GNU/Linux.

This tool can output the data in .txt files and plot the data for export as .png files. It is also built into a CGI for dbGaP submitters and users to examine the results in dynamic web pages.

Want to learn more about GRAF and GRAF-pop?

Here are a few ways:

  • Read this recent G3 Journal article that explains how GRAF-pop uses a fast distance-based method to infer subject ancestry from multiple genotype datasets without principal components analysis (PCA).
  • Read this paper about GRAF.
  • Watch our video describing the method and its use.
  • Read our User Guide to learn how to use GRAF.
  • Download the free software from the dbGaP website and tell us about your experience.

Contact us at info@ncbi.nlm.nih.gov to let us know how dbGaP, GRAF, or GRAF-pop has helped you.
