The National Library of Medicine (NLM) is pleased to announce that all controlled-access and publicly available data in SRA is now available through Google Cloud Platform (GCP) and Amazon Web Services (AWS). To access the data, please visit our SRA in the Cloud webpage, where you will find links to our new SRA Toolkit and other access methods.
The SRA data available in the two clouds currently totals more than 14 petabytes and consists of all data in the SRA format as well as some data in its original submission format. Since May 2019, NCBI has been putting all submitted SRA data on the GCP and AWS clouds in both the submitted format and our converted SRA format. We have also been moving previously submitted original format data to the clouds and expect to complete that process in 2021.
Check out the latest videos on YouTube to learn how to best use NCBI graphical viewers, SRA, PGAP, and other resources.
Genome Data Viewer: Analyzing Remote BAM Alignment Files and Other Tips
This video shows you how to upload remote BAM files, demonstrates handy viewer settings such as Pileup display options, and highlights the helpful tooltips in the Genome Data Viewer (GDV). There’s also a brief blog post on the same topic.
NCBI’s Genetic Relationship and Fingerprinting (GRAF) tool is a quality assurance tool that can quickly find duplicates and closely related subjects in your data using SNP genotypes.
The population tool GRAF-pop, included in GRAF, computes subject ancestries from genotypes and normalizes ancestry prediction in large datasets collected across different genotyping platforms, making it possible to generate population frequency based on more than a million dbGaP samples.
Who can use this?
GRAF is a tool for researchers; it is not designed to assess an individual’s ancestry or to find relatives.
You can run this tool on your own large datasets and get results within minutes or hours, even when the genotype missing rate is very high, on the order of 99%. The tool can also check genotype datasets obtained using different chips or platforms and plot them in the same figure for comparison.
Do you need access to controlled data in the database of Genotypes and Phenotypes (dbGaP)? This short video will show you how to request data today!
dbGaP archives and distributes the data and results from studies that have investigated the interaction of genotype and phenotype in humans. Responsible stewardship of controlled-access data subject to the NIH GDS Policy is shared among the NIH, the investigators approved to access the data, and the investigators’ institutions.
Next Wednesday, June 27, 2018, we’ll introduce you to the Genetic Relationship and Fingerprinting (GRAF) software package. GRAF is a quality assurance tool that finds duplicates and closely related subjects in your data using SNP genotypes. We’ll also introduce the GRAF-pop feature, which computes subject ancestries and plots data for export as a .png or .txt file.
Date and time: Wed, June 27 12:00 PM – 12:30 PM EDT
After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
Genome-wide association studies (GWAS) usually rely on the assumption that different samples aren’t from closely related individuals. If you’re using combined datasets that have been genotyped on different platforms, though, how do you detect duplicates and close relatives?
The dbGaP team at NCBI developed a new software tool and rapid statistical method called Genetic Relationship and Fingerprinting (GRAF) to do exactly that. At NCBI, we use GRAF as a quality assurance tool in dbGaP data processing. We’re presenting this tool publicly so any researcher can check the quality of their own data.
GRAF uses two statistical metrics to determine subject relationships directly from the observed genotypes, without estimating probabilities of identity by descent (IBD) or kinship coefficients, and compares the predicted relationships with those reported in the pedigree files. Please see the PLOS ONE article published in July 2017 for a detailed description of GRAF.
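To give a feel for how relationships can be read directly from observed genotypes, here is a minimal sketch. It assumes the two metrics are the homozygous genotype mismatch rate (HGMR) and the all genotype mismatch rate (AGMR) described in the GRAF paper. The function name, the genotype encoding (0, 1, or 2 copies of the alternate allele, None for a missing call), and the sample values are illustrative assumptions, not GRAF's own code or file format.

```python
from typing import Optional, Sequence, Tuple

# Assumed encoding: 0, 1, 2 = copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def mismatch_rates(
    a: Sequence[Genotype], b: Sequence[Genotype]
) -> Tuple[Optional[float], Optional[float]]:
    """Return (HGMR, AGMR) for two samples genotyped at the same SNPs.

    HGMR: among SNPs where both samples are homozygous, the fraction
          where the genotypes differ.
    AGMR: among SNPs where both samples have a call, the fraction
          where the genotypes differ.
    A rate is None when no SNP qualifies for its denominator.
    """
    if len(a) != len(b):
        raise ValueError("both samples must cover the same SNPs")

    hom_total = hom_mismatch = 0
    all_total = all_mismatch = 0
    for ga, gb in zip(a, b):
        if ga is None or gb is None:
            continue  # skip SNPs missing in either sample
        all_total += 1
        if ga != gb:
            all_mismatch += 1
        if ga in (0, 2) and gb in (0, 2):
            hom_total += 1
            if ga != gb:
                hom_mismatch += 1

    hgmr = hom_mismatch / hom_total if hom_total else None
    agmr = all_mismatch / all_total if all_total else None
    return hgmr, agmr


# Hypothetical genotypes for two samples at six shared SNPs.
sample_a = [0, 2, 1, None, 2, 0]
sample_b = [0, 0, 1, 2, 1, 0]
hgmr, agmr = mismatch_rates(sample_a, sample_b)
print(f"HGMR={hgmr:.3f} AGMR={agmr:.3f}")
```

Under these definitions, duplicate samples should show both rates near zero apart from genotyping errors, and a parent-offspring pair should show an HGMR near zero because the two always share at least one allele at each SNP. Because only SNPs called in both samples are counted, the same calculation applies to datasets genotyped on different platforms with partially overlapping SNP sets.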
A recent update to GRAF adds the ability to determine subject ancestries. For more information on this addition, visit Poster #1322T, “Quickly determining subject ancestries in large datasets using genotypes of dbGaP fingerprint SNPs”, on Thursday, October 19th from 3-4 in the Exhibit Hall at ASHG.
We invite you to join us at the dbGaP 10th Anniversary Symposium, to be held on June 9, 2017, from 1:30 to 3:00 PM in Wilson Hall, Building 1, on the NIH Bethesda campus. For information on campus access and security, the NIH Visitor Center, parking, and directions to NIH, see the NIH Visitor Information page.