With the American Society of Human Genetics (ASHG) conference around the corner, NCBI staff are preparing their presentations in San Diego. Here is some background on dbGaP’s poster about improving data storage and accessibility.
Visit Poster 1435T, “Storage and use of dbGaP data in the cloud,” on Thursday, October 18 from 2 PM to 3 PM (Exhibit Hall, Ground Floor).
Back in 2006, NCBI developed the database of Genotypes and Phenotypes (dbGaP) to archive and provide access to information from genome-scale studies that investigate the interaction of genotype and phenotype in humans. NCBI’s Sequence Read Archive (SRA) processes and distributes the next-generation sequence data deposited in dbGaP.
If you’ve ever tried downloading the raw sequence data, you know that the files’ enormous size usually makes for a slow download. You might even have encountered the ever-dreaded timeout error, a common problem when downloading large datasets. Even if you can download all the files you want, do you have the disk space to store them? What about the space needed for analyses? With issues like these, is the data really accessible?
Making data more accessible is part of the NIH Genomic Data Sharing (GDS) policy, which sets expectations to ensure the broad and responsible sharing of genomic research data. One solution to the accessibility problem is cloud-based storage. Once the large datasets are placed in cloud-based storage, you no longer need to download the data and create redundant copies, because analyses can run directly against the cloud-hosted data. To develop cloud-hosted data, the NIH funded the Sequence Data Delivery Pilot (SDDP) project in support of the NIH GDS policy. In collaboration with SDDP, dbGaP/SRA has begun sharing data under a new model that provides straightforward access to large cloud-based datasets.
Need the full technical details? Talk to the expert at ASHG by visiting Poster 1435T, “Storage and use of dbGaP data in the cloud,” on Thursday, October 18 from 2 PM to 3 PM.