The NIH NCBI Sequence Read Archive (SRA) on AWS, containing all public SRA data, is now live! These data are hosted on Amazon Web Services (AWS) under the Open Data Sponsorship Program (ODP) with support from NIH’s Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) initiative.
The SRA is NIH’s primary repository for raw, high-throughput sequencing data. It hosts over 43 petabytes of sequence data across controlled- and public-access datasets and continues to grow exponentially. Rapid and reliable access to data is paramount to support research and translate discovery into insights. Including this dataset in the AWS ODP means researchers can access and egress critical datasets at no cost, helping them get straight to the science. These data are publicly accessible from S3 for researchers to download and analyze locally or to compute on directly in the cloud.
Please see details on the content of the open-access S3 buckets below:
- All available public-access SRA data in NCBI’s normalized SRA format.
- Metadata files for the entire SRA. View our recent webinar to learn more about using SRA metadata to get to your data of interest, faster.
- Newly released, high-value original (user-submitted) sequencing data files from commonly used NGS platforms.
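
For readers who want to browse an open-access bucket programmatically, here is a minimal sketch using only the Python standard library. It assumes the bucket permits anonymous listing through the standard S3 ListObjectsV2 REST endpoint. The bucket name `example-sra-bucket` and the prefix `sra/` are hypothetical placeholders, not names taken from this announcement; substitute the bucket names published by NCBI.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used by S3 list responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def build_list_url(bucket, prefix="", max_keys=10):
    """Build an unsigned ListObjectsV2 URL for a public bucket."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_keys(xml_text):
    """Extract object keys from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("s3:Contents/s3:Key", S3_NS)]


def list_public_keys(bucket, prefix="", max_keys=10):
    """Fetch and parse one page of keys (needs network access)."""
    url = build_list_url(bucket, prefix, max_keys)
    with urllib.request.urlopen(url) as resp:
        return parse_keys(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical bucket and prefix; replace with real values before running.
    print(build_list_url("example-sra-bucket", "sra/", 5))
```

Calling `list_public_keys` with a real bucket name returns at most one page of object keys. A full listing would also need to follow the continuation token in the response, which this sketch omits.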
The National Library of Medicine (NLM) also maintains an additional bucket that contains 250 TB of coronavirus genome sequence data. Work is underway to add controlled-access data, as previously announced. NIH is committed to hosting large datasets and bringing together computational tools and cloud technologies in ways that support open access, interoperability, and collaborative analyses.
Write to us at email@example.com to let us know how we can serve your research needs better.