To make it easier for you to find and access Sequence Read Archive (SRA) data, we are reorganizing and improving our cloud storage systems.
Beginning April 2023, we will move the SARS-CoV-2 normalized data and source files from the COVID-19 data buckets on Amazon Web Services (AWS) and Google Cloud Platform (GCP) to the NIH NCBI SRA on AWS registry. We will also remove the SARS-CoV-2 original format data from AWS and GCP COVID-19 buckets and make them available in AWS cold storage. If you need these data, you can request them using the Cloud Data Delivery Service (CDDS).
Where and how will I be able to access SARS-CoV-2 normalized data after this change occurs?
To ensure a smooth transition, we want you to have enough time to adjust your scripts and pipelines to minimize disruption to your analyses.
- For SRA Toolkit users, the latest version of the Toolkit is configured to automatically locate data in their new locations.
- For those not using SRA Toolkit, we recommend updating pipelines to look for SARS-CoV-2 SRA normalized files in the COVID AWS bucket (s3:::sra-pub-sars-cov2) first and, if a file is not found there, in the SRA AWS ODP bucket (s3:::sra-pub-run-odp).
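For pipelines that do not use SRA Toolkit, the fallback lookup described above could be sketched roughly as follows. This is an illustrative sketch, not an NCBI-provided tool. It assumes that both buckets allow anonymous reads through the standard virtual-hosted S3 endpoint (`https://<bucket>.s3.amazonaws.com/<key>`), and that the caller already knows the object key. The key layout used in the example (`sra/<accession>/<accession>`) and the helper names `s3_object_exists` and `resolve_sra_object` are hypothetical.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Optional, Sequence

# Bucket names from the announcement, in the recommended lookup order.
SRA_BUCKETS = ("sra-pub-sars-cov2", "sra-pub-run-odp")


def s3_object_exists(bucket: str, key: str, timeout: float = 10.0) -> bool:
    """Return True if an anonymous HTTP HEAD request for the object succeeds.

    Assumption: the bucket permits unauthenticated reads via the
    virtual-hosted-style endpoint https://<bucket>.s3.amazonaws.com/<key>.
    """
    url = f"https://{bucket}.s3.amazonaws.com/{urllib.parse.quote(key, safe='/')}"
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        # S3 can answer 403 instead of 404 for a missing key when the caller
        # lacks list permission, so both are treated as "not here" in this sketch.
        if err.code in (403, 404):
            return False
        raise


def resolve_sra_object(
    key: str,
    exists: Callable[[str, str], bool] = s3_object_exists,
    buckets: Sequence[str] = SRA_BUCKETS,
) -> Optional[str]:
    """Return the s3:// URI of the first bucket holding `key`, or None."""
    for bucket in buckets:
        if exists(bucket, key):
            return f"s3://{bucket}/{key}"
    return None
```

A call such as `resolve_sra_object("sra/SRR0000000/SRR0000000")` (hypothetical accession and key) checks the COVID bucket first and only then the ODP bucket. The existence check is passed in as a parameter so a pipeline can substitute its own client (for example, an authenticated SDK call) without changing the lookup order.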
Other SARS-CoV-2 data assets, including Variant Call Format (VCF) data and metadata tables produced by our dedicated SARS-CoV-2 Variant Calling Pipeline, will continue to be available in the COVID-19 Genome Sequence dataset on AWS and GCP.
The SARS-CoV-2 Variant Calling Pipeline and the SRA data on AWS and GCP are supported by the NIH Accelerating COVID-19 Therapeutic Interventions and Vaccines (ACTIV) and Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) initiatives.
Questions?
We appreciate your understanding and cooperation as we work to improve access to our data on the cloud. Please contact our help desk with any questions or concerns.