The NIH NCBI Sequence Read Archive (SRA) on AWS, containing all public SRA data, is now live! This data is hosted on Amazon Web Services (AWS) under the Open Data Sponsorship Program (ODP) with support from NIH’s Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) initiative.
The National Library of Medicine’s (NLM) National Center for Biotechnology Information (NCBI) and Amazon Web Services (AWS) are happy to announce that the controlled- and public-access Sequence Read Archive (SRA), one of the world’s largest repositories of raw next-generation sequencing data, will be freely accessible from Amazon S3 via the Open Data Sponsorship Program (ODP) as of January 2021. The SRA is currently hosted by NLM at the National Institutes of Health (NIH).
On a typical day, researchers download about 30 terabytes of data from NCBI in an effort to make discoveries. NCBI began providing online access to data in the early 1990s, starting with the GenBank database of DNA sequences. Over the years we’ve greatly expanded the types and quantity of data available. You can now find on our site descriptions and data from experimental studies such as next-generation sequencing projects, bioactivity assays for small molecules, microarray datasets and genome-wide association studies.
The White House recently recognized these efforts by awarding NCBI Director David J. Lipman the “Open Science” Champion of Change Award. The scientific community has also recognized the benefits of open data. Access to this information serves as a source of both original and supplemental data for exploration and validation [2-4], which improves the power of experimental data while increasing the speed and decreasing the cost of discovery.
In this post, we summarize three recent cases where researchers used data from an NCBI resource/database to make significant discoveries.