Tag: Sequence Read Archive (SRA)

Learn the best way to find data in NIH’s Sequence Read Archive (SRA) on the cloud

NCBI will present a workshop at the American Society of Human Genetics (ASHG) as part of its 2021 conference activities. The workshop is scheduled for Wednesday, September 15, 2021.

Register now!

Adelaide Rhodes, Ph.D., from the Customer Experience team and Adam Stine, SRA Curator, will co-lead the workshop, which will introduce attendees to powerful metadata searches with BigQuery on Google Cloud Platform (GCP) and Athena on Amazon Web Services (AWS) that speed up analytic workflows using the NIH’s Sequence Read Archive (SRA).

Cloud-based query services with expanded metadata options for SRA help researchers find their target data more quickly than ever before. The workshop will be a mix of training in Structured Query Language (SQL), demos on the cloud console, and hands-on exercises in Jupyter notebooks, with examples to help researchers understand how to build searches in SQL. Researchers who attend this workshop will learn how to extract specific data sets as well as how to conduct exploratory analysis of the entirety of the SRA data available in the cloud.

Both BigQuery and Athena use SQL, but no prior SQL experience is required. By the end of this workshop, you will know how to run cloud metadata queries using SQL to find SRA data based on the parameters that interest you.
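To give a feel for the kind of SQL involved, here is a minimal sketch that runs two metadata-style queries against a small in-memory SQLite table. This is not the workshop material: the rows and run accessions are made up, and the column names (acc, organism, assay_type, platform) are assumptions chosen to resemble run metadata. On BigQuery or Athena the table name, columns, and client library will differ, so check the SRA cloud documentation before adapting it.

```python
import sqlite3

# Toy stand-in for a run metadata table. Every row is made up for
# illustration; the accessions are placeholders, not real SRA runs.
ROWS = [
    ("RUN0001", "Populus trichocarpa", "RNA-Seq", "ILLUMINA"),
    ("RUN0002", "Populus trichocarpa", "WGS", "ILLUMINA"),
    ("RUN0003", "Homo sapiens", "RNA-Seq", "ILLUMINA"),
    ("RUN0004", "Populus trichocarpa", "RNA-Seq", "OXFORD_NANOPORE"),
]


def toy_connection():
    """Create an in-memory database holding the toy metadata table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadata "
        "(acc TEXT, organism TEXT, assay_type TEXT, platform TEXT)"
    )
    conn.executemany("INSERT INTO metadata VALUES (?, ?, ?, ?)", ROWS)
    return conn


def find_runs(conn, organism, assay_type):
    """Targeted search: runs for one organism and one assay type."""
    sql = (
        "SELECT acc, platform FROM metadata "
        "WHERE organism = ? AND assay_type = ? "
        "ORDER BY acc"
    )
    return conn.execute(sql, (organism, assay_type)).fetchall()


def count_by_assay(conn):
    """Exploratory search: how many runs exist per assay type."""
    sql = (
        "SELECT assay_type, COUNT(*) AS n FROM metadata "
        "GROUP BY assay_type ORDER BY n DESC, assay_type"
    )
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = toy_connection()
    for acc, platform in find_runs(conn, "Populus trichocarpa", "RNA-Seq"):
        print(acc, platform)
    for assay_type, n in count_by_assay(conn):
        print(assay_type, n)
    conn.close()
```

The first query shows the "extract a specific data set" pattern (a WHERE filter), and the second shows the exploratory pattern (GROUP BY with a count). Both carry over to cloud SQL engines with small syntax changes.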

Adam Stine, Ph.D., SRA Curator
Adelaide Rhodes, Ph.D., Customer Experience

 

Tackling Petabyte Scale Sequence Search Challenges

The volume of biological data being generated by the scientific community is growing exponentially, reflecting technological advances and research activities. This increase in available data has great promise for pushing scientific discovery but also introduces new challenges that scientific communities need to address. The National Institutes of Health’s (NIH) Sequence Read Archive (SRA), which is maintained by the National Library of Medicine’s National Center for Biotechnology Information (NCBI), is a rapidly growing public database that researchers use to improve scientific discovery across all domains of life. As part of the Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative, over 36 petabytes of “next generation” (raw and SRA-formatted) sequencing data is accessible to anybody via two cloud service providers.

To help address the challenges of conducting large-scale analysis of -omic data in the SRA and similar databases, the Department of Energy (DOE) Office of Biological and Environmental Research (BER), the NIH Office of Data Science Strategy (ODSS), and NCBI held a virtual workshop on June 8, 2021, on Emerging Solutions in Petabyte Scale Sequence Search. The workshop brought together experts from DOE national labs, research institutions, and universities across the world.

SRA data growth over time. Databases like the NIH Sequence Read Archive are growing rapidly and are used extensively by scientific communities. As these databases grow, so does their potential scientific value, but work must be done to ensure ease of access.

Continue reading “Tackling Petabyte Scale Sequence Search Challenges”

Aug 18 Webinar: Finding Data for your Research Organism: Plants and RNA-Seq data

Join us on August 18, 2021 at 12 PM Eastern Time for the second webinar on finding data for your non-model research organism. In this webinar, you will learn how to use NCBI’s web resources to get data for a plant species, the black cottonwood. You will see how to find, access, and analyze gene and sequence data from Datasets and other NCBI web resources, as well as sample metadata and gene expression RNA-Seq data from SRA and the SRA Run Selector. You will also see an example of how to use and analyze these data in a typical workflow, set up in a Jupyter notebook, that uses the NCBI next-gen aligner Magic-BLAST to get relative gene expression levels across samples.

  • Date and time: Wed, August 18, 2021 12:00 PM – 12:45 PM EDT
  • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI webinars playlist on the NLM YouTube channel. You can learn about future webinars on the Webinars and Courses page.

NCBI events at the Bioinformatics Open Science Conference 2021 (BOSC 2021)

Come visit us virtually to learn about new NCBI data access, tools, and best practices at the Bioinformatics Open Science Conference (BOSC), part of the ISMB/ECCB online conference, from July 29 – 30, 2021. We will be presenting virtual posters on NCBI resources, offering a Birds of a Feather discussion, and participating in the BOSC CollaborationFest (CoFest) following the conference, where you can take part in a hands-on evaluation of ElasticBLAST.

NCBI Posters, July 29, 2021, 11:20 AM – 12:20 PM EDT

All posters will be presented on Thursday afternoon. You can see complete abstracts on the ISMB/ECCB BOSC schedule.

Nuala O’Leary will talk about NCBI Datasets, a new resource for fast, easy access to NCBI sequence data. You will learn about the new interface and new tools to access reference genomes, genes, and orthologs using web-based and programmatic tools.

Adelaide Rhodes will present “Open access NCBI cloud resources to accelerate scientific insights,” where you can learn about recent developments in transferring more than 20 petabytes of NCBI Sequence Read Archive (SRA) data to the cloud.

Deacon Sweeney will describe the web RAPT service for assembling and annotating bacterial genomes at the click of a button in “RAPT, The Read assembly and Annotation Pipeline Tool: building a prokaryotic genome annotation package for users of all backgrounds.”

Roberto Vera Alvarez will talk about best practices for using cloud tools for transcriptomics in his poster “Transcriptome annotation in the cloud: complexity, best practices, and cost.”

Greg Boratyn will discuss improvements to the BLAST-based short read aligner, Magic-BLAST, in “Recent improvements in Magic-BLAST 1.6.”

Visit Christiam Camacho’s poster “ElasticBLAST: Using the power of the cloud to speed up science” to get an introduction to ElasticBLAST, a Kubernetes-based approach for high-throughput BLAST tasks. Join us following the conference in the CoFest to try out ElasticBLAST yourself and provide input. See the section on the CoFest below and our companion post.

Birds of a Feather, July 29, 2021, 11:20 AM – 12:20 PM EDT

We will host a Birds of a Feather public feedback session on Thursday, where you can share your thoughts and join discussions on all aspects of NCBI’s new data access options: NCBI Datasets, SRA, BLAST, and the Genome Data Viewer (GDV), our genome browser for sequence visualization. We welcome your input! Come and see us!

CollaborationFest (CoFest), July 31 – August 1, 2021

The ElasticBLAST team will attend the BOSC CoFest following the conference. Sign up to participate on July 31 and August 1 to get an in-depth orientation and the opportunity to test the capabilities of ElasticBLAST on the Amazon Web Services (AWS) cloud. You do not have to register for the conference to attend the CoFest. See our post on the CoFest for more information.

 

NCBI to present on SRA and cloud computing at the 2021 Galaxy Community Conference

 

We’re bringing exciting developments to our user community at the 2021 Galaxy Community Conference (GCC 2021), which is virtual this year!

Dr. Jon Trow, SRA Subject Matter Expert
Dr. Adelaide Rhodes, Cloud Subject Matter Expert

We start by hosting NCBI’s first-ever GCC training week tutorial, co-written by Jon Trow, Ph.D., SRA Subject Matter Expert, and Adelaide Rhodes, Ph.D., Cloud Subject Matter Expert. This tutorial will become a permanent addition to the Galaxy Training Network. The tutorial, “SRA Aligned Read Format (SARF) to Speed Up SARS-CoV-2 Data Analysis”, has detailed instructions and a video demonstration on how to search SRA metadata for SARFs and download them into Galaxy workflows. We will be available via Slack during Office Hours for live virtual interactions.

Continue reading “NCBI to present on SRA and cloud computing at the 2021 Galaxy Community Conference”

The wait is over… NIH’s Public Sequence Read Archive is now open access on the cloud

The NIH NCBI Sequence Read Archive (SRA) on AWS, containing all public SRA data, is now live! This data is hosted on Amazon Web Services (AWS) under the Open Data Sponsorship Program (ODP) with support from NIH’s Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) initiative.

Continue reading “The wait is over… NIH’s Public Sequence Read Archive is now open access on the cloud”

Magic-BLAST version 1.6.0 is here!

We’ve just released a new version (1.6.0) of Magic-BLAST, the BLAST-based next-gen alignment tool, with these improvements:

  • Usage reporting: you can help improve Magic-BLAST by sharing limited information about your search. The BLAST User Manual has details on the information collected, how it is used, and how to opt out.
  • Magic-BLAST can access NCBI SRA next-gen reads from the cloud when you use the -sra or -sra_batch options. See the Magic-BLAST cookbook for more details.
  • NCBI taxonomy IDs are reported in SAM output if they are present in the target BLAST database.
  • You can get unaligned reads reported separately from the aligned ones by using the -out_unaligned <file name> option. You can also select the format (SAM, tabular, or FASTA) with the -unaligned_fmt option. The default format is the same as the one used for the main report.

The version 1.6.0 executables are available from the NCBI FTP site. See the release notes, the NCBI GitHub site, and the Magic-BLAST publication for more information.
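As a rough illustration of how the options described in this post might fit together, here is a small Python sketch that assembles a magicblast command line. It only builds the argument list and runs it if you ask it to. The accession SRR0000000, the database name my_reference_db, and the output file names are hypothetical placeholders, and the lowercase value "fasta" for -unaligned_fmt is an assumption; check the Magic-BLAST documentation for the exact accepted values.

```python
import shlex
import shutil
import subprocess


def build_magicblast_cmd(sra_accession, db, out_file,
                         unaligned_out=None, unaligned_fmt=None):
    """Assemble a magicblast argument list.

    -sra streams reads for one SRA run; -out_unaligned and -unaligned_fmt
    are the options described in the post above. All values passed in by
    the caller here are placeholders for illustration.
    """
    cmd = ["magicblast", "-sra", sra_accession, "-db", db, "-out", out_file]
    if unaligned_out is not None:
        cmd += ["-out_unaligned", unaligned_out]
        # Only meaningful when unaligned reads go to their own file.
        if unaligned_fmt is not None:
            cmd += ["-unaligned_fmt", unaligned_fmt]
    return cmd


def run_if_available(cmd):
    """Run the command only if the executable is on PATH; else return None."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_magicblast_cmd(
        "SRR0000000",          # hypothetical run accession
        "my_reference_db",     # hypothetical BLAST database name
        "aligned.sam",
        unaligned_out="unaligned.fa",
        unaligned_fmt="fasta",  # assumed spelling of the FASTA format value
    )
    # Print the command for inspection instead of running it.
    print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string avoids shell quoting problems if a file name contains spaces.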

May 19 Webinar: Using the new web RAPT service to assemble and annotate prokaryotic genomes

Join us on May 19, 2021 at 12 PM Eastern Time to learn how to use the new RAPT pilot service to assemble and annotate public or private Illumina genomic reads sequenced from bacterial or archaeal isolates at the click of a button. RAPT consists of two major components, the genome assembler SKESA and the Prokaryotic Genome Annotation Pipeline (PGAP), and produces an annotated genome of quality comparable to RefSeq in a couple of hours.

  • Date and time: Wed, May 19, 2021 12:00 PM – 12:45 PM EDT
  • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI webinars playlist on the NLM YouTube channel. You can learn about future webinars on the Webinars and Courses page.

NIH’s Sequence Read Archive to be made available on AWS’s Open Data Sponsorship Program

The National Library of Medicine’s (NLM) National Center for Biotechnology Information (NCBI) and Amazon Web Services (AWS) are happy to announce that the controlled- and public-access Sequence Read Archive (SRA), one of the world’s largest repositories of raw next-generation sequencing data, will be freely accessible from Amazon S3 via the Open Data Sponsorship Program (ODP) as of January 2021. The SRA is currently hosted by NLM at the National Institutes of Health (NIH).

Continue reading “NIH’s Sequence Read Archive to be made available on AWS’s Open Data Sponsorship Program”