Tag: cloud computing

December 9 Webinar: Using BLAST+ in Docker and on the cloud

Join us on December 9, 2020 to learn about containerized BLAST+ in Docker, which is ready to use locally and in the cloud. We are staging BLAST databases with some cloud providers, which makes it even easier to run containerized BLAST as part of a pipeline in the cloud. In this webinar you will learn about the advantages of containerized BLAST and work through some practical examples of how to use it. You will also learn about Elastic BLAST, a cloud application for aligning extremely large numbers of sequences against BLAST databases.

  • Date and time: Wed, December 9, 2020 12:00 PM – 12:45 PM EST
  • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
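As a rough illustration of what running containerized BLAST+ can look like, here is a minimal sketch that only builds a `docker run` command line and does not execute it. It assumes the public `ncbi/blast` Docker image and the `/blast/blastdb`, `/blast/queries`, and `/blast/results` mount points described in NCBI's BLAST+ Docker documentation. The helper name `blast_docker_command`, the host directory `/data`, and the file and database names are hypothetical.

```python
import shlex
from pathlib import PurePosixPath


def blast_docker_command(program, query_file, db_name, out_file, workdir,
                         image="ncbi/blast"):
    """Build (but do not run) a `docker run` argument list for BLAST+.

    Assumption: `workdir` is an absolute host directory containing
    `blastdb/`, `queries/` and `results/` subdirectories.
    """
    workdir = PurePosixPath(workdir)
    return [
        "docker", "run", "--rm",
        # Databases and queries are mounted read-only, results read-write.
        "-v", f"{workdir / 'blastdb'}:/blast/blastdb:ro",
        "-v", f"{workdir / 'queries'}:/blast/queries:ro",
        "-v", f"{workdir / 'results'}:/blast/results:rw",
        image,
        program,
        "-query", f"/blast/queries/{query_file}",
        "-db", db_name,
        "-out", f"/blast/results/{out_file}",
    ]


# Hypothetical example values; replace them with your own files and database.
cmd = blast_docker_command("blastn", "query.fsa", "my_db", "blastn.out",
                           workdir="/data")
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it can be handed to `subprocess.run(cmd, check=True)` on a machine with Docker installed without any shell quoting problems.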

December 2 Webinar: Using the new Read assembly and Annotation Pipeline Tool (RAPT) to assemble and annotate microbial genomes

Join us December 2 to learn how to use the Read assembly and Annotation Pipeline Tool (RAPT). With RAPT, you can assemble and annotate a microbial genome right out of the sequencing machine! Provide the short genomic reads or an SRA run as input, and get back the sequence annotated with a complete gene set. The assembly is built with SKESA and annotated with PGAP. RAPT also verifies the taxonomic assignment of the genome with the Average Nucleotide Identity tool. In this webinar, you will learn how you can run RAPT on your own machine or on the Google Cloud Platform.

  • Date and time: Wed, December 2, 2020 12:00 PM – 12:45 PM EST
  • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

We want to hear from you about changes to NIH’s Sequence Read Archive data format and storage

NIH’s Sequence Read Archive (SRA) is the largest, most diverse collection of next generation sequencing data from human, non-human and microbial sources. Hosted by the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM), SRA data is also available on the Google Cloud Platform (GCP) and Amazon Web Services (AWS) as part of the NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative.

SRA currently contains more than 36 petabytes (PB) of data and is projected to grow to 43 PB by 2023. Though the value of this resource grows with each new sample, the exponential growth experienced over the last decade (Figure 1) threatens SRA sustainability. The storage footprint is growing more costly to maintain and the data more difficult to use at scale. The situation has reached a tipping point. SRA must be refactored to support FAIR data principles into the future.

Figure 1. SRA data has grown exponentially over the last decade.

NIH remains committed to the SRA and hopes to establish a long-range plan for sustained resource growth. Considerations include a model wherein normalized working files without Base Quality Scores (BQS) are readily available through cloud platforms and NCBI FTP sites, and larger source files and normalized files with base quality scores will be distributed on cloud platforms based on prevalent use cases and usage demands. Further details regarding data formats are available here.

It is critical that you, as an SRA user, participate in the review and testing of proposed data formats and infrastructure by commenting on how these developments affect your data usage. NIH has prepared a Request for Information (RFI) that details planned developments and would greatly appreciate feedback from the scientific community.

May 20 webinar: Exploring SRA metadata in the cloud with BigQuery

Join us on May 20th to learn how to use Google’s BigQuery to quickly search the data from the Sequence Read Archive (SRA) in the cloud to speed up your bioinformatic research and discovery projects. BigQuery is a tool for exploring cloud-based data tables with SQL-like queries. In this webinar, we’ll introduce you to using BigQuery to mine SRA submitter-supplied metadata and the results of taxonomic analysis for SRA runs. You’ll see real-world case studies that demonstrate how to find key information about SRA runs and identify data sets for your own analysis pipelines.

  • Date and time: Wed, May 20, 2020 12:00 PM – 12:45 PM EDT
  • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
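To give a feel for the SQL-like queries mentioned above, here is a minimal sketch that only assembles query text and its named parameters and does not contact BigQuery. It assumes the public table `nih-sra-datastore.sra.metadata` and the column names `acc`, `organism`, `assay_type`, `platform`, and `mbases`. Check these against the current SRA cloud documentation before relying on them. The helper name `sra_metadata_query` and the example values are hypothetical.

```python
def sra_metadata_query(organism, assay_type, limit=10):
    """Return (sql, params) for a BigQuery Standard SQL search of SRA metadata.

    Assumption: the table and column names below match the public
    SRA metadata table; they are not verified by this code.
    """
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    sql = (
        "SELECT acc, organism, assay_type, platform, mbases\n"
        "FROM `nih-sra-datastore.sra.metadata`\n"
        # Named parameters (@name) keep user-supplied values out of the SQL text.
        "WHERE organism = @organism AND assay_type = @assay_type\n"
        "ORDER BY mbases DESC\n"
        f"LIMIT {limit}"
    )
    params = {"organism": organism, "assay_type": assay_type}
    return sql, params


# Hypothetical example: the five largest whole-genome runs for one organism.
sql, params = sra_metadata_query("Escherichia coli", "WGS", limit=5)
print(sql)
print(params)
```

The same query text can be pasted into the BigQuery console, with the `@organism` and `@assay_type` placeholders replaced by quoted values, or passed to a BigQuery client library together with the parameters.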

The entire corpus of the Sequence Read Archive (SRA) now live on two cloud platforms!

The National Library of Medicine (NLM) is pleased to announce that all controlled-access and publicly available data in SRA is now available through Google Cloud Platform (GCP) and Amazon Web Services (AWS). To access the data please visit our SRA in the Cloud webpage where you will find links to our new SRA Toolkit and other access methods.

The SRA data available in the two clouds currently totals more than 14 petabytes and consists of all data in the SRA format as well as some data in its original submission format. Since May 2019, NCBI has been putting all submitted SRA data on the GCP and AWS clouds in both the submitted format and our converted SRA format. We have also been moving previously submitted original format data to the clouds and expect to complete that process in 2021.

Computational Medicine Codeathon and AWS workshop at Chapel Hill in March

NIH is pleased to announce a computational medicine-focused codeathon. To apply, please complete the application form by February 25, 2020. We will also be offering a free workshop, AWS Technical Essentials, the day before the codeathon. Read on for more information about the event.

Genome Workbench is now in the cloud!

If you’re interested in visualizing and analyzing genomic data, then you’ll want to check out a new way to run Genome Workbench: in the cloud! Genome Workbench is a desktop application (for both Windows and Mac) that lets you analyze genomic data in one place. You can run tools such as BLAST, create views such as multiple sequence alignments, and much more. You can run Genome Workbench in a cloud environment from your local desktop computer. This manual will show you how.

There are many advantages to using Genome Workbench in the cloud:

  • You can easily compare your data to the complete GenBank and RefSeq datasets without needing to download them
  • You can run BLAST searches against standard databases or any custom databases you’ve assembled in the cloud
  • All of the data (e.g. FASTA, BAM, GFF files) remain in the cloud with no need for local copies
  • You won’t pay egress fees for downloading data

Give it a try and let us know how it goes!