Tag: Cloud computing

Learn the best way to find data in NIH’s Sequence Read Archive (SRA) on the cloud

NCBI will present a workshop at the American Society of Human Genetics (ASHG) as part of its conference activities in 2021. The workshop is scheduled for Wednesday, September 15, 2021.

Register now!

Adelaide Rhodes, Ph.D., from the Customer Experience team and Adam Stine, Ph.D., SRA Curator, will co-lead the workshop. It will introduce attendees to powerful metadata searches with BigQuery on Google Cloud Platform (GCP) and Athena on Amazon Web Services (AWS) that speed up analytic workflows using NIH’s Sequence Read Archive (SRA).

Cloud-based query services with expanded metadata options for SRA help researchers find their target data more quickly than ever before. The workshop will mix training in Structured Query Language (SQL), demos on the cloud console, and hands-on exercises in Jupyter notebooks, with examples that help researchers understand how to build searches in SQL. Researchers who attend this workshop will learn how to extract specific data sets as well as how to conduct exploratory analysis of the entirety of the SRA data available in the cloud.

Both BigQuery and Athena use SQL, but no prior SQL experience is required. By the end of this workshop, you will know how to run cloud metadata queries using SQL to find SRA data based on the parameters that interest you.
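To give a flavor of what such a metadata search might look like, here is a minimal sketch in Python that only builds a SQL string; it does not connect to any cloud service. The table name `nih-sra-datastore.sra.metadata` and the column names (`acc`, `organism`, `assay_type`, `platform`) are assumptions for illustration and should be checked against the current SRA cloud documentation. The helper `build_sra_metadata_query` is hypothetical and not part of any NCBI tool.

```python
def build_sra_metadata_query(organism, assay_type, limit=10,
                             table="nih-sra-datastore.sra.metadata"):
    """Build a BigQuery-style SQL string that filters SRA run metadata.

    The table and column names are assumptions for illustration only.
    """
    for value in (organism, assay_type):
        # Keep the sketch simple: refuse values that would need escaping.
        # Real code should use the query service's parameterized queries.
        if "'" in value or "\\" in value:
            raise ValueError(f"unsupported character in value: {value!r}")

    return (
        "SELECT acc, organism, assay_type, platform\n"
        f"FROM `{table}`\n"
        f"WHERE organism = '{organism}'\n"
        f"  AND assay_type = '{assay_type}'\n"
        f"LIMIT {int(limit)}"
    )


# Example: look for human RNA-Seq runs (values chosen for illustration).
print(build_sra_metadata_query("Homo sapiens", "RNA-Seq", limit=5))
```

The resulting text could be pasted into a cloud query console. On Athena, the table would be referenced by whatever database and table name you set up there rather than with the backtick-quoted BigQuery path.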

Adam Stine, Ph.D., SRA Curator
Adelaide Rhodes, Ph.D., Customer Experience

PubMed Central Article Datasets are Now Available on the Cloud

To enhance machine access to biomedical literature and drive impactful analyses and reuse, the National Library of Medicine (NLM) is pleased to announce the availability of the PubMed Central (PMC) Article Datasets on Amazon Web Services (AWS) Registry of Open Data as part of AWS’s Open Data Sponsorship Program (ODP). These datasets collectively span 4 million of PMC’s 7 million (total) full-text scientific articles.

Figure 1. NCBI PMC Article Datasets on Registry of Open Data on AWS.

NCBI events at the Bioinformatics Open Science Conference 2021 (BOSC 2021)

Come visit us virtually to learn about new NCBI data access, tools, and best practices at the Bioinformatics Open Science Conference (BOSC), part of the ISMB/ECCB online conference, from July 29 – 30, 2021. We will be presenting virtual posters on NCBI resources, offering a Birds of a Feather discussion, and participating in the BOSC CollaborationFest (CoFest) following the conference, where you can take part in a hands-on evaluation of ElasticBLAST.

NCBI Posters, July 29, 2021, 11:20 AM – 12:20 PM EDT

All posters will be presented on Thursday afternoon. You can see complete abstracts on the ISMB/ECCB BOSC schedule.

Nuala O’Leary will talk about NCBI Datasets, a new resource for fast, easy access to NCBI sequence data. You will learn about the new interface and new web-based and programmatic tools for accessing reference genomes, genes, and orthologs.

Adelaide Rhodes will present “Open access NCBI cloud resources to accelerate scientific insights”, where you can learn about recent developments in transferring more than 20 petabytes of NCBI Sequence Read Archive (SRA) data to the cloud.

Deacon Sweeney will describe the web RAPT service for assembling and annotating bacterial genomes at the click of a button in “RAPT, The Read assembly and Annotation Pipeline Tool: building a prokaryotic genome annotation package for users of all backgrounds”.

Roberto Vera Alvarez will talk about best practices for using cloud tools for transcriptomics in his poster “Transcriptome annotation in the cloud: complexity, best practices, and cost”.

Greg Boratyn will discuss improvements to the BLAST-based short read aligner, Magic-BLAST, in “Recent improvements in Magic-BLAST 1.6”.

Visit Christiam Camacho’s poster “ElasticBLAST: Using the power of the cloud to speed up science” to get an introduction to ElasticBLAST, a Kubernetes-based approach for high-throughput BLAST tasks. Join us following the conference in the CoFest to try out ElasticBLAST yourself and provide input. See the section on the CoFest below and our companion post.

Birds of a Feather, July 29, 2021, 11:20 AM – 12:20 PM EDT

We will host a Birds of a Feather public feedback session on Thursday, where you can provide feedback and participate in discussions on all aspects of NCBI’s new data access options: NCBI Datasets, SRA, BLAST, and the Genome Data Viewer (GDV), our genome browser for sequence visualization. We welcome your input! Come and see us!

CollaborationFest (CoFest), July 31 – August 1, 2021

The ElasticBLAST team will attend the BOSC CoFest following the conference. Sign up to participate on July 31 and August 1 to get an in-depth orientation and an opportunity to test the capabilities of ElasticBLAST on the Amazon Web Services (AWS) cloud. You do not have to register for the conference to attend the CoFest. See our post on the CoFest for more information.

Try out ElasticBLAST at the BOSC2021 CoFest!

Join the BLAST team at the virtual CollaborationFest (July 31 – August 1, 2021) after the BOSC 2021 conference to help test and improve ElasticBLAST, a new cloud-based tool designed to speed up high-throughput BLAST searches. We would love to have your help with real-world testing of our alpha release of ElasticBLAST with your own workflows and data. You may sign up for the CoFest even if you aren’t registered for BOSC 2021.

Here are suggestions for how you can participate. See the FAQs below for additional information.

  1. Try it out and let us know how well it works. You can be blunt.
  2. Help us improve the documentation.
  3. Write a script to make ElasticBLAST part of your workflow.
  4. Try to process ElasticBLAST results with cloud-native tools. Here is an example.
  5. Bring your own high-throughput BLAST search problem to use with ElasticBLAST! Please discuss it with us first to make sure you don’t blow our budget and get the ElasticBLAST team in trouble!

NCBI to present on SRA and cloud computing at the 2021 Galaxy Community Conference

We’re bringing exciting developments to our user community at the 2021 Galaxy Community Conference (GCC 2021), which is virtual this year!

Dr. Jon Trow, SRA Subject Matter Expert
Dr. Adelaide Rhodes, Cloud Subject Matter Expert

We start by hosting NCBI’s first-ever GCC training week tutorial, co-written by Jon Trow, Ph.D., Sequence Read Archive (SRA) Subject Matter Expert, and Adelaide Rhodes, Ph.D., Cloud Subject Matter Expert. This tutorial will become a permanent addition to the Galaxy Training Network. The tutorial, “SRA Aligned Read Format (SARF) to Speed Up SARS-CoV-2 Data Analysis”, has detailed instructions and a video demonstration on how to search SRA metadata for SARFs and download them into Galaxy workflows. We will be available via Slack during Office Hours for live virtual interactions.

The wait is over… NIH’s Public Sequence Read Archive is now open access on the cloud

The NIH NCBI Sequence Read Archive (SRA) on AWS, containing all public SRA data, is now live! This data is hosted on Amazon Web Services (AWS) under the Open Data Sponsorship Program (ODP) with support from NIH’s Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) initiative.

NIH’s Sequence Read Archive to be made available on AWS’s Open Data Sponsorship Program

The National Library of Medicine’s (NLM) National Center for Biotechnology Information (NCBI) and Amazon Web Services (AWS) are happy to announce that the controlled- and public-access Sequence Read Archive (SRA), one of the world’s largest repositories of raw next-generation sequencing data, will be freely accessible from Amazon S3 via the Open Data Sponsorship Program (ODP) as of January 2021. The SRA is currently hosted by NLM at the National Institutes of Health (NIH).

December 9 Webinar: Using BLAST+ in Docker and on the cloud

Join us on December 9, 2020 to learn about containerized BLAST+ in Docker that is ready to use locally and in the cloud. We are staging BLAST databases with some cloud providers, which makes running containerized BLAST as part of a cloud pipeline even easier. In this webinar, you will learn about the advantages of containerized BLAST and see how to use it in some practical examples. You will also learn about Elastic BLAST, a cloud application that is useful for aligning extremely large numbers of sequences against BLAST databases.
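As a rough illustration of what running containerized BLAST+ can involve, the sketch below assembles a `docker run` command in Python without executing it. The image name `ncbi/blast` and the in-container mount points under `/blast` are assumptions based on common usage and should be verified against the current BLAST+ Docker documentation. The helper `docker_blast_command` and all host paths are hypothetical.

```python
import shlex
from pathlib import PurePosixPath


def docker_blast_command(query_fasta, db_name, db_dir, out_file,
                         image="ncbi/blast"):
    """Return an argument list for a containerized blastn run.

    Host paths should be absolute. The image name and the /blast/...
    mount points inside the container are assumptions for illustration.
    """
    query = PurePosixPath(query_fasta)
    out = PurePosixPath(out_file)
    return [
        "docker", "run", "--rm",
        # Mount the database and query directories read-only,
        # and the results directory read-write.
        "-v", f"{db_dir}:/blast/blastdb:ro",
        "-v", f"{query.parent}:/blast/queries:ro",
        "-v", f"{out.parent}:/blast/results:rw",
        image,
        "blastn",
        "-query", f"/blast/queries/{query.name}",
        "-db", f"/blast/blastdb/{db_name}",
        "-out", f"/blast/results/{out.name}",
    ]


# Example with made-up host paths; print the command instead of running it.
cmd = docker_blast_command("/data/queries/sample.fa", "nt",
                           "/data/blastdb", "/data/results/sample.out")
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps paths with spaces intact if the list is later handed to `subprocess.run`.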

  • Date and time: Wed, December 9, 2020 12:00 PM – 12:45 PM EST
  • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

December 2 Webinar: Using the new Read assembly and Annotation Pipeline Tool (RAPT) to assemble and annotate microbial genomes

Join us on December 2 to learn how to use the Read assembly and Annotation Pipeline Tool (RAPT). With RAPT, you can assemble and annotate a microbial genome right out of the sequencing machine! Provide short genomic reads or an SRA run as input, and get back the sequence annotated with a complete gene set. The assembly is built with SKESA and annotated with PGAP. RAPT also verifies the taxonomic assignment of the genome with the Average Nucleotide Identity tool. In this webinar, you will learn how you can run RAPT on your own machine or on the Google Cloud Platform.

  • Date and time: Wed, December 2, 2020 12:00 PM – 12:45 PM EST
  • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.