April 8 Webinar: Accelerate genomics discovery with SRA in the cloud

On Wednesday, April 8, 2020 at 12 PM, NCBI staff will show you how to leverage the cloud to speed up your research and discovery. You’ll be introduced to new and existing tools and data, including BigQuery, SRA Toolkit, and more. You’ll hear about real workflows in the cloud, featuring an example of the work NCBI was able to accomplish using SRA data and a case study from an SRA cloud customer.

By the end of this webinar, you will know where to look for new cloud products from NCBI, how to access help information to get started, and how to run your analyses efficiently in the cloud.

  • Date and time: Wed, Apr 8, 2020 12:00 PM – 12:45 PM EDT
  • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

The entire corpus of the Sequence Read Archive (SRA) now live on two cloud platforms!

The National Library of Medicine (NLM) is pleased to announce that all controlled-access and publicly available data in SRA is now available through Google Cloud Platform (GCP) and Amazon Web Services (AWS). To access the data, please visit our SRA in the Cloud webpage, where you will find links to our new SRA Toolkit and other access methods.

The SRA data available in the two clouds currently totals more than 14 petabytes and consists of all data in the SRA format as well as some data in its original submission format. Since May 2019, NCBI has been putting all submitted SRA data on the GCP and AWS clouds in both the submitted format and our converted SRA format. We have also been moving previously submitted original format data to the clouds and expect to complete that process in 2021.

Continue reading

View BAM alignments in the NCBI genome browsers and sequence viewers sorted by haplotype tag

NCBI’s genome browsers and graphical sequence viewers now allow you to view BAM alignments sorted by haplotype tag. This option is useful for analyzing variants within a sequenced sample and can help you detect or validate structural variants.

Figure 1. Remote BAM alignment data sorted by haplotype tag in the Genome Data Viewer (GDV). The remote BAM file was added through the “User Data and Track Hubs” feature in GDV. You can load the remote BAM for this example through https://go.usa.gov/xpM9c. The sorted display shows that haplotype 1 contains a significant deletion in this region relative to haplotype 2 and the reference genome assembly. Aligned reads not assigned a haplotype tag in the BAM file are grouped under the heading “haplotype not set” (not shown).
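The grouping described above can be illustrated with a minimal sketch. It is not how the NCBI viewers are implemented. It assumes the haplotype tag is stored as an `HP:i:<n>` optional field on each alignment record (the convention used by common phasing tools) and that the BAM has already been converted to SAM text. The function names and the example records are hypothetical.

```python
from collections import defaultdict

# Label used for reads without a haplotype tag (mirrors the viewer heading).
UNSET = "haplotype not set"


def haplotype_of(sam_line):
    """Return the HP tag value of one SAM record as an int, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional fields start at column 12 (index 11) and look like TAG:TYPE:VALUE.
    for field in fields[11:]:
        if field.startswith("HP:i:"):
            return int(field[len("HP:i:"):])
    return None


def group_by_haplotype(sam_lines):
    """Group read names by haplotype tag; untagged reads go under UNSET."""
    groups = defaultdict(list)
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        hp = haplotype_of(line)
        key = f"haplotype {hp}" if hp is not None else UNSET
        read_name = line.split("\t", 1)[0]
        groups[key].append(read_name)
    return dict(groups)


# Hypothetical SAM records, for illustration only (not real data).
EXAMPLE_SAM = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tHP:i:1",
    "read2\t0\tchr1\t105\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tHP:i:2",
    "read3\t0\tchr1\t110\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:0\tHP:i:1",
    "read4\t0\tchr1\t120\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
```

Calling `group_by_haplotype(EXAMPLE_SAM)` returns a dictionary with one list of read names per haplotype, plus a “haplotype not set” entry for untagged reads such as `read4`.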

Continue reading

Request for proposals: Single Cell in the Cloud codeathon at NYGC in January

The New York Genome Center is hosting an NCBI Single Cell in the Cloud codeathon from January 15-17, 2020. Project proposals are due December 2nd.

Please submit your proposal and apply here.

What topics are in scope?

This codeathon will focus on single cell data, including RNA, DNA, and chromatin accessibility. We are particularly interested in proposals for pipelines and analysis of SRA data, data interoperability, and using machine learning techniques in clustering. We also welcome proposals for tutorial pipelines and educational tools. You will have access to computational resources in the Cloud to turn your idea into a working prototype. Visit our website for examples of previous codeathon projects.

Continue reading

November 13 NCBI Minute: Resources for next-gen sequence analysis

On Wednesday, November 13, 2019 at 12 PM, NCBI staff will present a webinar on NCBI resources for next-gen sequence analysis. You will learn about key resources that support multiple aspects of next-gen sequence analyses, including quality control, alignment, data visualization, and interpretation of results. You will also see how to access and apply these resources for both SRA and your own RNA-Seq/DNA-Seq datasets. Whether you’re embarking on your first analysis or already have a background in bioinformatics, you’ll find tools that meet your needs!

  • Date and time: Wed, Nov 13, 2019 12:00 PM – 12:45 PM EST
  • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

NCBI at ASHG 2019: Two Data CoLabs Demonstrate How to Analyze NextGen Sequence Data and Access Genetic Variation Population Data

NCBI will be attending the American Society of Human Genetics (ASHG) 2019 meeting in Houston, Texas, on Oct 15-19.

This year, we will be presenting two CoLabs – interactive sessions where you can learn about new NCBI tools and resources. Read on below for a description of each CoLab and join us at ASHG next week!

Continue reading

Announcing the first ever RNA-Seq in the Cloud hackathon!

From March 11-13, 2019, the NCBI will help run a bioinformatics hackathon in the North Carolina Research Triangle hosted by the University of North Carolina, Chapel Hill (UNC).

Potential topics include:

  • technical metadata homogenization
  • a simple interface for using ontologies to make data searches more sensitive and specific
  • automated data analysis and visualization
  • novel isoform identification and comparison

We’re looking for people who have experience in working with subjects like these. If this describes you, please apply!

This event is for researchers, including students and postdocs, who use bioinformatics data or develop pipelines for large scale RNA-Seq analyses from high-throughput experiments. The event is open to anyone selected for the hackathon and willing to travel to UNC.

Continue reading

Florida (USF) Biological Data Science “IronHack” February 25-27, 2019

From February 25-27, 2019, NCBI will help with a Data Science hackathon at USF in Tampa, Florida!

The hackathon will focus on the genomics of iron-linked rare diseases as well as large scale RNA-Seq indexing and analysis. This event is for researchers, including students and postdocs, who have already worked with large datasets or developed pipelines for analyses from high-throughput experiments. Some projects are also open to non-scientific developers, mathematicians, or librarians.

The event is open to anyone selected for the hackathon and willing to travel to Tampa.

Participants will be organized into five to eight teams of five to six individuals each. These teams will build or expand on pipelines and tools to analyze large datasets within a cloud infrastructure. Example subjects for such hackathons include:

  • Integrative pipelines to analyze large scale RNA-Seq experiments
  • Visualization tools for mapping phenotypes to genotypes
  • Rapid clinical diagnostics tools
  • Structural variant mining with single molecule sequencing data

Please see the application form for more details and additional projects. The project list will continue to evolve and will be updated on the application form.

Continue reading