NCBI’s First Hackathon: Advanced Bioinformatic Analysis of Next-Gen Sequencing Data

This blog post is geared toward genomics professionals.

From January 5th-7th, 2015, NCBI, in conjunction with the NIH Office of Data Science, held a genomics hackathon, where genomics professionals gathered to write useful, efficient pipelines for people new to genomics.

After we announced the hackathon, over 130 qualified applicants expressed interest in attending. Four team leads chose 23 attendees from this pool, then assigned initial predefined roles and provided biological guidance for a product in one of four subject areas: DNA-Seq, RNA-Seq, Epigenomics and Metagenomics.

These projects were loosely defined in advance, with the expectation that they would change. The table below shows, from left to right, the subject area, the initial project definition, the output pipeline, and some unexpected code that came out of some of the projects.

Subject area | Project definition | Output pipeline | Code
DNA-Seq | Tumor/normal and primary/metastatic comparisons in exonic datasets with obvious variant elimination and iteration | Pipeline to Output Tumor – Unique Variants |
RNA-Seq | Variant calling pipeline to differentiate germline and somatic mutations from editing | RNA-Seq Variant Caller | Multi-program Installer and HISAT Pipeline
Epigenomics | Epigenomic peak caller (DNA and histone) that compares with RNA-Seq data | Profile-based Caller for Epigenomic Analysis | Automated R updater (from 2.17 to 3.xx) for Amazon Web Services
Metagenomics | Identification of all viruses in metagenomic samples and mammalian samples | Viral Identification Pipeline | Potential 10X – 60X BLAST Speedup for Metagenomic Analysis

Organizationally, there were three surprises:

First, data acquisition was a major hurdle for participants, even though they were genomics professionals. For future events, we intend to pre-distribute resources to avoid this.

Second, the participants’ enthusiasm was boundless. Both socially and scientifically, our hackathon participants were “all in”, and their zeal was evident, even after the hackathon ended.

Third, we thought that the roles and projects might be too structured, but to our surprise, participants wanted even more structure. Upon further discussion, we found that this was because the participants’ main motivation was to get a useful software product as close to completion as possible.

An in-depth discussion of the hackathon is available in the preprint version of the manuscript produced during the event. In light of the results produced and the lessons we learned, we intend to host more hackathons. Please keep an eye on our news page, NCBI News, and our social media accounts (Twitter, Facebook and LinkedIn, in particular) for future hackathon announcements!

Update (5/28/15): As promised, we just announced our next hackathon on NCBI News. Navigate over to read more about our plans for the August hackathon and to learn how to apply!

One thought on “NCBI’s First Hackathon: Advanced Bioinformatic Analysis of Next-Gen Sequencing Data”

  1. The first point shouldn’t be surprising. Data acquisition from NCBI is typically a huge hurdle. Maybe for the next hackathon, instead of pre-distributing resources, it would be better to focus on making it easier for users to get the data from the NCBI source.

