Virus hunting in the cloud codeathon, v2

We are pleased to announce the second installment of the Virus Hunting Codeathon, which will take place November 4-6, 2019, at the University of Maryland in College Park.

The NCBI will help run this bioinformatics codeathon, hosted by the UMIACS and CBCB at the University of Maryland. The purpose of this event is to continue developing techniques, code, and pipelines to identify known, taxonomically definable, and novel viruses from metagenomic datasets on cloud infrastructure.

This event is for researchers, including students and postdocs, who are already engaged in the use of bioinformatics data or in the development of pipelines for virological analyses from high-throughput experiments. We especially encourage people who have experience in Computational Virus Hunting or related fields to participate.  The event is open to anyone selected for the codeathon and willing to travel to College Park (see below).


  • Fast, federated indexing
    • BigQuery
  • Metadata features
  • Genome graphs for viruses
  • Approximate taxonomic analysis
  • Domain/HMM Boundary and Taxonomic Refinement
  • Bringing together approximate taxonomy and domain models
  • Sequence data quality metrics
  • Phage-host interactions

We will provide the final list of projects before the codeathon starts.


The event runs from 9 am – 6 pm each day, with the potential to extend into the evening hours. After a brief organizational session, teams will spend three days addressing a challenging set of scientific problems related to a group of datasets. Participants with various backgrounds and expertise will be organized into five to eight teams of five to six individuals, each with an experienced leader. These teams will build pipelines and tools to analyze large datasets hosted on the cloud. We will also come together to discuss progress on each of the topics, bioinformatics best practices, coding styles, etc.


Datasets will come from public repositories, with a focus on metagenomic datasets in the cloud-hosted Sequence Read Archive (SRA) and contigs derived from these data.


We will make all pipelines, other scripts, software, and programs generated in this codeathon available on a dedicated public GitHub repository.

Each team may submit manuscripts describing the design and use of the software tools they created to an appropriate journal such as the F1000Research hackathons channel, BMC Bioinformatics, GigaScience, Genome Research, or PLoS Computational Biology. A major goal of the codeathon is to publish a virological index from these cloud-hosted datasets.

How To Apply

To apply, please complete the form linked to the separate page for this event. Applications are due Monday, October 7th, 2019 by 3 pm ET. We will select participants based on their experience and their motivation to attend. We encourage prior NCBI codeathon participants and applicants to apply.

We will notify the first round of accepted applicants on October 8th by 11:59 pm ET.  Participants have until October 11th at 4 pm ET to confirm.  International applicants or those with particular skillsets may be accepted early. If you confirm, please make sure that you can attend, as confirming and not attending prevents other data scientists from attending this event. Please provide a monitored email address in case we have follow-up questions.

Note: Participants must bring their own laptop to this program. A working knowledge of scripting (e.g., Shell, Python, R) is useful but not necessary to be successful. Knowledge of higher level scripting or programming languages may also be useful. Applicants must be willing to commit to all three days of the event.

We offer no financial support for travel, lodging, or meals for this event. Also, note that the codeathon may extend into the evening hours each day. Depending on the number of people who need accommodations, we will attempt to get a group rate at one of the local hotels. Please indicate on the registration form if you need a hotel room.

There will be no registration fee or cost associated with attending this event.


Entrants retain ownership of all intellectual property rights (including moral rights) in the code submitted to, as well as developed in, the codeathon. Employees of the U.S. Government attending as part of their official duties retain no copyright on their work, and their work is in the public domain in the U.S.

The Government disclaims any rights in the code submitted or developed in the codeathon.

Participants agree to publish the code and any related data on GitHub.

For more information or questions, please contact Ben Busby (
