The NCBI will assist with a data science hackathon to take place on the NIH Campus in Bethesda, Maryland, from April 16-18, 2018.
The hackathon will focus on tools for advanced analysis of biomedical datasets including text, images, next-generation sequencing data, proteomics, and metadata. Many individuals who attend these events have already worked with large datasets or developed informatics tools, code, or pipelines; however, researchers who are in the earlier stages of their data science journey, including students and postdocs, are also encouraged to apply. Some projects are also open to non-scientific developers, mathematicians, or librarians.
The event is open to anyone selected for the hackathon and willing to travel to Bethesda, Maryland.
Participants with various backgrounds and expertise will be formed into five to eight teams of five to six individuals, each with an experienced leader. These teams will build pipelines and tools to analyze large datasets within a cloud infrastructure. The hackathon runs from 9 am to 6 pm each day, with an optional social event on the evening of the second day.
Potential subjects for this iteration include:
* Implementing CWL-based genome annotation pipelines
* Prototyping federated cloud-search for biomedical data
* Machine-learning-based metadata harmonization
* Visualization of single-cell RNA-Seq data
* Sentiment analysis from a variety of text corpora
* Metadata standardization for EMR analysis
* Building an educational experience for RNA-Seq and epigenomics analysis
* Expanding a versatile antimicrobial resistance pipeline
* Searching for novel virus families
Please see the application for more details and additional projects. Applications are due Monday, March 22nd, 2018, by 3 pm ET.
After a brief organizational session, teams will spend three days addressing a challenging set of scientific problems related to a group of datasets. Participants will analyze and combine datasets to work on these problems.
Datasets will come from public repositories or will be supplied by the project lead. During the hackathon, participants will have an opportunity to include other datasets and tools for analysis. Please note that if you use your own data during the hackathon, we ask that you submit it to a public database within six months of the end of the event.
All pipelines and other scripts, software, and programs generated in this hackathon will be added to a public GitHub repository designed for that purpose (github.com/NCBI-Hackathons). Manuscripts describing the design and usage of the software tools constructed by each team may be submitted to an appropriate journal such as the F1000Research hackathons channel (http://f1000research.com/channels/hackathons).
To apply, complete this form (approximately 10 minutes to complete). Applications are due Monday, March 22nd, 2018, by 3 pm ET. Participants will be selected based on the experience and motivation they describe on the form. Prior participants and applicants are especially encouraged to apply. The first round of accepted applicants will be notified on March 23rd by 3 pm ET and will have until March 26th at 5 pm ET to confirm their participation. Please confirm only if it is highly likely you can attend, as confirming and not attending prevents other data scientists from attending this event. Please include a monitored email address in case there are follow-up questions.
Note: Participants will need to bring their own laptop to this program. A working knowledge of scripting (e.g., Shell, Python, R) is necessary to be successful in this event. Familiarity with higher-level scripting or programming languages may also be useful. Applicants must be willing to commit to all three days of the event. No financial support for travel, lodging, or meals is available for this event. Please make any necessary arrangements accordingly.
Please contact Ben Busby with any questions.
Venue: NIH Campus