NCBI usually participates in hackathons as direct organizers and planners. However, NCBI staff recently functioned as facilitators in two hackathons organized by outside groups: one at the Bio-IT World conference, and one at the Silicon Valley Artificial Intelligence (SVAI) incubator.
The Bio-IT FAIR Data Hackathon
The Bio-IT World Hackathon, held in Boston in May, was the group’s first hackathon. Along with ontology experts from the Dutch Techcentre for Life Sciences and Ontoforce, three teams of developers aligned biological datasets with the FAIR Data Principles to make them more findable, accessible, interoperable, and reusable.
The jury selected the FAIR-ClinVar project as the best example of the data stewardship principles. The FAIR-ClinVar team converted part of the ClinVar VCF file to Resource Description Framework (RDF) – a standard technology in knowledge representation – and annotated certain fields with explicit RDF predicates from shared vocabularies such as Sequence Ontology. They also added RDF for the file metadata.
This prototype work demonstrates a way to make ClinVar data easier to find and access for groups that may not yet be familiar with VCF or ClinVar’s data model for genomic variants of clinical interest. It also shows how to make ClinVar data more interoperable and reusable with semantic query engines.
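The kind of conversion described above can be illustrated with a minimal Python sketch that turns one VCF data line into N-Triples. It is not the FAIR-ClinVar team's code: the `http://example.org/clinvar/` namespace, the predicate names, and the sample record are all hypothetical and chosen only for illustration. The sketch assumes the INFO column carries `CLNSIG` and `CLNVCSO` keys, as in ClinVar's VCF releases, and uses the `CLNVCSO` value to type the variant with a Sequence Ontology class.

```python
# Hypothetical namespace for illustration; not the FAIR-ClinVar team's actual URIs.
EX = "http://example.org/clinvar/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
OBO = "http://purl.obolibrary.org/obo/"  # OBO PURL base used for Sequence Ontology terms


def parse_info(info):
    """Split a VCF INFO column into a dict; flag-only keys map to an empty string."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def literal(value):
    """Format a plain string literal for N-Triples, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def vcf_line_to_ntriples(line):
    """Convert one tab-separated VCF data line into a list of N-Triples statements."""
    chrom, pos, vid, ref, alt, _qual, _filter, info = line.rstrip("\n").split("\t")[:8]
    fields = parse_info(info)
    subject = f"<{EX}variation/{vid}>"

    # Core columns, expressed with hypothetical predicates.
    triples = [
        (subject, f"<{EX}chromosome>", literal(chrom)),
        (subject, f"<{EX}position>", f'"{int(pos)}"^^<{XSD_INTEGER}>'),
        (subject, f"<{EX}referenceAllele>", literal(ref)),
        (subject, f"<{EX}alternateAllele>", literal(alt)),
    ]

    # Type the variant with a shared-vocabulary class when a Sequence Ontology ID is present.
    so_id = fields.get("CLNVCSO")
    if so_id:
        triples.append((subject, f"<{RDF_TYPE}>", f"<{OBO}{so_id.replace(':', '_')}>"))

    # Clinical significance kept as a plain literal in this sketch.
    if fields.get("CLNSIG"):
        triples.append((subject, f"<{EX}clinicalSignificance>", literal(fields["CLNSIG"])))

    return [f"{s} {p} {o} ." for s, p, o in triples]


# Made-up record for illustration; not real ClinVar data.
SAMPLE = "22\t100\t12345\tG\tA\t.\t.\tCLNSIG=Pathogenic;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483"

if __name__ == "__main__":
    for statement in vcf_line_to_ntriples(SAMPLE):
        print(statement)
```

Typing each variant with a class from a shared vocabulary, rather than a file-specific string, is what lets a semantic query engine match ClinVar records against other datasets that use the same ontology.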
Silicon Valley Artificial Intelligence AI Genomics Hackathon
SVAI and collaborators organized the SVAI hackathon, which was held on June 23 at the SVAI site in San Francisco. One special challenge for the participants was to try to glean new insights about neurofibromatosis type 2 (NF2) from genome data from an individual with NF2.
Computational biologists and developers with expertise in artificial intelligence (AI) and machine learning participated in the hackathon, including 170 in-person registrants, 100 remote registrants, and 37 registered teams. Twenty-one of the teams opted to present their projects.
Projects focused on four goals:
- Propose drug treatment pathways using existing medications.
- Rank dataset mutations.
- Analyze NF2 research using natural language processing (NLP) and present new treatment options.
- Identify a potential new drug intervention using DeepChem, an open-source deep-learning library for drug discovery.
Specific project topics ranged from analysis of potential phosphorylation sites, enhancers, gene expression, and variant calling to pharmacogenomics. Projects used literature and molecular data from a number of sources: gene expression data from NCBI’s GEO and TCGA, known SNPs and structural variants from ClinVar and similar databases, and raw genomic data from induced pluripotent stem cells (iPSCs) and other control cell types.
One of the most impressive things about the event was how much the AI developers and computational biologists learned about each other’s disciplines in a very short time.