Date: 2023-09-21

Variant Call Format (VCF) files provide a crucial way to record and share information about genetic variants across samples. NCBI joined forces with the National Institute of Allergy and Infectious Diseases (NIAID) to co-host the VCF Files for Population Genomics Codeathon (July 31 – August 4). The codeathon focused on innovative methods for harnessing VCF files to analyze large datasets, using the COVID-19 Genome Sequence Dataset sourced from the National Library of Medicine (NLM) and NCBI’s SARS-CoV-2 Variant Calling Pipeline. This virtual event was a resounding success and brought together experts in viral evolution, molecular epidemiology, and population genomics.
We received outstanding participation and engagement!
- 62 participants from academia, government, and industry across the world
- 8 teams collaborated and worked on the projects listed below
- 5,000+ views of final presentations
- 100+ strong applicants
- 21 different countries represented
Together, NCBI and NIAID formulated event objectives that guided team projects, which explored how SARS-CoV-2 VCF files could enhance downstream applications by predicting emerging variants, assessing therapeutic options, and improving data clustering and modeling.
Team Projects:
|Project|Description|
|---|---|
|Transforming VCFs for Data Science|Simplified VCF data analysis through innovative mapping and parsing.|
|Wastewater vs. Clinical Isolates Variants|Compared SARS-CoV-2 variants in clinical isolates and wastewater samples, providing a broader view of viral dynamics.|
|Predicting SARS-CoV-2 Evolution|Analyzed intra-host mutations to identify minor alleles absent in consensus FASTA files, informing future variant forecasts.|
|VCF Spike Protein Variants and ACE2 Population Frequencies|Visualized relationships between Spike protein and ACE2 receptor variants, offering insights into viral entry and susceptibility.|
|Minor Variant Miners|Used language models to uncover minor variant origins, shedding light on sequencing errors and mutation patterns.|
|Preparing for a New VCF Standard|Defined a minimum set of bioinformatic tools for new VCF replacements to support the transition to alternative data structures for large-scale population genetic data.|
|VarAi|Used AI to create a specialized language model for genetic insights through natural language queries.|
|Linkage Landscape Initiative|Created a linkage landscape dataset and visualization for SARS-CoV-2 to provide insights into correlated mutations and their significance.|
By championing collaboration, innovation, and bioinformatics expertise, the NCBI-NIAID VCF Files for Population Genomics Codeathon demonstrated the power of genomics research in addressing real-world challenges and driving scientific progress.
