Variant Call Format (VCF) files provide a crucial way to record and share information about genetic variants across samples. NCBI joined forces with the National Institute of Allergy and Infectious Diseases (NIAID) to co-host the VCF Files for Population Genomics Codeathon (July 31 – August 4). The codeathon focused on innovative ways to use VCF files to analyze large datasets, working with the COVID-19 Genome Sequence Dataset produced by the National Library of Medicine (NLM) and NCBI's SARS-CoV-2 Variant Calling Pipeline. This virtual event was a resounding success and brought together experts in viral evolution, molecular epidemiology, and population genomics.
We received outstanding participation and engagement!
- 62 participants from academia, government, and industry around the world
- 8 teams collaborated on the projects listed below
- 5,000+ views of final presentations
- 100+ strong applicants
- 21 different countries represented
Together, NCBI and NIAID formulated event objectives that guided team projects, which explored how SARS-CoV-2 VCF files could enhance downstream applications by predicting emerging variants, assessing therapeutic options, and improving data clustering and modeling.
| Team project | Summary |
|---|---|
| Transforming VCFs for Data Science | Simplified VCF data analysis through innovative mapping and parsing. |
| Wastewater vs. Clinical Isolates Variants | Compared SARS-CoV-2 variants in clinical isolates and wastewater samples, providing a broader view of viral dynamics. |
| Predicting SARS-CoV-2 Evolution | Analyzed intra-host mutations to identify minor alleles absent from consensus FASTA files, informing future variant forecasts. |
| VCF Spike Protein Variants and ACE2 Population Frequencies | Visualized relationships between Spike protein and ACE2 receptor variants, offering insights into viral entry and susceptibility. |
| Minor Variant Miners | Used language models to uncover the origins of minor variants, shedding light on sequencing errors and mutation patterns. |
| Preparing for a New VCF Standard | Defined a minimum set of bioinformatics tools that any VCF replacement should support, easing the transition to alternative data structures for large-scale population genetic data. |
| VarAi | Used AI to create a specialized language model that answers genetic questions posed as natural language queries. |
| Linkage Landscape Initiative | Created a linkage landscape dataset and visualization for SARS-CoV-2 to provide insights into correlated mutations and their significance. |
By championing collaboration, innovation, and bioinformatics expertise, the NCBI-NIAID VCF Files for Population Genomics Codeathon demonstrated the power of genomics research in addressing real-world challenges and driving scientific progress.
Stay up to date
Though this event has concluded, we encourage you to keep an eye out for upcoming codeathons and workshops.
For inquiries about this codeathon or participation in future events, please feel free to contact us at firstname.lastname@example.org.
To contact or connect with NIAID, please email NIAIDOGATRA@niaid.nih.gov.