Successful NCBI-NIAID Codeathon Explored VCF Files in Population Genomics

Variant Call Format (VCF) files provide a crucial way to record and share information about genetic variants across samples. NCBI joined forces with the National Institute of Allergy and Infectious Diseases (NIAID) to co-host the VCF Files for Population Genomics Codeathon (July 31 – August 4). The codeathon focused on innovative methods for harnessing VCF files to analyze large datasets using the COVID-19 Genome Sequence Dataset, sourced from the National Library of Medicine (NLM) and NCBI’s SARS-CoV-2 Variant Calling Pipeline. This virtual event was a resounding success and brought together experts in viral evolution, molecular epidemiology, and population genomics.

We received outstanding participation and engagement!

  • 62 participants from academia, government, and industry around the world
  • 8 teams collaborated and worked on the projects listed below 
  • 5,000+ views of final presentations 
  • 100+ strong applicants 
  • 21 different countries represented

Together, NCBI and NIAID formulated event objectives that guided team projects, which explored how SARS-CoV-2 VCF files could enhance downstream applications by predicting emerging variants, assessing therapeutic options, and improving data clustering and modeling.  

Team Projects: 

  • Transforming VCFs for Data Science: Simplified VCF data analysis through innovative mapping and parsing.
  • Wastewater vs. Clinical Isolates Variants: Compared SARS-CoV-2 variants in clinical isolates and wastewater samples, providing a broader view of viral dynamics.
  • Predicting SARS-CoV-2 Evolution: Analyzed intra-host mutations to identify minor alleles absent in consensus FASTA files, informing future variant forecasts.
  • VCF Spike Protein Variants and ACE2 Population Frequencies: Visualized relationships between Spike protein and ACE2 receptor variants, offering insights into viral entry and susceptibility.
  • Minor Variant Miners: Used language models to uncover minor variant origins, shedding light on sequencing errors and mutation patterns.
  • Preparing for a New VCF Standard: Defined a minimum set of bioinformatic tools for new VCF replacements to support the transition to alternative data structures for large-scale population genetic data.
  • VarAi: Used AI to create a specialized language model for genetic insights through natural language queries.
  • Linkage Landscape Initiative: Created a linkage landscape dataset and visualization for SARS-CoV-2 to provide insights into correlated mutations and their significance.

By championing collaboration, innovation, and bioinformatics expertise, the NCBI-NIAID VCF Files for Population Genomics Codeathon demonstrated the power of genomics research in addressing real-world challenges and driving scientific progress.  

Learn more  

Get more details on NCBI’s Codeathons GitHub page and check out the video recordings of the teams’ final presentations.  

Stay up to date 

Follow us on social media @NCBI and join our mailing list to keep up to date with NCBI news and events.

Though this event has concluded, we encourage you to keep an eye out for upcoming codeathons and workshops.


For inquiries about this codeathon or participation in future events, please feel free to contact us at codeathons@ncbi.nlm.nih 

To contact or connect with NIAID, please email 
