Variant Call Format (VCF) files provide a crucial way to record and share information about genetic variants across samples. NCBI joined forces with the National Institute of Allergy and Infectious Diseases (NIAID) to co-host the VCF Files for Population Genomics Codeathon (July 31 – August 4). The codeathon focused on innovative methods for harnessing VCF files to analyze large datasets using the COVID-19 Genome Sequence Dataset, sourced from the National Library of Medicine (NLM) and NCBI’s SARS-CoV-2 Variant Calling Pipeline. This virtual event was a resounding success, bringing together experts in viral evolution, molecular epidemiology, and population genomics.
We received outstanding participation and engagement!
- 62 participants from academia, government, and industry around the world
- 8 teams collaborated and worked on the projects listed below
- 5,000+ views of final presentations
- 100+ strong applicants
- 21 different countries represented
Continue reading “Successful NCBI-NIAID Codeathon Explored VCF Files in Population Genomics”
Millions of SARS-CoV-2 samples from around the world have been made publicly available as assembled and unassembled sequence data in GenBank and the Sequence Read Archive (SRA). Now you can find sequences with a particular mutation by searching with the protein and the amino acid change (e.g., S:F486V). Visit our SARS-CoV-2 Variants Overview on NCBI Virus and click on the ‘Mutations’ tab to get started (Figure 1).
Figure 1: SARS-CoV-2 Variants Overview. Arrows indicate important features on the page, including the “Lineages” and “Mutations” tabs to switch between views, the search box, and the information box describing the mutation format. The results are also indicated, including a summary of the total records found that contain the searched term as well as the results table. Continue reading “NCBI Virus: Mutation-Based Search for SARS-CoV-2 Data”
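The search term format described above (protein, then amino acid change, as in S:F486V) can be split into its parts before you build a query. The following is a minimal sketch, not NCBI's own parser: the `Mutation` type, the `parse_mutation` function, and the regular expression are hypothetical, and it assumes single amino acid substitutions only. Deletions, insertions, and stop codons may use notations not covered here.

```python
import re
from typing import NamedTuple


class Mutation(NamedTuple):
    """One amino acid substitution (hypothetical helper type)."""
    protein: str   # protein label, e.g. "S" for spike
    ref_aa: str    # reference amino acid, one-letter code
    position: int  # 1-based position in the protein
    alt_aa: str    # observed amino acid, one-letter code


# Assumed pattern: <protein>:<ref aa><position><alt aa>, substitutions only.
_MUTATION_RE = re.compile(r"^([A-Za-z0-9]+):([A-Z])([1-9][0-9]*)([A-Z])$")


def parse_mutation(term: str) -> Mutation:
    """Split a term such as 'S:F486V' into its components."""
    match = _MUTATION_RE.match(term.strip())
    if match is None:
        raise ValueError(f"not a recognised substitution: {term!r}")
    protein, ref_aa, position, alt_aa = match.groups()
    return Mutation(protein, ref_aa, int(position), alt_aa)


if __name__ == "__main__":
    print(parse_mutation("S:F486V"))
```

Rejecting anything that does not match the assumed pattern, rather than guessing, keeps malformed terms out of downstream searches.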
End of the COVID-19 Public Health Emergency
During the COVID-19 pandemic, we provided the NCBI SARS-CoV-2 Resources Page as a central location to help you quickly and easily find our SARS-CoV-2 related content and tools. Since the federal public health emergency is now over, this page will be redirected to the SARS-CoV-2 Data Hub in NCBI Virus effective August 1, 2023.
Don’t worry! All this information will remain available. Check out the NLM Knowledge Base to access a list of NCBI SARS-CoV-2 data and tools.
Stay up to date
Follow us on Twitter @NCBI and join our mailing list to keep up to date with NCBI Virus and other NCBI news.
If you have questions or would like to provide feedback, please reach out to us at email@example.com.
To make it easier for you to find and access Sequence Read Archive (SRA) data, we are re-organizing and improving our cloud storage systems.
Beginning April 2023, we will move the SARS-CoV-2 normalized data and source files from the COVID-19 data buckets on Amazon Web Services (AWS) and Google Cloud Platform (GCP) to the NIH NCBI SRA on AWS registry. We will also remove the SARS-CoV-2 original format data from AWS and GCP COVID-19 buckets and make them available in AWS cold storage. If you need these data, you can request them using the Cloud Data Delivery Service (CDDS).
Where and how will I be able to access SARS-CoV-2 normalized data after this change occurs?
To ensure a smooth transition, we want you to have enough time to adjust your scripts and pipelines to minimize disruption to your analyses. Continue reading “Streamlining Access to SRA COVID-19 Datasets on the Cloud”
SARS-CoV-2 genomic data is critical for monitoring viral spread and evolution during the COVID-19 pandemic, identifying newly emerging variants, and developing and evaluating countermeasures. As of September 2022, over 13 million SARS-CoV-2 genomes have been sequenced across the world, making it the most sequenced pathogen ever. A cornerstone of genomic analysis is building a phylogeny, which demonstrates the relatedness of individual isolates to the rest of the sequenced genomes. However, the volume of SARS-CoV-2 genomes presents novel opportunities beyond phylogenies, as well as computational challenges to traditional methods of genomic analyses and visualization. Continue reading “NCBI-NIAID Beyond Phylogenies Codeathon was a success!”
Interested in understanding how sequence data are submitted, processed, and made publicly available in GenBank and the Sequence Read Archive (SRA)? Announcing the GenBank and SRA Data Processing webpage!
Here you can learn about procedures that the National Center for Biotechnology Information (NCBI), part of the National Library of Medicine (NLM), uses for processing submitted data and public posting, as well as key definitions of data status. Continue reading “Announcing the GenBank and SRA Data Processing Webpage”
Still waiting for an analysis pipeline that can uniformly process raw sequence data produced by a variety of sequencing platforms? Your wait is over! Announcing the SARS-CoV-2 Variant Calling Pipeline, which is now operational and optimized to provide support for multiple sequencing platforms, including Illumina, Oxford Nanopore, and PacBio.
This new pipeline can call variants at allele frequencies of 15% or higher. See our publication preprint and our GitHub repository for more details. This optimized pipeline is a result of the efforts of the COVID-19 research community, led by the NIH Accelerating COVID-19 Therapeutic Interventions and Vaccines (ACTIV) initiative, a public-private partnership for a coordinated research strategy to support and speed up the development of COVID-19 treatments and vaccines. Continue reading “Announcing the NCBI SARS-CoV-2 Variant Calling Pipeline and Related Data Products”
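To illustrate what a 15% allele frequency threshold means in practice, here is a minimal sketch that keeps only the VCF data lines at or above that cutoff. It is not the pipeline's implementation. It assumes the allele frequency is stored under the `AF` key of the INFO column (a key reserved by the VCF specification), which may not match the pipeline's actual output. The function names and the example record are hypothetical.

```python
from typing import Dict, Iterable, Iterator

MIN_ALLELE_FREQUENCY = 0.15  # the 15% threshold quoted in the post


def parse_info(info: str) -> Dict[str, str]:
    """Turn 'DP=100;AF=0.2' into {'DP': '100', 'AF': '0.2'}."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def passing_records(lines: Iterable[str],
                    min_af: float = MIN_ALLELE_FREQUENCY) -> Iterator[str]:
    """Yield data lines whose highest INFO/AF value is >= min_af."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 8:
            continue  # INFO is the eighth tab-separated column
        info = parse_info(columns[7])
        try:
            # AF may hold one value per alternate allele, comma-separated.
            af = max(float(v) for v in info["AF"].split(","))
        except (KeyError, ValueError):
            continue  # no usable AF value (assumption: drop the record)
        if af >= min_af:
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Made-up record for illustration only, not real pipeline output.
    example = ["##fileformat=VCFv4.2",
               "chr1\t100\t.\tA\tG\t.\tPASS\tDP=200;AF=0.30"]
    for record in passing_records(example):
        print(record)
```

Records without a parseable `AF` value are dropped here. Whether to drop or keep them is a design choice that depends on how the input was produced.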
NCBI Microbial Pathogen and SARS-CoV-2 Resources in the Cloud
Get hands-on experience with NCBI Pathogen Detection and SARS-CoV-2 Surveillance data in the cloud. No prior cloud experience necessary!
NCBI staff are presenting a workshop at the American Society for Microbiology Next-Generation Sequencing (ASM NGS) 2022 Meeting on Sunday, October 16, 2022 from 10 am – 3 pm ET (with a 1-hour break) to help conference attendees learn about two NCBI cloud-hosted resources, Pathogen Detection and SARS-CoV-2 Genome Sequence datasets. Continue reading “NCBI Workshop at the ASM NGS 2022 Meeting”
Need SARS-CoV-2 assembled genome sequences or specific SARS-CoV-2 protein sequences? You can find them on the new SARS-CoV-2 taxonomy page brought to you by NCBI Datasets.
The NCBI Datasets SARS-CoV-2 taxonomy page brings you both SARS-CoV-2 genomes and proteins, basic information about SARS-CoV-2, and connections to related NCBI pages, all in one place (see Figures 1 and 2).
Figure 1. NCBI Datasets SARS-CoV-2 taxonomy page. For command-line access, try the datasets command-line tool (top box). For customized filtering options, check out NCBI Virus (bottom box).
If you scroll down the taxonomy page you will find a table of SARS-CoV-2 proteins, each with “Actions” that provide the option to download a package of protein sequences from all annotated SARS-CoV-2 genomes (Figure 2), as well as links to NCBI Gene and the protein sequence from the reference genome.
Figure 2. NCBI Datasets SARS-CoV-2 taxonomy page (cont’d). Click the blue download button to download a package of all SARS-CoV-2 genomes (6 M and counting as of 7/15/22), or just the SARS-CoV-2 reference genome (top box). Below that you see a table of SARS-CoV-2 proteins, each with “Actions” available through the three-dot menu that provides the option to download a package of protein sequences from all annotated SARS-CoV-2 genomes (bottom boxes).
We want to hear from you! Check out the new SARS-CoV-2 taxonomy page and let us know what you think. Contact us with questions or feedback.
Join our mailing list to keep up to date with Datasets and other NCBI news.
Join us on June 15, 2022 at 12 PM US Eastern Time to learn about the NCBI Virus resource – a community portal for viral sequence data that has been important in supporting SARS-CoV-2 research and management of the COVID-19 pandemic. Enhancements to NCBI Virus that support these efforts include: SARS-CoV-2 specific filters, a dedicated web interface that reports on geotemporal prevalence of sequence records for SARS-CoV-2 lineages, plus details on NCBI’s lineage-defining mutations.
- Date and time: Wed, June 15, 2022 12:00 PM – 12:45 PM EDT
After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI webinars playlist on the NLM YouTube channel. You can learn about future webinars on the NCBI Outreach Events page.