Tag: SARS-CoV-2

Streamlining Access to SRA COVID-19 Datasets on the Cloud

To make it easier for you to find and access Sequence Read Archive (SRA) data, we are re-organizing and improving our cloud storage systems.  

Beginning April 2023, we will move the SARS-CoV-2 normalized data and source files from the COVID-19 data buckets on Amazon Web Services (AWS) and Google Cloud Platform (GCP) to the NIH NCBI SRA on AWS registry. We will also remove the SARS-CoV-2 original format data from AWS and GCP COVID-19 buckets and make them available in AWS cold storage. If you need these data, you can request them using the Cloud Data Delivery Service (CDDS). 

Where and how will I be able to access SARS-CoV-2 normalized data after this change occurs?

To ensure a smooth transition, we want you to have enough time to adjust your scripts and pipelines to minimize disruption to your analyses.

NCBI-NIAID Beyond Phylogenies Codeathon was a success!

SARS-CoV-2 genomic data are critical for monitoring viral spread and evolution during the COVID-19 pandemic, identifying newly emerging variants, and developing and evaluating countermeasures. As of September 2022, over 13 million SARS-CoV-2 genomes have been sequenced across the world, making it the most sequenced pathogen ever. A cornerstone of genomic analysis is building a phylogeny, which demonstrates the relatedness of individual isolates to the rest of the sequenced genomes. However, the volume of SARS-CoV-2 genomes presents novel opportunities beyond phylogenies, as well as computational challenges to traditional methods of genomic analysis and visualization.

Announcing the GenBank and SRA Data Processing Webpage

Interested in understanding how sequence data are submitted, processed, and made publicly available in GenBank and the Sequence Read Archive (SRA)? Announcing the GenBank and SRA Data Processing webpage!

Here you can learn about procedures that the National Center for Biotechnology Information (NCBI), part of the National Library of Medicine (NLM), uses for processing submitted data and public posting, as well as key definitions of data status.

Announcing the NCBI SARS-CoV-2 Variant Calling Pipeline and Related Data Products

Still waiting for an analysis pipeline that can uniformly process raw sequence data produced by a variety of sequencing platforms? Your wait is over! Announcing the SARS-CoV-2 Variant Calling Pipeline, which is now operational and optimized to support multiple sequencing platforms, including Illumina, Oxford Nanopore, and PacBio.

This new pipeline can make allele frequency calls equal to or above 15%. See our preprint and our GitHub repository for more details. This optimized pipeline is a result of the efforts of the COVID-19 research community, led by the NIH Accelerating COVID-19 Therapeutic Interventions and Vaccines (ACTIV) initiative, a public-private partnership for a coordinated research strategy to support and speed up the development of COVID-19 treatments and vaccines.
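To make the 15% allele frequency threshold concrete, here is a minimal Python sketch of what such a cutoff could look like when applied to a list of variant calls. This is not the NCBI pipeline's code. The `VariantCall` class, its field names, the `filter_calls` helper, and the example read depths are all hypothetical and exist only for illustration. Only the 15% figure comes from the post.

```python
from dataclasses import dataclass

# Threshold taken from the post: calls at or above 15% allele frequency.
MIN_ALLELE_FREQUENCY = 0.15


@dataclass
class VariantCall:
    """Hypothetical record for one variant call (not an NCBI data structure)."""
    position: int     # 1-based position on the reference genome
    ref: str          # reference allele
    alt: str          # alternate allele
    alt_depth: int    # reads supporting the alternate allele
    total_depth: int  # all reads covering the position

    @property
    def allele_frequency(self) -> float:
        # Guard against division by zero at uncovered positions.
        if self.total_depth == 0:
            return 0.0
        return self.alt_depth / self.total_depth


def filter_calls(calls, min_af=MIN_ALLELE_FREQUENCY):
    """Keep only calls whose allele frequency is at or above min_af."""
    return [call for call in calls if call.allele_frequency >= min_af]


# Illustrative input with made-up depths.
calls = [
    VariantCall(23403, "A", "G", alt_depth=980, total_depth=1000),  # 98%
    VariantCall(14408, "C", "T", alt_depth=15, total_depth=100),    # 15%
    VariantCall(3037, "C", "T", alt_depth=10, total_depth=100),     # 10%
]

kept = filter_calls(calls)
for call in kept:
    print(call.position, f"{call.allele_frequency:.0%}")
```

In this sketch the first two calls are kept and the 10% call is dropped. A real pipeline would apply further quality filters that are not modeled here.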

NCBI Workshop at the ASM NGS 2022 Meeting

NCBI Microbial Pathogen and SARS-CoV-2 Resources in the Cloud

Get hands-on experience with NCBI Pathogen Detection and SARS-CoV-2 Surveillance data in the cloud. No prior cloud experience necessary!

NCBI staff are presenting a workshop at the American Society for Microbiology Next-Generation Sequencing (ASM NGS) 2022 Meeting on Sunday, October 16, 2022, from 10 am – 3 pm ET (with a 1-hour break) to help conference attendees learn about two NCBI cloud-hosted resources: the Pathogen Detection and SARS-CoV-2 Genome Sequence datasets.

Announcing the NCBI Datasets SARS-CoV-2 taxonomy page

Need SARS-CoV-2 assembled genome sequences or specific SARS-CoV-2 protein sequences? You can find them on the new SARS-CoV-2 taxonomy page brought to you by NCBI Datasets.

The NCBI Datasets SARS-CoV-2 taxonomy page brings you both SARS-CoV-2 genomes and proteins, basic information about SARS-CoV-2, and connections to related NCBI pages, all in one place (see Figures 1 and 2).

Figure 1. NCBI Datasets SARS-CoV-2 taxonomy page. For command-line access, try the datasets command-line tool (top box). For customized filtering options, check out NCBI Virus (bottom box).

If you scroll down the taxonomy page, you will find a table of SARS-CoV-2 proteins, each with “Actions” that let you download a package of protein sequences from all annotated SARS-CoV-2 genomes (Figure 2), as well as links to NCBI Gene and the protein sequence from the reference genome.

Figure 2. NCBI Datasets SARS-CoV-2 taxonomy page (cont’d). Click the blue download button to download a package of all SARS-CoV-2 genomes (6 million and counting as of July 15, 2022), or just the SARS-CoV-2 reference genome (top box). Below that is a table of SARS-CoV-2 proteins, each with “Actions” available through the three-dot menu, which lets you download a package of protein sequences from all annotated SARS-CoV-2 genomes (bottom boxes).

We want to hear from you! Check out the new SARS-CoV-2 taxonomy page and let us know what you think. Contact us with questions or feedback.

Join our mailing list to keep up to date with Datasets and other NCBI news.

June 15 Webinar: What’s new with NCBI Virus?

Join us on June 15, 2022, at 12 PM US Eastern Time to learn about the NCBI Virus resource, a community portal for viral sequence data that has been important in supporting SARS-CoV-2 research and management of the COVID-19 pandemic. Enhancements to NCBI Virus that support these efforts include SARS-CoV-2-specific filters, a dedicated web interface that reports on the geotemporal prevalence of sequence records for SARS-CoV-2 lineages, and details on NCBI’s lineage-defining mutations.

  • Date and time: Wed, June 15, 2022 12:00 PM – 12:45 PM EDT
  • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI webinars playlist on the NLM YouTube channel. You can learn about future webinars on the NCBI Outreach Events page.

Introducing SARS-CoV-2 Variants Overview, NLM’s latest tool in the fight against COVID-19 

The National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM) has released a new resource, called the SARS-CoV-2 Variants Overview, that aggregates data related to SARS-CoV-2 variants from sequences available in NCBI’s GenBank and Sequence Read Archive (SRA) databases.

SARS-CoV-2 Variants Overview, a freely available online dashboard, was developed with guidance from the TRACE Working Group as part of NLM’s participation in the National Institutes of Health (NIH) Accelerating COVID-19 Therapeutic Interventions and Vaccines (ACTIV) initiative, a public-private partnership for a coordinated research strategy to support and speed up the development of COVID-19 treatments and vaccines.

One impetus for development of the dashboard is that unassembled SRA data cannot be processed through Pango tools, and many SARS-CoV-2 samples are only represented in SRA. The Pango nomenclature is being used by researchers and public health agencies worldwide to track the transmission and spread of SARS-CoV-2, including variants of concern. Thus, we developed a uniform approach to making variant calls from SRA records and assigning Pangolin lineages on the basis of these results. This means that submission groups do not have to go through the effort of creating assemblies.

Four new options to simplify your SARS-CoV-2 submissions

We have recently added several improvements to the SARS-CoV-2 GenBank submission process based on community feedback. To save you time, NCBI completes feature annotation for you, which means a SARS-CoV-2 GenBank submission requires only a FASTA file and source metadata. Here are other new features to simplify your submission workflow.

Automatically remove failed sequences from a submission: On the web, a single click lets you opt in to automatic removal of failed sequences (Figure 1) so that the rest of your sequences can be swiftly accessioned! A report provided after the submission lists your failed sequences and points out potential sequence problems so that you can take a closer look after your error-free sequences are released. This option is also available for submission via FTP.

Need to set up FTP submissions? The NCBI team is here to help. Contact gb-admin@ncbi.nlm.nih.gov.

Figure 1. GenBank submission page showing the option to remove sequences with processing errors.
