The NCBI Datasets SARS-CoV-2 taxonomy page brings together SARS-CoV-2 genomes and proteins, basic information about SARS-CoV-2, and links to related NCBI pages, all in one place (see Figures 1 and 2).
Figure 1. NCBI Datasets SARS-CoV-2 taxonomy page. For command-line access, try the datasets command-line tool (top box). For customized filtering options, check out NCBI Virus (bottom box).
If you scroll down the taxonomy page, you will find a table of SARS-CoV-2 proteins. Each protein has “Actions” that let you download a package of its sequences from all annotated SARS-CoV-2 genomes (Figure 2), along with links to NCBI Gene and to the protein sequence from the reference genome.
Figure 2. NCBI Datasets SARS-CoV-2 taxonomy page (cont’d). Click the blue download button to download a package of all SARS-CoV-2 genomes (6 million and counting as of 7/15/22) or just the SARS-CoV-2 reference genome (top box). Below that is a table of SARS-CoV-2 proteins, each with “Actions” in a three-dot menu that let you download a package of protein sequences from all annotated SARS-CoV-2 genomes (bottom boxes).
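The download button delivers a zipped data package. As a minimal sketch, assuming the genome sequences sit at ncbi_dataset/data/genomic.fna inside the zip (a path worth confirming against the README bundled with your package), a few lines of standard-library Python can list the genomes a package contains. The helper names are hypothetical, and the demo builds a tiny stand-in package in memory rather than using real data.

```python
import io
import zipfile

# Assumed location of genome sequences inside a Datasets virus package;
# check the README inside your download if the layout differs.
GENOME_FASTA = "ncbi_dataset/data/genomic.fna"


def read_fasta(text):
    """Parse FASTA text into a dict mapping record ID to sequence."""
    records = {}
    seq_id = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                records[seq_id] = "".join(chunks)
            # The ID is the first whitespace-delimited token of the header.
            seq_id = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if seq_id is not None:
        records[seq_id] = "".join(chunks)
    return records


def genomes_from_package(zip_source, member=GENOME_FASTA):
    """Read genome sequences from a zipped package (file path or file object)."""
    with zipfile.ZipFile(zip_source) as zf:
        text = zf.read(member).decode("utf-8")
    return read_fasta(text)


if __name__ == "__main__":
    # Build a tiny made-up package in memory; replace buf with your zip's path.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(GENOME_FASTA, ">example_1 demo\nATTAAAGG\nTTTATACC\n>example_2\nACGT\n")
    buf.seek(0)
    for acc, seq in genomes_from_package(buf).items():
        print(acc, len(seq))
```

Passing a file path to genomes_from_package works the same way. For a package with millions of genomes, you would stream the FASTA instead of holding every sequence in a dict.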
We want to hear from you! Check out the new SARS-CoV-2 taxonomy page and let us know what you think. Contact us with questions or feedback.
Join us on June 15, 2022 at 12 PM US Eastern time to learn about the NCBI Virus resource, a community portal for viral sequence data that has been important in supporting SARS-CoV-2 research and the management of the COVID-19 pandemic. Enhancements to NCBI Virus that support these efforts include SARS-CoV-2-specific filters, a dedicated web interface that reports the geotemporal prevalence of sequence records for SARS-CoV-2 lineages, and details on NCBI’s lineage-defining mutations.
Date and time: Wed, June 15, 2022 12:00 PM – 12:45 PM EDT
One impetus for developing the SARS-CoV-2 Variants Overview dashboard is that unassembled SRA data cannot be processed with Pango tools, and many SARS-CoV-2 samples are represented only in SRA. Researchers and public health agencies worldwide use the Pango nomenclature to track the transmission and spread of SARS-CoV-2, including variants of concern. We therefore developed a uniform approach to making variant calls from SRA records and assigning Pangolin lineages based on those results, so submission groups do not have to go to the effort of creating assemblies.
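The post does not describe NCBI's variant-calling parameters. As a hedged illustration of the general idea only, the sketch below calls single-base substitutions from per-position read base counts using a minimum depth and a minimum alternate-allele fraction. Both thresholds and all names are hypothetical, and a production pipeline would also handle indels, base quality, and strand bias.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; not NCBI's parameters.
MIN_DEPTH = 10
MIN_ALT_FRACTION = 0.5


@dataclass
class Variant:
    position: int  # 1-based reference position
    ref: str
    alt: str
    fraction: float  # share of reads supporting the alternate base


def call_variants(reference, base_counts, min_depth=MIN_DEPTH, min_fraction=MIN_ALT_FRACTION):
    """Call single-base substitutions from per-position read base counts.

    base_counts maps a 1-based position to a dict such as {"A": 3, "T": 97}.
    A position is reported when coverage reaches min_depth and the most
    frequent non-reference base makes up at least min_fraction of reads.
    """
    variants = []
    for pos in sorted(base_counts):
        counts = base_counts[pos]
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little coverage to make a call
        ref_base = reference[pos - 1]
        alt_counts = {b: n for b, n in counts.items() if b != ref_base}
        if not alt_counts:
            continue  # every read agrees with the reference
        alt_base = max(alt_counts, key=alt_counts.get)
        fraction = alt_counts[alt_base] / depth
        if fraction >= min_fraction:
            variants.append(Variant(pos, ref_base, alt_base, fraction))
    return variants
```

Lineage assignment would then compare calls like these against lineage-defining mutations, a step this sketch leaves out.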
We have recently added several exciting improvements to the SARS-CoV-2 GenBank submission process based on community feedback. To save you time, NCBI completes feature annotation for you, which means SARS-CoV-2 GenBank submission only requires a FASTA file and source metadata. Here are other new features to ease and simplify your submission workflow.
Automatically remove failed sequences from a submission: On the web, a single click lets you opt in to automatic removal of failed sequences (Figure 1) so that the rest of your sequences can be accessioned quickly. A report provided after the submission lists your failed sequences and points out potential sequence problems so that you can take a closer look after your error-free sequences are released. This option is also available for submission via FTP.
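Catching problems before upload can also save a round trip. The sketch below assumes the source metadata is a tab-delimited table whose ID column is named Sequence_ID (check the current submission template). It reports IDs present in only one of the two files and flags sequences with a high fraction of N bases; the 50% cut-off is a hypothetical example, not NCBI's validation rule.

```python
import csv
import io

# Assumed name of the ID column in the tab-delimited source metadata table.
ID_COLUMN = "Sequence_ID"
# Hypothetical cut-off for illustration; NCBI's own validation criteria differ.
MAX_N_FRACTION = 0.5


def fasta_records(text):
    """Yield (ID, sequence) pairs from FASTA text."""
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            seq_id, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def precheck(fasta_text, source_tsv):
    """Return a dict of problems found before submission."""
    records = dict(fasta_records(fasta_text))
    rows = csv.DictReader(io.StringIO(source_tsv), delimiter="\t")
    meta_ids = {row[ID_COLUMN] for row in rows}
    # Sequences where N (unknown base) exceeds the allowed share of positions.
    high_n = sorted(
        sid for sid, seq in records.items()
        if seq and seq.upper().count("N") / len(seq) > MAX_N_FRACTION
    )
    return {
        "missing_metadata": sorted(set(records) - meta_ids),
        "missing_sequence": sorted(meta_ids - set(records)),
        "high_n": high_n,
    }
```

The opt-in removal described above still applies at submission time; a local check like this just surfaces obvious mismatches earlier.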
Join us on June 30, 2021 at 12 PM Eastern time to learn how to use the new NCBI Datasets resource to find and download gene, genome, and SARS-CoV-2 sequence and annotation data. You will learn how to access these datasets through either the web interface or the new command-line tools, which let you incorporate these data into your bioinformatics workflows.
Date and time: Wed, June 30, 2021 12:00 PM – 12:45 PM EDT
The NCBI structure viewer iCn3D version 3 is now available on the NCBI web site and from GitHub.
Analysis of 3D Structures
You can use the current version with the icn3d package from npm to write scripts that call functions in iCn3D. For example, this script on GitHub calculates the change in interactions caused by a mutation. The results of this analysis for the structure of the SARS-CoV-2 spike protein bound to the ACE2 receptor (6M0J) are displayed in Figure 1. They show the predicted changes in interactions with other residues in the SARS-CoV-2 spike protein and in the ACE2 receptor when the asparagine (N) at position 501 of the spike protein is changed to a tyrosine (Y). You can also run these scripts from the command line to process a list of 3D structures and retrieve and analyze their annotations.
Figure 1. iCn3D viewer showing the predicted interactions with other residues in the spike protein and in the ACE2 target when the asparagine (N) at position 501 of the SARS-CoV-2 spike protein is substituted with tyrosine (Y), highlighted in yellow. Interactions were calculated using the script interactions2.js.
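The post does not show the internals of interactions2.js, and the iCn3D scripts themselves run in JavaScript via npm. As a much-simplified Python analogue of the same idea, not iCn3D's method or API, the sketch below treats two residues as interacting when any of their atoms lie within a fixed distance (4.0 Å, a hypothetical choice) and lists the contacts a residue gains or loses between wild-type and mutant coordinates. It assumes you already have mutant coordinates from a modeling tool; residue labels such as "S:501" are made-up conventions.

```python
import math
from itertools import combinations

# Hypothetical distance cut-off in angstroms for calling a contact.
CONTACT_CUTOFF = 4.0


def residue_contacts(atoms, cutoff=CONTACT_CUTOFF):
    """Return residue pairs with any two atoms within cutoff of each other.

    atoms is a list of (residue_label, (x, y, z)) tuples, e.g.
    ("S:501", (1.0, 2.0, 3.0)). Pairs are returned as sorted tuples.
    """
    contacts = set()
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(atoms, 2):
        # Skip atom pairs within the same residue.
        if res_a != res_b and math.dist(xyz_a, xyz_b) <= cutoff:
            contacts.add(tuple(sorted((res_a, res_b))))
    return contacts


def contact_changes(wild_type_atoms, mutant_atoms, residue, cutoff=CONTACT_CUTOFF):
    """Contacts involving `residue` that are gained or lost after mutation."""
    before = {p for p in residue_contacts(wild_type_atoms, cutoff) if residue in p}
    after = {p for p in residue_contacts(mutant_atoms, cutoff) if residue in p}
    return {"gained": sorted(after - before), "lost": sorted(before - after)}
```

iCn3D distinguishes interaction types such as hydrogen bonds, salt bridges, and π-stacking; a single distance cut-off cannot, which is the main simplification here. The all-pairs loop is also quadratic, so a real structure would call for a spatial index.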
During the COVID-19 pandemic, it is critical to collect descriptive information about the provenance and attributes of SARS-CoV-2 genomic samples so that the course of the virus may be tracked and analyzed. The NCBI Submission Portal now includes a dedicated BioSample submission package to help further improve the quality and richness of submitted SARS-CoV-2 sample metadata. The SARS-CoV-2 clinical or host-associated package presents a framework and standardized fields for submitters to provide attributes considered useful for the rapid analysis and surveillance of SARS-CoV-2 clinical and host-associated cases. For example, mandatory attributes include collection date and geographic location, while suggested but optional attributes include date of SARS-CoV-2 vaccination, vaccine received, and host disease outcome.
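A quick local check can catch missing mandatory fields before upload. This is a minimal sketch, assuming attributes arrive as a Python dict keyed by BioSample attribute name. collection_date and geo_loc_name are the attribute names for the two mandatory fields mentioned above, while the accepted date formats (YYYY, YYYY-MM, YYYY-MM-DD) are a deliberately narrow subset of what BioSample accepts.

```python
from datetime import datetime

# The two mandatory attributes named in the text; the full package requires more.
MANDATORY = ("collection_date", "geo_loc_name")
# A narrow subset of accepted date formats, chosen for illustration.
DATE_FORMATS = ("%Y-%m-%d", "%Y-%m", "%Y")


def valid_date(text):
    """True if text parses as one of DATE_FORMATS (so 2021-13-40 fails)."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            pass
    return False


def check_sample(attributes):
    """Return a list of problems with one sample's attribute dict."""
    problems = []
    for name in MANDATORY:
        if not attributes.get(name, "").strip():
            problems.append(f"missing mandatory attribute: {name}")
    date = attributes.get("collection_date", "").strip()
    if date and not valid_date(date):
        problems.append(f"unrecognized collection_date format: {date}")
    return problems
```

Optional attributes such as vaccination date or host disease outcome could be checked the same way once you confirm their exact attribute names in the package definition.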
Looking for genomes for the B.1.1.7 SARS-CoV-2 variant? NCBI now supports searches for SARS-CoV-2 variant names such as B.1.1.7, B.1.351, or P.1. For example, search for B.1.1.7 (Figure 1) and you’ll see a virus classification box with an option to download a SARS-CoV-2 data package. SARS-CoV-2 data packages include genome and protein sequences and a detailed data report for all SARS-CoV-2 genomes classified as that variant. SARS-CoV-2 genome lineages are classified by pangolin, using the pangoLEARN algorithm.
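After downloading a data package, you may want to narrow its data report to one lineage. This sketch assumes the report is JSON Lines with the lineage stored under pangolinClassification and the accession under accession; both field names are assumptions to verify against the report schema. Optional sub-lineage matching relies on Pango's dotted names, so B.1.1.7 matches B.1.1.7.1 but not B.1.1.70.

```python
import json

# Assumed field names in a Datasets virus data report (data_report.jsonl).
LINEAGE_FIELD = "pangolinClassification"
ACCESSION_FIELD = "accession"


def accessions_for_lineage(jsonl_text, lineage, include_sublineages=False):
    """Return accessions whose lineage matches `lineage`.

    With include_sublineages=True, names that extend the lineage with a
    dot (e.g. B.1.1.7.1) also match. Pango aliases are not expanded.
    """
    hits = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        found = record.get(LINEAGE_FIELD)
        if not found:
            continue  # no lineage assigned to this genome
        if found == lineage or (include_sublineages and found.startswith(lineage + ".")):
            hits.append(record.get(ACCESSION_FIELD))
    return hits
```

Because Pango gives deep lineages a new letter-prefixed alias, prefix matching alone misses aliased descendants; a fuller solution would expand aliases before comparing.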
Figure 1. SARS-CoV-2 variant search result with button to download a data package containing data for all SARS-CoV-2 genomes matching that variant lineage, B.1.1.7 in this case.
NCBI Datasets, the new set of services for downloading genome assembly and annotation data (previous Datasets posts), has redesigned and reorganized web pages to make it easier to find and access the services and documentation you need.