One impetus for developing the SARS-CoV-2 Variants Overview dashboard, NLM’s latest tool in the fight against COVID-19, is that unassembled SRA data cannot be processed through Pango tools, and many SARS-CoV-2 samples are represented only in SRA. Researchers and public health agencies worldwide use the Pango nomenclature to track the transmission and spread of SARS-CoV-2, including variants of concern. We therefore developed a uniform approach to making variant calls from SRA records and assigning Pangolin lineages on the basis of these results, so submitting groups do not have to go through the effort of creating assemblies.
We have recently added several exciting improvements to the SARS-CoV-2 GenBank submission process based on community feedback. To save you time, NCBI completes feature annotation for you, so a SARS-CoV-2 GenBank submission requires only a FASTA file and source metadata. Here are other new features that simplify your submission workflow.
Automatically remove failed sequences from a submission: On the web, a single click lets you opt in to automatic removal of failed sequences (Figure 1) so that the rest of your sequences can be accessioned swiftly! After submission, a report lists your failed sequences and points out potential sequence problems, so you can take a closer look once your error-free sequences are released. This option is also available for FTP submissions.
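Because a SARS-CoV-2 GenBank submission needs only a FASTA file and source metadata, both files can be generated from the same set of records. The Python sketch below is illustrative only: the `build_submission_files` helper and the record keys are hypothetical, and the column headers (`Sequence_ID`, `country`, `host`, `isolate`, `collection-date`) follow common GenBank source-modifier naming but are an assumption; check the current Submission Portal template before relying on them.

```python
import csv
import io

# Hypothetical column set; confirm against the Submission Portal's source metadata template.
SOURCE_COLUMNS = ["Sequence_ID", "country", "host", "isolate", "collection-date"]


def build_submission_files(records, line_width=60):
    """Return (fasta_text, source_tsv_text) for a list of record dicts.

    Each record is assumed to carry 'seq_id', 'sequence', 'country', 'host',
    'isolate' and 'collection_date' keys (hypothetical names).
    """
    fasta_lines = []
    for rec in records:
        # The FASTA ID is expected to match the Sequence_ID column of the metadata table.
        fasta_lines.append(f">{rec['seq_id']}")
        seq = rec["sequence"]
        # Wrap the sequence into fixed-width lines.
        fasta_lines.extend(seq[i:i + line_width] for i in range(0, len(seq), line_width))
    fasta_text = "\n".join(fasta_lines) + "\n"

    # Write the source metadata as a tab-delimited table with a header row.
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(SOURCE_COLUMNS)
    for rec in records:
        writer.writerow([rec["seq_id"], rec["country"], rec["host"],
                         rec["isolate"], rec["collection_date"]])
    return fasta_text, buf.getvalue()


if __name__ == "__main__":
    demo = [{"seq_id": "seq1", "sequence": "ACGT" * 40, "country": "USA",
             "host": "Homo sapiens", "isolate": "SARS-CoV-2/human/USA/example-1/2021",
             "collection_date": "2021-03-15"}]
    fasta, tsv = build_submission_files(demo)
    print(fasta)
    print(tsv)
```

Keeping both files derived from one list of records makes it harder for FASTA IDs and metadata rows to drift out of sync.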
Join us on June 30, 2021 at 12PM eastern time to learn how to use the new NCBI Datasets resource to find and download gene, genome, and SARS-CoV-2 sequence and annotation data. You will learn how to access these datasets through either the web interface or the new command-line tools, which let you incorporate these data into your bioinformatics workflows.
Date and time: Wed, June 30, 2021 12:00 PM – 12:45 PM EDT
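Once a data package has been downloaded, from either the web interface or the command-line tools, it can be inspected with standard tooling. This minimal Python sketch assumes the package is a zip archive whose sequences are stored in FASTA files ending in `.fna`; the helper name `count_fasta_records` is hypothetical, and the archive layout should be checked against the NCBI Datasets documentation.

```python
import io
import sys
import zipfile


def count_fasta_records(zip_source):
    """Map each .fna member of a zip archive to its number of FASTA records.

    zip_source may be a filesystem path or a binary file-like object.
    """
    counts = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.endswith(".fna"):
                continue
            with io.TextIOWrapper(archive.open(name), encoding="utf-8") as text:
                # Each FASTA record starts with a '>' definition line.
                counts[name] = sum(1 for line in text if line.startswith(">"))
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    for member, n in count_fasta_records(sys.argv[1]).items():
        print(f"{member}\t{n}")
```

Run it with the path to a downloaded archive as its only argument; it prints one line per FASTA file with its record count.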
The NCBI structure viewer iCn3D version 3 is now available on the NCBI web site and from GitHub.
Analysis of 3D Structures
You can use the current version, through the icn3d package on npm, to write scripts that call iCn3D functions. For example, this script on GitHub calculates the change in interactions caused by a mutation. Figure 1 shows the results of this analysis for the structure of the SARS-CoV-2 spike protein bound to the ACE2 receptor (6M0J): the predicted changes in interactions with other residues in the SARS-CoV-2 spike protein and in the ACE2 receptor when the asparagine (N) at position 501 of the spike protein is changed to a tyrosine (Y). You can also run these scripts from the command line to retrieve and analyze annotations for a list of 3D structures.
Figure 1. iCn3D viewer showing the predicted interactions with other residues in the spike protein and in the ACE2 target when the asparagine (N) at position 501 of the SARS-CoV-2 spike protein is substituted with tyrosine (Y), highlighted in yellow. Interactions were calculated using the script interactions2.js.
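The idea behind comparing interactions before and after a substitution such as N501Y can be sketched as a set difference. This Python example is not the iCn3D API or the logic of interactions2.js. It assumes that interactions have already been computed for both structures and are given as (residue, residue, interaction type) triples, with residues keyed by (chain, position) so that the substituted residue still matches across structures. All function names are hypothetical, and the toy inputs are made up rather than taken from 6M0J.

```python
from typing import Dict, Iterable, Set, Tuple

# Residues are keyed by (chain ID, residue number) so that position 501 is the
# same key before and after the N -> Y substitution.
Residue = Tuple[str, int]
Interaction = Tuple[Residue, Residue, str]  # (residue, residue, interaction type)


def normalize(interactions: Iterable[Interaction]) -> Set[Interaction]:
    """Order each residue pair so that A-B and B-A count as the same interaction."""
    normalized = set()
    for res_a, res_b, kind in interactions:
        first, second = sorted((res_a, res_b))
        normalized.add((first, second, kind))
    return normalized


def interaction_changes(wild_type: Iterable[Interaction],
                        mutant: Iterable[Interaction]) -> Dict[str, Set[Interaction]]:
    """Report interactions lost and gained when moving from wild type to mutant."""
    wt = normalize(wild_type)
    mt = normalize(mutant)
    return {"lost": wt - mt, "gained": mt - wt}


def involving(interactions: Set[Interaction], residue: Residue) -> Set[Interaction]:
    """Keep only interactions that include the given residue."""
    return {i for i in interactions if residue in (i[0], i[1])}


if __name__ == "__main__":
    # Toy, made-up interaction lists for illustration only.
    wild = [(("E", 501), ("A", 41), "hydrogen bond"), (("E", 500), ("A", 353), "contact")]
    mut = [(("A", 353), ("E", 500), "contact"), (("E", 501), ("A", 353), "pi-stacking")]
    print(interaction_changes(wild, mut))
```

Keying residues by position rather than by residue name is the design choice that lets the same site be compared across the wild-type and mutant structures.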
During the COVID-19 pandemic, it is critical to collect descriptive information about the provenance and attributes of SARS-CoV-2 genomic samples so that the course of the virus may be tracked and analyzed. The NCBI Submission Portal now includes a dedicated BioSample submission package to help further improve the quality and richness of submitted SARS-CoV-2 sample metadata. The SARS-CoV-2 clinical or host-associated package presents a framework and standardized fields for submitters to provide attributes considered useful for the rapid analysis and surveillance of SARS-CoV-2 clinical and host-associated cases. For example, mandatory attributes include collection date and geographic location, while suggested but optional attributes include date of SARS-CoV-2 vaccination, vaccine received, and host disease outcome.
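A submitter might check records against the package's mandatory attributes before uploading. The Python sketch below is a hypothetical pre-submission check, not an NCBI validator: `collection_date` and `geo_loc_name` are standard BioSample attribute names for collection date and geographic location, but the optional attribute names listed here are assumptions to confirm against the package's attribute list.

```python
from typing import Dict, List

# Mandatory attributes: collection date and geographic location.
MANDATORY = ("collection_date", "geo_loc_name")

# Suggested but optional attributes; these exact names are assumptions.
OPTIONAL = ("date_of_sars_cov_2_vaccination", "vaccine_received", "host_disease_outcome")


def missing_mandatory(sample: Dict[str, str]) -> List[str]:
    """Return mandatory attributes that are absent or blank in a sample."""
    return [attr for attr in MANDATORY if not sample.get(attr, "").strip()]


def optional_provided(sample: Dict[str, str]) -> List[str]:
    """Return the suggested attributes that the sample fills in."""
    return [attr for attr in OPTIONAL if sample.get(attr, "").strip()]


if __name__ == "__main__":
    sample = {"collection_date": "2021-03-15", "geo_loc_name": "USA: Maryland"}
    print("missing:", missing_mandatory(sample))
    print("optional provided:", optional_provided(sample))
```

This only checks presence; it does not validate value formats, which the Submission Portal checks on its own terms.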
Looking for genomes for the B.1.1.7 SARS-CoV-2 variant? NCBI now supports searches for SARS-CoV-2 variant names such as B.1.1.7, B.1.351, or P.1. For example, search for B.1.1.7 (Figure 1) and you’ll see a virus classification box with an option to download a SARS-CoV-2 data package. SARS-CoV-2 data packages include genome and protein sequences and a detailed data report for all SARS-CoV-2 genomes classified as that variant. SARS-CoV-2 genome lineages are classified by pangolin, using the pangoLEARN algorithm.
Figure 1. SARS-CoV-2 variant search result with button to download a data package containing data for all SARS-CoV-2 genomes matching that variant lineage, B.1.1.7 in this case.
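A variant search gathers the genomes whose assigned Pango lineage matches the query. Assuming each genome record carries its assigned lineage as a string, the matching can be sketched in Python as below. The record layout and function names are hypothetical, and this is not how NCBI implements the search. Matching on dot-separated name components keeps B.1.1.70 out of a B.1.1.7 query while optionally including names written under it, such as B.1.1.7.1; it does not resolve Pango aliases, where a deep sublineage is renamed with a new letter prefix.

```python
from typing import Dict, Iterable, List


def in_lineage(assigned: str, query: str, include_descendants: bool = False) -> bool:
    """True if `assigned` equals `query`, or (optionally) is written as a sublineage of it."""
    if assigned == query:
        return True
    # Require a '.' boundary so that B.1.1.70 is not treated as part of B.1.1.7.
    return include_descendants and assigned.startswith(query + ".")


def select_genomes(records: Iterable[Dict[str, str]], query: str,
                   include_descendants: bool = False) -> List[str]:
    """Return accessions of records whose 'lineage' field matches the query."""
    return [rec["accession"] for rec in records
            if in_lineage(rec["lineage"], query, include_descendants)]


if __name__ == "__main__":
    # Placeholder records, not real accessions.
    records = [{"accession": "seq1", "lineage": "B.1.1.7"},
               {"accession": "seq2", "lineage": "B.1.351"},
               {"accession": "seq3", "lineage": "P.1"}]
    print(select_genomes(records, "B.1.1.7"))
```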
NCBI Datasets, the new set of services for downloading genome assembly and annotation data (previous Datasets posts), has redesigned and reorganized its web pages to make it easier to find and access the services and documentation you need.
It’s time for another roundup of what’s been happening on YouTube!
First up, the NCBI YouTube channel has merged with the NLM YouTube channel. You’ll now be able to find diverse content all on one channel, from tips on using resources to fascinating moments in the history of medicine and more!
Interested in human genes involved in COVID-19 biology? NCBI’s RefSeq group has been hard at work compiling a set of human genes with roles in coronavirus infection and disease. You can now see and search for these genes and their regulatory elements in NCBI Gene and RefSeq.
Figure 1. Top section of the human ACE2 record in the Gene database. COVID-19 information can be found in the Summary and Annotation information sections.
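Gene records such as human ACE2 can also be found programmatically through NCBI's E-utilities. This Python sketch only builds an ESearch URL for the Gene database using the `[sym]` (gene symbol) and `[orgn]` (organism) field tags; it does not target the COVID-19 gene set specifically, and the helper name `gene_search_url` is hypothetical.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_search_url(symbol: str, organism: str = "human") -> str:
    """Build an ESearch URL that looks up a gene symbol in a given organism."""
    term = f"{symbol}[sym] AND {organism}[orgn]"
    params = {"db": "gene", "term": term, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Fetching this URL (e.g., with urllib.request) returns JSON whose
    # esearchresult.idlist holds the matching Gene IDs.
    print(gene_search_url("ACE2"))
```

Without an API key, NCBI limits E-utilities clients to three requests per second, so scripts that loop over many genes should pace their requests.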