NLM’s Conserved Domain Database (CDD) now includes 153 new viral protein domain family models for annotating coronaviruses. These include models for the S1 subunit of the coronavirus Spike protein (cd21527), the coronavirus nucleocapsid (N) protein (cd21595), and the coronavirus RNA-dependent RNA polymerase (cd21530).
Each curated domain model consists of a multiple sequence alignment annotated with conserved sequence features, some of which may have been confirmed experimentally, plus links to relevant publications. Where available, the models also include 3D structures, with links to interactive 3D views and to interacting partners.
Check out this tabular summary of SARS-CoV-2 gene products for links to matching conserved domain models and representative 3D protein structures.
Want to view these alignments in 3D space? We’ve updated iCn3D, a web-based 3D structure viewer, with new rendering, annotation, and alignment features. Read more about how you can use iCn3D to view and analyze SARS-CoV-2-related structures.
Don’t forget to review our SARS-CoV-2 resources page to keep up to date on other coronavirus data at NCBI!
The latest version of the Conserved Domain Database contains 2,128 new or updated NCBI-curated domains and now mirrors Pfam version 32 as well as models from NCBIfams, a collection of protein family hidden Markov models (HMMs) for improving bacterial genome annotation. We have also added fine-grained classifications of the cupin and PBP1 superfamilies. You can find this updated content on the CDD FTP site. Read on for detailed release statistics.
Continue reading “Conserved Domain Database (CDD) v. 3.18 is now available”
The 2020 Nucleic Acids Research database issue features papers from NCBI staff on GenBank, ClinVar and more. These papers are also available on PubMed. To read an article, click on the PMID number listed below.
“Database resources of the National Center for Biotechnology Information”
by Eric W Sayers, Jeff Beck, J Rodney Brister, Evan E Bolton, Kathi Canese et al. (PMID: 31602479)
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 38 distinct databases. This article provides a brief overview of the NCBI Entrez system of databases, followed by a summary of resources that were either introduced or significantly updated in the past year, including PubMed, PMC, Bookshelf, BLAST databases and more!
Continue reading “Read about NCBI resources in 2020 Nucleic Acids Research database issue”
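If you would rather retrieve these abstracts in a script than click through to PubMed, NCBI's E-utilities can do it. The sketch below builds an EFetch request URL for a single PubMed ID and downloads the plain-text abstract. The helper names are hypothetical, and you should check the E-utilities documentation for usage policies (such as request-rate limits and API keys) before scripting against the service.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of NCBI's E-utilities (Entrez Programming Utilities).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_pubmed_url(pmid: str, rettype: str = "abstract", retmode: str = "text") -> str:
    """Build an EFetch URL that returns one PubMed record (hypothetical helper)."""
    params = {"db": "pubmed", "id": pmid, "rettype": rettype, "retmode": retmode}
    return EUTILS + "efetch.fcgi?" + urlencode(params)


def fetch_text(url: str) -> str:
    """Download a URL and return its body as text (requires network access)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch_text(efetch_pubmed_url("31602479"))` should return the plain-text abstract of the Sayers et al. article listed above.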
We are now showing the curated evidence used to assign names and, where possible, gene symbols, publications, and Enzyme Commission numbers to nearly 70% (83 million) of microbial RefSeq proteins. This evidence includes a hierarchical collection of curated hidden Markov model (HMM)-based and BLAST-based protein families, as well as conserved domain architectures.
Continue reading “Evidence for naming the protein now on non-redundant refseq records (WP_ accessions)”
The latest version of the Conserved Domain Database contains 3,272 new or updated NCBI-curated domains and now mirrors Pfam version 31 as well as models from NCBIfams, a collection of protein family hidden Markov models (HMMs) for improving bacterial genome annotation. A fine-grained classification of the major facilitator superfamily has also been added. You can find this updated content on the CDD FTP site.
Continue reading “Conserved Domain Database (CDD) 3.17 is now available”
If you’re a protein researcher, you may want to find homologs of a protein of interest based on its sequence. Homologs can offer insight into what the protein does and how it does it, and may include proteins with known three-dimensional structures that can serve as models for the protein of interest. The Conserved Domain Database (CDD) groups proteins that share strong sequence similarity to protein domain models and lets you search these groups with any protein sequence. Such searches are often more sensitive than standard BLAST searches because they use position-specific scoring matrices tuned to the functional sites and sequence motifs that are highly conserved within each domain. You can then use the results to explore the evolutionary relationships among these proteins or to identify those important sequence and structural features.
Here is a method to find protein sequences from many organisms that contain a particular conserved domain:
Continue reading “Using Conserved Domains to Find Protein Homologs”
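One way to script a domain-based search is through E-utilities: look up a domain model by accession in the cdd database with ESearch, then follow Entrez links from that record into the protein database with ELink. This is a minimal sketch, not the method described in the full post. The helper names are hypothetical, and the code assumes that the cdd-to-protein links ELink returns are the set you want; ELink can return several link types, which the sketch keeps separate by link name.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of NCBI's E-utilities (Entrez Programming Utilities).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that returns matching UIDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS + "esearch.fcgi?" + urlencode(params)


def elink_url(dbfrom: str, db: str, uid: str) -> str:
    """Build an ELink URL listing records in `db` linked to one UID in `dbfrom`."""
    params = {"dbfrom": dbfrom, "db": db, "id": uid, "retmode": "json"}
    return EUTILS + "elink.fcgi?" + urlencode(params)


def get_json(url: str) -> dict:
    """Fetch a URL and parse its JSON body (requires network access)."""
    with urlopen(url) as resp:
        return json.load(resp)


def protein_uids_for_domain(accession: str) -> dict:
    """Map each ELink link name to the protein UIDs linked to a CDD model.

    Hypothetical helper: it takes the first CDD UID that ESearch returns
    for the accession (e.g. "cd21527") and follows its links to protein.
    """
    ids = get_json(esearch_url("cdd", accession))["esearchresult"]["idlist"]
    if not ids:
        return {}
    result = get_json(elink_url("cdd", "protein", ids[0]))
    links = {}
    for linkset in result.get("linksets", []):
        for linksetdb in linkset.get("linksetdbs", []):
            links[linksetdb["linkname"]] = linksetdb.get("links", [])
    return links
```

Calling `protein_uids_for_domain("cd21527")` would, under these assumptions, return protein UIDs grouped by link type. You could then pass those UIDs to EFetch to download the sequences. Keeping the link names separate lets you decide which kind of CDD-to-protein association to trust rather than merging them blindly.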