Consistent protein nomenclature is indispensable for communication, literature searching and entry retrieval. NCBI, the European Bioinformatics Institute (EMBL-EBI), the Protein Information Resource (PIR) and the Swiss Institute of Bioinformatics (SIB) revised and reorganized previous guidelines from UniProt and NCBI. This joint effort produced universal guidelines for nomenclature and protein naming to promote clarity in communication and improve consistency in data retrieval across databases.
These guidelines are exclusively focused on nomenclature, providing rules about universal formatting and protein naming choices; they do not include best practices for identifying or predicting function. They cover usage of language, abbreviations, symbols, punctuation, notation, terms and style. Sources of protein names and options for protein naming are also discussed.
During the 2018 INSDC annual meeting, the three collaborating sequence databases (DDBJ, EBI and GenBank) agreed to recommend these guidelines to their submitters. The Protein Naming Guidelines working group plans to write a peer-reviewed publication about protein naming and to track future changes to this document on GitHub.
Do you ever want to see the flanking genes of a protein match from a BLAST search? On June 20th, we’ll show you how to see the genomic context of bacterial proteins using the identical protein report and the graphical sequence viewer. You will also learn how to use these reports in detail and how to get genomic contexts in batch for a set of protein matches using the identical protein report and EDirect.
Date and time: Wed, June 20, 2018 12:00 PM – 12:30 PM EDT
After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
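The batch step described for this webinar (getting genomic locations for a set of protein matches from the identical protein report) can be sketched in Python. This is a minimal sketch, not the method shown in the webinar. It assumes the report is available as tab-delimited text, for example saved from EDirect with something like `efetch -db protein -id <accession> -format ipg`, and that it has columns named `Nucleotide Accession`, `Start`, `Stop` and `Strand`. Those column names, the helper names and the sample rows are all assumptions or placeholders; check them against a real report before use.

```python
import csv
import io


def parse_ipg_report(text):
    """Parse a tab-delimited identical protein report into location records.

    Column names are assumptions about the report layout, not confirmed here.
    Rows without a nucleotide accession or numeric coordinates are skipped.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    locations = []
    for row in reader:
        accession = (row.get("Nucleotide Accession") or "").strip()
        start = (row.get("Start") or "").strip()
        stop = (row.get("Stop") or "").strip()
        if not accession or not start.isdigit() or not stop.isdigit():
            continue
        locations.append(
            {
                "accession": accession,
                "start": int(start),
                "stop": int(stop),
                "strand": (row.get("Strand") or "").strip(),
            }
        )
    return locations


def flanking_window(location, flank=1000):
    """Return (accession, start, stop) widened by `flank` bases on each side.

    The start is clamped to 1 because sequence coordinates are 1-based.
    """
    return (
        location["accession"],
        max(1, location["start"] - flank),
        location["stop"] + flank,
    )


# Placeholder rows for illustration only; these are not real accessions.
SAMPLE_REPORT = (
    "Id\tSource\tNucleotide Accession\tStart\tStop\tStrand\tProtein\n"
    "1\tRefSeq\tNC_EXAMPLE1.1\t5000\t5900\t+\tWP_EXAMPLE1.1\n"
    "1\tINSDC\tEXAMPLE2.1\t300\t1200\t-\tEXAMPLEP.1\n"
)

if __name__ == "__main__":
    for loc in parse_ipg_report(SAMPLE_REPORT):
        print(flanking_window(loc))
```

Each window could then be used to request the surrounding region of the nucleotide record, or to open the same range in the graphical sequence viewer.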
A study (PMID: 28158543) published in the July 2017 issue of Bioinformatics collected, classified and analyzed single nucleotide variants (SNVs) that may affect response to currently approved drugs. The authors identified 2,640 SNVs of interest, most of which occur rarely in populations (minor allele frequency <0.01).
The researchers used protein sequence alignment tools and mined open data from multiple information resources accessed through E-utilities, including PubChem Compound (Kim et al., 2016 PMID: 26400175), NCBI Gene (Maglott et al., 2014 PMID: 25355515), NCBI Protein (Sayers, 2013), MMDB (Madej et al., 2012 PMID: 22135289), PDB (Berman et al., 2000 PMID: 10592235), dbSNP (Sherry et al., 2001 PMID: 11125122), and ClinVar (Landrum et al., 2016 PMID: 26582918).
Questions, comments, and other feedback may be sent to Yanli Wang.
On Wednesday, February 14, 2018, NCBI will present a webinar that will show you how to quickly retrieve sequences in any format from NCBI.
Date & time: Wed, Feb 14, 2018 12:00 PM – 12:30 PM EST
Ever need to quickly grab a protein or nucleotide sequence in FASTA or another format from NCBI? This NCBI Minute will show you how to accomplish this using the nucleotide and protein web pages, an NCBI URL, and (the most flexible way) the command-line EDirect client that accesses the E-utilities API.
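As a rough illustration of the URL route, the sketch below builds an E-utilities `efetch` request for FASTA records and, optionally, downloads it. It is not taken from the webinar. The helper names `efetch_url` and `fetch_text` are hypothetical, and the accessions in the usage note are only examples. The code uses the public `efetch.fcgi` endpoint with its `db`, `id`, `rettype` and `retmode` parameters.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Public E-utilities efetch endpoint.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(db, ids, rettype="fasta", retmode="text"):
    """Build an efetch URL for one or more accessions.

    `db` is an Entrez database name such as "protein" or "nuccore".
    Several accessions are sent as one comma-separated `id` value.
    """
    params = {
        "db": db,
        "id": ",".join(ids),
        "rettype": rettype,
        "retmode": retmode,
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"


def fetch_text(url, timeout=30):
    """Download the URL and return the response body as text (needs network)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Print the request URL; call fetch_text(url) to retrieve the records.
    print(efetch_url("protein", ["NP_000509.1"]))
```

Changing `rettype` (for example to `gb` for a GenBank flat file) requests a different format. For large batches, EDirect or an API key with rate limiting is likely a better fit than a simple loop over URLs.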
BLAST is a powerful search tool, but often a search is just the beginning of the journey. We put ourselves in the shoes of a researcher who has just sequenced a handful of samples from the latest viral outbreak and tried to understand what information would be most useful. We also reached out to researchers in the field and asked: a) what questions do they really want to answer? and b) how can NCBI best provide the answers? Based on insights from those questions and answers, we developed the new Virus Sequence Search Interface (Fig. 1). The Search Interface is an NCBI Labs project, which means it is an experimental project, and we may modify the resource based on your feedback and experiences.
Figure 1. The Virus Sequence Selection Interface. The Virus Sequence Selection Interface accepts as input nucleotide and protein accessions, as well as FASTA and plain-text formatted sequences. The user selects either “Nucleotide” or “Protein,” depending on the sequence type, and selects the virus type from the pull-down menu below the text entry field.
Sequence Viewer 3.23 has several new features, improvements and bug fixes, including performance optimization for alignment renderings and improved tooltips in uploaded VCF files. For a full list of changes, see the Sequence Viewer release notes.
Sequence Viewer is a graphical view of sequences and color-coded annotations on regions of sequences stored in the Nucleotide and Protein databases.
Have you ever searched the NCBI Protein database and been overwhelmed with the number of sequences returned? Have you tried searching with a protein name, thinking that would greatly limit the results, only to still be presented with many sequences (all with the same name)? It’s a common problem in this time of greatly expanding sequence databases powered by large-scale genomic sequencing of similar organisms. Redundancy in the sequence databases is high and only getting worse.
To address this, in 2013 NCBI released the WP records, which collect identical protein sequences annotated on bacterial genomes. In 2014, NCBI released the Identical Protein Reports on Protein records, which display information about all other proteins identical to that protein. Now, we are releasing a new resource: Identical Protein Groups (IPG). IPG offers several features: