A study (PMID: 28158543) published in the July 2017 issue of Bioinformatics collects, classifies, and analyzes single nucleotide variants (SNVs) that may affect response to currently approved drugs. The authors identified 2,640 SNVs of interest, most of which are rare in populations (minor allele frequency <0.01).
The researchers used protein sequence alignment tools and mined open data from multiple information resources accessed through E-utilities, including PubChem Compound (Kim et al., 2016 PMID: 26400175), NCBI Gene (Maglott et al., 2014 PMID: 25355515), NCBI Protein (Sayers, 2013), MMDB (Madej et al., 2012 PMID: 22135289), PDB (Berman et al., 2000 PMID: 10592235), dbSNP (Sherry et al., 2001 PMID: 11125122), and ClinVar (Landrum et al., 2016 PMID: 26582918).
Questions, comments, and other feedback may be sent to Yanli Wang.
On Wednesday, February 14, 2018, NCBI will present a webinar that will show you how to quickly retrieve sequences in any format from NCBI.
Date & time: Wed, Feb 14, 2018 12:00 PM – 12:30 PM EST
Ever need to quickly grab a protein or nucleotide sequence in FASTA or another format from NCBI? This NCBI Minute will show you how to do this using the nucleotide and protein web pages, an NCBI URL, and (the most flexible way) the command-line EDirect client, which accesses the E-utilities API.
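For readers who want to try the URL route before the webinar, here is a minimal Python sketch of what such a request might look like. It is not the webinar's own material. It assumes the public E-utilities EFetch endpoint and its `db`, `id`, `rettype`, and `retmode` parameters; the helper names `build_efetch_url` and `fetch_sequence` are hypothetical and exist only for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Public E-utilities EFetch endpoint (assumed to be the one behind "an NCBI URL").
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="protein", rettype="fasta"):
    """Build an EFetch URL for one accession (hypothetical helper).

    db      -- Entrez database name, e.g. "protein" or "nuccore"
    rettype -- record format, e.g. "fasta"
    """
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH_BASE}?{urlencode(params)}"


def fetch_sequence(accession, db="protein", rettype="fasta"):
    """Download the record as plain text (hypothetical helper; needs network access)."""
    with urlopen(build_efetch_url(accession, db, rettype), timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_sequence("NP_000509.1")` should return that protein record in FASTA format; the accession is only an illustration, and any valid nucleotide or protein accession can be substituted along with the matching `db` value. For more than occasional use, NCBI asks E-utilities clients to identify themselves and limit their request rate, so check the current usage guidelines first.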
BLAST is a powerful search tool, but often a search is just the beginning of the journey. We put ourselves in the shoes of a researcher who has just sequenced a handful of samples from the latest viral outbreak and tried to understand what information would be most useful. We also reached out to researchers in the field and asked: a) what questions do they really want to answer? and b) how can NCBI best provide the answers? Based on insights from those questions and answers, we developed the new Virus Sequence Search Interface (Fig. 1). The Search Interface is an NCBI Labs project, which means it is an experimental project, and we may modify the resource based on your feedback and experiences.
Figure 1. The Virus Sequence Selection Interface. The Virus Sequence Selection Interface accepts as input nucleotide and protein accessions, as well as FASTA and plain-text formatted sequences. The user selects either “Nucleotide” or “Protein,” depending on the sequence type, and selects the virus type from the pull-down menu below the text entry field.
Sequence Viewer 3.23 has several new features, improvements and bug fixes, including performance optimization for alignment renderings and improved tooltips in uploaded VCF files. For a full list of changes, see the Sequence Viewer release notes.
Sequence Viewer is a graphical view of sequences and color-coded annotations on regions of sequences stored in the Nucleotide and Protein databases.
Have you ever searched the NCBI Protein database and been overwhelmed with the number of sequences returned? Have you tried searching with a protein name, thinking that would greatly limit the results, only to still be presented with many sequences (all with the same name)? It’s a common problem in this time of greatly expanding sequence databases powered by large-scale genomic sequencing of similar organisms. Redundancy in the sequence databases is high and only getting worse.
To address this, in 2013 NCBI released the WP records, which collect identical protein sequences annotated on bacterial genomes. In 2014, NCBI released the Identical Protein Reports on Protein records, which display information about all other proteins identical to a given protein. Now, we are releasing a new resource: Identical Protein Groups (IPG). IPG offers several features:
In the July 19, 2013 issue of the journal Science, an interesting article describes the discovery and characterization of two “giant” viruses that are proposed to comprise the first members of the “Pandoravirus” genus.
Nadege Philippe and co-workers obtained the viruses from sediment samples in Chile and Australia and found that they have no morphological resemblance to any previously defined virus families. The investigators isolated the genomes of these viruses and sequenced them using a variety of NextGen methodologies. They then assembled the reads into contigs and characterized them using various sequence similarity algorithms (including NCBI’s BLAST and CD-Search). Interestingly, while related to each other, the genomes were not similar to those of any other organism or virus. Additionally, 93% of protein-coding sequences had no recognizable homologs.
If you’re a protein researcher, one thing you may want to do is to find homologs for a protein of interest on the basis of its sequence. This can provide insights into what the protein does and how it does it, and may identify proteins with known three-dimensional structures that can serve as models for the protein of interest. The Conserved Domains Database (CDD) groups proteins that have strong sequence similarity to protein domain fingerprints and allows you to search these groups with any protein sequence. Such searches are often more sensitive than standard BLAST searches since the scoring matrices used are tuned to locate important functional sites and sequence motifs that are highly conserved within the domain. You can then use the results to explore the evolutionary relationships of these proteins or identify these important sequence and structural features.
Here is a method to find protein sequences from many organisms that contain a particular conserved domain: