SmartBLAST: Faster BLASTp search results in a graphical view


BLAST (Basic Local Alignment Search Tool) is a popular tool for finding sequences in a given database that are similar to a query sequence. Traditionally, BLAST displays these results as a sorted list of matches between the query and each database sequence. While this display is useful for examining how each subject sequence matches the query, it treats all subject sequences the same, regardless of the quality of the sequence data or its annotation, and also does not allow easy comparisons between different subject sequences.

For example, the sorted list does not readily show whether the subject sequences fall into multiple groups of similar sequences, or whether all of the subject sequences are more similar to each other than to the query. A common way to obtain this information is to construct a multiple sequence alignment of the query and some or all of the subject sequences, but until now BLAST has not provided such alignments directly.

Enter SmartBLAST! SmartBLAST is a new and experimental NCBI tool that makes it easier to complete common sequence analysis tasks, such as finding a candidate protein name for a sequence, locating regions of high sequence conservation, or identifying regions covered by database sequences but missing from the query.

To do this, SmartBLAST performs the following tasks in much less time than it takes to run a typical BLASTp search:

  • a BLASTp comparison of the query with the closest matching sequences available;
  • a parallel BLASTp search to find the closest matches to high-quality sequences from model organisms;
  • a multiple alignment between the query and five of the closest matching sequences (usually including two high-quality sequences);
  • an analysis that produces a phylogenetic tree from the multiple sequence alignment.

SmartBLAST then presents the results of the above tasks in a graphical view similar to that of a regular BLASTp search.
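To make the last two steps in the list above more concrete, here is a minimal Python sketch of how a tree can be derived from a multiple sequence alignment. It is an illustration only: this post does not say which distance measure or tree-building method SmartBLAST uses, so the p-distance and UPGMA clustering below are assumptions chosen for simplicity, and the sequences and names (`query`, `hit1`, and so on) are made up.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of mismatches over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0  # no shared columns: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(aligned):
    """Cluster aligned sequences (name -> row) with UPGMA; return Newick topology."""
    clusters = {name: 1 for name in aligned}  # cluster label -> number of leaves
    dist = {
        frozenset(pair): p_distance(aligned[pair[0]], aligned[pair[1]])
        for pair in combinations(aligned, 2)
    }
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        closest = min(
            (frozenset(pair) for pair in combinations(clusters, 2)),
            key=lambda key: dist[key],
        )
        a, b = sorted(closest)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = f"({a},{b})"
        # Distance to the merged cluster is the size-weighted average.
        for other in clusters:
            dist[frozenset((merged, other))] = (
                dist[frozenset((a, other))] * size_a
                + dist[frozenset((b, other))] * size_b
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters)) + ";"


# Hypothetical, already-aligned toy sequences (not real SmartBLAST output).
toy_alignment = {
    "query": "MKTAYIAKQR",
    "hit1": "MKTAYIAKQK",
    "hit2": "MKSAYLAKHR",
    "hit3": "MRSAYLGKHR",
}
print(upgma(toy_alignment))
```

In this toy case the query groups with `hit1`, while `hit2` and `hit3` form a separate group, which is the kind of relationship that a sorted hit list does not show directly.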

Figure 1. Search of XP_006115387.1 with SmartBLAST. From left to right, the output displays the following: a phylogenetic tree of all six sequences with the query sequence highlighted in yellow, the titles for the sequences, and a graphical overview of the multiple alignment. The query is colored yellow, and the matching sequences are either blue (from nr) or green (from the reference database). Deletions in the multiple sequence alignment appear as white gaps, and regions in the original BLASTp pairwise alignments where the query and matching sequences did not align are shown in gray.

Figure 1 presents a sample SmartBLAST search with a predicted protein (XP_006115387.1) as the query. The display includes a phylogenetic tree in which every leaf node is labeled with the common name of the organism if one is available, or with the scientific name otherwise. The tree spans a wide taxonomic range that includes reptiles, mammals, and birds. The display also clearly indicates that the query contains an N-terminal sequence that is not found in the matching sequences, and a C-terminal sequence that BLAST could not align in the top three matches.
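The N-terminal observation can also be read directly off the rows of a multiple alignment. The Python sketch below is a hedged illustration, not SmartBLAST's own logic: the function name `query_only_overhang` and the toy rows are hypothetical, and it assumes that all rows have equal length and that `-` marks a gap. It counts the query residues at each end of the alignment that sit in columns where every matching sequence has a gap.

```python
def query_only_overhang(query_row, subject_rows):
    """Return (n_term, c_term): query residues at each end of the alignment
    that lie in columns where every subject row has a gap."""

    def leading(q, subs):
        count = 0
        for col, residue in enumerate(q):
            # Stop at the first column where any subject has a residue.
            if any(s[col] != "-" for s in subs):
                break
            if residue != "-":
                count += 1
        return count

    n_term = leading(query_row, subject_rows)
    # Reverse every row to scan from the C-terminal end.
    c_term = leading(query_row[::-1], [s[::-1] for s in subject_rows])
    return n_term, c_term


# Hypothetical aligned rows (made up for illustration).
query = "MSTNPKPQRKTAYIAK"
subjects = [
    "------PQRKTAYIAK",
    "-------QRKTAYI--",
]
print(query_only_overhang(query, subjects))  # (6, 0)
```

Here the first six query residues have no counterpart in either subject row, which corresponds to a query-only N-terminal region like the one visible in Figure 1.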

SmartBLAST can be accessed at http://blast.ncbi.nlm.nih.gov/smartblast/. It is still in an experimental phase and will change with little or no notice.

Please give it a try and let us know what you think by commenting on this post!

8 thoughts on “SmartBLAST: Faster BLASTp search results in a graphical view”

  1. Pingback: Introducing PubMed Labs | NCBI Insights

  2. I like it! It was lightning fast, and gives a lot of useful information that one usually wants when dealing with a new sequence. I can see this being my new ‘go to’ protein blast tool to use as a jumping off point for new proteins. I must say that it did over-emphasize some poor hits in the example I tried, I’m not sure why.

  3. Pingback: The NCBI Minute: quick introductions to NCBI resources | NCBI Insights

    • Here is the NLM style citation for SmartBLAST:

      SmartBLAST [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; 2016 – cited 2016 Jul 12. Available from: http://blast.ncbi.nlm.nih.gov/blast/smartblast/

      If you’re talking about SmartBLAST as a resource and need to cite a paper about it, it is mentioned in this year’s NAR Database Issue, in a paper called “Database resources of the National Center for Biotechnology Information.”: https://www.ncbi.nlm.nih.gov/pubmed/26615191

  4. Pingback: SmartBLAST updated to provide more information, database matches | NCBI Insights

  5. Hi, SmartBlast is great so far. Now, I would love to have the smart blast results but using more than one query. Or at least select which landmark database sequences I want to use.
