Now Available! BLAST ClusteredNR database for blastx and PSI-BLAST searches

ClusteredNR, the new protein database that gives a better overview of protein homologs across a wider range of organisms, is now available for blastx (translated nucleotide query) and PSI-BLAST (Position-Specific Iterated BLAST) searches (Figure 1). Simply select ClusteredNR in the database section of the BLAST form. You can even search standard nr at the same time to compare results.

Figure 1. Composite image from the BLAST search forms. The ClusteredNR database is now available for blastx and PSI-BLAST searches in addition to blastp. For all types of searches, you can choose to search both ClusteredNR and standard nr at the same time so you can compare results.

ClusteredNR is especially useful with blastx for finding more distant homologs when searching with queries from over-represented groups. For PSI-BLAST, the greater taxonomic scope of the ClusteredNR database allows you to work more effectively with the default number of target sequences in the first round. The two searches described below highlight these advantages of ClusteredNR.

Example 1. blastx with a rodent mRNA sequence

Standard nr results

Figure 2 shows blastx results from searching the standard nr database with an unannotated Amami-oshima Island spiny rat (Tokudaia osimensis) mRNA sequence (GHEE01458953.1). The protein matches are all neuronal cell adhesion molecule products from other rodents (Figure 2A). With the default of 100 matches, all hits are from other species in the rodent clade Muroidea (rats, mice, gerbils, hamsters, voles, etc.; Figure 2B).

Figure 2. Translated (blastx) results from standard nr using an unannotated spiny rat (Tokudaia osimensis) mRNA (GHEE01458953.1). A. The first several matches of the standard nr results showing matches to rat and mouse species. B. The BLAST taxonomy lineage report showing that all 100 matches are to sequences from the rodent clade Muroidea.

ClusteredNR results

The results against ClusteredNR, however, show much broader taxonomic coverage of vertebrate groups including other placental mammals, marsupials, monotremes, birds, lizards, turtles, and crocodilians (Figure 3).

Figure 3. Translated (blastx) results from ClusteredNR using an unannotated spiny rat (Tokudaia osimensis) mRNA (GHEE01458953.1). A. A portion of the Descriptions section of the ClusteredNR results showing matches to clusters containing many vertebrate groups in addition to rodents and placental mammals. These include crocodilians (Gharial), iguanian lizards (green anole), birds, and monotremes (Australian echidna). B. The ‘Cluster taxonomy’ tab for a 12-member cluster showing that it contains sequences from a crocodilian (Gavialis gangeticus), birds, and turtles.

Example 2. PSI-BLAST with a plant defensin

Standard nr results

The first-round results of a PSI-BLAST search against standard nr with a defensin protein (NP_180171.1) from Arabidopsis thaliana show matches only up to an expect value of 5 × 10⁻⁹ with the default 500 target sequences. This expect value is six orders of magnitude lower than the default inclusion threshold for PSI-BLAST (expect value 0.005). PSI-BLAST works by generating a position-specific score matrix (PSSM) from the information in the BLAST alignments below the threshold. Since the inclusion threshold wasn’t reached, the results from nr are missing a substantial amount of information from more distant matches that may be important to include in the PSSM. Also, a useful feature of web PSI-BLAST is that you can manually select matches above the threshold to add their alignment information to the PSSM if desired. Here, however, the 500-match limit is used up before any match above the threshold appears, so there are none to select. In a case like this, you may want to edit the search to request more than the default 500 matches and run it again so that you can include all relevant proteins and select matches above the threshold.
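The arithmetic behind the "six orders of magnitude" gap, and the split between matches that feed the PSSM and matches you could add by hand, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two constants come from the text above, while the function names and the strict "below the threshold" comparison are hypothetical choices made for this sketch and are not part of BLAST itself.

```python
import math

# Values taken from the text above.
INCLUSION_THRESHOLD = 0.005  # default PSI-BLAST inclusion threshold (expect value)
WORST_NR_EVALUE = 5e-9       # highest expect value reached in the standard nr first round


def orders_of_magnitude_gap(worst_evalue: float, threshold: float) -> float:
    """Return how many powers of ten separate the worst reported hit from the threshold."""
    return math.log10(threshold / worst_evalue)


def split_by_threshold(evalues, threshold=INCLUSION_THRESHOLD):
    """Split expect values into (included, selectable).

    Assumption for this sketch: hits strictly below the threshold feed the PSSM,
    and hits at or above it are the ones a user could select manually.
    """
    included = [e for e in evalues if e < threshold]
    selectable = [e for e in evalues if e >= threshold]
    return included, selectable


gap = orders_of_magnitude_gap(WORST_NR_EVALUE, INCLUSION_THRESHOLD)
print(f"Gap between worst nr hit and inclusion threshold: about {gap:.0f} orders of magnitude")
```

When the selectable list comes back empty, as it would for the standard nr search described here, none of the reported matches lie above the threshold, which is the signal to raise the number of target sequences and rerun.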

ClusteredNR results

In contrast, the results with ClusteredNR (Figure 4) show cluster matches all the way up to the inclusion threshold of 0.005 and therefore provide more complete information for the PSSM. There are also matches above the threshold, allowing you to select and include relevant ones in the next round if desired.

Figure 4. First iteration PSI-BLAST results against ClusteredNR for the plant defensin (NP_180171.1). Unlike the equivalent results from standard nr, the ClusteredNR results reach the inclusion threshold and also have available matches above the threshold, allowing you to include these hits in the subsequent round if desired.

The ClusteredNR protein database on the web BLAST service provides faster searches, greater taxonomic reach, and easier-to-interpret results than the traditional nr database. Making ClusteredNR available for blastx and PSI-BLAST searches offers more ways for you to make discoveries with the new database.

We are always working to improve the ClusteredNR database and BLAST to help you identify protein sequences and find homologs. If you have any questions or suggestions for improvements, please write to blast-help@ncbi.nlm.nih.gov.

The BLAST ClusteredNR database is part of the NIH Comparative Genomics Resource (CGR), an NLM project to establish an ecosystem to facilitate reliable comparative genomics analyses for all eukaryotic organisms.

Join our mailing list to keep up to date with BLAST and other CGR news.
