BLAST is a powerful search tool, but often a search is just the beginning of the journey. We put ourselves in the shoes of a researcher who has just sequenced a handful of samples from the latest viral outbreak and tried to understand what information would be most useful. We also reached out to researchers in the field and asked: a) what questions do they really want to answer? and b) how can NCBI best provide the answers? Based on insights from those questions and answers, we developed the new Virus Sequence Search Interface (Fig. 1). The Search Interface is an NCBI Labs project, which means it is an experimental project, and we may modify the resource based on your feedback and experiences.
Figure 1. The Virus Sequence Selection Interface. The Virus Sequence Selection Interface accepts as input nucleotide and protein accessions, as well as FASTA and plain-text formatted sequences. The user selects either “Nucleotide” or “Protein,” depending on the sequence type, and selects the virus type from the pull-down menu below the text entry field.
This tool provides rapid insight into query sequences by presenting Blastn and Blastp results alongside normalized metadata, when available. The metadata include isolation source, host, country, and collection date, as well as genetic attributes such as completeness and, when applicable, segment or protein names. The normalized metadata is generated by an internal, curator-guided data-processing pipeline that maps sequence-record attributes to standardized vocabularies to provide a user-friendly view of the data.
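To make the idea of mapping record attributes to a standardized vocabulary concrete, here is a minimal sketch in Python. It is not the NCBI pipeline, which is internal and curator-guided; the `HOST_VOCABULARY` table and the `normalize_host` function are hypothetical names invented for this illustration, and the vocabulary entries are only examples.

```python
from typing import Optional

# Hypothetical lookup table: free-text host values as they might appear in
# sequence records, mapped to one standardized term each. A real vocabulary
# would be far larger and maintained by curators.
HOST_VOCABULARY = {
    "human": "Homo sapiens",
    "homo sapiens": "Homo sapiens",
    "h. sapiens": "Homo sapiens",
    "chicken": "Gallus gallus",
    "gallus gallus": "Gallus gallus",
}


def normalize_host(raw: Optional[str]) -> Optional[str]:
    """Return the standardized host term, or None if the value is unmapped."""
    if raw is None:
        return None
    # Lowercase and collapse whitespace so trivial variants share one key.
    key = " ".join(raw.lower().split())
    return HOST_VOCABULARY.get(key)


print(normalize_host("  Human "))
print(normalize_host("unidentified bat"))
```

Returning `None` for unmapped values, rather than guessing, mirrors the "when available" caveat above: a record with an unrecognized attribute would simply show no normalized value.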
The interface currently supports BLAST searches for influenza viruses, rotavirus A, dengue viruses, West Nile virus, Zika virus, ebolaviruses, and MERS coronavirus sequences.
Select the type of BLAST search you want to perform: the “Nucleotide” tab for Blastn or the “Protein” tab for Blastp.
Enter a single query sequence in the search box (the interface currently supports only one query at a time). Accepted formats are accession number, FASTA, and bare sequence. You can also use the example included in the interface.
Select the virus of interest from the dropdown menu.
The results of the search will appear below the search box (Fig. 2).
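If you prepare queries in a script before pasting them into the search box, a quick check of the input format can catch mistakes early. The sketch below is an illustration only: `classify_query` is a hypothetical helper, and the accession pattern is a simplifying assumption (a short letter prefix, an optional underscore, digits, and an optional version suffix), not the rule the interface itself applies.

```python
import re

# Assumed accession shape, e.g. KP178538.1: letters, optional underscore,
# at least five digits, optional ".version".
ACCESSION_RE = re.compile(r"^[A-Za-z]{1,6}_?\d{5,}(\.\d+)?$")
# Letters plus gap and stop symbols, covering nucleotide and protein codes.
SEQUENCE_RE = re.compile(r"^[A-Za-z*\-]+$")


def classify_query(text: str) -> str:
    """Return 'fasta', 'accession', 'bare sequence', or 'unknown'."""
    stripped = text.strip()
    if not stripped:
        return "unknown"
    if stripped.startswith(">"):
        # More than one header line means more than one record, which the
        # interface does not currently accept.
        headers = [ln for ln in stripped.splitlines() if ln.startswith(">")]
        return "fasta" if len(headers) == 1 else "unknown"
    if ACCESSION_RE.match(stripped):
        return "accession"
    # Bare sequences may be wrapped across lines; join before checking.
    joined = "".join(stripped.split())
    if SEQUENCE_RE.match(joined):
        return "bare sequence"
    return "unknown"


print(classify_query("KP178538.1"))
print(classify_query(">query1\nACGTACGT\nACGT"))
print(classify_query("ACGTACGT\nACGT"))
```

The check is deliberately loose about sequence content; whether a sequence is nucleotide or protein is still chosen by the tab you select in the interface.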
Figure 2. Virus Sequence Selection Interface results view. A search performed using the example Ebolavirus accession, KP178538.1 (in italics, below the text entry field), is shown. The default columns for the results table are shown and include: coverage, identity, accession, virus, type, serotype, country, host, and collection date. The displayed columns can be customized by selecting the “Choose Columns” button in the top-left of the table. Additionally, the results can be downloaded by selecting the download button above the top-right of the table. Supported download formats include: FASTA, an accession list, CSV, and XML.
The results table can be customized by adding or removing columns in the “Select Columns” menu (Fig. 3).
Figure 3. The “Select Columns” menu, opened by selecting the “Choose Columns” link in the Virus Sequence Selection Interface results view.
Not only are BLAST results presented alongside normalized metadata, but the results can be refined by filtering along these terms (Fig. 4).
Figure 4. Filtering Results Using Normalized Metadata. The available filters are shown for the same example Ebolavirus search as in Figure 2. The filters may be accessed by selecting the “Filters” tab above the top-left of the results table, and collapsed by selecting “Filters” again. The filters vary depending on the sequence type and virus type submitted, but include terms for isolation host and geography, as well as sequence content. Additionally, sequence counts by collection date are shown at the bottom of the filter field, and the results can be restricted to a date range by “highlighting” the desired range.
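The same kind of filtering can be reproduced offline on a downloaded CSV file. The following Python sketch assumes column headers named `Accession`, `Host`, `Country`, and `Collection_Date`, and dates in `YYYY-MM-DD` form; these names and the sample rows are placeholders for illustration, so check the headers in your own download before reusing it. `filter_rows` is a hypothetical helper, not part of any NCBI tool.

```python
import csv
import io
from datetime import date

# Placeholder data standing in for a downloaded results file.
SAMPLE = """Accession,Host,Country,Collection_Date
EXAMPLE1,Homo sapiens,Sierra Leone,2014-08-01
EXAMPLE2,Homo sapiens,Guinea,2014-03-17
EXAMPLE3,Pan troglodytes,Gabon,1996-02-10
"""


def filter_rows(csv_text, host=None, country=None, start=None, end=None):
    """Keep rows matching the given host, country, and date range."""
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if host and row.get("Host") != host:
            continue
        if country and row.get("Country") != country:
            continue
        if start or end:
            try:
                collected = date.fromisoformat(row.get("Collection_Date") or "")
            except ValueError:
                # Missing or unparseable dates cannot be placed in a range.
                continue
            if start and collected < start:
                continue
            if end and collected > end:
                continue
        kept.append(row)
    return kept


hits = filter_rows(SAMPLE, host="Homo sapiens", start=date(2014, 6, 1))
print([row["Accession"] for row in hits])
```

Rows with missing or partial dates are dropped only when a date range is requested, so host and country filters still work on records that lack a collection date.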
Additionally, to help you rapidly place your sequence of interest in a biological context, the results can be viewed as a phylogenetic tree or as a multiple sequence alignment (Figs. 5, 6, and 7).
Figure 5. How to Generate Multiple Alignments and Phylogenetic Trees. The buttons used to generate multiple sequence alignments and phylogenetic trees are outlined in the image below. They are located next to the filters button (if it is collapsed), above the top-left of the results table. By default, a new tab is opened to display the resulting figures.
Figure 6. An Example Multiple Sequence Alignment. A multiple sequence alignment of the results for the example Ebolavirus sequence in Figure 2 is shown. In the top-right of the interface users can select the “Download” button to save the results in FASTA format. Additionally, next to the download button, the “Tools” button allows users to adjust the coloring scheme used.
Figure 7. An Example Phylogenetic Tree. A phylogenetic tree built from the results for the example Ebolavirus sequence in Figure 2 is shown. The tree can be customized by selecting the tree building method, maximum sequence difference, and sequence labels from the corresponding pull-down menus in the top-left of the interface. Additionally, in the “Tools” drop-down (top-right), users can choose among several tree layouts, sort orders, and download formats.
We invite you to try out this tool and send us your feedback!