The version 5 BLAST (dbV5) protein databases are now accession-based. You can access these databases and the nucleotide BLASTDBs on our FTP site.
As we described in a previous post, this means they now contain the GI-less proteins from the NCBI Pathogen Project and other high-throughput projects. The v5 databases are also compatible with proteins from PDB structures with multi-character chain identifiers and will include these as they become available in our other protein systems. Only the latest version of BLAST+ (2.9.0, download) will work with the updated v5 databases and allow you to access all of the most recent protein and nucleotide data. In the winter of 2019, we will stop updating the version 4 BLAST databases and offer the v5 databases as the default for download.
In addition, makeblastdb will be updated in BLAST 2.10.0, due out in October 2019, so by default it creates dbV5 formatted databases.
As you may know, we have been offering a new BLAST results (Figure 1) as a test page since April. In response to your positive reception and after incorporating many improvements that you suggested, we made the new results the default today, August 1, 2019.
You will still be able to access to the traditional results for a several months. This will provide you additional time if you need it to adjust your workflows or teaching materials to the new display.
As you probably know, BLAST has been offering a new results page as an option for standard BLAST for you to test and provide feedback since April. See our post from earlier this spring for more details. We have just added new results pages (Figure 1) for the following four specialized BLAST services for you to test.
We recently updated the version 5 BLAST protein databases, (dbV5), on our FTP site to be completely accession-based. As we described in a previous post, this means they now contain the gi-less proteins from the NCBI Pathogen Project and other high-throughput projects. The v5 databases are also compatible with proteins from PDB structures with multi-character chain identifiers and will include these as they become available in our other protein systems. Only the latest version of BLAST+ (2.9.0, download) will work with the updated v5 databases and allow you to access all of the most recent protein data. At the end of September 2019, we will stop updating the version 4 BLAST databases and offer the v5 databases as the default for download.
Next Wednesday, May 15, 2019 at 11AM, NCBI staff will show you how to use the latest version of standalone BLAST+ (2.9.0) and the new accession-based DBv5 databases with built-in taxonomy information. You will learn how to limit searches to taxonomic groups and to retrieve sequences from the database by taxonomy without having to download an identifier list. You will also learn about additional improvements in the BLAST databases and programs that make them compatible with the new PDB identifiers and gi-less proteins from the Pathogen Detection Project.
Date and time: Wed, May 15, 2018 11:00 AM – 11:30 AM EDT
After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
Need a refresher of what NCBI offers? Or just feel you aren’t taking full advantage of NCBI resources? Check out some of NCBI’s most recent recordings of NCBI Minute webinars up on the NCBI YouTube channel.
BLAST is a powerful search tool, but often a search is just the beginning of the journey. We put ourselves in the shoes of a researcher who has just sequenced a handful of samples from the latest viral outbreak and tried to understand what information would be most useful. We also reached out to researchers in the field and asked: a) what questions do they really want to answer? and b) how can NCBI best provide the answers? Based on insights from those questions and answers, we developed the new Virus Sequence Search Interface (Fig. 1). The Search Interface is an NCBI Labs project, which means it is an experimental project, and we may modify the resource based on your feedback and experiences.
In the next NCBI Minute on Wednesday, January 10, 2018, NCBI staff will demonstrate the new QuickBLASTP service that can search large databases at least 10X faster than traditional protein-protein BLAST (blastp). You will learn about the strategy QuickBLASTP uses to speed up the search. You will also see how to use the new QuickBLASTP service on the NCBI web BLAST site and how to access and run the standalone kblastp demonstration release.
Date & time: Wed, Jan 10, 2018 12:00 PM – 12:30 PM EST
QuickBLASTP, an accelerated version of BLASTP, adds a new pre-processing step to the non-redundant (nr) protein database. In a matter of seconds, QuickBLASTP will find approximately 97% of the database sequences with 70% or more identity to your query and around 98% of the database sequence with 80% or more identity to your query.
BLAST (Basic Local Alignment Search Tool) is a popular tool for finding sequences in a given database that are similar to a query sequence. Traditionally, BLAST displays these results as a sorted list of matches between the query and each database sequence. While this display is useful for examining how each subject sequence matches the query, it treats all subject sequences the same, regardless of the quality of the sequence data or its annotation, and also does not allow easy comparisons between different subject sequences.
For example, the subject sequences may fall into multiple groups of similar sequences, or all of the subject sequences may be more similar to each other than to the query. A common way to obtain this information is to construct a multiple sequence alignment of the query and some or all of the subject sequences, but to this point, BLAST has not provided such alignments directly.
Enter SmartBLAST! SmartBLAST is a new and experimental NCBI tool that makes it easier to complete common sequence analysis tasks, such as finding a candidate protein name for a sequence, locating regions of high sequence conservation, or identifying regions covered by database sequences but missing from the query.