In modern biomedical research, you often need to analyze very large datasets. This may require computing and storage capacity that exceeds what you have available locally. Working in a cloud environment where you can provision nearly limitless computing power, gain access to enormous data sets, and pay for only what you need is a great option in these cases.
To help with these tasks, NCBI is now providing a Docker version of NCBI BLAST that you can use on the cloud. This implementation will help you work with large volumes of sequence data and the set of NCBI BLAST databases. The BLAST Docker image makes using BLAST on the cloud much more convenient.
- Docker handles installation and maintenance of the BLAST programs and databases.
- Integration with other tools in your pipelines is easier.
- NCBI BLAST databases are pre-loaded on the Google Cloud, providing fast access.
While we have tested the Docker image on the Google Cloud, it will allow BLAST to run equally well on any Docker-enabled platform, such as another cloud platform or your local computer, and you can still use the cloud-installed BLAST databases.
See the BLAST in the Cloud and database information documentation to get started.
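As a rough illustration of what running BLAST from the Docker image can look like, here is a minimal Python sketch that assembles a `docker run` command line. It assumes the image is published as `ncbi/blast`. The host directories, the container mount points under `/blast`, the file names, and the database name are all hypothetical placeholders. The function only builds the argument list and does not start a container.

```python
import os
import shlex


def build_docker_blast_command(query_dir, db_dir, results_dir,
                               query_file, db_name, out_file,
                               program="blastp", image="ncbi/blast"):
    """Return the argument list for one BLAST search inside a container."""
    # Host directories are bind-mounted into the container. The container-side
    # paths under /blast are an assumption of this sketch.
    mounts = [
        (query_dir, "/blast/queries", "ro"),
        (db_dir, "/blast/blastdb", "ro"),
        (results_dir, "/blast/results", "rw"),
    ]
    cmd = ["docker", "run", "--rm"]
    for host_dir, container_dir, mode in mounts:
        # Docker bind mounts need absolute host paths.
        cmd += ["-v", f"{os.path.abspath(host_dir)}:{container_dir}:{mode}"]
    cmd += [
        image, program,
        "-query", f"/blast/queries/{query_file}",
        "-db", f"/blast/blastdb/{db_name}",
        "-out", f"/blast/results/{out_file}",
    ]
    return cmd


# Hypothetical paths and names, shown only to illustrate the call.
example = build_docker_blast_command(
    "/data/queries", "/data/blastdb", "/data/results",
    query_file="query.fsa", db_name="nr", out_file="blastp.out",
)
print(shlex.join(example))
```

To actually run the search, the returned list could be passed to `subprocess.run(example, check=True)` on a machine where Docker is installed and the database files are present in the mounted directory.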
As you probably know, since April BLAST has offered a new results page as an option for standard BLAST so that you can test it and provide feedback. See our post from earlier this spring for more details. We have just added new results pages (Figure 1) for the following four specialized BLAST services for you to test.
- Align two or more sequences
We recently updated the version 5 BLAST protein databases (dbV5) on our FTP site to be completely accession-based. As we described in a previous post, this means they now contain the gi-less proteins from the NCBI Pathogen Project and other high-throughput projects. The v5 databases are also compatible with proteins from PDB structures with multi-character chain identifiers and will include these as they become available in our other protein systems. Only the latest version of BLAST+ (2.9.0, download) will work with the updated v5 databases and allow you to access all of the most recent protein data. At the end of September 2019, we will stop updating the version 4 BLAST databases and offer the v5 databases as the default for download.
For more information on the new database version and BLAST+ (2.9.0), see the previous NCBI Insights article and the recording of our recent webinar.
We have been offering the new BLAST results page (Figure 1) for you to try out since April and have been collecting your comments and feedback. Thank you all for your input on this new results display. Over 90% of your comments have been positive. We have made several changes to the page to address the problems you pointed out, and we are working on adding several features you suggested in future releases.
At this time, 96% of you who have tried the new page have kept it as your default results page. We are planning to make the new page the default for everyone on August 1, 2019. We will still provide access to the old results for some time to allow people who have workflows or teaching materials to adjust to the new display.
Figure 1. The New BLAST Results with filters directly on the page and a more concise tabbed output that includes the taxonomy report.
Please view our video introduction to the new results to see highlights of the improved display. As always, we will continue to incorporate your feedback into the design and features on the new page, so please test it out and let us know what you think.
IgBLAST is a popular NCBI package for classifying and analyzing immunoglobulin (IG) and T cell receptor (TCR) variable domain sequences. We’ve released a new version (1.14.0) of IgBLAST with three improvements and bug fixes:
- The Adaptive Immune Receptor Repertoire (AIRR) format output is more consistent with the AIRR specs: undefined values (NON, N/A) are now empty strings, “reversed” is no longer appended to the sequence identifier when the query is in reversed orientation, and standard locus names such as IGH and TRB replace the traditional VH, VB, etc.
- The logic for showing CDR3 end of TCR sequences is improved.
- The sequence identifier is restored in the case of no results in AIRR rearrangement format.
IgBLAST 1.14.0 is available for download from the BLAST FTP area. See the new manual on GitHub for information about setting up and running IgBLAST.
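For pipelines that still hold output from older releases, the AIRR conventions listed above can be sketched in a few lines of Python. This is an illustration only, not IgBLAST's own code. It assumes tab-separated rearrangement records with a `locus` column. Only the VH to IGH and VB to TRB pairs come from the text above; the other entries in the mapping are assumed by analogy.

```python
import csv
import io

# Traditional chain labels mapped to standard locus names. Only VH and VB
# are named in the release notes; the remaining pairs are assumptions.
LOCUS_NAMES = {
    "VH": "IGH", "VK": "IGK", "VL": "IGL",
    "VA": "TRA", "VB": "TRB", "VD": "TRD", "VG": "TRG",
}

# Placeholder strings that the newer output replaces with an empty string.
UNDEFINED = {"NON", "N/A"}


def normalize_row(row):
    """Return a copy of one record with undefined values and locus cleaned."""
    cleaned = {key: ("" if value in UNDEFINED else value)
               for key, value in row.items()}
    locus = cleaned.get("locus", "")
    cleaned["locus"] = LOCUS_NAMES.get(locus, locus)
    return cleaned


def normalize_airr_tsv(text):
    """Parse tab-separated rearrangement text and normalize every record."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [normalize_row(row) for row in reader]


# Made-up records, used only to show the transformation.
sample = (
    "sequence_id\tlocus\tproductive\n"
    "seq1\tVH\tN/A\n"
    "seq2\tTRB\tT\n"
)
for record in normalize_airr_tsv(sample):
    print(record)
```

Records that already use standard locus names pass through unchanged, so the same function can be applied to mixed old and new output.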
Next Wednesday, May 15, 2019, at 11 AM, NCBI staff will show you how to use the latest version of standalone BLAST+ (2.9.0) and the new accession-based DBv5 databases with built-in taxonomy information. You will learn how to limit searches to taxonomic groups and to retrieve sequences from the database by taxonomy without having to download an identifier list. You will also learn about additional improvements in the BLAST databases and programs that make them compatible with the new PDB identifiers and gi-less proteins from the Pathogen Detection Project.
Date and time: Wed, May 15, 2019 11:00 AM – 11:30 AM EDT
After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
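To give a flavor of the taxonomy-limited searches covered in the webinar, here is a minimal Python sketch that builds a BLAST+ command line using the `-taxids` option of the search programs, which takes a comma-delimited list of NCBI taxonomy IDs. The query file, database name, and output file are hypothetical placeholders, and the sketch assumes a v5 database is already installed locally. The function only assembles the arguments and does not run BLAST.

```python
def build_taxid_search(program, query, db, taxids, out):
    """Return a BLAST+ argument list restricted to the given taxonomy IDs."""
    if not taxids:
        raise ValueError("at least one taxonomy ID is required")
    # -taxids expects the IDs as one comma-delimited string.
    taxid_arg = ",".join(str(int(taxid)) for taxid in taxids)
    return [
        program,
        "-query", query,
        "-db", db,
        "-taxids", taxid_arg,
        "-out", out,
    ]


# 9606 is human and 10090 is mouse; file and database names are made up.
command = build_taxid_search(
    "blastp", "query.fsa", "nr", [9606, 10090], "limited.out"
)
print(" ".join(command))
```

Because the taxonomy information is stored in the v5 database itself, no separate identifier list has to be downloaded before running the resulting command.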
We have made some recent improvements to the BLAST+ applications that take full advantage of the version 5 BLAST databases (BLASTDBv5), which include built-in taxonomic information for sequences and no longer rely on the integer sequence identifiers (gi numbers).
With the latest version of BLAST, you can now:
NCBI Labs is showcasing an experiment to improve the BLAST results page. The goal is to provide a more useful BLAST output that better meets your needs and integrates with your workflows. The new results incorporate feedback from surveys and interviews with BLAST users. We think you’ll find the new results are more compact, easier to navigate, and expose useful formatting and other features that you may not have known about.
The results page has organism, percent identity, and E value filters in plain view and easily accessible. The Descriptions, Graphic Summary, and Alignments are on separate tabs, and the popular taxonomy view is on a fourth tab rather than on a separate web page. These changes, along with other enhancements, make the display more concise and easier to navigate. The figure below shows the new output format.
Figure 1. The New BLAST Results with filters directly on the page and a more concise tabbed output that includes the taxonomy report. The Back to Traditional Results Page link re-loads the results in the standard format.
IgBLAST is a popular NCBI package for classifying and analyzing immunoglobulin and T cell receptor variable domain sequences. We’ve released a new version of IgBLAST with three improvements:
- The new release determines the V gene reading frame from the end of FWR3 region instead of end of V gene. This helps identify the correct reading frames for rearrangements that have insertions or deletions near the V gene end.
- The allowed distance between V gene end and J gene start has been increased to 225 bp to allow detection of ultra long D/N regions.
- The standalone program and files have been repackaged to make them easier to install.
The new release is available from the BLAST FTP area, along with a new manual on GitHub.