ElasticBLAST is a new way to BLAST large numbers of queries, faster and on the cloud. Here are the top three reasons you should use ElasticBLAST:
1. ElasticBLAST can handle much LARGER queries!
ElasticBLAST can search query sets that have hundreds to millions of sequences and against BLAST databases of all sizes.
2. ElasticBLAST is FASTER
ElasticBLAST distributes your searches across multiple cloud instances to process them simultaneously. The ability to scale resources in this way allows you to process large numbers of queries in a shorter time than you could with BLAST+.
3. ElasticBLAST is EASY to run on the cloud
ElasticBLAST is easy to set up using our step-by-step instructions (Amazon Web Services (AWS), Google Cloud Platform (GCP)) and allows you to leverage the power of the cloud. Once configured, it manages the software and database installation, handles partitioning of the BLAST workload among the various instances, and deallocates cloud resources when the searches are done.
ElasticBLAST also selects the instance (i.e., machine) type for you based on database size. Of course, you can also choose the instance type manually if you prefer.
ElasticBLAST is a new tool that helps you run BLAST searches on the cloud. ElasticBLAST is perfect for you if you have thousands to millions of queries to our Basic Local Alignment Search Tool (BLAST ®), or if you want to use cloud infrastructure for your searches. ElasticBLAST can handle large searches that are not appropriate for NCBI web BLAST, and it runs them more quickly than stand-alone BLAST+.
ElasticBLAST works on two of the current NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) partners: Amazon Web Services (AWS) and Google Cloud Platform (GCP). ElasticBLAST works by distributing your searches across multiple cloud instances to process them in parallel. The ability to scale resources in this way allows you to process large numbers of queries in a shorter time than you could with BLAST+. ElasticBLAST can handle millions of queries, and it also supports most BLAST+ options and programs.
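As a rough illustration of what splitting a query set across instances can look like, here is a minimal Python sketch that groups FASTA queries into batches capped by total sequence letters, so that each batch could be handed to a separate instance. This is not ElasticBLAST's actual implementation; the function names and the `max_letters` threshold are assumptions made for this example.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def make_batches(records: Iterable[Record], max_letters: int) -> List[List[Record]]:
    """Group records into batches of at most max_letters letters.

    A single sequence longer than max_letters still gets a batch of its own.
    max_letters is a hypothetical tuning knob for this sketch.
    """
    batches, current, size = [], [], 0
    for header, seq in records:
        # Start a new batch when adding this sequence would exceed the cap.
        if current and size + len(seq) > max_letters:
            batches.append(current)
            current, size = [], 0
        current.append((header, seq))
        size += len(seq)
    if current:
        batches.append(current)
    return batches


# Toy query set; real query sets would hold thousands to millions of sequences.
fasta = """>q1
ACGTACGT
>q2
ACGT
>q3
ACGTACGTACGT
>q4
AC
""".splitlines()

batches = make_batches(read_fasta(fasta), max_letters=12)
batch_ids = [[header for header, _ in batch] for batch in batches]
print(batch_ids)
```

Capping batches by letters rather than by sequence count keeps the work per batch more even when query lengths vary widely.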
Making it easier to run BLAST on the cloud
ElasticBLAST reduces the barrier to using the cloud by creating and managing cloud resources for you. It manages the software and database installation, handles partitioning of the BLAST workload among the various instances and deallocates cloud resources when the searches are done. For example, ElasticBLAST will select the best cloud instance type for your search based on the database metadata that provides database size and memory needs (Figure 1). You can also manually select the instance type if you prefer.
Fig. 1: JSON metadata for the 16S_ribosomal_RNA database. The “bytes-to-cache” information helps ElasticBLAST pick out an instance with the appropriate capacity.
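To make the idea concrete, here is a minimal Python sketch of choosing an instance from the "bytes-to-cache" value in the database metadata. It assumes a made-up instance catalog and a made-up 20% memory headroom; neither reflects real AWS or GCP instance types or the rules ElasticBLAST actually applies, and the metadata value below is invented for the example.

```python
import json

# Hypothetical catalog of (instance name, memory in GiB).
# These are NOT real cloud instance types.
INSTANCE_CATALOG = [("small", 8), ("medium", 32), ("large", 128), ("xlarge", 512)]

GIB = 1024 ** 3


def pick_instance(metadata, catalog=INSTANCE_CATALOG, headroom=1.2):
    """Return the smallest catalog entry whose memory covers bytes-to-cache.

    headroom is an assumed safety factor so the database does not fill
    all of the instance's memory.
    """
    needed = metadata["bytes-to-cache"] * headroom
    for name, mem_gib in sorted(catalog, key=lambda entry: entry[1]):
        if mem_gib * GIB >= needed:
            return name
    raise ValueError("no instance in the catalog has enough memory")


# Invented metadata fragment; only the "bytes-to-cache" key comes from the text.
metadata = json.loads('{"bytes-to-cache": 30000000000}')
choice = pick_instance(metadata)
print(choice)
```

With these assumed numbers, a database that needs about 30 GB cached does not fit the 32 GiB entry once headroom is added, so the sketch moves up to the next size.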
Selecting Databases
ElasticBLAST can access the 28 NCBI databases available on AWS and GCP. These are the same databases that are available from the NCBI FTP site. For instance, the databases available on the two cloud providers include the RefSeq Eukaryotic Representative Genomes database, the 16S database based on Targeted Loci, and the human and mouse genome databases.
You can also provide your own databases, and you can produce the metadata needed to select an instance through a Python script that comes with ElasticBLAST.
Example Runs
ElasticBLAST can perform a variety of searches with query sets that range from hundreds to millions of sequences and BLAST databases of all sizes. Table 1 shows ElasticBLAST searches with query sets that range up to billions of letters using a variety of BLAST databases.
Table 1: Sample ElasticBLAST searches. This table demonstrates the breadth of searches supported by ElasticBLAST. Additionally, the first row demonstrates the ability of ElasticBLAST to use many CPUs (3200) on a cloud provider at once to complete a task in hours that would have taken days on a single machine.
Costs
Because ElasticBLAST runs on cloud providers, using it will incur some cost. Based on current cost structures on AWS and GCP, in most cases these costs are quite small. For example, a protein search with a query of about 20 million residues against a database of about 20 billion residues can cost less than $5. Even a larger search with a query of 3-4 billion DNA bases can cost only around $50. Both cloud services include the option to bid on instances for less than full price, which can result in significant savings. ElasticBLAST can be configured to request such instances. Your costs will obviously vary based on many factors, and we encourage you to explore these options with the individual cloud providers. Also, both AWS and GCP offer a free tier or time-limited trial of their cloud services, and you can find information about using ElasticBLAST with the free tiers here.
Your feedback is crucial to the development and support of ElasticBLAST. If you have any questions or suggestions, please reach out to us at blast-help@ncbi.nlm.nih.gov. We’d love to hear from you.
ElasticBLAST is a cloud-native package developed by the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM) with support from the NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative.
BLAST+ 2.12.0 programs feature better multithreaded searches and support a different threading model, threading by query, that can be more efficient in some situations. The new release is also fully compatible with the increase in the numeric range for the GI identifier, which will take effect in the nucleotide database later this year. The list below shows details of the new features and bug fixes. You can download the new BLAST release from the FTP site.
It’s time we do another roundup of what’s been happening on YouTube!
First up, the NCBI YouTube channel has merged with the NLM YouTube channel. You’ll now be able to find diverse content all on one channel, from tips on using resources to fascinating moments in the history of medicine and more!
Join us on December 9, 2020 to learn about containerized BLAST+ in Docker that is ready to use locally and in the cloud. We are staging BLAST databases with some cloud providers, which makes it even easier to run containerized BLAST as part of a pipeline in the cloud. In this webinar you will learn about the advantages of containerized BLAST and how to use it in some practical examples. You will also learn about Elastic BLAST, a cloud application that is useful for aligning extremely large numbers of sequences against BLAST databases.
Date and time: Wed, December 9, 2020 12:00 PM – 12:45 PM EST
After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
The BLAST+ 2.11.0 release is now available from our FTP site. With this release, BLAST+ now provides usage reports to NCBI to help us improve BLAST. This information is limited to the name of the BLAST program, some basic database metadata, a few BLAST parameters, as well as the number and total size of your queries (Figure 1).
Figure 1. An example of the report sent back to NCBI from the 2.11.0 BLAST programs.
We’ve made some recent enhancements to the BLAST+ applications that allow you to:
Limit your search by taxonomy using information built into the BLAST databases
Search sequences by accession faster
Use blastdbcmd to retrieve sequences by taxonomy from a BLAST database
The new version of the BLAST databases (version 5, release notes) supports the items listed above. You can access the new executables on FTP. Sample version 5 databases are also available.
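As an illustration, the Python sketch below assembles command lines for a taxonomy-limited search and for retrieving sequences by taxonomy. It assumes the `-taxids` option of `blastn` and `blastdbcmd` as provided for version 5 databases; please confirm the option names against each program's `-help` output for your release. The database name `my_db_v5`, the file names, and the helper functions are hypothetical. The sketch only builds and prints the commands; it does not run BLAST.

```python
import shlex
from typing import List, Sequence


def blastn_taxid_command(db: str, query: str, taxids: Sequence[int], out: str) -> List[str]:
    """Build a blastn command limited to the given NCBI taxonomy IDs."""
    return [
        "blastn",
        "-db", db,
        "-query", query,
        "-taxids", ",".join(str(t) for t in taxids),  # comma-separated taxids
        "-out", out,
    ]


def blastdbcmd_taxid_command(db: str, taxids: Sequence[int]) -> List[str]:
    """Build a blastdbcmd command that retrieves sequences for the given taxids."""
    return [
        "blastdbcmd",
        "-db", db,
        "-taxids", ",".join(str(t) for t in taxids),
    ]


# 9606 is the NCBI taxonomy ID for human; all names below are placeholders.
search_cmd = blastn_taxid_command("my_db_v5", "queries.fsa", [9606], "results.txt")
fetch_cmd = blastdbcmd_taxid_command("my_db_v5", [9606])

print(shlex.join(search_cmd))
print(shlex.join(fetch_cmd))
```

To actually run a command, you could pass the list to `subprocess.run(search_cmd, check=True)` on a machine where the BLAST+ executables and a version 5 database are installed.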
Note: This is an alpha release to allow users to test and comment on new features.