Tag: BLAST+

Save the Date: NCBI at the Bioinformatics Open Science Conference (BOSC), July 2022

Come visit NCBI at the Bioinformatics Open Science Conference (BOSC), part of the Intelligent Systems for Molecular Biology Conference (ISMB), July 13-16, taking place both in person in Madison, Wisconsin and virtually! We’ll be presenting talks and posters on the latest updates to the NCBI Datasets, BLAST, and Protein resources. You can also join us at the Birds of a Feather (BoF) discussion and the BOSC CollaborationFest (CoFest) to explore these resources and discuss workflows with NCBI staff.

Introducing ElasticBLAST – BLAST® is now easier, bigger, and faster on the Cloud!

ElasticBLAST is a new tool that helps you run BLAST searches on the cloud. ElasticBLAST is perfect for you if you have thousands to millions of queries to run with our Basic Local Alignment Search Tool (BLAST®), or if you want to use cloud infrastructure for your searches. ElasticBLAST can handle large searches that are not appropriate for NCBI web BLAST, and it runs them more quickly than stand-alone BLAST+.

ElasticBLAST works on two of the current NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) partners: Amazon Web Services (AWS) and Google Cloud Platform (GCP). It distributes your searches across multiple cloud instances and processes them in parallel. The ability to scale resources in this way allows you to process large numbers of queries in a shorter time than you could with BLAST+. ElasticBLAST can handle millions of queries, and it also supports most BLAST+ options and programs.
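To make the idea of distributing a search concrete, here is a minimal Python sketch that groups query sequences into batches that could each be handed to a separate worker. It is an illustration only: the function name batch_queries and the max_letters limit are hypothetical, and the post does not describe how ElasticBLAST itself partitions queries.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (FASTA header, sequence)


def batch_queries(records: Iterable[Record], max_letters: int) -> Iterator[List[Record]]:
    """Group query records into batches of at most max_letters residues or bases.

    Hypothetical helper for illustration; not ElasticBLAST's actual logic.
    A single record longer than max_letters ends up in a batch of its own.
    """
    batch: List[Record] = []
    letters = 0
    for header, seq in records:
        # Close the current batch if adding this record would exceed the limit.
        if batch and letters + len(seq) > max_letters:
            yield batch
            batch, letters = [], 0
        batch.append((header, seq))
        letters += len(seq)
    if batch:
        yield batch


if __name__ == "__main__":
    queries = [("q1", "ACGT" * 2), ("q2", "ACGT"), ("q3", "ACGT" * 5)]
    for i, group in enumerate(batch_queries(queries, max_letters=12)):
        print(i, [header for header, _ in group])
```

Each batch is independent of the others, which is what lets a search of this kind be spread over many instances and run in parallel.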

Making it easier to run BLAST on the cloud

ElasticBLAST reduces the barrier to using the cloud by creating and managing cloud resources for you. It manages the software and database installation, partitions the BLAST workload among the instances, and deallocates cloud resources when the searches are done. For example, ElasticBLAST will select the best cloud instance type for your search based on the database metadata, which provides the database size and memory needs (Figure 1). You can also manually select the instance type if you prefer.

Fig. 1: JSON metadata for the 16S_ribosomal_RNA database. The “bytes-to-cache” information helps ElasticBLAST pick out an instance with the appropriate capacity.
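The selection step can be sketched in a few lines of Python. This is not ElasticBLAST's implementation: the instance catalog, its memory sizes, and the headroom factor are all hypothetical values chosen for illustration. Only the “bytes-to-cache” key comes from the metadata described above.

```python
from typing import Dict, Optional

GIB = 1024 ** 3

# Hypothetical instance catalog: name -> memory in bytes (illustrative values only).
INSTANCE_MEMORY: Dict[str, int] = {
    "small": 8 * GIB,
    "medium": 32 * GIB,
    "large": 128 * GIB,
}


def pick_instance(metadata: dict, headroom: float = 1.25) -> Optional[str]:
    """Return the smallest catalog instance whose memory covers the database.

    The headroom factor (an assumption) leaves room for BLAST itself
    on top of the cached database. Returns None if nothing is large enough.
    """
    needed = metadata["bytes-to-cache"] * headroom
    candidates = [(mem, name) for name, mem in INSTANCE_MEMORY.items() if mem >= needed]
    return min(candidates)[1] if candidates else None


if __name__ == "__main__":
    print(pick_instance({"bytes-to-cache": 10 * GIB}))
```

Choosing the smallest instance that fits keeps cost down while still letting the whole database sit in memory.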

Selecting Databases

ElasticBLAST can access the 28 NCBI databases available on AWS and GCP. These are the same databases that are available from the NCBI FTP site. For instance, the databases available on the two cloud providers include the RefSeq Eukaryotic Representative Genomes database, the 16S database based on Targeted Loci, and the human and mouse genome databases.

You can also provide your own databases, and you can produce the metadata needed to select an instance through a Python script that comes with ElasticBLAST.

Example Runs

ElasticBLAST can perform a variety of searches with query sets that range from hundreds to millions of sequences and BLAST databases of all sizes.  Table 1 shows ElasticBLAST searches with query sets that range up to billions of letters using a variety of BLAST databases.

Table 1: Sample ElasticBLAST searches.  This table demonstrates the breadth of searches supported by ElasticBLAST.  Additionally, the first row demonstrates the ability of ElasticBLAST to use many CPUs (3200) on a cloud provider at once to complete a task in hours that would have taken days on a single machine.

Costs

Because ElasticBLAST runs on cloud providers, using it will incur some cost. Based on current cost structures on AWS and GCP, in most cases these costs are quite small. For example, a protein search with a query of about 20 million residues against a database of about 20 billion residues can cost less than $5. Even a larger search with a query of 3-4 billion DNA bases can cost only around $50. Both cloud services include the option to bid on instances for less than full price, which can result in significant savings. ElasticBLAST can be configured to request such instances. Your costs will vary based on many factors, and we encourage you to explore these options with the individual cloud providers. Also, both AWS and GCP offer a free tier or time-limited trial of their cloud services, and you can find information about using ElasticBLAST with the free tiers here.

Welcome to ElasticBLAST!

Go ahead and run your first ElasticBLAST search! We are sure you’ll love how ElasticBLAST accelerates your research.

Your feedback is crucial to the development and support of ElasticBLAST. If you have any questions or suggestions, please reach out to us at blast-help@ncbi.nlm.nih.gov. We’d love to hear from you.

ElasticBLAST is a cloud-native package developed by the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM) with support from the NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative.

BLAST+ 2.12.0 now available with more efficient multithreaded searches

BLAST+ 2.12.0 programs feature better multithreaded searches and support a different threading model, threading by query, that can be more efficient in some situations. The new release is also fully compatible with the increase in the numeric range for the GI identifier, which will take effect in the nucleotide database later this year. The list below shows details of the new features and bug fixes. You can download the new BLAST release from the FTP site.
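As a rough sketch of how threading by query might be requested, the Python helper below assembles a BLAST+ command line. It assumes the -mt_mode option (with 1 selecting threading by query) alongside -num_threads; check the help output of your installed program to confirm which options it supports. The helper name and the file and database names are placeholders.

```python
import shlex
from typing import List


def blast_command(program: str, query: str, db: str, out: str,
                  threads: int, thread_by_query: bool) -> List[str]:
    """Build a BLAST+ argument list (hypothetical helper, for illustration)."""
    cmd = [program, "-query", query, "-db", db, "-out", out,
           "-num_threads", str(threads)]
    if thread_by_query:
        # Assumption: -mt_mode 1 selects threading by query; the default
        # mode splits the work by database instead.
        cmd += ["-mt_mode", "1"]
    return cmd


if __name__ == "__main__":
    cmd = blast_command("blastp", "queries.fa", "my_db", "results.txt",
                        threads=8, thread_by_query=True)
    # Print the command for inspection; pass the list to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Threading by query tends to pay off when there are many queries and a relatively small database, which is one of the situations the release note alludes to.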

NCBI on YouTube: RAPT and BLAST+ on the Cloud, SARS-CoV-2 genome data in Datasets

It’s time we do another roundup of what’s been happening on YouTube!

First up, the NCBI YouTube channel has merged with the NLM YouTube channel. You’ll now be able to find diverse content all on one channel, from tips on using resources to fascinating moments in the history of medicine and more!

December 9 Webinar: Using BLAST+ in Docker and on the cloud

Join us on December 9, 2020 to learn about containerized BLAST+ in Docker, which is ready to use locally and in the cloud. We are staging BLAST databases with some cloud providers, which makes it even easier to run containerized BLAST as part of a pipeline in the cloud. In this webinar you will learn about the advantages of containerized BLAST and see how to use it in some practical examples. You will also learn about Elastic BLAST, a cloud application that is useful for aligning extremely large numbers of sequences against BLAST databases.

  • Date and time: Wed, December 9, 2020 12:00 PM – 12:45 PM EST
  • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

BLAST+ 2.11.0 now available with limited usage reporting to help improve BLAST

The BLAST+ 2.11.0 release is now available from our FTP site. With this release, BLAST+ now provides usage reports to NCBI to help us improve BLAST. This information is limited to the name of the BLAST program, some basic database metadata, a few BLAST parameters, as well as the number and total size of your queries (Figure 1).

Figure 1. An example of the report sent back to NCBI from the 2.11.0 BLAST programs.

BLAST+ database improved

We’ve made some recent enhancements to the BLAST+ applications that allow you to:

  1. Limit your search by taxonomy using information built into the BLAST databases
  2. Search sequences by accession faster
  3. Use blastdbcmd to retrieve sequences by taxonomy from a BLAST database

The new version of the BLAST databases (version 5, release notes) supports the items listed above. You can access the new executables on FTP. Sample version 5 databases are also available.
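The taxonomy features listed above might be used along the following lines. This Python sketch only builds the command lines. It assumes the -taxids option on both the search programs and blastdbcmd, so confirm the option names against the release notes for your version. The helper names, the database name, and the query file are placeholders.

```python
import shlex
from typing import List, Sequence


def taxid_arg(taxids: Sequence[int]) -> str:
    """Join taxonomy IDs into a comma-separated string."""
    return ",".join(str(t) for t in taxids)


def search_limited_by_taxonomy(program: str, db: str, query: str,
                               taxids: Sequence[int]) -> List[str]:
    """Build a search command restricted to the given taxonomy IDs."""
    # Assumption: the search programs accept -taxids with version 5 databases.
    return [program, "-db", db, "-query", query, "-taxids", taxid_arg(taxids)]


def retrieve_by_taxonomy(db: str, taxids: Sequence[int]) -> List[str]:
    """Build a blastdbcmd command that retrieves sequences by taxonomy ID."""
    # Assumption: blastdbcmd accepts -taxids with version 5 databases.
    return ["blastdbcmd", "-db", db, "-taxids", taxid_arg(taxids)]


if __name__ == "__main__":
    # 9606 is the NCBI taxonomy ID for human.
    print(shlex.join(search_limited_by_taxonomy("blastn", "my_v5_db", "queries.fa", [9606])))
    print(shlex.join(retrieve_by_taxonomy("my_v5_db", [9606])))
```

Because the taxonomy information is built into the database, neither command needs a separate list of sequence identifiers to apply the limit.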

Note: This is an alpha release to allow users to test and comment on new features.

Problems/Feedback

Please send problem reports and feedback to blast-help@ncbi.nlm.nih.gov or write to the Help Desk.