Adjust your scripts: new arrangement and naming for BLAST databases on the FTP site!

As we announced, the new default database version for BLAST+ is dbV5.  To complete the transition to the new version, we will modify the directory structure and naming conventions on the BLAST FTP database directory.  We expect to make this change around February 4th, 2020.

Here is a list of what we will change:

  1. All databases at the base of the blastdb directory (/blast/db/) will be the dbV5 versions.
  2. The version 5 databases will no longer have “_v5” as part of the archive or database names.
  3. We will move the dbV4 databases to a v4 subdirectory (/blast/db/v4/).
  4. The now legacy dbV4 database archives will have “_v4” in their names (e.g., nr_v4.00.tar.gz); we will not rename the files within the archive.
  5. We will no longer update the dbV4 databases.
  6. We will freeze the cloud directory (/blast/db/cloud/) with no new entries after January 13, 2020.
  7. We will provide only nr, nt, swissprot, and pdbaa files in the FASTA directory (/blast/db/FASTA/).

Please adjust your scripts or procedures to accommodate the changes!
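
As a starting point for updating a download script, here is a minimal Python sketch of the renaming rules in items 1 through 4. The function name new_archive_path is hypothetical, and the old dbV5 archive name used in the example (nr_v5.00.tar.gz) is an assumption made by analogy with the nr_v4.00.tar.gz example above. Check the actual FTP listing before relying on it.

```python
import posixpath

# Directory locations taken from the announcement above.
BASE_DIR = "/blast/db"
V4_DIR = "/blast/db/v4"


def new_archive_path(old_name: str) -> str:
    """Map a pre-change archive file name to its expected post-change FTP path.

    Hypothetical helper: it assumes the database name is everything before
    the first "." in the archive file name.
    """
    db, sep, rest = old_name.partition(".")
    if db.endswith("_v5"):
        # dbV5 archives drop the "_v5" suffix and stay at the base directory.
        return posixpath.join(BASE_DIR, db[: -len("_v5")] + sep + rest)
    # dbV4 archives gain a "_v4" suffix and move to the v4 subdirectory.
    return posixpath.join(V4_DIR, db + "_v4" + sep + rest)


if __name__ == "__main__":
    for name in ("nr_v5.00.tar.gz", "nr.00.tar.gz"):
        print(name, "->", new_archive_path(name))
```

Remember that only the archive names change for dbV4: the files inside a dbV4 archive keep their original names.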

If you have any questions or concerns, please contact us.

BLAST+ 2.10.0 now available with improved composition-based statistics

The BLAST+ 2.10.0 release is now available from our FTP site.  The new version offers the following improvements:

  • updated composition-based statistics for protein-protein (including translated BLAST) comparisons to provide stable results when you request fewer than the default number of results
  • an experimental Adaptive Composition Based Statistics option that increases the likelihood of finding novel results.  To enable this option set the environment variable ADAPTIVE_CBS to 1.  We welcome your feedback on this new option.
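
If you launch BLAST+ from a script, one way to try the experimental option is to set ADAPTIVE_CBS only for the child process. The following is a minimal Python sketch. The helper names and the file and database names are placeholders, and it assumes blastp is on your PATH.

```python
import os
import subprocess


def adaptive_cbs_env(base_env=None):
    """Return a copy of the environment with ADAPTIVE_CBS set to 1."""
    env = dict(os.environ if base_env is None else base_env)
    env["ADAPTIVE_CBS"] = "1"
    return env


def run_blastp(query, db, out):
    """Run blastp with the experimental option enabled for this process only."""
    cmd = ["blastp", "-query", query, "-db", db, "-out", out]
    return subprocess.run(cmd, env=adaptive_cbs_env(), check=True)


# Example call (placeholder file and database names):
# run_blastp("query.fasta", "swissprot", "results.txt")
```

Setting the variable per process keeps the option from affecting other BLAST+ runs in the same shell session.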

See the release notes for details on other improvements and bug fixes in this release.

The new version fully supports the version 5 (v5) databases with built-in taxonomy and other improvements. For more information on v5 databases (download), see the previous NCBI Insights article and the recording of our webinar.  If you are still using the older version 4 (v4) databases, we recommend that you begin using the v5 databases as soon as possible.  We will discontinue updates to the older v4 databases in early 2020.

Protein BLASTDBs are accession-based

The version 5 BLAST (dbV5) protein databases are now accession-based. You can access these databases and the nucleotide BLASTDBs on our FTP site.

As we described in a previous post, this means they now contain the gi-less proteins from the NCBI Pathogen Project and other high-throughput projects. The v5 databases are also compatible with proteins from PDB structures with multi-character chain identifiers and will include these as they become available in our other protein systems. Only the latest version of BLAST+ (2.9.0, download) will work with the updated v5 databases and allow you to access all of the most recent protein and nucleotide data. In the winter of 2019, we will stop updating the version 4 BLAST databases and offer the v5 databases as the default for download.

In addition, makeblastdb will be updated in BLAST 2.10.0, due out in October 2019, so by default it creates dbV5 formatted databases.
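
If your pipeline builds its own databases, you may want to state the database version explicitly rather than depend on the default, which differs between releases. Here is a minimal Python sketch that assembles a makeblastdb command line. The helper name is hypothetical, and it assumes your installed makeblastdb accepts the -blastdb_version option (confirm with makeblastdb -help).

```python
import subprocess


def makeblastdb_command(fasta, dbtype, out, version=None):
    """Build a makeblastdb argument list, optionally pinning the database version."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    cmd = ["makeblastdb", "-in", fasta, "-dbtype", dbtype, "-out", out]
    if version is not None:
        if version not in (4, 5):
            raise ValueError("version must be 4 or 5")
        # Assumption: -blastdb_version is available in the installed release.
        cmd += ["-blastdb_version", str(version)]
    return cmd


# Example (placeholder file names); uncomment to actually run makeblastdb:
# subprocess.run(makeblastdb_command("proteins.fasta", "prot", "mydb", 5), check=True)
```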

For more information on the new database version and BLAST+ (2.9.0), see the previous NCBI Insights article and the recording of our recent webinar.

Have you tried BLAST+ (2.9.0) and version 5 BLAST databases (dbV5)?

We recently updated the version 5 BLAST protein databases (dbV5) on our FTP site to be completely accession-based.  As we described in a previous post, this means they now contain the gi-less proteins from the NCBI Pathogen Project and other high-throughput projects. The v5 databases are also compatible with proteins from PDB structures with multi-character chain identifiers and will include these as they become available in our other protein systems. Only the latest version of BLAST+ (2.9.0, download) will work with the updated v5 databases and allow you to access all of the most recent protein data. At the end of September 2019, we will stop updating the version 4 BLAST databases and offer the v5 databases as the default for download.

For more information on the new database version and BLAST+ (2.9.0), see the previous NCBI Insights article and the recording of our recent webinar.

May 15, 2019 Webinar: Using taxonomic information and other improvements in standalone BLAST+ (2.9.0) and the v5 databases

Next Wednesday, May 15, 2019 at 11:00 AM EDT, NCBI staff will show you how to use the latest version of standalone BLAST+ (2.9.0) and the new accession-based dbV5 databases with built-in taxonomy information. You will learn how to limit searches to taxonomic groups and to retrieve sequences from the database by taxonomy without having to download an identifier list. You will also learn about additional improvements in the BLAST databases and programs that make them compatible with the new PDB identifiers and gi-less proteins from the Pathogen Detection Project.
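
To give a flavor of a taxonomy-limited search, here is a minimal Python sketch that builds a command line using the -taxids option of the BLAST+ search programs. The helper name and file names are placeholders, it assumes a v5 database and BLAST+ 2.9.0 or later, and it is not taken from the webinar itself.

```python
def taxid_limited_search(program, query, db, taxids, out):
    """Build a BLAST+ command that restricts the search to the given taxonomy IDs."""
    if not taxids:
        raise ValueError("at least one taxonomy ID is required")
    # -taxids takes a comma-delimited list of NCBI taxonomy IDs.
    taxid_arg = ",".join(str(t) for t in taxids)
    return [program, "-query", query, "-db", db, "-taxids", taxid_arg, "-out", out]


# Example: search nr but report only human (taxonomy ID 9606) proteins.
cmd = taxid_limited_search("blastp", "query.fasta", "nr", [9606], "human_hits.txt")
print(" ".join(cmd))
```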

Date and time: Wed, May 15, 2019 11:00 AM – 11:30 AM EDT

Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

Recent enhancements to BLAST+ (2.9.0): built-in taxonomy and access to proteins from the Pathogen Detection Project

We have made some recent improvements to the BLAST+ applications that take full advantage of the version 5 BLAST databases (BLASTDBv5), which include built-in taxonomic information for sequences and no longer rely on the integer sequence identifiers (gi numbers).

With the latest version of BLAST, you can now:
