We have recently improved the BLAST+ applications to take full advantage of the version 5 BLAST databases (BLASTDBv5), which include built-in taxonomic information for sequences and no longer rely on integer sequence identifiers (gi numbers).
With the latest version of BLAST, you can now:
- Limit your searches by taxonomy using information built into the BLAST databases
- Limit searches more efficiently when using a list of sequence accessions
- Retrieve sequences by taxonomy from the BLAST database with blastdbcmd
- Search PDB proteins with identifiers up to four characters long. You can read more about the PDB changes in our Structure database documentation.
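As a rough illustration of the first three items, the sketch below assembles the corresponding command lines in Python without running them. It assumes the `-taxids` and `-seqidlist` options of `blastp` and the `-taxids` option of `blastdbcmd` as we understand them from the BLAST+ command-line help; check `blastp -help` and `blastdbcmd -help` for your installed version. The database name `nr_v5`, the file names `query.fsa` and `accessions.txt`, and the helper function names are hypothetical placeholders.

```python
import shlex


def blastp_by_taxonomy(db, query, taxids, out="results.out"):
    """Build a blastp command limited to the given NCBI taxonomy IDs.

    Assumes blastp accepts a comma-separated list via -taxids.
    """
    return [
        "blastp",
        "-db", db,
        "-query", query,
        "-taxids", ",".join(str(t) for t in taxids),
        "-out", out,
    ]


def blastp_by_accessions(db, query, seqidlist, out="results.out"):
    """Build a blastp command limited to accessions listed in a file.

    Assumes blastp accepts the list file via -seqidlist.
    """
    return [
        "blastp",
        "-db", db,
        "-query", query,
        "-seqidlist", seqidlist,
        "-out", out,
    ]


def blastdbcmd_by_taxonomy(db, taxids):
    """Build a blastdbcmd command that retrieves sequences by taxonomy ID.

    Assumes blastdbcmd accepts a comma-separated list via -taxids.
    """
    return [
        "blastdbcmd",
        "-db", db,
        "-taxids", ",".join(str(t) for t in taxids),
    ]


if __name__ == "__main__":
    # 9606 is the NCBI taxonomy ID for human; all other names are placeholders.
    commands = [
        blastp_by_taxonomy("nr_v5", "query.fsa", [9606]),
        blastp_by_accessions("nr_v5", "query.fsa", "accessions.txt"),
        blastdbcmd_by_taxonomy("nr_v5", [9606]),
    ]
    for cmd in commands:
        # Print a shell-quoted command line that can be reviewed before running.
        print(shlex.join(cmd))
```

Each helper returns an argument list, so a command can be inspected first and then passed to `subprocess.run` once the options have been confirmed against your BLAST+ installation.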
Only BLASTDBv5 supports these new features. These new BLAST databases also contain accession-based (gi-less) proteins from important high-throughput genome sequencing projects that are not available in the previous version of the BLAST databases. These include proteins from the annotation of assemblies from large-scale pathogen surveillance efforts that are part of the NCBI Pathogen Project, as well as those from large-scale metagenomics surveillance. With the v5 databases, you can run BLAST searches against all proteins from these assemblies to find proteins of interest.
For more information on the new database version, BLASTDBv5 (download), see the previous NCBI Insights article and the recording of our webinar. We will continue to update the BLAST databases in their current version (BLASTDBv4) until September 2019.