Tag: Basic Local Alignment Search Tool (BLAST)

Now Available! Updated Bacterial and Archaeal Reference Genomes Collection

Download the updated bacterial and archaeal reference genome collection! We built this collection of 19,328 genomes by selecting the “best” genome assembly for each species among the 350,000+ prokaryotic genomes in RefSeq (except for E. coli for which two assemblies were selected as reference).

What’s New?

413 species are represented in this collection for the first time
198 species are represented by a better assembly
27 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment

Cleaner BLAST Databases for More Accurate Results

Removing contaminated sequences using NCBI quality assurance tools

Do you use BLAST to identify a sequence or the evolutionary scope of a gene? That can be challenging if contaminated and misclassified sequences are in the BLAST databases and show up in your search results. To address this problem, we now use NCBI quality assurance tools to systematically remove these misleading sequences from the default nucleotide (nt) and protein (nr) BLAST databases.

BLAST FASTA Files Will No Longer Be Available on the FTP Site Effective April 2024

Easily generate BLAST FASTA files yourself!

In April 2024, the FASTA (sequence text) files of the sequences in the Basic Local Alignment Search Tool (BLAST) databases will no longer be available on the FTP site. However, you can easily generate FASTA files yourself from the formatted BLAST databases by using the BLAST utility blastdbcmd, which comes with the standalone BLAST programs. This gives you the flexibility to generate organism-specific FASTA files using NCBI’s taxonomy IDs for specific organisms or groups.

See the BLAST Command Line Applications User Manual for more details on the standalone BLAST programs and working with the BLAST databases.
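As a rough illustration of the blastdbcmd workflow described above, the following Python sketch builds the command line for dumping a formatted BLAST database to FASTA, either in full or restricted to given taxonomy IDs. The function names, database name, and file names are hypothetical and chosen only for this example. It assumes that BLAST+ is installed, that the database is already downloaded locally in a format carrying taxonomy information, and that the `-entry all`, `-taxids`, `-outfmt %f`, and `-out` options behave as described in the BLAST+ manual. Depending on your BLAST+ version, a higher-level taxon may first need to be expanded to its descendant taxonomy IDs.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def build_blastdbcmd_args(
    db: str, out_path: str, taxids: Optional[Sequence[int]] = None
) -> List[str]:
    """Return a blastdbcmd argument list that writes FASTA to out_path.

    Hypothetical helper for illustration; not part of BLAST+ itself.
    """
    # "%f" asks blastdbcmd for FASTA-formatted output.
    args = ["blastdbcmd", "-db", db, "-outfmt", "%f", "-out", out_path]
    if taxids:
        # Restrict the dump to sequences assigned to these taxonomy IDs.
        args += ["-taxids", ",".join(str(t) for t in taxids)]
    else:
        # No filter given: dump every sequence in the database.
        args += ["-entry", "all"]
    return args


def dump_fasta(db: str, out_path: str, taxids: Optional[Sequence[int]] = None) -> None:
    """Run blastdbcmd; raises if the program is missing or exits with an error."""
    if shutil.which("blastdbcmd") is None:
        raise RuntimeError("blastdbcmd not found on PATH; install BLAST+ first")
    subprocess.run(build_blastdbcmd_args(db, out_path, taxids), check=True)


if __name__ == "__main__":
    # Only print the commands here, so the sketch runs without BLAST+ installed.
    print(" ".join(build_blastdbcmd_args("nt", "nt.fasta")))
    print(" ".join(build_blastdbcmd_args("nt", "felis_catus.fasta", [9685])))
```

Calling `dump_fasta("nt", "felis_catus.fasta", [9685])` would then write only the domestic cat (taxonomy ID 9685) sequences from a local nt database, provided the assumptions above hold.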

Updated Bacterial and Archaeal Reference Genome Collection is Available!

Download the updated bacterial and archaeal reference genome collection! This collection (18,941 genomes as of Jan 18, 2024) was built by selecting the “best” genome assembly for each species among the 330,000+ prokaryotic genomes in RefSeq (except for E. coli for which two assemblies were selected as reference). You can speed up your sequence searches by running them against these high-quality genomes instead of the entire nucleotide or protein database.

The criteria for selecting the reference assembly for a given species include the contiguity and completeness of the assembly and the quality of its RefSeq annotation.
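To make the idea of searching the smaller reference collection concrete, here is a minimal Python sketch that assembles a standalone blastn command against a locally downloaded copy of the collection. The function name, file names, and E-value cutoff are hypothetical. The database name `ref_prok_rep_genomes` is an assumption about how the collection is named in the BLAST database downloads, so check the name of the files you actually fetched.

```python
import shutil
import subprocess
from typing import List


def build_blastn_args(
    query_fasta: str,
    db: str = "ref_prok_rep_genomes",  # assumed database name; verify locally
    out_path: str = "hits.tsv",
    evalue: float = 1e-10,  # arbitrary illustrative cutoff
) -> List[str]:
    """Return a blastn argument list that writes tabular hits to out_path.

    Hypothetical helper for illustration; not part of BLAST+ itself.
    """
    return [
        "blastn",
        "-query", query_fasta,
        "-db", db,
        "-outfmt", "6",  # tab-separated output, one hit per line
        "-evalue", str(evalue),
        "-out", out_path,
    ]


def run_search(query_fasta: str) -> None:
    """Run the search; raises if blastn is missing or exits with an error."""
    if shutil.which("blastn") is None:
        raise RuntimeError("blastn not found on PATH; install BLAST+ first")
    subprocess.run(build_blastn_args(query_fasta), check=True)


if __name__ == "__main__":
    # Only print the command, so the sketch runs without BLAST+ installed.
    print(" ".join(build_blastn_args("query.fasta")))
```

Pointing `-db` at the reference collection rather than the full nt database is what narrows the search; the rest of the command is the same as for any other blastn run.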

Using NCBI Data and Tools for Your Research Project

Are you a biology student working on a research project? NCBI offers free access to a wide variety of resources and tools to help you find and download data for your project. 

How and why do you use our resources? Check out the example below:

Your professor has assigned you a research project looking at the sequence and structure of the TP53 gene in the domestic cat (Felis catus). In addition, you were asked to find information on this gene and its genomic region in other members of the cat family (Felidae).

Faster and Focused Searches with BLAST+ 2.15.0

New version now available

Do you use NCBI’s standalone BLAST tool (BLAST+)? The latest version of BLAST+ is now available and includes two exciting new features! You can now run searches faster and focus your searches by organism more easily.

Comparing Yeast Species Used in Beer Brewing and Bread Making

Using the NIH Comparative Genomics Resource (CGR) to gain knowledge about less-researched organisms

The scientific community relies heavily on model organism research to gain knowledge and make discoveries. However, focusing solely on these species misses valuable variation. Comparative genomics allows us to use knowledge from a model species, such as Saccharomyces cerevisiae, to understand traits in other, related organisms, such as Saccharomyces pastorianus or Saccharomyces eubayanus. Applying this information may provide valuable insight for other less-researched organisms. The National Institutes of Health (NIH) Comparative Genomics Resource (CGR) offers a cutting-edge NCBI toolkit of high-quality genomics data and tools to help you do just that.

BLAST ClusteredNR Database is Now Available for Download!

Now available! You can download the ClusteredNR protein database, previously only available on the BLAST web application. Our recently introduced ClusteredNR database gives you quicker BLAST results and shows how your hits are distributed across a wider range of organisms and evolutionary distances. The package includes the ClusteredNR BLAST database, an SQLite3 database, and several scripts for accessing cluster information and members.

Features & Benefits

Reduced redundancy
Faster searches
More diverse proteins and organisms in your BLAST results

Now Available! Updated Bacterial and Archaeal Reference Genomes Collection

An updated bacterial and archaeal reference genome collection is available! This collection of 18,343 genomes was built by selecting exactly one genome assembly for each species among the 312,000+ prokaryotic genomes in RefSeq, except for E. coli for which two assemblies were selected as reference.

The criteria for selecting the reference assembly for a given species include the contiguity and completeness of the assembly and the quality of its RefSeq annotation.

What’s new?

790 species were added to the collection
199 species are represented by a better assembly (compared to the April 2023 release)
70 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment

Which animals can catch and transmit human viral infections?

Using the NIH Comparative Genomics Resource (CGR) to understand susceptibility to SARS-CoV-2 and other infections

Are you conducting research on animal-mediated transmission of human viral infections, such as COVID-19? The National Institutes of Health (NIH) Comparative Genomics Resource (CGR) offers a cutting-edge NCBI toolkit of high-quality genomics data and tools to help with comparative genomics analysis of eukaryotic genes, such as angiotensin-converting enzyme 2 (ACE2), which is targeted by SARS-CoV-2.

NCBI resources have helped the scientific community understand viral infections associated with public health crises, such as COVID-19 and influenza, and can be used to study emerging viruses that may represent new threats.