Download the updated bacterial and archaeal reference genome collection! We built this collection of 19,328 genomes by selecting the “best” genome assembly for each species from the 350,000+ prokaryotic genomes in RefSeq (except for E. coli, for which two assemblies were selected as references).
What’s New?
413 species are represented in this collection for the first time
198 species are represented by a better assembly
27 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment
Removing contaminated sequences using NCBI quality assurance tools
Do you use BLAST to identify a sequence or to determine the evolutionary scope of a gene? That can be challenging when contaminated or misclassified sequences are in the BLAST databases and show up in your search results. To address this problem, we now use NCBI quality assurance tools to systematically remove these misleading sequences from the default nucleotide (nt) and protein (nr) BLAST databases. Continue reading “Cleaner BLAST Databases for More Accurate Results”→
In April 2024, the FASTA (sequence text) files of the sequences in the Basic Local Alignment Search Tool (BLAST) databases will no longer be available on the FTP site. However, you can easily generate FASTA files yourself from the formatted BLAST databases with blastdbcmd, a utility that comes with the standalone BLAST programs. It also lets you generate organism-specific FASTA files by using NCBI Taxonomy IDs for specific organisms or groups.
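As a rough illustration of the taxonomy-based export, the sketch below wraps blastdbcmd in Python. It assumes a BLAST+ release whose blastdbcmd supports the `-taxids` option and a locally downloaded database that carries taxonomy information; the helper names `build_blastdbcmd_args` and `dump_fasta` are hypothetical and not part of BLAST+.

```python
import shutil
import subprocess
from typing import Iterable, List


def build_blastdbcmd_args(db: str, taxids: Iterable[int], out_path: str) -> List[str]:
    """Build (but do not run) a blastdbcmd argument list that exports
    the sequences for the given NCBI Taxonomy IDs as FASTA."""
    taxid_str = ",".join(str(t) for t in taxids)
    if not taxid_str:
        raise ValueError("at least one taxonomy ID is required")
    return [
        "blastdbcmd",
        "-db", db,             # name of a formatted BLAST database, e.g. "nt"
        "-taxids", taxid_str,  # comma-delimited NCBI Taxonomy IDs (assumed supported)
        "-outfmt", "%f",       # %f = write each sequence in FASTA format
        "-out", out_path,      # FASTA file to create
    ]


def dump_fasta(db: str, taxids: Iterable[int], out_path: str) -> None:
    """Run blastdbcmd; requires standalone BLAST+ on PATH and the database on disk."""
    if shutil.which("blastdbcmd") is None:
        raise FileNotFoundError("blastdbcmd not found on PATH; install standalone BLAST+")
    subprocess.run(build_blastdbcmd_args(db, taxids, out_path), check=True)
```

For example, `dump_fasta("nt", [9685], "felis_catus.fa")` would request the Felis catus (taxonomy ID 9685) sequences from a local copy of nt. Separating command construction from execution keeps the argument list easy to inspect before launching a long export.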
Download the updated bacterial and archaeal reference genome collection! This collection (18,941 genomes as of Jan 18, 2024) was built by selecting the “best” genome assembly for each species from the 330,000+ prokaryotic genomes in RefSeq (except for E. coli, for which two assemblies were selected as references). You can speed up your sequence searches by running them against these high-quality genomes instead of the entire nucleotide or protein database.
Are you a biology student working on a research project? NCBI offers free access to a wide variety of resources and tools to help you find and download data for your project.
How and why do you use our resources? Check out the example below:
Your professor has assigned you a research project looking at the sequence and structure of the TP53 gene in the domestic cat (Felis catus). In addition, you were asked to find information on this gene and its genomic region in other members of the cat family (Felidae). Continue reading “Using NCBI Data and Tools for Your Research Project”→
Using the NIH Comparative Genomics Resource (CGR) to gain knowledge about less-researched organisms
The scientific community relies heavily on model organism research to gain knowledge and make discoveries. However, focusing solely on these species misses valuable variation. Comparative genomics allows us to use knowledge from a model species, such as Saccharomyces cerevisiae, to understand traits in related organisms, such as Saccharomyces pastorianus or Saccharomyces eubayanus. Applying this knowledge may also provide valuable insight into other less-researched organisms. The National Institutes of Health (NIH) Comparative Genomics Resource (CGR) offers a cutting-edge NCBI toolkit of high-quality genomics data and tools to help you do just that. Continue reading “Comparing Yeast Species Used in Beer Brewing and Bread Making”→
Now available! You can download the ClusteredNR protein database, which was previously available only in the BLAST web application. Our recently introduced ClusteredNR database gives you faster BLAST results and shows how your hits are distributed across a wider range of organisms and evolutionary distances. The package includes the ClusteredNR BLAST database, an SQLite3 database, and several scripts for accessing cluster information and members.
Features & Benefits
Reduced redundancy
Faster searches
More diverse proteins and organisms in your BLAST results
An updated bacterial and archaeal reference genome collection is available! This collection of 18,343 genomes was built by selecting exactly one genome assembly for each species from the 312,000+ prokaryotic genomes in RefSeq, except for E. coli, for which two assemblies were selected as references.
The criteria for selecting the reference assembly for a given species include assembly contiguity and completeness, as well as the quality of the RefSeq annotation.
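To make the idea of per-species selection concrete, here is a minimal Python sketch. The post does not say how the criteria are weighted, so this assumes a simple lexicographic ranking (completeness, then contig N50 as a contiguity proxy, then an annotation quality score); the `Assembly` fields, the scores, and `select_references` are all hypothetical, not NCBI's actual pipeline. A per-species pick count models the E. coli exception.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Assembly:
    accession: str
    species: str
    completeness: float      # hypothetical: estimated % of the genome present
    contig_n50: int          # hypothetical contiguity measure, in bp
    annotation_score: float  # hypothetical RefSeq annotation quality score


def rank_key(a: Assembly):
    # Assumed ordering: completeness first, then contiguity, then
    # annotation quality; higher is better for all three.
    return (a.completeness, a.contig_n50, a.annotation_score)


def select_references(
    assemblies: Iterable[Assembly],
    picks_per_species: Optional[Dict[str, int]] = None,
) -> Dict[str, List[str]]:
    """Return the accession(s) of the top-ranked assembly for each species."""
    picks_per_species = picks_per_species or {}
    by_species: Dict[str, List[Assembly]] = defaultdict(list)
    for a in assemblies:
        by_species[a.species].append(a)
    chosen: Dict[str, List[str]] = {}
    for species, group in by_species.items():
        k = picks_per_species.get(species, 1)  # one per species by default
        best = sorted(group, key=rank_key, reverse=True)[:k]
        chosen[species] = [a.accession for a in best]
    return chosen
```

Calling `select_references(assemblies, {"Escherichia coli": 2})` keeps two E. coli assemblies and one for every other species. A weighted score would be an equally plausible design; the lexicographic key is used here only because it is easy to read.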
What’s New?
790 species were added to the collection
199 species are represented by a better assembly (compared to the April 2023 release)
70 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment
Using the NIH Comparative Genomics Resource (CGR) to understand susceptibility to SARS-CoV-2 and other infections
Are you conducting research on animal-mediated transmission of human viral infections, such as COVID-19? The National Institutes of Health (NIH) Comparative Genomics Resource (CGR) offers a cutting-edge NCBI toolkit of high-quality genomics data and tools to help with comparative genomics analysis of eukaryotic genes, such as angiotensin-converting enzyme 2 (ACE2), which SARS-CoV-2 targets.
NCBI resources have helped the scientific community understand viral infections associated with public health crises, such as COVID-19 and influenza, and can also be used to study emerging viruses that may represent new threats. Continue reading “Which animals can catch and transmit human viral infections?”→