NCBI Replacing Obsolete NCBI Genomes (chromosome) and Removing Human ALU repeat elements (alu_repeats) BLAST databases

NCBI will discontinue both the NCBI Genomes (chromosome) and the Human ALU repeat elements (alu_repeats) BLAST databases in October 2017.

Better alternatives to NCBI Genomes (chromosome)

The existing NCBI Genomes (chromosome) database does not offer complete and non-redundant coverage of genome data. The newly added NCBI RefSeq Genomes Database (refseq_genomes) and the RefSeq Representative Genomes Database (refseq_representative_genomes) are more useful alternatives to the chromosome database. You can select these databases from the database pull-down list on any general BLAST form that searches a nucleotide database (blastn, tblastn).

Figure 1. The nucleotide-nucleotide BLAST database menu with the recommended (RefSeq Genomes and RefSeq Representative Genomes) and deprecated (NCBI Genomes (chromosome) and Human ALU repeat elements (alu_repeats)) databases highlighted.

RefSeq Genomes

The RefSeq Genomes database is a comprehensive collection of NCBI Reference Sequence genomes across all taxonomic groups. It has more complete coverage and less redundant data than the chromosome database, which makes it the best choice for general BLAST searches where comprehensive coverage is desired.

RefSeq Representative Genomes

The RefSeq Representative Genomes database contains the best-quality genomes (Reference and Representative Genomes) available at NCBI and provides broad taxonomic coverage with minimal redundancy. It is the best choice for Primer-BLAST, where limiting redundancy is important for designing target-specific primers.

You can start a BLAST search against the RefSeq Genomes database from the nucleotide BLAST form, or design primers against the RefSeq Representative Genomes database on the Primer-BLAST page. On either page, you can also select the database from the database menu. For more information on a database, select the database help (“?”) next to the menu.

Obsolete Human ALU repeat elements (alu_repeats)

In addition to removing the chromosome database, we will also remove the Human ALU repeat elements (alu_repeats) database. The alu_repeats database contains old and limited data on repeat sequences. Better alternatives for filtering interspersed repeats are the species-specific repeat libraries available in the Algorithm parameters section of nucleotide database search forms.

Figure 2. The Filters and Masking section of the nucleotide-nucleotide BLAST Algorithm parameters. The species-specific repeats filter for human is enabled. The repeat filter masks ALUs and other interspersed repetitive elements in the query sequence, which eliminates the need to use the ALU repeats database.
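
If you have scripts that refer to the retiring databases by name, a small check like the following may help you find and update them. This is only a sketch under stated assumptions: the mapping of chromosome to refseq_genomes is one reasonable default (refseq_representative_genomes may suit you better, for example for primer design), the helper names are hypothetical, and the query string follows the general CMD/PROGRAM/DATABASE/QUERY pattern of the BLAST URL API without being sent anywhere. Please confirm that the database name you choose is accepted by the interface you use.

```python
from urllib.parse import urlencode

# Databases named in this announcement as being retired.
# Assumption: "chromosome" maps to "refseq_genomes" as a general default.
# None means there is no replacement database; the announcement recommends
# the species-specific repeat filter on the search form instead.
REPLACEMENTS = {
    "chromosome": "refseq_genomes",
    "alu_repeats": None,
}


def resolve_database(name):
    """Return a database name that is not scheduled for removal."""
    if name not in REPLACEMENTS:
        return name  # not affected by this announcement
    replacement = REPLACEMENTS[name]
    if replacement is None:
        raise ValueError(
            f"{name} is being removed and has no replacement database; "
            "use the species-specific repeat filter instead"
        )
    return replacement


def build_put_query(query, database, program="blastn"):
    """Build (but do not send) a query string for a BLAST submission."""
    params = {
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": resolve_database(database),
        "QUERY": query,
    }
    return urlencode(params)


if __name__ == "__main__":
    # A script that still asks for "chromosome" is redirected.
    print(build_put_query("ACGTACGTACGT", "chromosome"))
```

The function raises an error for alu_repeats rather than silently substituting something else, because the recommended alternative is a filter setting, not another database.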

If you have any questions or comments about these changes, please write to: blast-help@ncbi.nlm.nih.gov
