NCBI will discontinue both the NCBI Genomes (chromosome) and the Human ALU repeat elements (alu_repeats) BLAST databases in October 2017.
Better alternatives to NCBI Genomes (chromosome)
The existing NCBI Genomes (chromosome) database does not offer complete and non-redundant coverage of genome data. The newly added NCBI RefSeq Genomes Database (refseq_genomes) and the RefSeq Representative Genomes Database (refseq_representative_genomes) are more useful alternatives to the chromosome database. You can select these databases from the database pull-down list on any general BLAST form that searches a nucleotide database (blastn, tblastn).
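For scripted searches, the same database choice can be expressed as a request parameter. The sketch below only builds the form body for a submission to the NCBI BLAST URL API and does not send it. It assumes the URL API accepts the `refseq_genomes` name from the pull-down list as its `DATABASE` value; the helper name `build_blast_request` is hypothetical.

```python
from urllib.parse import urlencode

# Public endpoint of the NCBI BLAST URL API.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blast_request(query, database="refseq_genomes", program="blastn"):
    """Return the URL-encoded form body for a BLAST 'Put' (submit) request.

    Assumption: the database name shown on the web form is also valid
    as the DATABASE parameter of the URL API.
    """
    if program not in ("blastn", "tblastn"):
        # Only these general programs search a nucleotide database.
        raise ValueError("expected blastn or tblastn")
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": query,
    }
    return urlencode(params)


if __name__ == "__main__":
    # Print the body that would be POSTed to BLAST_URL.
    print(build_blast_request("ACGTACGTACGT"))
```

Switching to the representative set would only change the `database` argument to `refseq_representative_genomes`.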
Figure 1. The nucleotide-nucleotide BLAST database menu with the recommended (RefSeq Genome and Representative genomes) and deprecated (NCBI genomes (chromosomes) and Human ALU repeats) databases highlighted.
In June, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms, including Danio rerio (zebrafish):
NCBI is retiring the e-PCR tool effective immediately. The good news is that an existing tool, Primer-BLAST, fills in nicely for the functions of both Forward and Reverse e-PCR, and has the additional benefit of de novo primer design.
NCBI is pleased to announce the initial data release of RefSeq Functional Elements, a resource that provides RefSeq and Gene records for experimentally validated human and mouse non-genic functional elements. Data can be accessed via Gene, Nucleotide, BLAST, BioProject, Graphical Displays and FTP.
The new pig (Sus scrofa) genome annotation produced by the NCBI eukaryotic genome annotation pipeline is now available in RefSeq. The data can be downloaded and explored in the Genome Data Viewer, with BLAST, and in the Gene database.
Annotation Release 101 for the bottlenose dolphin (Tursiops truncatus) is out in RefSeq! This annotation was based on the NIST Tur_tru v1 assembly, which has a four-fold increase in contiguity over the assembly used in the previous annotation. Over four billion RNA-Seq reads from skin and blood tissue were used for gene prediction. As a result of these improvements, the percentage of partially represented protein-coding genes went down from 24% to 4%. More than 2,500 genes that were fragmented in the previous assembly were merged into complete genes. A total of 24,026 genes were annotated, and 17,096 of them were protein-coding. A full report on the annotation can be found here.
Magic-BLAST is a new tool for mapping large sets of next-generation RNA or DNA sequencing runs against a whole genome or transcriptome. Magic-BLAST executables for Linux, macOS, and Windows, as well as the source files, are available on the FTP site.
Each alignment optimizes a composite score that simultaneously takes into account the two reads of a pair and, in the case of RNA-Seq, locates the candidate introns and adds up the scores of all exons. Sequencing reads can be provided as NCBI SRA accessions, FASTA or SRA files.
Magic-BLAST implements ideas developed in the NCBI Magic pipeline using the NCBI BLAST libraries. Magic-BLAST is under active development, and we expect the next few releases to occur on a monthly basis. Read more about Magic-BLAST on the FTP site.
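As a rough illustration of the input options described above, the sketch below assembles a `magicblast` command line for either an SRA accession or a FASTA file. It only builds the argument list and does not run the program. The helper name `build_magicblast_command`, the database name `my_reference`, and the file names are hypothetical, and flags may change between releases while the tool is under active development.

```python
def build_magicblast_command(db, sra_accession=None, fasta_path=None, out_path=None):
    """Return an argument list for a Magic-BLAST run (nothing is executed).

    Exactly one read source must be given: an SRA accession or a FASTA file.
    """
    if (sra_accession is None) == (fasta_path is None):
        raise ValueError("provide exactly one of sra_accession or fasta_path")

    cmd = ["magicblast", "-db", db]   # db: a BLAST database of the reference
    if sra_accession is not None:
        cmd += ["-sra", sra_accession]   # reads fetched by SRA accession
    else:
        cmd += ["-query", fasta_path]    # reads from a local FASTA file
    if out_path is not None:
        cmd += ["-out", out_path]        # write results to a file
    return cmd


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(" ".join(build_magicblast_command("my_reference", fasta_path="reads.fa")))
```

The returned list can be passed to `subprocess.run` once the executable is installed and a reference database has been built.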
This blog post is intended for all BLAST users.
ORFfinder is a graphical analysis tool for finding open reading frames (ORFs). We’ve been working on a few updates, and we’d like to find out what you think about them. Read on to find out what you can do with the new ORFfinder.
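To make the idea of an open reading frame concrete, here is a minimal six-frame ORF search. It is a generic textbook sketch, not ORFfinder's implementation: it assumes ATG as the only start codon, the standard stop codons, and an arbitrary default minimum length, and the function names are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs_one_strand(seq, min_len):
    """Return (start, end) pairs, 0-based half-open, for ATG...stop ORFs."""
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3))
                start = None
    return orfs


def find_orfs(seq, min_len=30):
    """Search all six frames; coordinates refer to the forward strand."""
    seq = seq.upper()
    n = len(seq)
    results = [("+", s, e) for s, e in find_orfs_one_strand(seq, min_len)]
    rc = reverse_complement(seq)
    # Map reverse-strand coordinates back onto the forward strand.
    results += [("-", n - e, n - s) for s, e in find_orfs_one_strand(rc, min_len)]
    return results


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGCC", min_len=9))
```

Each ORF is reported with the stop codon included, so its length is always a multiple of three.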
This article is intended for GenBank data submitters with a basic knowledge of BLAST who submit sequence data from protein-coding genes.
One of the most common problems when submitting DNA or RNA sequence data from protein-coding genes to GenBank is failing to add information about the coding region (often abbreviated as CDS) or defining the CDS incorrectly. Incomplete or incorrect CDS information will prevent accession numbers from being assigned to your submission data set. Fortunately, there is a procedure that will help you troubleshoot problems with the CDS feature annotation: running a BLAST analysis with your sequences before you submit your data.
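Before running BLAST, a quick local check can catch the most obvious CDS definition errors. The sketch below is an illustration rather than part of the GenBank submission process: it assumes a complete CDS under the standard genetic code, so a legitimately partial CDS or one using another genetic code would be flagged. The function name `check_cds` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def check_cds(cds):
    """Return a list of problems found in a nucleotide CDS string."""
    cds = cds.upper()
    problems = []

    if len(cds) % 3 != 0:
        problems.append("length is not a multiple of 3")

    # Split into complete codons, ignoring any trailing partial codon.
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]

    if not codons or codons[0] != "ATG":
        problems.append("does not start with ATG")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal stop codon")

    return problems


if __name__ == "__main__":
    for candidate in ("ATGAAATAG", "ATGTAAAAATAG", "ATGAAATA"):
        print(candidate, check_cds(candidate))
```

An empty list means none of these simple checks failed; it does not confirm that the CDS coordinates match the intended protein, which is what the BLAST comparison helps establish.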
Here’s how to use nucleotide BLAST (blastn) and the formatting options menu to analyze, interpret and troubleshoot your submissions:
1. To start the BLAST analysis, go to the BLAST homepage and select “nucleotide blast”.
Figure 1. Select “nucleotide blast”.
BLAST (Basic Local Alignment Search Tool) is a popular tool for finding sequences in a given database that are similar to a query sequence. Traditionally, BLAST displays these results as a sorted list of matches between the query and each database sequence. While this display is useful for examining how each subject sequence matches the query, it treats all subject sequences the same, regardless of the quality of the sequence data or its annotation, and also does not allow easy comparisons between different subject sequences.
For example, the subject sequences may fall into multiple groups of similar sequences, or all of the subject sequences may be more similar to each other than to the query. A common way to obtain this information is to construct a multiple sequence alignment of the query and some or all of the subject sequences, but to this point, BLAST has not provided such alignments directly.
Enter SmartBLAST! SmartBLAST is a new and experimental NCBI tool that makes it easier to complete common sequence analysis tasks, such as finding a candidate protein name for a sequence, locating regions of high sequence conservation, or identifying regions covered by database sequences but missing from the query.