The 2018 Nucleic Acids Research database issue features several papers from NCBI staff that cover the status and future of databases including CCDS, ClinVar, GenBank and RefSeq. These papers are also available on PubMed. To read an article, click on the PMID number listed below.
GenBank release 223.0 (12/15/2017) has 206,293,625 traditional records (including non-bulk-oriented TSA) containing 249,722,163,594 base pairs of sequence data. In addition, there are 551,063,065 WGS records containing 2,466,098,053,327 base pairs of sequence data, 201,559,502 TSA records containing 181,394,660,188 base pairs of sequence data, and 12,695,198 TLS records containing 4,458,042,616 base pairs of sequence data.
In the new version (2.7.1) of the BLAST+ executables, blastdbcmd can look up taxonomic names (e.g., scientific or common name) faster. We have also made some low-level improvements that allow BLAST to multithread more efficiently, especially when available memory is not sufficient for the database.
Note: Some Linux and macOS users may find that they need to increase the number of open file descriptors allowed for a process. The limit can be changed with “ulimit -n” (under bash). We suggest setting it to at least 1024.
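For pipelines that launch BLAST+ from a script rather than an interactive shell, the same limit can be checked and raised from Python before the executables are started. The following is a minimal sketch, not part of the BLAST+ distribution: the helper name `ensure_open_file_limit` is hypothetical, and it assumes a Unix-like system where Python's standard `resource` module is available.

```python
import resource


def ensure_open_file_limit(minimum=1024):
    """Raise the soft open-file limit of this process to at least `minimum`.

    Hypothetical helper for illustration. Returns the soft limit in effect
    after the call. The soft limit is never raised above the hard limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # An unlimited soft limit already satisfies any minimum.
    if soft != resource.RLIM_INFINITY and soft < minimum:
        if hard == resource.RLIM_INFINITY:
            target = minimum
        else:
            target = min(minimum, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        soft = target
    return soft


if __name__ == "__main__":
    print("soft open-file limit:", ensure_open_file_limit(1024))
```

Child processes started afterwards (for example with `subprocess.run`) inherit the limit, so the call only needs to happen once before BLAST is launched.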
See the BLAST+ release notes for more information.
On Wednesday, November 1, 2017, we will present a webinar on the Genome Data Viewer (GDV), NCBI’s full-featured genome browser. In this webinar, you’ll learn how to explore and analyze sequences and annotations for eukaryotic RefSeq genome assemblies. We’ll show you how to:
- Search across the entire assembly for genes, products and other markers, or jump to a specific position or range
- Display any of seven preselected track sets highlighting various aspects of the assembly, or create and load your own custom track sets from your NCBI account
- Load and display submitted alignment data from NCBI’s GEO or SRA
- Upload your own annotation and variant data
- Display BLAST or Primer-BLAST results on the assembly in the browser
Date and time: Wednesday, November 1, 2017, 12:00-12:30 PM EDT
After registering, you will receive a confirmation email with information about attending the webinar. After the live presentation, the webinar will be uploaded to the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
The newest version of Magic-BLAST (v. 1.3.0) offers improved sensitivity and faster run times as well as a number of other new features and improvements. These include the ability to set the alignment cut-off score as a function of read length, a maximum edit distance option and optional local caching for SRA files. For more information on these and other improvements, see the release notes. You can download the new executables from the NCBI FTP site.
Magic-BLAST is a tool for mapping large next-generation RNA or DNA sequencing runs against a whole genome or transcriptome. Read more here.
GenBank release 221.0 (8/13/2017) has 203,180,606 traditional records containing 240,343,378,258 base pairs of sequence data. In addition, there are 499,965,722 WGS records containing 2,242,294,609,510 base pairs of sequence data, 186,777,106 TSA records containing 167,045,663,417 base pairs of sequence data, and 1,628,475 TLS records containing 824,191,338 base pairs of sequence data.
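To put the release 221.0 and 223.0 figures quoted in this post side by side, the per-category base-pair counts can be summed and compared. This is a rough sketch only: it assumes the four categories (traditional, WGS, TSA, TLS) do not overlap, and the names `RELEASES`, `total_bases` and `growth_percent` are illustrative rather than part of any NCBI tool.

```python
# Base-pair counts copied from the GenBank release statistics in this post.
RELEASES = {
    "221.0": {
        "traditional": 240_343_378_258,
        "WGS": 2_242_294_609_510,
        "TSA": 167_045_663_417,
        "TLS": 824_191_338,
    },
    "223.0": {
        "traditional": 249_722_163_594,
        "WGS": 2_466_098_053_327,
        "TSA": 181_394_660_188,
        "TLS": 4_458_042_616,
    },
}


def total_bases(release):
    """Sum base pairs over all categories (assumes the categories are disjoint)."""
    return sum(RELEASES[release].values())


def growth_percent(old, new):
    """Percentage growth in total base pairs from release `old` to release `new`."""
    old_total = total_bases(old)
    return 100.0 * (total_bases(new) - old_total) / old_total


for name in sorted(RELEASES):
    print(f"Release {name}: {total_bases(name):,} base pairs in total")
print(f"Growth 221.0 -> 223.0: {growth_percent('221.0', '223.0'):.2f}%")
```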
NCBI will discontinue both the NCBI Genomes (chromosome) and the Human ALU repeat elements (alu_repeats) BLAST databases in October 2017.
Better alternatives to NCBI Genomes (chromosome)
The existing NCBI Genomes (chromosome) database does not offer complete and non-redundant coverage of genome data. The newly added NCBI RefSeq Genomes Database (refseq_genomes) and the RefSeq Representative Genomes Database (refseq_representative_genomes) are more useful alternatives to the chromosome database. You can select these databases from the database pull-down list on any general BLAST form that searches a nucleotide database (blastn, tblastn).
NCBI is pleased to announce the initial data release of RefSeq Functional Elements, a resource that provides RefSeq and Gene records for experimentally validated human and mouse non-genic functional elements. Data can be accessed via Gene, Nucleotide, BLAST, BioProject, Graphical Displays and FTP.