Over 1 billion records in GenBank release 231


GenBank release 231.0 (4/19/2019) is now available on the NCBI FTP site. This release has 5.03 terabases and 1.54 billion records.

The release has 212,775,414 traditional records containing 321,680,566,570 base pairs of sequence data. There are also 993,732,214 WGS records containing 4,421,986,382,065 base pairs of sequence data, 311,247,136 bulk-oriented TSA records containing 277,118,019,688 base pairs of sequence data, and 24,240,761 bulk-oriented TLS records containing 9,623,321,565 base pairs of sequence data.
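The headline totals can be checked by summing the four record classes above. The short Python sketch below does that, using only the figures quoted in this post; the dictionary layout and variable names are illustrative.

```python
# Record and base-pair counts quoted in the release 231.0 announcement.
# Each entry is (number of records, number of base pairs).
divisions = {
    "traditional": (212_775_414, 321_680_566_570),
    "WGS": (993_732_214, 4_421_986_382_065),
    "TSA": (311_247_136, 277_118_019_688),
    "TLS": (24_240_761, 9_623_321_565),
}

# Sum the two columns across all four record classes.
total_records = sum(records for records, _ in divisions.values())
total_bases = sum(bases for _, bases in divisions.values())

print(f"records: {total_records:,} ({total_records / 1e9:.2f} billion)")
print(f"bases:   {total_bases:,} ({total_bases / 1e12:.2f} terabases)")
```

The sums come to 1,541,995,525 records and 5,030,408,289,888 base pairs, which match the rounded 1.54 billion records and 5.03 terabases given above.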


Searching for orthologous genes at NCBI


NCBI is testing a new way to find and retrieve orthologous vertebrate genes. To find orthologs, enter a gene symbol (e.g., RAG1) or a gene symbol combined with a taxonomic group (e.g., primate RAG1). Select the matching entry from the suggestions menu, or select the orthologs option (e.g., Rag1 orthologs) to see all orthologs. Your search will return a results link to the set of orthologs provided by NCBI’s Gene resource. Click the results link to see information for that ortholog group (Figure 1).


Figure 1. Search for Rag1 orthologs showing the link to the set of RAG1 genes from vertebrates.


Proposed changes to AGP files for genome assemblies


If you are a consumer or producer of AGP (A Golden Path) files for genome assemblies, please read on. We’d like your feedback on the proposed changes described here.

As you know, AGP files are used to describe the structure of certain genome assemblies. The AGP file format has not kept up with changes in sequencing technology or International Sequence Database Collaboration (INSDC) feature usage. NCBI is therefore proposing to extend the current AGP v2.0 specification to add new linkage evidence types and a gap type of “contamination” as detailed below and described in the AGP v2.1 proposed specification.
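For readers less familiar with the format, here is a minimal Python sketch of how a consumer might read AGP lines and spot the proposed gap type. It assumes the nine tab-delimited columns of AGP v2.0, with gap lines marked by component type N or U and the gap type in column 7. The v2.0 gap-type list is written from memory of the specification, and the object names, coordinates, and the linkage values on the contamination line are hypothetical placeholders, not taken from the proposal.

```python
from collections import Counter

# Gap types as recalled from the AGP v2.0 specification (an assumption; check the spec).
V2_0_GAP_TYPES = {
    "scaffold", "contig", "centromere", "short_arm",
    "heterochromatin", "telomere", "repeat",
}
# Gap type named in the v2.1 proposal described in this post.
PROPOSED_GAP_TYPES = {"contamination"}


def parse_agp_line(line):
    """Parse one AGP line into a dict; return None for blank or comment lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-delimited columns, got {len(fields)}")
    row = {
        "object": fields[0],
        "object_beg": int(fields[1]),
        "object_end": int(fields[2]),
        "part_number": int(fields[3]),
        "component_type": fields[4],
    }
    if fields[4] in ("N", "U"):
        # Gap line: columns 6-9 describe the gap.
        row.update(kind="gap", gap_length=int(fields[5]), gap_type=fields[6],
                   linkage=fields[7], linkage_evidence=fields[8])
    else:
        # Component line: columns 6-9 describe the placed sequence.
        row.update(kind="component", component_id=fields[5],
                   component_beg=int(fields[6]), component_end=int(fields[7]),
                   orientation=fields[8])
    return row


# Hypothetical AGP content; names, coordinates, and linkage values are made up.
SAMPLE_AGP = "\n".join([
    "# hypothetical example",
    "scaffold_1\t1\t1000\t1\tW\tcontig_1.1\t1\t1000\t+",
    "scaffold_1\t1001\t1100\t2\tN\t100\tscaffold\tyes\tpaired-ends",
    "scaffold_1\t1101\t2100\t3\tW\tcontig_2.1\t1\t1000\t-",
    "scaffold_1\t2101\t2200\t4\tN\t100\tcontamination\tno\tna",
    "scaffold_1\t2201\t3200\t5\tW\tcontig_3.1\t1\t1000\t+",
])

rows = [r for r in map(parse_agp_line, SAMPLE_AGP.splitlines()) if r is not None]
gap_counts = Counter(r["gap_type"] for r in rows if r["kind"] == "gap")
proposed_in_use = sorted(t for t in gap_counts if t in PROPOSED_GAP_TYPES)
unknown = sorted(t for t in gap_counts
                 if t not in V2_0_GAP_TYPES | PROPOSED_GAP_TYPES)

print("gap types seen:", dict(gap_counts))
print("proposed gap types in use:", proposed_in_use)
print("unrecognized gap types:", unknown)
```

A parser that validates gap types against a fixed v2.0 list would reject the contamination line, which is one reason feedback from AGP consumers matters for this proposal.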


Recent enhancements to BLAST+ (2.9.0): built-in taxonomy and access to proteins from the Pathogen Detection Project


We have improved the BLAST+ applications to take full advantage of the version 5 BLAST databases (BLASTDBv5), which include built-in taxonomic information for sequences and no longer rely on the integer sequence identifiers (gi numbers).

With the latest version of BLAST, you can now:


NCBI on YouTube: Request access to controlled data in dbGaP


Do you need access to controlled data in the database of Genotypes and Phenotypes (dbGaP)? This short video will show you how to request data today!

dbGaP archives and distributes the data and results from studies that have investigated the interaction of genotype and phenotype in humans. Responsible stewardship of controlled-access data subject to the NIH GDS Policy is shared among the NIH, the investigators approved to access the data, and the investigators’ institutions.

Conserved Domain Database (CDD) 3.17 is now available


The latest version of the Conserved Domain Database contains 3,272 new or updated NCBI-curated domains and now mirrors Pfam version 31 as well as models from NCBIfams, a collection of protein family hidden Markov models (HMMs) for improving bacterial genome annotation. A fine-grained classification of the major facilitator superfamily has also been added. You can find this updated content on the CDD FTP site.


Genome Workbench 2.13.0 now available


The Genome Workbench team is proud to present version 2.13.0, with the latest usability improvements and bug fixes. See the full list of changes in the Genome Workbench release notes.

Some of the improvements include:

  • New SNP tracks using the most recent dbSNP release
  • Improved alignment statistics table to correctly account for introns
  • Alignment tooltips report introns separately from gaps
  • Fixes for several interface issues to make MAFFT and BLAST alignments easier to use

Genome Workbench is an integrated application for viewing and analyzing sequences. Genome Workbench can be used to browse and import data from NCBI and combine it with your own private data.

Easily configure gene feature modes in NCBI’s graphical sequence displays


Did you know you can easily switch between gene feature modes in NCBI’s graphical sequence displays like Sequence Viewer and GDV? You may need to configure gene tracks to suit your needs if, for example, you need to conduct analyses or present quality images.

Use either of two easy access points to the gene configuration menu to show the gene bar, the single-line gene model, or the expanded modes that show transcripts and CDSs.


New BLAST results page in NCBI Labs


NCBI Labs is showcasing an experiment to improve the BLAST results page. The goal is to provide a more useful BLAST output that better meets your needs and integrates with your workflows. The new results incorporate feedback from surveys and interviews with BLAST users. We think you’ll find the new results are more compact, easier to navigate, and expose useful formatting and other features that you may not have known about.

The results page has organism, percent identity, and E value filters in plain view and easily accessible. The Descriptions, Graphic Summary, and Alignments are on separate tabs, and the popular taxonomy view is on a fourth tab rather than on a separate web page. These changes along with other enhancements make the display more concise and easier to navigate. The figure below shows the new output format.

Figure 1. The new BLAST results page, with filters directly on the page and a more concise tabbed output that includes the taxonomy report. The Back to Traditional Results Page link reloads the results in the standard format.
