Sequence Viewer 3.25 now available


Sequence Viewer 3.25 has several new features, improvements, and bug fixes, including a new user interface and programmatic API to attach and show HTTP-based BAM files, as well as improved usability of zoom functions and tooltips for RNA features. For a full list of changes, see the Sequence Viewer release notes.

Sequence Viewer is a graphical view of sequences and color-coded annotations on regions of sequences stored in the Nucleotide and Protein databases.

RefSeq release 87 available


RefSeq release 87 is now accessible online, via FTP and through NCBI’s programming utilities. This full release incorporates genomic, transcript and protein data available as of March 5, 2018 and contains 155,118,991 records, including 106,245,682 proteins, 21,923,574 RNAs, and sequences from 77,225 organisms. The release is provided in several directories as a complete dataset and as divided by logical groupings.

Starting in July 2018, SNP variation features will no longer be included in RefSeq genome assembly records (chromosome and contig records with NC_, NT_, NW_, and AC_ accession prefixes). The RefSeq release notes have more information about this change.
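Scripts that process RefSeq records may need to flag which records this change affects. Below is a minimal sketch, assuming a simple accession-prefix check is enough: the four prefixes come from the notice above, while the function name `is_genome_assembly_record` and the example accessions are hypothetical.

```python
# RefSeq genome assembly (chromosome and contig) accession prefixes
# named in the July 2018 notice.
GENOME_ASSEMBLY_PREFIXES = ("NC_", "NT_", "NW_", "AC_")


def is_genome_assembly_record(accession: str) -> bool:
    """Return True if the accession uses a chromosome/contig RefSeq prefix."""
    # str.startswith accepts a tuple and matches any of its entries.
    return accession.strip().upper().startswith(GENOME_ASSEMBLY_PREFIXES)


if __name__ == "__main__":
    # Illustrative accession strings, not tied to specific records.
    for acc in ["NC_000001.11", "NW_000001.1", "NM_000001.1", "NP_000001.1"]:
        print(acc, is_genome_assembly_record(acc))
```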

See your data in context with NCBI’s updated Genome Data Viewer


We know it’s important for you to be able to browse and visually inspect variants and alignments from your next-gen sequencing experiments, so we’ve added remote streaming of BAM files to the Genome Data Viewer (GDV). All you need are your BAM files and their index files (.bai extension) in a location that allows HTTP access; you can then stream the BAM files as custom tracks into GDV.


GenBank exceeds 3 terabases in release 224


GenBank release 224.0 (2/13/2018) has 207,040,555 traditional records (including non-bulk-oriented TSA) containing 253,630,708,098 base pairs of sequence data.

In addition, there are 564,286,852 WGS records containing 2,608,532,210,351 base pairs of sequence data, 214,324,264 TSA records containing 193,940,551,226 base pairs of sequence data, and 12,819,978 TLS records containing 4,531,966,831 base pairs of sequence data.
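The headline figure comes from adding the four base-pair counts. Below is a quick check, assuming the 3 terabases (3 × 10^12 bases) refers to traditional, WGS, TSA, and TLS data combined. Traditional records alone hold about 0.25 terabases.

```python
# Base-pair counts reported for GenBank release 224.0 (2/13/2018).
BASE_PAIRS = {
    "traditional": 253_630_708_098,
    "WGS": 2_608_532_210_351,
    "TSA": 193_940_551_226,
    "TLS": 4_531_966_831,
}

TERABASE = 10**12  # 1 terabase = 10^12 bases

total_bp = sum(BASE_PAIRS.values())
print(f"Total: {total_bp:,} bp = {total_bp / TERABASE:.2f} Tb")
# Total: 3,060,635,436,506 bp = 3.06 Tb
```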


NIH Data Science Collaborative Hackathon April 16 – 18, 2018


The NCBI will assist with a data science hackathon to take place on the NIH Campus in Bethesda, Maryland, from April 16-18, 2018.

The hackathon will focus on tools for advanced analysis of biomedical datasets, including text, images, next-generation sequencing data, proteomics, and metadata. Many attendees at these events already work with large datasets or develop informatics tools, code, or pipelines; however, researchers in the earlier stages of their data science journey, including students and postdocs, are also encouraged to apply. Some projects are also suitable for non-scientist developers, mathematicians, or librarians.

The event is open to anyone selected for the hackathon and willing to travel to Bethesda, Maryland.


March 21 webinar – Introducing the NCBI Pathogen Detection Isolates Browser


In this next NCBI webinar, you will learn how to use the Pathogen Detection Isolates Browser to search for pathogen isolates, identify closely related isolates of interest, and find pathogens encoding particular antimicrobial resistance genes.

Date and time: Wed, Mar 21, 2018 12:00 PM – 12:30 PM EDT

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

The Pathogen Detection Isolates Browser is a web-based portal that integrates genomic sequences, metadata, antibiotic susceptibility and resistance gene information, and SNP cluster information.

The CDC estimates that each year in the U.S. roughly 48 million people (about 1 in 6 Americans) get sick from foodborne illnesses, 128,000 are hospitalized, and 3,000 die. The NCBI Pathogen Detection Project was created in collaboration with FDA, CDC, USDA, and others to use whole genome sequencing data for foodborne disease surveillance. Pathogens isolated from patients, food, and environmental samples by state, federal, and other labs are sequenced, and the data are submitted to NCBI in real time. The Pathogen Detection analysis pipeline assembles the sequences and compares them to other isolates in its database to identify closely related sequences. This helps identify cases involved in an outbreak and potential sources of contamination.
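To show what "closely related" can mean in practice, here is a toy sketch of SNP-distance clustering. This is not NCBI's pipeline. It assumes the sequences are already aligned to equal length, counts differing positions while skipping `N` and gap characters, and groups isolates by single linkage under a hypothetical threshold `max_snps`. All names and sequences are invented for illustration.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned sequences differ, skipping N and gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"N", "-"}
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x not in skip and y not in skip
    )


def single_linkage_clusters(seqs: dict[str, str], max_snps: int) -> list[set[str]]:
    """Group isolates linked by chains of pairs within max_snps (union-find)."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= max_snps:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters: dict[str, set[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


if __name__ == "__main__":
    isolates = {  # toy aligned sequences, not real data
        "patient_1": "ACGTACGTAC",
        "food_1": "ACGTACGTAT",
        "env_1": "TCCAACGGAT",
    }
    print(single_linkage_clusters(isolates, max_snps=1))
```

Single linkage is chosen here because it matches the intuition of an outbreak chain: an isolate joins a cluster if it is close to any member, not necessarily to all of them.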

Bioinformatics paper uses NCBI open data to analyze drug response


A study (PMID: 28158543) published in the July 2017 issue of Bioinformatics collected, classified, and analyzed single nucleotide variants (SNVs) that may affect response to currently approved drugs. The authors identified 2,640 SNVs of interest, most of which are rare in populations (minor allele frequency < 0.01).
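The rarity criterion here is a minor allele frequency (MAF) below 0.01, where MAF is the frequency of the second most common allele at a site. Below is a minimal sketch of that filter, not the authors' code. The function names are hypothetical, and the input is assumed to be a list of allele frequencies for a single site.

```python
RARE_MAF_CUTOFF = 0.01  # cutoff quoted in the study summary


def minor_allele_frequency(allele_freqs: list[float]) -> float:
    """Return the frequency of the second most common allele at a site.

    For a biallelic SNV with alternate-allele frequency p, this is min(p, 1 - p).
    """
    if not allele_freqs:
        raise ValueError("need at least one allele frequency")
    ranked = sorted(allele_freqs, reverse=True)
    # A monomorphic site (one observed allele) has no minor allele.
    return ranked[1] if len(ranked) > 1 else 0.0


def is_rare(allele_freqs: list[float], cutoff: float = RARE_MAF_CUTOFF) -> bool:
    """Flag a site as rare when its MAF falls below the cutoff."""
    return minor_allele_frequency(allele_freqs) < cutoff
```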

The researchers used protein sequence alignment tools and mined open data from multiple information resources accessed through E-utilities, including PubChem Compound (Kim et al., 2016; PMID: 26400175), NCBI Gene (Maglott et al., 2014; PMID: 25355515), NCBI Protein (Sayers, 2013), MMDB (Madej et al., 2012; PMID: 22135289), PDB (Berman et al., 2000; PMID: 10592235), dbSNP (Sherry et al., 2001; PMID: 11125122), and ClinVar (Landrum et al., 2016; PMID: 26582918).
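E-utilities are plain HTTP endpoints, so a lookup step of this kind can be sketched with the Python standard library. This is an illustrative sketch, not the study's code. `eutils_url` and `esearch_ids` are hypothetical helpers, while the base URL, `esearch.fcgi`, and the `db`, `term`, `retmax`, and `retmode` parameters belong to the public E-utilities interface. Entrez database names for resources listed above include `gene`, `protein`, `snp`, `clinvar`, `structure` (MMDB), and `pccompound`.

```python
import json
import urllib.request
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(utility: str, **params: str) -> str:
    """Build an E-utilities URL such as esearch.fcgi or esummary.fcgi with JSON output."""
    query = urlencode({**params, "retmode": "json"})
    return f"{EUTILS_BASE}{utility}.fcgi?{query}"


def esearch_ids(db: str, term: str, retmax: int = 20) -> list[str]:
    """Run ESearch against one Entrez database and return the matching UIDs."""
    url = eutils_url("esearch", db=db, term=term, retmax=str(retmax))
    with urllib.request.urlopen(url, timeout=30) as response:
        data = json.load(response)
    # ESearch JSON responses list UIDs under esearchresult -> idlist.
    return data["esearchresult"]["idlist"]
```

For example, `esearch_ids("gene", "CYP2C19[sym] AND human[orgn]")` returns Gene UIDs as strings. NCBI limits clients to 3 requests per second without an API key (10 with one), so batch queries or pause between calls.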

Questions, comments, and other feedback may be sent to Yanli Wang.

Genome Workbench 2.12.8 now available


The Genome Workbench team is proud to present version 2.12.8, with the latest usability improvements and bug fixes. See the full list of changes in the Genome Workbench release notes.

Some of the improvements include:

  • Improved FASTA format view (context menu) and the addition of an “Expand All” option
  • Improved rendering of internal unaligned regions
  • Automatically open the target folder to export files quickly
  • Automatic proxy detection during installation
  • Fixed a bug in the OS version

Genome Workbench is an integrated application for viewing and analyzing sequences. You can use Genome Workbench to browse data in GenBank and combine it with your own private data.

Expression teasers and indexing added to Gene


Last February, we added gene expression data to Gene. Now, you can access these data in a few new ways.


Figure 1. The expression teaser text from the human CYP2C19 gene record. CYP2C19 is a phase-one drug-metabolism gene expressed in liver and other organs/tissues involved in metabolizing drugs and other xenobiotics.

Expression pattern “teasers” in Summary

We’ve added a brief sentence describing the expression pattern to the Summary section. This teaser sentence describes tissue-specific expression of the gene, with a link to the complete description that appears in the Expression section.
