GenBank exceeds 3 Terabases in release 224

GenBank release 224.0 (2/13/2018) has 207,040,555 traditional records (including non-bulk-oriented TSA) containing 253,630,708,098 base pairs of sequence data.

In addition, there are 564,286,852 WGS records containing 2,608,532,210,351 base pairs of sequence data, 214,324,264 TSA records containing 193,940,551,226 base pairs of sequence data, and 12,819,978 TLS records containing 4,531,966,831 base pairs of sequence data.
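
As a quick check on the headline figure, the four base-pair counts above can be summed directly. This is a minimal sketch: the dictionary and variable names are illustrative, and it assumes the four record categories are disjoint, so that no sequence is counted twice.

```python
# Base-pair and record counts quoted above for GenBank release 224.0.
# Assumption: the four categories do not overlap.
base_pairs = {
    "traditional": 253_630_708_098,
    "WGS": 2_608_532_210_351,
    "TSA": 193_940_551_226,
    "TLS": 4_531_966_831,
}
records = {
    "traditional": 207_040_555,
    "WGS": 564_286_852,
    "TSA": 214_324_264,
    "TLS": 12_819_978,
}

total_bases = sum(base_pairs.values())
total_records = sum(records.values())

# One terabase is 10**12 base pairs.
print(f"Total base pairs: {total_bases:,}")        # 3,060,635,436,506
print(f"Total terabases:  {total_bases / 1e12:.2f}")  # 3.06
print(f"Total records:    {total_records:,}")      # 998,471,649
```

Under that assumption the release holds roughly 3.06 terabases, with WGS records contributing about 85% of the total.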

NIH Data Science Collaborative Hackathon April 16 – 18, 2018

The NCBI will assist with a data science hackathon to take place on the NIH Campus in Bethesda, Maryland, from April 16-18, 2018.

The hackathon will focus on tools for advanced analysis of biomedical datasets including text, images, next generation sequencing data, proteomics, and metadata. Many individuals who attend these events have already engaged in the use of large datasets or in the development of informatics tools, code, or pipelines; however, researchers who are in the earlier stages of their data science journey, including students and postdocs, are also encouraged to apply. Some projects are available to other non-scientific developers, mathematicians, or librarians.

The event is open to anyone selected for the hackathon and willing to travel to Bethesda, Maryland.

March 21 webinar – Introducing the NCBI Pathogen Detection Isolates Browser

In the next NCBI webinar, you will learn how to use the Pathogen Detection Isolates Browser to search for pathogen isolates, identify closely related isolates of interest, and find pathogens encoding particular antimicrobial resistance genes.

Date and time: Wed, Mar 21, 2018 12:00 PM – 12:30 PM EDT

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

The Pathogen Detection Isolates Browser is a web-based portal that integrates genomic sequences, metadata, antibiotic susceptibility and resistance gene information, and SNP cluster information.

Each year in the U.S. approximately 48 million Americans (approximately 1 in 6) are affected by foodborne illnesses, 128,000 are hospitalized and 3,000 die, as estimated by the CDC. The NCBI Pathogen Detection Project was created in collaboration with FDA, CDC, USDA and others to use whole genome sequencing data for foodborne disease surveillance. Pathogens isolated from patients, food and environmental samples, from state, federal, and other labs, are sequenced and the data submitted in real time to NCBI. The Pathogen Detection analysis pipeline assembles the sequences and compares them to other isolates in its database to identify closely related sequences, thereby facilitating identification of cases involved in an outbreak and potential sources of contamination.
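
To make the idea of "closely related isolates" concrete, here is a toy illustration of grouping isolates by SNP distance. It is not the NCBI Pathogen Detection pipeline, which works on assembled genomes and is far more involved. The function names, the pre-aligned equal-length sequences, and the distance threshold are all assumptions made for this sketch.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count positions where two pre-aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def cluster_isolates(seqs: dict, max_snps: int) -> list:
    """Single-linkage grouping: isolates within max_snps share a cluster."""
    parent = {name: name for name in seqs}

    def find(name: str) -> str:
        # Follow parent links to the cluster root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= max_snps:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


# Hypothetical isolates; real inputs would be whole-genome data.
isolates = {
    "clinical_1": "ACGTACGTAC",
    "food_1": "ACGTACGTAT",
    "env_1": "TTGTACCTAA",
}
clusters = cluster_isolates(isolates, max_snps=1)
print(clusters)
```

In this toy data the clinical and food isolates differ at a single position and fall into one cluster, which is the kind of link that would prompt a closer look at a possible contamination source.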

Bioinformatics paper uses NCBI open data to analyze drug response

A study (PMID: 28158543) published in the July 2017 issue of Bioinformatics collects, classifies and analyzes single nucleotide variants (SNVs) that may affect response to currently approved drugs. The authors identified 2,640 SNVs of interest, most of which occur rarely in populations (minor allele frequency <0.01).

The researchers used protein sequence alignment tools and mined open data from multiple information resources accessed through E-utilities including PubChem Compound (Kim et al., 2016 PMID: 26400175), NCBI Gene (Maglott D, et al., 2014. PMID: 25355515), NCBI Protein (Sayers, 2013), MMDB (Madej et al., 2012 PMID: 22135289), PDB (Berman et al., 2000 PMID: 10592235), dbSNP (Sherry et al., 2001 PMID: 11125122), and ClinVar (Landrum et al., 2016 PMID: 26582918).
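
For readers unfamiliar with E-utilities, the sketch below shows how request URLs for two of its endpoints, ESearch and ESummary, can be assembled. It only builds the URLs and makes no network calls. The helper names and the example search term are illustrative assumptions, not the queries used in the study.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Public base URL for the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that returns matching record IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def build_esummary_url(db: str, ids: list) -> str:
    """Build an ESummary URL for a list of record IDs."""
    params = {"db": db, "id": ",".join(ids), "retmode": "json"}
    return f"{EUTILS_BASE}/esummary.fcgi?{urlencode(params)}"


# Illustrative query: ClinVar records for one gene (hypothetical search term).
search_url = build_esearch_url("clinvar", "CYP2C19[gene]")
# Summary of the study itself, using the PMID quoted above.
summary_url = build_esummary_url("pubmed", ["28158543"])

print(search_url)
print(summary_url)
```

Either URL can then be fetched with any HTTP client. NCBI asks heavy users to include an API key and respect its rate limits, which this sketch leaves out.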

Questions, comments, and other feedback may be sent to Yanli Wang.

Genome Workbench 2.12.8 now available

The Genome Workbench team is proud to present version 2.12.8, with the latest usability improvements and bug fixes.  See the full list of changes in the Genome Workbench release notes.

Some of the improvements include:

  • Improved FASTA format view (context menu) and the addition of an “Expand All” option
  • Improved rendering of internal unaligned regions
  • Automatically open the target folder to export files quickly
  • Installation of automatic PROXY detection
  • Fixed bug in OS version

Genome Workbench is an integrated application for viewing and analyzing sequences. The Genome Workbench can be used to browse data in GenBank and combine data with your own private data.

Expression teasers and indexing added to Gene

Last February, we added gene expression data to Gene. Now, you can access these data in a few new ways.

Figure 1. The expression teaser text from the human CYP2C19 gene record. CYP2C19 is a phase-one drug-metabolism gene expressed in liver and other organs/tissues involved in metabolizing drugs and other xenobiotics.

Expression pattern “teasers” in Summary

We’ve added a brief sentence describing the expression pattern to the Summary section. This teaser sentence describes tissue-specific expression of the gene, with a link to the complete description that appears in the Expression section.

NCBI-UCSC Genomics Hackathon April 2-4, 2018

From April 2-4, 2018, the NCBI will help with a bioinformatics hackathon in Northern California hosted by the University of California, Santa Cruz (UCSC)! The hackathon will focus on advanced bioinformatics analysis of next generation sequencing data, proteomics, and metadata.

This event is for researchers, including students and postdocs, who have already engaged in the use of bioinformatics data or in the development of pipelines for bioinformatics analyses from high-throughput experiments. Some projects are available to other non-scientific developers, mathematicians, or librarians.

The event is open to anyone selected for the hackathon and willing to travel to UCSC.

Participants will be organized into five to eight teams of five to six individuals each. These teams will build pipelines and tools to analyze large datasets within a cloud infrastructure. Potential subjects for this iteration include:

  • Developing a framework for nesting containerized bioinformatics workflows in cloud infrastructure.
  • Extending the GA4GH API to map fastq files
  • Machine learning pipelines for germline rare variants linked to phenotypes
  • A simple, open-source mapper for nanopore data
  • An automated pipeline for named entity recognition from biomedical literature

Please see the application form for more details and additional projects.

EDirect for PubMed starts March 5

Beginning Monday, March 5, 2018, the National Library of Medicine (NLM) will present EDirect for PubMed, part of the Insider’s Guide to Accessing NLM Data.

This newly expanded series of interactive workshops will introduce new users to the basics of using EDirect to access exactly the PubMed data you need, in the format you need. Over the course of five 90-minute sessions (plus an optional “office hours”), students will learn how to use EDirect commands to access PubMed, design custom output formats, create basic data pipelines to get data quickly and efficiently, and develop simple strategies for solving real-world PubMed data-gathering challenges. EDirect requires access to a Unix environment but we will send easy installation instructions for Windows and Mac computers before the class starts. No prior Unix knowledge is required; novice users are welcome!

March 7 NCBI Minute: Textbooks for free on the NCBI Bookshelf

The next NCBI Minute highlights some of the heavily used classic textbooks available (for free!) on the NCBI Bookshelf, points out some that have been recently added, explains why several publishers and authors find the Bookshelf a valuable way to boost their readership, and shows how to join in by adding new textbooks and updating existing ones.

Date and time: Wed, March 7, 2018 12:00 PM – 12:30 PM EST

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

Since 1999, NCBI has worked with publishers and authors to provide an additional way for readers to access their products. The NCBI Bookshelf is housed at the U.S. National Library of Medicine, and its users can freely access books, reports, and documents. Classic textbooks are some of the most popular and heavily used entries, with hundreds of thousands of people using their favorite book every month!