A study (PMID: 28158543) published in the July 2017 issue of Bioinformatics collects, classifies, and analyzes single nucleotide variants (SNVs) that may affect response to currently approved drugs. The authors identified 2,640 SNVs of interest, most of which occur rarely in populations (minor allele frequency <0.01).
The researchers used protein sequence alignment tools and mined open data from multiple information resources accessed through E-utilities including PubChem Compound (Kim et al., 2016 PMID: 26400175), NCBI Gene (Maglott D, et al., 2014. PMID: 25355515), NCBI Protein (Sayers, 2013), MMDB (Madej et al., 2012 PMID: 22135289), PDB (Berman et al., 2000 PMID: 10592235), dbSNP (Sherry et al., 2001 PMID: 11125122), and ClinVar (Landrum et al., 2016 PMID: 26582918).
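As a rough illustration of what access through E-utilities looks like, the sketch below builds an ESummary request URL for two of the PubMed records cited above. It is not the study's own code: the helper name `build_esummary_url` is hypothetical, and the example only constructs the URL rather than sending a request.

```python
from urllib.parse import urlencode

# Public base URL for the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esummary_url(db, ids, retmode="json"):
    """Return an ESummary URL for the given database and record IDs.

    Hypothetical helper for illustration; it does not contact NCBI.
    """
    params = {
        "db": db,
        "id": ",".join(str(i) for i in ids),  # comma-separated UID list
        "retmode": retmode,
    }
    return f"{EUTILS_BASE}/esummary.fcgi?{urlencode(params)}"


# PMIDs taken from the text: the study itself and the PubChem reference.
url = build_esummary_url("pubmed", [28158543, 26400175])
print(url)
```

The same pattern should apply to the other Entrez databases named above by changing the `db` value, assuming the corresponding record identifiers are known.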
Questions, comments, and other feedback may be sent to Yanli Wang.
For the past decade, dbGaP, the database of Genotypes and Phenotypes, has been the worldwide resource for genome-wide data. To celebrate this milestone, NCBI threw a party!
The inaugural article in NLM In Focus’s new series on NLM scientists features Kim Pruitt, PhD. Dr. Pruitt is a staff scientist at NCBI; she heads the Reference Sequence Database, better known as RefSeq.
In the article, Dr. Pruitt shares her career trajectory as well as pearls of wisdom for young scientists.
Click on the picture to read NLM’s profile on Kim Pruitt, PhD.
On August 23, Drs. Stephen Bryant and Evan Bolton received the American Chemical Society (ACS) 2016 Herman Skolnik Award for their work in developing, maintaining, and expanding the National Center for Biotechnology Information’s PubChem database of chemical substances and their biological activities. The award was presented at the ACS 252nd National Meeting & Exposition in Philadelphia.
Figure 1. Drs. Bryant and Bolton receive the American Chemical Society 2016 Herman Skolnik Award.
This post is geared toward fungi researchers as well as RefSeq and BLAST users.
Fungi have unique characteristics that can make it difficult to identify and classify species based on morphology. To address these issues, Conrad Schoch, NCBI’s fungi taxonomist, and Barbara Robbertse, NCBI’s fungi RefSeq curator, in collaboration with outside mycology experts, are curating a set of fungal sequences from internal transcribed spacer (ITS) regions of the nuclear ribosomal RNA genes. This set of standard DNA sequences for fungal taxa not only addresses these difficulties in identifying and classifying fungal species by morphology, but is also essential for analyzing environmental (metagenomics) sequencing studies. The curated ITS sequences, described in a recent article in Database (PMC Free Article), all have associated specimen data and, when possible, are taken from sequences from type materials, ensuring correct species identification and tracking of name changes. This article will show you how to access these ITS sequences and search them using the specialized Targeted Loci BLAST service.
The fungal ITS sequences are a RefSeq Targeted Loci BioProject (PRJNA177353). As you may know, a BioProject is a collection of biological data related to a single initiative; in this case, the goal is to collect and curate fungal sequences from targeted loci – specific molecular markers such as protein coding or ribosomal RNA genes used for phylogenetic analysis.
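Besides the web interface, the records in this BioProject could in principle be listed programmatically. The sketch below builds an E-utilities ESearch URL for Nucleotide records tied to PRJNA177353. The helper name `build_esearch_url` is hypothetical, and using the `[BioProject]` field tag against the `nuccore` database is an assumption worth checking against the Entrez help pages before relying on it.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, retmax=20):
    """Return an ESearch URL; hypothetical helper that sends no request."""
    params = {
        "db": db,            # Entrez database to search
        "term": term,        # query string, including any field tags
        "retmax": retmax,    # maximum number of UIDs to return
        "retmode": "json",
    }
    return f"{ESEARCH}?{urlencode(params)}"


# Assumed query: Nucleotide records linked to the fungal ITS BioProject.
its_url = build_esearch_url("nuccore", "PRJNA177353[BioProject]")
print(its_url)
```

Fetching this URL should return a list of record identifiers, which could then be passed to EFetch to download the sequences themselves.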
A series of press releases, including one by Science Publishing, recently announced the first findings of the Avian Phylogenomics Consortium, which analyzed genome sequences and annotation data for 48 bird genomes representing all of the bird taxonomic orders. All of the sequenced genomes, along with any annotation provided by the submitter, are available in NCBI resources including Assembly, Nucleotide, Protein, the Sequence Read Archive (SRA), and BLAST, or from species-specific GenBank genomes FTP directories. RNA-Seq data for some of the bird species can be found in SRA.
With the exception of three very fragmented assemblies, NCBI annotated the genome assemblies submitted by the Avian Phylogenomics Consortium using NCBI’s Eukaryotic Genome Annotation Pipeline, and these annotations are now part of the RefSeq project. The RefSeq project also generated annotations for an additional 6 bird assemblies, for a total of 51 RefSeq genomes. A summary of all the bird genomes that have RefSeq annotation is here.
Figure 1. A selection of the bird genomes with RefSeq annotation. At the top right is a legend describing resource links for each bird genome. Detailed annotation reports, accessible through the “AR” link in the far right column, are available for those genomes annotated in 2014. RefSeq annotation is on organism-specific BLAST pages (the “B” link) and on FTP (the “F” link). Click on the picture to go to the summary table.
The Tasmanian devil (Sarcophilus harrisii), the last remaining large marsupial carnivore, now faces extinction because of a strange and deadly infection, a transmissible cancer known as Transmissible Devil Facial Tumor Disease (TDFTD). In a previous NCBI Insights post, we discussed gene expression data from the tumors that established their neural origin and showed the tumors were likely derived from Schwann cells. In this post, we’ll consider some of the genome sequencing projects in the NCBI databases and explore evidence that the tumor originated in a different individual than the affected animal, supporting the idea that the tumor cells themselves are infectious agents.
On a typical day, researchers download about 30 terabytes of data from NCBI in an effort to make discoveries. NCBI began providing online access to data in the early 1990s, starting with the GenBank database of DNA sequences. Over the years we’ve greatly expanded the types and quantity of data available. You can now find on our site descriptions and data from experimental studies such as next-generation sequencing projects, bioactivity assays for small molecules, microarray datasets and genome-wide association studies.
The White House recently recognized these efforts by awarding NCBI Director David J. Lipman the “Open Science” Champion of Change Award. The scientific community has recognized the benefits of open data. Access to this information serves as a source of both original and supplemental data for exploration and validation [2-4], which improves the power of experimental data while increasing the speed and decreasing the cost of discovery.
In this post, we summarize three recent cases where researchers used data from an NCBI resource/database to make significant discoveries.
The Tasmanian devil (Sarcophilus harrisii), the last remaining large marsupial carnivore, now faces extinction because of a strange and deadly infection: a transmissible cancer known as Devil Facial Tumor Disease. These tumor infections are apparently passed to other devils through bites during mating or during squabbles over carrion when devils gather to feed. In this unusual situation, the cancer cells themselves are the infectious agent.
The failure of devil immune systems to recognize and destroy the foreign tumor cells may be related to a decline in genetic diversity and may serve as a warning about the vulnerability of species with reduced gene pools. The advent of next-generation sequencing has provided an unprecedented opportunity to track the spread and identify the origin of this unusual zoonosis, as well as to examine the population structure of an endangered mammal and generate a complete genome sequence for this unique marsupial.