Human annotation release 109 for GRCh38.p12 is available in RefSeq


You can now download human annotation release 109 from the FTP site or explore it in the Genome Data Viewer, in the Gene database, and with BLAST.

Highlights in release 109:

  • A total of 20,203 protein-coding genes and 17,871 non-coding genes were annotated.
  • The number of annotated curated transcripts increased by 17%, and the number of genes with two or more curated alternative variants increased by 8%.
  • The annotation includes 6,862 features and 2,075 GeneIDs for non-genic functional elements, such as regulatory regions and known structural elements. For example, see the opsin locus control region (OPSIN-LCR).


Bioinformatics paper uses NCBI open data to analyze drug response


A study (PMID: 28158543) published in the July 2017 issue of Bioinformatics collects, classifies, and analyzes single nucleotide variants (SNVs) that may affect response to currently approved drugs. The authors identified 2,640 SNVs of interest, most of which occur rarely in populations (minor allele frequency <0.01).

The researchers used protein sequence alignment tools and mined open data from multiple information resources accessed through E-utilities, including PubChem Compound (Kim et al., 2016 PMID: 26400175), NCBI Gene (Maglott et al., 2014 PMID: 25355515), NCBI Protein (Sayers, 2013), MMDB (Madej et al., 2012 PMID: 22135289), PDB (Berman et al., 2000 PMID: 10592235), dbSNP (Sherry et al., 2001 PMID: 11125122), and ClinVar (Landrum et al., 2016 PMID: 26582918).
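For readers who have not used E-utilities before, here is a minimal Python sketch of a record-summary request. It is not the study's own pipeline, and the post does not say which E-utilities calls the authors made. The helper names `esummary_url` and `fetch_summaries` are hypothetical. The sketch assumes the public ESummary endpoint with its `db`, `id`, and `retmode` parameters.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esummary_url(db, ids):
    """Build an ESummary request URL (hypothetical helper, for illustration).

    db  -- an Entrez database name such as "pubmed", "gene", or "snp"
    ids -- an iterable of record identifiers (UIDs) in that database
    """
    params = {
        "db": db,
        "id": ",".join(str(i) for i in ids),  # UIDs are comma-separated
        "retmode": "json",                    # ask for JSON output
    }
    # safe="," keeps the commas between UIDs readable in the URL.
    return f"{EUTILS_BASE}/esummary.fcgi?{urlencode(params, safe=',')}"


def fetch_summaries(db, ids, timeout=30):
    """Send the request and parse the JSON response (needs network access)."""
    with urlopen(esummary_url(db, ids), timeout=timeout) as response:
        return json.load(response)


# Example: build the request for the study and one of the cited papers.
print(esummary_url("pubmed", [28158543, 26400175]))
```

Calling `fetch_summaries("pubmed", [28158543])` would send the request. The same pattern should apply to the other Entrez databases named above by changing `db`.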

Questions, comments, and other feedback may be sent to Yanli Wang.

Expression teasers and indexing added to Gene


Last February, we added gene expression data to Gene. Now, you can access these data in a few new ways.


Figure 1. The expression teaser text from the human CYP2C19 gene record. CYP2C19 is a phase-one drug-metabolism gene expressed in liver and other organs/tissues involved in metabolizing drugs and other xenobiotics.

Expression pattern “teasers” in Summary

We’ve added a brief sentence describing the expression pattern to the Summary section. This teaser sentence describes tissue-specific expression of the gene, with a link to the complete description that appears in the Expression section.


5 NCBI articles in 2018 Nucleic Acids Research database issue


The 2018 Nucleic Acids Research database issue features several papers from NCBI staff that cover the status and future of databases including CCDS, ClinVar, GenBank and RefSeq. These papers are also available on PubMed. To read an article, click on the PMID number listed below.


Updated HIV-1 interaction datasets in Gene


We recently updated the HIV-1 interaction datasets in Gene with data provided by the Southern Research Institute (SRI).

The protein interactions dataset now has:

  • 8,005 interactions,
  • 16,215 interaction descriptions,
  • 3,859 proteins encoded by 3,757 human genes,
  • and 6,822 publications.

The replication interactions dataset now has:

  • 1,595 interactions,
  • 1,854 interaction descriptions,
  • 1,583 proteins encoded by 1,583 human genes,
  • and 229 publications.

Data are also available at the RefSeq HIV-1 website and the GeneRIF FTP site.

September 2017: NCBI to present EDirect workshop at NLM


On September 18, 2017, NCBI staff will offer a workshop on EDirect, NCBI’s suite of programs for easy command line access to literature and biomolecular records. To join the workshop, please register.

NOTE: This is an in-person workshop at the National Library of Medicine on the NIH campus in Bethesda, MD, USA. The course is limited to 22 participants.


RefSeq Functional Elements now public


NCBI is pleased to announce the initial data release of RefSeq Functional Elements, a resource that provides RefSeq and Gene records for experimentally validated human and mouse non-genic functional elements. Data can be accessed via Gene, Nucleotide, BLAST, BioProject, Graphical Displays, and FTP.
