About NCBI Staff

The National Center for Biotechnology Information (NCBI), a division of the U.S. National Library of Medicine, provides access to scientific and biomedical databases and software tools for analyzing molecular data, and performs research in computational biology.

New taxonomy files available with lineage, type, and host information


NCBI is now producing a new set of taxonomy files that include the taxonomic lineage of taxa, information on type strains and material, and host information. These files are particularly helpful for people maintaining local installations of NCBI data. You can download the new archive (new_taxdump.tar.gz) from the taxonomy directory on the FTP site (ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/). The new files are typematerial.dmp, typeoftype.dmp, rankedlineage.dmp, fullnamelineage.dmp, taxidlineage.dmp, and host.dmp. Please see the readme file for details of the file contents. The original taxonomy file archive without the new content will remain available under its original name, taxdump.tar.gz. The section below shows the entries for the monkey species Cercopithecus lomamiensis from the new ranked lineage and type material files.


1191211	|	Cercopithecus lomamiensis	|		|	Cercopithecus	|	Cercopithecidae	|	Primates	|	Mammalia	|	Chordata	|	Metazoa	|	Eukaryota	|

1191211	|	Cercopithecus lomamiensis	|	holotype	|	YPM 14080	|
1191211	|	Cercopithecus lomamiensis	|	holotype	|	YPM MAM 14080	|
1191211	|	Cercopithecus lomamiensis	|	paratype	|	YPM 14189	|
1191211	|	Cercopithecus lomamiensis	|	paratype	|	YPM 14191	|
1191211	|	Cercopithecus lomamiensis	|	paratype	|	YPM 14192	|
1191211	|	Cercopithecus lomamiensis	|	paratype	|	YPM MAM 14189	|
1191211	|	Cercopithecus lomamiensis	|	paratype	|	YPM MAM 14191	|
1191211	|	Cercopithecus lomamiensis	|	paratype	|	YPM MAM 14192	|

Top panel: Ranked lineage for Cercopithecus lomamiensis from rankedlineage.dmp. The ranks are species, genus, family, order, class, phylum, kingdom, and superkingdom. Bottom panel: Type material information from typematerial.dmp. The columns are taxonomy id, name, type designation, collection/repository details.
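
For readers loading these files into a local installation, the lines shown above can be split into fields with a few lines of Python. This is a minimal sketch, not NCBI tooling. It assumes that fields are separated by a tab, a vertical bar, and a tab, and that each line ends with a tab and a vertical bar, as in the excerpt above. The function names and column labels are hypothetical and simply follow the caption; the readme file remains the authoritative description of the columns.

```python
# Column labels are assumptions taken from the caption, not from the readme.
RANKED_LINEAGE_COLUMNS = [
    "tax_id", "tax_name", "species", "genus", "family",
    "order", "class", "phylum", "kingdom", "superkingdom",
]
TYPE_MATERIAL_COLUMNS = ["tax_id", "tax_name", "type", "identifier"]


def split_dmp_line(line):
    """Split one .dmp line into a list of stripped field strings."""
    line = line.rstrip("\r\n")
    # Drop the terminating tab + vertical bar, if present.
    if line.endswith("\t|"):
        line = line[:-2]
    return [field.strip() for field in line.split("\t|\t")]


def parse_dmp_line(line, columns):
    """Pair the fields of one line with the given column labels."""
    fields = split_dmp_line(line)
    if len(fields) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
    return dict(zip(columns, fields))


# The two input lines are copied from the excerpt above.
ranked = parse_dmp_line(
    "1191211\t|\tCercopithecus lomamiensis\t|\t\t|\tCercopithecus\t|\t"
    "Cercopithecidae\t|\tPrimates\t|\tMammalia\t|\tChordata\t|\tMetazoa\t|\t"
    "Eukaryota\t|\n",
    RANKED_LINEAGE_COLUMNS,
)
holotype = parse_dmp_line(
    "1191211\t|\tCercopithecus lomamiensis\t|\tholotype\t|\tYPM 14080\t|\n",
    TYPE_MATERIAL_COLUMNS,
)
print(ranked["genus"], ranked["superkingdom"])
print(holotype["type"], holotype["identifier"])
```

The species column is empty for this entry because the taxon is itself a species, so the parser keeps empty fields rather than discarding them.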


February 14th NCBI Minute: How to quickly retrieve a sequence from NCBI


On Wednesday, February 14, 2018, NCBI will present a webinar that will show you how to quickly retrieve sequences in any format from NCBI.

Date & time: Wed, Feb 14, 2018 12:00 PM – 12:30 PM EST

Ever need to quickly grab a protein or nucleotide sequence in FASTA or another format from NCBI? This NCBI Minute will show you how to do this using the nucleotide and protein web pages, an NCBI URL, and, most flexibly, the command-line EDirect client, which accesses the E-utilities API.
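
As a rough illustration of the URL approach, the sketch below builds an E-utilities EFetch request for a FASTA record. It is not taken from the webinar. The helper names are hypothetical and the accession NM_000546 is used only as an example; `db`, `id`, `rettype`, and `retmode` are standard EFetch parameters.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_fasta_url(accession, db="nuccore"):
    """Return an EFetch URL requesting one record as FASTA text."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_fasta(accession, db="nuccore"):
    """Download the FASTA record (needs network access; not called here)."""
    with urlopen(build_fasta_url(accession, db)) as response:
        return response.read().decode("utf-8")


# NM_000546 is only an illustrative accession; substitute your own.
url = build_fasta_url("NM_000546")
print(url)
```

Passing `db="protein"` requests a protein record instead of a nucleotide one.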

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.


North Carolina Research Triangle Hackathon March 12-14, 2018


The UNC Curriculum in Bioinformatics and Computational Biology and NCBI will host a data science hackathon March 12-14, 2018, on the campus of the University of North Carolina at Chapel Hill. Projects addressed during the hackathon will involve general bioinformatics and genomic analyses in addition to text, image, and sequence processing.

This event is for researchers, including students and postdocs, who have already engaged in the use of large datasets or in the development of pipelines for analyses from high-throughput experiments. Some projects may involve other non-scientific developers, mathematicians, or librarians.

To be considered for the event, you must apply and be able to travel to the UNC campus in Chapel Hill (see details below). Applications are due Monday, February 12, 2018, by 11:59 pm ET.


NLM Webinar: Insider’s Guide to Accessing NLM Data: Welcome to E-utilities for PubMed (Tuesday, February 13 at 1pm EST)


Want to do more with PubMed?

Want to extract just the PubMed data you need, in the format you want?

Dreaming of creating your own PubMed tool or interface, but don’t know where to start?

Join us on Tuesday, February 13 at 1pm EST for a one-hour introductory webinar designed to teach you more powerful and flexible ways of accessing NLM data, starting with the Application Programming Interfaces (APIs) for PubMed and other NCBI databases. This presentation is part of the Insider’s Guide, a series aimed at librarians and other information specialists who have experience using PubMed via the traditional Web interface, but now want to dig deeper.

This class will start with the very basics of APIs, before showing you how to get started using the E-utilities API to search and retrieve records from PubMed. The class will also showcase some specific tools and utilities that information specialists can use to work with E-utilities, helping to prepare you for subsequent Insider’s Guide classes. We will finish by looking at some practical examples of E-utilities in the real world, and hopefully inspire you to get out and put these lessons to use!
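
To give a sense of what searching PubMed through E-utilities can look like, here is a minimal sketch that builds an ESearch request and reads the PMID list from a JSON response. It is not course material. The function names are hypothetical, and the response string is a hand-written stand-in with placeholder IDs, not real search output.

```python
import json
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(term, retmax=20):
    """Return an ESearch URL for a PubMed query, asking for JSON output."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def extract_pmids(response_text):
    """Pull the PMID list out of an ESearch JSON response body."""
    return json.loads(response_text)["esearchresult"]["idlist"]


search_url = build_pubmed_search_url("taxonomy files", retmax=5)
print(search_url)

# Stand-in response with placeholder IDs, used only to exercise the parser.
stand_in_response = '{"esearchresult": {"count": "2", "idlist": ["111", "222"]}}'
print(extract_pmids(stand_in_response))
```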

Date and time: Tuesday, February 13, 1:00 pm – 2:00 pm EST

To register or for more information, go to: https://goo.gl/jCWP7A

Questions? Contact us at https://dataguide.nlm.nih.gov/contact


Already know the basics of APIs, and ready to put E-utilities to use? Our hands-on Insider’s Guide course, “EDirect for PubMed”, starts March 5. For registration and more information, go to: https://goo.gl/9jSVRt

Feb 7 webinar “How to Run an NCBI-style Hackathon at Your Institution”


On Wednesday, February 7, 2018, NCBI will present a webinar that will show you how to plan and run an NCBI-style hackathon at your own institution.

Date & time: Wed, Feb 7, 2018 12:00 PM – 1:00 PM EST

Register here: http://bit.ly/2zYU7Kp

NCBI organizes 2-3-day hackathons at sites throughout the United States. In these events, participants work in small collaborative groups on workflows, scripts or applications to create bioinformatic solutions to problems in fields such as text mining, next-gen sequence analysis, medical informatics, and many others. Code from Hackathon projects is available on the public NCBI Hackathon GitHub site.

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

PubMed Commons to be Discontinued


PubMed Commons has been a valuable experiment in supporting discussion of published scientific literature. The service was first introduced as a pilot project in the fall of 2013 and was reviewed in 2015. Despite low levels of use at that time, NIH decided to extend the effort for another year or two in hopes that participation would increase. Unfortunately, usage has remained minimal, with comments submitted on only 6,000 of the 28 million articles indexed in PubMed.

While many worthwhile comments were made through the service during its 4 years of operation, NIH has decided that the low level of participation does not warrant continued investment in the project, particularly given the availability of other commenting venues.

The discontinuation plan is as follows:

  • New comments will be accepted through February 15, 2018.
  • Comments will continue to be visible on the PubMed and PubMed Commons websites through March 2, 2018.
  • Users wishing to access the comments after March 2, 2018, will be able to download them from NCBI’s website.

Many thanks to all of you who participated in this experimental effort to enhance the opportunities for interaction about published biomedical literature.

RefSeq release 86 is now public


RefSeq release 86 is now accessible online, via FTP, and through NCBI’s programming utilities. This full release incorporates genomic, transcript, and protein data available as of January 8, 2018, and contains 149,493,466 records, including 102,133,844 proteins, 21,370,778 RNAs, and sequences from 75,218 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.

Two important notes follow; please see the RefSeq release notes for more information.

Non-human SNP data dropped

Non-human SNPs were dropped from all RefSeq FTP files: from the daily files starting in December 2017, and from this full release (January 2018).

HPRD features removed

We have dropped a set of features, originally imported from HPRD, from human transcript and protein RefSeq records.

5 NCBI articles in 2018 Nucleic Acids Research database issue


The 2018 Nucleic Acids Research database issue features several papers from NCBI staff that cover the status and future of databases including CCDS, ClinVar, GenBank and RefSeq. These papers are also available on PubMed. To read an article, click on the PMID number listed below.


GenBank release 223.0 is available via FTP, Entrez and BLAST


GenBank release 223.0 (12/15/2017) has 206,293,625 traditional records (including non-bulk-oriented TSA) containing 249,722,163,594 base pairs of sequence data. In addition, there are 551,063,065 WGS records containing 2,466,098,053,327 base pairs of sequence data, 201,559,502 TSA records containing 181,394,660,188 base pairs of sequence data, and 12,695,198 TLS records containing 4,458,042,616 base pairs of sequence data.
