GenBank release 221.0 (8/13/2017) has 203,180,606 traditional records containing 240,343,378,258 base pairs of sequence data. In addition, there are 499,965,722 WGS records containing 2,242,294,609,510 base pairs of sequence data, 186,777,106 TSA records containing 167,045,663,417 base pairs of sequence data, and 1,628,475 TLS records containing 824,191,338 base pairs of sequence data.
Entrez Direct is a UNIX/LINUX command-line interface to E-utilities, the API to the NCBI Entrez system. One of Entrez Direct’s most useful features is its ability to parse and reformat the complex XML returned by EFetch. In this post, we will explore how to use these features to parse, reformat and process specific data from PubMed records downloaded in XML using EFetch. Although this post focuses on PubMed, the technique is general and applies to XML returned by E-utilities from any database. The example explored here is also presented briefly in the Entrez Direct documentation; here we’ll dive a bit deeper to see how it works. Let’s get started!
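To make the idea of "parsing and reformatting" EFetch XML concrete, here is a minimal Python sketch of the kind of transformation Entrez Direct's xtract performs, for example with a command such as `xtract -pattern PubmedArticle -element MedlineCitation/PMID ArticleTitle`. This is not how xtract is implemented. It uses the standard-library `xml.etree.ElementTree` module. The input is a hypothetical, heavily trimmed PubMed-style record set, not real EFetch output, and `records_to_rows` is an illustrative helper name. The sketch turns each `PubmedArticle` into one tab-delimited row.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed PubMed-style XML; real EFetch output has many more elements.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>111</PMID>
      <Article>
        <ArticleTitle>First example title</ArticleTitle>
        <AuthorList>
          <Author><LastName>Smith</LastName><Initials>J</Initials></Author>
          <Author><LastName>Doe</LastName><Initials>A</Initials></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>222</PMID>
      <Article>
        <ArticleTitle>Second example title</ArticleTitle>
        <AuthorList>
          <Author><LastName>Lee</LastName><Initials>K</Initials></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def records_to_rows(xml_text):
    """Return one tab-delimited row per PubmedArticle: PMID, title, authors."""
    root = ET.fromstring(xml_text)
    rows = []
    # Treat each PubmedArticle as one "pattern" (record), similar in spirit to xtract -pattern.
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID", default="")
        title = article.findtext(".//ArticleTitle", default="")
        # Collect every author in the record as "LastName Initials".
        authors = [
            f"{a.findtext('LastName', '')} {a.findtext('Initials', '')}".strip()
            for a in article.iter("Author")
        ]
        rows.append("\t".join([pmid, title, ", ".join(authors)]))
    return rows


if __name__ == "__main__":
    for row in records_to_rows(SAMPLE_XML):
        print(row)
```

Each record becomes a single line, so the output can be piped into ordinary UNIX tools such as `sort`, `cut` or `grep`. That is the same reason xtract emits tab-delimited text.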
NCBI, in collaboration with NLM and the National Network of Libraries of Medicine NLM Training Center (NTC) at the University of Utah, recently presented the second offering of A Librarian’s Guide to NCBI. Health sciences librarians from 17 universities and two federal agencies attended the five-day intensive course on the NIH campus. This second offering of the training continues to prepare health sciences librarians to support NCBI molecular databases and tools and to train patrons in the use of NCBI resources at their own institutions.
As before, all the course materials are available online. Feel free to learn from them, adapt them for your own teaching, and share them with others. You can use the links below to access the updated 2014 course materials. These include the slide sets with demonstrations and practice problems.