Exploring Entrez Direct: Parsing the XML Output of E-utilities


Entrez Direct is a UNIX/Linux command-line interface to E-utilities, the API to the NCBI Entrez system. One of Entrez Direct’s most useful features is its ability to parse and reformat the complex XML data returned by EFetch. In this post, we will explore how to use this feature to parse, reformat and process specific data from PubMed records downloaded as XML with EFetch. Though this post focuses on PubMed, the technique is general and applies to XML returned by E-utilities from any database. The example explored here is also presented briefly in the Entrez Direct documentation; here we’ll dive in a bit deeper to see how it works. Let’s get started!
