Given the size of modern sequence databases, finding the complete genome sequence for a bacterium among the many other partial sequences can be a challenge. In addition, if you want to download sequences for many bacterial species, an automated solution might be preferable.
In this post we’ll discuss how to download bacterial genomes programmatically for a list of species using the E-utilities, the application programming interface (API) to NCBI’s Entrez system of databases. We’ll also take advantage of NCBI’s redesigned Genome database, which links all genome sequences for a given species to one record, making it easy to obtain the desired sequences once you find the right Genome record. In principle you can apply the procedure below to other simple genomes that are represented by a single sequence. Future posts will address additional considerations that apply to complex, eukaryotic genomes.