This full release incorporates genomic, transcript, and protein data available as of March 13, 2019. It contains 192,722,653 records, including 135,670,032 proteins, 25,840,272 RNAs, and sequences from 88,816 organisms.
We have made many improvements to the search functionality on NCBI’s global search page to help users find prokaryotic assemblies and genes. These improvements highlight the best results and link to related NCBI content, so you don’t have to sift through pages of results or navigate between different NCBI resources.
Given the size of modern sequence databases, finding the complete genome sequence for a bacterium among the many partial sequences can be a challenge. And if you want to download sequences for many bacterial species, an automated solution is often preferable to searching by hand.
In this post we’ll discuss how to download bacterial genomes programmatically for a list of species using the E-utilities, the application programming interface (API) to NCBI’s Entrez system of databases. We’ll also take advantage of NCBI’s redesigned Genome database, which links all genome sequences for a given species to one record, making it easy to obtain the desired sequences once you find the right Genome record. In principle you can apply the procedure below to other simple genomes that are represented by a single sequence. Future posts will address additional considerations that apply to complex, eukaryotic genomes.
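As a rough illustration of the workflow described above (search the Genome database for a species, follow links to its sequence records, then fetch the sequences), here is a minimal Python sketch that only builds the E-utilities request URLs and parses the IDs out of an ESearch XML response. The helper names (`esearch_url`, `elink_url`, `efetch_url`, `parse_esearch_ids`) are hypothetical and introduced for this example, and the choice of `nuccore` as the link target and `fasta` as the return type is an assumption rather than the exact procedure of this post.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base URL of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(species, db="genome"):
    """Build an ESearch URL that looks up a species by organism name."""
    params = {"db": db, "term": f"{species}[orgn]"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def elink_url(genome_ids, dbfrom="genome", db="nuccore"):
    """Build an ELink URL from Genome record IDs to linked sequence records.

    Assumption: the nucleotide sequences are reached via the nuccore database.
    """
    params = {"dbfrom": dbfrom, "db": db, "id": ",".join(genome_ids)}
    return f"{EUTILS_BASE}/elink.fcgi?{urlencode(params)}"


def efetch_url(seq_ids, db="nuccore", rettype="fasta"):
    """Build an EFetch URL that returns the sequences as plain text."""
    params = {
        "db": db,
        "id": ",".join(seq_ids),
        "rettype": rettype,
        "retmode": "text",
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


def parse_esearch_ids(xml_text):
    """Extract the record IDs from an ESearch XML response."""
    root = ET.fromstring(xml_text)
    return [elem.text for elem in root.findall("./IdList/Id")]
```

To process a list of species, you could loop over the names, request each `esearch_url(name)` with an HTTP client of your choice, pass the parsed IDs to `elink_url`, and finally download the result of `efetch_url`. The sketch deliberately leaves out the network calls, rate limiting, and error handling that a real script would need.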