This blog post is intended for geneticists and dataflow engineers who need to compare genetic variants.
Have you ever tried to determine if two genetic variants are the same? If so, you’re not alone. Doing so means dealing with competing ways to represent variants, handling ambiguous assignments, and reconciling updates to the underlying sequence models. To help you with these problems, we’re introducing a new set of web services for comparing and grouping variants.
NCBI has scheduled the next round of HTTPS tests, following up on the initial tests performed on September 15.
The schedule for these tests is as follows (all times are EDT):
NCBI offers extensive collections of sequences through its BLAST services (http://blast.ncbi.nlm.nih.gov) for comparing and identifying DNA, RNA and protein sequences. NCBI now deposits descriptions of these sequence collections, known as BLAST databases, in a special database called blastdbinfo that you can access through the Entrez Programming Utilities (E-Utilities). Using blastdbinfo, you can enable a program to find an appropriate database and then send BLAST searches to that database using either the BLAST URL API or standalone BLAST (installed locally).
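As a rough illustration of the first step, the sketch below builds an ESearch request against blastdbinfo and extracts the record IDs from the XML that ESearch returns. The helper names (`blastdbinfo_search_url`, `parse_esearch_ids`) and the example query term are hypothetical, chosen only for this sketch; the only details taken from the E-utilities themselves are the `esearch.fcgi` endpoint, its `db`, `term` and `retmax` parameters, and the `IdList`/`Id` elements of the response.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base address of the E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def blastdbinfo_search_url(term, retmax=20):
    """Build an ESearch URL that queries the blastdbinfo database.

    `term` is a free-text Entrez query; which fields blastdbinfo indexes
    is not covered here, so treat any query string as an assumption.
    """
    params = {"db": "blastdbinfo", "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def parse_esearch_ids(xml_text):
    """Return the record IDs listed in an ESearch XML response."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall("./IdList/Id")]
```

To run a search, the URL could be fetched with `urllib.request.urlopen(url).read()` and the decoded body passed to `parse_esearch_ids`. The resulting IDs identify blastdbinfo records, whose descriptions a program could then inspect to pick a database for the BLAST URL API or standalone BLAST.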
Given the size of modern sequence databases, finding the complete genome sequence for a bacterium among the many other partial sequences can be a challenge. In addition, if you want to download sequences for many bacterial species, an automated solution might be preferable.
In this post we’ll discuss how to download bacterial genomes programmatically for a list of species using the E-utilities, the application programming interface (API) to NCBI’s Entrez system of databases. We’ll also take advantage of NCBI’s redesigned Genome database, which links all genome sequences for a given species to one record, making it easy to obtain the desired sequences once you find the right Genome record. In principle you can apply the procedure below to other simple genomes that are represented by a single sequence. Future posts will address additional considerations that apply to complex, eukaryotic genomes.
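The general shape of that procedure can be sketched as three E-utilities requests: search the Genome database for a species, follow links from the Genome record to nucleotide records, and fetch those records as FASTA. This is a minimal sketch rather than the exact procedure from the post: the function names are hypothetical, the `[Organism]` query and the unfiltered genome-to-nuccore link are assumptions, and a real pipeline would likely need to restrict the links so that only complete genome sequences are returned. The code only builds URLs and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def eutils_url(tool, **params):
    """Build a URL for one E-utility (e.g. esearch, elink, efetch)."""
    return f"{EUTILS_BASE}/{tool}.fcgi?{urlencode(params)}"


def genome_search_url(species):
    """Step 1: find the Genome record for a species name."""
    return eutils_url("esearch", db="genome", term=f"{species}[Organism]")


def genome_to_nuccore_url(genome_id):
    """Step 2: follow links from a Genome record to nucleotide records.

    Assumption: no link name is given, so every linked sequence is
    returned; filtering to complete genomes is left out of this sketch.
    """
    return eutils_url("elink", dbfrom="genome", db="nuccore", id=genome_id)


def fasta_fetch_url(nuccore_ids):
    """Step 3: download the linked nucleotide records as FASTA text."""
    return eutils_url("efetch", db="nuccore", id=",".join(nuccore_ids),
                      rettype="fasta", retmode="text")


def plan_searches(species_list):
    """Build the step-1 search URL for each species in a list."""
    return [genome_search_url(name) for name in species_list]
```

For a list of species, `plan_searches` yields one search URL per name; the IDs parsed from each response would feed `genome_to_nuccore_url`, and the linked IDs would feed `fasta_fetch_url`.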