Announcing NCBI Datasets – try it out!

NCBI introduces Datasets, a new resource that lets you easily gather data from across NCBI databases. Our first release allows you to find and download genomic sequence and annotation data for all eukaryotic organisms through our user-friendly web interface.

Our web interface also provides an interactive taxonomy tree that lets you browse for your favorite organism. We are currently testing the web interface in the NCBI labs environment. To try it out, enter a taxonomic name or assembly accession and click on the ‘Get Data’ button in the search results panel.

Here’s what it looks like when you search ‘apes’:

Figure 1. Searching NCBI for “apes” brings up a box labeled “GENOMES”.

For bacterial and viral genomes, use our command-line interface or RESTful API.

Getting genome data is more intuitive than ever. You can find the data you need by entering an assembly accession, an NCBI Taxonomy ID, or a taxonomic name (scientific or common) at any taxonomic rank (e.g., ‘apes’ or ‘Lactobacillus’).
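To make the three kinds of search input concrete, here is a minimal Python sketch that guesses which kind a query string is. It is only an illustration, not how NCBI Datasets resolves queries: the function name `classify_query` is hypothetical, and the sketch assumes that assembly accessions follow the usual `GCA_`/`GCF_` pattern (nine digits plus a version) and that any all-digit string is a Taxonomy ID.

```python
import re

# Assumption: assembly accessions look like GCA_000001405.29 or GCF_000001405.40
# (prefix, nine digits, a dot, and a version number).
ASSEMBLY_ACCESSION = re.compile(r"^GC[AF]_\d{9}\.\d+$")


def classify_query(query):
    """Guess whether a query is an assembly accession, a Taxonomy ID, or a name.

    Hypothetical helper for illustration only.
    """
    text = query.strip()
    if ASSEMBLY_ACCESSION.match(text):
        return "assembly accession"
    if text.isascii() and text.isdigit():
        return "taxonomy id"
    # Anything else is treated as a scientific or common name.
    return "taxonomic name"


for example in ("GCF_000001405.40", "9606", "apes", "Lactobacillus"):
    print(example, "->", classify_query(example))
```

The check is deliberately loose: it cannot tell whether a name or ID actually exists, which is what the search itself is for.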

Figure 2. Check the box next to any assembly you’d like to download.

By default, the download includes the genomic FASTA sequence and a detailed data report for your selected assemblies, delivered as a zip file in the BDBag format. You can also include annotation data, such as GFF3, GBFF, and transcript and protein FASTA sequences. Give NCBI Datasets a try today and let us know what you think!
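Once you have a download, a few lines of Python are enough to see what is inside before unpacking it. The sketch below counts the files in a zip archive by extension using only the standard library. The function name `summarize_archive` is hypothetical, and the sketch makes no assumption about the archive's internal layout or file names.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(zip_source):
    """Count the files in a zip archive by extension.

    zip_source may be a file path or a binary file-like object.
    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # Directory entries end with "/", so skip them and keep real files.
        members = [name for name in archive.namelist() if not name.endswith("/")]
    return Counter(
        PurePosixPath(name).suffix.lower() or "(no extension)" for name in members
    )
```

For example, calling `summarize_archive("ncbi_dataset.zip")` on your own download (the file name here is a placeholder) returns a `Counter` that you can print to check that the sequence and annotation files you selected are present.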
