Download Assembled Genome Data Programmatically with NCBI Datasets

As previously announced, NCBI’s Assembly and Genome record pages will be redirected to new NCBI Datasets pages in June 2023. The NCBI Datasets Command Line Interface (CLI) tools provide straightforward programmatic downloads of assembled genome sequence data. We invite you to check them out and let us know what you think!

Features & Benefits of NCBI Datasets
  • Get assembled genome sequence, annotation, and metadata, including transcripts and proteins, in one easy step. 
  • Querying is easy and flexible! Retrieve data using organism name, assembly accession, or BioProject accession. 
  • Request data for multiple assemblies in one request – it is now simpler and faster to download large amounts of data. 
  • Metadata is derived from multiple databases, and the metadata schemas are documented.
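
The query forms listed above can be sketched roughly as follows. This is a minimal dry-run sketch under stated assumptions, not official usage: the `run` helper, the output file names, and the `PRJNA000000` BioProject accession are placeholders introduced for illustration, and the `--include` values assume a recent version of the `datasets` CLI. By default the script only prints each command; set `RUN=1` to execute them with the CLI installed.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print the command unless RUN=1, in which case execute it.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Query by organism name, limited to the reference assembly.
run datasets download genome taxon human --reference --filename human_reference.zip

# Query by assembly accession; several accessions can share one request.
# --include selects which file types go into the package (assumed values).
run datasets download genome accession GCF_000001405.40 GCF_000001635.27 \
  --include genome,gff3,rna,protein --filename human_mouse.zip

# Query by BioProject accession (placeholder accession; replace with a real one).
run datasets download genome accession PRJNA000000 --filename bioproject_genomes.zip
```

Printing the commands first makes it easy to review a batch of requests before any large download starts.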


For example, to get sequence and metadata for the human reference genome, you could use the datasets command line tool with the following command: 

datasets download genome taxon human --reference --filename 
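
A download like this produces a zip archive. The sketch below shows one way to unpack and inspect it. The archive name `human_reference.zip` and the output directory are placeholders chosen for illustration, and the `ncbi_dataset/data` layout is an assumption based on the current data package format, so adjust the paths if your package differs.

```shell
#!/bin/sh
set -eu

ARCHIVE="${1:-human_reference.zip}"   # placeholder name; pass your own as $1
OUTDIR="${ARCHIVE%.zip}"              # unpack into a directory named after the archive

if [ -f "$ARCHIVE" ]; then
  # -o overwrites files from an earlier run, -q keeps the output quiet.
  unzip -o -q "$ARCHIVE" -d "$OUTDIR"
  # Assumed layout: metadata reports plus one directory per assembly accession.
  find "$OUTDIR/ncbi_dataset/data" -type f | sort
else
  echo "archive not found: $ARCHIVE" >&2
fi
```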

E-Utilities & EDirect remain available as additional command line options

Don’t worry! You can still search using Entrez syntax from the Assembly and Genome homepages. There will be no changes to how you access the Assembly and Genome databases using E-Utilities or EDirect. 
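
For comparison, an EDirect lookup of assembly records might look like the sketch below. The query string and the `assembly_accessions` function name are illustrative assumptions rather than a recommended search; the script skips the query when EDirect is not on the `PATH`.

```shell
#!/bin/sh
set -eu

# Entrez query for the latest human assemblies (illustrative query text).
QUERY='Homo sapiens[Organism] AND latest[filter]'

# Print the accession and assembly name of each matching Assembly record.
assembly_accessions() {
  esearch -db assembly -query "$1" |
    esummary |
    xtract -pattern DocumentSummary -element AssemblyAccession AssemblyName
}

if command -v esearch >/dev/null 2>&1; then
  assembly_accessions "$QUERY"
else
  echo "EDirect not found on PATH; skipping query" >&2
fi
```

The accessions returned this way could then be passed to `datasets download genome accession` to fetch the corresponding sequence data.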

Learn more

For more details about NCBI Datasets programmatic access to genome assembly data, see our how-to guide. 

Stay up to date

NCBI Datasets is a part of the NIH Comparative Genomics Resource (CGR). CGR facilitates reliable comparative genomics analyses for all eukaryotic organisms through an NCBI Toolkit and community collaboration.    

Follow us on Twitter @NCBI and join our mailing list to keep up to date with NCBI Datasets and other CGR news.    


If you have questions or would like to provide feedback, please reach out to us at 
