Downloading NCBI Biological Data and Creating Custom Reports Using the Command Line
This workshop concluded on April 25, 2023. The workshop materials are available here.
This workshop is for biological researchers who would like to incorporate NCBI command-line clients into their workflows to access and process NCBI molecular data and metadata. You will use both the EDirect suite and the Datasets command-line interface (CLI) to download gene sequences, genome assemblies, and their associated metadata, and to create custom reports that cross-reference biological features and sequence data. You do not need prior experience with EDirect or the Datasets CLI tools (datasets and dataformat), but to get the most out of this workshop you will need to be familiar with NCBI databases and comfortable using the Unix/Linux shell.
In this workshop you will learn how to:
- Use the EDirect suite to search for and collect sequence and gene data across NCBI databases
- Incorporate the EDirect XML parser Xtract into workflows to create and format custom reports
- Use the Datasets CLI to access and download genome sequences and metadata in order to build custom databases
- Use the dataformat tool to generate reports from downloaded genome metadata to classify and filter genomes by biological criteria
- Incorporate these tools into workflows with other bioinformatic tools such as BLAST
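To make the objectives above more concrete, here is a minimal sketch of the kind of commands involved. It is not taken from the workshop materials. The function names, default query, default taxon, and output file name are hypothetical, and the sketch assumes that EDirect and a recent release of the Datasets CLI are installed and on your PATH. Flags and field names can differ between releases, so check each tool's built-in help before relying on them.

```shell
#!/bin/sh
# Illustrative sketch only: function names and default arguments are hypothetical.

# Search the Gene database, fetch document summaries as XML, and use xtract
# to print one tab-delimited row per record.
gene_report() {
    query=${1:-"BRCA1[gene] AND human[orgn]"}
    esearch -db gene -query "$query" |
        efetch -format docsum |
        xtract -pattern DocumentSummary -element Id Name Description
}

# Download a genome data package for a taxon, limited to reference genomes,
# then summarize the package metadata as a TSV report with dataformat.
genome_report() {
    taxon=${1:-"Saccharomyces cerevisiae"}
    package=${2:-genomes.zip}
    datasets download genome taxon "$taxon" --reference \
        --include genome --filename "$package" &&
        dataformat tsv genome --package "$package" \
            --fields accession,organism-name,assminfo-level
}
```

After sourcing the file, calling gene_report or genome_report with no arguments runs the default query or taxon. Both functions need network access to NCBI, and the genome download can be large for broad taxa.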
Because of curricular and technical limits, we’ve capped the number of spots to provide the best workshop experience. If you apply, you will be notified of your application status two weeks before the scheduled event.
To participate in the hands-on exercises, we recommend a laptop or desktop computer with a stable internet connection and a modern web browser.