An NCBI Guide to Finding and Analyzing Metagenomic Data
This workshop concluded on October 28, 2021. Workshop materials are available here.
Metagenomics is the study of genetic material collected from a mixed community of organisms in its original environment. Recent efforts in this field have used clinical and environmental samples to characterize the taxonomic composition and biological properties of microbial communities in their native ecosystems. This work often involves large, complex, multi-organism datasets and requires specialized tools to separate the individual components of a sample. For years, NCBI has stored and made available metagenomic sequences, along with metadata describing their environmental origins, in a variety of formats. More recently, we have also created tools for analyzing metagenomic data, such as MagicBLAST (a tool to align high-throughput sequence reads to a reference sequence) and STAT (a tool to map high-throughput sequence read data onto a taxonomic hierarchy).
This workshop is designed to give researchers hands-on experience accessing and analyzing metagenomic data with the relevant resources available on the NCBI website. Some exercises will also use a Jupyter Notebook to access and process these data with Bash and Python.
In this workshop you will learn how to:
- Search and retrieve metagenomic sequence data from multiple NCBI databases
- Explore STAT-generated taxonomic composition analyses of SRA metagenomic reads
- Align SRA metagenomic reads to an NCBI sequence database using MagicBLAST
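
As a rough illustration of the first and last items above, the following Python sketch builds an NCBI E-utilities `esearch` URL for the SRA database and assembles a MagicBLAST command line. It is not taken from the workshop materials. The helper names, the search term, the accession `SRR0000000`, the database name `my_reference_db`, and the output file name are all hypothetical placeholders, and the sketch only constructs the URL and the command without running either.

```python
from urllib.parse import urlencode

# Public E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_sra_search_url(term, retmax=20):
    """Return an esearch URL that queries the SRA database for `term`."""
    params = {"db": "sra", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def build_magicblast_command(sra_accession, db_name, out_path, threads=4):
    """Return an argument list that aligns an SRA run against a BLAST database."""
    return [
        "magicblast",
        "-sra", sra_accession,         # SRA run accession to align
        "-db", db_name,                # BLAST database of reference sequences
        "-out", out_path,              # output file for the alignments
        "-num_threads", str(threads),
    ]


if __name__ == "__main__":
    # Hypothetical search term and accession, for illustration only.
    print(build_sra_search_url("gut metagenome[Organism]"))
    print(" ".join(build_magicblast_command(
        "SRR0000000", "my_reference_db", "alignments.sam")))
```

In a notebook, the URL could be fetched with any HTTP client and the argument list passed to `subprocess.run`, assuming MagicBLAST is installed and a suitable BLAST database has been built.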