As reported in the journal Plant Disease, a recent collaboration between the National Library of Medicine’s NCBI and the U.S. Department of Agriculture’s Animal and Plant Health Inspection Service (APHIS) analyzed public sequence records for Colletotrichum, an important genus of fungal plant pathogens that poses a significant threat to food production. Colletotrichum species are challenging to identify accurately, and public sequences may contain out-of-date taxonomic information. The study improved the accuracy of species names assigned to Colletotrichum database sequences, verified a comprehensive set of reliable reference markers for the genus, and produced a multi-marker tree as well as the genome-based interactive tree shown in Figure 1.
Figure 1. Views from the genome-assembly-derived multi-protein distance tree showing the analysis of publicly available Colletotrichum genomes. The interactive tree is available online. You can browse, search, download, and export the tree. For example, a search of the tree shows that assembly GCA_002901105.1 was incorrectly labeled as Colletotrichum gloeosporioides. Searching the tree for the name “Colletotrichum gloeosporioides” highlights two clades. Clicking the node for the Truncatum species complex and then clicking “Show descendants” expands the clade and shows that assembly GCA_002901105.1, which was labeled as C. gloeosporioides, clusters with the Truncatum species complex. You can find more details on the tree-building process in the supplementary material for the publication and on GitHub.
This post is geared toward fungi researchers as well as RefSeq and BLAST users.
Fungi have unique characteristics that can make it difficult to identify and classify species based on morphology. To address this, Conrad Schoch, NCBI’s fungi taxonomist, and Barbara Robbertse, NCBI’s fungi RefSeq curator, in collaboration with outside mycology experts, are curating a set of fungal sequences from the internal transcribed spacer (ITS) regions of the nuclear ribosomal RNA genes. This set of standard DNA sequences for fungal taxa not only helps with identification and classification where morphology falls short, but is also essential for analyzing environmental (metagenomics) sequencing studies. The curated ITS sequences, described in a recent article in Database (PMC Free Article), all have associated specimen data and, when possible, are derived from type materials, which ensures correct species identification and tracking of name changes. This article will show you how to access these ITS sequences and search them using the specialized Targeted Loci BLAST service.
The fungal ITS sequences belong to a RefSeq Targeted Loci BioProject (PRJNA177353). As you may know, a BioProject is a collection of biological data related to a single initiative. In this case, the goal is to collect and curate fungal sequences from targeted loci: specific molecular markers, such as protein-coding or ribosomal RNA genes, used for phylogenetic analysis.
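If you prefer to reach the BioProject records programmatically, the following is a minimal sketch that uses NCBI's public E-utilities ESearch endpoint. It is not part of the workflow described in this post. The function names are hypothetical, and the sketch assumes that the Nucleotide database accepts `PRJNA177353[BioProject]` as a search term and that ESearch returns its usual JSON layout (`esearchresult` with `count` and `idlist`).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(bioproject="PRJNA177353", retmax=5):
    """Build an ESearch URL for Nucleotide records linked to a BioProject.

    Assumption: the [BioProject] field tag selects records by BioProject
    accession in the nuccore database.
    """
    params = {
        "db": "nuccore",
        "term": f"{bioproject}[BioProject]",
        "retmode": "json",
        "retmax": retmax,
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def parse_esearch(payload):
    """Return (total_count, id_list) from an ESearch JSON response string.

    Assumption: the response has the layout
    {"esearchresult": {"count": "...", "idlist": [...]}}.
    """
    result = json.loads(payload)["esearchresult"]
    return int(result["count"]), result["idlist"]


def fetch_ids(bioproject="PRJNA177353", retmax=5):
    """Query ESearch over the network and return (total_count, id_list)."""
    with urlopen(build_esearch_url(bioproject, retmax), timeout=30) as resp:
        return parse_esearch(resp.read().decode("utf-8"))
```

With network access, calling `fetch_ids()` should return the total number of matching records and the first few record identifiers. The URL building and response parsing are kept separate from the network call so they can be checked offline.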