This post is geared toward fungi researchers as well as RefSeq and BLAST users.
Fungi have unique characteristics that can make it difficult to identify and classify species based on morphology. To address this problem, Conrad Schoch, NCBI’s fungi taxonomist, and Barbara Robbertse, NCBI’s fungi RefSeq curator, in collaboration with outside mycology experts, are curating a set of fungal sequences from the internal transcribed spacer (ITS) regions of the nuclear ribosomal RNA genes. This set of standard DNA sequences for fungal taxa offers a molecular alternative to morphology-based identification and is also essential for analyzing environmental (metagenomics) sequencing studies. The curated ITS sequences, described in a recent article in Database (PMC Free Article), all have associated specimen data and, when possible, are derived from type material, which ensures correct species identification and tracking of name changes. This post will show you how to access these ITS sequences and search them using the specialized Targeted Loci BLAST service.
The fungal ITS sequences belong to a RefSeq Targeted Loci BioProject (PRJNA177353). As you may know, a BioProject is a collection of biological data related to a single initiative. In this case, the goal is to collect and curate fungal sequences from targeted loci, that is, specific molecular markers such as protein-coding or ribosomal RNA genes used for phylogenetic analysis.
As of this writing, there are 2,813 sequences representing a diverse set of 2,720 fungal species. You can retrieve the entire set by following the link from the BioProject record or from the RefSeq Targeted Loci page, which also provides information about other rRNA Targeted Loci projects. To retrieve only the sequences from type material, add “sequence from type”[Filter] to the query provided by the BioProject link. You can also download the complete set from the genomes area of the NCBI FTP site.
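For readers who prefer scripting to following links, the query described above can be sketched in Python. This is only an illustration: it assumes the BioProject link resolves to a query of the form `PRJNA177353[BioProject]` (the post does not spell out the query), and the function names are hypothetical. The code only builds the query string and an E-utilities ESearch URL; it does not contact NCBI.

```python
from urllib.parse import urlencode

# Public E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_its_query(bioproject="PRJNA177353", type_only=False):
    """Build an Entrez query for the fungal ITS RefSeq set.

    Assumption: the BioProject link corresponds to a query of the form
    ACCESSION[BioProject]; check the actual link before relying on this.
    """
    query = f"{bioproject}[BioProject]"
    if type_only:
        # Filter text taken from the post, written with straight quotes.
        query += ' AND "sequence from type"[Filter]'
    return query


def build_esearch_url(query, retmax=20):
    """Return an ESearch URL that searches the nucleotide database (nuccore)."""
    params = {"db": "nuccore", "term": query, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"
```

Calling `build_esearch_url(build_its_query(type_only=True))` gives a URL you could paste into a browser or fetch with any HTTP client; the same query string can also be typed directly into the Nucleotide search box.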
The ITS reference sequences contain the 5.8S ribosomal RNA gene and the flanking internal transcribed spacer regions (ITS1 and ITS2) as well as the proximal portion of the 28S rRNA gene when available. A graphical view and feature table of the reference ITS region record (NR_111838) from Pseudogymnoascus destructans (formerly Geomyces destructans), the causative agent of white-nose syndrome in hibernating bats (PubMed), are shown in Figure 1.
The fungal ITS reference sequences are useful for identifying unknown fungal ITS sequences in BLAST searches. You can search them on the new Targeted Loci BLAST page to quickly assign a name or find a closely related fungal species.
Select the Internal transcribed spacer region (ITS) database from the pull-down list. You can also check the “Sequences from type material” box to search only those sequences associated with type material or cultures. Figure 2 shows the settings needed on the Targeted Loci BLAST form.
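The post describes the web form only. As a rough local analogue, the sketch below assembles a BLAST+ `blastn` command line for the same kind of search. It assumes BLAST+ is installed and that a local copy of the fungal ITS database is available under the name `ITS_RefSeq_Fungi`; that name, the function name, and the file name in the usage note are assumptions, not something the post states. The type material checkbox has no counterpart in this sketch.

```python
import shlex


def build_blastn_command(query_fasta, db="ITS_RefSeq_Fungi", max_hits=10):
    """Assemble a blastn argument list (the command is not executed here).

    Assumption: `db` names a locally installed nucleotide BLAST database
    holding the fungal ITS reference sequences.
    """
    return [
        "blastn",
        "-query", query_fasta,             # FASTA file with the unknown ITS sequence
        "-db", db,                         # local database name (assumed)
        "-outfmt", "6",                    # tab-separated tabular output
        "-max_target_seqs", str(max_hits),
    ]


def as_shell_line(command):
    """Render the argument list as a single shell-safe string."""
    return shlex.join(command)
```

For example, `as_shell_line(build_blastn_command("unknown_its.fasta"))` returns a line you can paste into a terminal, and the argument list itself can be passed to `subprocess.run` once the database is in place.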
The Targeted Loci BLAST page is especially helpful when a search of the default database on the main BLAST page finds best matches to environmental fungal sequences that have incomplete taxonomic information. Figure 3 shows the results of a BLAST search against the RefSeq Fungal ITS sequences using an uncultured fungus ITS clone sequence (INSDC Accession: DQ421263) as a query. The best hit is the Penicillium subrubescens CBS 132785 ITS sequence (NR_111863) with a single mismatch in the alignment.
The same search against the default nucleotide database, even with the option to exclude “Uncultured/environmental sample sequences” selected, finds a large number of incompletely identified records that push the Penicillium subrubescens ITS sequence hit to position 70 in the output (not shown), making it difficult to assign the most likely source organism for the unidentified query sequence.
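If you save BLAST results in tabular form, the effect described here (a curated reference hit buried under less informative records) can be checked with a few lines of Python. The sketch below is a hypothetical helper, not part of any NCBI tool. It assumes tab-separated output with the subject identifier in the second column, and it treats any identifier containing `NR_` as a RefSeq reference record.

```python
def first_refseq_rank(tabular_text, marker="NR_"):
    """Return the 1-based rank of the first subject whose ID contains `marker`.

    Rank counts distinct subject sequences in the order they appear, so
    several alignments to the same subject count once. Returns None when
    no matching subject is found.
    """
    seen = set()
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not a hit line
        subject = fields[1]
        seen.add(subject)
        if marker in subject:
            return len(seen)
    return None
```

A low rank from a search of the curated ITS set, compared with a high rank from the default nucleotide database, reflects the difference the paragraph above describes.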
The RefSeq fungal ITS sequences are an essential resource for fungal phylogenetic studies and for the analysis and identification of fungal sequences from environmental sequencing projects. The linkage to type materials makes them particularly valuable for assigning accurate names. Currently, RefSeq records represent most of the fungal orders. NCBI curators will continue to expand the set to improve coverage at the family and genus levels.