The impact of fungal diseases on human health has often been neglected, but increased association of fungal infections with severe illness and death during the COVID-19 pandemic has brought fungal diseases into the spotlight.
According to the CDC, the most common fungal co-infections in patients with COVID-19 include aspergillosis or invasive candidiasis including healthcare-associated infection from Candida auris. Other reported diseases are mucormycosis, coccidioidomycosis and cryptococcosis. Aspergillosis is commonly caused by Aspergillus fumigatus, mucormycosis by Rhizopus species, coccidioidomycosis by Coccidioides immitis and C. posadasii and cryptococcosis by Cryptococcus neoformans.
This post explores several NCBI resources that have relevant information about the fungal pathogens implicated in these COVID-19 related illnesses.
Correctly identified and annotated genome assemblies are available for the fungal taxa implicated as co-infections in COVID-19 patients are summarized in table below. These and many other fungi are also available as curated RefSeq genome assemblies.
The genomes table (Figure 1) now offers filters for:
Reference genomes — switch it on to only show reference or representative genomes
Annotated — switch it on to only show annotated genomes
Assembly level — use the assembly level slider to select higher-quality genomes
Year released — use the slider to limit your results to recent genomes
In addition, the new Actions column connects you to NCBI’s Genome Data Viewer, BLAST, and Assembly. The Text filter box lets you search by the name of the assembly, species/infraspecies, or submitter.Figure 1. The new Datasets Genomes page with primate assemblies showing the STATUS switches (reference genomes, annotated); expanded filters section with ASSEMBLY LEVEL and YEAR RELEASED sliding selectors; and the Actions column menu with access to Assembly details, BLAST, the Genome Data Viewer, and Download options. Continue reading “Introducing the new NCBI Datasets Genomes page”→
The NCBI Assembly database now provides sequence and metadata for more than 1 million genome assemblies from over 85,000 different species.
Assembly crossed the 1 million genome assemblies milestone on Sunday, April 18, 2021 (Figure 1).
Figure 1. Assembly status and growth. More than 1 million assemblies are now searchable through the NCBI web site (top panel). The number of genome assemblies at NCBI has accelerated rapidly in the past decade.
NCBI’s genome Assembly has a number of significant improvements!
Assembly records now have a link to Primer-BLAST making it easy to design primers in the context of a specific eukaryote genome assembly. Figure 1 shows the Assembly page for the Genome Reference Consortium Mouse Build 39 (GRCm39) with the link to Primer-BLAST.
We have updated the bacterial and archaeal representative genome collection! The current collection contains over 13,000 assemblies selected from the 203,000 prokaryotic RefSeq assemblies to represent their respective species. The collection has increased by 11% since August 2020. We’ve included about 1,400 species for the first time, have used better assemblies for 1,177 species, and have removed 65 species because of changes in NCBI Taxonomy or uncertainty in their species assignment.
We have updated the collection of representative genome assemblies for Bacteria and Archaea. As announced in April, this set is now recalculated three times a year. We selected a total of 11,727 prokaryotic assemblies to represent their respective species among the 192,000 assemblies in RefSeq. Six hundred and thirty-five species were included in the collection for the first time, while 395 organisms from undefined species (such as Bacillus bacterium) were removed. We were able to choose a higher-quality representative than in the previous set for 18% of Bacterial and Archaeal species due to improvements in the logic of the selection that is now based on the assembly length, number of pseudo CDSs called in the PGAP annotation, number of scaffolds, whether Gene IDs are available in the Gene database for the assembly that is currently representative, and type strain status. You can see the exact criteria in order of importance on the Prokaryotic RefSeq Genomes page. Now that the new selection process is in place, we expect future updates to have fewer changes. We will replace a representative only if the assembly has changed RefSeq status or if a substantially better assembly becomes available.
You can download the reference and representative set from the Assembly resource. If you are interested in the annotation on these genomes, you can limit searches to proteins annotated on representative genomes by adding “refseq_select[filter]” to any query in the Protein database. For example, you can find all proteins annotated on representative genomes in the genus Klebsiella by using the query: “Klebsiella[organism] AND refseq_select[filter]“. A BLAST database of proteins annotated on representative genomes will be coming soon. Stay tuned!
The Prokaryote type strain report provides information on type-strains for over 18,000 species. We revised and expanded the report to make it easier to identify cases where sequencing or establishing type material would have the biggest impact on improving prokaryote taxonomy and accurate identification. These cases include species with designated type strains but without a sequenced type strain assembly and species without designated type material. We hope that the community will prioritize sequencing type strains for the former set of species (Table 1) and establishing a neotype or reftype, where applicable (as defined in Cuifo et al 2018) for the latter set (Table 2).
Table 1. The top 10 candidate species for sequencing type-strains sorted by the number of assemblies. These have designated type strains but no type strain assembly. We generated the list by sorting by “number of assemblies from type materials per species”, then by decreasing “number of assemblies per taxon”, then filtering out “type materials and coidentical strains” = “na”.
Table 2. The top 10 candidates for proposing a reftype assembly, or neotype where applicable sorted by the number of assemblies. These species have no designated type strain. We generated the list by selecting for “type materials and coidentical strains” = “na”, “number of assemblies from type materials per species” = 0, and sorting by decreasing “number of assemblies per taxon”, then filtering out Candidatus.
As we described in an earlier post, GenBank uses average nucleotide identity (ANI) analysis to find and correct misidentified prokaryotic genome assemblies. You can now access ANI data for the more than 600,000 GenBank bacterial and archaeal genome assemblies through a downloadable report (ANI_report_prokaryotes.txt) available from the genomes/ASSEMBLY_REPORTS area of the FTP site. The README describes the contents of the report in detail. You can use the ANI data to evaluate the taxonomic identity of genome assemblies of interest for yourself.
The new ANI_report_prokaryotes.txt replaces the older ANI_report_bacteria.txt in the same directory. We are no longer updating the ANI_report_bacteria.txt file and will remove it after 31st May 2020.
We have updated the collection of representative and reference assemblies for Bacteria and Archaea to better reflect the taxonomic breadth of the prokaryotes in RefSeq. We chose the 11,478 representative assemblies in the new collection from the 180,000+ prokaryotic assemblies in RefSeq today. We have selected one representative or reference assembly for every species based on several criteria including contiguity, completeness and whether the assembly is from type material. We have also updated the reference and representative microbial Blast database to reflect these changes. This reference and representative set will be updated three times a year to reflect changes in RefSeq. In addition, as we announced on Feb 14, we have reduced the number of reference genome assemblies — the subset of representative assemblies with annotation provided by outside experts — to 15. See the list in our previous post . We have re-annotated the 104 assemblies that are no longer reference with or Prokaryotic Genome Annotations Pipel (PGAP).