We are excited to introduce a Foreign Contamination Screen (FCS) tool that you can now run yourself, with enhanced contaminant detection sensitivity to improve your genome assemblies and facilitate high-quality data submissions to GenBank. If you submit genome assembly data to GenBank, the FCS tool is for you!
What is the FCS tool?
FCS is a quality assurance process that makes data suitable for submission. It consists of two parts: FCS-adaptor and FCS-GX. FCS-adaptor searches for short sequences that are used in lab preparation and sometimes end up in the final assembly by mistake. FCS-GX searches for sequences from a wide range of organisms, including bacteria, fungi, protists, viruses, and others, to identify sequences that don’t appear to come from the intended organism. In each case, you receive a report of the coordinates and identities of potential contaminants to review and remove (see Figure 1 for a sample FCS-GX summary report). Both tools can screen eukaryote and prokaryote genomes.
Figure 1. FCS-GX report showing the summary of contamination identified in a tomato genome. The output indicates there are 83 sequences, adding up to 381 kb total length, to be removed from a mix of insect, fungal, and bacterial sources.
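To give a feel for how a summary like the one in Figure 1 could be tallied from per-region findings, here is a minimal Python sketch. It does not read the real FCS-GX report format: the `Hit` record layout, the `summarize` helper, and the example records are all hypothetical and made up for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One flagged region. Hypothetical layout, not the real FCS-GX report schema."""
    seq_id: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    source: str  # broad label for the suspected contaminant, e.g. "bacteria"


def summarize(hits):
    """Return {source: (number of distinct sequences, total flagged bases)}.

    Assumes the flagged regions do not overlap; overlapping regions
    would be counted twice in the base total.
    """
    seqs = defaultdict(set)
    bases = defaultdict(int)
    for h in hits:
        seqs[h.source].add(h.seq_id)
        bases[h.source] += h.end - h.start + 1
    return {src: (len(seqs[src]), bases[src]) for src in seqs}


# Made-up example records, for illustration only.
hits = [
    Hit("scaffold_12", 1, 4500, "bacteria"),
    Hit("scaffold_12", 9001, 9500, "bacteria"),
    Hit("scaffold_40", 1, 12000, "fungi"),
    Hit("scaffold_77", 101, 2100, "insects"),
]

summary = summarize(hits)
for source, (n_seqs, n_bases) in sorted(summary.items()):
    print(f"{source}\t{n_seqs} seqs\t{n_bases} bp")
```

Grouping by source makes it easy to see which kinds of contaminant dominate before you review individual coordinates. To work with actual FCS output, follow the file formats described in the FCS documentation on GitHub rather than this simplified layout.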
How do I use FCS?
FCS is available from GitHub. Simply download the two programs (FCS-adaptor and FCS-GX), and follow a few steps as outlined in the Quickstart. Both tools are also easy and inexpensive to run on commercial clouds such as Amazon Web Services (AWS) or Google Cloud Platform (GCP), and can screen genomes in a fraction of the time of other approaches.
Why is FCS important?
Accurate research conclusions depend on high-quality data. By rapidly detecting contaminants from foreign organisms in assembled genomes, FCS helps ensure that submitted data is high-value and ready for reuse. We’ve already used FCS-GX to remove over one hundred megabases of contaminants and thousands of erroneous genes and proteins from previously submitted eukaryote genomes to make the data more useful for all.
We want to hear from you!
We will update the FCS tool based on your feedback, so try it out and let us know what you think. Please contact us with comments and suggestions.
FCS is part of the NIH Comparative Genomics Resource (CGR), an NLM project to establish an ecosystem to facilitate reliable comparative genomics analyses for all eukaryotic organisms.
Join our mailing list to keep up to date with FCS and other CGR news.