The Tasmanian devil (Sarcophilus harrisii), the last remaining large marsupial carnivore, now faces extinction because of a strange and deadly infection: a transmissible cancer known as Devil Facial Tumor Disease. These tumor infections are apparently passed to other devils through bites during mating or during squabbles over carrion when devils gather to feed. In this unusual situation, the cancer cells themselves are the infectious agent.
The failure of devil immune systems to recognize and destroy the foreign tumor cells may be related to a decline in genetic diversity and may serve as a warning about the vulnerability of species with reduced gene pools. The advent of next-generation sequencing has provided an unprecedented opportunity to track the spread and identify the origin of this unusual transmissible cancer, as well as to examine the population structure of an endangered mammal and generate a complete genome sequence for this unique marsupial.
One way for you to access Tasmanian devil data at NCBI is through the BioProject database, which consolidates links to all of the data related to a study in a single place. If you search this database with the term “Tasmanian devil”, you will retrieve five BioProject records: three are genome sequencing projects (PRJNA65325, PRJNA51853, and PRJNA167725) that will be the subjects of a future post on the devil, and two are next-generation transcriptome sequencing projects focusing on mRNA (PRJNA79479) and miRNA (PRJNA118101). We will take a look at these RNA data in the present post.
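The same BioProject search can also be scripted. The sketch below only builds a query URL for the NCBI E-utilities `esearch` service and does not contact the server. The helper name `bioproject_search_url` is made up for this illustration, and the number of records returned today may differ from the five described above.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities esearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def bioproject_search_url(term, retmax=20):
    """Build an esearch URL for the BioProject database (illustrative helper)."""
    params = {"db": "bioproject", "term": term, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)

url = bioproject_search_url("Tasmanian devil")
print(url)
```

Opening the resulting URL in a browser, or with `urllib.request.urlopen`, should return an XML list of matching record IDs.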
Elizabeth Murchison and colleagues report on the mRNA and miRNA transcriptomes in a study that shows the remarkable potential of next-generation sequencing data to provide rapid insights into tissue-specific gene expression (PMCID: 2982769).
Let’s first look at the sequence data generated by the mRNA experiment, reported in PRJNA79479. The data are next-generation mRNA and microRNA (miRNA) expression profiles of tumor and normal testicular (testis) tissue and are available in the NCBI Sequence Read Archive (SRA). The testis transcriptome data are in experiment SRX010967. The facial tumor data are in SRX010966. Each experiment comprises about a million reads from three sequencing runs. Together, they provide a gene expression snapshot of the two tissues. Murchison and colleagues report that the tumor sample is enriched in transcripts typical of nerve tissue, which is consistent with a Schwann cell origin. Nerve-specific transcripts present at high levels in the tumor include myelin protein zero (MPZ), myelin basic protein (MBP), and nerve growth factor receptor (NGFR). The pro-opiomelanocortin (POMC) gene, which is normally expressed in the pituitary gland, also shows elevated expression in the tumor.
Despite the large size of these datasets, you can perform some analysis on them using tools on the NCBI website. The SRA transcriptomes have been processed and added to NCBI’s SRA BLAST service. The testis and tumor samples are available as separate databases listed under Sarcophilus harrisii. You can compare the relative level of expression for any of the genes listed above by searching these two databases.
For example, searching each of these databases with the Tasmanian devil POMC-like transcript (XM_003757795) shows that reads matching this gene product are much more abundant in the facial tumor than in the testis database, as shown in the BLAST graphical overview immediately below. Of course, to make this a useful comparison, you must consider the sizes of the two databases. In this case, the tumor transcriptome is smaller (888,453 sequences; 152,473,966 bases) than the testis transcriptome (1,357,698 sequences; 237,435,784 bases), so the larger number of hits in the tumor database cannot be explained by database size and confirms the high level of this transcript in the tumor.
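To make the size correction concrete, here is a minimal sketch that converts raw BLAST hit counts into hits per million database sequences. The database sizes are the ones quoted above. The hit counts are placeholders chosen only for illustration, not results from the actual searches, and `hits_per_million` is a hypothetical helper.

```python
# Database sizes (number of sequences) quoted in the text.
TUMOR_DB_SIZE = 888_453
TESTIS_DB_SIZE = 1_357_698

def hits_per_million(hits, db_size):
    """Scale a raw hit count to hits per million database sequences."""
    if db_size <= 0:
        raise ValueError("db_size must be positive")
    return hits * 1_000_000 / db_size

# The testis database is about 1.5 times the size of the tumor database.
size_ratio = TESTIS_DB_SIZE / TUMOR_DB_SIZE

# Placeholder hit counts (NOT real results); substitute the counts
# reported by your own BLAST searches.
tumor_hits = 100
testis_hits = 100

tumor_rate = hits_per_million(tumor_hits, TUMOR_DB_SIZE)
testis_rate = hits_per_million(testis_hits, TESTIS_DB_SIZE)
print(f"size ratio (testis/tumor): {size_ratio:.2f}")
print(f"tumor: {tumor_rate:.1f} hits per million")
print(f"testis: {testis_rate:.1f} hits per million")
```

Even with identical raw counts, the normalized rate is higher in the smaller tumor database, so a raw excess of tumor hits understates the real difference.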
You can run these two BLAST searches yourself by following these links:
Now, let’s look at the other transcriptome study in the BioProject database (PRJNA118101), which links to an SRA submission (SRA010797) of Illumina-generated sequence reads of miRNAs from five tumors and ten normal tissue samples. Here, we will look for evidence of the brain-associated microRNA 338 (MIR338), which is highly represented in the tumor samples compared with the normal tissue samples.
Although the miRNA reads in these samples are too short to search effectively using BLAST, the SRA Run Browser allows you to quickly count the number of reads for a particular sequence using the Filter search. If you retrieve the data for the facial tumor (GSM458090), load SRR034113, select the “Reads” tab, and filter the reads with the sequence of the 22-base 3’ stem portion of MIR338 (TCCAGCATCAGTGATTTTGTTG), you can see that 126,963 out of 2.6 million reads contain this miRNA sequence, as shown in the output below. Repeating the steps with a non-cancerous tissue such as the liver (GSM458084, SRR034107) finds only 38 out of 1.7 million reads. The much higher level of MIR338 expression in the tumor samples helps support the assertion that the devil facial tumor has a neural and potentially a Schwann cell origin.
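The Run Browser filter amounts to counting the reads that contain a given sequence. The sketch below does the same for reads held in memory and then normalizes the counts quoted above to reads per million. It assumes the filter behaves like an exact substring match, the total read counts are the rounded values from the text, and the function names and toy reads are invented for illustration.

```python
# 22-base MIR338 sequence quoted in the text.
MIR338_3P = "TCCAGCATCAGTGATTTTGTTG"

def count_reads_containing(reads, query):
    """Count reads that contain `query` as an exact substring."""
    query = query.upper()
    return sum(1 for read in reads if query in read.upper())

def reads_per_million(matching, total):
    """Scale a matching-read count to reads per million total reads."""
    return matching * 1_000_000 / total

# Toy reads (made up): two contain the MIR338 sequence, one does not.
toy_reads = [
    MIR338_3P,
    "AC" + MIR338_3P + "GT",
    "TGAGGTAGTAGGTTGTATAGTT",
]
toy_count = count_reads_containing(toy_reads, MIR338_3P)

# Counts quoted in the text; totals are rounded, so results are approximate.
tumor_rpm = reads_per_million(126_963, 2_600_000)
liver_rpm = reads_per_million(38, 1_700_000)
fold = tumor_rpm / liver_rpm
print(toy_count, round(tumor_rpm), round(liver_rpm, 1), round(fold))
```

With these rounded totals, the tumor sample has roughly 49,000 matching reads per million against about 22 per million in liver, a difference of around 2,000-fold.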
In a future post on this topic, we’ll look at nuclear and mitochondrial genomes for the Tasmanian devil. These data were obtained from normal cells as well as tumor samples. This information provides a way to look at the population structure and diversity of the wild Tasmanian devil population, and also provides insight into the evolution and spread of a cancer that metastasizes to other individuals.