Gene Expression Counts on NCBI RefSeq Eukaryotic Genomes

We’re rolling out exciting new features to NCBI RefSeq’s Eukaryote Genome Annotation Pipeline (EGAP)! You can now explore the gene expression observed in different RNA-seq datasets using our newly added gene expression counts. The counts are computed with featureCounts from the EGAP-produced RefSeq annotation and the RNA-seq runs that are aligned with the STAR aligner as part of the annotation process.

For example, RNA-seq data for gorilla from PRJNA414978 show a shift in expression of immune-related genes after treatment to induce a strong interferon-mediated response, which can be visualized with tools such as DESeq2. 

Screenshot of a gorilla gene heat map

Figure 1: Differential expression profiles of innate immunity genes in response to polyinosinic:polycytidylic acid transfection in gorilla. The heatmap shows z-scores of featureCounts expression values transformed by DESeq2 for genes with one-to-one orthologs among all non-human primates (y-axis). Replicates are grouped by transfection condition (treatment vs. mock) and donor sample. Gene list obtained from Gaska et al. (2019).
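The per-gene z-scores mentioned in the Figure 1 caption can be illustrated with a small sketch. This is not the workflow used to produce the figure: it swaps in a plain log2(count + 1) transform in place of DESeq2's transformation, uses the population standard deviation, and runs on made-up counts. The gene names, sample order, and helper functions `log2_transform` and `row_zscores` are all hypothetical.

```python
import math
import statistics


def log2_transform(counts):
    """Return log2(count + 1) per value (a simple stand-in, not DESeq2's transform)."""
    return [math.log2(c + 1) for c in counts]


def row_zscores(values):
    """Center and scale one gene's values across samples; constant rows become zeros."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation (an assumption)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mean) / sd for v in values]


# Hypothetical raw counts: gene -> [mock_1, mock_2, treated_1, treated_2]
counts = {
    "GENE_A": [10, 12, 240, 260],
    "GENE_B": [500, 480, 510, 495],
}

zscores = {gene: row_zscores(log2_transform(row)) for gene, row in counts.items()}

for gene, row in zscores.items():
    print(gene, [round(z, 2) for z in row])
```

Each row of z-scores is centered on zero, so a gene induced by treatment shows negative values in the mock samples and positive values in the treated ones, which is what the color scale of a heatmap like Figure 1 conveys.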

RNA-seq expression graphs

We’ve provided both aggregate and per-sample RNA-seq expression graphs in our Genome Data Viewer (GDV) browser for nearly a decade, and the underlying data are now available to download. As a pilot to gauge interest, we’re providing expression graphs in the community-standard bigWig format for each RNA-seq run aligned with the STAR aligner in EGAP. You can use these data with other resources and tools to explore gene expression levels across the genome.

Where to find it

featureCounts and bigWig files are available for over 200 EGAP-annotated genomes on the NCBI genomes FTP site. For example, see the files available for the recent gorilla assembly. 

Check out the following relevant files: 

GCF_029281585.1-RS_2023_04_gene_expression_counts.txt.gz
GCF_029281585.1-RS_2023_04_rnaseq_alignment_summary.txt
GCF_029281585.1-RS_2023_04_rnaseq_runs.txt
bigWig files: RNASeq_coverage_graphs
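
Once downloaded, a gzipped counts file can be read with the Python standard library alone. The sketch below is only a starting point: this post does not describe the column layout, so the code assumes a tab-delimited table with one header row, a gene identifier in the first column, and integer counts in the remaining columns. Check the header of the real file and adjust accordingly. The function names and the demo table are hypothetical.

```python
import csv
import gzip
import io


def read_counts(handle):
    """Parse a tab-delimited counts table into {gene_id: {column_name: count}}.

    Assumes the first line is a header and the first column identifies the gene.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    columns = header[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        table[row[0]] = {name: int(value) for name, value in zip(columns, row[1:])}
    return table


def read_counts_gz(path):
    """Open a gzipped counts file in text mode and parse it with read_counts."""
    with gzip.open(path, "rt", newline="") as handle:
        return read_counts(handle)


# Tiny in-memory demo table with made-up gene and run names
demo = "gene\tRUN_1\tRUN_2\nGENE_A\t10\t240\nGENE_B\t500\t510\n"
table = read_counts(io.StringIO(demo))
print(table["GENE_A"]["RUN_2"])
```

For a real download, you would call `read_counts_gz` with the local path to the `gene_expression_counts.txt.gz` file, provided its layout matches the assumptions above.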

Stay up to date

These new features were developed as part of the NIH Comparative Genomics Resource (CGR). CGR facilitates reliable comparative genomics analyses for all eukaryotic organisms through an NCBI Toolkit and community collaboration.     

Follow us on social @NCBI and join our mailing list to keep up to date with RefSeq and other CGR news. 

Questions?

If you have questions or would like to provide feedback, please write to our help desk.   
