NCBI RefSeq has finished its initial annotation of the new mouse reference assembly, GRCm39, recently released by the Genome Reference Consortium. This is the first coordinate-changing update to the mouse reference since the 2012 release of GRCm38, resolving over 400 issues, almost doubling the scaffold N50, closing almost half the gaps, and adding 1.9 Mb of sequence. It’s a big deal!

Figure 1. The Genome Data Viewer showing the annotation for the mouse pseudoautosomal region, including four genes that were previously missing: Sts, Nlgn4l, Akap17a, and 2510022D24Rik.
NCBI RefSeq is the first resource to release an annotation for GRCm39, referred to as Mus musculus Annotation Release 109. It includes:
- Over 45 thousand manually curated transcripts, nearly matching the set we recently annotated on GRCm38.p6, to ease the transition to the new genome.
- A recalculated set of models, using over 17 billion RNA-seq reads and 76 million PacBio and Oxford Nanopore long transcriptome reads as supporting evidence.
- New predictions for 3% of annotated genes.
One improved region is the pseudoautosomal region (PAR) shared between the X and Y chromosomes, which has been particularly challenging to assemble. The ChrX PAR in GRCm39 includes four genes that were previously missing: Sts, Nlgn4l, Akap17a, and 2510022D24Rik (Figure 1).
We’ll be working from this new assembly for future curation and digging into the regions that help improve gene representation.
You can download the annotation from our FTP site, or from our new NCBI Datasets service. It’s also available in NCBI’s Genome Data Viewer, including RNA-seq expression tracks from 58 samples, assembly alignments to GRCm38 and other mouse assemblies, and more. You can use NCBI’s Remap service to convert genomic data you have based on GRCm38 or other mouse assemblies to the new reference assembly coordinates.
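Once you have downloaded the annotation, a quick sanity check is to look up a few genes of interest, such as the newly annotated PAR genes. The following is a minimal Python sketch, assuming the annotation is a GFF3 file and that each gene feature carries the gene symbol in its `Name` attribute. The function name `find_genes` is hypothetical, and the sample lines use made-up sequence names and coordinates purely for illustration; they are not real GRCm39 positions.

```python
from typing import Dict, Iterable, Set, Tuple
from urllib.parse import unquote

# (sequence ID, start, end, strand)
Location = Tuple[str, int, int, str]


def find_genes(gff3_lines: Iterable[str], wanted: Set[str]) -> Dict[str, Location]:
    """Return the location of each wanted gene symbol found in GFF3 lines.

    Assumption: gene symbols are stored in the `Name` attribute of
    `gene` or `pseudogene` features.
    """
    found: Dict[str, Location] = {}
    for line in gff3_lines:
        # Skip blank lines, directives, and comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # A GFF3 feature line has exactly nine tab-separated columns.
        if len(cols) != 9 or cols[2] not in ("gene", "pseudogene"):
            continue
        # Column 9 holds semicolon-separated tag=value pairs.
        attrs = dict(
            pair.split("=", 1) for pair in cols[8].split(";") if "=" in pair
        )
        name = unquote(attrs.get("Name", ""))
        if name in wanted:
            found[name] = (cols[0], int(cols[3]), int(cols[4]), cols[6])
    return found


# Made-up example lines (placeholder sequence ID and coordinates).
example = [
    "##gff-version 3",
    "seqX\tExample\tgene\t100\t500\t.\t+\t.\tID=gene-Sts;Name=Sts",
    "seqX\tExample\tmRNA\t100\t500\t.\t+\t.\tID=rna-1;Parent=gene-Sts",
    "seqX\tExample\tgene\t900\t1500\t.\t-\t.\tID=gene-Other;Name=Other",
]
hits = find_genes(example, {"Sts", "Akap17a"})
print(hits)
```

To run this against a real file, pass an open file handle instead of the `example` list, for instance `find_genes(open(path), {"Sts", "Nlgn4l", "Akap17a", "2510022D24Rik"})`, where `path` points to your decompressed GFF3 download.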