We have updated our annotation for the mouse reference genome, GRCm38.p6. It includes:
- Markup for RefSeq Select, which identifies one representative transcript and protein for every protein-coding gene. Find features with the ‘tag=RefSeq Select’ attribute in GFF3 for those analyses where you need just a single transcript or protein for each coding gene. You can also find these RefSeqs in Entrez using the query ‘refseq_select[filter]’.
- Annotation updates made in the last year for over 2000 genes, including over 4000 new or revised curated transcripts. This includes targeted curation to ensure we are representing well-expressed and conserved transcripts for inclusion in RefSeq Select.
- Annotation of over 2300 regulatory and other functional element features from over 900 biological regions. These are now identified with the source “RefSeqFE” in GFF3 column 2 for easy parsing.
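If you want to pull these two kinds of features out of the GFF3 yourself, here is a minimal Python sketch of one way to do it. It assumes standard GFF3 layout (nine tab-separated columns, with `key=value` attributes separated by semicolons in column 9). The function names and the sample lines are made up for illustration and are not taken from the actual annotation files.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Turn GFF3 column 9 ('key=value;key=v1,v2') into a dict of value lists."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" not in field:
            continue
        key, _, value = field.partition("=")
        # GFF3 percent-encodes reserved characters; commas separate multiple values.
        attrs[unquote(key)] = [unquote(v) for v in value.split(",")]
    return attrs


def split_features(lines):
    """Return (refseq_select_rows, refseqfe_rows) from an iterable of GFF3 lines."""
    select, functional = [], []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        if cols[1] == "RefSeqFE":  # column 2 is the source
            functional.append(cols)
        if "RefSeq Select" in parse_attributes(cols[8]).get("tag", []):
            select.append(cols)
    return select, functional


# Invented sample lines, for illustration only (not real annotation records).
SAMPLE = [
    "##gff-version 3",
    "chrTest\tBestRefSeq\tmRNA\t100\t900\t.\t+\t.\tID=rna-EXAMPLE1;Parent=gene-EXAMPLE;tag=RefSeq Select",
    "chrTest\tBestRefSeq\tmRNA\t100\t800\t.\t+\t.\tID=rna-EXAMPLE2;Parent=gene-EXAMPLE",
    "chrTest\tRefSeqFE\tenhancer\t1500\t1800\t.\t.\t.\tID=id-EXAMPLE3",
]

select_rows, fe_rows = split_features(SAMPLE)
for cols in select_rows:
    print("RefSeq Select:", parse_attributes(cols[8])["ID"][0])
for cols in fe_rows:
    print("RefSeqFE:", cols[2], cols[3], cols[4])
```

For a real file you would pass an open text handle instead of `SAMPLE`, for example one from `gzip.open(path, "rt")` if the file is gzip-compressed. Matching on the parsed `tag` values rather than on a raw substring keeps the filter working if a feature carries more than one tag.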
When citing, please refer to this annotation as NCBI Mus musculus Annotation Release 108.20200622. You can find the data in:
- NCBI Datasets
- NCBI Assembly
- Genomes FTP
- NCBI’s Genome Data Viewer
- BLAST nr/nt and custom RefSeq databases
This is our last update before upgrading to the new major assembly version just released by the Genome Reference Consortium, GRCm39. We expect to be cranking up our compute farm in the next few weeks to produce a full annotation based on our latest curation and extensive short-read (Illumina) and long-read (PacBio IsoSeq and nanopore) RNA-seq data. That annotation should be released later this summer. Stay tuned!