NCBI RefSeq has finished its initial annotation of the new rat reference assembly, mRatBN7.2, recently released by the Darwin Tree of Life Project at the Wellcome Sanger Institute. This is the first coordinate-changing update to the rat reference since the 2014 release of Rnor_6.0 from the Rat Genome Sequencing Consortium and brings the rat assembly into the modern age with a nearly 300x increase in contig N50 and 9x increase in scaffold N50 lengths. It’s a major improvement!
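For readers less familiar with the N50 metric mentioned above: it is the length of the contig (or scaffold) at which the pieces of that length or longer together cover at least half of the assembly, so a larger value means a less fragmented assembly. The sketch below is only an illustration of that definition. The function name `n50` and the toy length lists are made up for this example and are not real rat assembly data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy data: the same 100 bases split two different ways.
fragmented = [10, 10, 10, 20, 20, 30]  # many short contigs
contiguous = [40, 60]                  # few long contigs

print(n50(fragmented))  # 20
print(n50(contiguous))  # 60
```

In this toy case the total length is unchanged, but joining the pieces into longer contigs triples the N50, which is the kind of change the fold increases above describe.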
NCBI RefSeq is the first resource to release an annotation for mRatBN7.2, referred to as Rattus norvegicus Annotation Release 108 (AR108). Details of this annotation, including statistics on the annotation products, the input data used in the pipeline and intermediate alignment results, can be found here.
The new assembly and annotation greatly reduce the number of artificially duplicated genes: the BUSCO duplicated gene score drops by 50%, to a level comparable to that of the mouse GRCm39 AR109 annotation. For example, a mis-assembly in the previous assembly duplicated a 12 kb region of chr20, resulting in extra copies of the Tnf, Lta, and Ltb genes. We have now resolved these to single genes comparable to the orthologs in the human and mouse genomes (see Figure 1).
Figure 1. The new assembly and annotation with single copies of the Lta, Tnf, and Ltb genes (top) compared to the old assembly containing gene duplications (bottom).
These improvements help identify nearly 250 more 1:1 orthologs with the human genome, which are available through NCBI Gene. We’ll be working from this new assembly to further refine the annotation and to help identify assembly regions that need additional improvement. Partners in the Genome Reference Consortium will undertake those assembly improvements.
You can download the annotation from our FTP site, or from our new NCBI Datasets service. It is also available in NCBI’s Genome Data Viewer, including RNA-seq expression tracks from 258 samples, assembly alignments to Rnor_6.0 and other rat assemblies, and more. You can use NCBI’s Remap service to convert genomic data you may have based on Rnor_6.0 or other rat assemblies to the new reference assembly coordinates.