Human GRCh37 (hg19) RefSeq annotation update 

The NCBI RefSeq group has been in overdrive, making improvements to our human genome annotation and reference transcript and protein sets, with 8,000 new and 15,000 updated transcripts in the last year alone! That’s about 30% of our curated transcript dataset (the transcripts with NM_ and NR_ accessions), with a big focus on transcripts that are well-expressed, have conserved exons, or are transcribed from new promoters.

With all these improvements, we’ve been updating the RefSeq annotation of GRCh38.p13 every quarter. But what about GRCh37 (hg19), which many of you still use?

We’ve heard your requests, and we’ve now released an updated annotation for GRCh37.p13, referred to as Homo sapiens Annotation Release 105.20201022, including a complete set of the latest curated RefSeq transcripts. It’s available for download by FTP or from Datasets, and for browsing in NCBI’s Genome Data Viewer (GDV).

Figure 1. You can browse the updated annotation (at top) in GDV. Two sets of NMs (A, B) have 5′ ends not previously represented on GRCh37, and the 5′ UTR shows a greater diversity of alternative splicing (C). The previous annotation is shown directly below.

This annotation also marks up ‘RefSeq Select’ transcripts, giving clinical labs a complete set of the latest transcripts we recommend for clinical reporting in the context of GRCh37. Note that about 5% of RefSeq Select transcripts differ in sequence from GRCh37, and these differences should be taken into account when reporting variants. You can spot these transcripts by a note on their annotated mRNA and CDS features that alerts you to the differences. BAM alignment files are also available on FTP to help with remapping variants between the genome and transcripts.
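As an illustration of what remapping through such alignments involves, here is a minimal Python sketch that walks a CIGAR string to convert a genomic position into an offset along an aligned transcript. It assumes (our assumption, not a description of the FTP file layout) that each BAM record has the transcript as the query and the GRCh37 chromosome as the reference, with a 0-based alignment start; the function name `genome_to_transcript` is hypothetical. Insertions (I) and deletions (D) in the CIGAR are where transcript and genome disagree, so they shift the mapping, while N marks introns. For transcripts on the minus strand, BAM stores the reverse complement, so the returned offset would need to be flipped (transcript length - 1 - offset).

```python
import re

# One CIGAR operation: a length followed by an operation code.
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def genome_to_transcript(ref_start, cigar, genome_pos):
    """Map a 0-based genome position to a 0-based offset along the aligned transcript.

    Hypothetical helper for illustration. Assumes the transcript is the BAM query,
    the chromosome is the reference, and ref_start is the 0-based alignment start
    (BAM POS - 1). Returns None if genome_pos falls in an intron (N), a deletion (D),
    or outside the alignment.
    """
    ref = ref_start  # current genome coordinate
    tx = 0           # current transcript offset
    for length_str, op in _CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op in "M=X":
            # Aligned bases advance both genome and transcript together.
            if ref <= genome_pos < ref + length:
                return tx + (genome_pos - ref)
            ref += length
            tx += length
        elif op in "ISH":
            # Transcript bases with no genome counterpart (insertions, clipped ends).
            tx += length
        elif op in "DN":
            # Genome bases with no transcript counterpart (deletions, introns).
            if ref <= genome_pos < ref + length:
                return None
            ref += length
        # P (padding) advances neither coordinate.
    return None


# A transcript with an intron, a 2-base insertion and a 1-base deletion vs the genome.
print(genome_to_transcript(1000, "10M100N5M2I8M1D5M", 1115))
```

Here the printed offset is 17: the two inserted transcript bases push every downstream position two places further along the transcript than a naive exon-length count would suggest. In practice, a library such as pysam (`AlignedSegment.get_aligned_pairs`) performs the same walk; the hand-rolled version just makes the CIGAR rules explicit.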

What if you’re a clinical lab still using older RefSeq transcript versions for reporting?

While we encourage you to move to GRCh38 so you can take full advantage of our MANE collaboration with Ensembl/GENCODE, you can still submit to ClinVar using older RefSeq transcript versions, even ones from 1999. You can also use our Variation Services APIs to convert variants between GRCh37 and GRCh38, or from old to current RefSeq transcripts.

If you have any questions or feedback, we’d love to hear from you at refseq-support@nlm.nih.gov.
