Sequence updates in human genome assembly GRCh38: filling in the gaps

In a previous blog post, we explained several important concepts about the human reference genome.  We presented a region of human chromosome 17 as an example of a location where the genome sequence was not fully assembled.  In this post, we are going to revisit the same gapped region to see how the Genome Reference Consortium (GRC) changed this part of the genome in GRCh38, the updated human reference assembly released in December 2013.  This region represents just one of the more than 1,000 changes and improvements that the GRC introduced in GRCh38.

First, we’ll examine the corresponding regions of chromosome 17 in the previous (GRCh37.p13) and current (GRCh38) reference assemblies (Figure 1). In GRCh37.p13, the region spanning 21,200K to 21,700K contained a gap of unknown size (the blank area with no components in the figure), whose length was arbitrarily set to 100K. Here ‘K’ denotes a kilobase, or 1,000 base pairs (bp).

Figure 1: Updates to the reference human genome assembly in a region of chromosome 17. Top panel: A region of chromosome 17 (NC_000017.10) from the GRCh37.p13 assembly showing the components. Bottom panel: The corresponding region of chromosome 17 (NC_000017.11) in the GRCh38 assembly showing the new components. Components shared between the two builds are marked with checks. Components not present in GRCh38 are marked with an “X”. The labeled components AC233702.5 and ABBA01006765.1 are two of the 11 new components in GRCh38.

GRCh38 contains new information in this region, including 11 new components that expanded the area by about 500K. This expansion changed the coordinates of all downstream genes and features on chromosome 17. Moreover, three of the GRCh37.p13 components in this region are no longer present in GRCh38, so any genes or features annotated on those components have either moved elsewhere in the new build or been removed.
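
To make the coordinate shift concrete, here is a minimal Python sketch of a deliberately simplified model: a single block of new sequence is inserted at one position, and every feature at or beyond that position moves by the inserted length. The function name `shift_downstream`, the insertion point, and the feature positions are all hypothetical values chosen for illustration. A real assembly update involves many changes along the chromosome, so this is not a substitute for a proper remapping tool.

```python
def shift_downstream(position, insert_at, inserted_length):
    """Return a feature's coordinate after new sequence is inserted.

    Simplified model: one insertion, and only features at or beyond
    the insertion point are shifted.
    """
    if position >= insert_at:
        return position + inserted_length
    return position


# Hypothetical values for illustration only (in base pairs).
INSERT_AT = 21_500_000      # assumed location of the added sequence
INSERTED_LENGTH = 500_000   # "about 500K" of added sequence

features = {
    "upstream_feature": 10_000_000,    # before the insertion: unchanged
    "downstream_feature": 30_000_000,  # after the insertion: shifted
}

for name, old_pos in features.items():
    new_pos = shift_downstream(old_pos, INSERT_AT, INSERTED_LENGTH)
    print(f"{name}: {old_pos:,} -> {new_pos:,}")
```

In this toy model the upstream feature keeps its coordinate while the downstream feature moves by the full inserted length, which is why annotations made against the older build cannot simply be reused on the new one.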

Major changes such as these mean that researchers should assess how the updates affect their genes of interest in the many regions that changed in GRCh38. To make this easier, NCBI offers a Genome Remapping Service that converts mapping data from one build to another, as described in a previous post. You can use this tool to confirm that the region on chromosome 17 between 21,200K and 21,700K in GRCh37.p13 roughly corresponds to the region between 21,300K and 22,200K on chromosome 17 in GRCh38.
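
The arithmetic behind these boundaries can be checked with a short Python sketch. It only converts the approximate ‘K’ coordinates quoted in this post into base pairs and compares the two regions; it does not call the Remapping Service. The helper names `parse_k` and `span` are our own, introduced for illustration.

```python
def parse_k(text):
    """Convert a coordinate such as '21,200K' to base pairs."""
    cleaned = text.replace(",", "").replace(" ", "").upper()
    if not cleaned.endswith("K"):
        raise ValueError(f"expected a value ending in 'K': {text!r}")
    return int(cleaned[:-1]) * 1000


def span(region):
    """Length of a (start, end) region in base pairs."""
    start, end = region
    return end - start


# Approximate region boundaries quoted in this post.
grch37 = (parse_k("21,200K"), parse_k("21,700K"))
grch38 = (parse_k("21,300K"), parse_k("22,200K"))

print("GRCh37.p13 span:", span(grch37))        # 500000
print("GRCh38 span:", span(grch38))            # 900000
print("Start shift:", grch38[0] - grch37[0])   # 100000
print("End shift:", grch38[1] - grch37[1])     # 500000
```

With these rounded figures, the end of the region sits about 500K further along the chromosome in GRCh38. Exact coordinates for any individual feature should still come from the remapping tool itself.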

In a future post, we will take a closer look at this region of chromosome 17 to see how the gene annotations changed as a result of the new components.

