NCBI’s Genome Remapping Service assists in the transition to the new human genome reference assembly (GRCh38)

In late December 2013, the Genome Reference Consortium (GRC) released an updated version of the human reference genome assembly, GRCh38, and submitted these new sequences to GenBank. This is the first time in four years that a new major version of the human genome has become available to the genomics community.

Perhaps you’ve been working on data mapped to the previous assembly (GRCh37) that became available in March 2009, or maybe you are still using an even earlier version, such as NCBI36 from March 2006. Is there a way to reduce the amount of time and effort required to reanalyze your data in the context of the new assembly?

Yes! It’s NCBI’s Genome Remapping Service, or NCBI Remap for short.

NCBI Remap is a tool that allows you to convert annotation data from one coordinate system to another, such as from GRCh37 to GRCh38. This remapping uses genomic alignments to project features from one sequence to the other. In a nutshell, you provide your own data based on the coordinates of a specific assembly and tell NCBI Remap to which assembly you’d like to convert the coordinates, and you’ll get back coordinate mapping files for your data.
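To make the idea of projecting features through genomic alignments more concrete, here is a minimal sketch in Python. It is not NCBI Remap's implementation: the `AlignedBlock` class, the `project` function, and the block coordinates are all hypothetical, and it assumes the alignment can be treated as a list of ungapped, same-strand blocks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AlignedBlock:
    """One ungapped, same-strand aligned block (hypothetical toy model)."""
    source_start: int  # 1-based start on the source assembly
    target_start: int  # 1-based start on the target assembly
    length: int        # number of aligned bases


def project(position: int, blocks: List[AlignedBlock]) -> Optional[int]:
    """Project a source position onto the target assembly.

    Returns None when the position falls outside every aligned block,
    i.e. when there is nothing to project it through.
    """
    for block in blocks:
        offset = position - block.source_start
        if 0 <= offset < block.length:
            return block.target_start + offset
    return None


# Toy coordinates, invented purely for illustration.
blocks = [
    AlignedBlock(source_start=100, target_start=150, length=50),
    AlignedBlock(source_start=200, target_start=240, length=30),
]

print(project(120, blocks))  # 170: inside the first block
print(project(160, blocks))  # None: falls in the gap between blocks
```

A position that lands between blocks has no counterpart in this toy model, which is one reason to expect that not every feature can be carried over when converting between assemblies.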

The Remap tool is particularly helpful when you want to compare your data with NCBI RefSeq annotations. The new annotation (version 106) corresponding to this new assembly is anticipated to be available later this month. In the meantime, data submitted with RefSeq identifiers will be mapped onto the corresponding GenBank record, for example: NC_000019.9:g.45411941T>C maps to CM000681.2:44908684.
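If you want to sanity-check results like the one above in your own scripts, a small parser can help. The sketch below is only an illustration: `parse_substitution` and its regular expression are hypothetical helpers that handle simple genomic substitutions of the form shown in this post, not the full HGVS nomenclature.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    accession: str
    position: int
    ref: str
    alt: str


# Matches only simple expressions such as NC_000019.9:g.45411941T>C.
_PATTERN = re.compile(
    r"^(?P<acc>[A-Z]{2}_?\d+\.\d+):g\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_substitution(expression: str) -> Substitution:
    """Split a simple genomic substitution into its parts."""
    match = _PATTERN.match(expression.strip())
    if match is None:
        raise ValueError(f"Unsupported expression: {expression!r}")
    return Substitution(
        accession=match["acc"],
        position=int(match["pos"]),
        ref=match["ref"],
        alt=match["alt"],
    )


variant = parse_substitution("NC_000019.9:g.45411941T>C")
remapped_position = 44908684  # position on CM000681.2, taken from the example above

# How far the coordinate moved between the two assemblies.
shift = variant.position - remapped_position
print(variant.accession, variant.position, shift)  # NC_000019.9 45411941 503257
```

The shift computed here applies to this one position only. Offsets differ from region to region, which is why remapping relies on alignments rather than a single constant.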

If you have a small amount of data, you can copy and paste it into the large text box labeled ‘Paste data here’ (see Figure 1). Otherwise, you can upload a data file. NCBI Remap accepts several file formats that are commonly used in the bioinformatics community, including GFF, the format used in the Figure 1 example.

Unless you specify otherwise, you’ll be able to download your remapped data in the same format as your original upload; you can also select a different output format if you prefer. Alternatively, you can view the resulting data files directly in our client-side viewer, Genome Workbench.
Figure 1. Screenshots of the NCBI Genome Remapping Service. This example shows the coordinate mapping from GRCh37 to the new GRCh38 assembly for three SNP positions using a GFF file.
If you would like to use this service from your own pipeline or scripts, an NCBI Remapping Service API is available. Finally, there is an FTP directory with downloadable files of assembly-to-assembly alignment mapping coordinates.

For more information about NCBI’s Remapping Service, take a look at the following:
