NCBI Insights

Converting GI Numbers to Accession.version

As you may have read in previous posts, NCBI is in the process of changing the way we handle GI numbers for sequence records.

In short, we are moving to a time when accession.version identifiers, rather than GI numbers, will be the primary identifiers for sequence records.

As part of this transition, an obvious question for any of you currently using GI numbers is how to convert a GI number to an accession.version, so that you can make appropriate updates. The good news is that it’s pretty easy if you have no more than a few thousand GIs to convert.

Use EFetch to convert GIs

You can use the NCBI E-utility EFetch:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=663070995,568815587&rettype=acc

For those of you unfamiliar with EFetch, let’s break that down a bit. The first part of the URL is the EFetch address followed by the db parameter:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore

“db=nuccore” establishes that you’ll be downloading data from the nuccore database. If you want to convert protein GIs, then you would use “db=protein” instead of “db=nuccore”.

Next comes the &id parameter with a list of GI numbers separated by commas:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=663070995,568815587

This requests data for two records: GIs 663070995 and 568815587.

Finally comes the real trick – setting the &rettype parameter to “acc”.

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=663070995,568815587&rettype=acc

This defines an output format where each line contains the accession.version of a single GI, and the order of the lines in the output matches the order of the GIs in the URL. In this case the result would be:

NM_001178.5
NC_000011.10

The result indicates that the accession.version for GI 663070995 is “NM_001178.5” and the accession.version for GI 568815587 is “NC_000011.10”. If one of your input GIs is invalid, then you’ll get a blank line in the output:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=663070995,100,568815587&rettype=acc

Notice the GI “100” between the two previous GIs. There is no record with GI 100, so you get this:

NM_001178.5

NC_000011.10
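
If you would rather script this than paste URLs into a browser, the steps above can be sketched in Python. This is only an illustration, not an official NCBI client: the function names build_efetch_url and pair_gis_with_accessions are made up for this example, and the parser assumes the output contains exactly one line per input GI, with a blank line for each invalid GI, as described above.

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(gis, db="nuccore"):
    """Build an EFetch GET URL that requests accession.version output.

    Hypothetical helper for illustration. Pass db="protein" for protein GIs.
    """
    params = {
        "db": db,
        "id": ",".join(str(gi) for gi in gis),
        "rettype": "acc",
    }
    # safe="," leaves the commas unescaped, matching the URLs shown above.
    return EFETCH_URL + "?" + urlencode(params, safe=",")


def pair_gis_with_accessions(gis, response_text):
    """Pair each input GI with its line of rettype=acc output.

    Assumes one output line per GI, in input order; a blank line is
    reported as None (no record for that GI).
    """
    lines = response_text.splitlines()
    if len(lines) != len(gis):
        # Refuse to guess: a mismatch means the one-line-per-GI
        # assumption does not hold for this response.
        raise ValueError(
            "expected %d lines, got %d" % (len(gis), len(lines))
        )
    return {gi: (line.strip() or None) for gi, line in zip(gis, lines)}


if __name__ == "__main__":
    gis = [663070995, 100, 568815587]
    print(build_efetch_url(gis))
    # Sample output copied from the example above, not fetched live.
    sample = "NM_001178.5\n\nNC_000011.10\n"
    print(pair_gis_with_accessions(gis, sample))
```

Raising an error when the line count does not match the GI count is a deliberate choice here: since the output carries no GI numbers, the line order is the only thing tying each accession.version back to its GI.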

You can list approximately 250 GIs in a single URL (using HTTPS GET), but you can put several thousand in an HTTPS POST call. Just be sure to adhere to our usage guidelines. No more than 3 calls per second, please!
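
For longer lists, a rough Python sketch of the POST approach might look like the following. The names post_efetch and convert_gis are hypothetical, and the batch size of 2000 and the pause of 0.34 seconds are assumptions chosen to fit the "several thousand" and "3 calls per second" figures above, not values prescribed by NCBI. As before, the code assumes one output line per input GI.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def post_efetch(gis, db="nuccore"):
    """Send one HTTPS POST to EFetch and return the response body as text."""
    data = urlencode(
        {"db": db, "id": ",".join(str(gi) for gi in gis), "rettype": "acc"}
    ).encode("ascii")
    # Supplying data= makes urlopen issue a POST instead of a GET.
    with urlopen(EFETCH_URL, data=data) as response:
        return response.read().decode("utf-8")


def convert_gis(gis, db="nuccore", batch_size=2000, pause=0.34,
                fetch=post_efetch):
    """Convert GIs in batches, returning (gi, accession_or_None) pairs.

    batch_size and pause are assumed values; fetch can be swapped out,
    for example to test without touching the network.
    """
    results = []
    for start in range(0, len(gis), batch_size):
        batch = gis[start:start + batch_size]
        if start:
            # Wait between calls to stay under 3 requests per second.
            time.sleep(pause)
        lines = fetch(batch, db).splitlines()
        if len(lines) != len(batch):
            raise ValueError(
                "expected %d lines, got %d" % (len(batch), len(lines))
            )
        for gi, line in zip(batch, lines):
            # A blank line means there is no record for that GI.
            results.append((gi, line.strip() or None))
    return results


if __name__ == "__main__":
    for gi, acc in convert_gis([663070995, 100, 568815587]):
        print(gi, acc)
```

The results come back as a list of pairs rather than a dictionary so that the input order, and any repeated GIs, are preserved.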

For more information about doing these conversions, please view our recent webinar on this topic.