Converting GI Numbers to Accession.version


As you may have read in previous posts, NCBI is in the process of changing the way we handle GI numbers for sequence records.

In short, we are moving to a time when accession.version identifiers, rather than GI numbers, will be the primary identifiers for sequence records.

As part of this transition, an obvious question for any of you currently using GI numbers is how to convert a GI number to an accession.version, so that you can make appropriate updates. The good news is that it’s pretty easy if you have no more than a few thousand GIs to convert.

Use EFetch to convert GIs

You can use the NCBI E-utility EFetch:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=663070995,568815587&rettype=acc

For those of you unfamiliar with EFetch, let’s break that down a bit. The first part of the URL is fixed:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore

“db=nuccore” establishes that you’ll be downloading data from the nuccore database. If you want to convert protein GIs, then you would use “db=protein” instead of “db=nuccore”.

Next comes the &id parameter with a list of GI numbers separated by commas:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=663070995,568815587

This requests data for two records: GIs 663070995 and 568815587.
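If you would rather assemble these URLs in a script than by hand, a minimal Python sketch might look like the following. The function name `build_efetch_url` is made up for illustration and is not part of any NCBI library; only the base URL and the `db` and `id` parameters come from this post.

```python
from urllib.parse import urlencode

# Base URL taken from the examples in this post.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(db, gis, **extra_params):
    """Build an EFetch URL for a list of GI numbers (hypothetical helper)."""
    # The id parameter is a comma-separated list of GIs.
    params = {"db": db, "id": ",".join(str(gi) for gi in gis)}
    # Any additional URL parameters are appended after db and id.
    params.update(extra_params)
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return EFETCH_BASE + "?" + urlencode(params, safe=",")


print(build_efetch_url("nuccore", [663070995, 568815587]))
```

Passing `"protein"` as the first argument would target the protein database instead.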

Finally comes the real trick – setting the &rettype parameter to “acc”.

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=663070995,568815587&rettype=acc

This defines an output format where each line contains the accession.version of a single GI, and the order of the lines in the output matches the order of the GIs in the URL. In this case the result would be:

NM_001178.5
NC_000011.10

The result indicates that the accession.version for GI 663070995 is “NM_001178.5” and the accession.version for GI 568815587 is “NC_000011.10”. If one of your input GIs is invalid, then you’ll get a blank line in the output:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=663070995,100,568815587&rettype=acc

Notice the GI “100” between the two previous GIs. There is no record with GI 100, so you get this:

NM_001178.5

NC_000011.10
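Because the output order matches the input order and an invalid GI yields a blank line, you can pair each GI with its result by position. The sketch below assumes the response text looks exactly like the examples in this post. The function name is hypothetical, and the response is passed in as a string, so nothing here contacts NCBI.

```python
def pair_gis_with_accessions(gis, response_text):
    """Pair each GI with its accession.version, or None if its line is blank."""
    lines = [line.strip() for line in response_text.splitlines()]
    # Assumption: any surplus blank lines at the end of the response are padding.
    while len(lines) > len(gis) and lines[-1] == "":
        lines.pop()
    # Refuse to guess if the line count does not match the GI count.
    if len(lines) != len(gis):
        raise ValueError(
            f"expected {len(gis)} lines in the response, got {len(lines)}"
        )
    return [(gi, line or None) for gi, line in zip(gis, lines)]


# Response text copied from the invalid-GI example above.
example = "NM_001178.5\n\nNC_000011.10\n"
for gi, acc in pair_gis_with_accessions([663070995, 100, 568815587], example):
    print(gi, acc if acc is not None else "(no record)")
```

Raising an error on a mismatched line count is a deliberate choice here: a silent misalignment would assign accessions to the wrong GIs.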

You can list approximately 250 GIs in a single URL (using HTTPS GET), but you can put several thousand in an HTTPS POST call. Just be sure to adhere to our usage guidelines. No more than 3 calls per second, please!
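For larger lists, one way to stay within these limits is to split the GIs into batches, send each batch as a POST, and pause between calls. The following is a rough sketch using only the Python standard library. The batch size of 2000 and the pause of 0.34 seconds are our own assumptions (chosen to fit "several thousand" and "3 calls per second"), and all function names are hypothetical.

```python
import time
import urllib.request
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
BATCH_SIZE = 2000    # assumption: a value within "several thousand"
MIN_INTERVAL = 0.34  # assumption: keeps the rate just under 3 calls per second


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_post_body(gis, db="nuccore"):
    """Encode the same parameters used in the GET examples as a POST body."""
    params = {"db": db, "id": ",".join(str(gi) for gi in gis), "rettype": "acc"}
    return urlencode(params, safe=",").encode("ascii")


def fetch_accession_text(gis, db="nuccore"):
    """Return one response string per batch, pausing between calls."""
    texts = []
    for batch in chunked(list(gis), BATCH_SIZE):
        started = time.monotonic()
        # Supplying `data` makes urllib send a POST instead of a GET.
        request = urllib.request.Request(EFETCH_URL, data=build_post_body(batch, db))
        with urllib.request.urlopen(request) as response:
            texts.append(response.read().decode("utf-8"))
        # Sleep for whatever remains of the minimum interval.
        elapsed = time.monotonic() - started
        if elapsed < MIN_INTERVAL:
            time.sleep(MIN_INTERVAL - elapsed)
    return texts
```

Calling `fetch_accession_text(my_gis)` would contact NCBI once per batch, where `my_gis` is your own list of GI numbers. Each returned string can then be split into lines and matched against the GIs of its batch.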

For more information about doing these conversions, please view our recent webinar on this topic.

5 thoughts on “Converting GI Numbers to Accession.version”

  1. Great post, it does clarify some previous issues, thank you very much for the clarification.
    My question is in the cases when we have more than a “few thousand” GIs.
    I have implemented a workaround in my code – a “translation” step to convert GIs to Accession.version numbers. It is very similar to the example you provide here.
    The catch is that this extra step takes quite a while to complete with a large number of sequences (say, 1M) – in fact, my application now takes *nearly* as long to translate GIs to Accession.version as to download the sequence records. It is also “stressing” your servers twice as much as before, at least as far as connection requests go.
    Is there any way to retrieve sequence identifiers as accession.version directly when using esearch? IMO that would be the true solution to the problem, as adding a “translation” step is just a workaround.
    Once again, thanks for keeping us, the API users, in the loop of the changes you are making! It is an effort that is really appreciated.

