As you may have read in previous posts, NCBI is in the process of changing the way we handle GI numbers for sequence records. In short, we are moving to a time when accession.version identifiers, rather than GI numbers, will be the primary identifiers for sequence records.

In a previous post, we outlined a method for converting GI numbers (used to identify sequence records) to accession.version identifiers. That method used the E-utility EFetch and is capable of handling cases where you have no more than a few thousand GI numbers to convert.
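
As a reminder of what the EFetch approach looks like, here is a minimal sketch that only builds the request URLs and does not contact NCBI. It assumes the standard E-utilities EFetch endpoint with rettype=acc. The function name, the default database, and the batch size of 200 are illustrative assumptions, not values taken from this post.

```python
from urllib.parse import urlencode

# Assumed endpoint: the public E-utilities EFetch URL.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_acc_urls(gis, db="nuccore", batch_size=200):
    """Build one EFetch URL per batch of GI numbers.

    db and batch_size are illustrative defaults (assumptions); use
    db="protein" for protein GIs.
    """
    ids = [str(int(gi)) for gi in gis]  # rejects non-numeric input
    urls = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        query = urlencode(
            {"db": db, "id": ",".join(batch), "rettype": "acc"},
            safe=",",  # keep the commas in the id list readable
        )
        urls.append(EFETCH_URL + "?" + query)
    return urls


if __name__ == "__main__":
    for url in efetch_acc_urls([42], db="protein"):
        print(url)
```

Each URL would have to be fetched separately, which is why this approach becomes impractical beyond a few thousand GI numbers.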

What if you have more?

We now have a bulk conversion resource that will allow you to handle very large jobs. The resource consists of a Python script coupled with a database file (about 40 GB uncompressed). You’ll need to download both of these files (gi2accession.py and gi2acc_lmdb.gz) to local disk, and then you can process as needed.

The files are available here: ftp.ncbi.nlm.nih.gov/genbank/livelists/gi2acc_mapping/.

The script works in two modes: interactive and bulk.

Interactive

$ ./gi2accession.py

gi: 42

42 CAA44840.1 416

After entering the GI number, the script responds with the GI, the corresponding accession.version, and the length of the sequence (in residues).
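
If you want to consume these lines in your own code, a small parser is enough. The sketch below assumes each output line holds exactly three whitespace-separated fields, as in the example above. The record type and function name are hypothetical, and how the script reports a GI it cannot find is not covered here.

```python
from collections import namedtuple

# Hypothetical record type for one line of script output.
GiRecord = namedtuple("GiRecord", ["gi", "accession_version", "length"])


def parse_output_line(line):
    """Parse 'GI accession.version length' into a GiRecord.

    Assumes whitespace-separated columns (spaces or tabs).
    """
    fields = line.split()
    if len(fields) != 3:
        raise ValueError("expected 3 columns, got %d: %r" % (len(fields), line))
    gi, accession_version, length = fields
    return GiRecord(int(gi), accession_version, int(length))


if __name__ == "__main__":
    record = parse_output_line("42 CAA44840.1 416")
    print(record.accession_version)
```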

Bulk

$ ./gi2accession.py < list_of_gis.txt

In this case, the script will accept an input stream of GI numbers (e.g., from a file, one per line) and then output a line for each GI with the same three columns as above.
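
Bulk mode can also be driven from another program. The following is a rough sketch, assuming the script sits in the current directory, is executable, and reads GI numbers from standard input as described above. The helper names are hypothetical, and removing duplicates and non-numeric lines first is a convenience added here, not something the script requires.

```python
import subprocess


def clean_gi_list(raw_lines):
    """Return unique, purely numeric GI numbers in first-seen order."""
    seen = set()
    gis = []
    for line in raw_lines:
        text = line.strip()
        if not (text.isascii() and text.isdigit()):
            continue  # skip blank lines, headers, and other non-GI text
        gi = int(text)
        if gi not in seen:
            seen.add(gi)
            gis.append(gi)
    return gis


def run_bulk(gis, script="./gi2accession.py"):
    """Pipe GI numbers (one per line) to the script; return output lines.

    The script path is an assumption; adjust it to where you saved the file.
    """
    data = "".join("%d\n" % gi for gi in gis)
    result = subprocess.run(
        [script], input=data, capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    with open("list_of_gis.txt") as handle:
        for out_line in run_bulk(clean_gi_list(handle)):
            print(out_line)
```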

Further instructions for using the script are in a README file in the FTP directory.

Please be aware that you’ll need about 40 GB of disk space, along with Python 2.7 or higher and the Python lmdb package.
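
Before downloading, you can check these requirements with a few lines of Python. This is only a sketch: it treats 40 GB as 40 × 10^9 bytes (an assumption), ignores the extra space needed while the compressed download is still on disk, and uses a hypothetical function name.

```python
import importlib.util
import shutil

# Assumption: "about 40 GB" is interpreted as 40 * 10**9 bytes.
REQUIRED_BYTES = 40 * 10**9


def check_prerequisites(path="."):
    """Report free disk space at path and whether lmdb can be imported."""
    free = shutil.disk_usage(path).free
    return {
        "free_bytes": free,
        "enough_disk": free >= REQUIRED_BYTES,
        "has_lmdb": importlib.util.find_spec("lmdb") is not None,
    }


if __name__ == "__main__":
    for name, value in check_prerequisites().items():
        print(name, value)
```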

Let us know if you have comments or questions about this resource.

6 thoughts on “Converting Lots of GI Numbers to Accession.version”

Holger Brandl says:

January 6, 2017 at 9:53 am

You could also use a REST service provided documented under http://holgerbrandl.github.io/kotlin/2017/01/06/building-a-gi-to-accession-conversion-rest-service.html

1. Holger Brandl says:
  
  January 6, 2017 at 9:58 am
  
  It would be great if comments could be edited to fix typos. 🙂
  
Brandon says:

March 16, 2017 at 4:17 am

Hi,

Could you check this entry?

GI Accession
1680002 AH007344.1
1680091 AH003807.1

For example, AH007344.1(Accession) have the 340 nt 10 segments sequence information.

but, In this directory (ftp.ncbi.nlm.nih.gov/genbank/livelists/gi2acc_mapping/)

this information AH007344.1(Accession) length is 1362.

Why this record is not same each in NCBI?

Thanks

Thomas W Rackers says:

December 6, 2017 at 2:06 pm

Python script does not work under Python 3″ uses old style ‘print “…”‘ instead of new ‘print(“…”)’ . 😛

1. Thomas W Rackers says:
  
  December 6, 2017 at 2:07 pm
  
  … under Python 3: …
  
Pingback: NCBI Insights : NCBI’s GI sequence identifiers will soon exceed 32-bit numbers. Are you and your software ready?
