NCBI’s GI sequence identifiers will soon exceed 32-bit numbers. Are you and your software ready?

In 2016, NCBI announced that it was curtailing the display of its numeric 'GI' identifiers in popular sequence data formats such as FASTA and GenBank flatfiles. Because GenBank continues to grow, NCBI will soon begin assigning GIs that exceed the signed 32-bit threshold of 2,147,483,647 to those sequence types that still receive these identifiers.
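To show why the 2,147,483,647 boundary matters, the short Python sketch below (an illustration, not NCBI code) stores a GI in fixed-width signed integer fields, as a C `int32_t` column or struct member would. The value 2,147,483,648 is simply the first number past the limit and is used here as a hypothetical GI. A checked packer rejects it outright. An unchecked conversion, which is what `ctypes` does and what much legacy code effectively does, silently wraps it to a negative number.

```python
import ctypes
import struct

INT32_MAX = 2**31 - 1  # 2,147,483,647, the signed 32-bit ceiling

def fits_in_int32(gi: int) -> bool:
    """Return True if gi can be stored in a signed 32-bit field."""
    return -2**31 <= gi <= INT32_MAX

def pack_gi(gi: int, width_bits: int) -> bytes:
    """Pack gi as a little-endian signed integer of 32 or 64 bits.

    struct.pack raises struct.error if the value does not fit the width.
    """
    fmt = {32: "<i", 64: "<q"}[width_bits]
    return struct.pack(fmt, gi)

hypothetical_gi = INT32_MAX + 1  # first value past the 32-bit limit

print(fits_in_int32(INT32_MAX))           # True
print(fits_in_int32(hypothetical_gi))     # False
print(len(pack_gi(hypothetical_gi, 64)))  # 8 bytes: a 64-bit field holds it

try:
    pack_gi(hypothetical_gi, 32)          # checked 32-bit packing fails loudly
except struct.error as err:
    print("32-bit field rejected the GI:", err)

# ctypes performs no overflow checking, so the value silently wraps.
print(ctypes.c_int32(hypothetical_gi).value)  # -2147483648
```

The silent wrap is the more dangerous failure: the program keeps running with a wrong, negative identifier.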

NCBI has prepared for that moment by upgrading GI-related code and APIs in products including the Entrez system, GenBank (Nucleotide), BLAST™ and the C++ Toolkit to accept 64-bit integers. The changeover is projected for late 2021. Stay tuned for additional communications from NCBI, and take note of the following information if you think you may be affected.

For a seamless transition, all organizations and developers using our products should review their software for any remaining reliance on GIs and for compatibility with these larger identifiers. Note that this update requires no changes to submission procedures or to the assignment of accessions.

Software developers and organizations with specialty software built to interface with NCBI data should review all code to ensure that the new, longer GIs will be handled properly. This applies to software that consumes a sequence database UID (i.e., a GI), processes the GI from an ASN.1 or XML product, or processes the GI from any tabular product on FTP. Alternatively, software developers can update their code to use accession.version identifiers instead of GIs, as described in a previous post. We also recommend checking how the JSON handling in your libraries treats large numeric IDs, particularly where those IDs are converted to or from strings.
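As a rough illustration of that review, the following Python sketch parses a GI from a tab-delimited line and from a JSON record, and checks that the value survives conversion intact. The column position, the `uid` field name, the helper function names, and the sample values are all hypothetical; adapt them to the actual file or response you consume. Python's `int` has arbitrary precision, so in practice the risks lie in fixed-width database columns, 32-bit types in other languages, and string/number conversions elsewhere in the pipeline.

```python
import json

INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1

def parse_gi(text: str) -> int:
    """Parse a GI from text and confirm it fits in a signed 64-bit integer."""
    gi = int(text.strip())
    if not 0 < gi <= INT64_MAX:
        raise ValueError(f"GI out of range: {gi}")
    return gi

def gi_from_tsv_line(line: str, column: int = 0) -> int:
    """Extract a GI from a tab-delimited line (hypothetical column layout)."""
    return parse_gi(line.rstrip("\n").split("\t")[column])

def gi_from_json(doc: str, field: str = "uid") -> int:
    """Extract a GI from a JSON object, whether stored as a number or a string."""
    value = json.loads(doc)[field]
    return parse_gi(str(value))

def round_trips(gi: int) -> bool:
    """Check that GI -> string -> GI and GI -> JSON -> GI lose nothing."""
    return int(str(gi)) == gi and json.loads(json.dumps(gi)) == gi

line = "3000000001\tother_field\n"  # hypothetical tabular row
gi = gi_from_tsv_line(line)
print(gi, gi > INT32_MAX)  # 3000000001 True
print(gi_from_json('{"uid": "3000000001"}') == gi_from_json('{"uid": 3000000001}'))  # True
print(round_trips(gi))  # True
```

If the same JSON is consumed in JavaScript, note that a `Number` is exact only up to 2^53 - 1. Keeping IDs as strings avoids that ceiling entirely, provided every consumer parses them into a 64-bit or arbitrary-precision type.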

We encourage all our customers to update to the latest versions of NCBI-provided programmatic and command-line tools, and to check the web tools described below.

  • Programmatic and command-line customers & software developers
    • → Review any and all NCBI binaries in use. Upgrade to the latest versions of all binaries, although all NCBI tool versions released since 2018 support the larger 64-bit GIs.
  • Web customers
    • → E-utilities: you can request links that include UIDs containing the new, longer GIs.
      • Tip: Any UID greater than 2147483647 is an updated 64-bit identifier
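The tip above amounts to a single comparison. The sketch below applies it and builds an E-utilities EFetch request URL for a UID. It only constructs the URL and does not contact NCBI. The UID value and the helper names are hypothetical, and `db=nuccore` with `rettype=fasta` is one common parameter choice rather than the only option. Whatever code assembles such URLs should treat the UID as an opaque string or a 64-bit integer, never a 32-bit one.

```python
from urllib.parse import urlencode

EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
INT32_MAX = 2_147_483_647

def is_64bit_gi(uid: int) -> bool:
    """Per the tip: any UID greater than 2147483647 needs 64-bit storage."""
    return uid > INT32_MAX

def efetch_url(uid: int, db: str = "nuccore", rettype: str = "fasta") -> str:
    """Build (but do not fetch) an EFetch URL for a single UID."""
    params = {"db": db, "id": str(uid), "rettype": rettype, "retmode": "text"}
    return f"{EFETCH_BASE}?{urlencode(params)}"

uid = 3_000_000_001  # hypothetical 64-bit GI
print(is_64bit_gi(uid))  # True
print(efetch_url(uid))
```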

NCBI is here to help and welcomes all your feedback! Stay tuned here or on NCBI Twitter where we will share updates and additional information.

Please contact us with any questions about this change or to determine whether any software you are using is affected.

