GenBank will start using expanded accession formats by December 2018


By the end of 2018, GenBank and other INSDC members will expand the accession formats used for sequencing projects. We have assigned almost all the possible accession numbers using the current, shorter formats. Using these longer formats will allow us to expand accession ranges and give us greater capacity.

The expanded format for Whole Genome Shotgun (WGS), Transcriptome Shotgun Assembly (TSA), and Targeted Locus Study (TLS) sequencing projects will use a six-letter Project Code prefix and a two-digit Assembly-Version number followed by 7, 8, or 9 digits (for example, AAAAAA020000001).

Non-WGS/TLS/TSA nucleotide sequences currently use a “2+6” format, two-letter prefix followed by six digits. This format will be expanded to eight digits.

Protein sequences currently use a “3+5” accession format. By the end of 2018, this format will use seven digits.

You will need to adjust any processing methods to accommodate these new identifier formats.  Please write to the helpdesk with any questions about the new formats.

One thought on “GenBank will start using expanded accession formats by December 2018

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s