GenBank will start using expanded accession formats by December 2018

By the end of 2018, GenBank and other INSDC members will expand the accession formats used for sequencing projects. We have assigned almost all the possible accession numbers in the current, shorter formats. The longer formats will extend the available accession ranges.

The expanded format for Whole Genome Shotgun (WGS), Transcriptome Shotgun Assembly (TSA), and Targeted Locus Study (TLS) sequencing projects will use a six-letter Project Code prefix and a two-digit Assembly-Version number followed by 7, 8, or 9 digits (for example, AAAAAA020000001).
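As a rough illustration of the expanded WGS/TSA/TLS format described above, the following Python sketch splits such an accession into its three parts. The function name `parse_expanded_wgs`, the restriction to uppercase letters, and the returned field names are assumptions made for this example; they are not part of any GenBank tool.

```python
import re

# Assumed pattern for the expanded format described above:
# six letters (Project Code), two digits (Assembly-Version), then 7 to 9 digits.
# Uppercase-only letters are an assumption of this sketch.
EXPANDED_WGS = re.compile(r"([A-Z]{6})(\d{2})(\d{7,9})")


def parse_expanded_wgs(accession):
    """Return the parts of an expanded-format accession, or None if it does not match."""
    match = EXPANDED_WGS.fullmatch(accession)
    if match is None:
        return None
    project_code, version, serial = match.groups()
    return {
        "project_code": project_code,      # hypothetical field name
        "assembly_version": int(version),  # hypothetical field name
        "serial": serial,                  # kept as a string to preserve leading zeros
    }


print(parse_expanded_wgs("AAAAAA020000001"))
```

The serial part is kept as a string so that leading zeros survive a round trip back to the original accession.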

Non-WGS/TLS/TSA nucleotide sequences currently use a “2+6” format: a two-letter prefix followed by six digits. This format will be expanded to a two-letter prefix followed by eight digits.

Protein sequences currently use a “3+5” accession format: a three-letter prefix followed by five digits. By the end of 2018, this format will be expanded to a three-letter prefix followed by seven digits.
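A processing script that checks accession shapes might accept both the current and the expanded lengths for nucleotide and protein accessions. The sketch below is one possible way to do that in Python. The `classify` helper and the sample accessions are hypothetical and chosen only to show the digit counts; the patterns assume uppercase prefixes and do not handle a version suffix such as `.1`.

```python
import re

# Non-WGS/TLS/TSA nucleotide: two letters plus six digits (current) or eight digits (expanded).
NUCLEOTIDE = re.compile(r"[A-Z]{2}(?:\d{6}|\d{8})")

# Protein: three letters plus five digits (current) or seven digits (expanded).
PROTEIN = re.compile(r"[A-Z]{3}(?:\d{5}|\d{7})")


def classify(accession):
    """Return 'nucleotide', 'protein', or None for an unversioned accession string."""
    if NUCLEOTIDE.fullmatch(accession):
        return "nucleotide"
    if PROTEIN.fullmatch(accession):
        return "protein"
    return None


# Made-up accessions used only to illustrate the formats.
for sample in ["AB123456", "AB12345678", "ABC12345", "ABC1234567", "AB1234567"]:
    print(sample, classify(sample))
```

Checks that hard-code a total length (for example, exactly eight characters) are the ones most likely to need updating.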

You will need to adjust any processing methods to accommodate these new identifier formats. Please write to the helpdesk with any questions about the new formats.
