GenBank release 232


GenBank release 232.0 (6/20/2019) is now available on the NCBI FTP site. This release has 5.47 terabases and 1.58 billion records.

The release has 213 million traditional records containing 329.8 billion base pairs of sequence data. There are also 1 billion WGS records containing 4.8 trillion base pairs of sequence data, 319.9 million bulk-oriented TSA records containing 285.3 billion base pairs of sequence data, and 25 million bulk-oriented TLS records containing 10 billion base pairs of sequence data.

GenBank growth between releases

During the 59 days between the close dates for releases 231.0 and 232.0, GenBank grew by 442.7 billion bases and 39.7 million records. Broken down into components, this is:

  • “Traditional”: 8 billion base pairs, 608,344 sequence records
  • WGS: 425 billion base pairs, 29 million sequence records
  • TSA: 8 billion base pairs, 8 million sequence records
  • TLS: 559 million base pairs, 1 million sequence records
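
As a rough cross-check, the component figures can be summed and compared with the headline growth numbers. The sketch below reads the Traditional, WGS, and TSA base-pair counts in billions, since that is the scale at which they add up to the reported 442.7 billion bases. The variable names are illustrative only, and the small shortfall it prints comes from the components being rounded.

```python
# Cross-check the per-component growth against the reported totals.
# All figures are the rounded values quoted in the announcement.
BILLION = 10**9
MILLION = 10**6

# component -> (base pairs added, sequence records added)
growth = {
    "Traditional": (8 * BILLION, 608_344),
    "WGS": (425 * BILLION, 29 * MILLION),
    "TSA": (8 * BILLION, 8 * MILLION),
    "TLS": (559 * MILLION, 1 * MILLION),
}

total_bases = sum(bases for bases, _ in growth.values())
total_records = sum(records for _, records in growth.values())

# Headline figures from the announcement (442.7 billion, 39.7 million).
reported_bases = 442_700_000_000
reported_records = 39_700_000

print(f"Summed bases:   {total_bases:,} (reported {reported_bases:,})")
print(f"Summed records: {total_records:,} (reported {reported_records:,})")
print(f"Unaccounted bases (rounding):   {reported_bases - total_bases:,}")
print(f"Unaccounted records (rounding): {reported_records - total_records:,}")
```

For exact per-component numbers, the release notes remain the authoritative source.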

During that same period, 618,896 records were updated. An average of 20,888 traditional records were added and/or updated per day.

The total number of sequence data files increased by 33 with this release. The divisions are as follows:

  • BCT: 13 new files, now a total of 348
  • CON: 3 new files, now a total of 207
  • PAT: 2 new files, now a total of 197
  • PLN: 10 new files, now a total of 156
  • VRT: 4 new files, now a total of 109

For downloading purposes, please keep in mind that the uncompressed GenBank release 232.0 flatfiles require roughly 1006 GB (sequence files only). The ASN.1 data require approximately 793 GB.
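
Before starting a download, it may help to confirm that the target volume has enough free space. The following is a minimal sketch, not an NCBI tool: the `has_room` helper, the 10% headroom, and the reading of GB as 10**9 bytes are all assumptions made for illustration.

```python
import shutil

# Approximate uncompressed sizes quoted for release 232.0, in GB.
FLATFILE_GB = 1006
ASN1_GB = 793


def has_room(required_gb, free_bytes, headroom=0.10):
    """Return True if free_bytes covers required_gb plus a safety margin.

    Assumptions: 1 GB = 10**9 bytes, and 10% extra headroom is enough.
    """
    needed_bytes = required_gb * 10**9 * (1 + headroom)
    return free_bytes >= needed_bytes


if __name__ == "__main__":
    # Free space on the volume holding the current directory.
    free = shutil.disk_usage(".").free
    for label, size_gb in (("flatfiles", FLATFILE_GB), ("ASN.1", ASN1_GB)):
        verdict = "enough" if has_room(size_gb, free) else "NOT enough"
        print(f"{label}: ~{size_gb} GB needed, {verdict} free space here")
```

The compressed files on the FTP site are smaller than these figures, so the check is conservative for the download itself and more relevant to the space needed after decompression.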

Upcoming changes

The set of legal values for the /linkage_evidence qualifier of the assembly_gap feature will be expanded to include “proximity ligation” on or after October 15, 2019.

The current working description for proximity ligation is: “ligation of segments of DNA that were brought into proximity in chromatin (Hi-C and related technologies)”. See also the proposed AGP 2.1 specification.
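
Pipelines that validate assembly_gap features may need to accept the new value once the change takes effect. The sketch below shows one way such a check could look. The list of pre-existing values is an assumption recalled from the INSDC feature table definition and may be incomplete, and the function name and flag are hypothetical; verify the values against the current specification before relying on them.

```python
# ASSUMED pre-existing /linkage_evidence values; possibly incomplete.
# Check the INSDC feature table definition for the authoritative list.
EXISTING_LINKAGE_EVIDENCE = {
    "paired-ends",
    "align genus",
    "align xgenus",
    "align trnscpt",
    "within clone",
    "clone contig",
    "map",
    "strobe",
    "pcr",
    "unspecified",
}

# The value announced for addition on or after October 15, 2019.
NEW_LINKAGE_EVIDENCE = "proximity ligation"


def is_legal_linkage_evidence(value, include_proximity_ligation=True):
    """Return True if value is an accepted /linkage_evidence string.

    include_proximity_ligation is a hypothetical switch that lets a
    caller validate against the rules before or after the change.
    """
    value = value.strip()
    if value in EXISTING_LINKAGE_EVIDENCE:
        return True
    return include_proximity_ligation and value == NEW_LINKAGE_EVIDENCE


if __name__ == "__main__":
    for candidate in ("paired-ends", "proximity ligation", "hi-c"):
        print(candidate, "->", is_legal_linkage_evidence(candidate))
```

Keeping the new value behind a flag lets the same validator be run against records prepared under either set of rules.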

For additional release information, including exact numbers of base pairs and records added, please see the GenBank release notes, as well as the README files in the genbank and ASN.1 (ncbi-asn1) directories on FTP.
