GenBank exceeds 3 Terabases in release 224

GenBank release 224.0 (2/13/2018) has 207,040,555 traditional records (including non-bulk-oriented TSA) containing 253,630,708,098 base pairs of sequence data.

In addition, there are 564,286,852 WGS records containing 2,608,532,210,351 base pairs of sequence data, 214,324,264 TSA records containing 193,940,551,226 base pairs of sequence data, and 12,819,978 TLS records containing 4,531,966,831 base pairs of sequence data.
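The headline figure can be checked by adding the four components. The following is a minimal Python sketch that uses only the base-pair counts quoted above; the variable names are illustrative, and it assumes the four components do not overlap.

```python
# Base-pair counts quoted in this post for GenBank release 224.0.
# Assumption: the four components are disjoint, so they can simply be summed.
base_pairs = {
    "traditional": 253_630_708_098,
    "WGS": 2_608_532_210_351,
    "TSA": 193_940_551_226,
    "TLS": 4_531_966_831,
}

total_bp = sum(base_pairs.values())
total_terabases = total_bp / 1e12  # 1 terabase = 10**12 base pairs

print(f"{total_bp:,} bp = {total_terabases:.2f} terabases")
```

Under that assumption the total comes to 3,060,635,436,506 base pairs, which is just over 3 terabases.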

During the 60 days between the close dates for GenBank releases 223.0 and 224.0, the traditional portion of GenBank grew by 3,908,544,504 base pairs and by 746,930 sequence records. During that same period, 62,840 records were updated – an average of 13,496 records added or updated per day.
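The daily average appears to be the sum of added and updated records divided by the 60-day interval. A short sketch of that arithmetic, with illustrative variable names:

```python
# Figures quoted above for the traditional portion of GenBank.
days_between_releases = 60
records_added = 746_930
records_updated = 62_840

# Assumption: the quoted average counts added and updated records together.
records_per_day = (records_added + records_updated) / days_between_releases

print(f"{records_per_day:,.0f} records added or updated per day")
```

Rounded to the nearest record, this reproduces the 13,496 per day stated above.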

Between releases 223.0 and 224.0, the WGS component of GenBank grew by 142,434,157,024 base pairs and by 13,223,787 sequence records. The TSA component grew by 12,545,891,038 base pairs and by 12,764,762 sequence records. The TLS component grew by 73,924,215 base pairs and by 124,780 sequence records.
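To put the growth figures in proportion, one can subtract each component's growth from its current size to get the implied size at release 223.0, then express the growth as a percentage. The sketch below does this with the numbers quoted in this post; the percentages are derived here and are not stated in the release notes.

```python
# (size at release 224.0, growth since 223.0) in base pairs, from this post.
components = {
    "traditional": (253_630_708_098, 3_908_544_504),
    "WGS": (2_608_532_210_351, 142_434_157_024),
    "TSA": (193_940_551_226, 12_545_891_038),
    "TLS": (4_531_966_831, 73_924_215),
}

growth_pct = {}
for name, (current_bp, growth_bp) in components.items():
    previous_bp = current_bp - growth_bp  # implied size at release 223.0
    growth_pct[name] = 100 * growth_bp / previous_bp
    print(f"{name}: {growth_pct[name]:.2f}% growth in base pairs")
```

By this measure, WGS added the most base pairs in absolute terms, while TSA grew fastest relative to its previous size.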

The total number of sequence data files increased by 36 with this release. The changes by division are as follows:

  • BCT: 22 new files, now a total of 450
  • CON: 1 fewer file, now a total of 362
  • ENV: 1 new file, now a total of 101
  • INV: 2 new files, now a total of 161
  • PAT: 3 new files, now a total of 323
  • PLN: 2 new files, now a total of 168
  • VRL: 2 new files, now a total of 53
  • VRT: 5 new files, now a total of 85
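
The per-division changes listed above can be tallied to confirm the net increase of 36 files. A minimal sketch, with the CON division entered as a negative change:

```python
# Net change in sequence data files per division, from the list above.
file_changes = {
    "BCT": 22,
    "CON": -1,  # one file fewer than in release 223.0
    "ENV": 1,
    "INV": 2,
    "PAT": 3,
    "PLN": 2,
    "VRL": 2,
    "VRT": 5,
}

net_change = sum(file_changes.values())
print(f"Net change in sequence data files: {net_change:+d}")
```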

For downloading purposes, please keep in mind that the uncompressed GenBank release 224.0 flatfiles require roughly 871 GB (sequence files only). The ASN.1 data require approximately 719 GB.
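
Before starting a download, it may help to compare these sizes with the free space on the target volume. The sketch below uses Python's standard `shutil.disk_usage`; the helper name `has_room` is hypothetical, and it assumes "GB" here means 10^9 bytes (substitute 2**30 if binary gigabytes are intended).

```python
import shutil

GB = 10**9  # assumption: decimal gigabytes; use 2**30 for GiB


def has_room(free_bytes: int, required_gb: float) -> bool:
    """Return True if free_bytes is enough to hold required_gb gigabytes."""
    return free_bytes >= required_gb * GB


# Free space on the volume that holds the current directory.
free = shutil.disk_usage(".").free

print("Flatfiles (871 GB) fit:", has_room(free, 871))
print("ASN.1 data (719 GB) fit:", has_room(free, 719))
```

This checks only the uncompressed sizes quoted above; the compressed downloads and any working copies need additional space.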

More information about GenBank release 224.0, including upcoming changes, is available in the release notes, as well as in the README files in the genbank and ASN.1 (ncbi-asn1) directories on FTP.
