GenBank release 225: Over 1 billion sequence records stored!

GenBank release 225.0 (4/14/2018) has 208,452,303 traditional records (including non-bulk-oriented TSA) containing 260,189,141,631 base pairs of sequence data. In addition, there are 621,379,029 WGS records containing 2,784,740,996,536 base pairs of sequence data, 227,364,990 TSA records containing 205,232,396,043 base pairs of sequence data, and 14,782,654 TLS records containing 5,612,769,448 base pairs of sequence data.
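The headline figure can be checked by summing the four record counts above. The following is a minimal Python sketch; the dictionary names `records` and `bases` are chosen here for illustration, and the values are copied from the paragraph above.

```python
# Record and base-pair counts for GenBank release 225.0,
# copied from the release announcement text.
records = {
    "traditional": 208_452_303,
    "WGS": 621_379_029,
    "TSA": 227_364_990,
    "TLS": 14_782_654,
}
bases = {
    "traditional": 260_189_141_631,
    "WGS": 2_784_740_996_536,
    "TSA": 205_232_396_043,
    "TLS": 5_612_769_448,
}

# Sum across all four components.
total_records = sum(records.values())
total_bases = sum(bases.values())

print(f"{total_records:,} records")
print(f"{total_bases:,} base pairs")
```

The record total comes to 1,071,978,976, which is where the "over 1 billion" in the title comes from.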

During the 60 days between the close dates for GenBank releases 224.0 and 225.0, the traditional portion of GenBank grew by 6,558,433,533 base pairs and by 1,411,748 sequence records. During that same period, 86,960 records were updated, for an average of 24,978 records added or updated per day.
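The daily average follows from adding the new and updated record counts and dividing by the length of the period. A short Python sketch of that arithmetic, with variable names that are illustrative only:

```python
# Figures for the traditional portion of GenBank between
# the close dates of releases 224.0 and 225.0.
days = 60
added = 1_411_748   # new sequence records
updated = 86_960    # existing records that were updated

# Average number of records added or updated per day.
per_day = (added + updated) / days

print(f"{round(per_day):,} records per day")
```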

Between releases 224.0 and 225.0, the WGS component of GenBank grew by 176,208,786,185 base pairs and by 57,092,177 sequence records. The TSA component grew by 11,291,844,817 base pairs and by 13,040,726 sequence records. The TLS component grew by 1,080,802,617 base pairs and by 1,962,676 sequence records.

The total number of sequence data files increased by 59 with this release. The new files by division are as follows:

  • BCT: 24 new files, now a total of 474
  • CON: 3 new files, now a total of 365
  • ENV: 1 new file, now a total of 102
  • HTG: 1 new file, now a total of 155
  • INV: 2 new files, now a total of 163
  • MAM: 16 new files, now a total of 55
  • PAT: 7 new files, now a total of 330
  • PLN: 4 new files, now a total of 172
  • VRL: 1 new file, now a total of 54
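The per-division counts in the list above can be tallied to get the overall number of new sequence data files. A minimal Python sketch, assuming the counts are transcribed correctly from the list (the name `new_files` is illustrative):

```python
# New sequence data files per division, copied from the list above.
new_files = {
    "BCT": 24,
    "CON": 3,
    "ENV": 1,
    "HTG": 1,
    "INV": 2,
    "MAM": 16,
    "PAT": 7,
    "PLN": 4,
    "VRL": 1,
}

# Overall number of new files across the listed divisions.
total_new = sum(new_files.values())

print(f"{total_new} new files across {len(new_files)} divisions")
```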

For downloading purposes, please keep in mind that the uncompressed GenBank release 225.0 flatfiles require roughly 885 GB (sequence files only). The ASN.1 data require approximately 727 GB.

More information about GenBank release 225.0 is available in the release notes, as well as in the README files in the genbank and ASN.1 (ncbi-asn1) directories on FTP. See Section 1.4.1 of the release notes for details about future accession format changes for WGS/TSA/TLS sequencing projects, and for protein sequences.
