GenBank release 239 is available

GenBank release 239.0 (August 18, 2020) is now available on the NCBI FTP site. This release has 9.89 trillion bases and 2.12 billion records.

The current release has 218,642,238 traditional records containing 654,057,069,549 base pairs of sequence data. There are also 1,408,122,887 WGS records containing 8,841,649,410,652 base pairs of sequence data, 417,524,567 bulk-oriented TSA records containing 366,968,951,160 base pairs of sequence data, and 75,682,157 bulk-oriented TLS records containing 27,825,059,498 base pairs of sequence data.
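As a quick sanity check, the headline totals can be reproduced by summing the four components listed above. The snippet below is only an illustrative sketch; the dictionary layout and variable names are made up for this example, and the numbers are copied from this announcement.

```python
# (records, base pairs) per GenBank component, copied from the figures above.
components = {
    "traditional": (218_642_238, 654_057_069_549),
    "WGS": (1_408_122_887, 8_841_649_410_652),
    "TSA": (417_524_567, 366_968_951_160),
    "TLS": (75_682_157, 27_825_059_498),
}

total_records = sum(records for records, _ in components.values())
total_bases = sum(bases for _, bases in components.values())

# Express the totals in the same units as the headline figures.
print(f"records: {total_records:,} (~{total_records / 1e9:.2f} billion)")
print(f"bases:   {total_bases:,} (~{total_bases / 1e12:.2f} trillion)")
```

Rounded to two decimals, the sums agree with the 2.12 billion records and 9.89 trillion bases quoted at the top.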

Growth between releases

During the 60 days between the close dates for GenBank releases 238.0 and 239.0, the ‘traditional’ portion of GenBank grew by 226,233,810,648 base pairs and by 1,520,005 sequence records. During that same period, 80,474 records were updated. An average of 26,675 ‘traditional’ records were added and/or updated per day.
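The daily average follows directly from the figures in this paragraph: new records plus updated records, divided by the number of days between close dates. A minimal sketch of that arithmetic (variable names are illustrative only):

```python
days_between_releases = 60
records_added = 1_520_005
records_updated = 80_474

# Records touched per day = (added + updated) / days between close dates.
daily_average = (records_added + records_updated) / days_between_releases

print(f"{daily_average:,.2f} records per day, rounded to {round(daily_average):,}")
```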

Between releases 238.0 and 239.0, the WGS component of GenBank grew by 727,603,148,494 base pairs and by 105,270,272 sequence records. The TSA component of GenBank grew by 7,021,242,098 base pairs and by 7,799,517 sequence records. The TLS component of GenBank grew by 324,424,370 base pairs and by 618,976 sequence records.
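To put these increments in proportion, the relative growth of each component can be derived from the current totals and the base pairs added since release 238.0. The sketch below assumes the previous total is simply the current total minus the reported increase; `relative_growth` is a hypothetical helper written for this example.

```python
# (current base pairs, base pairs added since release 238.0), from this announcement.
components = {
    "traditional": (654_057_069_549, 226_233_810_648),
    "WGS": (8_841_649_410_652, 727_603_148_494),
    "TSA": (366_968_951_160, 7_021_242_098),
    "TLS": (27_825_059_498, 324_424_370),
}


def relative_growth(current, added):
    """Return growth as a fraction of the previous total (current - added)."""
    previous = current - added
    return added / previous


for name, (current, added) in components.items():
    print(f"{name}: {relative_growth(current, added):.1%} growth in base pairs")
```

By this measure, the traditional portion grew by far the most in relative terms during this cycle.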

The total number of sequence data files increased by 426 with this release. The new files are distributed across divisions as follows:

  • BCT: 37 new files, now a total of 490
  • ENV: 2 new files, now a total of 62
  • INV: 9 new files, now a total of 95
  • MAM: 5 new files, now a total of 76
  • PAT: 7 new files, now a total of 212
  • PLN: 321 new files, now a total of 547
  • PRI: 1 new file, now a total of 35
  • ROD: 7 new files, now a total of 41
  • VRL: 2 new files, now a total of 38
  • VRT: 35 new files, now a total of 182

Note: The unusually large increase in the number of PLN-division files is due to an influx of multiple sets of near-gigabase-scale chromosomal records for wheat (Triticum aestivum) and barley (Hordeum vulgare subsp. vulgare).
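The per-division counts above can also be used to work out how many files each division had before this release, which makes the jump in PLN easy to see. This is a rough sketch using only the numbers in the list; the data structure is invented for illustration.

```python
# (new files in 239.0, total files in 239.0) per division, copied from the list above.
divisions = {
    "BCT": (37, 490),
    "ENV": (2, 62),
    "INV": (9, 95),
    "MAM": (5, 76),
    "PAT": (7, 212),
    "PLN": (321, 547),
    "PRI": (1, 35),
    "ROD": (7, 41),
    "VRL": (2, 38),
    "VRT": (35, 182),
}

# Tally the new files across the listed divisions.
new_files_listed = sum(new for new, _ in divisions.values())
print(f"new files across listed divisions: {new_files_listed}")

# Show divisions from largest to smallest increase, with the prior file count.
by_new_files = sorted(divisions.items(), key=lambda item: item[1][0], reverse=True)
for name, (new, total) in by_new_files:
    previous = total - new
    print(f"{name}: {previous} -> {total} files (+{new / previous:.0%})")
```

PLN goes from 226 files to 547, more than doubling, while most other divisions grow by a few percent.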

For downloading purposes, please keep in mind that the uncompressed GenBank release 239.0 sequence data flatfiles require roughly 1,461 GB of disk space. The ASN.1 data files require approximately 938 GB.
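Before starting a download, it may be worth checking that the target volume has enough free space. The following is a minimal sketch using Python's standard `shutil.disk_usage`; the `has_room` helper is hypothetical, and treating 1 GB as 10^9 bytes is an assumption, since the announcement does not say whether decimal or binary gigabytes are meant.

```python
import shutil

# Approximate uncompressed sizes from this announcement, in GB.
FLATFILE_GB = 1_461
ASN1_GB = 938


def has_room(path, required_gb, bytes_per_gb=10**9):
    """Return True if the filesystem holding `path` has at least `required_gb` free.

    bytes_per_gb defaults to 10**9 (decimal GB), which is an assumption.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * bytes_per_gb


if __name__ == "__main__":
    # "." stands in for the intended download directory.
    print("room for flatfiles:", has_room(".", FLATFILE_GB))
    print("room for ASN.1 files:", has_room(".", ASN1_GB))
```

Passing `bytes_per_gb=2**30` gives the more conservative binary interpretation.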

More information about GenBank release 239.0 is available in the release notes, as well as in the README files in the genbank and ASN.1 (ncbi-asn1) directories on FTP.
