GenBank release 238 is available

GenBank release 238.0 (6/19/2020) is now available on the NCBI FTP site. It contains 8.93 trillion bases in just over 2 billion records.

The current release has 217,122,233 traditional records containing 427,823,258,901 base pairs of sequence data. There are also 1,302,852,615 WGS (Whole Genome Shotgun) records containing 8,114,046,262,158 base pairs of sequence data, 409,725,050 bulk-oriented TSA (Transcriptome Shotgun Assembly) records containing 359,947,709,062 base pairs of sequence data, and 75,063,181 bulk-oriented TLS (Targeted Locus Study) records containing 27,500,635,128 base pairs of sequence data.
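The headline totals of 8.93 trillion bases and 2 billion records are the sums of the four components above. A minimal Python sketch of that tally, using only the figures quoted in this post (the `COMPONENTS` dictionary and `release_totals` helper are illustrative names, not part of any NCBI tool):

```python
# Per-component figures for GenBank release 238.0, copied from the text:
# component -> (records, base pairs)
COMPONENTS = {
    "traditional": (217_122_233, 427_823_258_901),
    "WGS": (1_302_852_615, 8_114_046_262_158),
    "TSA": (409_725_050, 359_947_709_062),
    "TLS": (75_063_181, 27_500_635_128),
}

def release_totals(components):
    """Return (total_records, total_base_pairs) summed over all components."""
    total_records = sum(records for records, _ in components.values())
    total_bases = sum(bases for _, bases in components.values())
    return total_records, total_bases

records, bases = release_totals(COMPONENTS)
print(f"{records:,} records, {bases / 1e12:.2f} trillion bases")
# → 2,004,763,079 records, 8.93 trillion bases
```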

Growth between releases

During the 59 days between the close dates for GenBank releases 237.0 and 238.0, the traditional portion of GenBank grew by 12,053,230,952 base pairs and by 590,404 sequence records. During the same period, 72,649 existing records were updated, for an average of 11,238 traditional records added or updated per day.
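The daily average appears to be the new and updated traditional records combined, divided by the 59-day interval; that reading reproduces the reported 11,238. A small sketch under that assumption (`average_daily_activity` is a hypothetical helper):

```python
def average_daily_activity(added, updated, days):
    """Mean number of records added or updated per day over `days` days."""
    return (added + updated) / days

# Figures for the traditional division between releases 237.0 and 238.0.
rate = average_daily_activity(added=590_404, updated=72_649, days=59)
print(f"{rate:,.0f} records per day")  # → 11,238 records per day
```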

Between releases 237.0 and 238.0, the WGS component of GenBank grew by 325,913,040,820 base pairs and by 35,305,186 sequence records. The TSA component of GenBank grew by 10,254,957,534 base pairs and by 13,332,770 sequence records. The TLS component of GenBank grew by 2,885,364,815 base pairs and by 9,542,049 sequence records.
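To compare the components side by side, the reported growth figures can be summed and spread over the same 59-day interval. This is a sketch of simple averages derived from the numbers in this post, not figures published by NCBI; the `GROWTH` mapping is an illustrative name:

```python
# Growth between releases 237.0 and 238.0, per component, as reported above:
# component -> (base pairs added, records added)
GROWTH = {
    "traditional": (12_053_230_952, 590_404),
    "WGS": (325_913_040_820, 35_305_186),
    "TSA": (10_254_957_534, 13_332_770),
    "TLS": (2_885_364_815, 9_542_049),
}
DAYS = 59  # days between the two release close dates

total_bp = sum(bp for bp, _ in GROWTH.values())
total_records = sum(n for _, n in GROWTH.values())
print(f"combined growth: {total_bp:,} bp, {total_records:,} records")

# Simple per-day averages for each component (derived, not official).
for name, (bp, n) in GROWTH.items():
    print(f"{name:>11}: {bp / DAYS / 1e9:6.1f} Gbp/day, {n / DAYS:>10,.0f} records/day")
```

The WGS component dominates both measures, which is consistent with its share of the release totals.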

The total number of sequence data files increased by 46 with this release. The divisions that gained files are:

  • BCT: 21 new files, now a total of 453
  • MAM: 7 new files, now a total of 71
  • PAT: 1 new file, now a total of 205
  • PLN: 14 new files, now a total of 226
  • SYN: 1 new file, now a total of 28
  • VRL: 2 new files, now a total of 38
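The per-division counts account for the full increase of 46 files. A quick cross-check in Python, with the GenBank division codes annotated (the `DIVISIONS` dictionary is an illustrative name):

```python
# Division code -> (new files in 238.0, total files in 238.0), from the list above.
DIVISIONS = {
    "BCT": (21, 453),  # bacterial
    "MAM": (7, 71),    # other mammalian
    "PAT": (1, 205),   # patent
    "PLN": (14, 226),  # plant, fungal, and algal
    "SYN": (1, 28),    # synthetic and chimeric
    "VRL": (2, 38),    # viral
}

new_files = sum(new for new, _ in DIVISIONS.values())
print(f"new sequence data files: {new_files}")  # → new sequence data files: 46
```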

When planning a download, keep in mind that the uncompressed GenBank release 238.0 sequence data flatfiles require roughly 1,163 GB of disk space. The ASN.1 data require approximately 867 GB.
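Before starting a transfer, it can help to confirm that the target filesystem has room for the data. A minimal sketch using Python's standard `shutil.disk_usage`; treating "GB" as 10^9 bytes is an assumption, and `has_room_for` is a hypothetical helper:

```python
import shutil

# Approximate sizes quoted for release 238.0, in GB.
# Assumption: "GB" means 10**9 bytes.
FLATFILE_GB = 1163  # uncompressed sequence data flatfiles
ASN1_GB = 867       # ASN.1 data

def has_room_for(path, required_gb):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 10**9

if __name__ == "__main__":
    for label, size in [("flatfiles", FLATFILE_GB), ("ASN.1", ASN1_GB)]:
        print(f"{label}: need ~{size} GB, enough space here: {has_room_for('.', size)}")
```

Pointing `path` at the intended download directory, rather than `'.'`, checks the right volume when data are stored on a separate disk.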

More information about GenBank release 238.0 is available in the release notes, as well as in the README files in the genbank and ASN.1 (ncbi-asn1) directories on FTP.
