Month: May 2020

GenBank release 237 is available

GenBank release 237.0 (4/21/2020) is now available on the NCBI FTP site. This release has about 8.58 trillion bases and 1.95 billion records.

The release has 216,531,829 traditional records containing 415,770,027,949 base pairs of sequence data. There are also 1,267,547,429 WGS records containing 7,788,133,221,338 base pairs of sequence data, 396,392,280 bulk-oriented TSA records containing 349,692,751,528 base pairs of sequence data, and 65,521,132 bulk-oriented TLS records containing 24,615,270,313 base pairs of sequence data.
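The headline totals can be checked by adding up the four components listed above. The following is a minimal sketch; the dictionary layout and variable names are illustrative, and the figures are copied directly from this announcement.

```python
# Per-component (records, base pairs) figures from the release 237.0 announcement.
components = {
    "traditional": (216_531_829, 415_770_027_949),
    "WGS": (1_267_547_429, 7_788_133_221_338),
    "TSA": (396_392_280, 349_692_751_528),
    "TLS": (65_521_132, 24_615_270_313),
}

# Sum each column across all four components.
total_records = sum(records for records, _ in components.values())
total_bases = sum(bases for _, bases in components.values())

print(f"records: {total_records:,}")  # 1,945,992,670
print(f"bases:   {total_bases:,}")    # 8,578,211,271,128
```

Rounded to three significant figures, these sums give the 8.58 trillion bases and 1.95 billion records quoted for the release.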

During the 63 days between the close dates for GenBank Releases 236.0 and 237.0, the ‘traditional’ portion of GenBank grew by 16,393,173,077 base pairs and by 317,614 sequence records. During that same period, 55,268 records were updated. An average of 5,919 ‘traditional’ records were added and/or updated per day.
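The daily average follows from the added and updated record counts divided by the number of days between close dates. A short sketch of that arithmetic, with illustrative variable names:

```python
# Figures for the 'traditional' portion, taken from the paragraph above.
days_between_releases = 63
records_added = 317_614
records_updated = 55_268

# Added and updated records are counted together, then averaged per day.
records_touched = records_added + records_updated
per_day = records_touched / days_between_releases

print(f"{records_touched:,} records over {days_between_releases} days")
print(f"about {round(per_day):,} per day")  # about 5,919 per day
```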

Between releases 236.0 and 237.0, the WGS component of GenBank grew by 819,141,955,586 base pairs and by 60,826,741 sequence records. The TSA component of GenBank grew by 8,698,462,463 base pairs and by 9,747,409 sequence records. The TLS component of GenBank grew by 10,945,592,117 base pairs and by 31,483,761 sequence records.
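Subtracting the growth figures from the release 237.0 totals gives an implied size for each component in release 236.0, and from that a relative growth rate. The sketch below performs that subtraction. The derived release 236.0 values are computed here for illustration and are not quoted from the release 236.0 notes.

```python
# (base pairs in 237.0, base-pair growth since 236.0), from this announcement.
bulk_components = {
    "WGS": (7_788_133_221_338, 819_141_955_586),
    "TSA": (349_692_751_528, 8_698_462_463),
    "TLS": (24_615_270_313, 10_945_592_117),
}

implied_previous = {}
for name, (current_bases, growth) in bulk_components.items():
    previous_bases = current_bases - growth  # derived, not an official figure
    implied_previous[name] = previous_bases
    percent = 100 * growth / previous_bases
    print(f"{name}: {previous_bases:,} -> {current_bases:,} bp (+{percent:.1f}%)")
```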

The total number of sequence data files increased by 59 with this release. The new files are distributed across the divisions as follows:

  • BCT: 14 new files, now a total of 432
  • CON: 1 new file, now a total of 217
  • ENV: 1 new file, now a total of 60
  • INV: 6 new files, now a total of 86
  • MAM: 15 new files, now a total of 64
  • PLN: 8 new files, now a total of 212
  • VRT: 14 new files, now a total of 175
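The per-division counts above can be cross-checked against the stated increase of 59 files. A minimal sketch, with the counts copied from the list and an illustrative dictionary name:

```python
# New sequence data files per division, from the list above.
new_files_by_division = {
    "BCT": 14,
    "CON": 1,
    "ENV": 1,
    "INV": 6,
    "MAM": 15,
    "PLN": 8,
    "VRT": 14,
}

total_new_files = sum(new_files_by_division.values())
print(f"new files in this release: {total_new_files}")  # 59
```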

For downloading purposes, the uncompressed GenBank release 237.0 flat files require roughly 1,142 GB, including the sequence files and the *.txt files. The ASN.1 data files require approximately 844 GB.
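Before downloading, it can help to confirm that the target filesystem has enough free space for the uncompressed data. The sketch below uses Python's standard `shutil.disk_usage`. The helper name `has_room` is hypothetical, and treating 1 GB as 10^9 bytes is an assumption, since the announcement does not say which convention it uses.

```python
import shutil

FLAT_FILES_GB = 1142  # uncompressed flat files, per the announcement
ASN1_GB = 844         # ASN.1 data files, per the announcement


def has_room(path: str, required_gb: float, bytes_per_gb: int = 10**9) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` free.

    bytes_per_gb defaults to 10**9 (an assumption); pass 2**30 for binary units.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * bytes_per_gb


if __name__ == "__main__":
    # Check the current directory's filesystem against each data set.
    for label, size_gb in [("flat files", FLAT_FILES_GB), ("ASN.1 files", ASN1_GB)]:
        verdict = "fits" if has_room(".", size_gb) else "does not fit"
        print(f"{label} ({size_gb:,} GB): {verdict}")
```

Both figures describe uncompressed sizes, so the compressed downloads themselves are smaller. Leaving extra headroom is sensible if the archives are kept alongside the extracted files.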

More information about GenBank release 237.0 is available in the Release Notes, as well as in the README files in the GenBank and ASN.1 (ncbi-asn1) directories on FTP.