GenBank release 236.0 (2/20/2020) is now available on the NCBI FTP site. This release has over 7.72 trillion bases and 1.84 billion records.
The release has 216,214,215 traditional records containing 399,376,854,872 base pairs of sequence data. There are also 1,206,720,688 WGS records containing 6,968,991,265,752 base pairs of sequence data, 386,644,871 bulk-oriented TSA records containing 340,994,289,065 base pairs of sequence data, and 34,037,371 bulk-oriented TLS records containing 13,669,678,196 base pairs of sequence data.
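The headline totals are the sums of the four components listed above. The short Python sketch below adds them up as a cross-check. `COMPONENTS` and `release_totals` are illustrative names, not part of any NCBI tool.

```python
# (records, base pairs) per component, as listed for GenBank release 236.0.
COMPONENTS = {
    "traditional": (216_214_215, 399_376_854_872),
    "WGS": (1_206_720_688, 6_968_991_265_752),
    "TSA": (386_644_871, 340_994_289_065),
    "TLS": (34_037_371, 13_669_678_196),
}


def release_totals(components):
    """Return (total_records, total_base_pairs) summed over all components."""
    total_records = sum(records for records, _ in components.values())
    total_bases = sum(bases for _, bases in components.values())
    return total_records, total_bases


records, bases = release_totals(COMPONENTS)
print(f"{records:,} records, {bases:,} bp")
print(f"~{records / 1e9:.2f} billion records, ~{bases / 1e12:.2f} trillion bases")
```

The sums come to 1,843,617,145 records and 7,723,032,087,885 bp. These match the "1.84 billion records" and "7.72 trillion bases" quoted for the release.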
During the 70 days between the close dates for GenBank Releases 235.0 and 236.0, the ‘traditional’ portion of GenBank grew by 10,959,596,863 base pairs and by 881,195 sequence records. During that same period, 62,552 records were updated. An average of 13,482 ‘traditional’ records were added and/or updated per day.
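The per-day figure can be reproduced with a short calculation. This sketch assumes the average is simply the new records plus the updated records, divided by the length of the interval. That assumption recovers the published value. `average_daily_activity` is a hypothetical helper.

```python
def average_daily_activity(new_records: int, updated_records: int, days: int) -> float:
    """Average number of records added and/or updated per day over an interval.

    Assumption: the published average is (new + updated) / days.
    """
    return (new_records + updated_records) / days


# Figures for the 70-day interval between the close dates of releases 235.0 and 236.0.
avg = average_daily_activity(new_records=881_195, updated_records=62_552, days=70)
print(f"{avg:,.0f} records per day")
```

This gives (881,195 + 62,552) / 70 = 943,747 / 70 ≈ 13,482 records per day.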
Between releases 235.0 and 236.0, the WGS component of GenBank grew by 691,440,065,062 base pairs and by 79,696,818 sequence records. The TSA component of GenBank grew by 15,561,272,936 base pairs and by 19,451,027 sequence records. The TLS component of GenBank grew by 2,389,081,582 base pairs and by 5,810,191 sequence records. The VRT division of GenBank shrank because 40 chromosomal records for the Coregonus sp. ‘balchen’ genome, totaling 2.1 Gbp of sequence data, were suppressed. This organism is already represented by its underlying sequence contigs, plus chromosomal CON-division/scaffold records built from those contigs. The 40 suppressed records are redundant with those scaffolds, and their suppression resulted in fewer VRT-division files.
The total number of sequence data files increased by 48 with this release. The changes by division are as follows:
- BCT: 17 new files, now a total of 418
- CON: 4 new files, now a total of 216
- ENV: 1 new file, now a total of 59
- MAM: 10 new files, now a total of 49
- PAT: 2 new files, now a total of 204
- PLN: 18 new files, now a total of 204
- VRL: 1 new file, now a total of 36
- VRT: 5 fewer files, now a total of 161
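The per-division changes listed above add up to the overall increase of 48 files. The Python sketch below checks that sum. It also derives each division's file count in the previous release by subtracting the change from the new total. `FILE_CHANGES` is an illustrative name.

```python
# (change in file count, new total) per division for release 236.0.
# VRT lost files, so its change is negative.
FILE_CHANGES = {
    "BCT": (17, 418),
    "CON": (4, 216),
    "ENV": (1, 59),
    "MAM": (10, 49),
    "PAT": (2, 204),
    "PLN": (18, 204),
    "VRL": (1, 36),
    "VRT": (-5, 161),
}

net_change = sum(delta for delta, _ in FILE_CHANGES.values())
print(f"Net change in sequence data files: {net_change:+d}")

# Previous-release file counts implied by each division's change.
for division, (delta, total) in FILE_CHANGES.items():
    print(f"{division}: {total - delta} -> {total}")
```

The net change is +48, consistent with the stated increase. For example, VRT went from 166 files to 161.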
For downloading purposes, the uncompressed GenBank release 236.0 flat files require roughly 1,117 GB, including the sequence files and the *.txt files.