Over 1 billion records in GenBank release 231

GenBank release 231.0 (4/19/2019) is now available on the NCBI FTP site. This release has 5.03 terabases and 1.54 billion records.

The release has 212,775,414 traditional records containing 321,680,566,570 base pairs of sequence data. There are also 993,732,214 WGS records containing 4,421,986,382,065 base pairs of sequence data, 311,247,136 bulk-oriented TSA records containing 277,118,019,688 base pairs of sequence data, and 24,240,761 bulk-oriented TLS records containing 9,623,321,565 base pairs of sequence data.
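The headline totals can be checked by summing the four components listed above. The sketch below uses only the figures from this announcement; the dictionary layout and variable names are illustrative choices, not part of any NCBI tool.

```python
# Per-component figures for GenBank release 231.0, copied from the text above.
# Each entry is (sequence records, base pairs).
components = {
    "traditional": (212_775_414, 321_680_566_570),
    "WGS": (993_732_214, 4_421_986_382_065),
    "TSA": (311_247_136, 277_118_019_688),
    "TLS": (24_240_761, 9_623_321_565),
}

total_records = sum(records for records, _ in components.values())
total_bases = sum(bases for _, bases in components.values())

# Express the sums in the units used by the headline figures.
print(f"{total_records:,} records ({total_records / 1e9:.2f} billion)")
print(f"{total_bases:,} base pairs ({total_bases / 1e12:.2f} terabases)")
```

The sums round to 1.54 billion records and 5.03 terabases, matching the figures quoted for the release.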

During the 60 days between the close dates for GenBank releases 230.0 and 231.0, the traditional portion of GenBank grew by 17,971,055,938 base pairs and 515,037 sequence records.

During that same period, 63,674 records were updated. An average of 9,645 traditional records were added and/or updated per day.
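The daily average appears to follow from the two record counts above. The sketch below assumes it is computed as new records plus updated records, divided by the 60 days between close dates; the announcement does not spell out the formula, so treat that as an assumption.

```python
# Figures from the text above for the traditional portion of GenBank.
days_between_releases = 60
records_added = 515_037
records_updated = 63_674

# Assumed formula: (added + updated) / days. Not stated in the announcement.
average_per_day = (records_added + records_updated) / days_between_releases

print(f"{average_per_day:,.0f} traditional records added and/or updated per day")
```

Under that assumption the result rounds to 9,645, which agrees with the stated average.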

Between releases 230.0 and 231.0, the WGS component of GenBank grew by 257,472,420,386 base pairs and by 48,712,902 sequence records. The TSA component of GenBank grew by 13,181,133,983 base pairs and by 16,474,706 sequence records. The TLS component of GenBank grew by 476,485,480 base pairs and by 980,832 sequence records.
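To put the growth figures in proportion, one can estimate each component's relative growth in base pairs. The sketch below assumes that the release 230.0 size of a component equals its release 231.0 size minus the reported growth; the percentages are derived values and do not appear in the release notes.

```python
# (base pairs in release 231.0, base pairs added since release 230.0),
# both taken from the figures quoted in this announcement.
base_pairs = {
    "traditional": (321_680_566_570, 17_971_055_938),
    "WGS": (4_421_986_382_065, 257_472_420_386),
    "TSA": (277_118_019_688, 13_181_133_983),
    "TLS": (9_623_321_565, 476_485_480),
}

growth_percent = {}
for name, (current, added) in base_pairs.items():
    previous = current - added  # assumed size in release 230.0
    growth_percent[name] = 100 * added / previous

for name, pct in growth_percent.items():
    print(f"{name}: {pct:.1f}% more base pairs than in release 230.0")
```

By this estimate, each component grew by roughly 5 to 6 percent over the 60-day period.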

The total number of sequence data files increased by 33 with this release. The changes by division are as follows:

  • BCT: 11 new files, now a total of 335
  • CON: 1 fewer file, now a total of 204
  • INV: 3 new files, now a total of 71
  • MAM: 1 new file, now a total of 33
  • PAT: 2 new files, now a total of 195
  • PLN: 3 new files, now a total of 146
  • PRI: 1 new file, now a total of 34
  • VRL: 1 new file, now a total of 33
  • VRT: 38 new files, now a total of 105

The substantial increase in the number of VRT sequence files is due to an influx of chromosome-scale eukaryotic sequences since GenBank release 230.0. Please read section 1.3.1 of the Release Notes for more information.

For downloading purposes, please keep in mind that the uncompressed GenBank release 231.0 flatfiles require roughly 990 GB (sequence files only). The ASN.1 data require approximately 783 GB.
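Before starting a download, it may help to confirm that the target filesystem has enough free space for the uncompressed data. The following is a minimal sketch using Python's standard `shutil.disk_usage`. The function name `has_room`, the target directory, and the reading of "GB" as 10^9 bytes are assumptions made for illustration.

```python
import shutil

# Approximate uncompressed sizes from the announcement, in gigabytes.
FLATFILE_GB = 990
ASN1_GB = 783


def has_room(path: str, required_gb: float) -> bool:
    """Return True if the filesystem holding `path` has at least
    `required_gb` gigabytes free (assuming 1 GB = 10**9 bytes)."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 10**9


target = "."  # hypothetical download directory
for label, size_gb in (("flatfiles", FLATFILE_GB), ("ASN.1", ASN1_GB)):
    status = "enough" if has_room(target, size_gb) else "not enough"
    print(f"{label}: about {size_gb} GB needed, {status} free space at {target!r}")
```

The check covers only the uncompressed sizes quoted above; the compressed downloads need additional space if they are kept alongside the extracted files.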

More information about GenBank release 231.0 is available in the release notes, as well as in the README files in the genbank and ASN.1 (ncbi-asn1) directories on FTP.
