Over 1 billion records in GenBank release 231


GenBank release 231.0 (4/19/2019) is now available on the NCBI FTP site. This release has 5.03 terabases and 1.54 billion records.

The release has 212,775,414 traditional records containing 321,680,566,570 base pairs of sequence data. There are also 993,732,214 WGS records containing 4,421,986,382,065 base pairs of sequence data, 311,247,136 bulk-oriented TSA records containing 277,118,019,688 base pairs of sequence data, and 24,240,761 bulk-oriented TLS records containing 9,623,321,565 base pairs of sequence data.
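The headline totals above come from summing the four components. This quick Python check uses only the figures quoted in the previous paragraph. The dictionary layout is just for illustration.

```python
# Per-component figures for GenBank release 231.0, as quoted above:
# component -> (records, base pairs)
RELEASE_231 = {
    "traditional": (212_775_414, 321_680_566_570),
    "WGS": (993_732_214, 4_421_986_382_065),
    "TSA": (311_247_136, 277_118_019_688),
    "TLS": (24_240_761, 9_623_321_565),
}

total_records = sum(records for records, _ in RELEASE_231.values())
total_bp = sum(bp for _, bp in RELEASE_231.values())

# 1 terabase = 10**12 base pairs; 1 billion = 10**9
print(f"{total_bp / 1e12:.2f} terabases")             # 5.03 terabases
print(f"{total_records / 1e9:.2f} billion records")   # 1.54 billion records
```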

During the 60 days between the close dates for GenBank releases 230.0 and 231.0, the traditional portion of GenBank grew by 17,971,055,938 base pairs and 515,037 sequence records.

During that same period, 63,674 records were updated. An average of 9,645 traditional records were added and/or updated per day.
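The daily figure can be reproduced by adding the new and updated record counts and dividing by the 60-day interval. This sketch assumes the two counts are simply summed. That assumption reproduces the published average, although a record that was both added and updated could in principle be counted twice.

```python
# Traditional-division activity between the close dates of releases 230.0 and 231.0
new_records = 515_037      # records added
updated_records = 63_674   # records updated
days = 60

# ASSUMPTION: added and updated counts are summed directly
daily_average = (new_records + updated_records) / days
print(f"{daily_average:,.0f} records per day")  # 9,645 records per day
```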

Between releases 230.0 and 231.0, the WGS component of GenBank grew by 257,472,420,386 base pairs and by 48,712,902 sequence records. The TSA component of GenBank grew by 13,181,133,983 base pairs and by 16,474,706 sequence records. The TLS component of GenBank grew by 476,485,480 base pairs and by 980,832 sequence records.
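You can combine the growth of all four components to see how much sequence data entered GenBank during the interval. The following is a small worked example that uses only the growth figures quoted above. The per-day rate is derived here and is not a figure published in the release notes.

```python
# Growth between releases 230.0 and 231.0, as quoted above:
# component -> (new base pairs, new records)
GROWTH = {
    "traditional": (17_971_055_938, 515_037),
    "WGS": (257_472_420_386, 48_712_902),
    "TSA": (13_181_133_983, 16_474_706),
    "TLS": (476_485_480, 980_832),
}
DAYS = 60

total_bp = sum(bp for bp, _ in GROWTH.values())
total_records = sum(n for _, n in GROWTH.values())

print(f"{total_bp / 1e9:.1f} Gbp added, {total_records:,} records added")
# 289.1 Gbp added, 66,683,477 records added
print(f"~{total_bp / DAYS / 1e9:.2f} Gbp per day")
# ~4.82 Gbp per day
```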

The total number of sequence data files increased by 33 with this release. The changes by division are as follows:

  • BCT: 11 new files, now a total of 335
  • CON: 1 fewer file, now a total of 204
  • INV: 3 new files, now a total of 71
  • MAM: 1 new file, now a total of 33
  • PAT: 2 new files, now a total of 195
  • PLN: 3 new files, now a total of 146
  • PRI: 1 new file, now a total of 34
  • VRL: 1 new file, now a total of 33
  • VRT: 38 new files, now a total of 105
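If you plan to fetch a single division, listing the expected file names in advance can help. This sketch assumes the flatfile naming pattern `gb<division><n>.seq.gz` (for example `gbvrt1.seq.gz`), with a lowercase division code and no zero padding. Treat that pattern as an assumption and confirm it against the README in the genbank FTP directory. The file counts come from the list above, and `flatfile_names` is a hypothetical helper.

```python
# File counts per division after release 231.0, as listed above.
DIVISION_FILE_COUNTS = {
    "BCT": 335, "CON": 204, "INV": 71, "MAM": 33, "PAT": 195,
    "PLN": 146, "PRI": 34, "VRL": 33, "VRT": 105,
}

def flatfile_names(division: str) -> list[str]:
    """Return expected flatfile names for one division (hypothetical helper).

    ASSUMPTION: files are named gb<division><n>.seq.gz, with a lowercase
    division code and n running from 1 to the file count without padding.
    """
    count = DIVISION_FILE_COUNTS[division.upper()]
    code = division.lower()
    return [f"gb{code}{n}.seq.gz" for n in range(1, count + 1)]

names = flatfile_names("VRT")
print(len(names), names[0], names[-1])  # 105 gbvrt1.seq.gz gbvrt105.seq.gz
```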

The substantial increase in the number of VRT sequence files is due to an influx of chromosome-scale eukaryotic sequences since GenBank release 230.0. Please read section 1.3.1 of the Release Notes for more information.

For downloading purposes, please keep in mind that the uncompressed GenBank release 231.0 flatfiles require roughly 990 GB (sequence files only). The ASN.1 data require approximately 783 GB.
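Before you start a download, you can check whether the target filesystem can hold the uncompressed release. This sketch uses the standard library's `shutil.disk_usage`. It reads "GB" as decimal gigabytes (10**9 bytes) and applies a 10% safety margin. Both of those are assumptions for illustration, and neither comes from NCBI.

```python
import shutil

# Approximate uncompressed sizes for release 231.0, as quoted above.
# ASSUMPTION: "GB" means decimal gigabytes (10**9 bytes).
REQUIRED_GB = {"flatfile": 990, "asn1": 783}

def has_room(free_bytes: int, fmt: str, margin: float = 1.1) -> bool:
    """True if free_bytes covers the quoted size for fmt times a safety margin.

    ASSUMPTION: the 1.1 margin is an arbitrary illustrative choice.
    """
    needed = REQUIRED_GB[fmt] * 10**9 * margin
    return free_bytes >= needed

def check_disk(path: str, fmt: str) -> bool:
    """Check free space on the filesystem that holds path."""
    return has_room(shutil.disk_usage(path).free, fmt)

if __name__ == "__main__":
    # Example: is there room in the current directory for the flatfiles?
    print(check_disk(".", "flatfile"))
```

Keeping the size comparison in `has_room`, separate from the filesystem query, makes the threshold logic easy to test without touching a real disk.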

More information about GenBank release 231.0 is available in the release notes, as well as in the README files in the genbank and ASN.1 (ncbi-asn1) directories on FTP.
