GenBank release 233

GenBank release 233.0 (8/21/2019) is now available on the NCBI FTP site. This release has 6.26 terabases and 1.65 billion records.

The release has 213,865,349 traditional records containing 366.7 billion base pairs of sequence data. There are also 1.07 billion WGS records containing 5.58 trillion base pairs of sequence data, 331.3 million bulk-oriented TSA records containing 294.7 billion base pairs of sequence data, and 26.4 million bulk-oriented TLS records containing 10.5 billion base pairs of sequence data.

GenBank growth between releases

Release 233.0 closed 63 days after release 232.0. At the close of release 233.0, the GenBank components totaled:

  • Traditional: 366.7 billion base pairs, 213.9 million sequence records
  • WGS: 5.58 trillion base pairs, 1.07 billion sequence records
  • TSA: 294.7 billion base pairs, 331.3 million sequence records
  • TLS: 10.5 billion base pairs, 26.4 million sequence records
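As a quick sanity check, the component figures listed above can be summed and compared with the headline totals of 6.26 terabases and 1.65 billion records. The following is a minimal sketch using only the rounded values from this post; the variable names are illustrative, and small differences from the headline numbers are expected because the inputs are rounded.

```python
# Rounded component figures from this post: (base pairs, sequence records).
components = {
    "Traditional": (366.7e9, 213.9e6),
    "WGS": (5.58e12, 1.07e9),
    "TSA": (294.7e9, 331.3e6),
    "TLS": (10.5e9, 26.4e6),
}

# Sum base pairs and records across all four components.
total_bases = sum(bp for bp, _ in components.values())
total_records = sum(rec for _, rec in components.values())

print(f"{total_bases / 1e12:.2f} terabases")
print(f"{total_records / 1e9:.2f} billion records")

# Share of all base pairs contributed by each component.
for name, (bp, _) in components.items():
    print(f"{name}: {100 * bp / total_bases:.1f}% of base pairs")
```

With these rounded inputs the sums land within about 0.01 of the headline figures, and WGS accounts for the large majority of base pairs.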

During that same period, 97,720 records were updated. An average of 9,195 traditional records were added and/or updated per day.

The total number of sequence data files increased by 100 with this release. The changes by division are as follows:

  • BCT: 17 new files, now a total of 365
  • CON: 1 new file, now a total of 208
  • ENV: 1 new file, now a total of 58
  • INV: 6 new files, now a total of 77
  • PAT: 1 new file, now a total of 198
  • PLN: 25 new files, now a total of 181
  • VRL: 1 new file, now a total of 34
  • VRT: 48 new files, now a total of 157
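The per-division counts above can be tallied to get the overall number of new files, and subtracting each increase from the current total gives the file count in the previous release. This is a small illustrative sketch based only on the numbers in the list; the dictionary name is hypothetical.

```python
# (new files, current total) per division, copied from the list above.
divisions = {
    "BCT": (17, 365),
    "CON": (1, 208),
    "ENV": (1, 58),
    "INV": (6, 77),
    "PAT": (1, 198),
    "PLN": (25, 181),
    "VRL": (1, 34),
    "VRT": (48, 157),
}

# Overall number of new sequence data files across the listed divisions.
total_new = sum(new for new, _ in divisions.values())
print(f"New files across listed divisions: {total_new}")

# File count per division before this release.
previous = {name: total - new for name, (new, total) in divisions.items()}
for name, count in previous.items():
    print(f"{name}: {count} files in the previous release")
```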

For downloading purposes, please keep in mind that the uncompressed GenBank release 233.0 flatfiles require roughly 1,057 GB (sequence files only). The ASN.1 data require approximately 809 GB.

Upcoming changes

The set of legal values for the /linkage_evidence qualifier of the assembly_gap feature will be expanded to include “proximity ligation” on or after October 15, 2019.

The current working description for proximity ligation is: “ligation of segments of DNA that were brought into proximity in chromatin (Hi-C and related technologies)”. See also the proposed AGP 2.1 specification.

For additional release information, including exact numbers of base pairs and records added, please see the GenBank release notes as well as the README files in the genbank and ASN.1 (ncbi-asn1) directories on the FTP site.
