Announcing GenBank Release 251.0

GenBank release 251.0 (8/15/2022) is now available on the NCBI FTP site. This release has 19.55 trillion bases and 2.94 billion records. The current release has 239,915,786 traditional records containing 1,492,800,704,497 base pairs of sequence data. There are also 2,024,099,677 WGS records containing 17,511,809,676,629 base pairs of sequence data, 560,196,830 bulk-oriented TSA records containing 497,501,380,386 base pairs of sequence data, and 115,103,527 bulk-oriented TLS records containing 43,852,280,645 base pairs of sequence data. 
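The headline totals are the sums of the four components listed above. This short Python sketch reproduces that arithmetic using only the counts quoted in this announcement. The dictionary layout and variable names are illustrative, not part of any NCBI tool.

```python
# Record and base-pair counts for each GenBank 251.0 component,
# copied from the announcement text (records, base pairs).
components = {
    "traditional": (239_915_786, 1_492_800_704_497),
    "WGS": (2_024_099_677, 17_511_809_676_629),
    "TSA": (560_196_830, 497_501_380_386),
    "TLS": (115_103_527, 43_852_280_645),
}

# Sum each column across all components.
total_records = sum(records for records, _ in components.values())
total_bases = sum(bases for _, bases in components.values())

print(f"{total_bases:,} bases ({total_bases / 1e12:.2f} trillion)")
# 19,545,964,042,157 bases (19.55 trillion)
print(f"{total_records:,} records ({total_records / 1e9:.2f} billion)")
# 2,939,315,820 records (2.94 billion)
```

The rounded values match the 19.55 trillion bases and 2.94 billion records stated for the release.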

Growth between releases

During the 58 days between the close dates for GenBank Releases 250.0 and 251.0, the traditional portion of GenBank grew by 97,172,073,310 base pairs and by 897,893 sequence records. During that same period, 80,044 records were updated. On average, 16,861 traditional records were added and/or updated per day.
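The daily average appears to be the new records plus the updated records, divided by the 58-day interval. The Python sketch below reproduces that figure under this assumption, using only numbers from the paragraph above.

```python
# Figures for the traditional portion of GenBank between
# Releases 250.0 and 251.0, taken from the announcement.
days = 58
new_records = 897_893
updated_records = 80_044

# Assumed formula: (added + updated) / number of days in the interval.
per_day = (new_records + updated_records) / days
print(f"{per_day:,.0f} traditional records added and/or updated per day")
# 16,861 traditional records added and/or updated per day
```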

Between releases 250.0 and 251.0, the WGS component of GenBank grew by 801,436,670,029 base pairs and by 227,750,563 sequence records. The TSA component of GenBank grew by 12,445,250,625 base pairs and by 13,205,258 sequence records. The TLS component of GenBank grew by 1,852,921,798 base pairs and by 3,961,420 sequence records.

The total number of sequence data files increased by 311 with this release. The divisions are as follows:

  • BCT: 31 new files, now a total of 820
  • ENV: 2 new files, now a total of 72
  • INV: 151 new files, now a total of 866
  • MAM: 8 new files, now a total of 141
  • PLN: 20 new files, now a total of 952
  • ROD: 24 new files, now a total of 214
  • VRL: 63 new files, now a total of 774
  • VRT: 12 new files, now a total of 315
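As a quick consistency check, the per-division increases above add up to the overall increase of 311 files. The Python sketch below tallies the list. Note that the total it computes covers only the eight divisions listed here, not every division in the release.

```python
# (new files, total files) per division, as listed in the announcement.
divisions = {
    "BCT": (31, 820),
    "ENV": (2, 72),
    "INV": (151, 866),
    "MAM": (8, 141),
    "PLN": (20, 952),
    "ROD": (24, 214),
    "VRL": (63, 774),
    "VRT": (12, 315),
}

added = sum(new for new, _ in divisions.values())
listed_total = sum(total for _, total in divisions.values())

print(added)         # 311
print(listed_total)  # 4154 (listed divisions only)
```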

Sequence data file notes

The number of CON-division flatfiles remains elevated because of the inclusion of “external annotation” within a set of CON records that were updated in April 2022. We will resolve the issue as soon as possible, and the CON file count will likely decrease with GenBank 252.0.

Additional Information

When planning a download, please keep in mind that the uncompressed GenBank release 251.0 sequence data flatfiles require roughly 2,585 GB of disk space. The ASN.1 data files require approximately 1,396 GB.
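Before starting a large download, it can help to confirm that the destination filesystem has room for the uncompressed data. This is a minimal Python sketch using the standard library's `shutil.disk_usage`. It assumes that "GB" here means 10**9 bytes, and it uses the current directory as a hypothetical download location. The helper name `has_room` is illustrative only.

```python
import shutil

# Approximate uncompressed sizes quoted for GenBank release 251.0.
FLATFILE_GB = 2_585  # sequence data flatfiles
ASN1_GB = 1_396      # ASN.1 data files

def has_room(path: str, needed_gb: float) -> bool:
    """Return True if the filesystem holding `path` has at least
    `needed_gb` free, assuming 1 GB = 10**9 bytes (an assumption)."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= needed_gb * 10**9

if __name__ == "__main__":
    target = "."  # hypothetical download directory
    for label, size in (("flatfiles", FLATFILE_GB), ("ASN.1", ASN1_GB)):
        print(f"{label}: need ~{size:,} GB, enough space: {has_room(target, size)}")
```

If you plan to keep both the flatfiles and the ASN.1 files, check against their combined size instead.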

For more information about GenBank release 251.0, see the release notes, as well as the README files in the GenBank and ASN.1 (ncbi-asn1) directories on FTP.