Announcing GenBank release 252.0

Now over 3 billion records!

GenBank release 252.0 (10/17/2022) is now available on the NCBI FTP site. This release has 20.35 trillion bases and 3.10 billion records. The current release has 240,539,282 traditional records containing 1,562,963,366,851 base pairs of sequence data. There are also 2,167,900,306 WGS records containing 18,231,960,808,828 base pairs of sequence data, 574,020,080 bulk-oriented TSA records containing 511,476,787,957 base pairs of sequence data, and 115,123,306 bulk-oriented TLS records containing 43,860,512,749 base pairs of sequence data. 
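The headline totals can be cross-checked against the per-component figures above. The following is a minimal sketch of that arithmetic; the dictionary layout and variable names are illustrative assumptions and are not part of any NCBI tool.

```python
# Per-component counts quoted in this announcement: (records, base pairs).
components = {
    "traditional": (240_539_282, 1_562_963_366_851),
    "WGS": (2_167_900_306, 18_231_960_808_828),
    "TSA": (574_020_080, 511_476_787_957),
    "TLS": (115_123_306, 43_860_512_749),
}

# Sum records and base pairs across all four components.
total_records = sum(records for records, _ in components.values())
total_bases = sum(bases for _, bases in components.values())

print(f"{total_records:,} records ({total_records / 1e9:.2f} billion)")
print(f"{total_bases:,} base pairs ({total_bases / 1e12:.2f} trillion)")
```

The sums come to 3,097,582,974 records and 20,350,261,476,385 base pairs, which round to the 3.10 billion and 20.35 trillion stated above.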

Growth between releases

During the 63 days between the close dates for GenBank Releases 251.0 and 252.0, the traditional portion of GenBank grew by 70,162,662,354 base pairs and by 623,496 sequence records. During that same period, 25,466 records were updated. On average, 10,301 traditional records were added or updated per day.
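The daily average quoted above follows from the added and updated record counts divided by the 63-day window. A short sketch of that calculation, with variable names chosen here for illustration:

```python
# Figures for the traditional portion of GenBank between releases 251.0 and 252.0.
days_between_releases = 63
records_added = 623_496
records_updated = 25_466

# Average number of records added or updated per day, rounded to a whole record.
per_day = (records_added + records_updated) / days_between_releases
print(f"{round(per_day):,} records per day")
```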

Between releases 251.0 and 252.0, the WGS component of GenBank grew by 720,151,132,199 base pairs and by 143,800,629 sequence records. The TSA component of GenBank grew by 13,975,407,571 base pairs and by 13,823,250 sequence records. The TLS component of GenBank grew by 8,232,104 base pairs and by 19,779 sequence records.

The total number of sequence data files increased by 216 with this release. The divisions are as follows:

  • BCT: 37 new files, now a total of 857
  • CON: 28 files removed, now a total of 231
  • ENV: 3 new files, now a total of 75
  • INV: 99 new files, now a total of 965
  • PLN: 61 new files, now a total of 1013
  • VRL: 39 new files, now a total of 813
  • VRT: 5 new files, now a total of 320
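The net increase of 216 files is the sum of the per-division changes listed above, with the CON removals counted as a negative change. A minimal sketch, assuming a simple mapping of division code to file-count change:

```python
# Change in the number of sequence data files per division for release 252.0.
# CON is negative because 28 files were removed.
file_changes = {
    "BCT": 37,
    "CON": -28,
    "ENV": 3,
    "INV": 99,
    "PLN": 61,
    "VRL": 39,
    "VRT": 5,
}

net_change = sum(file_changes.values())
print(f"Net change in sequence data files: {net_change:+d}")
```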

Sequence data file notes

With GenBank Release 249.0 in April 2022, we noticed an unusually large increase of 36 sequence flatfiles for the CON division. The increase was due to “external annotation” that was erroneously incorporated into the ASN.1 version of 174 WGS-associated chromosomal scaffolds within a set of CON records.

The rendering and content of these 174 records in the GenBank flatfile representation were not negatively impacted by this error. However, customers who use the ASN.1 representation of GenBank records would have seen dramatic increases in the sizes of those records.

We corrected the problem in this October 2022 GenBank Release 252.0, and the overall number of CON-division files has decreased as a result. We apologize for any difficulties this caused.

Additional Information

For downloading purposes, please keep in mind that the uncompressed GenBank release 252.0 sequence data flatfiles require roughly 2,815 GB. The ASN.1 data files require approximately 1,432 GB.

For more information about GenBank release 252.0, see the release notes, as well as the README files in the GenBank and ASN.1 (ncbi-asn1) directories on FTP.
