GenBank Release 250.0 is available!

GenBank release 250.0 (6/17/2022) is now available on the NCBI FTP site. This release has 18.63 trillion bases and 2.69 billion records. 

The current release has 239,017,893 traditional records containing 1,395,628,631,187 base pairs of sequence data. There are also 1,796,349,114 WGS records containing 16,710,373,006,600 base pairs of sequence data, 546,991,572 bulk-oriented TSA records containing 485,056,129,761 base pairs of sequence data, and 111,142,107 bulk-oriented TLS records containing 41,999,358,847 base pairs of sequence data.
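
The headline totals follow from the four component counts above. As a quick sanity check, the short sketch below sums them; the dictionary layout and variable names are illustrative only and not part of any NCBI tool.

```python
# Record and base-pair counts copied from the release announcement.
# Each entry is (records, base_pairs).
components = {
    "traditional": (239_017_893, 1_395_628_631_187),
    "WGS": (1_796_349_114, 16_710_373_006_600),
    "TSA": (546_991_572, 485_056_129_761),
    "TLS": (111_142_107, 41_999_358_847),
}

total_records = sum(records for records, _ in components.values())
total_bases = sum(bases for _, bases in components.values())

print(f"{total_bases / 1e12:.2f} trillion bases")    # 18.63 trillion bases
print(f"{total_records / 1e9:.2f} billion records")  # 2.69 billion records
```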

Growth between releases

During the 60 days between the close dates for GenBank Releases 249.0 and 250.0, the ‘traditional’ portion of GenBank grew by 129,473,740,269 base pairs and by 1,497,575 sequence records. During that same period, 61,874 records were updated. On average, 25,991 ‘traditional’ records were added and/or updated per day.
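
The daily average can be reproduced from the figures in the paragraph above. This is a minimal sketch that assumes the added and updated counts are disjoint and are simply summed before dividing by the 60-day interval; the variable names are illustrative.

```python
# Figures copied from the announcement.
days_between_releases = 60
records_added = 1_497_575
records_updated = 61_874

# Assumption: "added and/or updated" means the two counts are summed.
per_day = (records_added + records_updated) / days_between_releases

print(round(per_day))  # 25991
```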

Between releases 249.0 and 250.0, the WGS component of GenBank grew by 638,852,304,430 base pairs and by 14,974,897 sequence records. The TSA component of GenBank grew by 10,635,053,313 base pairs and by 12,220,986 sequence records. The TLS component of GenBank grew by 675,166,504 base pairs and by 1,321,720 sequence records.

The total number of sequence data files increased by 407 with this release. The new files are distributed across divisions as follows:

  • BCT: 41 new files, now a total of 789
  • INV: 77 new files, now a total of 715
  • MAM: 8 new files, now a total of 133
  • PAT: 1 new file, now a total of 251
  • PHG: 1 new file, now a total of 6
  • PLN: 51 new files, now a total of 932
  • PRI: 1 new file, now a total of 57
  • ROD: 113 new files, now a total of 190
  • VRL: 108 new files, now a total of 711
  • VRT: 6 new files, now a total of 303
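
The per-division counts in the list above add up to the stated increase of 407 files. The sketch below checks that sum and derives the file count each division had before this release; the data structure is illustrative, and only the divisions named in the list are included.

```python
# (new_files, current_total) per division, copied from the list above.
divisions = {
    "BCT": (41, 789),
    "INV": (77, 715),
    "MAM": (8, 133),
    "PAT": (1, 251),
    "PHG": (1, 6),
    "PLN": (51, 932),
    "PRI": (1, 57),
    "ROD": (113, 190),
    "VRL": (108, 711),
    "VRT": (6, 303),
}

new_files = sum(new for new, _ in divisions.values())
print(new_files)  # 407

# File count per division before this release (current total minus new files).
previous = {name: total - new for name, (new, total) in divisions.items()}
for name, count in previous.items():
    print(f"{name}: {count} files before release 250.0")
```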

Sequence data file notes

The number of CON-division flatfiles remains elevated. The cause may be the erroneous presence of quality-score data in which every score is “-1”. Records in the largest of the gbcon*.qscore.gz files demonstrate the problem. The issue will be resolved before the August 251.0 release.

Additional Information

For downloading purposes, please keep in mind that the uncompressed GenBank release 250.0 sequence data flatfiles require roughly 2,585 GB. The ASN.1 data files require approximately 1,297 GB.
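
Before starting a full download, it can help to confirm that the target volume has enough free space. The sketch below is one way to do that with the Python standard library. It assumes decimal gigabytes (10^9 bytes), which the announcement does not specify, and the helper names `required_bytes` and `has_room` are hypothetical.

```python
import shutil

# Approximate uncompressed sizes from the announcement, in GB.
FLATFILE_GB = 2_585
ASN1_GB = 1_297

# Assumption: 1 GB = 10**9 bytes; the announcement does not say which unit is meant.
BYTES_PER_GB = 10**9


def required_bytes(include_asn1=True):
    """Return the approximate space needed for the uncompressed release."""
    total_gb = FLATFILE_GB + (ASN1_GB if include_asn1 else 0)
    return total_gb * BYTES_PER_GB


def has_room(path, needed_bytes):
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= needed_bytes


needed = required_bytes()
print(f"Need about {needed // BYTES_PER_GB} GB")  # Need about 3882 GB
print("Enough space here:", has_room(".", needed))
```

Because the published sizes are rounded, leaving some headroom beyond the computed figure is a sensible precaution.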

For more information about GenBank release 250.0, see the release notes, as well as the README files in the GenBank and ASN.1 (ncbi-asn1) directories on FTP.
