Announcing GenBank Release 249.0

GenBank release 249.0 (April 19, 2022) is now available on the NCBI FTP site. This release has 17.85 trillion bases and 2.66 billion records.

The current release has 237,520,318 traditional records containing 1,266,154,890,918 base pairs of sequence data. There are also 1,781,374,217 WGS records containing 16,071,520,702,170 base pairs of sequence data, 534,770,586 bulk-oriented TSA records containing 474,421,076,448 base pairs of sequence data, and 109,820,387 bulk-oriented TLS records containing 41,324,192,343 base pairs of sequence data.  
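The headline totals can be reproduced by summing the per-component figures above. The following is a minimal Python sketch; the dictionary name and layout are chosen here for illustration and are not an NCBI data format.

```python
# Per-component counts quoted above for GenBank 249.0: (records, base pairs).
# The dictionary layout is illustrative only, not an NCBI format.
components = {
    "traditional": (237_520_318, 1_266_154_890_918),
    "WGS": (1_781_374_217, 16_071_520_702_170),
    "TSA": (534_770_586, 474_421_076_448),
    "TLS": (109_820_387, 41_324_192_343),
}

total_records = sum(records for records, _ in components.values())
total_bases = sum(bases for _, bases in components.values())

print(f"{total_records / 1e9:.2f} billion records")  # 2.66 billion records
print(f"{total_bases / 1e12:.2f} trillion bases")    # 17.85 trillion bases
```

The sums come to 2,663,485,508 records and 17,853,420,861,879 base pairs, which round to the figures in the opening paragraph.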

Growth between releases

During the 59 days between the close dates for GenBank releases 248.0 and 249.0, the ‘traditional’ portion of GenBank grew by 92,170,809,197 base pairs and by 1,182,034 sequence records. During that same period, 60,526 records were updated. An average of 21,060 ‘traditional’ records were added and/or updated per day.
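The daily average appears to be the number of added plus updated records divided by the number of days between close dates. That formula is an assumption here, since the announcement does not state it, but it reproduces the quoted figure:

```python
# Figures quoted above for the 'traditional' portion of GenBank.
days_between_releases = 59
records_added = 1_182_034
records_updated = 60_526

# Assumed formula: (added + updated) / days. Not stated in the announcement.
daily_average = (records_added + records_updated) / days_between_releases

print(round(daily_average))  # 21060
```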

Between releases 248.0 and 249.0, the WGS component of GenBank grew by 643,398,561,350 base pairs and by 30,869,210 sequence records. The TSA component of GenBank grew by 9,407,919,946 base pairs and by 10,305,985 sequence records. The TLS component of GenBank grew by 3,084,362 base pairs and by 10,421 sequence records.

The total number of sequence data files increased by 316 with this release. The changes by division are as follows:

  • BCT: 36 new files, now a total of 748
  • CON: 36 new files, now a total of 259
  • EST: 1 new file, now a total of 577
  • GSS: 1 fewer file, now a total of 270
  • INV: 79 new files, now a total of 638
  • PAT: 3 new files, now a total of 250
  • PLN: 79 new files, now a total of 881
  • VRL: 78 new files, now a total of 603
  • VRT: 5 new files, now a total of 297
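As a quick check, the per-division changes listed above sum to the stated net increase of 316 files. A small Python sketch, with a mapping name chosen for illustration and the GSS decrease entered as a negative value:

```python
# Change in file count per division, as listed above.
# GSS lost one file, so it is entered as -1.
file_changes = {
    "BCT": 36,
    "CON": 36,
    "EST": 1,
    "GSS": -1,
    "INV": 79,
    "PAT": 3,
    "PLN": 79,
    "VRL": 78,
    "VRT": 5,
}

net_change = sum(file_changes.values())
print(net_change)  # 316
```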

Sequence data file notes

The decrease in the number of GSS-division flatfiles is due to a small fluctuation in file packaging. There was no actual decrease in the number of GSS records for GenBank 249.0.

The increase in the number of CON-division flatfiles is unexpected. Something unusual about the added records may have triggered a problem with the file packaging technique that is used for GenBank releases. This will be investigated; if packaging proves to be the cause, the number of CON-division flatfiles could decrease in the June 2022 GenBank release.

Additional Information

For downloading purposes, please keep in mind that the uncompressed GenBank release 249.0 sequence data flatfiles require roughly 2,401 GB. The ASN.1 data files require approximately 1,293 GB.
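Before starting a download, it may help to confirm that the target filesystem has room for the uncompressed data. The sketch below uses Python's standard `shutil.disk_usage`. The helper name `has_room` is hypothetical, and treating 1 GB as 10**9 bytes is an assumption, since the announcement does not say whether the sizes are in decimal or binary units.

```python
import shutil

# Uncompressed sizes quoted above, in GB.
FLATFILE_GB = 2_401
ASN1_GB = 1_293

# Assumption: 1 GB = 10**9 bytes. The announcement does not specify the unit.
BYTES_PER_GB = 10**9


def has_room(path: str, required_gb: float) -> bool:
    """Return True if the filesystem holding `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * BYTES_PER_GB


# Check the current directory against each data set.
for label, size_gb in [("flatfiles", FLATFILE_GB), ("ASN.1", ASN1_GB)]:
    verdict = "fits" if has_room(".", size_gb) else "does not fit"
    print(f"{label}: {size_gb} GB {verdict}")
```

Some headroom beyond these figures is advisable, because the compressed downloads also occupy space until they are removed.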

For more information about GenBank release 249.0, see the release notes, as well as the README files in the GenBank and ASN.1 (ncbi-asn1) directories on FTP.

 
