GenBank release 245.0

GenBank release 245.0 (August 18, 2021) is now available on the NCBI FTP site. This release contains 15.31 trillion bases and 2.49 billion records.

The current release has 231,982,592 traditional records containing 940,513,260,726 base pairs of sequence data. There are also 1,653,427,055 WGS records containing 13,888,187,863,722 base pairs of sequence data, 498,305,045 bulk-oriented TSA records containing 440,578,422,611 base pairs of sequence data, and 106,995,218 bulk-oriented TLS records containing 39,930,167,315 base pairs of sequence data.
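The headline totals can be cross-checked against the per-component figures above. The following is a minimal sketch in Python; the dictionary layout and variable names are illustrative choices, and only the numbers come from this announcement.

```python
# Per-component counts quoted above: (records, base pairs).
components = {
    "traditional": (231_982_592, 940_513_260_726),
    "WGS": (1_653_427_055, 13_888_187_863_722),
    "TSA": (498_305_045, 440_578_422_611),
    "TLS": (106_995_218, 39_930_167_315),
}

# Sum each column across the four components.
total_records = sum(records for records, _ in components.values())
total_bases = sum(bases for _, bases in components.values())

# Express the totals in billions of records and trillions of bases.
print(f"{total_records:,} records ({total_records / 1e9:.2f} billion)")
print(f"{total_bases:,} bases ({total_bases / 1e12:.2f} trillion)")
```

Rounded to two decimal places, the sums agree with the 2.49 billion records and 15.31 trillion bases stated for the release.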

Growth between releases

During the 53 days between the close dates for GenBank Releases 244.0 and 245.0, the ‘traditional’ portion of GenBank grew by 74,503,469,767 base pairs and by 4,093,703 sequence records. During that same period, 228,701 records were updated. On average, 81,555 ‘traditional’ records were added and/or updated per day.
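The daily average follows from the added and updated record counts divided by the number of days between close dates. A short sketch of that arithmetic, with variable names chosen here for illustration:

```python
# Figures quoted for the interval between releases 244.0 and 245.0.
days_between_releases = 53
records_added = 4_093_703
records_updated = 228_701

# Average number of 'traditional' records added and/or updated per day.
average_per_day = (records_added + records_updated) / days_between_releases
print(round(average_per_day))
```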

Between releases 244.0 and 245.0, the WGS component of GenBank grew by 445,213,517,285 base pairs and by 20,630,449 sequence records. The TSA component grew by 3,983,481,446 base pairs and by 3,663,687 sequence records. The TLS component grew by 1,732,053,961 base pairs and by 4,332,289 sequence records.
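To put the absolute growth figures in proportion, one option is to express each component's base-pair growth relative to its size at the previous release. The sketch below assumes the previous size equals the current size minus the stated growth; the helper name `relative_growth` is hypothetical, and the percentages it prints are derived values, not figures published by NCBI.

```python
# (current base pairs, base pairs added since release 244.0), from this announcement.
components = {
    "traditional": (940_513_260_726, 74_503_469_767),
    "WGS": (13_888_187_863_722, 445_213_517_285),
    "TSA": (440_578_422_611, 3_983_481_446),
    "TLS": (39_930_167_315, 1_732_053_961),
}


def relative_growth(current, growth):
    """Return growth as a fraction of the previous release's size."""
    previous = current - growth  # assumption: previous size = current - growth
    return growth / previous


for name, (current, growth) in components.items():
    print(f"{name}: {relative_growth(current, growth):.2%}")
```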

The total number of sequence data files increased by 226 with this release. The new files are distributed across the divisions as follows:

  • BCT: 21 new files, now a total of 639
  • CON: 1 new file, now a total of 222
  • ENV: 2 new files, now a total of 67
  • INV: 65 new files, now a total of 365
  • MAM: 8 new files, now a total of 99
  • PAT: 15 new files, now a total of 245
  • PLN: 44 new files, now a total of 708
  • ROD: 21 new files, now a total of 77
  • VRL: 43 new files, now a total of 173
  • VRT: 6 new files, now a total of 272
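The per-division counts above can be summed to confirm the stated increase of 226 files. A minimal sketch, with the list above transcribed into a dictionary (the structure is an illustrative choice):

```python
# New files per division, as listed above.
new_files = {
    "BCT": 21,
    "CON": 1,
    "ENV": 2,
    "INV": 65,
    "MAM": 8,
    "PAT": 15,
    "PLN": 44,
    "ROD": 21,
    "VRL": 43,
    "VRT": 6,
}

# Total number of sequence data files added in this release.
total_new_files = sum(new_files.values())
print(total_new_files)
```

Only divisions that gained files are listed, so the per-division totals above do not necessarily add up to the full file count of the release.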

Upcoming Change: New /regulatory_class values for the regulatory feature

As of the October 2021 GenBank Release 246.0, new values will be supported for the /regulatory_class qualifier:

  • recombination_enhancer : A regulatory region that promotes or induces the process of recombination.
  • uORF (or regulatory_uORF) : A short open reading frame that is found in the 5′ untranslated region of an mRNA and plays a role in translational regulation.

See the release notes for more information about the new /regulatory_class values.

Additional Information

For downloading purposes, please keep in mind that the uncompressed GenBank release 245.0 sequence data flatfiles require roughly 1,888 GB. The ASN.1 data files require approximately 1,098 GB.
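Before starting a download, it may help to compare these sizes with the free space on the target volume. The sketch below uses Python's standard `shutil.disk_usage`; the helper names and the choice of 1 GB = 10^9 bytes are assumptions, since the announcement does not say whether decimal or binary gigabytes are meant.

```python
import shutil

# Approximate uncompressed sizes quoted for release 245.0, in GB.
FLATFILE_GB = 1_888
ASN1_GB = 1_098

# Assumption: decimal gigabytes; the announcement does not specify the unit.
BYTES_PER_GB = 10**9


def free_space_gb(path="."):
    """Return the free space, in GB, on the volume containing `path`."""
    return shutil.disk_usage(path).free / BYTES_PER_GB


def has_room(required_gb, path="."):
    """Return True if the volume containing `path` has at least `required_gb` free."""
    return free_space_gb(path) >= required_gb


print("Room for flatfiles:", has_room(FLATFILE_GB))
print("Room for ASN.1 files:", has_room(ASN1_GB))
print("Room for both:", has_room(FLATFILE_GB + ASN1_GB))
```

The compressed files on the FTP site are smaller than these figures, so this check is a conservative upper bound for the download itself and a realistic one for working with the uncompressed data.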

More information about GenBank release 245.0 is available in the release notes, as well as in the README files in the GenBank and ASN.1 (ncbi-asn1) directories on FTP.

 

 
