GenBank release 244.0

GenBank release 244.0 (6/26/2021) is now available on the NCBI FTP site. This release has 14.78 trillion bases and 2.46 billion records.

The current release has 227,888,889 traditional records containing 866,009,790,959 base pairs of sequence data. There are also 1,632,796,606 WGS records containing 13,442,974,346,437 base pairs of sequence data, 494,641,358 bulk-oriented TSA records containing 436,594,941,165 base pairs of sequence data, and 102,662,929 bulk-oriented TLS records containing 38,198,113,354 base pairs of sequence data.
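As a quick consistency check, the per-component figures above can be summed and compared with the headline totals of 14.78 trillion bases and 2.46 billion records. This is only an illustrative sketch: the dictionary and variable names are made up for the example, and the numbers are copied from the paragraph above.

```python
# Per-component figures copied from the paragraph above: (records, base pairs).
# The dictionary layout and names are illustrative, not part of any NCBI tool.
components = {
    "traditional": (227_888_889, 866_009_790_959),
    "WGS": (1_632_796_606, 13_442_974_346_437),
    "TSA": (494_641_358, 436_594_941_165),
    "TLS": (102_662_929, 38_198_113_354),
}

total_records = sum(records for records, _ in components.values())
total_bases = sum(bases for _, bases in components.values())

# Express the totals in the same units as the headline figures.
print(f"{total_records:,} records ({total_records / 1e9:.2f} billion)")
print(f"{total_bases:,} base pairs ({total_bases / 1e12:.2f} trillion)")
```

The sums come to 2,457,989,782 records and 14,783,777,191,915 base pairs, which round to the headline figures.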

Growth between releases

During the 57 days between the close dates for GenBank Releases 243.0 and 244.0, the ‘traditional’ portion of GenBank grew by 33,608,991,448 base pairs and by 765,688 sequence records. During that same period, 43,966 records were updated. An average of 14,204 ‘traditional’ records were added and/or updated per day.
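The daily average quoted above appears to follow from adding the new and updated record counts and dividing by the number of days between close dates. A minimal sketch of that arithmetic, with variable names chosen only for this example:

```python
# Figures copied from the paragraph above; names are illustrative.
days_between_releases = 57
records_added = 765_688
records_updated = 43_966

# Assumption: the quoted average is (added + updated) / days, reported as a whole number.
average_per_day = (records_added + records_updated) / days_between_releases
print(f"{int(average_per_day):,} records per day")
```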

Between releases 243.0 and 244.0, the WGS component of GenBank grew by 710,926,294,414 base pairs and by 42,126,147 sequence records. The TSA component of GenBank grew by 11,518,457,706 base pairs and by 13,486,438 sequence records. The TLS component of GenBank grew by 199,578,893 base pairs and by 267,176 sequence records.

The total number of sequence data files increased by 127 with this release. The divisions are as follows:

  • BCT: 30 new files, now a total of 618
  • INV: 29 new files, now a total of 300
  • PLN: 7 new files, now a total of 664
  • VRL: 54 new files, now a total of 130
  • VRT: 7 new files, now a total of 266
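The per-division counts above can be checked against the stated increase of 127 files. The sketch below also derives the file counts for the previous release, under the assumption that each earlier count is simply the current total minus the new files; the names are illustrative.

```python
# Per-division counts copied from the list above: (new files, current total).
divisions = {
    "BCT": (30, 618),
    "INV": (29, 300),
    "PLN": (7, 664),
    "VRL": (54, 130),
    "VRT": (7, 266),
}

# The new files across all listed divisions should add up to the stated increase.
total_new_files = sum(new for new, _ in divisions.values())
print(f"New files in this release: {total_new_files}")

# Assumption: previous count = current total - new files (no files were removed).
previous_totals = {name: total - new for name, (new, total) in divisions.items()}
for name, count in previous_totals.items():
    print(f"{name}: {count} files in the previous release")
```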

Delay in GenBank 244.0

Because of the significant delays that affected the prior GenBank 243.0 release (see Section 1.3.1 of the GenBank 243.0 release notes), processing for GenBank 244.0 was also affected. Delivery is ten days later than our target date: June 25th rather than June 15th. We expect to be back on schedule for the August 2021 GenBank release, and we regret any inconvenience caused by the delay.

Upcoming Change: New /regulatory_class values for the regulatory feature

As of the October 2021 GenBank Release 246.0, new values will be supported for the /regulatory_class qualifier:

  • recombination_enhancer : A regulatory region that promotes or induces the process of recombination.
  • uORF (or regulatory_uORF) : A short open reading frame that is found in the 5′ untranslated region of an mRNA and plays a role in translational regulation.

Further details about this change will be made available in August via the release notes for GenBank 245.0.

Additional Information

For downloading purposes, please keep in mind that the uncompressed GenBank release 244.0 sequence data flatfiles require roughly 1,780 GB. The ASN.1 data files require approximately 1,062 GB.
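Before starting a download, it can help to confirm that the target volume has enough free space. The following is a minimal sketch using Python's standard `shutil.disk_usage`; the `has_room` helper and the constants are hypothetical names for this example, and it assumes "GB" means 10^9 bytes, which the announcement does not specify.

```python
import shutil

# Sizes copied from the paragraph above (uncompressed, approximate).
FLATFILE_GB = 1_780
ASN1_GB = 1_062

# Assumption: 1 GB = 10**9 bytes; the announcement does not say which unit is meant.
BYTES_PER_GB = 10**9


def has_room(path: str, required_gb: int) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * BYTES_PER_GB


if __name__ == "__main__":
    target = "."  # hypothetical download directory
    print("Room for flatfiles:", has_room(target, FLATFILE_GB))
    print("Room for ASN.1 files:", has_room(target, ASN1_GB))
    print("Room for both:", has_room(target, FLATFILE_GB + ASN1_GB))
```

The compressed files on the FTP site are smaller than these figures, so extra space is needed only if the archives are kept alongside the uncompressed data.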

More information about GenBank release 244.0 is available in the release notes, as well as in the README files in the GenBank and ASN.1 (ncbi-asn1) directories on FTP.
