GenBank release 241.0

GenBank release 241.0 (December 21, 2020) is now available on the NCBI FTP site. This release contains 12.98 trillion bases and 2.27 billion records.

The current release has 221,467,827 traditional records containing 723,003,822,007 base pairs of sequence data. There are also 1,517,995,689 WGS records containing 11,830,842,428,018 base pairs of sequence data, 446,397,378 bulk-oriented TSA records containing 392,206,975,386 base pairs of sequence data, and 88,039,152 bulk-oriented TLS records containing 33,036,509,446 base pairs of sequence data.
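
The headline totals can be cross-checked against the per-component figures. The following is a minimal sketch in Python; the dictionary layout and the function name `release_totals` are illustrative choices, and only the numbers come from this announcement.

```python
# Per-component counts copied from the release 241.0 announcement.
COMPONENTS = {
    "traditional": {"records": 221_467_827, "bases": 723_003_822_007},
    "WGS": {"records": 1_517_995_689, "bases": 11_830_842_428_018},
    "TSA": {"records": 446_397_378, "bases": 392_206_975_386},
    "TLS": {"records": 88_039_152, "bases": 33_036_509_446},
}


def release_totals(components):
    """Return (total_records, total_bases) summed over all components."""
    total_records = sum(c["records"] for c in components.values())
    total_bases = sum(c["bases"] for c in components.values())
    return total_records, total_bases


total_records, total_bases = release_totals(COMPONENTS)
print(f"{total_bases / 1e12:.2f} trillion bases")    # 12.98 trillion bases
print(f"{total_records / 1e9:.2f} billion records")  # 2.27 billion records
```

The sums come to 12,979,089,734,857 bases and 2,273,900,046 records, consistent with the rounded figures quoted for the release.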

Growth between releases

During the 54 days between the close dates for GenBank releases 240.0 and 241.0, the ‘traditional’ portion of GenBank grew by 24,315,727,961 base pairs and by 2,412,620 sequence records. During that same period, 169,921 records were updated. On average, 47,825 ‘traditional’ records were added and/or updated per day.
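
The daily average appears to be the sum of added and updated records divided by the number of days. That formula is an assumption, but it reproduces the quoted figure. A short Python sketch, with the hypothetical helper `daily_average`:

```python
DAYS_BETWEEN_RELEASES = 54
RECORDS_ADDED = 2_412_620
RECORDS_UPDATED = 169_921


def daily_average(added, updated, days):
    """Average number of records added or updated per day, rounded to an integer."""
    # Assumption: added and updated records are simply summed before dividing.
    return round((added + updated) / days)


average = daily_average(RECORDS_ADDED, RECORDS_UPDATED, DAYS_BETWEEN_RELEASES)
print(f"{average:,} records per day")  # 47,825 records per day
```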

Between releases 240.0 and 241.0, the WGS component of GenBank grew by 2,615,026,858,509 base pairs and by 85,121,437 sequence records. The TSA component grew by 9,210,313,116 base pairs and by 10,428,999 sequence records. The TLS component grew by 4,221,710,578 base pairs and by 9,861,794 sequence records.
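
To put these increments in proportion, the size of each bulk component at release 240.0 can be derived by subtraction. The sketch below assumes that each growth figure is the net difference between the two releases' totals; the names `BULK` and `previous_and_growth` are illustrative.

```python
# (bases in release 241.0, growth in bases since release 240.0)
BULK = {
    "WGS": (11_830_842_428_018, 2_615_026_858_509),
    "TSA": (392_206_975_386, 9_210_313_116),
    "TLS": (33_036_509_446, 4_221_710_578),
}


def previous_and_growth(current, growth):
    """Return (bases in the previous release, percent growth relative to it)."""
    # Assumption: growth is the net difference between the two release totals.
    previous = current - growth
    return previous, 100 * growth / previous


for name, (current, growth) in BULK.items():
    previous, percent = previous_and_growth(current, growth)
    print(f"{name}: {previous:,} bases in 240.0, grew by {percent:.1f}%")
```

Under that assumption, WGS grew by roughly 28 percent in base pairs, TLS by roughly 15 percent, and TSA by roughly 2 percent.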

The total number of sequence data files increased by 91 with this release. The new files are distributed across divisions as follows:

  • BCT: 21 new files, now a total of 533
  • CON: 1 new file, now a total of 219
  • ENV: 1 new file, now a total of 63
  • INV: 35 new files, now a total of 132
  • PAT: 4 new files, now a total of 217
  • PLN: 11 new files, now a total of 605
  • VRL: 1 new file, now a total of 45
  • VRT: 17 new files, now a total of 231
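
The per-division counts above can be tallied to confirm the overall increase of 91 files. This is a small Python sketch; the structure `DIVISION_FILES` is illustrative, and it covers only the divisions listed here, which are the ones that gained files.

```python
# (new files, total files after the release) per division, from the list above.
DIVISION_FILES = {
    "BCT": (21, 533),
    "CON": (1, 219),
    "ENV": (1, 63),
    "INV": (35, 132),
    "PAT": (4, 217),
    "PLN": (11, 605),
    "VRL": (1, 45),
    "VRT": (17, 231),
}

# Sum of the new files across the listed divisions.
total_new = sum(new for new, _ in DIVISION_FILES.values())

# File count per division before this release (total minus new files).
previous_counts = {
    division: total - new for division, (new, total) in DIVISION_FILES.items()
}

print(total_new)  # 91
```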

Upcoming Change: New /circular_RNA Qualifier

Complementing the new “circRNA” ncRNA class, a new qualifier will be introduced in or after GenBank Release 242.0 in February 2021. The preliminary definition of /circular_RNA is as follows:

  • Qualifier: /circular_RNA
  • Definition: indicates that exons are out-of-order or overlapping because this spliced RNA product is a circular RNA (circRNA) created by backsplicing (for example, when a downstream exon in the gene is located 5′ of an upstream exon in the RNA product)
  • Comment: qualifier should be used on features such as CDS, mRNA, tRNA and other features that are produced as a result of a backsplicing event. This qualifier should be used only when the splice event is indicated in the “join” operator, such as: join(complement(69611..69724),139856..140087)
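
To illustrate how the example location breaks down into segments, here is a minimal Python sketch that splits a simple join(...) string into (start, end, strand) tuples. The function `parse_join` is hypothetical and is not a full feature-location parser: it assumes only plain `start..end` spans, optionally wrapped in `complement(...)`, with no nesting, partial-end markers, or remote accessions.

```python
import re

_PLAIN = re.compile(r"(\d+)\.\.(\d+)")
_COMPLEMENT = re.compile(r"complement\((\d+)\.\.(\d+)\)")


def parse_join(location):
    """Parse a simple join(...) location into (start, end, strand) tuples."""
    if not (location.startswith("join(") and location.endswith(")")):
        raise ValueError("expected a join(...) location")
    inner = location[len("join("):-1]
    segments = []
    # Splitting on commas is only safe because nested operators are not supported.
    for part in inner.split(","):
        match = _COMPLEMENT.fullmatch(part)
        strand = -1
        if match is None:
            match = _PLAIN.fullmatch(part)
            strand = +1
        if match is None:
            raise ValueError(f"unsupported segment: {part!r}")
        segments.append((int(match.group(1)), int(match.group(2)), strand))
    return segments


segments = parse_join("join(complement(69611..69724),139856..140087)")
# Coordinates are inclusive, so each span covers end - start + 1 bases.
spliced_length = sum(end - start + 1 for start, end, _ in segments)
print(segments)        # [(69611, 69724, -1), (139856, 140087, 1)]
print(spliced_length)  # 346
```

For the example in the comment, the first segment lies on the complementary strand and the second on the forward strand, and together they span 346 bases.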

Examples demonstrating the use of /circular_RNA will be provided in forthcoming GenBank release notes.

Additional Information

For downloading purposes, please keep in mind that the uncompressed GenBank release 241.0 sequence data flatfiles require roughly 1,562 GB. The ASN.1 data files require approximately 976 GB.
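
Before downloading, it may help to compare these sizes with the free space on the target volume. The sketch below assumes that 1 GB means 10^9 bytes, which the announcement does not specify, and uses hypothetical helpers `required_bytes` and `has_room`; the sizes themselves are the approximate figures quoted above.

```python
import shutil

FLATFILE_GB = 1_562    # uncompressed sequence data flatfiles (approximate)
ASN1_GB = 976          # ASN.1 data files (approximate)
BYTES_PER_GB = 10**9   # assumption: decimal gigabytes


def required_bytes(flatfiles=True, asn1=True):
    """Approximate bytes needed for the selected release 241.0 data sets."""
    total_gb = (FLATFILE_GB if flatfiles else 0) + (ASN1_GB if asn1 else 0)
    return total_gb * BYTES_PER_GB


def has_room(free_bytes, flatfiles=True, asn1=True):
    """Return True if free_bytes covers the selected data sets."""
    return free_bytes >= required_bytes(flatfiles, asn1)


if __name__ == "__main__":
    # Check the volume that holds the current working directory.
    free = shutil.disk_usage(".").free
    print("Room for flatfiles and ASN.1:", has_room(free))
    print("Room for flatfiles only:", has_room(free, asn1=False))
```

Both data sets together come to roughly 2,538 GB under this assumption. Compressed downloads and temporary files are not accounted for.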

More information about GenBank release 241.0 is available in the release notes, as well as in the README files in the GenBank and ASN.1 (ncbi-asn1) directories on FTP.
