GenBank release 230.0 (2/15/2019) with 4.74 Terabases and 1.47 billion records is now available from the NCBI FTP site (flatfiles, ASN.1). There are two notable changes with this release. Because we have increased the target maximum uncompressed file size, the number of files dropped by about 1,000. We are also now assigning expanded WGS and protein accessions. WGS accessions may now have a six-letter Project Code prefix and a two-digit Assembly-Version number, followed by seven, eight, or nine digits, for example AAAABB010000001. Protein accessions may now have a three-letter prefix followed by seven digits, for example EAA0000001. See sections 1.3.1 and 1.3.2 of the Release Notes for details.
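For anyone updating accession-parsing code, the expanded formats described above can be sketched as regular expressions. This is a minimal illustration, not an official validator: the names `EXPANDED_WGS`, `EXPANDED_PROTEIN`, and `classify_accession` are hypothetical, uppercase letters are assumed, and older accession formats and any `.version` suffix are deliberately not handled. Section 1.3 of the Release Notes remains the authoritative definition.

```python
import re

# Assumption: expanded WGS accession = six letters (Project Code),
# two digits (Assembly-Version), then seven, eight, or nine digits.
EXPANDED_WGS = re.compile(r"[A-Z]{6}\d{2}\d{7,9}")

# Assumption: expanded protein accession = three letters, then seven digits.
EXPANDED_PROTEIN = re.compile(r"[A-Z]{3}\d{7}")


def classify_accession(accession: str) -> str:
    """Return which expanded format (if any) the whole string matches."""
    if EXPANDED_WGS.fullmatch(accession):
        return "expanded_wgs"
    if EXPANDED_PROTEIN.fullmatch(accession):
        return "expanded_protein"
    # Older formats and malformed strings both end up here.
    return "other"


if __name__ == "__main__":
    for example in ("AAAABB010000001", "EAA0000001"):
        print(example, classify_accession(example))
```

Code that previously assumed a fixed accession length is the most likely to need adjusting for these formats.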
The release has 212,260,377 traditional records containing 303,709,510,632 base pairs of sequence data. There are also 945,019,312 WGS records containing 4,164,513,961,679 base pairs of sequence data, 294,772,430 bulk-oriented TSA records containing 263,936,885,705 base pairs of sequence data, and 23,259,929 bulk-oriented TLS records containing 9,146,836,085 base pairs of sequence data.
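As a quick cross-check, the per-component figures above can be summed to recover the headline totals. The sketch below uses only the numbers quoted in this announcement; the dictionary name `COMPONENTS` is illustrative.

```python
# (records, base pairs) per component, as quoted in this announcement.
COMPONENTS = {
    "traditional": (212_260_377, 303_709_510_632),
    "WGS": (945_019_312, 4_164_513_961_679),
    "TSA": (294_772_430, 263_936_885_705),
    "TLS": (23_259_929, 9_146_836_085),
}

total_records = sum(records for records, _ in COMPONENTS.values())
total_bases = sum(bases for _, bases in COMPONENTS.values())

if __name__ == "__main__":
    print(f"{total_records:,} records")
    print(f"{total_bases:,} base pairs")
    # Express the totals in the units used by the headline.
    print(f"{total_bases / 1e12:.3f} terabases")
    print(f"{total_records / 1e9:.3f} billion records")
```

The sums come to 1,475,312,048 records and 4,741,307,194,101 base pairs, in line with the 1.47 billion records and 4.74 Terabases quoted at the top.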
During the 64 days between the close dates for GenBank Releases 229.0
and 230.0, the traditional portion of GenBank grew by 18,020,968,446
basepairs and 978,962 sequence records. During that same period,
25,301 records were updated. An average of 15,691 ‘traditional’ records
were added and/or updated per day.
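The daily average can be reproduced from the figures in the paragraph above, assuming it counts new plus updated records over the 64-day window and is truncated to a whole number. That interpretation is an assumption, and the variable names are illustrative.

```python
DAYS_BETWEEN_RELEASES = 64
NEW_RECORDS = 978_962
UPDATED_RECORDS = 25_301

# Assumption: "added and/or updated" means the sum of new and updated records.
records_touched = NEW_RECORDS + UPDATED_RECORDS

# Integer division truncates the fractional part of the average.
average_per_day = records_touched // DAYS_BETWEEN_RELEASES

if __name__ == "__main__":
    print(f"{average_per_day:,} records per day")
```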
Between releases 229.0 and 230.0, the WGS component of GenBank grew by
507,794,538,583 basepairs and by 171,246,122 sequence records, the TSA component grew by 15,343,993,517 basepairs and by 19,926,957 sequence records, and the TLS component grew by 635,006,804 basepairs and by 2,335,341 sequence records.
For downloading purposes, please keep in mind that the uncompressed GenBank release 230.0 flatfiles require roughly 964 GB (sequence files only). The ASN.1 data require approximately 773 GB.
For additional release information, see the README files in either of
the directories linked above, and the Release Notes.