Growth between releases
During the 60 days between the close dates for GenBank Releases 249.0 and 250.0, the ‘traditional’ portion of GenBank grew by 129,473,740,269 basepairs and by 1,497,575 sequence records. During that same period, 61,874 records were updated. An average of 25,991 ‘traditional’ records were added and/or updated per day.
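The daily average quoted above can be reproduced from the added and updated record counts. The following is a minimal sketch, assuming the average is simply (added + updated) / days, rounded to the nearest whole record; the variable and function names are illustrative and not part of any GenBank tooling.

```python
# Figures quoted in the paragraph above for the 249.0 -> 250.0 interval.
DAYS_BETWEEN_RELEASES = 60
records_added = 1_497_575
records_updated = 61_874


def average_per_day(added, updated, days):
    """Return the mean number of records added or updated per day (rounded)."""
    return round((added + updated) / days)


daily_average = average_per_day(records_added, records_updated, DAYS_BETWEEN_RELEASES)
print(f"{daily_average:,} records added and/or updated per day")
```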
Between releases 249.0 and 250.0, the WGS component of GenBank grew by 638,852,304,430 basepairs and by 14,974,897 sequence records. The TSA component of GenBank grew by 10,635,053,313 basepairs and by 12,220,986 sequence records. The TLS component of GenBank grew by 675,166,504 basepairs and by 1,321,720 sequence records.
The total number of sequence data files increased by 407 with this release. The increase by division is as follows:
- BCT: 41 new files, now a total of 789
- INV: 77 new files, now a total of 715
- MAM: 8 new files, now a total of 133
- PAT: 1 new file, now a total of 251
- PHG: 1 new file, now a total of 6
- PLN: 51 new files, now a total of 932
- PRI: 1 new file, now a total of 57
- ROD: 113 new files, now a total of 190
- VRL: 108 new files, now a total of 711
- VRT: 6 new files, now a total of 303
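As a consistency check, the per-division counts listed above can be summed and compared with the stated overall increase of 407 files. This is only a sketch: the dictionary is transcribed by hand from the list, and the "previous total" is derived here by assuming it equals the current total minus the new files.

```python
# (new files, current total) per division, transcribed from the list above.
division_files = {
    "BCT": (41, 789),
    "INV": (77, 715),
    "MAM": (8, 133),
    "PAT": (1, 251),
    "PHG": (1, 6),
    "PLN": (51, 932),
    "PRI": (1, 57),
    "ROD": (113, 190),
    "VRL": (108, 711),
    "VRT": (6, 303),
}

# Sum of new files across all listed divisions.
total_new = sum(new for new, _ in division_files.values())

# Derived file count per division before this release (assumption: total - new).
previous_totals = {div: total - new for div, (new, total) in division_files.items()}

print(f"New files across listed divisions: {total_new}")
for div, prev in previous_totals.items():
    print(f"{div}: {prev} files before this release")
```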
Sequence data file notes
The number of CON-division flatfiles remains elevated. The problem might be the erroneous presence of quality-score data in which every score is “-1”. Records in the largest of the gbcon*.qscore.gz files demonstrate the problem. The issue will be resolved before the August 251.0 release.
When planning downloads, note that the uncompressed GenBank Release 250.0 sequence data flatfiles require roughly 2,585 GB. The ASN.1 data files require approximately 1,297 GB.
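Before starting a download, it may help to compare these sizes with the free space on the target volume. The sketch below uses Python's standard shutil.disk_usage; the helper name has_room and the target path "." are hypothetical, and it assumes that "GB" here means 10^9 bytes, which the release notes do not state.

```python
import shutil

# Approximate sizes quoted above for Release 250.0.
FLATFILES_GB = 2_585  # uncompressed sequence data flatfiles
ASN1_GB = 1_297       # ASN.1 data files

# Assumption: 1 GB = 10**9 bytes (the notes do not say decimal or binary).
BYTES_PER_GB = 10**9


def has_room(path, required_gb):
    """Return True if the volume holding `path` has at least `required_gb` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * BYTES_PER_GB


# "." is a placeholder for the intended download directory.
for label, size_gb in (("flatfiles", FLATFILES_GB), ("ASN.1", ASN1_GB)):
    status = "enough space" if has_room(".", size_gb) else "NOT enough space"
    print(f"{label}: need about {size_gb:,} GB, {status}")
```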