GenBank release 221.0 is available via FTP, Entrez and BLAST

GenBank release 221.0 (8/13/2017) has 203,180,606 traditional records containing 240,343,378,258 base pairs of sequence data. In addition, there are 499,965,722 WGS records containing 2,242,294,609,510 base pairs of sequence data, 186,777,106 TSA records containing 167,045,663,417 base pairs of sequence data, and 1,628,475 TLS records containing 824,191,338 base pairs of sequence data.

During the 56 days between the close dates for GenBank releases 220.0 and 221.0, the traditional portion of GenBank grew by 5,346,015,635 base pairs and by 1,517,038 sequence records. During that same period, 180,064 records were updated, for an average of 30,305 records added or updated per day.
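
The daily average follows directly from the figures above: new records plus updated records, divided by the 56 days between close dates. Here is a minimal Python sketch of that arithmetic; the variable names are chosen for illustration only.

```python
# Figures quoted above for the traditional portion of GenBank,
# covering the interval between releases 220.0 and 221.0.
DAYS_BETWEEN_RELEASES = 56
records_added = 1_517_038
records_updated = 180_064
base_pairs_added = 5_346_015_635

# Average number of records added or updated per day, rounded to a whole record.
records_per_day = round((records_added + records_updated) / DAYS_BETWEEN_RELEASES)

# Average growth in base pairs per day (not stated in the post, derived here).
base_pairs_per_day = base_pairs_added / DAYS_BETWEEN_RELEASES

print(f"{records_per_day:,} records added or updated per day")
print(f"{base_pairs_per_day:,.0f} base pairs added per day")
```

The first line printed should match the 30,305 records per day reported in the text.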

Between releases 220.0 and 221.0, the WGS component of GenBank grew by 77,610,616,141 base pairs and by 12,073,955 sequence records. The TSA component grew by 8,932,694,344 base pairs and by 9,964,976 sequence records. The TLS component of GenBank did not change between releases 220.0 and 221.0.

The total number of sequence data files increased by 45 with this release. The changes by division are as follows:

  • BCT: 21 new files, now a total of 391
  • CON: 4 fewer files, now a total of 359
  • INV: 1 new file, now a total of 155
  • PAT: 3 new files, now a total of 294
  • PLN: 2 new files, now a total of 150
  • PRI: 1 new file, now a total of 57
  • TSA: 4 new files, now a total of 234
  • VRL: 1 new file, now a total of 50
  • VRT: 16 new files, now a total of 80
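
The per-division changes listed above can be checked against the stated net increase of 45 files. This is a small Python sketch of that check; the dictionary names are illustrative, and the previous file counts are derived here rather than quoted from the release notes.

```python
# Change in file count per division for release 221.0, as listed above.
# CON is negative because it lost files.
file_changes = {
    "BCT": 21, "CON": -4, "INV": 1, "PAT": 3, "PLN": 2,
    "PRI": 1, "TSA": 4, "VRL": 1, "VRT": 16,
}

# Current file totals per division, as listed above.
current_totals = {
    "BCT": 391, "CON": 359, "INV": 155, "PAT": 294, "PLN": 150,
    "PRI": 57, "TSA": 234, "VRL": 50, "VRT": 80,
}

# Net change across all listed divisions.
net_change = sum(file_changes.values())

# File counts in the previous release, derived as current total minus change.
previous_totals = {
    division: current_totals[division] - change
    for division, change in file_changes.items()
}

print(f"Net change in sequence data files: {net_change}")
for division, previous in previous_totals.items():
    print(f"{division}: {previous} -> {current_totals[division]}")
```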

For downloading purposes, please keep in mind that the uncompressed GenBank release 221.0 flatfiles require roughly 841 GB (sequence files only). The ASN.1 data require approximately 698 GB.
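
Before starting a download, it may help to confirm that the target volume has enough free space for the uncompressed files. The sketch below uses Python's standard `shutil.disk_usage`. It assumes 1 GB means 10^9 bytes, which the post does not specify, and the function name `has_room_for` is hypothetical.

```python
import shutil

# Approximate uncompressed sizes quoted above, in GB.
FLATFILE_GB = 841
ASN1_GB = 698

# Assumption: 1 GB = 10**9 bytes. The post does not say whether it means
# decimal or binary gigabytes, so leave some headroom in practice.
BYTES_PER_GB = 10**9


def has_room_for(path: str, required_gb: float) -> bool:
    """Return True if the volume containing `path` has at least `required_gb` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * BYTES_PER_GB


if __name__ == "__main__":
    target = "."  # directory where the release would be unpacked
    for label, size_gb in (("flatfiles", FLATFILE_GB), ("ASN.1", ASN1_GB)):
        status = "enough" if has_room_for(target, size_gb) else "not enough"
        print(f"{label}: about {size_gb} GB needed, {status} free space at {target!r}")
```

The compressed downloads are smaller than these figures, but both the compressed and uncompressed copies may occupy disk space at the same time during extraction.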

More information about GenBank release 221.0 is available in the release notes, as well as in the README files in the genbank (ftp.ncbi.nih.gov) and ASN.1 (ncbi-asn1) directories.
