GenBank release 227 available through FTP, BLAST & Entrez


GenBank release 227.0 (8/13/2018) has 208,831,050 traditional records (including non-bulk-oriented TSA) containing 260,806,936,411 base pairs of sequence data. There are also 665,309,765 WGS records containing 3,204,855,013,281 base pairs of sequence data, 249,295,386 bulk-oriented TSA records containing 225,520,004,678 base pairs of sequence data, and 15,822,538 bulk-oriented TLS records containing 6,077,824,493 base pairs of sequence data.

During the 57 days between the close dates for GenBank releases 226.0 and 227.0, the traditional portion of GenBank declined by 3,150,948,128 base pairs and 944,298 sequence records. See Section 1.3.1 of the release notes for further information about this net decrease.

During that same period, 477,074 records were updated, and 1,520,363 new records were added. An average of 35,042 traditional records were added or updated per day.
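The daily average quoted above follows from the update and addition counts and the 57-day interval between close dates. A minimal sketch of that arithmetic, assuming the average is the combined count divided by the number of days and truncated to a whole number (the function and variable names are illustrative, not part of any NCBI tool):

```python
# Figures taken from the release 227.0 announcement.
updated_records = 477_074
new_records = 1_520_363
days_between_releases = 57


def daily_average(updated: int, added: int, days: int) -> int:
    """Return the whole-number average of records added or updated per day."""
    return (updated + added) // days


print(daily_average(updated_records, new_records, days_between_releases))  # 35042
```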

Between releases 226.0 and 227.0, the WGS component of GenBank grew by 206,237,689,195 base pairs and by 25,505,660 sequence records. The TSA component grew by 8,963,318,047 base pairs and by 10,507,052 sequence records. The TLS component grew by 181,313,025 base pairs and by 429,497 sequence records.

The total number of sequence data files increased by 2 with this release. The changes by division are as follows:

  • BCT: 28 new files, now a total of 520
  • CON: 4 new files, now a total of 369
  • GSS: 2 new files, now a total of 308
  • INV: 60 fewer files, now a total of 108
  • PAT: 2 new files, now a total of 337
  • PLN: 24 new files, now a total of 198
  • PRI: 1 new file, now a total of 59
  • VRL: 1 new file, now a total of 57
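
The per-division figures above can be tallied with a short script. The sketch below copies the changes and new totals from the list, then derives each division's file count in the previous release and the net change across the listed divisions. The dictionary layout and function names are assumptions made for illustration.

```python
# Per-division figures copied from the list above:
# division -> (change in file count, total files in release 227.0).
file_changes = {
    "BCT": (28, 520),
    "CON": (4, 369),
    "GSS": (2, 308),
    "INV": (-60, 108),
    "PAT": (2, 337),
    "PLN": (24, 198),
    "PRI": (1, 59),
    "VRL": (1, 57),
}


def previous_totals(changes: dict) -> dict:
    """Return each division's file count before this release."""
    return {div: total - delta for div, (delta, total) in changes.items()}


def net_change(changes: dict) -> int:
    """Return the net change in file count across the listed divisions."""
    return sum(delta for delta, _ in changes.values())


for division, before in previous_totals(file_changes).items():
    after = file_changes[division][1]
    print(f"{division}: {before} -> {after}")
print("Net change:", net_change(file_changes))
```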

For downloading purposes, please keep in mind that the uncompressed GenBank release 227.0 flatfiles require roughly 894 GB (sequence files only). The ASN.1 data require approximately 738 GB.
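
Before starting a large download, it may help to check that the target volume has enough free space. The following is a rough sketch using Python's standard `shutil.disk_usage`. It assumes the sizes quoted above are decimal gigabytes and ignores the extra room needed for the compressed files during download. The helper names and the target path are hypothetical.

```python
import shutil

FLATFILE_GB = 894  # uncompressed flatfiles (sequence files only), per the announcement
ASN1_GB = 738      # ASN.1 data, per the announcement
BYTES_PER_GB = 10**9  # assumption: sizes are quoted in decimal gigabytes


def required_gb(flatfiles: bool = True, asn1: bool = False) -> int:
    """Return the uncompressed size, in GB, of the selected data sets."""
    return (FLATFILE_GB if flatfiles else 0) + (ASN1_GB if asn1 else 0)


def fits(free_bytes: int, needed_gb: int) -> bool:
    """Return True if free_bytes can hold needed_gb gigabytes."""
    return free_bytes >= needed_gb * BYTES_PER_GB


if __name__ == "__main__":
    target = "."  # hypothetical download directory
    free = shutil.disk_usage(target).free
    needed = required_gb(flatfiles=True, asn1=False)
    status = "enough" if fits(free, needed) else "not enough"
    print(f"Need about {needed} GB; {status} free space at {target!r}")
```

Keeping the comparison in `fits` separate from the `shutil.disk_usage` call lets the size logic be checked without touching the file system.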

More information about GenBank release 227.0 is available in the release notes, as well as in the README files in the genbank and ASN.1 (ncbi-asn1) directories on FTP. See Section 1.4.1 of the release notes for details about future accession format changes for WGS/TSA/TLS sequencing projects, and for protein sequences.
