Important changes coming to prokaryotic Reference and Representative genome assemblies

We are making changes to the set of bacterial and archaeal RefSeq Reference and Representative assemblies in February 2020.

  • We will reduce the number of Reference assemblies to 15, keeping only those whose annotation is provided by outside experts (Table 1). The other 105 current Reference assemblies will be re-annotated with the latest Prokaryotic Genome Annotation Pipeline (PGAP) software and will lose Reference status.
  • We will reassess and revise the set of Representative assemblies so that there is one assembly per species to better reflect the taxonomic diversity of the RefSeq bacterial and archaeal assemblies.

Assembly Strain
GCF_000191145.1 Acinetobacter pittii PHEA-2
GCF_000009045.1 Bacillus subtilis subsp. subtilis str. 168
GCF_000009085.1 Campylobacter jejuni subsp. jejuni NCTC 11168 = ATCC 700819
GCF_000022005.1 Caulobacter crescentus NA1000
GCF_000008725.1 Chlamydia trachomatis D/UW-3/CX
GCF_000007765.2 Coxiella burnetii RSA 493
GCF_000008865.2 Escherichia coli O157:H7 str. Sakai
GCF_000005845.2 Escherichia coli str. K-12 substr. MG1655
GCF_000240185.1 Klebsiella pneumoniae subsp. pneumoniae HS11286
GCF_000196035.1 Listeria monocytogenes EGD-e
GCF_000195955.2 Mycobacterium tuberculosis H37Rv
GCF_000006765.1 Pseudomonas aeruginosa PAO1
GCF_000006945.2 Salmonella enterica subsp. enterica serovar Typhimurium str. LT2
GCF_000006925.2 Shigella flexneri 2a str. 301
GCF_000013425.1 Staphylococcus aureus subsp. aureus NCTC 8325

Table 1. The set of 15 prokaryotic assemblies that will retain Reference status. These are regularly updated by an involved community of microbiologists.
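
For readers who keep their own lists of Reference assemblies, a short script can show which entries are affected. The following is a minimal sketch, not an NCBI tool: the accessions are copied from Table 1, while the function name `partition_by_reference_status` and the sample input are illustrative assumptions. It assumes accessions are compared exactly, including the version suffix.

```python
# Accessions copied from Table 1: the 15 assemblies that retain Reference status.
RETAINED_REFERENCE = {
    "GCF_000191145.1": "Acinetobacter pittii PHEA-2",
    "GCF_000009045.1": "Bacillus subtilis subsp. subtilis str. 168",
    "GCF_000009085.1": "Campylobacter jejuni subsp. jejuni NCTC 11168 = ATCC 700819",
    "GCF_000022005.1": "Caulobacter crescentus NA1000",
    "GCF_000008725.1": "Chlamydia trachomatis D/UW-3/CX",
    "GCF_000007765.2": "Coxiella burnetii RSA 493",
    "GCF_000008865.2": "Escherichia coli O157:H7 str. Sakai",
    "GCF_000005845.2": "Escherichia coli str. K-12 substr. MG1655",
    "GCF_000240185.1": "Klebsiella pneumoniae subsp. pneumoniae HS11286",
    "GCF_000196035.1": "Listeria monocytogenes EGD-e",
    "GCF_000195955.2": "Mycobacterium tuberculosis H37Rv",
    "GCF_000006765.1": "Pseudomonas aeruginosa PAO1",
    "GCF_000006945.2": "Salmonella enterica subsp. enterica serovar Typhimurium str. LT2",
    "GCF_000006925.2": "Shigella flexneri 2a str. 301",
    "GCF_000013425.1": "Staphylococcus aureus subsp. aureus NCTC 8325",
}


def partition_by_reference_status(accessions):
    """Split accessions into (in Table 1, not in Table 1).

    Hypothetical helper: matching is exact on accession.version,
    after stripping surrounding whitespace.
    """
    retained, other = [], []
    for acc in accessions:
        key = acc.strip()
        if key in RETAINED_REFERENCE:
            retained.append(key)
        else:
            other.append(key)
    return retained, other


if __name__ == "__main__":
    # The second accession is a made-up placeholder, not a real assembly.
    sample = ["GCF_000005845.2", "GCF_000000000.1"]
    kept, not_listed = partition_by_reference_status(sample)
    print("Retains Reference status:", kept)
    print("Not in Table 1:", not_listed)
```

An accession that is missing from Table 1 is not necessarily one of the 105 re-annotated assemblies; it may never have had Reference status in the first place.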

These changes affect the following resources:

  • Microbial genomes BLAST
    • Updated Reference and Representative genomes databases.
  • Assembly
    • Fewer Reference assemblies.
    • Different set of Representative assemblies.
  • Genome
    • Updated list of Reference genomes.
    • Changes to the assemblies listed in the “Representative” section of the individual Genome organism pages.
  • PGAP
    • Reduced list of Reference assemblies (beginning with software version 4.11, released in January 2020).
    • Alignments of proteins annotated on Reference assemblies in the same genus are given higher weight. In prior PGAP versions, higher weight was given to alignments of proteins from the reference genome(s) in the same clade.
