Proposed changes to AGP files for genome assemblies

If you are a consumer or producer of AGP (A Golden Path) files for genome assemblies, please read on.  We’d like your feedback on the proposed changes described here.

As you know, AGP files describe the structure of certain genome assemblies: how component sequences such as contigs are ordered, oriented, and separated by gaps to build larger objects such as scaffolds and chromosomes. The AGP file format has not kept up with changes in sequencing technology or with International Nucleotide Sequence Database Collaboration (INSDC) feature usage. NCBI is therefore proposing to extend the current AGP v2.0 specification with new linkage-evidence types and a new gap type, “contamination”, as summarized below and described in full in the proposed AGP v2.1 specification.

Proposed changes from AGP v2.0 to AGP v2.1:

  • Add ‘proximity-ligation’ and ‘pcr’ to the set of accepted linkage evidence values
  • Drop ‘strobe’ from the set of accepted linkage evidence values
  • Expand the definition of ‘paired-end’ linkage evidence to include ‘mate-pairs’ and molecular-barcode techniques
  • Add a gap type of ‘contamination’
    • definition: a gap inserted in place of foreign sequence so that the coordinates of the surrounding sequence are unchanged
    • usage: treated as linked, so the original scaffold is preserved, with linkage evidence ‘unspecified’
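The proposed v2.1 vocabulary can be illustrated as a simple check on the gap columns of one AGP line. The Python below is a minimal sketch, not an NCBI validator. The function name `check_gap_line` is hypothetical; the v2.0 token lists reflect the AGP v2.0 specification as assumed here and should be checked against the official document; the two new linkage-evidence tokens use the spellings in this post, which the final specification may change; and `strobe` is kept, in line with the follow-up comment at the end of this post. It assumes the 9-column, tab-delimited layout in which gap lines (component type `N` or `U`) carry gap type, linkage, and semicolon-separated linkage evidence in columns 7 to 9.

```python
"""Sketch: check the gap columns of an AGP line against the proposed v2.1 vocabulary.

Assumptions (not taken from an official NCBI tool):
- v2.0 token spellings follow the AGP v2.0 specification as understood here;
- the two new evidence tokens use the spellings in the AGP v2.1 proposal post;
- 'strobe' is retained in v2.1, as stated in the follow-up comment.
"""

# Linkage-evidence tokens assumed for AGP v2.0 ("na" is used when linkage is "no").
EVIDENCE_V20 = {
    "na", "paired-ends", "align_genus", "align_xgenus", "align_trnscpt",
    "within_clone", "clone_contig", "map", "strobe", "unspecified",
}
# Proposed v2.1 additions (spellings as written in the proposal post).
EVIDENCE_V21 = EVIDENCE_V20 | {"proximity-ligation", "pcr"}

# Gap types assumed for AGP v2.0, plus the proposed 'contamination' type.
GAP_TYPES_V20 = {
    "scaffold", "contig", "centromere", "short_arm",
    "heterochromatin", "telomere", "repeat",
}
GAP_TYPES_V21 = GAP_TYPES_V20 | {"contamination"}


def check_gap_line(line: str, version: str = "2.1") -> list[str]:
    """Return the problems found in one tab-delimited AGP gap line.

    Any version string other than "2.1" is treated as v2.0.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return [f"expected 9 columns, got {len(cols)}"]
    if cols[4] not in ("N", "U"):
        return ["not a gap line (column 5 must be N or U)"]

    gap_type, linkage, evidence = cols[6], cols[7], cols[8]
    gap_types = GAP_TYPES_V21 if version == "2.1" else GAP_TYPES_V20
    allowed = EVIDENCE_V21 if version == "2.1" else EVIDENCE_V20

    problems = []
    if gap_type not in gap_types:
        problems.append(f"gap type {gap_type!r} not accepted in v{version}")
    if linkage not in ("yes", "no"):
        problems.append(f"linkage must be 'yes' or 'no', got {linkage!r}")

    # Multiple lines of evidence are separated by semicolons.
    terms = evidence.split(";")
    for term in terms:
        if term not in allowed:
            problems.append(f"linkage evidence {term!r} not accepted in v{version}")

    # Proposed usage: a contamination gap keeps the scaffold linked,
    # with linkage evidence 'unspecified' only.
    if gap_type == "contamination" and (linkage != "yes" or terms != ["unspecified"]):
        problems.append("contamination gap must have linkage 'yes' and evidence 'unspecified'")
    return problems


row = "scaffold_1\t5001\t5100\t2\tN\t100\tcontamination\tyes\tunspecified"
print(check_gap_line(row))         # []
print(check_gap_line(row, "2.0"))  # ["gap type 'contamination' not accepted in v2.0"]
```

Passing the version explicitly reflects the note that v2.0 submissions would still be accepted: the same line can be checked under either vocabulary, and only the v2.1 sets admit the new evidence tokens and the contamination gap.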

Timeline

April 16 – May 7: Comment period
May 8 – May 10: AGP v2.1 proposal finalized
May 12 – May 16: AGP v2.1 approved at the annual INSDC meeting
Summer 2019: NCBI begins accepting the new linkage-evidence types and using the contamination gap type

Note: NCBI will continue to accept genome submissions in AGP v2.0 format.

We are seeking your input on these proposed changes. Please comment on this post or send your comments and suggestions to suggest@ncbi.nlm.nih.gov.

One thought on “Proposed changes to AGP files for genome assemblies”

  1. Because some GenBank records use ‘strobe’, we will retain ‘strobe’ as a linkage-evidence option in AGP v2.1.
