ClinVar’s new XML aggregated by Variation ID

Now it’s easier than ever to access all ClinVar data for a variant, or a set of variants, across all reported diseases. ClinVar’s new XML is organized by variant alone (Variation ID) rather than by variant-disease pair. This reduces redundancy, for example when a variant is related to several disease concepts, and makes the XML consistent with the ClinVar web pages. You can download ClinVarVariationRelease XML from the /xml/clinvar_variation/ directory on the ClinVar FTP site. New features in ClinVarVariationRelease XML, illustrated in Figure 1, include:

  • Explicit elements that distinguish variants that were directly interpreted from “included” variants, which were interpreted only as part of a Haplotype or Genotype. The clinical significance for included variants is given as “no interpretation for the single variant”.
  • Explicit elements that distinguish records for simple alleles, haplotypes, and genotypes.
  • A Replaces element that provides a record history and indicates accessions that were merged into the current accession.
  • A section that maps the submitted name or identifier for the interpreted condition to the corresponding name used in ClinVar and to its MedGen Concept Unique Identifier (CUI).
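The features listed above lend themselves to a simple parsing pattern. The sketch below uses Python's standard-library xml.etree.ElementTree on a small, hand-written sample record. The element and attribute names it relies on (VariationArchive, InterpretedRecord, IncludedRecord, TraitMappingList, TraitMapping, MedGen, MappingValue, and so on) are assumptions modeled on the features described here, not a verified copy of the schema; check them against the published XSD before relying on them. All IDs, accessions, and the CUI in the sample are placeholders.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample. Element and attribute names are assumptions
# modeled on the features described above; IDs, accessions, and the CUI are
# placeholders, not real ClinVar data.
SAMPLE = """
<ClinVarVariationRelease>
  <VariationArchive Accession="VCV000000001" VariationID="1">
    <InterpretedRecord>
      <Haplotype VariationID="1">
        <SimpleAllele VariationID="2">
          <Interpretations><Interpretation>
            <Description>no interpretation for the single variant</Description>
          </Interpretation></Interpretations>
        </SimpleAllele>
        <SimpleAllele VariationID="3">
          <Interpretations><Interpretation>
            <Description>no interpretation for the single variant</Description>
          </Interpretation></Interpretations>
        </SimpleAllele>
      </Haplotype>
      <TraitMappingList>
        <TraitMapping MappingValue="submitted disease name">
          <MedGen CUI="C0000000" Name="ClinVar preferred name"/>
        </TraitMapping>
      </TraitMappingList>
    </InterpretedRecord>
  </VariationArchive>
</ClinVarVariationRelease>
"""

RECORD_TYPES = ("SimpleAllele", "Haplotype", "Genotype")


def summarize(archive):
    """Summarize one VariationArchive element (assumed, simplified layout)."""
    # Assumption: directly interpreted variants sit under InterpretedRecord,
    # "included" variants under IncludedRecord.
    record = archive.find("InterpretedRecord")
    included = record is None
    if included:
        record = archive.find("IncludedRecord")
    # The record type is whichever of SimpleAllele/Haplotype/Genotype is a
    # direct child of the record element.
    variant = next((c for c in record if c.tag in RECORD_TYPES), None)
    # Component alleles (e.g. of a haplotype) with their significance text.
    components = []
    if variant is not None:
        for allele in variant.findall("SimpleAllele"):
            text = allele.findtext("Interpretations/Interpretation/Description")
            components.append((allele.get("VariationID"), text))
    # Submitted condition name -> (ClinVar/MedGen name, MedGen CUI).
    traits = []
    for mapping in record.findall("TraitMappingList/TraitMapping"):
        medgen = mapping.find("MedGen")
        if medgen is not None:
            traits.append(
                (mapping.get("MappingValue"), medgen.get("Name"), medgen.get("CUI"))
            )
    return {
        "accession": archive.get("Accession"),
        "included": included,
        "type": variant.tag if variant is not None else None,
        "components": components,
        "traits": traits,
    }


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE.strip())
    for archive in root.findall("VariationArchive"):
        print(summarize(archive))
```

Because each VariationArchive carries its own record type and trait mappings, one pass per record is enough; there is no need to join variant-disease pairs back together, which is the redundancy the variant-centric layout removes.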

Figure 1. ClinVar variant-centric XML showing a variant record for a haplotype (VCV000236230) that comprises two included variants (SimpleAlleles) marked as “no interpretation for the single variant”. The record includes all the condition records (RCVList) with names and identifiers from MedGen, OMIM, and other sources.
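A release file covering every Variation ID can be large, so loading it into memory at once may be impractical. The minimal streaming sketch below collects the condition records (RCV accessions) listed in each variant record. It assumes a gzip-compressed file downloaded from the /xml/clinvar_variation/ directory, no default XML namespace, records stored as VariationArchive elements with an Accession attribute, and condition records stored as RCVAccession elements (with an Accession attribute) under RCVList; the function name rcvs_by_vcv is hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET


def rcvs_by_vcv(path):
    """Yield (VCV accession, list of RCV accessions) for each record.

    Assumptions: the file is gzip-compressed, uses no default XML namespace,
    each record is a VariationArchive element with an Accession attribute, and
    condition records are RCVAccession elements under an RCVList.
    """
    with gzip.open(path, "rb") as fh:
        # "end" events fire once an element and all its children are parsed.
        for _event, elem in ET.iterparse(fh, events=("end",)):
            if elem.tag != "VariationArchive":
                continue
            rcvs = [rcv.get("Accession")
                    for rcv in elem.iterfind(".//RCVList/RCVAccession")]
            yield elem.get("Accession"), rcvs
            # Drop the finished record's contents to keep memory use modest.
            elem.clear()
```

For example, `for vcv, rcvs in rcvs_by_vcv(path): print(vcv, rcvs)` prints one line per variant record. Clearing each VariationArchive after it is processed is the usual iterparse pattern for large files; records with no RCVList (such as included variants in this simplified model) yield an empty list.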

To learn more about how to use this data, read our documentation.

Tell us how ClinVar has helped you by writing to us at clinvar@ncbi.nlm.nih.gov.
