Now it’s easier than ever to access all data in ClinVar for a variant or set of variants across all reported diseases. ClinVar’s new XML is organized by variant only (Variation ID), instead of the variant-disease pair. This reduces redundancy, for example in cases where a variant is related to several disease concepts, and makes the XML consistent with the ClinVar web pages. You can get ClinVarVariationRelease XML from the /xml/clinvar_variation/ directory on the ClinVar FTP site. New features in ClinVarVariationRelease XML shown in Figure 1 include:
Explicit elements to distinguish between variants that were directly interpreted and “included” variants, those that were interpreted only as part of a Haplotype or Genotype. The clinical significance for included variants is indicated as “no interpretation for the single variant”.
Explicit elements to distinguish records for simple alleles, haplotypes, and genotypes.
The Replaces element that provides a history and indicates accessions that were merged into the current accession.
A section that maps the submitted name or identifier for the interpreted condition to the corresponding name used in ClinVar and the MedGen Concept Identifier (CUI).
Figure 1. ClinVar variant-centric XML showing a variant record for a haplotype (VCV000236230) that comprises two included variations (SimpleAlleles) that are marked as “no interpretation for the single variant”. The record includes all the condition records (RCVList) with names and identifiers from MedGen, OMIM and other sources.
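As a rough illustration of the variant-centric layout, here is a minimal Python sketch that summarizes one record. The embedded sample is hypothetical and heavily simplified. The element names VariationArchive, InterpretedRecord, IncludedRecord and RCVAccession, the attribute names, and the placeholder values are assumptions for illustration; check them against the documentation and schema before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified record. Only the VCV accession comes from the
# text above; all other attribute values are placeholders, and the element
# names are assumptions to verify against the real schema.
SAMPLE = """\
<ClinVarVariationRelease>
  <VariationArchive Accession="VCV000236230">
    <InterpretedRecord>
      <Haplotype>
        <SimpleAllele AlleleID="PLACEHOLDER_1"/>
        <SimpleAllele AlleleID="PLACEHOLDER_2"/>
      </Haplotype>
      <RCVList>
        <RCVAccession Accession="RCV_PLACEHOLDER"/>
      </RCVList>
    </InterpretedRecord>
  </VariationArchive>
</ClinVarVariationRelease>
"""

RECORD_KINDS = ("Genotype", "Haplotype", "SimpleAllele")


def summarize(xml_text):
    """Return one summary dict per VariationArchive element."""
    root = ET.fromstring(xml_text)
    rows = []
    for archive in root.iter("VariationArchive"):
        # The wrapper element says whether the variant was interpreted
        # directly or only as part of a haplotype or genotype.
        record = archive.find("InterpretedRecord")
        directly_interpreted = record is not None
        if record is None:
            record = archive.find("IncludedRecord")
        if record is None:
            continue  # unrecognized layout; skip rather than guess
        # The first matching direct child names the kind of record.
        kind = next(
            (tag for tag in RECORD_KINDS if record.find(tag) is not None),
            "unknown",
        )
        rows.append({
            "accession": archive.get("Accession"),
            "directly_interpreted": directly_interpreted,
            "kind": kind,
            "simple_alleles": len(list(record.iter("SimpleAllele"))),
            "rcv_count": len(record.findall("RCVList/RCVAccession")),
        })
    return rows


if __name__ == "__main__":
    for row in summarize(SAMPLE):
        print(row)
```

The release files are large, so a streaming parser such as xml.etree.ElementTree.iterparse would likely be a better fit than loading a whole file into memory as this sketch does.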
To learn more about how to use this data, read our documentation.
If you’ve been searching in ClinVar, you might have noticed search improvements introduced in December that reliably connect you with information on your variant of interest. ClinVar has broadened its search capability to accept many different ways of expressing the same variation, including variation described on RefSeq transcripts and proteins. If your variant expression is not reported in ClinVar, we alert you to other variants at the same genomic location or link you to related information in other NCBI resources such as dbSNP, LitVar, and PubMed. ClinVar will also now interpret expressions that contain minor errors or warn you about improper syntax that it cannot interpret.
Figure 1. Improved search results in ClinVar showing mapping of an HGVS expression to the equivalent variant in ClinVar.
Here are some example queries that show the improved search results.
NM_001318787.1:c.2258G>A – an HGVS expression that is not in ClinVar, although ClinVar has an alternate expression for the same variant (Figure 1).
NM_004958.3:c.7365C>A – a variant not in ClinVar, but another variant at the same genomic location is in ClinVar.
NM_002113.2:c.19delG – a variant not in ClinVar, but additional information for the variant is available in other databases.
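To try the example queries from a script or notebook, the HGVS expressions need to be URL-encoded, because characters such as ":" and ">" are not safe in a query string. The sketch below only builds links and makes no network requests. The base URL and the "term" parameter are assumptions based on how ClinVar web search links commonly look, and the helper name clinvar_search_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL for the ClinVar web search; verify before relying on it.
CLINVAR_SEARCH_BASE = "https://www.ncbi.nlm.nih.gov/clinvar/"

# The example queries listed above.
EXAMPLE_QUERIES = [
    "NM_001318787.1:c.2258G>A",
    "NM_004958.3:c.7365C>A",
    "NM_002113.2:c.19delG",
]


def clinvar_search_url(expression):
    """Build a search link for one variant expression (hypothetical helper)."""
    # urlencode percent-encodes reserved characters such as ':' and '>'.
    return CLINVAR_SEARCH_BASE + "?" + urlencode({"term": expression})


if __name__ == "__main__":
    for query in EXAMPLE_QUERIES:
        print(clinvar_search_url(query))
```

Opening each printed link in a browser should show the kind of result described for that query, assuming the URL pattern holds.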
We welcome your feedback on your search experience and any additional ideas on how to improve searching in ClinVar.
MedGen is a free, comprehensive resource for one-stop access to essential information on phenotypic health topics related to medical genetics as collected from established high-quality sources. It integrates terminology from multiple primary ontologies (or nomenclatures) to facilitate standardization and more accurate results from search queries.