Improved ClinVar search quickly connects you to information about variants

If you’ve been searching in ClinVar, you might have noticed search improvements introduced in December that reliably connect you with information on your variant of interest. ClinVar has broadened its search capability to accept many different ways of expressing the same variation, including variation described on RefSeq transcripts and proteins. If your variant expression is not reported in ClinVar, we alert you to other variants at the same genomic location or link you to related information in other NCBI resources such as dbSNP, LitVar, and PubMed. ClinVar now also interprets expressions that contain minor errors and warns you about improper syntax that it cannot interpret.

Figure 1. Improved search results in ClinVar showing mapping of an HGVS expression to the equivalent variant in ClinVar.

Here are some example queries that show the improved search results.

NM_001318787.1:c.2258G>A – an HGVS expression that is not in ClinVar, although ClinVar has an alternate expression for the same variant (Figure 1).

NM_004958.3:c.7365C>A – a variant that is not in ClinVar, although another variant at the same genomic location is.

NM_002113.2:c.19delG – a variant that is not in ClinVar, although other databases have additional information about it.
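If you want to try the three example queries yourself, the following minimal sketch builds ClinVar web search links for them. It assumes the usual `term` query parameter of the ClinVar search page; the helper name `clinvar_search_url` is ours for illustration and is not part of any NCBI tool. HGVS expressions contain characters such as `:` and `>` that need percent-encoding in a URL, which is the only thing the helper takes care of.

```python
from urllib.parse import urlencode

# Assumption: the ClinVar web search page accepts the query in a "term" parameter.
CLINVAR_SEARCH = "https://www.ncbi.nlm.nih.gov/clinvar/"


def clinvar_search_url(expression: str) -> str:
    """Return a ClinVar search link for a variant expression (illustrative helper)."""
    # urlencode percent-encodes characters such as ':' and '>' in HGVS expressions.
    return f"{CLINVAR_SEARCH}?{urlencode({'term': expression})}"


# The example queries from this post.
example_queries = [
    "NM_001318787.1:c.2258G>A",
    "NM_004958.3:c.7365C>A",
    "NM_002113.2:c.19delG",
]

if __name__ == "__main__":
    for query in example_queries:
        print(clinvar_search_url(query))
```

Opening a printed link in a browser should show the same results page you would get by typing the expression into the ClinVar search box.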

We welcome your feedback on your search experience and any additional ideas on how to improve searching in ClinVar.

February 6 Webinar: New Variation Services for Normalizing, Remapping, and Annotating Variants

Join us on Wednesday, February 6, 2019, when NCBI staff will show you how to use a new set of NCBI variation services that rely on a variant data model called SPDI (Sequence Position Deletion Insertion). These services and the data model allow you to interconvert, map, and disambiguate variants in standard formats (RefSNP accessions, HGVS, and VCF). Unlike many current variant notation systems, SPDI provides unambiguous, machine-readable definitions of variants. SPDI powers not only the SNP build and mapping procedures at NCBI but also our variant sensors that are active in the global search and ClinVar. These services and the notation system provide valuable new tools for people who work with sequence variants.

Date and time: Wed, Feb 6, 2019 12:00 PM – 12:30 PM EDT


After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
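To give a feel for why a SPDI expression is machine-readable, here is a minimal sketch that parses the four colon-separated fields (sequence, position, deletion, insertion) and applies the change to a sequence. It rests on a few assumptions: the position is 0-based, the deleted sequence is written out explicitly rather than given as a length, and the reference is a short made-up string (`toy_seq`), not a real accession. The names `SPDI`, `parse_spdi`, and `apply_spdi` are ours and are not part of the NCBI services.

```python
from typing import NamedTuple


class SPDI(NamedTuple):
    sequence: str   # sequence identifier, e.g. an accession.version
    position: int   # 0-based start of the deleted sequence (assumption of this sketch)
    deletion: str   # deleted sequence, written out explicitly
    insertion: str  # inserted sequence (may be empty for a pure deletion)


def parse_spdi(expression: str) -> SPDI:
    """Split 'sequence:position:deletion:insertion' into its four fields."""
    parts = expression.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected 4 colon-separated fields, got {len(parts)}")
    sequence, position, deletion, insertion = parts
    if not sequence or not position.isdigit():
        raise ValueError(f"invalid sequence or position in {expression!r}")
    return SPDI(sequence, int(position), deletion, insertion)


def apply_spdi(reference: str, variant: SPDI) -> str:
    """Replace the deleted sequence at the given position with the inserted one."""
    start = variant.position
    end = start + len(variant.deletion)
    if reference[start:end] != variant.deletion:
        raise ValueError("deleted sequence does not match the reference")
    return reference[:start] + variant.insertion + reference[end:]


if __name__ == "__main__":
    toy_reference = "ACGTACGT"  # made-up sequence for illustration only
    substitution = parse_spdi("toy_seq:3:T:C")
    print(apply_spdi(toy_reference, substitution))  # ACGCACGT
    deletion = parse_spdi("toy_seq:1:CG:")
    print(apply_spdi(toy_reference, deletion))      # ATACGT
```

Because every field is explicit, the same four values describe substitutions, deletions, and insertions without any special syntax for each case.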

dbSNP build 152 uses SPDI variant notation

dbSNP build 152 is a small incremental update from build 151, provided so that you can begin testing and integrating the new build products into your workflow. Build 152 uses the new system with SPDI variant notation and is now available on FTP and the new RefSNP webpage.

The release notes have more information about what’s new in build 152. If you have any questions or comments, send us an email.

November 14 Webinar: Variant Interpretation using NCBI Resources

Next Wednesday, November 14, 2018, NCBI staff will show you how to use NCBI’s genome browsers and other resources to interpret variants. The graphical displays of Genome Data Viewer (GDV) and Variation Viewer offer an interactive experience that allows you to explore NCBI’s rich collection of annotations, datasets, and literature for deciphering your variant-associated data. In this presentation, we’ll step through case studies and show you how to quickly display relevant NCBI track sets (including the new RefSeq Functional Elements track), upload a file or remotely hosted dataset and display it as a track, and use browser tracks to identify known variants and then assess their functional and clinical significance and allele frequency. You will also learn how to navigate from the browsers to NCBI resources such as ClinVar, dbSNP, and PubMed for additional variant information.

Date and time: Wed, Nov 14, 2018 12:00 PM – 12:45 PM EDT


After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.



See how dbSNP improves data quality at ASHG 2018

NCBI staff will share knowledge on various topics at the American Society of Human Genetics (ASHG) conference this month in San Diego. Here, on NCBI Insights, we feature some preliminary details for one of NCBI’s dbSNP posters.

You can visit poster 1692W “Improving dbSNP Data Quality and Annotation for Variant Interpretation” on Wednesday, Oct. 17 from 3 PM to 4 PM at ASHG.


Standalone variation services replace Variation Reporter

As of July 2018, a new set of standalone variation services replaces the variant matching functions of Variation Reporter. Variation Reporter was a tool designed to search human sequence variation data by location and to report matching variants found in dbSNP, dbVar, and ClinVar.

The new services are faster, better at handling variants in repeat regions, and scalable to accommodate the continued explosive growth of variation volume. You can find more information about the services in the initial blog post and online SPDI document.

If you would like to report any issues related to these new services and/or would like to provide comments, please write to

If you have any specific questions about the NCBI site in general, contact us at

We appreciate your continued support and interaction with the NCBI tools.

dbSNP database doubles in size twice in 13 months

In a little over a year, dbSNP human data have doubled in size twice: from 150 million Reference SNP (rs) records to 325 million in Build 150, and again to more than 650 million rs records in Build 151. Of these, 580 million rs records have frequency data in Build 151. This explosive growth makes dbSNP the world’s largest public human variation database. Current trends suggest that large-scale WGS and WES projects will discover millions of new variations in the next few years.

Build 151 was released in March 2018. The data are available for web search and FTP download.

NCBI’s dbSNP houses variation and frequency data from large-scale projects including 1000Genomes, GO-ESP, ExAC, GnomAD, TOPMED and HLI, as well as focused studies like locus-specific databases (LSDB) and clinical sources. The rs records are annotated on RefSeq genomes, mRNA and protein sequences and integrated with other NCBI resources (e.g., Assembly, Gene, RefSeq, PubMed, and BioProject). The database is used worldwide in personal genomics and medical genetics, and for managing, annotating, and analyzing variation data.

Important dbSNP updates: New JSON data files, RefSNP report, API

dbSNP is moving to a new design, with new products ready for testing, including new JSON data files, the RefSNP page, and an API.

New JSON data files

Human Build 151 is the last build that will provide relational database table dumps on the FTP site. In future build releases, dbSNP data will instead be available as a cumulative file of RefSNP objects in JSON format. These JSON files are available now so that users can begin migration and testing. Tutorials for parsing JSON are on GitHub.
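As a rough idea of what reading such a file might look like, the sketch below streams RefSNP objects one at a time instead of loading the whole file into memory. It assumes that each line of the file holds one JSON object, that the file is bzip2-compressed, and that each object carries a `refsnp_id` field; check the release notes and the GitHub tutorials for the actual layout. The file name and the helper names are placeholders of ours.

```python
import bz2
import json
from typing import Iterable, Iterator


def iter_refsnp(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed RefSNP object per non-empty line (assumed layout)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_refsnp(path: str) -> int:
    """Count RefSNP objects in a bzip2-compressed, line-delimited JSON file."""
    # bz2.open in text mode decompresses on the fly, so memory use stays small.
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return sum(1 for _ in iter_refsnp(handle))


if __name__ == "__main__":
    # Made-up records for illustration; real RefSNP objects contain many more fields.
    sample_lines = ['{"refsnp_id": "1"}\n', "\n", '{"refsnp_id": "2"}\n']
    for record in iter_refsnp(sample_lines):
        print(record["refsnp_id"])
    # For a downloaded file (placeholder name):
    # print(count_refsnp("refsnp-sample.json.bz2"))
```

Streaming line by line matters here because a cumulative file of hundreds of millions of records is far too large to parse in one call.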


Bioinformatics paper uses NCBI open data to analyze drug response

A study (PMID: 28158543) published in the July 2017 issue of Bioinformatics collects, classifies, and analyzes single nucleotide variants (SNVs) that may affect response to currently approved drugs. The authors identified 2,640 SNVs of interest, most of which occur rarely in populations (minor allele frequency <0.01).

The researchers used protein sequence alignment tools and mined open data from multiple information resources accessed through E-utilities including PubChem Compound (Kim et al., 2016 PMID: 26400175), NCBI Gene (Maglott D, et al., 2014. PMID: 25355515), NCBI Protein (Sayers, 2013), MMDB (Madej et al., 2012 PMID: 22135289), PDB (Berman et al., 2000 PMID: 10592235), dbSNP (Sherry et al., 2001 PMID: 11125122), and ClinVar (Landrum et al., 2016 PMID: 26582918).

Questions, comments, and other feedback may be sent to Yanli Wang.

RefSeq release 85 is now public

RefSeq release 85 is now accessible online, via FTP and through NCBI’s programming utilities. This full release incorporates genomic, transcript, and protein data available as of November 6, 2017, and contains 146,710,309 records, including 100,043,962 proteins, 20,905,608 RNAs, and sequences from 73,996 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings. See the RefSeq release notes for more information.
