The ALFA dataset: New aggregated allele frequency from dbGaP and dbSNP now available

NIH’s data sharing policy now allows unrestricted access to genomic summary results for data from NCBI’s Database of Genotypes and Phenotypes (dbGaP). Pooled allele frequency data from dbSNP and the dbGaP summary results are available as the new Allele Frequency Aggregator (ALFA) dataset. The ALFA dataset includes aggregated and harmonized array chip genotyping, exome, and genome sequencing data. The ALFA data are open access and freely available for you to incorporate into your workflows and applications from the dbSNP web pages (Figure 1), through FTP, and through the Variation Services API. dbGaP currently has data for more than 2 million study subjects, approximately 1 million of whom have genotype data that are suitable for input into the ALFA dataset. The first release of ALFA contains data on about 100,000 subjects, and we hope to complete processing of data on the other 925,000 subjects within the next year. This volume and variety of data promises unprecedented opportunities to identify genetic factors that influence health and disease. Register to attend our April 22 webinar and read on to learn more.

Figure 1. ALFA allele frequencies for a variant (rs4988235) in the promoter of the lactase gene, showing frequency differences across populations.
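
As a rough illustration of what pooling allele frequency data means, the sketch below sums per-study allele counts and derives frequencies from the pooled totals. The function name, the input layout, and the counts are invented for this example; they are not ALFA data, and this is not necessarily how ALFA harmonizes its inputs.

```python
from collections import Counter


def pooled_allele_frequencies(study_counts):
    """Sum allele counts across studies, then divide by the pooled total.

    study_counts: iterable of dicts mapping allele -> observed count.
    (Hypothetical input layout, chosen for this example only.)
    """
    pooled = Counter()
    for counts in study_counts:
        pooled.update(counts)
    total = sum(pooled.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in pooled.items()}


# Invented counts for two studies at one biallelic site.
studies = [
    {"G": 150, "A": 50},
    {"G": 700, "A": 100},
]
freqs = pooled_allele_frequencies(studies)
print(freqs)
```

Pooling counts rather than averaging per-study frequencies weights each study by the number of alleles it observed, so larger studies contribute more to the result.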


New in ClinVar – notifications for changes in the clinical interpretation of variants

We have added a new feature to ClinVar that allows you to follow a particular variant and be notified if the overall clinical interpretation in ClinVar changes, for example from a pathogenic category to a non-pathogenic one. This service will let you know about changes that may require you to update your analysis reports and contact your patients and ordering physicians. The new feature allows you to follow a variant from the variation page (Figure 1). Simply click the “Follow” button to begin receiving notifications.

Figure 1. A ClinVar variant page (VCV000541155.1) showing the ‘Follow’ button. The text on the button changes to ‘Following’ after you add the variant to your followed variants. Clicking ‘Following’ presents the option to ‘Unfollow’, which removes the variant from the followed list when clicked.


View BAM alignments in the NCBI genome browsers and sequence viewers sorted by haplotype tag

NCBI’s genome browsers and graphical sequence viewers now allow you to view BAM alignments sorted by haplotype tag. This option is useful for analyzing variants within a sequenced sample and can help you detect or validate structural variants.

Figure 1. Remote BAM alignment data sorted by haplotype tag in the Genome Data Viewer. The remote BAM file was added through the “User Data and Track Hubs” feature in GDV. You can load the remote BAM for this example through https://go.usa.gov/xpM9c. The sorted display shows that haplotype 1 contains a significant deletion in this region relative to haplotype 2 and the reference genome assembly. Aligned reads not assigned a haplotype tag in the BAM file are grouped under the heading “haplotype not set” (not shown).
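
To make the idea of grouping alignments by haplotype concrete, here is a minimal sketch that groups SAM-format text records by a haplotype tag. It assumes the haplotype is stored in the SAM `HP` integer tag (the post does not name the tag), and the records are invented. It illustrates the grouping only and is not how the NCBI viewers implement the display.

```python
from collections import defaultdict

UNSET = "haplotype not set"


def group_by_haplotype(sam_lines):
    """Group read names by the value of the HP:i tag in SAM text lines."""
    groups = defaultdict(list)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        read_name = fields[0]
        label = UNSET
        # Optional tags start at column 12 and look like TAG:TYPE:VALUE.
        for tag in fields[11:]:
            if tag.startswith("HP:i:"):
                label = "haplotype " + tag[5:]
                break
        groups[label].append(read_name)
    return dict(groups)


# Invented SAM records; only the read names and tags matter here.
sam_text = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\tHP:i:1",
    "read2\t0\tchr1\t105\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\tPS:i:100\tHP:i:2",
    "read3\t0\tchr1\t110\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
]
grouped = group_by_haplotype(sam_text)
for label in sorted(grouped):
    print(label, grouped[label])
```

Reads without the tag fall into a "haplotype not set" group, mirroring the heading described in the figure caption.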


ClinVar Celebrates 1 Million Submissions

Image text: 1 million submitted records in ClinVar represent more than 568,000 unique variants.

ClinVar is proud to announce the submission of the one millionth record to its database.

The millionth submission was published on Friday, December 20, 2019, a milestone achievement for providing open access to human variant data with asserted consequence to the clinical genetics and research communities.

ClinVar extends its thanks to the many laboratories, partners, and members of the community whose efforts and adoption of the practice of data-sharing paved the way for this achievement. All organizations that contributed to ClinVar’s genetics resources share in this accomplishment, with special recognition reserved for ClinGen and several of their members, including EGL Genetic Diagnostics/Eurofins Clinical Diagnostics, GeneDx, Invitae, and Laboratory for Molecular Medicine/Partners HealthCare Personalized Medicine, whose early submissions helped jump-start ClinVar’s database.


December 4 Webinar: Human population genetic variation data at NCBI

On Wednesday, December 4, 2019 at 12 PM, NCBI staff will present a webinar on the population variation datasets at NCBI, such as 1000 Genomes, ExAC, gnomAD, and TOPMed, that are currently included on dbSNP records. You will learn how to find the data and how you can use this information to interpret and prioritize variants for further study. You will also see a preview of a new initiative, the dbGaP Allele Frequency Aggregator (ALFA), that is based on more than 150,000 subjects in 60 dbGaP studies.

  • Date and time: Wed, Dec 4, 2019 12:00 PM – 12:45 PM EST
  • Register


NCBI at ASHG 2019: Two Data CoLabs Demonstrate How to Analyze NextGen Sequence Data and Access Genetic Variation Population Data

NCBI will be attending the American Society of Human Genetics (ASHG) 2019 meeting in Houston, Texas, on October 15-19.

This year, we will be presenting two CoLabs – interactive sessions where you can learn about new NCBI tools and resources. Read on below for a description of each CoLab and join us at ASHG next week!


dbSNP celebrates 20 years!

dbSNP was established in August 1999 as a collaboration between NCBI and the National Human Genome Research Institute (NHGRI) as a database of small-scale nucleotide variants. The database includes both common and rare single-nucleotide variations (SNVs), short (≤ 50 bp) deletion/insertion polymorphisms, and other classes of small genetic variations.


Structural Variant Hackathon

NCBI is pleased to announce a Structural Variant Hackathon at the Baylor College of Medicine in Houston, Texas, on October 11-13, 2019, immediately before ASHG.

We’re specifically looking for folks who have experience in working with structural variants, complex disease, precision medicine, and similar genomic analysis.  If this describes you, please apply! This event is for researchers, including students and postdocs, who are already engaged in the use of bioinformatics data or in the development of pipelines for large scale genomic analyses from high-throughput experiments (please note that the event itself will focus on open access public human data).

Potential topics include:

  • Mapping structural variants to public databases
  • Calculating the heritability of different types of structural variants
  • CNV effect on isoform expression
  • Assembly accuracy for metagenomics
  • Quality assessment in large cohorts

The hackathon runs from 9 am – 6 pm each day, with the potential to extend into the evening hours. There will also be optional social events at the end of each day. Participants with various backgrounds and expertise will be organized into five to eight teams of five to six individuals, each with an experienced leader. These teams will build pipelines and tools to analyze large datasets within a cloud infrastructure. Each day, we will come together to discuss progress on each of the topics, bioinformatics best practices, coding styles, etc.

There will be no registration fee associated with attending this event.

Note: Participants will need to bring their own laptop to this program. No financial support for travel, lodging, or meals is available for this event.


ClinVar’s new XML aggregated by Variation ID

Now it’s easier than ever to access all data in ClinVar for a variant or set of variants across all reported diseases.  ClinVar’s new XML is organized by variant only (Variation ID), instead of the variant-disease pair. This reduces redundancy, for example in cases where a variant is related to several disease concepts, and makes the XML consistent with the ClinVar web pages. You can get ClinVarVariationRelease XML from the /xml/clinvar_variation/ directory on the ClinVar FTP site.  New features in ClinVarVariationRelease XML shown in Figure 1 include:

  • Explicit elements to distinguish between variants that were directly interpreted and “included” variants, those that were interpreted only as part of a Haplotype or Genotype. The clinical significance for included variants is indicated as “no interpretation for the single variant”.
  • Explicit elements to distinguish records for simple alleles, haplotypes, and genotypes
  • The Replaces element that provides a history and indicates accessions that were merged into the current accession.
  • A section that maps the submitted name or identifier for the interpreted condition to the corresponding name used in ClinVar and the MedGen Concept Identifier (CUI)

Figure 1. ClinVar variant-centric XML showing a variant record for a haplotype (VCV000236230) that comprises two included variations (SimpleAlleles) that are marked as “no interpretation for the single variant”. The record includes all the condition records (RCVList) with names and identifiers from MedGen, OMIM and other sources.
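
A variant-centric file like this can be read one variant at a time. The sketch below streams a tiny, invented XML sample and reports, for each record, whether it was directly interpreted or only included, and whether it is a simple allele, haplotype, or genotype. The element and attribute names `VariationArchive`, `VariationID`, `InterpretedRecord`, and `IncludedRecord` are assumptions made for this example and should be checked against the ClinVar documentation; `SimpleAllele`, `Haplotype`, and `Genotype` follow the description above.

```python
import io
import xml.etree.ElementTree as ET

# Invented, heavily simplified sample; real records carry far more detail.
SAMPLE_XML = b"""<ClinVarVariationRelease>
  <VariationArchive VariationID="1" Accession="VCV000000001">
    <InterpretedRecord><Haplotype/></InterpretedRecord>
  </VariationArchive>
  <VariationArchive VariationID="2" Accession="VCV000000002">
    <IncludedRecord><SimpleAllele/></IncludedRecord>
  </VariationArchive>
</ClinVarVariationRelease>"""

RECORD_TAGS = (("InterpretedRecord", "interpreted"), ("IncludedRecord", "included"))
VARIANT_TAGS = ("SimpleAllele", "Haplotype", "Genotype")


def summarize_variations(xml_stream):
    """Yield (variation_id, record_kind, variant_kind) per VariationArchive."""
    for _, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag != "VariationArchive":
            continue
        record_kind, variant_kind = None, None
        for tag, kind in RECORD_TAGS:
            record = elem.find(tag)
            if record is not None:
                record_kind = kind
                variant_kind = next(
                    (name for name in VARIANT_TAGS if record.find(name) is not None),
                    None,
                )
                break
        yield elem.get("VariationID"), record_kind, variant_kind
        elem.clear()  # release the parsed subtree; full release files are large


summary = list(summarize_variations(io.BytesIO(SAMPLE_XML)))
for row in summary:
    print(row)
```

Streaming with `iterparse` and clearing each element after use keeps memory use low, which matters for a file that covers every variant in the database.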

To learn more about how to use this data, read our documentation.

Tell us how ClinVar has helped you by writing to us at clinvar@ncbi.nlm.nih.gov.

50,000 new clinically relevant structural variation calls in dbVar

We’ve expanded the catalog of clinically relevant structural variants (SV) in dbVar by adding 57,520 ClinVar records.  You can access the newly added data through study nstd102.

The updated collection includes:

  • 20,000 new SVs, and more than 37,000 copy number variants (CNV) observed in ClinGen laboratories during routine cytogenomic laboratory testing that were previously accessioned separately at dbVar
  • 15,000 SVs asserted as ‘Pathogenic’ or ‘Likely pathogenic’ for thousands of clinical genetic disorders including breast, ovarian, and colon cancers; hypercholesterolemia; schizophrenia; Duchenne Muscular Dystrophy; autism spectrum disorders; and many others
  • links to more than 1,600 related PubMed articles and thousands of related data records in ClinVar, OMIM, GeneReviews, MedGen, MeSH, etc.

You can browse dbVar studies on the web or download the data. We provide dbVar data in a number of standard formats (VCF, GVF, and TSV) mapped to assemblies GRCh38, GRCh37, and NCBI36, allowing you to perform analyses using standard tools and integrate the data into your bioinformatic workflows.
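
As one example of working with the VCF format, the sketch below finds structural variant records that overlap a genomic region. It relies on the `SVTYPE` and `END` INFO keys defined in the VCF specification; whether dbVar files populate exactly these keys is an assumption here, and the records and IDs are invented.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'SVTYPE=DEL;END=500' into a dict."""
    info = {}
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        info[key] = value  # flag-style keys get an empty string
    return info


def overlapping_svs(vcf_lines, chrom, start, end):
    """Return (id, svtype, pos, end) for records overlapping chrom:start-end.

    Coordinates are 1-based and inclusive, as in VCF.
    """
    hits = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines, and the column header
        fields = line.rstrip("\n").split("\t")
        rec_chrom, pos, rec_id = fields[0], int(fields[1]), fields[2]
        info = parse_info(fields[7])
        rec_end = int(info.get("END", pos))  # fall back to POS if END is absent
        if rec_chrom == chrom and pos <= end and rec_end >= start:
            hits.append((rec_id, info.get("SVTYPE"), pos, rec_end))
    return hits


# Invented records with hypothetical IDs.
vcf_text = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\tsv1\tN\t<DEL>\t.\t.\tSVTYPE=DEL;END=5000",
    "1\t9000\tsv2\tN\t<DUP>\t.\t.\tSVTYPE=DUP;END=12000",
    "2\t1000\tsv3\tN\t<DEL>\t.\t.\tSVTYPE=DEL;END=5000",
]
hits = overlapping_svs(vcf_text, "1", 4000, 9500)
for hit in hits:
    print(hit)
```

For large files, a dedicated VCF library or an indexed query tool would likely be a better fit than line-by-line parsing.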

Visit our Walkthrough page to learn how to use these new dbVar data to help interpret structural variation in your favorite gene or genomic region.