Next Wednesday, November 14, 2018, NCBI staff will show you how to use NCBI’s genome browsers and other resources to interpret variants. The graphical displays of Genome Data Viewer (GDV) and Variation Viewer let you interactively explore NCBI’s rich collection of annotations, datasets and literature for deciphering your variant-associated data. In this presentation, we’ll step through case studies and show you how to quickly display relevant NCBI track sets (including the new RefSeq Functional Elements track), upload a file or a remotely hosted dataset and display it as a track, and use browser tracks to identify known variants and then assess their functional and clinical significance and allele frequency. You will also learn how to navigate from the browsers to NCBI resources such as ClinVar, dbSNP and PubMed for additional variant information.
Date and time: Wed, Nov 14, 2018 12:00 PM – 12:45 PM EST
After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
NCBI staff will share knowledge on various topics at the American Society of Human Genetics (ASHG) conference this month in San Diego. Here, on NCBI Insights, we feature some preliminary details for one of NCBI’s dbSNP posters.
You can visit poster 1692W “Improving dbSNP Data Quality and Annotation for Variant Interpretation” on Wednesday, Oct. 17 from 3 PM to 4 PM at ASHG.
As of July 2018, a new set of standalone variation services replaces the variant matching functions of Variation Reporter. Variation Reporter was a tool designed to search human sequence variation data by location and to report matching variants found in dbSNP, dbVar, and ClinVar.
The new services are faster, handle variants in repeat regions better, and scale to accommodate the continued explosive growth in variation data. You can find more information about the services in the initial blog post and the online SPDI document.
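The SPDI document describes the variant notation used by the new services. As a rough sketch, assuming the commonly documented layout of four colon-separated fields (sequence:position:deletion:insertion) with a 0-based position and a deletion given either as bases or as a length, an SPDI string could be split apart in Python like this. The class and function names are hypothetical and not part of any NCBI library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spdi:
    sequence: str   # reference sequence accession, e.g. an NC_ chromosome record
    position: int   # assumed 0-based (interbase) start of the deleted region
    deletion: str   # deleted bases, or their count written as digits
    insertion: str  # inserted bases; may be empty for a pure deletion


def parse_spdi(text: str) -> Spdi:
    """Split an SPDI string into its four fields (hypothetical helper)."""
    parts = text.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected 4 colon-separated fields, got {len(parts)}: {text!r}")
    sequence, position, deletion, insertion = parts
    if not position.isdigit():
        raise ValueError(f"position must be a non-negative integer: {position!r}")
    return Spdi(sequence, int(position), deletion, insertion)


def deleted_length(spdi: Spdi) -> int:
    """Length of the deleted region, whether written as bases or as a count."""
    return int(spdi.deletion) if spdi.deletion.isdigit() else len(spdi.deletion)
```

For instance, parse_spdi("NC_000001.11:999:A:G") describes an A-to-G substitution at 0-based position 999; the position is made up for illustration.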
To report issues with these new services or to send comments, please write to email@example.com.
For general questions about the NCBI site, contact us at firstname.lastname@example.org.
We appreciate your continued support and interaction with the NCBI tools.
In a little over a year, dbSNP human data have doubled in size, from 150 million Reference SNP (rs) records to 325 million in Build 150, and doubled again to more than 650 million rs records in Build 151. In Build 151, 580 million of these rs records have frequency data. This explosive growth makes dbSNP the world’s largest public human variation database. Current trends suggest that large-scale whole-genome sequencing (WGS) and whole-exome sequencing (WES) projects will discover millions of new variations in the next few years.
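The growth figures above can be checked with a little arithmetic. This sketch only plugs in the rounded counts from the text (in millions); because Build 151 is described as having "more than" 650 million records, the computed share with frequency data is an upper bound.

```python
# Rounded dbSNP human rs record counts from the text, in millions.
rs_before_build_150 = 150
rs_build_150 = 325
rs_build_151 = 650            # stated as "more than 650 million"
rs_151_with_frequency = 580

growth_to_150 = rs_build_150 / rs_before_build_150
growth_to_151 = rs_build_151 / rs_build_150
# Dividing by the lower-bound total gives an upper bound on the share.
frequency_share_151 = rs_151_with_frequency / rs_build_151

print(f"Growth into Build 150: {growth_to_150:.2f}x")
print(f"Growth into Build 151: {growth_to_151:.2f}x")
print(f"Build 151 rs records with frequency data: at most {frequency_share_151:.0%}")
```

Running it prints growth factors of 2.17x and 2.00x and a frequency-data share of at most 89%.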
Build 151 was released in March 2018. The data are available for web search and FTP download.
NCBI’s dbSNP houses variation and frequency data from large-scale projects, including 1000 Genomes, GO-ESP, ExAC, gnomAD, TOPMed and HLI, as well as from focused studies such as locus-specific databases (LSDBs) and clinical sources. The rs records are annotated on RefSeq genome, mRNA and protein sequences and integrated with other NCBI resources (e.g., Assembly, Gene, RefSeq, PubMed, and BioProject). The database is used worldwide in personal genomics and medical genetics, and for managing, annotating, and analyzing variation data.
dbSNP is moving to a new design, with several new products ready for testing: JSON data files, the RefSNP page, and an API.
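As an illustration of the API mentioned above, the sketch below builds a request URL for a single RefSNP record and fetches it as JSON using only Python’s standard library. The endpoint path is an assumption based on NCBI’s Variation Services and may differ from the one offered during the 2018 testing period, so check the current documentation before relying on it; the helper names are hypothetical.

```python
import json
import urllib.request

# ASSUMED Variation Services endpoint for RefSNP records; confirm the path
# against NCBI's current API documentation before use.
REFSNP_API = "https://api.ncbi.nlm.nih.gov/variation/v0/refsnp/{rs_number}"


def refsnp_url(rs_id: str) -> str:
    """Build the assumed API URL from an identifier like 'rs268' or '268'."""
    rs_number = rs_id[2:] if rs_id.lower().startswith("rs") else rs_id
    if not rs_number.isdigit():
        raise ValueError(f"not an rs identifier: {rs_id!r}")
    return REFSNP_API.format(rs_number=rs_number)


def fetch_refsnp(rs_id: str) -> dict:
    """Download one RefSNP record and decode its JSON body (needs network access)."""
    with urllib.request.urlopen(refsnp_url(rs_id), timeout=30) as response:
        return json.load(response)
```

For example, fetch_refsnp("rs268") would request the record for rs268, provided the assumed endpoint is reachable.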
New JSON data files
The human Build 151 release is the last build that will provide relational database table dumps on the FTP site. In future build releases, dbSNP data will instead be available as a cumulative file of RefSNP objects in JSON format. These JSON files are available now so that users can begin migration and testing. Tutorials for parsing JSON are on GitHub.
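A minimal sketch of reading these files in Python, assuming each bzip2-compressed file holds one RefSNP JSON object per line and using field names (refsnp_id, primary_snapshot_data, placements_with_allele, is_ptlp, alleles, spdi) as they appear in dbSNP’s GitHub tutorials; verify them against the current schema. The function names are hypothetical.

```python
import bz2
import json
from typing import Iterator, Tuple


def iter_refsnps(path: str) -> Iterator[dict]:
    """Yield one RefSNP object per non-empty line of a .json.bz2 file."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def ptlp_variants(refsnp: dict) -> Iterator[Tuple[str, str, int, str, str]]:
    """Yield (rs ID, seq_id, position, deleted, inserted) for alleles on the
    primary top-level placement, using the ASSUMED field names above."""
    rs_id = "rs" + str(refsnp["refsnp_id"])
    snapshot = refsnp.get("primary_snapshot_data", {})
    for placement in snapshot.get("placements_with_allele", []):
        if not placement.get("is_ptlp"):
            continue  # skip placements other than the primary top-level one
        for entry in placement.get("alleles", []):
            spdi = entry["allele"]["spdi"]
            if spdi["deleted_sequence"] == spdi["inserted_sequence"]:
                continue  # reference allele: nothing changes
            yield (rs_id, spdi["seq_id"], spdi["position"],
                   spdi["deleted_sequence"], spdi["inserted_sequence"])
```

Streaming line by line keeps memory use flat, which matters for cumulative files of this size; a loop such as `for rs in iter_refsnps(path): ...` over a downloaded file would process one record at a time.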
A study (PMID: 28158543) published in the July 2017 issue of Bioinformatics collects, classifies, and analyzes single nucleotide variants (SNVs) that may affect response to currently approved drugs. The authors identified 2,640 SNVs of interest, most of which are rare in populations (minor allele frequency < 0.01).
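The study’s rarity cutoff is a minor allele frequency (MAF) below 0.01. For a biallelic site, MAF is the count of the less common allele divided by the total allele count. The sketch below applies that definition and the 0.01 threshold to hypothetical allele counts; it deliberately rejects multiallelic sites, where conventions differ, and the function names are illustrative only.

```python
from typing import Mapping

RARE_MAF_THRESHOLD = 0.01  # cutoff quoted for the study above


def minor_allele_frequency(allele_counts: Mapping[str, int]) -> float:
    """MAF for a biallelic site: frequency of the less common allele."""
    if len(allele_counts) != 2:
        raise ValueError("this sketch only handles biallelic sites")
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no observed alleles")
    return min(allele_counts.values()) / total


def is_rare(allele_counts: Mapping[str, int]) -> bool:
    """True when the site falls below the rarity threshold."""
    return minor_allele_frequency(allele_counts) < RARE_MAF_THRESHOLD
```

With hypothetical counts of 9,950 C and 50 T alleles, the MAF is 50 / 10,000 = 0.005, so the site counts as rare.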
The researchers used protein sequence alignment tools and mined open data from multiple resources accessed through E-utilities, including PubChem Compound (Kim et al., 2016 PMID: 26400175), NCBI Gene (Maglott et al., 2014 PMID: 25355515), NCBI Protein (Sayers, 2013), MMDB (Madej et al., 2012 PMID: 22135289), PDB (Berman et al., 2000 PMID: 10592235), dbSNP (Sherry et al., 2001 PMID: 11125122), and ClinVar (Landrum et al., 2016 PMID: 26582918).
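E-utilities are plain HTTPS endpoints, so a search against one of the databases named above can be sketched with Python’s standard library. The helper names are hypothetical; esearch.fcgi and its db, term, retmode, and retmax parameters belong to the documented E-utilities interface. NCBI asks clients to throttle requests (about three per second without an API key).

```python
import json
import urllib.parse
import urllib.request
from typing import List

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(utility: str, **params: str) -> str:
    """Build a URL for an E-utility such as 'esearch.fcgi' from query parameters."""
    return EUTILS_BASE + utility + "?" + urllib.parse.urlencode(params)


def esearch_ids(db: str, term: str, retmax: int = 20) -> List[str]:
    """Run ESearch and return the matching UIDs from the JSON response."""
    url = eutils_url("esearch.fcgi", db=db, term=term,
                     retmode="json", retmax=str(retmax))
    with urllib.request.urlopen(url, timeout=30) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]
```

For example, esearch_ids("pubmed", "28158543[pmid]") would look up the study described above (network access required).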
Questions, comments, and other feedback may be sent to Yanli Wang.
RefSeq release 85 is now accessible online, via FTP, and through NCBI’s programming utilities. This full release incorporates genomic, transcript, and protein data available as of November 6, 2017, and contains 146,710,309 records, including 100,043,962 proteins, 20,905,608 RNAs, and sequences from 73,996 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings. See the RefSeq release notes for more information.
Starting in March 2018, SNP variation features will no longer be included in RefSeq genome assembly records (chromosome and contig records with NC_, NT_, NW_ and AC_ accession prefixes). This change affects both ASN.1 and flatfile records. Because the number of variants is already enormous and still growing, removing SNP features from these large genomic records will significantly reduce the size of RefSeq FTP files and make them easier to download and process. We will continue to include SNPs on NG_-prefixed genomic records and on transcript (NM_, NR_, XM_, XR_) and protein (NP_, XP_, YP_) sequences.
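The rule above boils down to the accession prefix. A small, hypothetical helper could encode it using only the prefixes listed in this announcement, returning None for any prefix the announcement does not cover:

```python
from typing import Optional

# Genome assembly records that no longer carry SNP features from March 2018.
ASSEMBLY_PREFIXES = ("NC_", "NT_", "NW_", "AC_")
# Records that continue to carry SNP features.
SNP_BEARING_PREFIXES = (
    "NG_",                       # genomic records
    "NM_", "NR_", "XM_", "XR_",  # transcripts
    "NP_", "XP_", "YP_",         # proteins
)


def carries_snp_features(accession: str) -> Optional[bool]:
    """True/False per the announcement, or None if the prefix is not covered."""
    prefix = accession[:3].upper()
    if prefix in ASSEMBLY_PREFIXES:
        return False
    if prefix in SNP_BEARING_PREFIXES:
        return True
    return None
```

The example accessions in any quick check matter only for their prefixes; the helper ignores the separate change for non-human variant data.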
Reminder: As of September 2017, NCBI has stopped accepting submissions of non-human variation to dbSNP and dbVar. RefSeq flatfiles will stop presenting non-human variant data in November 2017.
Subscribe to the refseq-announce listserv for regular updates on RefSeq.
RefSeq release 84 is now accessible online, via FTP and through NCBI’s programming utilities.
This full release incorporates genomic, transcript, and protein data available as of September 11, 2017, and contains 140,627,690 records, including 95,563,598 proteins, 20,356,598 RNAs, and sequences from 72,965 organisms.
The release is provided in several directories, both as a complete dataset and divided into logical groupings. See the RefSeq release notes for more information.
Phasing out support for non-human organisms
As of September 1, 2017, the dbSNP and dbVar databases have stopped accepting submissions for non-human organisms. Submissions of non-human variation are now accepted by the European Variation Archive (EVA) at EMBL-EBI, one of our partners in the International Nucleotide Sequence Database Collaboration (INSDC).
NCBI dbSNP is pleased to announce a newly designed Reference SNP (RefSNP, rs) Report webpage that offers improved performance and presentation for individual RefSNP records. This Alpha version of the report lets you browse submitted and computed RefSNP variant data from the redesigned dbSNP build system.
Figure 1. The dbSNP RefSNP Report Alpha for rs268.