Join NCBI at ASHG 2018, October 16-20


Putting your schedule together for ASHG? Don’t forget to look at all of NCBI’s activities, which include one GRC workshop, one booth (#315), two CoLabs, and 10 poster presentations. We created a handy schedule below, with links to posts where we’ve highlighted events.

Booth #315, Exhibit Hall:

  • Wednesday, October 17, 10:00 AM – 4:30 PM
  • Thursday, October 18, 10:00 AM – 4:30 PM
  • Friday, October 19, 10:00 AM – 4:30 PM

Visit us at the booth to provide feedback, get your questions answered, or just chat!


Matched Annotation from NCBI and EMBL-EBI (MANE): a new joint venture to define a set of representative transcripts for human protein-coding genes


The RefSeq project at the NCBI and the Ensembl/GENCODE project at EMBL-EBI have provided independent, high-quality human reference gene datasets to biologists since the sequencing of the human genome. Now we’re joining together on an exciting new project we’re calling Matched Annotation from the NCBI and EMBL-EBI, or MANE, to provide a matched set of well-supported transcripts for human protein-coding genes and to define one representative transcript for each gene. Both RefSeq and Ensembl will continue to provide a rich set of alternate transcripts per gene.

The MANE project builds on the successful CCDS collaboration (PMCID: PMC5753299) and responds to feedback from RefSeq and Ensembl/GENCODE users, who requested a common reference transcript dataset with one or a few key transcripts for each gene, where the RefSeq and Ensembl/GENCODE transcripts are identical in length and sequence and match the human reference genome sequence exactly. We expect to later expand the project to include a larger subset of full-length transcripts that more fully represent the functional complexity of many genes. We’re leveraging public deep sequencing datasets to optimize 5’ and 3’ UTR endpoints so that they more accurately reflect observed transcription start and end sites. To pick representative transcripts, we’ve developed computational methods that evaluate and integrate transcript expression levels, protein conservation, support from archived transcript submissions, clinical relevance, and other factors. Complex genes are reviewed by annotation experts from both groups, who agree on a representative transcript; this review often leads to improvements in both annotation sets.

The unified, high-quality transcript set provided by the MANE project will simplify the task of choosing a transcript for comparative genomics, clinical reporting, and basic research. When integrated across different public genome resources, this minimal, identically annotated transcript set will eliminate the need to choose between RefSeq and Ensembl/GENCODE datasets for genomic analyses. It will also make it easy for researchers who currently prefer one dataset over the other to exchange data or translate coordinates (or HGVS variation expressions) between RefSeq and Ensembl annotation results. Furthermore, because every MANE transcript aligns perfectly to GRCh38, the set will be compatible with NGS technologies and other resources that use the latest and highest-quality reference human genome assembly available.
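To illustrate why identical transcripts make translation easy: if a RefSeq transcript and its Ensembl partner have the same sequence and length, a transcript-level HGVS expression such as `c.100A>G` points to the same base on both, so only the accession needs to change. The sketch below assumes you already have a table of MANE RefSeq/Ensembl pairs; the accessions shown are hypothetical placeholders, not real MANE records, and the `swap_transcript` helper is illustrative rather than part of any NCBI or Ensembl tool.

```python
# Hypothetical MANE pair table: placeholder accessions, NOT real MANE records.
MANE_PAIRS = {
    "NM_000000.1": "ENST00000000000.1",
}
# Reverse lookup so translation works in both directions.
ENSEMBL_TO_REFSEQ = {enst: nm for nm, enst in MANE_PAIRS.items()}


def swap_transcript(hgvs_c: str) -> str:
    """Rewrite 'ACCESSION:c.CHANGE' to use the matched MANE accession.

    Because a MANE pair is identical in sequence and length, the c. position
    and change are reused unchanged; only the versioned accession is swapped.
    """
    accession, sep, change = hgvs_c.partition(":")
    if not sep:
        raise ValueError(f"expected 'accession:change', got {hgvs_c!r}")
    if accession in MANE_PAIRS:
        return f"{MANE_PAIRS[accession]}:{change}"
    if accession in ENSEMBL_TO_REFSEQ:
        return f"{ENSEMBL_TO_REFSEQ[accession]}:{change}"
    # Non-MANE transcripts may differ in sequence, so no safe swap exists.
    raise KeyError(f"{accession} is not in the MANE pair table")


print(swap_transcript("NM_000000.1:c.100A>G"))  # ENST00000000000.1:c.100A>G
```

The lookup deliberately refuses accessions outside the pair table, since a non-matched transcript can have a different length or CDS start and its c. coordinates would not carry over.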

Our goal is for the final MANE dataset to be stable, although individual sequences and the dataset as a whole will be versioned to allow future updates and expansions as needed to incorporate significant new data and additional curation. We plan to release a partial “beta” transcript set for testing by the end of the year, and a large sequence update in the next few months to refine 5’ and 3’ RefSeq transcript ends and match the GRCh38 sequence. Ensembl plans to release similar updates in spring 2019.

We’re looking forward to your feedback! Next week, we will be presenting the project at the annual American Society of Human Genetics (ASHG) meeting in San Diego, CA, USA. Please attend our talks in the Genome Reference Consortium (GRC) workshop on Tuesday, October 16, at 1:00 PM, and in the Importance of Isoform Expression in Variant Interpretation session (#94) on Saturday, October 20, at 9:15 AM. You can also visit us at the NCBI or Ensembl booths and posters throughout the meeting, or send us feedback at info@ncbi.nlm.nih.gov. We’re looking forward to your valuable input on our new initiative!

See improvements in NCBI’s genome visualization and analysis tools at ASHG



In 2016, NCBI introduced the Genome Data Viewer (GDV). This past May, GDV replaced the aging Map Viewer. Over the past year, NCBI has kept you updated about GDV through announcements, webinars, and blogs. Now you can get an overview of all the changes to GDV in person at ASHG!

Check out Poster 1670F, “What’s new with NCBI tools for genome visualization and analysis,” on Friday, Oct. 19, from 3 PM to 4 PM (Exhibit Hall, Ground Level).


NCBI at ASHG 2018: Data and Clinical CoLabs introduce interactive graphical displays and medical genetics resources


As you know, NCBI will be attending American Society of Human Genetics (ASHG) 2018 in San Diego.

This year, we have two CoLabs – interactive sessions where you can learn about freely available NCBI tools and resources. Read on below for a description of each CoLab and join us at ASHG in two weeks!


October 10 Webinar: Using NCBI Medical Genetics Resources: MedGen, ClinVar, GTR


Next Wednesday, October 10, 2018, NCBI staff will show you how to use the NCBI resources MedGen, ClinVar, and GTR to locate records for a specified list of symptoms or clinical features, explore specific disease-causing variants, see the review status of the clinical significance for a genetic variant, and find tests relevant to a clinical feature, gene, or disease. You will also learn which resource works best for different types of searches.

Date and time: Wed, Oct 10, 2018 12:00 PM – 12:45 PM EDT

Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

See how dbSNP improves data quality at ASHG 2018


NCBI staff will share knowledge on various topics at the American Society of Human Genetics (ASHG) conference this month in San Diego. Here, on NCBI Insights, we feature some preliminary details for one of NCBI’s dbSNP posters.

You can visit poster 1692W “Improving dbSNP Data Quality and Annotation for Variant Interpretation” on Wednesday, Oct. 17 from 3 PM to 4 PM at ASHG.


NCBI at ASHG 2018: “Storage and use of dbGaP data in the cloud”


With the American Society of Human Genetics (ASHG) conference around the corner, NCBI staff are preparing for their presentations in San Diego. Here is some background on dbGaP’s poster about its efforts to improve data storage and accessibility.

Visit Poster 1435T, “Storage and use of dbGaP data in the cloud,” on Thursday, October 18, from 2 PM to 3 PM (Exhibit Hall, Ground Floor).


October 3, 2018 Webinar: Using BLAST Well


Next Wednesday, October 3, 2018, the lead of the NCBI BLAST group will show you how to be more effective with NCBI’s standalone BLAST applications. You will learn how to optimize database selection, output formats, and taxonomy information, and how to use our next-generation alignment program, Magic-BLAST. You can also use many of these strategies to improve your web BLAST searches.

Date and time: Wed, Oct 3, 2018 12:00 PM – 12:45 PM EDT

Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
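As a small taste of the output-format and taxonomy topics, here is a sketch of requesting custom tabular output from standalone `blastn` and reading it back in Python. It assumes BLAST+ is installed; the query file and database names are placeholders, and the `staxids` and `sscinames` columns are only filled in when the BLAST taxonomy database (taxdb) is available. The command is only built here, not run, and the helper names are illustrative.

```python
# Custom tabular output: "6" = tab-separated, no header, followed by the
# column specifiers we want, in order.
FIELDS = ["qseqid", "sseqid", "pident", "evalue", "bitscore", "staxids", "sscinames"]
OUTFMT = "6 " + " ".join(FIELDS)
NUMERIC = {"pident", "evalue", "bitscore"}


def blastn_command(query_fasta: str, db: str) -> list[str]:
    """Build an argument list suitable for subprocess.run (not executed here)."""
    # The whole format string is passed as a single argument to -outfmt.
    return ["blastn", "-query", query_fasta, "-db", db, "-outfmt", OUTFMT]


def parse_tabular(text: str, fields: list[str] = FIELDS) -> list[dict]:
    """Parse -outfmt 6 text into one dict per hit, converting numeric columns."""
    hits = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on tabs only, so names such as "Homo sapiens" stay intact.
        hit = dict(zip(fields, line.split("\t")))
        for key in NUMERIC & hit.keys():
            hit[key] = float(hit[key])
        hits.append(hit)
    return hits


sample = "query1\tsubject1\t99.5\t1e-50\t200\t9606\tHomo sapiens\n"
print(parse_tabular(sample)[0]["sscinames"])  # Homo sapiens
```

Asking for exactly the columns you need, rather than the default 12, keeps downstream parsing simple and lets you add taxonomy columns without post-processing the subject IDs.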

dbVar now provides easy-to-use human non-redundant SV reference datasets to aid the interpretation of structural variants


dbVar non-redundant SV (NR SV) datasets include more than 2.2 million deletions, 1.1 million insertions, and 300,000 duplications. These data are aggregated from over 150 studies including 1000 Genomes Phase 3, Simons Genome Diversity Project, ClinGen, ExAC, and others. You can use NR SV data files to filter and annotate variants in a broad range of applications:

  1. Clinicians can easily filter patients’ genome data to find SVs that overlap with variants previously reported as clinically significant.
  2. Researchers can compare the results of their own genome-wide SV surveys with dbVar NR data to identify variants that are novel, rare, or potentially pathogenic, and in some cases to obtain allele frequencies for matching variants. Users can also annotate SV data with NR SV and other genomic annotations to prioritize the variants most likely to impact biological function.
  3. Developers of variant analysis pipelines can use dbVar NR data to help identify novel variants, calibrate their algorithms, or simply integrate the data into downstream analysis tools and workflows.
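The filtering and annotation steps above boil down to comparing your SV calls against NR SV intervals. The sketch below assumes the NR SV records have already been loaded into simple `(chrom, start, end, type, id)` tuples with 1-based inclusive coordinates; it does not parse the actual dbVar file formats, and the 50% reciprocal-overlap threshold is a common convention we chose for illustration, not a dbVar recommendation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int    # assumed 1-based, inclusive
    end: int      # assumed 1-based, inclusive
    sv_type: str  # e.g. "deletion", "duplication"
    sv_id: str = ""


def overlap_len(a: SV, b: SV) -> int:
    """Number of bases shared by two intervals (0 if disjoint)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Smaller of the two overlap fractions, so both intervals must agree."""
    shared = overlap_len(a, b)
    if shared == 0:
        return 0.0
    return min(shared / (a.end - a.start + 1), shared / (b.end - b.start + 1))


def annotate(queries: list[SV], reference: list[SV], min_ro: float = 0.5) -> dict:
    """Map each query ID to reference IDs of the same type with enough overlap."""
    return {
        q.sv_id: [
            r.sv_id
            for r in reference
            if r.sv_type == q.sv_type and reciprocal_overlap(q, r) >= min_ro
        ]
        for q in queries
    }


reference = [SV("1", 1000, 2000, "deletion", "nr1"), SV("1", 5000, 5100, "duplication", "nr2")]
calls = [SV("1", 1100, 2100, "deletion", "p1"), SV("2", 1000, 2000, "deletion", "p2")]
print(annotate(calls, reference))  # {'p1': ['nr1'], 'p2': []}
```

A query with no match, like `p2` here, is a candidate novel or rare variant. This linear scan is fine for a handful of calls, but for millions of NR SV records an interval index (for example, sorting by chromosome and start) would be the practical choice, and insertions, which span few or no reference bases, usually need a breakpoint-distance rule instead of reciprocal overlap.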

dbVar’s NR SV reference data are updated monthly. These updates include new database submissions. We welcome your feedback on the content and usability of these files so that we can improve them.

For more information, please see our GitHub site, which includes brief tutorials and access to NR SV datasets by FTP.