Are you interested in high-quality genomic annotations for human and mouse? Check out the Consensus Coding Sequence (CCDS) project! Release 23 of the CCDS project is now available in Entrez Gene. This release compares NCBI’s Mus musculus annotation release 108 to Ensembl’s annotation release 98. This update adds 1,570 new CCDS records and 175 genes to the mouse CCDS dataset. In total, release 23 includes 27,219 CCDS records that correspond to 20,486 genes.
On Wednesday, November 13, 2019 at 12 PM, NCBI staff will present a webinar on NCBI resources for next-gen sequence analysis. You will learn about key resources that support multiple aspects of next-gen sequence analyses, including quality control, alignment, data visualization and interpreting results. You will also see how to access and apply these resources for both SRA and your own RNASeq/DNASeq datasets. Whether you’re embarking on your first analysis or already have a background in bioinformatics, you’ll find tools that meet your needs!
- Date and time: Wed, Nov 13, 2019 12:00 PM – 12:45 PM EST
After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
NCBI is pleased to announce a Biomedical Data Science Codeathon in collaboration with Carnegie Mellon in Pittsburgh, PA on January 8-10, 2020.
We’re specifically seeking people with experience working with complex diseases, precision medicine, and genomic analyses. If this describes you, please apply! This event is for researchers, including students and postdocs, who are already engaged in the use of bioinformatics data or in the development of pipelines for large scale genomic analyses from high-throughput experiments. The event is open to anyone selected for the codeathon and willing to travel to Pittsburgh.
Potential topics include:
- Virus Genome Graph tools
- Image analysis pipelines
- RNAseq pipelines
- Cancer graph genomes
- Complex Disease Analysis
GenBank release 234.0 (10/14/2019) is now available on the NCBI FTP site. This release has 6.69 trillion bases and 1.68 billion records.
The release has 216,763,706 traditional records containing 386,197,018,538 base pairs of sequence data. There are also 1,097,629,174 WGS records containing 5,985,250,251,028 base pairs of sequence data, 342,811,151 bulk-oriented TSA records containing 305,371,891,408 base pairs of sequence data, and 27,460,978 bulk-oriented TLS records containing 10,848,455,369 base pairs of sequence data.
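The headline totals can be cross-checked against the per-division figures quoted above. The following is a minimal Python sketch of that arithmetic; the dictionary keys are illustrative labels rather than official GenBank division names.

```python
# Per-division figures quoted above for GenBank release 234.0,
# stored as (records, base pairs). Keys are illustrative labels only.
divisions = {
    "traditional": (216_763_706, 386_197_018_538),
    "WGS": (1_097_629_174, 5_985_250_251_028),
    "TSA": (342_811_151, 305_371_891_408),
    "TLS": (27_460_978, 10_848_455_369),
}

# Sum each column across the four divisions.
total_records = sum(records for records, _ in divisions.values())
total_bases = sum(bases for _, bases in divisions.values())

# Express the sums in the units used by the headline figures.
print(f"{total_records:,} records = {total_records / 1e9:.2f} billion")
print(f"{total_bases:,} bases = {total_bases / 1e12:.2f} trillion")
```

The sums come to 1,684,665,009 records and 6,687,667,616,343 bases, which round to the 1.68 billion records and 6.69 trillion bases stated for the release.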
You now have access to bulk settings options for track hubs in the Genome Data Viewer (GDV) and Sequence Viewer. These settings allow you to pick the default tracks that load into the viewer from your chosen track hub. You can access the bulk options menu by clicking on the collapsed menu or “hamburger” icon (stack of horizontal bars) at the right end of the track grouping in the Configure Track Hubs dialog (Figure 1).
Figure 1. The Configure Track Hubs dialog in GDV. You can activate the bulk settings menu for a connected track hub by clicking on the hamburger icon at the right of the track grouping. Clicking Select Default tracks checks all of the tracks in that grouping, Smoothed PhyloCSF in this case.
As you may have heard, we are working on a new version of PubMed, and we’ve recently released some new features that you can check out.
A new user guide answers many common questions about how best to use the new site. We’ve also added links on the new PubMed homepage to many popular sites including the E-utilities, Advanced Search, and the MeSH database.
The action menu (Figure 1) now contains Collections and My Bibliography, allowing you to manage and share groups of citations. After running a search, you will also find a “Create alert” link under the search box that lets you set up automatic My NCBI email updates for your search.
Going forward, we will continue to develop new features leading up to the time when this new version of PubMed will replace the legacy PubMed. As this progresses, we would love to hear what you think about these new additions! Please use the “Feedback” button (available on every page of the new PubMed) to submit your comments, questions, or concerns.
The latest dbVar data release includes the Genome in a Bottle benchmark structural variant (SV) callset (pre-print Zook et al. 2019) – a highly scrutinized, carefully curated set of 12,745 sequence-resolved deletions, insertions, and delins variants from Personal Genome Project Ashkenazi trio son HG002. The data serve as a robust benchmark standard with which to measure the performance of sequencing and variant-calling pipelines. It “reliably identifies both false negatives and false positives in high-quality SV callsets” (pre-print Zook et al. 2019) that are based on short-, linked-, and long-read sequencing as well as optical mapping.
Here are the latest videos on our YouTube channel. Subscribe to get alerts for new videos.
Genome Workbench version 3 is a major upgrade, including the addition of the Genome Submission Wizard. This video guides you through the wizard, from uploading your genome data file to completion of the submitter report, which is ready to submit to GenBank using tools such as Submission Portal or BankIt. Note: An on-line tutorial is under “Manuals” on the Genome Workbench home page.
You can now download images in both PDF and Scalable Vector Graphics (SVG) formats from our Sequence Viewer and genome browsers such as the Genome Data Viewer! SVG files are ideal for editing in image editors and provide high-quality graphics for publications, posters, and presentations. Both the PDF and SVG files that you download contain vector graphics for high-fidelity images.
You can download image files by choosing the “Printer-Friendly PDF/SVG” option under the Tools menu from any Graphical Sequence Viewer application (Figure 1).
Figure 1. Printer friendly download options from the graphical view in the Genome Data Viewer. You can download either PDF or SVG formats, which are easily edited in standard graphics applications.
The new ClinVar
The new design for ClinVar pages is now our default view! Thank you for the feedback on the new design while it was under development. The redesigned pages have several new features described in a previous post. The current post highlights some of these improvements in the new ClinVar, including the separate variant and condition views, retrieval of specific versions of records, and support for ClinVar variant accessions and XML in the E-utilities.
Using the new ClinVar pages: variant (VCV) and condition (RCV) views
One important improvement in ClinVar is the separate variant-centric and condition-centric views, represented by VCV and RCV accessions respectively. The VCV record shows ClinVar data aggregated by a variant or set of variants (haplotype). The RCV record aggregates the data reported for a particular condition for that variant or set of variants. These two pages are especially useful in cases where there are different interpretations for a variant, as the examples below show.
BRCA2 variant: hereditary breast and ovarian cancer
Variants in the BRCA2 gene may cause hereditary breast and ovarian cancer. However, there are many different terms that represent “hereditary breast and ovarian cancer” or related conditions. If you look at an RCV record for only one term, such as “Breast ovarian cancer, familial, 2”, you may only see that the variant has been interpreted as Likely pathogenic. Using the VCV record, you can view all of the interpretations for this variant, so that you see that the variant has been interpreted as both Likely pathogenic for “Breast ovarian cancer, familial, 2” and Uncertain significance for “Hereditary breast and ovarian cancer syndrome” (Figure 1). Aggregating conditions on the VCV record makes it clear that the variant is likely pathogenic for some forms of hereditary breast cancer.
Figure 1. Aggregating by condition on the VCV record for NM_000059.3:c.67G>A makes clear that the variant is likely pathogenic for some forms of hereditary breast cancer even though the interpretation is uncertain for one breast cancer syndrome.
SCN5A variant: Brugada syndrome and Long QT syndrome 3
Variants in the SCN5A gene may cause two different arrhythmogenic disorders: Brugada syndrome and Long QT syndrome 3. For the coding region variant VCV000067672.1, you can see that there seem to be conflicting interpretations of pathogenicity (Figure 2). But when you look at the interpretations for each disorder using the Conditions tab, you’ll see that these apparently conflicting interpretations are for different disorders (conditions). The variant has been interpreted as Pathogenic for Long QT syndrome 3 (RCV000677695.1) but as Uncertain significance for Brugada syndrome (RCV000638649.1). The RCV records allow you to distinguish different interpretations for different disorders.
Figure 2. The conditions interpreted for the variant NM_000335.4:c.1604G>A. The variant has a different interpretation for each of the two arrhythmogenic disorders.
Likewise, starting from the point of view of a condition such as Brugada syndrome, you can quickly find out that the same variant has been interpreted in different ways for other conditions by linking to the variant report.
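The difference between the two views can be illustrated with a small Python sketch that groups the SCN5A interpretations quoted above, first by variant and then by condition-level record. This is only an illustration of the idea: the tuple layout and the `group_by` helper are hypothetical and do not reflect how ClinVar stores or aggregates its data.

```python
from collections import defaultdict

# Interpretations quoted above for the SCN5A variant. Each tuple is
# (VCV accession, RCV accession, condition, interpretation).
# The layout is a hypothetical simplification, not ClinVar's schema.
submissions = [
    ("VCV000067672", "RCV000677695", "Long QT syndrome 3", "Pathogenic"),
    ("VCV000067672", "RCV000638649", "Brugada syndrome", "Uncertain significance"),
]


def group_by(rows, key_index):
    """Group rows by the value found at position key_index."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key_index]].append(row)
    return dict(grouped)


# Variant-centric view: every interpretation for the variant in one place.
by_variant = group_by(submissions, 0)
variant_view = sorted({row[3] for row in by_variant["VCV000067672"]})

# Condition-centric view: one interpretation set per RCV record.
by_rcv = group_by(submissions, 1)
condition_view = {
    rcv: (rows[0][2], sorted({row[3] for row in rows}))
    for rcv, rows in by_rcv.items()
}

print("Variant view:", variant_view)
for rcv, (condition, interpretations) in condition_view.items():
    print(f"{rcv} ({condition}): {interpretations}")
```

Grouped by variant, the two interpretations appear side by side and look conflicting; grouped by RCV record, each condition carries a single interpretation.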
Retrieving a specific version of a ClinVar record (accession.version)
ClinVar records have versioned accessions (accession.version) that allow you to retrieve a specific version of a record. These work in a similar way to versioned records in other NCBI molecular resources. For example, you can retrieve the most recent version of a record by searching with the accession without the version, VCV000007105, or retrieve a previous version by searching with the full accession.version, VCV000007105.3. (Note: Version-specific searching for ClinVar records works only on the ClinVar resource. An All Databases search only retrieves the most recent version.)
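If you handle these identifiers in scripts, it can help to split an accession.version string into its two parts. The sketch below is one possible way to do that in Python; the `parse_accession` helper is hypothetical, and the pattern (a VCV or RCV prefix followed by nine digits) is an assumption inferred from the examples in this post, not a published specification.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, inferred from examples such as VCV000007105.3:
# a VCV or RCV prefix, nine digits, and an optional ".version" suffix.
ACCESSION_RE = re.compile(
    r"^(?P<accession>(?:VCV|RCV)\d{9})(?:\.(?P<version>\d+))?$"
)


class ClinVarAccession(NamedTuple):
    accession: str
    version: Optional[int]  # None means "no version given" (latest)


def parse_accession(text: str) -> ClinVarAccession:
    """Split 'VCV000007105.3' into ('VCV000007105', 3)."""
    match = ACCESSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognized ClinVar accession: {text!r}")
    version = match.group("version")
    return ClinVarAccession(
        match.group("accession"),
        int(version) if version is not None else None,
    )


print(parse_accession("VCV000007105"))
print(parse_accession("VCV000007105.3"))
```

A missing version is returned as `None`, mirroring the behavior described above, where an unversioned accession retrieves the most recent version.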
Changes to E-utilities (esearch, efetch, esummary)
The new web pages use ClinVar’s new variation-centric XML as the source of data and the new accession numbers, which begin with VCV. E-utilities for ClinVar also now support VCV accessions and return the new XML format. You can now use EFetch to retrieve a VCV record using a VCV accession number, an accession.version, or a variation ID.
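As a rough illustration, the three kinds of EFetch request could be built as follows in Python. The `rettype=vcv` and `is_variationid` parameters are not named in this post; they are written here as recalled from the ClinVar E-utilities documentation and should be checked against the current documentation before use. The `clinvar_vcv_url` helper is hypothetical, and the sketch assumes that variation ID 7105 corresponds to VCV000007105.

```python
from urllib.parse import urlencode

# Base address of the E-utilities EFetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def clinvar_vcv_url(identifier: str, is_variation_id: bool = False) -> str:
    """Build an EFetch URL for a ClinVar VCV record (hypothetical helper).

    identifier may be a VCV accession, an accession.version, or a
    variation ID (set is_variation_id=True in that case).
    """
    # Assumption: rettype=vcv requests the variation-centric XML.
    params = {"db": "clinvar", "rettype": "vcv", "id": identifier}
    query = urlencode(params)
    if is_variation_id:
        # Assumption: a bare is_variationid flag marks the id as a
        # variation ID rather than an accession.
        query += "&is_variationid"
    return f"{EFETCH_URL}?{query}"


print(clinvar_vcv_url("VCV000007105"))    # accession only
print(clinvar_vcv_url("VCV000007105.3"))  # accession.version
print(clinvar_vcv_url("7105", is_variation_id=True))  # variation ID
```

The sketch only constructs the URLs; fetching them with any HTTP client should return the XML described above, provided the parameter assumptions hold.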
We are continually working to improve the display and usability of the website. Please use the feedback button on each Variation page, send us your comments, and let us know how ClinVar has helped you at firstname.lastname@example.org.