NCBI on YouTube: new videos on PubMed, My Bibliography, sequence data and more

Here are the latest videos on our YouTube channel. Subscribe to get alerts for new videos.

Introducing the Genome Submission Wizard in Genome Workbench v3.0

Genome Workbench version 3 is a major upgrade, including the addition of the Genome Submission Wizard. This video guides you through the wizard, from uploading your genome data file to completion of the submitter report, which is ready to submit to GenBank using tools such as Submission Portal or BankIt. Note: An on-line tutorial is under “Manuals” on the Genome Workbench home page.


Vector graphics downloads now available in NCBI genome browsers and sequence views

You can now download images in both PDF and Scalable Vector Graphics (SVG) formats from our Sequence Viewer and genome browsers such as the Genome Data Viewer! SVG files are ideal for editing in image editors and provide high-quality graphics for publications, posters, and presentations. Both the PDF and SVG files that you download contain vector graphics for high-fidelity images.

You can download image files by choosing the “Printer-Friendly PDF/SVG” option under the Tools menu from any Graphical Sequence Viewer application (Figure 1).

Figure 1. Printer-friendly download options from the graphical view in the Genome Data Viewer. You can download either PDF or SVG formats, which are easily edited in standard graphics applications.


Visit the new ClinVar for easier variant interpretation!


The new design for ClinVar pages is now our default view! Thank you for the feedback on the new design while it was under development. The redesigned pages have several new features described in a previous post. This post highlights some of these improvements in the new ClinVar, including the separate variant and condition views, retrieval of specific versions of records, and support for ClinVar variant accessions and XML in the E-utilities.

Using the new ClinVar pages: variant (VCV) and condition (RCV) views

One important improvement in ClinVar is the separation of the variant-centric and condition-centric views, represented by VCV and RCV accessions, respectively. The VCV record shows ClinVar data aggregated by a variant or set of variants (haplotype). The RCV record aggregates the interpretations of a particular variant or set of variants for a single condition. These two pages are especially useful in cases where there are different interpretations for a variant, as the examples below show.

BRCA2 variant: hereditary breast and ovarian cancer

Variants in the BRCA2 gene may cause hereditary breast and ovarian cancer. However, there are many different terms that represent “hereditary breast and ovarian cancer” or related conditions. If you look at an RCV record for only one term, such as “Breast ovarian cancer, familial, 2”, you may only see that the variant has been interpreted as Likely pathogenic. Using the VCV record, you can view all of the interpretations for this variant, so you can see that the variant has been interpreted as both Likely pathogenic for “Breast ovarian cancer, familial, 2” and Uncertain significance for “Hereditary breast and ovarian cancer syndrome” (Figure 1). Aggregating conditions on the VCV record makes it clear that the variant is likely pathogenic for some forms of hereditary breast cancer.

Figure 1. Aggregating by condition on the VCV record for NM_000059.3:c.67G>A makes clear that the variant is likely pathogenic for some forms of hereditary breast cancer even though the interpretation is uncertain for one breast cancer syndrome.

SCN5A variant: Brugada syndrome and Long QT syndrome 3

Variants in the SCN5A gene may cause two different arrhythmogenic disorders: Brugada syndrome and Long QT syndrome 3. For the coding region variant VCV000067672.1, you can see that there seem to be conflicting interpretations of pathogenicity (Figure 2). But when you look at the interpretations for each disorder using the Conditions tab, you’ll see that these apparently conflicting interpretations are for different disorders (conditions). The variant has been interpreted as Pathogenic for Long QT syndrome 3 (RCV000677695.1) but as Uncertain significance for Brugada syndrome (RCV000638649.1). The RCV records allow you to distinguish different interpretations for different disorders.

Figure 2. The conditions interpreted for the variant NM_000335.4:c.1604G>A. The variant has a different interpretation for the two arrhythmogenic disorders.

Likewise, starting from the point of view of a condition such as Brugada syndrome, you can quickly find out that the same variant has been interpreted in different ways for other conditions by linking to the variant report.

Retrieving a specific version of a ClinVar record (accession.version)

ClinVar records have versioned accessions (accession.version) that allow you to retrieve a specific version of a record. These work in a similar way to versioned records in other NCBI molecular resources. For example, you can retrieve the most recent version of a record by searching with the accession without the version, VCV000007105, or retrieve a previous version by searching with the full accession.version, VCV000007105.3. (Note: Version-specific searching for ClinVar records works only on the ClinVar resource. An All Databases search only retrieves the most recent version.)

Changes to E-utilities (esearch, efetch, esummary)

The new web pages use ClinVar’s new variation-centric XML as the source of data and new accession numbers, beginning with VCV. E-utilities for ClinVar also now support VCV accessions and return the new XML format. You can now use EFetch to retrieve a VCV record using a VCV accession number, an accession.version, or a variation ID.

VCV accession:


VCV accession.version:


Variation ID:
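The three request forms listed above can be sketched as EFetch URLs. This is a minimal illustration, not the post's original examples: the `rettype=vcv` value and the bare `is_variationid` flag reflect one reading of ClinVar's E-utilities documentation and should be checked against it before use. The helper name `clinvar_vcv_url` is hypothetical, and the variation ID 7105 is used purely for illustration.

```python
from urllib.parse import urlencode

# Base EFetch endpoint of the NCBI E-utilities.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def clinvar_vcv_url(identifier, is_variation_id=False):
    """Build an EFetch URL for a ClinVar VCV record (hypothetical helper).

    identifier: a VCV accession, an accession.version, or a variation ID.
    is_variation_id: set True when `identifier` is a numeric variation ID.
    """
    # Assumption: rettype=vcv requests the variation-centric XML.
    params = {"db": "clinvar", "rettype": "vcv", "id": str(identifier)}
    url = EFETCH + "?" + urlencode(params)
    if is_variation_id:
        # Assumption: a bare flag tells EFetch the id is a variation ID.
        url += "&is_variationid"
    return url


if __name__ == "__main__":
    print(clinvar_vcv_url("VCV000007105"))        # accession
    print(clinvar_vcv_url("VCV000007105.3"))      # accession.version
    print(clinvar_vcv_url(7105, is_variation_id=True))  # variation ID
```

The function only builds the URL, so you can inspect it before sending a request with the HTTP client of your choice.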


We are continually working to improve the display and usability of the website. Please use the feedback button on each Variation page, send us your comments, and let us know how ClinVar has helped you at

New publication on AMRFinder, a tool that identifies resistance genes in pathogen genomes!

Read the recent publication (PMID: 31427293) on the AMRFinder, a tool that identifies antimicrobial resistance (AMR) genes in bacterial genome sequences using a high-quality curated AMR gene reference database.  We use the AMRFinder to identify AMR genes in the hundreds of bacterial genomes that NCBI receives every day, and the results of AMRFinder are used in NCBI’s Isolates Browser to provide accurate assessments of AMR gene content. You can install AMRFinder locally and run it yourself. Follow the instructions on our GitHub site.

Since the publication we have upgraded AMRFinder to AMRFinderPlus. The enhanced tool now

  • supports searches based on protein annotations, nucleotide sequences, or both for best results
  • identifies point mutations in Campylobacter, E. coli, Shigella, and Salmonella
  • optionally identifies many genes involved in biocide, heat, metal, and stress resistance, as well as many antigenicity and virulence genes
  • provides information about gene function, including resistance to individual antibiotics and other phenotypes
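As a rough sketch of how the options above map onto a local run, the following builds an `amrfinder` command line without executing it. The helper name `amrfinder_command` and the file names are hypothetical. The flags (`--protein`, `--nucleotide`, `--gff`, `--organism`, `--plus`) are taken from the AMRFinderPlus command-line help as recalled here, so confirm them with `amrfinder --help` for your installed version.

```python
def amrfinder_command(protein=None, nucleotide=None, gff=None,
                      organism=None, plus=False):
    """Return an argument list for a local AMRFinderPlus run (hypothetical helper)."""
    if protein is None and nucleotide is None:
        raise ValueError("provide a protein FASTA, a nucleotide FASTA, or both")

    cmd = ["amrfinder"]
    if protein:
        cmd += ["--protein", protein]        # annotated protein sequences
    if nucleotide:
        cmd += ["--nucleotide", nucleotide]  # assembled nucleotide sequences
    if gff:
        # Assumption: a GFF file links proteins to coordinates and is
        # expected when protein and nucleotide inputs are combined.
        cmd += ["--gff", gff]
    if organism:
        cmd += ["--organism", organism]      # enables point-mutation screening
    if plus:
        cmd.append("--plus")                 # stress, virulence, and similar genes
    return cmd


if __name__ == "__main__":
    # File names are placeholders for your own data.
    print(" ".join(amrfinder_command(protein="proteins.faa",
                                     nucleotide="assembly.fna",
                                     gff="annotation.gff",
                                     organism="Salmonella",
                                     plus=True)))
```

The returned list can be passed to `subprocess.run` once AMRFinderPlus is installed as described on the GitHub site.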

You can learn more about NCBI’s role in helping to combat antimicrobial resistance at the National Database of Antibiotic Resistant Organisms.

NCBI at ASHG 2019: Two Data CoLabs Demonstrate How to Analyze NextGen Sequence Data and Access Genetic Variation Population Data

NCBI will be attending the American Society of Human Genetics (ASHG) 2019 meeting in Houston, Texas, on October 15-19.

This year, we will be presenting two CoLabs – interactive sessions where you can learn about new NCBI tools and resources. Read on below for a description of each CoLab and join us at ASHG next week!


dbSNP celebrates 20 years!

dbSNP was established in August 1999 as a collaboration between NCBI and the National Human Genome Research Institute (NHGRI) as a database of small-scale nucleotide variants. The database includes both common and rare single-base nucleotide variation (SNV), short (≤ 50 bp) deletion/insertion polymorphisms, and other classes of small genetic variations.


New search helps you find prokaryotic proteins

The latest improvement in the NCBI search experience is designed to help you quickly find microbial proteins. Now when you search for a prokaryotic protein name such as recombinase RecA in NCBI’s sequence databases or in the All databases search, a high-quality representative protein sequence is highlighted in a panel at the top of the results page (Figure 1).

The result panel also allows you to quickly link to related resources such as NCBI’s new pages for protein family models, Identical Protein Groups, and SPARCLE, NCBI’s protein domain architecture resource. We also provide as-you-type suggestions so you don’t have to type out some of the long names.


Figure 1.  The result for a search with recombinase RecA. The panel provides access to analysis tools, downloads, and relevant links to the protein family, the RefSeq protein, the identical protein group, and citations in PubMed.

Try these protein name searches, or your own, and use the as-you-type suggestions to assist your searches.

Please let us know how you like these results!

Protein BLASTDBs are accession-based

The version 5 BLAST (dbV5) protein databases are now accession-based. You can access these databases and the nucleotide BLASTDBs on our FTP site.

As we described in a previous post, this means they now contain the GI-less proteins from the  NCBI Pathogen Project and other high-throughput projects. The v5 databases are also compatible with proteins from PDB structures with multi-character chain identifiers and will include these as they become available in our other protein systems. Only the latest version of BLAST+ (2.9.0, download) will work with the updated v5 databases and allow you to access all of the most recent protein and nucleotide data. In the winter of 2019, we will stop updating the version 4 BLAST databases and offer the v5 databases as the default for download.

In addition, makeblastdb will be updated in BLAST 2.10.0, due out in October 2019, so by default it creates dbV5 formatted databases.
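Until that default changes, a dbV5 database has to be requested explicitly. The sketch below builds such a `makeblastdb` call without running it. The helper name `makeblastdb_command` and the file names are hypothetical, and it assumes that BLAST+ 2.9.0 accepts `-blastdb_version 5`, which is worth confirming with `makeblastdb -help`.

```python
def makeblastdb_command(fasta, dbtype, out, force_v5=True):
    """Return an argument list for makeblastdb (hypothetical helper).

    dbtype must be "prot" or "nucl". With force_v5=True the dbV5 format is
    requested explicitly instead of relying on the tool's default.
    """
    if dbtype not in ("prot", "nucl"):
        raise ValueError("dbtype must be 'prot' or 'nucl'")

    cmd = ["makeblastdb", "-in", fasta, "-dbtype", dbtype, "-out", out]
    if force_v5:
        # Assumption: this option selects the version 5 database format.
        cmd += ["-blastdb_version", "5"]
    return cmd


if __name__ == "__main__":
    # File and database names are placeholders.
    print(" ".join(makeblastdb_command("proteins.faa", "prot", "my_proteins")))
```

Once makeblastdb produces dbV5 by default, calling the helper with `force_v5=False` should give the same format.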

For more information on the new database version and BLAST+ (2.9.0), see the previous NCBI Insights article and the recording of our recent webinar.

Virus hunting in the cloud codeathon, v2

We are pleased to announce the second installment of the Virus Hunting Codeathon that will take place from November 4-6, 2019 at the University of Maryland in College Park.

NCBI will help run this bioinformatics codeathon, hosted by UMIACS and CBCB at the University of Maryland. The purpose of this event is to continue developing techniques, code, and pipelines to identify known, taxonomically definable, and novel viruses from metagenomic datasets on cloud infrastructure.

This event is for researchers, including students and postdocs, who are already engaged in the use of bioinformatics data or in the development of pipelines for virological analyses from high-throughput experiments. We especially encourage people who have experience in Computational Virus Hunting or related fields to participate.  The event is open to anyone selected for the codeathon and willing to travel to College Park (see below).


  • Fast, federated indexing
    • Big Query
  • Metadata features
  • Genome graphs for viruses
  • Approximate taxonomic analysis
  • Domain/HMM Boundary and Taxonomic Refinement
  • Bringing together approximate taxonomy and domain models
  • Sequence data quality metrics
  • Phage-host interactions

We will provide the final list of projects before the codeathon starts.
