Vector graphics downloads now available in NCBI genome browsers and sequence views

You can now download images in both PDF and Scalable Vector Graphics (SVG) formats from our Sequence Viewer and genome browsers such as the Genome Data Viewer! SVG files are ideal for editing in image editors and provide high-quality graphics for publications, posters, and presentations. Both the PDF and SVG files that you download contain vector graphics for high-fidelity images.

You can download image files by choosing the “Printer-Friendly PDF/SVG” option under the Tools menu from any Graphical Sequence Viewer application (Figure 1).

Figure 1. Printer-friendly download options from the graphical view in the Genome Data Viewer. You can download either PDF or SVG formats, which are easily edited in standard graphics applications.


New results for organelle genome searches

As part of our ongoing effort to improve your search experience, we’ve made it easier for you to find the sequence of your favorite organelle genome plus all the information and data associated with it. To find organelle genomes, search for an organism name combined with an organelle description, for example human mitochondrion, tomato chloroplast, or Toxoplasma gondii RH apicoplast.

A new results panel will appear with links to the organelle genome sequence, annotated genes, and related phylogenetic and population studies. The panel appears for these searches in an All Databases search or within any of NCBI’s sequence databases, including Gene, Nucleotide, Protein, Genome, and Assembly. For the human mitochondrial genome, a graphical schematic of the genome allows you to navigate to individual mitochondrially encoded genes (Figure 1).


Figure 1. The organelle genome results for a search with human mitochondrion. The panel provides access to analysis tools, downloads, and other relevant results. Clicking any of the gene objects on the genome graphic leads to the relevant Gene record, for example Gene ID: 4512 in the case of COX1.

Try it out using example searches like the ones above and let us know what you think!

September 11 Webinar: A beginner’s guide to genes and sequences at NCBI

On Wednesday, September 11, 2019 at 12 PM, NCBI staff will present a webinar for people with limited experience working with gene and sequence information. You will learn about the kinds of data available for genes and sequences, how to select the most informative records, and how to find related genes and sequences using pre-computed information and the BLAST sequence search service.

  • Date and time: Wed, Sep 11, 2019 12:00 PM – 12:30 PM EDT
  • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

Primer-BLAST now offers help with irrelevant off-target matches

Primer-BLAST, NCBI’s primer-designer and specificity-checker, now offers a way to help you with irrelevant off-target matches.

Sometimes Primer-BLAST can’t design specific primers for your target sequence because of similar non-target sequences in the database. In some cases, you may know that these non-target matches are not important to your research and are safe to ignore. Examples may include tissue-specific splice variants, redundant entries, and predicted sequences. To help in these cases, you can now choose to allow certain off-target matches. This gives Primer-BLAST greater freedom in primer selection and a better chance of finding highly specific primers.

Improved Search Now Available Across NCBI Databases

Earlier this year, we announced the release of a new and improved search feature that interprets plain language to give better results for common searches. This feature, originally developed in NCBI Labs and later released on the NCBI All Databases search, is now available across several NCBI resources: Nucleotide, Protein, Gene, Genome, and Assembly. Whether you are searching for a specific gene or for a whole genome, you will now retrieve NCBI’s best results regardless of the database you search.

The image below shows the results for a search for human INS in the Nucleotide database. Even though this is a Nucleotide search, the results include relevant information from Gene, Protein, and Taxonomy, plus links to the NCBI reference sequences (RefSeq) as well as access to BLAST and the insulin gene region in NCBI’s genome browser, the Genome Data Viewer.

Figure 1. The new natural language search result in the Nucleotide database from a search for human INS.

Try out this new search capability and let us know what you think. And keep visiting the NCBI Labs search page to try our latest experiments, which we’ll also announce here on NCBI Insights.


February 14th NCBI Minute: How to quickly retrieve a sequence from NCBI

On Wednesday, February 14, 2018, NCBI will present a webinar that will show you how to quickly retrieve sequences in any format from NCBI.

Date & time: Wed, Feb 14, 2018 12:00 PM – 12:30 PM EST

Ever need to quickly grab a protein or nucleotide sequence in FASTA or another format from NCBI? This NCBI Minute will show you how to accomplish this using the nucleotide and protein web pages, an NCBI URL, and, most flexibly, the command-line EDirect client that accesses the E-utilities API.
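As a rough illustration of the URL-based route, here is a minimal Python sketch that builds a request to the E-utilities EFetch endpoint and retrieves a record as FASTA. The helper names (build_efetch_url, fetch_sequence) are our own for this example, and the accession NC_012920.1 (the human mitochondrial genome RefSeq record) is just a convenient test case. The webinar may demonstrate a different URL form, so treat this as one possible approach rather than the method shown in the presentation.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the E-utilities EFetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nuccore", rettype="fasta"):
    """Return an EFetch URL for one record (hypothetical helper for this sketch)."""
    params = {
        "db": db,            # "nuccore" for nucleotide, "protein" for protein
        "id": accession,     # accession.version or UID
        "rettype": rettype,  # e.g. "fasta" or "gb"
        "retmode": "text",   # plain text rather than XML
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_sequence(accession, db="nuccore", rettype="fasta"):
    """Download the record as text. Requires network access."""
    with urlopen(build_efetch_url(accession, db, rettype), timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Print only the URL here; call fetch_sequence() to actually download.
    print(build_efetch_url("NC_012920.1"))
```

Pasting the printed URL into a browser should return the same FASTA text that fetch_sequence() would download. For frequent or bulk requests, check NCBI's usage guidelines for E-utilities before scripting against the service.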

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

New releases from NCBI: IgBLAST 1.7.0 and Sequence Viewer 3.21

IgBLAST 1.7.0 release

A new version of IgBLAST is now available on FTP, with the following new features:

  1. Specify whether overlapping nucleotides at VDJ junctions are allowed in matching V, D, and J genes.
  2. Set a custom J gene mismatch penalty.
  3. Report the CDR3 start and stop positions in the sub-region table.
  4. Use alignment length instead of percent identity as the tie-breaker for hits with identical BLAST scores, improving accuracy in the V, D, J gene assignment.
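
To make the tie-breaking change in item 4 concrete, here is a small Python sketch of the idea: hits are ranked by score, and hits with identical scores are ordered by alignment length rather than percent identity. This is only an illustration of the ranking rule, not IgBLAST's actual code; the Hit class, the rank_hits function, and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """A simplified hit record (hypothetical; not IgBLAST's internal structure)."""
    gene: str
    score: float
    alignment_length: int
    percent_identity: float


def rank_hits(hits):
    """Sort by score (highest first), breaking ties by alignment length (longest first)."""
    return sorted(hits, key=lambda h: (-h.score, -h.alignment_length))


# Made-up values chosen so that two hits tie on score.
hits = [
    Hit("geneA", score=200.0, alignment_length=280, percent_identity=99.0),
    Hit("geneB", score=200.0, alignment_length=295, percent_identity=97.5),
    Hit("geneC", score=180.0, alignment_length=300, percent_identity=100.0),
]

print([h.gene for h in rank_hits(hits)])  # ['geneB', 'geneA', 'geneC']
```

In this example geneA and geneB tie on score. A percent-identity tie-breaker would place geneA first, while the alignment-length rule places geneB first.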

IgBLAST was developed at the NCBI to facilitate the analysis of immunoglobulin and T cell receptor variable domain sequences.

Visualize and Interpret Alignment Data with the Multiple Sequence Alignment Viewer

The NCBI Multiple Sequence Alignment Viewer (MSAV) is a versatile web application that helps you visualize and interpret MSAs for both nucleotide and amino acid sequences. You can display alignment data from many sources, and the viewer is easily embedded into your own web pages with customizable options. An even simpler way to use MSAV is to use our page, upload your data, and share the link to a fully functional viewer displaying your results.

New Pandoravirus Sequences are Accessible in GenBank

In the July 19, 2013 issue of the journal Science, an interesting article describes the discovery and characterization of two “giant” viruses that are proposed to comprise the first members of the “Pandoravirus” genus.

Nadege Philippe and co-workers obtained the viruses from sediment samples in Chile and Australia and found that they have no morphological resemblance to any previously defined virus families. The investigators isolated the genomes of these viruses and sequenced them using a variety of NextGen methodologies. They then assembled the reads into contigs and characterized them using various sequence similarity algorithms (including NCBI’s BLAST and CD-Search). Interestingly, while related to each other, the genomes were not similar to those of any other organism or virus. Additionally, 93% of protein-coding sequences had no recognizable homologs.

The Human Reference Genome – Understanding the New Genome Assemblies

What is a genome assembly?

The haploid human genome consists of 22 autosomal chromosomes plus the X and Y sex chromosomes. Each chromosome is a single DNA molecule, a sequence of millions of nucleotide bases. These molecules are linear, so one might expect that we should represent each chromosome by a single, continuous sequence.

Unfortunately, this is not the case for two main reasons: 1) because of the nature of genomic DNA and the limitations of our sequencing methods, some parts of the genome remain unsequenced, and 2) emerging evidence suggests that some regions of the genome vary so much between individual people that they cannot be represented as a single sequence.

In response to this, modern genomic data sets present a model of the genome known as a genome assembly. This post will introduce the basic concepts of how we produce such assemblies as well as some basic vocabulary.

