New NCBI Datasets home and documentation pages provide easier access

NCBI Datasets, the new set of services for downloading genome assembly and annotation data (see our previous Datasets posts), has redesigned and reorganized its web pages to make it easier to find the services and documentation you need.

NCBI Datasets has a fresh new homepage (Figure 1) that highlights the types of data available through our tools, including genome assemblies, genes, and SARS-CoV-2 genomic and protein data. You can access these data directly from the new page or learn more in our new documentation pages.

Figure 1. Features of the new Datasets homepage, including quick access to help documentation (the Quickstart and How-to guides), to Genome, Gene, and Coronavirus data, and to the Datasets and Dataformat command-line tools.

New GenBank submission options for SARS-CoV-2 submitters

NCBI is pleased to announce ongoing enhancements to the submission of SARS-CoV-2 assembled genomes to GenBank, including a streamlined web workflow and a new API option. With either option, you can receive accession numbers for your SARS-CoV-2 submissions more quickly!

The streamlined workflow, with an improved interface and enhanced validation for both the web and API options, saves you time and effort and, most importantly, makes it possible to receive SARS-CoV-2 accession numbers and have your data publicly released within hours. In addition, we automatically annotate all SARS-CoV-2 genomes, producing standardized, consistent annotation that saves you time and benefits the researchers who use your data.

Rapid access to SARS-CoV-2 data from the current public health emergency

As the global health emergency around severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2, formerly 2019-nCoV) persists, we continue to play a key role in providing the biomedical community with free and easy access to coronavirus genome sequences. You can quickly access these data through NCBI search (Figure 1).

Figure 1. NCBI search results for the term “SARS-COV-2”, showing a schematic map of the viral assembly and annotation, along with buttons that link to the data in the NCBI Virus resource, to a specialized BLAST page that searches Betacoronavirus sequences, and to the reference assembly download. The bottom panel provides links to the CDC website for COVID-19 information and to GenBank®/SRA sequence data.

Dengue virus submission improvements now live!

When dengue fever breaks out anywhere in the world, it’s critical that researchers submit viral genomic sequence data and that those data become available for analysis as soon as possible. You can now submit Dengue virus sequences to GenBank using a new workflow (Figure 1) in the Submission Portal, designed to help make these data publicly available quickly. The streamlined process, similar to the one described in a previous post for animal mitochondrial COX1 sequences, has an improved interface, enhanced validation, and automatic annotation that save you time and effort.


Figure 1. The Submission Portal pages for targeted sequence submission workflows. Top panel: the new submission page for entering the workflow. Bottom panel: the Submission Portal page with the Dengue virus submission option selected (boxed in red). The service also offers options for other targeted submissions, including mitochondrial COX1 from multicellular animals (metazoa), ribosomal RNA (rRNA), rRNA-ITS, Influenza virus, and Norovirus sequences.

This update is part of a larger, ongoing effort to consolidate GenBank submissions in a central location. In addition to Dengue virus data, you can submit Influenza A, B, and C virus and Norovirus sequences, as well as other targeted sequences, including mitochondrial COX1 genes from multicellular animals (metazoa), ribosomal RNA (rRNA), and rRNA-ITS, through the options on the Submission Portal. Submit other types of sequence data, including other virus sequences, to GenBank using BankIt or tbl2asn.

You can use the search feature on the Submission Portal to find the appropriate submission tool for your data.

Novel coronavirus complete genome from the Wuhan outbreak now available in GenBank


Get rapid access to Wuhan coronavirus (2019-nCoV) sequence data from the current outbreak as it becomes available. We will continue to update the page with newly released data.

The complete annotated genome sequence of the novel coronavirus associated with the outbreak of pneumonia in Wuhan, China, is now available from GenBank for free and easy access by the global biomedical community. Figure 1 shows the relationship of the Wuhan virus to selected coronaviruses.


Figure 1. Phylogenetic tree showing the relationship of Wuhan-Hu-1 (circled in red) to selected coronaviruses. The nucleotide alignment was performed with MUSCLE 3.8. The phylogenetic tree was estimated with MrBayes 3.2.6 under the GTR+G+I model. The scale bar indicates estimated substitutions per site, and all branch support values are 99.3% or higher.

Virus hunting in the cloud codeathon, v2

We are pleased to announce the second installment of the Virus Hunting Codeathon, which will take place November 4-6, 2019, at the University of Maryland in College Park.

NCBI will help run this bioinformatics codeathon, hosted by UMIACS and CBCB at the University of Maryland. The purpose of this event is to continue developing techniques, code, and pipelines to identify known, taxonomically definable, and novel viruses in metagenomic datasets on cloud infrastructure.

This event is for researchers, including students and postdocs, who already work with bioinformatics data or develop pipelines for virological analyses of high-throughput experiments. We especially encourage people with experience in computational virus hunting or related fields to participate. The event is open to anyone selected for the codeathon who is willing to travel to College Park (see below).

Possible projects include:

  • Fast, federated indexing
    • BigQuery
  • Metadata features
  • Genome graphs for viruses
  • Approximate taxonomic analysis
  • Domain/HMM boundary and taxonomic refinement
  • Bringing together approximate taxonomy and domain models
  • Sequence data quality metrics
  • Phage-host interactions

We will provide the final list of projects before the codeathon starts.

Virus hunting in the cloud: A hackathon story at ASV 2019

Are you going to ASV 2019?

If you are, join us in a few days for a workshop on the virus hunting hackathon we helped run earlier this year.

Session: Workshop #19: Virus Discovery

Program Number: W-19-8

Time: Sunday, July 21, 7:00 PM CDT

Location: Mayo Auditorium

In this workshop, Dr. Rodney Brister will describe how 41 scientists from 21 organizations worked to improve the usability of Sequence Read Archive (SRA) data by identifying datasets that contain known viruses and viral signals. That information is now being integrated into a public search interface, and the approach will be refined in future hackathons so that it can be applied to all SRA datasets.

We hope to see you there!

Genome context graphic now in virus search results

We have a new and improved search experience for viral genes from select human pathogens. When you search for a virus such as HIV-1 (more examples below), you now get an interactive graphical representation of the viral genome showing all the annotated viral proteins in context. Clicking a gene/protein object gives you access to sequences, publications, and analysis tools for the selected protein. This feature is designed to help you quickly find information relevant to your research on clinically important viruses.

Figure 1. Top: the virus genome graphic returned by a search for HIV-1, with access to analysis tools, downloads, and relevant results in the Genome and Virus resources. Bottom: the result of clicking the env gene graphic, which provides links to protein and nucleotide sequences, the literature, analysis tools, and downloads.

Try it out using the following example searches and let us know what you think!

New Norovirus GenBank Submission Service

Do you have Norovirus sequence data to submit to GenBank? Try out the newly released improvements to our submission service for Norovirus data! The new service offers the following advantages:

  • Faster processing and shorter time to accession numbers
  • Improved user interface
  • Automatic feature annotation

Figure 1. The Submission Portal page showing the new option for submitting Norovirus data.

Begin a new Norovirus submission or see how to get started submitting other data to GenBank.

GenBank accepts a wide range of data to support scientific discovery and analysis on sequences from all branches of life.