The complete annotated genome sequence of the novel coronavirus associated with the outbreak of pneumonia in Wuhan, China is now available from GenBank for free and easy access by the global biomedical community. Figure 1 shows the relationship of the Wuhan virus to selected coronaviruses.
Figure 1. Phylogenetic tree showing the relationship of Wuhan-Hu-1 (circled in red) to selected coronaviruses. Nucleotide alignment was done with MUSCLE 3.8. The phylogenetic tree was estimated with MrBayes 3.2.6 with parameters for GTR+G+I. The scale bar indicates estimated substitutions per site, and all branch support values are 99.3% or higher.
We are pleased to announce the second installment of the Virus Hunting Codeathon, which will take place November 4-6, 2019, at the University of Maryland in College Park.
The NCBI will help run this bioinformatics codeathon, hosted by the UMIACS and CBCB at the University of Maryland. The purpose of this event is to continue developing techniques, code, and pipelines to identify known, taxonomically definable, and novel viruses from metagenomic datasets on cloud infrastructure.
This event is for researchers, including students and postdocs, who are already engaged in the use of bioinformatics data or in the development of pipelines for virological analyses from high-throughput experiments. We especially encourage people who have experience in Computational Virus Hunting or related fields to participate. The event is open to anyone selected for the codeathon and willing to travel to College Park (see below).
Fast, federated indexing
Genome graphs for viruses
Approximate taxonomic analysis
Domain/HMM Boundary and Taxonomic Refinement
Bringing together approximate taxonomy and domain models
Sequence data quality metrics
We will provide the final list of projects before the codeathon starts.
In this workshop, Dr. Rodney Brister will talk about how 41 scientists from 21 organizations worked to improve the usability of SRA data by identifying datasets that included known viruses and viral signals. Not only is that information now being integrated into a public search interface, but the approach will also be refined in future hackathons so it can be applied to all SRA datasets.
We have a new and improved search experience for viral genes from select human pathogens. When you search for a virus such as HIV-1 (more examples below), you now get an interactive graphical representation of the viral genome where you can see all the annotated viral proteins in context. Clicking on the gene/protein objects allows you to access sequences, publications, and analysis tools for the selected protein. This new feature is designed to help you quickly find information relevant to your research on clinically important viruses.
Figure 1. Top: The virus genome graphic result for a search with HIV-1 with access to analysis tools, downloads, and relevant results in the Genome and Virus resources. Bottom: The result obtained by clicking the env gene graphic, which provides links to protein and nucleotide sequences, the literature, analysis tools, and downloads.
Try it out using the following example searches and let us know what you think!
We’re specifically looking for folks who have experience in computational virus hunting or adjacent fields to identify known, taxonomically definable, and novel viruses from a few hundred thousand metagenomic datasets that we’ll put on cloud infrastructure. This event is for researchers, including students and postdocs, who are already engaged in the use of bioinformatics data or in the development of pipelines for virological analyses from high-throughput experiments. If this describes you, please apply! The event is open to anyone selected for the hackathon and willing to travel to SDSU (see below).
The 2018 Nucleic Acids Research database issue features several papers from NCBI staff that cover the status and future of databases including CCDS, ClinVar, GenBank, and RefSeq. These papers are also available on PubMed. To read an article, click on the PMID listed below.
BLAST is a powerful search tool, but often a search is just the beginning of the journey. We put ourselves in the shoes of a researcher who has just sequenced a handful of samples from the latest viral outbreak and tried to understand what information would be most useful. We also reached out to researchers in the field and asked: a) what questions do they really want to answer? and b) how can NCBI best provide the answers? Based on insights from those questions and answers, we developed the new Virus Sequence Search Interface (Fig. 1). The Search Interface is an NCBI Labs project, which means it is an experimental project, and we may modify the resource based on your feedback and experiences.
Figure 1. The Virus Sequence Selection Interface. The Virus Sequence Selection Interface accepts as input nucleotide and protein accessions, as well as FASTA and plain-text formatted sequences. The user selects either “Nucleotide” or “Protein,” depending on the sequence type, and selects the virus type from the pull-down menu below the text entry field.
This blog post is for researchers, students, and postdocs, as well as non-scientific developers, mathematicians and librarians.
This summer, we were quite busy running and cohosting hackathons. These events educate participants, allow for networking among computational biologists and produce bioinformatics software prototypes. Read on for a review of products from our Summer 2017 hackathons.