If you’re interested in visualizing and analyzing genomic data, then you’ll want to check out a new way to run Genome Workbench: in the cloud! Genome Workbench is a desktop application for Windows and Mac that lets you analyze genomic data in one place. You can run tools such as BLAST, create views such as multiple sequence alignments, and much more. You can also run Genome Workbench in a cloud environment from your local desktop computer. This manual will show you how.
There are many advantages to using Genome Workbench in the cloud:
- You can easily compare your data to the complete GenBank and RefSeq datasets without needing to download them
- You can run BLAST searches against standard databases or any custom databases you’ve assembled in the cloud
- All of the data (e.g. FASTA, BAM, GFF files) remain in the cloud with no need for local copies
- You won’t pay egress fees for downloading data
Give it a try and let us know how it goes!
NCBI is pleased to announce a single-cell-focused codeathon at the New York Genome Center, January 15-17. To apply, please complete the application form by December 30, 2019. Read on for more information about the event.
We are pleased to announce the second installment of the Virus Hunting Codeathon that will take place from November 4-6, 2019 at the University of Maryland in College Park.
The NCBI will help run this bioinformatics codeathon, hosted by UMIACS and CBCB at the University of Maryland. The purpose of this event is to continue developing techniques, code, and pipelines to identify known, taxonomically definable, and novel viruses from metagenomic datasets on cloud infrastructure.
This event is for researchers, including students and postdocs, who are already engaged in the use of bioinformatics data or in the development of pipelines for virological analyses from high-throughput experiments. We especially encourage people with experience in computational virus hunting or related fields to participate. The event is open to anyone selected for the codeathon and willing to travel to College Park (see below).
Potential projects include:
- Fast, federated indexing
- Metadata features
- Genome graphs for viruses
- Approximate taxonomic analysis
- Domain/HMM Boundary and Taxonomic Refinement
- Bringing together approximate taxonomy and domain models
- Sequence data quality metrics
- Phage-host interactions
We will provide the final list of projects before the codeathon starts.
In modern biomedical research, you often need to analyze very large datasets. This may require computing and storage capacity that exceeds what you have available locally. Working in a cloud environment where you can provision nearly limitless computing power, gain access to enormous data sets, and pay for only what you need is a great option in these cases.
To help with these tasks, NCBI now provides a Docker version of NCBI BLAST that you can use in the cloud. This implementation will help you work with large volumes of sequence data and with the set of NCBI BLAST databases. The BLAST Docker image makes using BLAST in the cloud much more convenient:
- Installation and maintenance of the BLAST programs and databases is all handled by Docker.
- Integration with other tools in your pipelines is easier.
- NCBI BLAST databases are pre-loaded on the Google Cloud, providing fast access.
While we have tested the Docker image on the Google Cloud, it should let BLAST run equally well on any Docker-enabled platform, such as another cloud platform or your local computer, and you can still use the cloud-installed BLAST databases.
See the BLAST in the Cloud and database information documentation to get started.
You can now download PGAP from GitHub and run it on your own machine, a compute farm, or the cloud, on any public or privately owned genome. PGAP predicts genes on bacterial and archaeal genomes using the same inputs and applications used inside NCBI. Try it now and send us your comments through GitHub issues.
This May, the NCBI will host a women’s collaborative biodata science hackathon on the NIH Campus in Bethesda, Maryland!
We are now collecting project proposals focusing on building tools and pipelines for advanced analysis of biomedical datasets including text, images, next generation sequencing data, proteomics, and metadata. Proposals for tutorial pipelines and educational tools for advanced analysis are also welcome. Submit your project proposal by March 4, 2019.
We are pleased to announce the first ever pangenomics, graphs and haplotypes hackathon.
From March 25-27, 2019, the NCBI will help run a bioinformatics hackathon in Santa Cruz, California, hosted by the University of California, Santa Cruz (UCSC). Potential topics include:
- Building large scale graphs from pangenomes using several assembly methods
- Simplification of mapping
- Resolving haplotypes
- Identification of population-specific structural variants
- Defining haplotype-specific expression, visualization, and coordination with the GRC
With the American Society of Human Genetics (ASHG) conference around the corner, NCBI staff are preparing their presentations for San Diego. Here is some background on dbGaP’s poster about their process to improve data storage and accessibility.
Visit Poster 1435T “Storage and use of dbGaP data in the cloud” on Thursday, October 18, from 2 PM to 3 PM (Exhibit Hall, Ground Floor).
We recently updated the BLAST AMI on Amazon Web Services (AWS). The AMI is preconfigured with BLAST+ 2.7.1 and supports a subset of the NCBI BLAST URL API. The latest version also addresses long download times and preserves BLAST databases and results between reboots.