The BLAST programs and databases are available in Docker and cloud-ready


In modern biomedical research, you often need to analyze very large datasets. This may require computing and storage capacity that exceeds what you have available locally. Working in a cloud environment where you can provision nearly limitless computing power, gain access to enormous data sets, and pay for only what you need is a great option in these cases.

To help with these tasks, NCBI is now providing a Docker version of NCBI BLAST that you can use on the cloud. This implementation will help you work with large volumes of sequence data and the set of NCBI BLAST databases. The BLAST Docker image makes using BLAST on the cloud much more convenient.

  • Installation and maintenance of the BLAST programs and databases is all handled by Docker.
  • Integration with other tools in your pipelines is easier.
  • NCBI BLAST databases are pre-loaded on the Google Cloud, providing fast access.

While we have tested the Docker image on the Google Cloud, it will allow BLAST to run equally well on any Docker-enabled platform, such as another cloud platform or your local computer, and you can still use the cloud-installed BLAST databases.

See the BLAST in the Cloud and database information documentation to get started.
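As a rough illustration of how the Docker image can slot into a pipeline, the sketch below builds a `docker run` command line for a BLAST search from Python. It is not taken from the NCBI documentation: the image name `ncbi/blast`, the container mount points under `/blast/`, and the file and database names are assumptions for illustration. It also assumes the image resolves database names against `/blast/blastdb`. Check the BLAST in the Cloud documentation for the exact invocation.

```python
import shlex
from pathlib import PurePosixPath


def build_blast_command(query_file, db_name, out_file, workdir,
                        program="blastp", image="ncbi/blast"):
    """Return the argument list for running BLAST inside a Docker container.

    Assumptions (not from the post): the image name, the /blast/* mount
    points inside the container, and the host directory layout
    workdir/{blastdb,queries,results}.
    """
    workdir = PurePosixPath(workdir)
    return [
        "docker", "run", "--rm",
        # Mount databases and queries read-only, results read-write.
        "-v", f"{workdir / 'blastdb'}:/blast/blastdb:ro",
        "-v", f"{workdir / 'queries'}:/blast/queries:ro",
        "-v", f"{workdir / 'results'}:/blast/results:rw",
        image,
        program,
        "-query", f"/blast/queries/{query_file}",
        "-db", db_name,
        "-out", f"/blast/results/{out_file}",
    ]


# Hypothetical file and database names, for illustration only.
cmd = build_blast_command("query.fsa", "my_proteins", "hits.out",
                          "/home/user/blast")
print(shlex.join(cmd))
```

The list can be passed directly to `subprocess.run(cmd, check=True)` on a machine with Docker installed. Building it as a list rather than a shell string avoids quoting problems when file names contain spaces.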

Genome Workbench 3.0, now with support for preparing GenBank genome submissions


Genome Workbench version 3.0 (release notes) is now available. An important new feature is the submission preparation wizard that allows you to prepare prokaryotic and eukaryotic genome sequences for submission to GenBank. This wizard is the first step toward offering a better alternative to the Sequin submission tool.

You simply load your sequences into Genome Workbench and use the submission wizard to enter information about your submission through a set of dialog boxes and then save a submission-ready data file.  The package also includes tools for editing your sequences, annotation, and metadata.

See the tutorial video on our YouTube channel or the Genome Workbench documentation for more details on how to enable the wizard and prepare a submission.

Try our new SRA data management tools!


Have you ever needed to correct or improve SRA metadata after submitting, change the release date for your data, or share your data with reviewers? You can now perform these tasks yourself using the SRA data management features that are LIVE in Submission Portal!

If you have an SRA submission and associated BioProject and BioSample, you can log into the Submission Portal, go to the Manage data tab, click into that BioProject and easily perform the following common tasks (Figure 1).

New human genome annotation release with MANE Select and other improvements!


There’s a new RefSeq annotation available for the human genome, and it’s quite an update!

About the release

Annotation release 109.20190607 is the first release of our new bimonthly annotation schedule as announced in a previous post. The annotated sequences are the latest sequences for the GRCh38, patch 13 assembly, GRCh38.p13 (GCF_000001405.39). The chromosome backbone sequences remain the same, but we’ve added 45 patch sequences representing novel and improved sequences that the Genome Reference Consortium will incorporate into the primary assembly in the future. The new annotation places the latest curated RefSeq transcripts and functional elements on the genome but keeps the same model dataset as in annotation release 109 except when the models have been replaced by curated RefSeqs or other review. We are also flagging MANE and other RefSeq Select transcripts. Continue reading for more details on these improvements below. You can download the updated annotation here!

Microbial Virulence in the Cloud hackathon August 13 – 15 2019


From August 13 – 15 2019, the NCBI will run a bioinformatics hackathon on the NIH campus!

We’re specifically looking for folks who have experience working in computational microbial genomics, evolutionary biology, antimicrobial resistance, and similar genomic analysis. If this describes you, please apply! This event is for researchers, including students and postdocs, who are already engaged in the use of bioinformatics data or in the development of pipelines for large-scale genomic analyses from high-throughput experiments (please note that the event itself will focus on open access public human).

GenBank release 232


GenBank release 232.0 (6/20/2019) is now available on the NCBI FTP site. This release has 5.47 terabases and 1.58 billion records.

The release has 213 million traditional records containing 329.8 billion base pairs of sequence data. There are also 1 billion WGS records containing 4.8 trillion base pairs of sequence data, 319.9 million bulk-oriented TSA records containing 285.3 billion base pairs of sequence data, and 25 million bulk-oriented TLS records containing 10 billion base pairs of sequence data.
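As a quick sanity check, the per-division figures can be added up and compared with the headline totals of 5.47 terabases and 1.58 billion records. The sketch below uses only the rounded numbers quoted in this post, and it takes the TSA figure as 285.3 billion base pairs, since that is the reading consistent with the headline total. The variable names are just for illustration.

```python
# Rounded figures as quoted in the release note.
# Assumption: TSA base pairs are in billions, which matches the headline total.
base_pairs = {
    "traditional": 329.8e9,
    "WGS": 4.8e12,
    "TSA": 285.3e9,
    "TLS": 10e9,
}
records = {
    "traditional": 213e6,
    "WGS": 1e9,
    "TSA": 319.9e6,
    "TLS": 25e6,
}

total_bases = sum(base_pairs.values())
total_records = sum(records.values())

print(f"{total_bases / 1e12:.2f} terabases")
print(f"{total_records / 1e9:.2f} billion records")
```

Because the inputs are rounded (the WGS counts in particular), the sums land slightly below the headline figures rather than matching them exactly.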

The periodic table turns 150!


NCBI and PubChem are celebrating the 150th anniversary of the periodic table of chemical elements, one of the most recognized tools in science. The scientific community has declared 2019 to be “The International Year of the Periodic Table”. We’re celebrating by launching the PubChem Periodic Table and Element pages, where you can find chemical element data and data sources. There’s always more to learn, so check out the PubChem blog for more about this incredible old resource.

New BLAST results for specialized searches now available for testing


As you probably know, since April BLAST has offered a new results page as an option for standard BLAST searches so that you can test it and provide feedback. See our post from earlier this spring for more details. We have just added new results pages (Figure 1) for the following four specialized BLAST services for you to test.

  1. PSI-BLAST
  2. PHI-BLAST
  3. DELTA-BLAST
  4. Align two or more sequences

50,000 new clinically relevant structural variation calls in dbVar


We’ve expanded the catalog of clinically relevant structural variants (SV) in dbVar by adding 57,520 ClinVar records. You can access the newly added data through study nstd102.

The updated collection includes:

  • 20,000 new SVs, and more than 37,000 copy number variants (CNV) observed in ClinGen laboratories during routine cytogenomic laboratory testing that were previously accessioned separately at dbVar
  • 15,000 SVs asserted as ‘Pathogenic’ or ‘Likely pathogenic’ for thousands of clinical genetic disorders including breast, ovarian, and colon cancers; hypercholesterolemia; schizophrenia; Duchenne Muscular Dystrophy; autism spectrum disorders; and many others
  • links to more than 1,600 related PubMed articles and thousands of related data records in ClinVar, OMIM, GeneReviews, MedGen, MeSH, etc.

You can browse dbVar studies on the web or download the data. We provide dbVar data in a number of standard formats (VCF, GVF, and TSV) mapped to assemblies GRCh38, GRCh37, and NCBI36, allowing you to perform analysis using standard tools and integrate the data into your bioinformatic workflows.

Visit our Walkthrough page to learn how to use these new dbVar data to help interpret structural variation in your favorite gene or genomic region.
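To give a flavor of how the VCF downloads can feed into a workflow, here is a minimal sketch that counts structural variant types overlapping a genomic region. It relies only on the standard VCF layout (tab-separated columns, with `SVTYPE` and `END` keys in the INFO column). The sample records, variant IDs, and function names are made up for illustration and are not real dbVar data.

```python
from collections import Counter

# Hypothetical records in standard VCF layout (not real dbVar data).
SAMPLE_VCF = """\
##fileformat=VCFv4.1
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
1\t10001\tvar_a\tN\t<DEL>\t.\t.\tSVTYPE=DEL;END=20000
1\t50001\tvar_b\tN\t<DUP>\t.\t.\tSVTYPE=DUP;END=90000
2\t30001\tvar_c\tN\t<DEL>\t.\t.\tSVTYPE=DEL;END=45000
"""


def parse_info(info):
    """Split a VCF INFO string into a dict; flag-only keys map to ''."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def count_sv_types(vcf_text, chrom=None, start=None, end=None):
    """Count SVTYPE values, optionally limited to variants overlapping a region."""
    counts = Counter()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta lines, and the header
        cols = line.split("\t")
        info = parse_info(cols[7])
        pos = int(cols[1])
        sv_end = int(info.get("END", pos))  # fall back to POS if END is absent
        if chrom is not None and cols[0] != chrom:
            continue
        if start is not None and sv_end < start:
            continue
        if end is not None and pos > end:
            continue
        counts[info.get("SVTYPE", "unknown")] += 1
    return counts


print(count_sv_types(SAMPLE_VCF))
print(count_sv_types(SAMPLE_VCF, chrom="1", start=15000, end=60000))
```

For real downloads you would read the (usually compressed) file line by line instead of using an in-memory string, or use an established VCF library. The overlap test above is the part that carries over.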