Presentation on NCBI’s genome browser at Rocky Mountain Genomics Hackcon

On June 18, 2019, NCBI’s Sanjida Rangwala will demonstrate the rich data visualization capabilities of NCBI’s genome browser at the one-day conference that is part of the Rocky Mountain Genomics Hackcon. As mentioned in a previous post, NCBI staff will also participate in an NCBI-style hackathon as part of the larger event. The genome browser presentation and demonstration will show you how to create visuals that provide insights and show connections among genes, transcripts, variation, epigenomics, and GWAS data from NCBI sources. You will also see how you can upload your own data and embed NCBI viewers on your own pages.

Genome context graphic now in virus search results

We have a new and improved search experience for viral genes from select human pathogens. When you search for a virus such as HIV-1 (more examples below), you now get an interactive graphical representation of the viral genome where you can see all the annotated viral proteins in context. Clicking on the gene or protein objects allows you to access sequences, publications, and analysis tools for the selected protein. This new feature is designed to help you quickly find information relevant to your research on clinically important viruses.

Figure 1. Top: The virus genome graphic result for a search with HIV-1 with access to analysis tools, downloads, and relevant results in the Genome and Virus resources. Bottom: The result obtained by clicking the env gene graphic, which provides links to protein and nucleotide sequences, the literature, analysis tools, and downloads.

Try it out using the following example searches and  let us know what you think!

NCBI to help with Rocky Mountain Genomics HackCon, June 17 – 21, 2019

The NCBI will participate in a one-day conference on June 18, 2019, and a hackathon on June 19-21, 2019, as part of the Rocky Mountain Genomics Hackcon 2019 at the BioFrontiers Institute in Boulder, Colorado.

The conference will feature technical speakers in precision medicine, metagenomics, and advanced RNA-Seq analysis, as well as an exhibitor and poster session. The hackathon will focus on creating visualization tools for exploratory data analysis.

Many people who attend these events have experience working with large datasets or developing informatics tools, code, or pipelines; however, researchers who are in earlier stages of their data science journey, including students and postdocs, are also encouraged to apply. Some projects are available to non-scientific developers, mathematicians, or librarians. The event is open to anyone selected for the hackathon and willing to travel to Boulder, Colorado.

Please visit the Rocky Mountain Genomics Hackcon 2019 site for more details and information on how to attend.

Have you tried BLAST+ (2.9.0) and version 5 BLAST databases (dbV5)?

We recently updated the version 5 BLAST protein databases (dbV5) on our FTP site to be completely accession-based. As we described in a previous post, this means they now contain the gi-less proteins from the NCBI Pathogen Project and other high-throughput projects. The v5 databases are also compatible with proteins from PDB structures with multi-character chain identifiers and will include these as they become available in our other protein systems. Only the latest version of BLAST+ (2.9.0, download) will work with the updated v5 databases and allow you to access all of the most recent protein data. At the end of September 2019, we will stop updating the version 4 BLAST databases and offer the v5 databases as the default for download.
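Because only BLAST+ 2.9.0 works with the updated v5 databases, it can be useful to check the installed version before downloading them. The following is a minimal sketch, not an NCBI tool: the function names are hypothetical, and it assumes the version text looks like `blastp: 2.9.0+`, which is what `blastp -version` is expected to print.

```python
import re

# Minimum BLAST+ release that the post says works with the updated dbV5.
MIN_VERSION = (2, 9, 0)


def parse_blast_version(version_output):
    """Return (major, minor, patch) from text such as 'blastp: 2.9.0+'.

    Assumption: the first dotted triple in the text is the version number.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no version number found in: %r" % version_output)
    return tuple(int(part) for part in match.groups())


def supports_updated_dbv5(version_output):
    """True if the reported BLAST+ version is 2.9.0 or later."""
    # Tuples compare element by element, so (2, 10, 0) > (2, 9, 0).
    return parse_blast_version(version_output) >= MIN_VERSION
```

In practice the input string could come from `subprocess.run(["blastp", "-version"], capture_output=True, text=True).stdout`, assuming `blastp` is on your `PATH`.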

For more information on the new database version and BLAST+ (2.9.0), see the previous NCBI Insights article and the recording of our recent webinar.

New BLAST results to become the default view August 1, 2019

We have been offering the new BLAST results page (Figure 1) for you to try out since April and have been collecting your comments and feedback. Thank you all for your input on this new results display. Over 90% of your comments have been positive. We have made several changes to the page that address problems you have pointed out, and we are working on several additional features that you have suggested for future releases.

At this time, 96% of you who have tried the new page have kept it as your default results page.  We are planning to make the new page the default for everyone on August 1, 2019. We will still provide access to the old results for some time to allow people who have workflows or teaching materials to adjust to the new display.

Figure 1. The new BLAST results with filters directly on the page and a more concise tabbed output that includes the taxonomy report.

Please view our video introduction to the new results to see highlights of the improved display. As always, we will continue to incorporate your feedback into the design and features on the new page, so please test it out and let us know what you think.

IgBLAST (1.14.0) is now available with several improvements

IgBLAST is a popular NCBI package for classifying and analyzing immunoglobulin (IG) and T cell receptor (TCR) variable domain sequences. We’ve released a new version (1.14.0) of IgBLAST with three improvements and bug fixes:

  1. The Adaptive Immune Receptor Repertoire (AIRR) format output is more consistent with the AIRR specifications: undefined values (NON, N/A) are now empty strings, “reversed” is no longer appended to the sequence identifier when the query is in reverse orientation, and standard locus names such as IGH and TRB replace the traditional VH, VB, etc.
  2. The logic for showing the CDR3 end of TCR sequences is improved.
  3. The sequence identifier is restored in the case of no results in AIRR rearrangement format.
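The practical effect of these changes on downstream parsing can be sketched as follows. This is an illustration under stated assumptions, not IgBLAST code: the sample rows are invented, the helper name `read_airr_rows` is hypothetical, and only a few AIRR rearrangement columns (`sequence_id`, `locus`, `rev_comp`, `v_call`) are shown.

```python
import csv
import io

# Invented sample in AIRR rearrangement (tab-separated) layout.
# read_2 is a reverse-orientation query: the orientation is carried by
# rev_comp, and the identifier has no "reversed" suffix.
# read_3 is a query with no results: the identifier is still present.
SAMPLE_AIRR_TSV = (
    "sequence_id\tlocus\trev_comp\tv_call\n"
    "read_1\tIGH\tF\tIGHV1-69*01\n"
    "read_2\tTRB\tT\tTRBV7-9*01\n"
    "read_3\t\t\t\n"
)


def read_airr_rows(handle):
    """Yield one dict per row, mapping empty strings to None.

    With undefined values written as empty strings, a single check
    replaces separate tests for placeholder text such as 'N/A'.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        yield {key: (value if value != "" else None) for key, value in row.items()}


rows = list(read_airr_rows(io.StringIO(SAMPLE_AIRR_TSV)))
# Group identifiers by standard locus name; rows without a hit go under None.
ids_by_locus = {}
for row in rows:
    ids_by_locus.setdefault(row["locus"], []).append(row["sequence_id"])
```

Because identifiers are no longer modified, `sequence_id` values can be joined directly back to the input FASTA headers.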

IgBLAST 1.14.0 is available for download from the BLAST FTP area. See the new manual on GitHub for information about setting up and running IgBLAST.

RefSeq release 94 with MANE and RefSeq Select markup, protein name evidence, and improved [Candida] auris assembly

RefSeq release 94 is now available through NCBI web services, FTP, and NCBI’s Entrez programming utilities (E-utilities).
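As an illustration of the E-utilities route, the sketch below builds an EFetch request URL for RefSeq records in FASTA format. The helper name `build_efetch_url` is hypothetical and the accession in the usage note is only an example; the base URL and the `db`, `id`, `rettype`, and `retmode` parameters are standard EFetch parameters.

```python
from urllib.parse import urlencode

# Base address of the EFetch utility.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accessions, db="nuccore", rettype="fasta", retmode="text"):
    """Return an EFetch URL for one or more accessions.

    accessions: a list of accession strings, sent as a comma-separated id list.
    db: Entrez database name, e.g. "nuccore" for nucleotides or "protein".
    """
    params = {
        "db": db,
        "id": ",".join(accessions),
        "rettype": rettype,
        "retmode": retmode,
    }
    # urlencode percent-encodes the commas in the id list (%2C).
    return EFETCH_BASE + "?" + urlencode(params)
```

For example, `build_efetch_url(["NM_000546"])` gives a URL that can be opened with `urllib.request.urlopen` to retrieve the record. For bulk access to the whole release, the FTP directories are the better fit.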

This full release incorporates genomic, transcript, and protein data available as of May 13, 2019, and contains 200,311,267 records, including 141,839,334 proteins, 26,534,602 RNAs, and sequences from 91,873 organisms. The release is provided in several directories, both as a complete dataset and divided by logical groupings.

Prokaryotic Genome Annotation Pipeline (PGAP) now produces results suitable for submission to GenBank

We are happy to announce that you can now submit to GenBank genome sequences annotated by your own local copy of the standalone Prokaryotic Genome Annotation Pipeline (PGAP).

How does it work? Download PGAP from GitHub, provide some basic information and the FASTA sequences for your genome sequence, and run the pipeline on your own machine, compute farm or the cloud. PGAP will produce annotation consistent with NCBI’s internal PGAP. Submit the resulting annotated genome to GenBank through the genome submission portal, and get an accession back.

As with any other submitted assembly, PGAP-annotated genomes will be screened for foreign contaminants and vector sequences at submission.  Any annotated assemblies that don’t pass may need to be modified. We are developing an automated process to handle these edits!

We are also working on other improvements to standalone PGAP, such as a module for calculating Average Nucleotide Identity (ANI) to confirm the assembly’s taxonomic assignment. Stay tuned for new developments!