Phasing out support for non-human organism data in dbSNP and dbVar

This blog post is directed toward people who use dbSNP and dbVar, particularly those who submit non-human data to the two databases.

dbSNP and dbVar archive, process, display and report information related to germline and somatic variations from multiple species. These two databases have grown rapidly as sequencing and other discovery technologies have evolved, and now contain nearly two billion variants from over 360 species.

Based on projected growth and the resources required to archive and distribute the data, continued support for all organisms will become unsustainable for NCBI in the near future. Therefore, NCBI will phase out support for all non-human organisms in dbSNP and dbVar, and will support only human variation.

Genome data download made easy!

This blog post is directed toward Assembly users.

A new “Download assemblies” button is now available in the Assembly database. This makes it easy to download data for multiple genomes without having to write scripts.

For example, you can run a search in Assembly and use the check boxes to refine the set of genome assemblies of interest. Then open the “Download assemblies” menu, choose the source database (GenBank or RefSeq), choose the file type, and start the download. An archive file that can be expanded into a folder containing your selected genome data files will be saved to your computer.

NCBI researchers and collaborators discover novel group of giant viruses

Nearly complete set of translation-related genes lends support to hypothesis that giant viruses evolved from smaller viruses

An international team of researchers, including NCBI’s Eugene Koonin and Natalya Yutin, has discovered a novel group of giant viruses (dubbed “Klosneuviruses”) with a more complete set of translation machinery genes than any virus that has been described to date. “This discovery significantly expands our understanding of viral evolution,” said Koonin. “These are the most ‘cell-like’ viruses ever identified. However, the computational analysis of the virus genomes shows that these viruses have not evolved from cells by reductive evolution but rather have evolved from smaller viruses, gradually acquiring genes from their hosts at different stages of their evolution.”

GenBank release 219.0 is available via FTP

GenBank release 219.0 (April 14, 2017) has 200,877,884 traditional records containing 231,824,951,552 base pairs of sequence data. In addition, there are 451,840,147 WGS records containing 2,035,032,639,807 base pairs, 165,068,542 TSA records containing 149,038,907,599 base pairs, and 1,438,349 TLS records containing 636,923,295 base pairs.
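
To put these figures in proportion, the short Python sketch below adds up the four divisions and reports each division's share of the base pairs. The per-division numbers are copied from the announcement; the combined totals are not stated in the announcement and are simply computed here. The names `RELEASE_219`, `totals`, and `base_share` are illustrative and are not part of any NCBI tool.

```python
# Figures copied from the GenBank release 219.0 announcement.
# Each entry maps a division to (record count, base pairs).
RELEASE_219 = {
    "traditional": (200_877_884, 231_824_951_552),
    "WGS": (451_840_147, 2_035_032_639_807),
    "TSA": (165_068_542, 149_038_907_599),
    "TLS": (1_438_349, 636_923_295),
}


def totals(release):
    """Return (total records, total base pairs) summed over all divisions."""
    records = sum(count for count, _ in release.values())
    bases = sum(bp for _, bp in release.values())
    return records, bases


def base_share(release, division):
    """Return the fraction of all base pairs contributed by one division."""
    _, total_bases = totals(release)
    return release[division][1] / total_bases


if __name__ == "__main__":
    total_records, total_bases = totals(RELEASE_219)
    print(f"All divisions: {total_records:,} records, {total_bases:,} base pairs")
    for name in RELEASE_219:
        print(f"{name}: {base_share(RELEASE_219, name):.1%} of base pairs")
```

Run as a script, this prints the combined totals followed by one line per division, which shows that WGS records account for most of the sequence data in the release.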

May 10th NCBI Minute: How to Locate and Use Human Genomes and Annotations from NCBI

Next week, NCBI staff will show you how to quickly find and download human genome annotations from both the web and the command line for incorporation into your workflows. We will also show you how to convert the accessions in these files to those used in other bioinformatics databases, as well as how to visualize these annotations on our Genome Data Viewer.

Date and time: Wednesday, May 10, 2017 12:00 PM – 12:30 PM EDT

After the live presentation, the webinar will be uploaded to the NCBI YouTube channel. Any related materials will be accessible from the Webinars and Courses page; you can also learn about future webinars there.