About NCBI Staff

The National Center for Biotechnology Information (NCBI), a division of the U.S. National Library of Medicine, provides access to scientific and biomedical databases and to software tools for analyzing molecular data, and performs research in computational biology.

dbGaP 10th Anniversary Symposium June 9, 2017

dbGaP (the NIH database of Genotypes and Phenotypes) is celebrating its 10th Anniversary this year! We are proud to support over 850 studies and 1.6 million samples.

We invite you to join us at the dbGaP 10th Anniversary Symposium, to be held on June 9, 2017, from 1:30 to 3:00 PM in Wilson Hall, Building 1, on the NIH Bethesda campus. The symposium includes six lightning talks highlighting the past, present and future of the dbGaP resource, followed by a social hour.

Feel free to distribute this flyer. We hope to see you at the Symposium!

Please contact bioinformatics-training@ncbi.nlm.nih.gov if you have any questions.

Reasonable accommodations will be provided for individuals with disabilities.

Read on for a list of speakers and abstracts.


Retiring and replacing the BLink protein similarity service

NCBI is discontinuing the BLink protein similarity service effective immediately. BLink provided graphical access to related proteins from protein records in the Entrez system. Because of the increasing volume of data in the protein database, BLink has become less useful as a tool for finding related sequences and is no longer maintainable.

Temporary replacement for BLink

The BLink service will redirect to a live protein-protein BLAST search against the Landmark database used by SmartBLAST. The Landmark database, described in the SmartBLAST documentation, provides matches from 27 selected cellular organisms with well-annotated complete genomes representing a broad taxonomic range. The results from the redirected BLink search will be displayed as a Tax BLAST report, as shown in the figure below. Like the BLink output, the Tax BLAST report emphasizes the taxonomic source of the protein matches. From this new starting point, you can explore additional protein similarities through the BLAST service by re-submitting the search against other BLAST databases, including the non-redundant (nr) database.


Figure 1. The Tax BLAST report for proton ATPase A.


QuickBLASTP adds pre-processing to BLAST search


Figure 1. The QuickBLASTP option is available under “Program Selection”.

QuickBLASTP, an accelerated version of BLASTP, adds a new pre-processing step to the non-redundant (nr) protein database. In a matter of seconds, QuickBLASTP will find approximately 97% of the database sequences with 70% or more identity to your query and around 98% of the database sequences with 80% or more identity to your query.

Currently, QuickBLASTP accepts only searches with a total query length of less than 10,000 residues, and it can search only the nr database.
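If you prepare queries in a FASTA file, a quick local check can tell you whether the total length stays under the limit before you submit. The following is a minimal sketch, not an NCBI tool: the function names `total_query_length` and `fits_quickblastp` are hypothetical, and it assumes the limit is counted as the sum of residues across all query sequences in the file.

```python
# Limit stated in the post: total query length must be less than this value.
MAX_TOTAL_RESIDUES = 10_000


def total_query_length(fasta_text: str) -> int:
    """Sum the residues of all sequences in a FASTA string.

    Header lines (starting with '>') and blank lines are skipped;
    whitespace inside sequence lines is not counted.
    """
    total = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        total += len("".join(line.split()))
    return total


def fits_quickblastp(fasta_text: str) -> bool:
    """Return True if the total query length is under the stated limit.

    Assumption: the limit applies to the combined length of all queries.
    """
    return total_query_length(fasta_text) < MAX_TOTAL_RESIDUES


if __name__ == "__main__":
    example = ">query1\nMKTAYIAK\nQRQISFVK\n>query2\nSHFSRQ\n"
    print(total_query_length(example), fits_quickblastp(example))
```

A query set that fails this check would need to be split or run with standard BLASTP instead.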

RefSeq release 82 now public

RefSeq release 82 is accessible online, via FTP and through NCBI’s programming utilities. This full release incorporates genomic, transcript, and protein data available as of May 8, 2017, and contains 127,098,289 records, including 84,756,971 proteins, 18,901,573 RNAs, and sequences from 69,035 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.


Phasing out support for non-human genome organism data in dbSNP and dbVar

This blog post is directed toward people who use dbSNP and dbVar, particularly those who submit non-human data to the two databases.

dbSNP and dbVar archive, process, display and report information related to germline and somatic variations from multiple species. These two databases have grown rapidly as sequencing and other discovery technologies have evolved, and now contain nearly two billion variants from over 360 species.

Based on projected growth and the resources required to archive and distribute the data, continued support for all organisms will become unsustainable for NCBI in the near future. Therefore, NCBI will phase out support for all non-human organisms in dbSNP and dbVar, and will support only human variation.

NCBI will phase out support for non-human organisms in dbSNP and dbVar following this timeline:

  • September 1, 2017 – dbSNP and dbVar stop accepting non-human variant data submissions.
  • November 1, 2017 – dbSNP and dbVar interactive websites and related NCBI services stop presenting non-human variant data. The data will, however, continue to be available for download on the dbSNP and dbVar FTP sites.

Any non-human data that is already in the databases or that is submitted before September 1, 2017 will continue to be available via the dbSNP and dbVar FTP download sites.

If you want to submit non-human variation data now or after September 1, 2017, the European Bioinformatics Institute (EBI), one of our partners in the International Nucleotide Sequence Database Collaboration (INSDC), is accepting these data in the European Variation Archive.

Finally, we would like to thank all the submitters and users who have supported dbSNP and dbVar throughout the years.

Genome data download made easy!

This blog post is directed toward Assembly users.

A new “Download assemblies” button is now available in the Assembly database. This makes it easy to download data for multiple genomes without having to write scripts.

For example, you can run a search in Assembly and use check boxes (see left side of screenshot below) to refine the set of genome assemblies of interest. Then, just open the “Download assemblies” menu, choose the source database (GenBank or RefSeq), choose the file type, and start the download. An archive file will be saved to your computer, and you can expand it into a folder containing your selected genome data files.


Figure 1. The “Download assemblies” button is at the top right of the Assembly page. When you click on it, you will see options for source database and file type, and a download button. There are several options for file type, including Genomic GFF.


Eleven eukaryotic annotations added to RefSeq in April 2017

Central Bearded Dragon (Pogona vitticeps)
(Credit: Mark Sum, USGS. Public domain.)

In April, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following eleven organisms:

  • Bombus terrestris (buff-tailed bumblebee)
  • Ceratitis capitata (Mediterranean fruit fly)
  • Athalia rosae (coleseed sawfly)
  • Dendrobium catenatum (a monocot)
  • Phalaenopsis equestris (a monocot)
  • Orbicella faveolata (stony coral)
  • Pogona vitticeps (central bearded dragon)
  • Oryzias latipes (Japanese medaka)
  • Sesamum indicum (sesame)
  • Jatropha curcas (a eudicot)
  • Amborella trichopoda (a flowering plant)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.

NCBI researchers and collaborators discover novel group of giant viruses

Nearly complete set of translation-related genes lends support to hypothesis that giant viruses evolved from smaller viruses

An international team of researchers, including NCBI’s Eugene Koonin and Natalya Yutin, has discovered a novel group of giant viruses (dubbed “Klosneuviruses”) with a more complete set of translation machinery genes than any virus that has been described to date. “This discovery significantly expands our understanding of viral evolution,” said Koonin. “These are the most ‘cell-like’ viruses ever identified. However, the computational analysis of the virus genomes shows that these viruses have not evolved from cells by reductive evolution but rather have evolved from smaller viruses, gradually acquiring genes from their hosts at different stages of their evolution.”
