About NCBI Staff

The National Center for Biotechnology Information (NCBI), a division of the U.S. National Library of Medicine, provides access to scientific and biomedical databases and software tools for analyzing molecular data, and performs research in computational biology.

As-you-type suggestions come to NCBI Labs


In a recent post, we described a new way to search our databases in NCBI Labs. We have now added a suggestions dropdown to the search bar that should make life easier for many NCBI visitors.

The as-you-type suggestions are the simple, natural-language-like queries we described in the previous post. They’ll help you avoid typos and save time when you’re searching for organisms with long or hard-to-spell names.

These suggestions are meant to direct you to high-value results. As we improve the search experience, you may notice changes to the suggestions. We welcome your feedback on ways to enhance this new feature.

Here’s a quick look at what to expect:

Figure 1. As-you-type suggestions appear in a dropdown. Note how “human” is recognized as Homo sapiens. Many common organisms are supported in this manner (e.g., “mouse”, “cow”).

Summer 2018 NIH Data Hackathon, July 23-25, 2018


From July 23 to 25, 2018, NCBI will host a data science hackathon on the NIH campus. This hackathon will focus on genomics as well as general data science analyses, including text, image, and sequence processing. This event is for researchers, including students and postdocs, who have already worked with large datasets or developed pipelines for analyzing data from high-throughput experiments. Some projects are also open to non-scientific developers, mathematicians, or librarians.

The event is open to anyone selected for the hackathon and willing to travel to the NIH campus in Bethesda, Maryland.


Improved annotation of Streptomyces RefSeq genomes


We’ve completed the RefSeq reannotation of over 1,000 Streptomyces genomes! The genomes were reannotated using the Prokaryotic Genome Annotation Pipeline (PGAP). PGAP detected nearly 100% of ribosomally synthesized and post-translationally modified peptide natural products (RiPP)-encoding genes from known families, despite their small size, using a set of over 30 hidden Markov models (HMMs) built by RefSeq biocurators. Over 70% (251 of 354) of the lasso peptides now present in Streptomyces RefSeq genomes were annotated for the first time.

If you are aware of any class of RiPP precursor in Streptomyces that was not found in our recent reannotation, please contact us through the NCBI Help Desk, and we will add new HMMs to the rules we use to find and annotate RiPP precursor genes.

Important dbSNP updates: New JSON data files, RefSNP report, API


dbSNP is moving to a new design, with new products ready for testing, including new JSON data files, the RefSNP page, and an API.

New JSON data files

The Human Build 151 release is the last build that will provide relational database table dumps on the FTP site. In future build releases, dbSNP data will instead be available as a cumulative file of RefSNP objects in JSON format. These JSON files are available now so that users can begin migration and testing. Tutorials for parsing JSON are on GitHub.
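As a starting point for migration, here is a minimal Python sketch of streaming RefSNP objects from a file. It assumes, without confirmation from this post, that the file holds one JSON object per line and that each object has a `refsnp_id` field. The sample records are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_refsnp_objects(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed object per non-empty line (assumed layout: one JSON object per line)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def collect_ids(lines: Iterable[str], key: str = "refsnp_id") -> list:
    """Collect the value stored under `key` (an assumed field name) from every object."""
    return [obj[key] for obj in iter_refsnp_objects(lines) if key in obj]


# Made-up sample records, not real dbSNP data.
SAMPLE_LINES = [
    '{"refsnp_id": "1", "note": "hypothetical"}',
    '',
    '{"refsnp_id": "2", "note": "hypothetical"}',
]

if __name__ == "__main__":
    print(collect_ids(SAMPLE_LINES))
```

Because the functions take any iterable of lines, they should also work on an open text handle, for example one returned by `bz2.open(path, "rt")` if the download is compressed.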


5 new videos on YouTube: Get the most out of BLAST, MedGen, PubChem and more


Here are the latest videos on our YouTube channel. Subscribe to get alerts for new videos.

NCBI Minute: Getting the Most out of Web BLAST Tabular Format

The NCBI web BLAST service has several useful download formats, including tabular formats. All formats allow you to easily save your BLAST results for processing, editing, and annotating.

This video will show you how to use basic Unix tools and EDirect to expand and enhance your saved tabular BLAST results. You will also learn how to add useful information like taxonomy and sequence titles.
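The video works with Unix tools and EDirect. For readers who prefer a script, the same kind of post-processing can be sketched in Python. The example below assumes the saved file uses the default 12-column BLAST tabular layout and marks comment lines with `#`. The rows and identifiers are made-up placeholders.

```python
import csv
import io

# Default 12-column order of BLAST tabular output (assumed to match the saved file).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def read_hits(handle):
    """Parse tab-delimited hits, skipping blank lines and '#' comment lines."""
    rows = (line for line in handle if line.strip() and not line.startswith("#"))
    hits = []
    for fields in csv.reader(rows, delimiter="\t"):
        hit = dict(zip(COLUMNS, fields))
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        hits.append(hit)
    return hits


def best_hit_per_query(hits):
    """Keep the highest-bitscore hit for each query."""
    best = {}
    for hit in hits:
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Made-up example rows; the identifiers are placeholders, not real accessions.
SAMPLE = (
    "# hypothetical comment line\n"
    "query1\tsubjA\t98.5\t200\t3\t0\t1\t200\t10\t209\t1e-100\t360\n"
    "query1\tsubjB\t90.0\t180\t18\t0\t1\t180\t5\t184\t2e-60\t230\n"
    "query2\tsubjC\t75.2\t150\t37\t1\t4\t153\t20\t169\t3e-20\t95.5\n"
)

if __name__ == "__main__":
    best = best_hit_per_query(read_hits(io.StringIO(SAMPLE)))
    for query, hit in best.items():
        print(query, hit["sseqid"], hit["pident"])
```

Keeping each hit as a dictionary keyed by column name makes it straightforward to attach extra fields, such as taxonomy or sequence titles, once they have been looked up elsewhere.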


June 20 NCBI Minute: Getting the Genomic Context for BLAST Protein Matches


Do you ever want to see the flanking genes of a protein match from a BLAST search? On June 20th, we’ll show you how to see the genomic context of bacterial proteins using the identical protein report and the graphical sequence viewer. You will also learn to use these reports in detail and how to get the genomic context in batch for a set of protein matches using the identical protein report and EDirect.

Date and time: Wed, June 20, 2018 12:00 PM – 12:30 PM EDT

Click to register.

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

dbVar Structural Variation Non-redundant Reference Sets (Alpha) Release


dbVar has generated datasets of known structural variants (SVs) for comparison with user data to aid variant calling, analysis, and interpretation.

Files containing Non-Redundant (NR) deletions, insertions, and duplications are now available on GitHub. Additional separate files include preliminary annotations of overlap with ACMG59 genes. All files are in tab-delimited text format.
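One way to compare your own calls against such a tab-delimited reference set is a reciprocal-overlap test. The Python sketch below is illustrative only: the column names (`chr`, `start`, `stop`), the 50% threshold, and the sample rows are assumptions, not the documented layout of the dbVar files.

```python
import csv
import io


def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Return the smaller of the two overlap fractions for closed intervals (0.0 if disjoint)."""
    shared = min(a_end, b_end) - max(a_start, b_start) + 1
    if shared <= 0:
        return 0.0
    return min(shared / (a_end - a_start + 1), shared / (b_end - b_start + 1))


def load_reference(handle):
    """Read reference intervals from tab-delimited text with a header row (hypothetical columns)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [(row["chr"], int(row["start"]), int(row["stop"])) for row in reader]


def matches(call, reference, threshold=0.5):
    """Return reference intervals on the same chromosome meeting the reciprocal-overlap threshold."""
    chrom, start, end = call
    return [ref for ref in reference
            if ref[0] == chrom
            and reciprocal_overlap(start, end, ref[1], ref[2]) >= threshold]


# Made-up rows; the column names and coordinates are illustrative only.
SAMPLE = "chr\tstart\tstop\n1\t1000\t2000\n1\t5000\t9000\n2\t1000\t2000\n"

if __name__ == "__main__":
    reference = load_reference(io.StringIO(SAMPLE))
    print(matches(("1", 1200, 2100), reference))
```

A linear scan like this is fine for small inputs. For genome-scale call sets, an interval index or a sorted sweep would likely be a better fit.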

We encourage you to test these files and provide feedback, either on GitHub or by email.

June 13 NCBI Minute: Using EDirect to Query a Local Installation of PubMed


Next Wednesday, June 13, 2018, we’ll show you how to use EDirect to install PubMed locally and then search and retrieve records from the local instance. You will also see an analysis example that shows the significant speed improvement with the Local Cache and employs some advanced EDirect xtract options to aid with processing records.

Date and time: Wed, June 13, 2018 12:00 PM – 12:30 PM EDT

Click to register.

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

Formulas will be displayed in PubMed titles and abstracts


Starting this month, we will display formulas in citation titles, abstracts, and keywords in PubMed.

Previously, formulas were replaced with “[Formula: see text]” (see Figure 1).

Figure 1. Formula replaced with [Formula: see text] in the PubMed abstract display.

With this enhancement, you will now see formulas in the PubMed summary and abstract displays when these data are available in new citations (see Figure 2).

Figure 2. Formula shown in the PubMed abstract display after June 1, 2018.

We will also be including the MathML 3.0 element tags in PubMed XML. To support the addition of MathML tagging in our XML, we have created a DTD, which you can download now. Existing content will be valid against the new DTD. You can also download sample XML files with MathML 3.0 tags.
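If you process PubMed XML yourself, the new tags mean formulas can be located programmatically. Below is a minimal Python sketch, assuming formulas appear as `math` elements in the standard MathML namespace. The XML fragment is a hypothetical example, not taken from a real record or from the sample files.

```python
import xml.etree.ElementTree as ET

# Standard MathML namespace URI.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical fragment for illustration; it is not copied from a real PubMed record.
SAMPLE_XML = (
    '<AbstractText xmlns:mml="http://www.w3.org/1998/Math/MathML">'
    'Energy is given by '
    '<mml:math><mml:mi>E</mml:mi><mml:mo>=</mml:mo>'
    '<mml:mi>m</mml:mi><mml:msup><mml:mi>c</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:math>'
    '.</AbstractText>'
)


def find_formulas(xml_text):
    """Return every MathML <math> element found in the given XML string."""
    root = ET.fromstring(xml_text)
    return root.findall(f".//{{{MATHML_NS}}}math")


def formula_text(math_element):
    """Concatenate the text of all descendants as a rough plain-text rendering."""
    return "".join(math_element.itertext())


if __name__ == "__main__":
    for formula in find_formulas(SAMPLE_XML):
        print(formula_text(formula))
```

The plain-text rendering loses structure such as superscripts, so it is only useful for quick inspection. A real display would pass the MathML to a renderer.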

New dbVar FTP Directory Structure


NCBI’s database of structural variation, dbVar, has a restructured FTP directory. The old directories can be found in archive.

Highlights include:

  • added aggregated VCF files by assembly
  • named files based on major assembly and region or call
  • replaced study-specific directories with file-type directories
  • renamed “.tab” files to “.tsv”
  • moved old human and all non-human files to archive

Refer to README.ftp for full details of the new GVF, VCF, TSV, and XML files.