Troubleshooting GenBank Submissions: Annotating the Coding Region (CDS)

This article is intended for GenBank data submitters with a basic knowledge of BLAST who submit sequence data from protein-coding genes.

One of the most common problems when submitting DNA or RNA sequence data from protein-coding genes to GenBank is leaving out the coding region (often abbreviated as CDS) annotation or defining the CDS incorrectly. Incomplete or incorrect CDS information will prevent accession numbers from being assigned to your submission data set. You can troubleshoot the CDS feature annotation before you submit your data by running a BLAST analysis with your sequences.
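
The Python sketch below illustrates what "incorrectly defining the CDS" often means in practice. It runs a few basic sanity checks on a putative coding sequence before you BLAST or submit it. It is a simplified illustration, not NCBI's validator. It assumes a complete CDS and the standard genetic code (translation table 1), with ATG as the only start codon. Partial CDSs, alternative start codons, and other genetic codes (for example, mitochondrial) would need different rules. The function name check_cds is hypothetical.

```python
# Stop codons of the standard genetic code (translation table 1).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(seq: str) -> list[str]:
    """Return a list of problems found in a putative complete CDS.

    Simplified assumptions: complete CDS, standard genetic code,
    ATG as the only start codon.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA as well as DNA
    problems = []
    if len(seq) % 3 != 0:
        problems.append(f"length {len(seq)} is not a multiple of 3")
    if not seq.startswith("ATG"):
        problems.append("does not start with ATG")
    # Split into whole codons, ignoring any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    # A stop codon before the last codon usually means a wrong start or frame.
    internal = [i + 1 for i, c in enumerate(codons[:-1]) if c in STOP_CODONS]
    if internal:
        problems.append(f"internal stop codon(s) at codon position(s) {internal}")
    return problems
```

For example, check_cds("ATGTAAAAATAG") reports an internal stop codon at codon position 2, which typically indicates that the CDS start or reading frame is wrong.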

Here’s how to use nucleotide BLAST (blastn) and the formatting options menu to analyze, interpret and troubleshoot your submissions:

1. To start the BLAST analysis, go to the BLAST homepage and select “nucleotide blast”.

Figure 1. Select “nucleotide blast”.
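
You can also submit the same search from a script. NCBI's BLAST Common URL API accepts CMD=Put requests with PROGRAM, DATABASE and QUERY parameters and returns a request ID (RID), which you later poll with CMD=Get. The sketch below only builds the submission URL. The helper name is hypothetical, and the default database "nt" is an assumption, so check the current database list. Long sequences are better sent as an HTTP POST. Please follow NCBI's guidelines on request frequency.

```python
from urllib.parse import urlencode

# Endpoint of the NCBI BLAST Common URL API.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blastn_put_url(query: str, database: str = "nt") -> str:
    """Build a CMD=Put URL that submits a blastn search.

    `query` may be a raw sequence, a FASTA record, or an accession.
    The default database name is an assumption; check the current list.
    """
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": "blastn",   # nucleotide query vs. nucleotide database
        "DATABASE": database,
        "QUERY": query,
    }
    # urlencode percent-encodes FASTA characters such as '>' and newlines.
    return f"{BLAST_URL}?{urlencode(params)}"
```

Usage: print(blastn_put_url("ATGAAATTTTAA")), then open or request the URL and note the RID in the response.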

Finding Chemical Probes and Modulators – The Hunt for New Chemical Reagents and Medicines

This blog post is a continuation of last week’s post on finding biological assay data; it is intended for researchers who use PubChem.

Your research focuses on a protein (receptor or enzyme) for which you’d like to identify a chemical probe or modulator. The probe could help to identify the subcellular location of a protein. A modulator may help to determine the biological effects of a particular protein’s activity. Additionally, finding a novel chemical that binds to your protein might assist you in exploring the use of a new class of therapeutics in drug design.

At NCBI, the PubChem BioAssay database stores biological activity assay information, which makes it possible to find experimentally measured targets for millions of chemicals. This blog post shows a simple workflow to download a table (with raw and kinetic data) of chemicals that have been determined to bind to a particular gene/protein target.
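
The sketch below is a scripted starting point for this workflow. It assumes PubChem's PUG REST "assays by target" request (assay/target/genesymbol/<symbol>/aids) to list BioAssay IDs (AIDs) for a gene. It then filters a bioactivity table you have already downloaded as CSV. The helper names are hypothetical. The column names CID and Activity Outcome are also assumptions, so match them to the header of your own file.

```python
import csv
import io
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def assays_for_gene_url(gene_symbol: str) -> str:
    """PUG REST URL listing AIDs whose target is `gene_symbol` (one AID per line)."""
    return f"{PUG_REST}/assay/target/genesymbol/{quote(gene_symbol)}/aids/TXT"


def active_cids(csv_text: str) -> set[int]:
    """Return the CIDs marked 'Active' in a downloaded bioactivity CSV.

    Column names 'CID' and 'Activity Outcome' are assumptions.
    Rows without a CID are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        int(row["CID"])
        for row in reader
        if row.get("CID") and (row.get("Activity Outcome") or "").strip().lower() == "active"
    }
```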

A Fourth Offering of A Librarian’s Guide to NCBI

This blog post is directed toward medical or science librarians in the United States who offer bioinformatics education and support services or are planning to offer such services in the future.

The NCBI, in partnership with the National Library of Medicine Training Center (NTC), will once again offer the Librarian’s Guide to NCBI course on the NIH campus, March 7-11, 2016. This will be the fourth presentation of the course, which now has 69 graduates.

These graduates represent 61 libraries, hospitals and government agencies from 27 states and the District of Columbia. Librarian’s Guide graduates now form a core community of NCBI-trained bioinformatics support specialists who maintain collaboration and mutual support through an online forum and monthly NCBI “Office hours” videoconference discussion sessions with course faculty and students. Materials from the 2013, 2014 and 2015 courses are available now, as well as lecture videos for the expression module.

Figure 1. Participants in the March 2015 A Librarian’s Guide to NCBI course. This class included 29 biomedical and science librarians.

Identifying Chemical Targets – Finding Potential Cross-Reactions and Predicting Side Effects

This blog post is directed toward researchers using PubChem.

You’ve identified a chemical that you’d like to use in your research as a chemical probe for a receptor or as an enzyme inhibitor. However, many chemicals can bind to multiple protein targets, a phenomenon commonly known as “cross-reactivity”. In biological activity assays, this can make it difficult to measure the activity of a specific protein or pathway. If the chemical is used as a drug in living organisms, interactions with molecules other than the intended target can cause “side effects”.

At NCBI, the PubChem BioAssay database stores biological activity assay information that makes it possible to find experimentally measured targets for millions of chemicals. This blog post describes a workflow to download a table of gene/protein targets for a particular chemical.

Figure 1. Tamoxifen compound page.
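
Here is a similar sketch for the reverse question, which starts from the chemical and works toward its targets. It assumes PUG REST's compound/name/<name>/cids request to look up the compound ID. It also assumes a downloaded bioactivity table with hypothetical columns Target GeneSymbol and Activity Outcome. The code counts active results per target gene. If more than one target appears in the output, that hints at possible cross-reactivity and is worth a closer look at the underlying assays. Both helper names are hypothetical.

```python
import csv
import io
from collections import Counter
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(name: str) -> str:
    """PUG REST URL returning the PubChem CID(s) for a compound name, one per line."""
    return f"{PUG_REST}/compound/name/{quote(name)}/cids/TXT"


def active_targets(csv_text: str) -> Counter:
    """Count 'Active' outcomes per target gene in a downloaded bioactivity table.

    Column names 'Target GeneSymbol' and 'Activity Outcome' are assumptions;
    adjust them to your file's header.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        gene = (row.get("Target GeneSymbol") or "").strip()
        outcome = (row.get("Activity Outcome") or "").strip().lower()
        if gene and outcome == "active":  # skip rows with no target or inactive results
            counts[gene] += 1
    return counts
```

Calling active_targets(...).most_common() lists the targets with the most active results first.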

SciENcv Updated to Support New NIH Biosketch Format

This blog post is geared toward researchers.

In November, NIH announced a new format for biographical sketches (biosketches); the new format is required for grant applications submitted for due dates after May 24, 2015 (see NOT-OD-15-032). SciENcv, a tool available through My NCBI for creating biosketches, has been updated to reflect the format changes and to help users convert their existing NIH biosketches from the old format to the new.

What changed with the NIH Biosketch?

Differences between the old and new NIH Biosketch formats include:

  1. Maximum length increased from 4 to 5 pages
  2. Rearranged data in the table at the top of the Biosketch
  3. Section A (Personal Statement) can now include up to 4 supporting citations
  4. Section C is now called “Contribution to Science” and should consist of up to 5 brief descriptions of your most significant contributions to science, each with up to 4 supporting citations. You may also provide a URL to a full list of your published work in a publicly available digital database such as My Bibliography. This section is the most notable difference in the new format.

PubMed Also-Viewed: Quickly find related articles

You’ve seen it before on shopping websites: you load a page displaying an item you want and see a list of other items that people bought along with it.

PubMed is free, but finding the important articles on a topic can cost a lot of time. To help you keep on top of the literature – with a little help from your fellow PubMed users – we are introducing a new type of link called “Articles frequently viewed together”. For some PubMed abstracts, you may see this link in the “Related Information” section in the right column.

Figure 1. The PubMed Also-Viewed feature.

Not all abstracts will have this link; currently, only 1.3 million of the 24 million records in PubMed do. The calculation is based on anonymous click data from the last year, so older articles will be especially underrepresented. To find all articles with these relationships, search PubMed with the query “pubmed_pubmed_alsoviewed[filter]”. Add more terms to narrow the focus to your area of interest.
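
You can also run the same search programmatically through the E-utilities ESearch service. This minimal sketch builds an ESearch URL for the query from this post and can optionally AND it with your own terms. The helper name is hypothetical. It is also worth confirming that the filter is still available before you rely on it.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def alsoviewed_search_url(extra_terms: str = "", retmax: int = 20) -> str:
    """Build an ESearch URL for PubMed records that have also-viewed links."""
    term = "pubmed_pubmed_alsoviewed[filter]"
    if extra_terms:
        # Parentheses keep the user's terms grouped when they contain OR.
        term = f"{term} AND ({extra_terms})"
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"
```

For example, alsoviewed_search_url("crispr") restricts the also-viewed set to records that also match "crispr".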

Please give it a try and let us know what you think by adding comments to this blog post.

SmartBLAST: Faster BLASTp search results in a graphical view

BLAST (Basic Local Alignment Search Tool) is a popular tool for finding sequences in a given database that are similar to a query sequence. Traditionally, BLAST displays these results as a sorted list of matches between the query and each database sequence. While this display is useful for examining how each subject sequence matches the query, it treats all subject sequences the same, regardless of the quality of the sequence data or its annotation, and also does not allow easy comparisons between different subject sequences. For example, the subject sequences may fall into multiple groups of similar sequences, or all of the subject sequences may be more similar to each other than to the query. A common way to obtain this information is to construct a multiple sequence alignment of the query and some or all of the subject sequences, but to this point, BLAST has not provided such alignments directly.

Enter SmartBLAST! SmartBLAST is a new, experimental NCBI tool that makes common sequence analysis tasks easier, such as finding a candidate protein name for a sequence, locating regions of high sequence conservation, or identifying regions covered by database sequences but missing from the query. To do this, SmartBLAST performs a series of tasks in much less time than it takes to run a typical BLASTp search.

Introducing PubMed Labs

Welcome to PubMed Labs!

PubMed Labs is all about you. It’s a new NCBI initiative for creating innovative and relevant products by involving you, our user community, from the beginning.

PubMed Labs is about experimentation. It’s a place where you’ll find early versions of new tools, experimental content, and proposed features, as well as an opportunity to suggest ideas to us.

PubMed Labs is about learning. It’s a place where the focus is on figuring out what works, where failure is OK because it’s a learning experience, and where any idea is welcome that can improve our services for our users.

PubMed Labs is about conversation. It’s a place where we can share future plans with you, and you can tell us how we’re doing. It’s a place where we all can come together to create resources that will benefit the broader scientific community.

Join the conversation!

We’re introducing a new category on this blog called “PubMed Labs” to facilitate this conversation. You can follow these posts by RSS. When we have a new feature for you to try out, we’ll post a description of it here that covers:

  • The user need the feature is intended to serve
  • How you can activate it
  • What you can expect from it
  • Our plans for it

Then you can try it out and let us know how it went by commenting on the posts. Like it, hate it, we want to know! Or you can propose some additional functions or ideas.

Our first new features are SmartBLAST, an enhancement to protein BLAST, and an “also-viewed” link in PubMed. Each of these is described in an accompanying blog post.

Let us know what you think!

NCBI’s First Hackathon: Advanced Bioinformatic Analysis of Next-Gen Sequencing Data

This blog post is geared toward genomics professionals.

From January 5th-7th, 2015, NCBI, in conjunction with the NIH Office of Data Science, held a genomics hackathon, where genomics professionals gathered to write useful, efficient pipelines for people new to genomics.

After we announced the hackathon, over 130 qualified applicants expressed interest in attending. Four team leads chose 23 attendees from this pool, assigned them initial predefined roles, and provided biological guidance for a product in one of four subject areas: DNA-Seq, RNA-Seq, Epigenomics and Metagenomics.

NCBI RefSeq’s Antimicrobial Peptide Indexed Field: Facilitating Novel Antibiotic Discovery

This blog post is aimed toward biomedical researchers.

Antibiotic-resistant bacterial infections account for the deaths of tens of thousands of Americans every year. Over the past twenty years, these difficult-to-treat infections have become more common. Because traditional antibiotics are ineffective in these cases, biomedical researchers are looking for alternatives. To assist in this search, NCBI’s RefSeq project has created a new indexed field, “Protein has antimicrobial activity [prop]”. This field retrieves sequence records annotated with naturally occurring antimicrobial peptides, or AMPs.

Antimicrobial peptides are naturally occurring peptides, found in a diverse array of species, that are part of an organism’s innate immune system. The RefSeq team recently compiled a list of over 130 human genes encoding one or more experimentally proven AMPs. These peptides are typically less than 100 amino acids long and can display bactericidal, antiviral, antifungal, and even antitumor activities, with a specific AMP usually having a subset of these activities. AMPs may be a suitable alternative to traditional antibiotics because they act quickly and efficiently and tend to have broad-spectrum activity. Moreover, because they are naturally occurring, AMPs are less likely than other compounds to be toxic to host cells or to give rise to AMP-resistant bacterial strains.
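
To use the new field in a search, the sketch below composes an Entrez Protein query string. The property text is taken from this post. Quoting it as a phrase is an illustrative assumption. So are the optional organism ([orgn]) and sequence-length ([slen]) restrictions; the length option mirrors the observation that AMPs are typically less than 100 amino acids long. The function name amp_query is hypothetical.

```python
from typing import Optional


def amp_query(organism: Optional[str] = None, max_length: Optional[int] = None) -> str:
    """Compose an Entrez Protein query for antimicrobial peptide records."""
    # Property field described in this post, quoted as a phrase (assumption).
    terms = ['"protein has antimicrobial activity"[prop]']
    if organism:
        terms.append(f'"{organism}"[orgn]')  # restrict to one organism
    if max_length:
        terms.append(f"1:{max_length}[slen]")  # sequence length range in residues
    return " AND ".join(terms)
```

For example, amp_query("Homo sapiens", 100) returns a query string that you can paste into the Protein database search box.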