Troubleshooting GenBank Submissions: Annotating the Coding Region (CDS)

This article is intended for GenBank data submitters with a basic knowledge of BLAST who submit sequence data from protein-coding genes.

One of the most common problems when submitting DNA or RNA sequence data from protein-coding genes to GenBank is omitting information about the coding region (often abbreviated as CDS) or defining the CDS incorrectly. Incomplete or incorrect CDS information will prevent accession numbers from being assigned to your submission data set. You can troubleshoot problems with the CDS feature annotation by running a BLAST analysis on your sequences before you submit your data.
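A few mechanical properties of a proposed CDS can also be checked locally before running BLAST. The Python sketch below is illustrative only and is not part of any GenBank or BLAST tool: the function name `check_cds` and its messages are hypothetical, and it assumes a complete CDS on the forward strand, the standard genetic code, and 1-based inclusive coordinates. Partial CDS features, which are legitimate in GenBank, would need different checks.

```python
# Hypothetical helper: sanity-check a complete, forward-strand CDS.
# Assumes the standard genetic code and 1-based inclusive coordinates.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(sequence, start, end):
    """Return a list of problems found for the CDS at start..end (empty if none)."""
    seq = sequence.upper().replace("U", "T")  # accept RNA input as well
    if not (1 <= start <= end <= len(seq)):
        return [f"coordinates {start}..{end} fall outside the sequence (length {len(seq)})"]

    problems = []
    cds = seq[start - 1:end]
    if len(cds) % 3 != 0:
        problems.append(f"CDS length {len(cds)} is not a multiple of 3")

    # Split into whole codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons or codons[0] != "ATG":
        problems.append("CDS does not begin with the start codon ATG")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("CDS does not end with a stop codon")

    internal = [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
    if internal:
        problems.append(f"internal stop codon at codon {internal[0] + 1}")
    return problems


if __name__ == "__main__":
    # Toy sequence for illustration only; the CDS spans positions 3..11.
    print(check_cds("GGATGAAATAGCC", 3, 11))
    print(check_cds("GGATGAAATAGCC", 3, 10))
```

An empty list only means the interval looks like an open reading frame; it does not confirm that the annotation matches the real protein, which is what the BLAST comparison is for.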

Here’s how to use nucleotide BLAST (blastn) and the formatting options menu to analyze, interpret and troubleshoot your submissions:

1. To start the BLAST analysis, go to the BLAST homepage and select “nucleotide blast”.

Figure 1. Select “nucleotide blast”.

Continue reading

Finding Chemical Probes and Modulators – The Hunt for New Chemical Reagents and Medicines

This blog post is a continuation of last week’s blog on finding biological assay data; it is intended for researchers who use PubChem.

Your research focuses on a protein (receptor or enzyme) for which you’d like to identify a chemical probe or modulator. The probe could help to identify the subcellular location of a protein. A modulator may help to determine the biological effects of a particular protein’s activity. Additionally, finding a novel chemical that binds to your protein might assist you in exploring the use of a new class of therapeutics in drug design.

At NCBI, the PubChem BioAssay database stores biological activity assay information, which makes it possible to find experimentally measured targets for millions of chemicals. This blog post shows a simple workflow to download a table (with raw and kinetic data) of chemicals that have been determined to bind to a particular gene/protein target.

Continue reading

Identifying Chemical Targets – Finding Potential Cross-Reactions and Predicting Side Effects

This blog post is directed toward researchers using PubChem.

You’ve identified a chemical that you’d like to use in your research as a chemical probe for a receptor or an enzyme inhibitor. However, a chemical can bind to multiple protein targets, a phenomenon commonly known as “cross-reactivity”. In biological activity assays, this can cause problems with measuring the activity of a specific protein or pathway. If the chemical is used as a medication in living organisms, interactions with molecules other than the intended target can cause “side effects”.

At NCBI, the PubChem BioAssay database stores biological activity assay information that makes it possible to find experimentally measured targets for millions of chemicals. This blog post describes a workflow to download a table of gene/protein targets for a particular chemical.

Figure 1. Tamoxifen compound page.

Continue reading

NIHMS Users: Do You Know How Often Your Paper is Being Accessed Via PMC? Here’s How to Find Out.

If you’re reading this, you probably already know that NIH and some other institutions have public access policies that require that peer-reviewed publications resulting from their funding be made available to the public. But did you know that if you complied with your funding agency’s public access policy by depositing your author manuscript in NIH’s PubMed Central (PMC) archive via the NIH Manuscript Submission (NIHMS) system, you can easily obtain statistics on how frequently your paper is being accessed?

Continue reading

Exploring Entrez Direct: Parsing the XML Output of E-utilities

Entrez Direct is a UNIX/LINUX command-line interface to E-utilities, the API to the NCBI Entrez system. One of Entrez Direct’s most useful features is its ability to parse and reformat the complex XML data returned by EFetch. In this post, we will explore how to use these features to parse, reformat and process specific data from PubMed records downloaded in XML using EFetch. Though this post focuses on PubMed, the technique is universal and applies to any XML returned by E-utilities from any database. The example explored here is also presented briefly in the Entrez Direct documentation; here we’ll dive in a bit deeper to see how it works. Let’s get started!
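To make the idea of pulling specific fields out of a PubMed XML record concrete, here is a minimal sketch written in Python with the standard library rather than with Entrez Direct itself. It runs on a small hand-written fragment that follows the PubMed element layout (PubmedArticle, MedlineCitation, PMID, ArticleTitle, AuthorList); the record contents are invented placeholders, and the function names `extract_rows` and `to_tsv` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for EFetch output; all values are placeholders.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000001</PMID>
      <Article>
        <ArticleTitle>Placeholder title one</ArticleTitle>
        <AuthorList>
          <Author><LastName>Doe</LastName><Initials>J</Initials></Author>
          <Author><LastName>Roe</LastName><Initials>RA</Initials></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000002</PMID>
      <Article>
        <ArticleTitle>Placeholder title two</ArticleTitle>
        <AuthorList>
          <Author><LastName>Poe</LastName><Initials>E</Initials></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def extract_rows(xml_text):
    """Return one (pmid, title, [authors]) tuple per PubmedArticle element."""
    root = ET.fromstring(xml_text)
    rows = []
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID", default="")
        title = article.findtext("MedlineCitation/Article/ArticleTitle", default="")
        authors = []
        for author in article.iterfind("MedlineCitation/Article/AuthorList/Author"):
            last = author.findtext("LastName", default="")
            initials = author.findtext("Initials", default="")
            authors.append(f"{last} {initials}".strip())
        rows.append((pmid, title, authors))
    return rows


def to_tsv(rows):
    """Format rows as tab-delimited text, one record per line."""
    return "\n".join(
        "\t".join([pmid, title, ", ".join(authors)]) for pmid, title, authors in rows
    )


if __name__ == "__main__":
    print(to_tsv(extract_rows(SAMPLE_XML)))
```

The sketch follows a general pattern of treating each record as a unit, selecting a few elements by path, and writing one tab-delimited row per record.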

Continue reading

My Bibliography and SciENcv: How to Delegate Authority to Others to Edit/Create Your Profile and Collections

As a My NCBI account holder, you can invite other individuals to act as your delegate and grant them the ability to view and edit your My Bibliography collection (including Other Citations), as well as the ability to view, edit, and create profiles in your SciENcv.

Inviting a Delegate

The first step is to send a delegate invitation from your NCBI Account Settings page. After you’ve logged in to your NCBI account, click on your username in the top right corner of the screen to access your Account Settings. Then, under the “Delegates” section, click “Add a delegate” and enter the email address for your intended recipient. You can have multiple delegates on your account, and you can control what each delegate has access to from the Delegates section of your Account Settings page.

Acting as a Delegate

If a colleague invites you to become a delegate on their NCBI account, you will receive an email invitation. After you’ve accepted the delegation invitation, you will see your colleague’s Bibliography appear in your Collections list on your My NCBI landing page:

Continue reading

Designing exon-specific primers for the human genome

A common task facing geneticists is to assay for sequence changes at particular locations in genes. These assays often look for changes in the coding exons of genes, and the target sequences are typically amplified from genomic DNA by PCR with a pair of specific primers. In this article, we will show you how to use NCBI Reference Sequences and Primer-BLAST, NCBI’s primer designer and specificity checker, to design a pair of primers that will amplify a single exon (exon 15) of the human breast cancer 1 (BRCA1) gene.

Here are the steps to follow to design primers to amplify exon 15 from human BRCA1:

Continue reading

Advice for NIH Grantees: How to comply with the NIH Public Access Policy

“The NIH public access policy requires scientists to submit final peer-reviewed journal manuscripts that arise from NIH funds to PubMed Central immediately upon acceptance for publication.” –

To comply with the NIH Public Access Policy, here are the steps you should take:

Determine if the Public Access Policy applies to your publication

Generally, the NIH Public Access Policy applies to any peer-reviewed journal article that was accepted for publication on or after April 7, 2008 and that arose from NIH funding in Fiscal Year 2008 or later.
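The general rule above can be written as a short check. This Python sketch is only an illustration of the two conditions stated here; the function name `policy_generally_applies` and its parameters are hypothetical, and it does not cover exceptions or the question of what counts as a journal.

```python
from datetime import date

# Thresholds taken from the general rule stated above.
POLICY_START = date(2008, 4, 7)
FIRST_FISCAL_YEAR = 2008


def policy_generally_applies(accepted_on, funding_fiscal_year,
                             peer_reviewed_journal_article=True):
    """Return True if the general rule suggests the policy applies.

    accepted_on: date the article was accepted for publication.
    funding_fiscal_year: fiscal year of the NIH funding, as an integer.
    """
    return (
        peer_reviewed_journal_article
        and accepted_on >= POLICY_START
        and funding_fiscal_year >= FIRST_FISCAL_YEAR
    )


if __name__ == "__main__":
    print(policy_generally_applies(date(2009, 3, 1), 2008))
    print(policy_generally_applies(date(2008, 4, 6), 2008))
```

A result of True means only that the general rule is met; the policy text itself remains the authority for any individual publication.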

What does the NIH consider to be a ‘journal’?

Continue reading

New SciENcv Features Allow Users To Create and Download Multiple Biosketches

NCBI’s recent update to the SciENcv feature in My NCBI gives researchers the ability to create multiple biosketches for grants from federal agencies engaged in scientific research, allowing a more tailored and convenient approach to the grant application process.

What is SciENcv?

SciENcv (Science Experts Network Curriculum Vitae) is designed to help researchers assemble an NIH biosketch by extracting information from NIH eRA Commons and PubMed. The SciENcv interagency working group includes NIH, as well as DOD, DOE, EPA, NSF, USDA and the Smithsonian. You can access SciENcv if you have a My NCBI account. My NCBI accounts are free and offer many useful features, such as saving searches, automated e-mail alerts and My Bibliography.

Create your biosketch

Based on user suggestions, we’ve made it possible to create biosketches in three ways: from scratch, from an external source, or by duplicating an existing profile (see Figure 1). While the eRA Commons data feed is currently the only external data option, we plan on adding other external data sources in a future release of SciENcv.

Figure 1. Three ways to create your NIH biosketches in SciENcv

Continue reading

Sequence updates in human genome assembly GRCh38: filling in the gaps

In a previous blog post, we explained several important concepts about the human reference genome. We presented a region of human chromosome 17 as an example of a location where the genome sequence was not fully assembled. In this post, we are going to revisit the same gapped region to see how the Genome Reference Consortium (GRC) changed this part of the genome in GRCh38, the updated human reference assembly released in December 2013. This region represents just one of the more than 1,000 changes and improvements that the GRC introduced in GRCh38.

Continue reading