SRA Toolkit: the SRA database at your fingertips

The Sequence Read Archive (SRA), NCBI’s largest growing repository of molecular data, archives raw sequencing data and alignment information from high-throughput sequencing platforms, including Roche 454 GS Systems®, Illumina’s Genome Analyzer®, and Complete Genomics® systems.

Researchers commonly use SRA data to make discoveries by comparing data sets. Data sets can be compared through the SRA web interface, but if you want to integrate SRA downloads and file conversions into an existing pipeline, or you simply prefer a command-line interface, we recommend using the SRA Toolkit.
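As a rough illustration of what "integrating into a pipeline" can look like, here is a minimal Python sketch that assembles the command lines for two SRA Toolkit programs, `prefetch` (download a run) and `fasterq-dump` (convert it to FASTQ). The function name `build_sra_commands`, the output directory, and the example accession are assumptions made for this sketch; check the options against the toolkit version you have installed.

```python
import shlex


def build_sra_commands(run_accession, out_dir="fastq"):
    """Return argument lists for downloading one run and converting it to FASTQ.

    Hypothetical helper for illustration; it only builds the commands
    and does not execute them.
    """
    # Run accessions start with SRR, ERR, or DRR depending on the archive.
    if run_accession[:3] not in ("SRR", "ERR", "DRR"):
        raise ValueError(f"not a run accession: {run_accession!r}")

    prefetch_cmd = ["prefetch", run_accession]
    convert_cmd = [
        "fasterq-dump", run_accession,
        "--outdir", out_dir,   # where the FASTQ files are written
        "--split-files",       # separate files for paired reads
    ]
    return [prefetch_cmd, convert_cmd]


if __name__ == "__main__":
    # "SRR000001" is used here only as an example accession.
    for cmd in build_sra_commands("SRR000001"):
        print(shlex.join(cmd))
```

In a real pipeline, each argument list could be passed to `subprocess.run(cmd, check=True)` once the toolkit is installed and on your `PATH`.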

Continue reading

Fine-tune your web-based search results with SRA Run Selector

Run Selector is a tool available through the Sequence Read Archive (SRA) that allows you to fine-tune your web-based search results. There are over two dozen fields that can be used to filter SRA data in Run Selector. For example, if you need to look at data from a particular sequencing platform and genome assembly, you can use these fields as filters.

After running a web-based search for any keyword in the SRA database, you can send all the results (up to a maximum of 20,000 experiments) to Run Selector for fine-tuning. In addition, Run Selector shows how many runs fall into each category even before you select a filter, so you can see what the database contains before narrowing your results.

Figure 1. After searching with SRA, click on “Send to” to open the drop-down menu. Then click on the radio button labeled “Run Selector” to send your search results to Run Selector. Note that you can already see how many runs are in each of the categories to the left.

Continue reading

Troubleshooting GenBank Submissions: Annotating the Coding Region (CDS)

This article is intended for GenBank data submitters with a basic knowledge of BLAST who submit sequence data from protein-coding genes.

One of the most common problems when submitting DNA or RNA sequence data from protein-coding genes to GenBank is omitting information about the coding region (often abbreviated as CDS) or defining the CDS incorrectly. Incomplete or incorrect CDS information will prevent accession numbers from being assigned to your submission data set. You can troubleshoot problems with the CDS feature annotation by running a BLAST analysis on your sequences before you submit your data.

Here’s how to use nucleotide BLAST (blastn) and the formatting options menu to analyze, interpret and troubleshoot your submissions:

1. To start the BLAST analysis, go to the BLAST homepage and select “nucleotide blast”.

Figure 1. Select “nucleotide blast”.

Continue reading

Finding Chemical Probes and Modulators – The Hunt for New Chemical Reagents and Medicines

This blog post is a continuation of last week’s blog on finding biological assay data; it is intended for researchers who use PubChem.

Your research focuses on a protein (receptor or enzyme) for which you’d like to identify a chemical probe or modulator. The probe could help to identify the subcellular location of a protein. A modulator may help to determine the biological effects of a particular protein’s activity. Additionally, finding a novel chemical that binds to your protein might assist you in exploring the use of a new class of therapeutics in drug design.

At NCBI, the PubChem BioAssay database stores biological activity assay information, which makes it possible to find experimentally measured targets for millions of chemicals. This blog post shows a simple workflow to download a table (with raw and kinetic data) of chemicals that have been determined to bind to a particular gene/protein target.

Continue reading

Identifying Chemical Targets – Finding Potential Cross-Reactions and Predicting Side Effects

This blog post is directed toward researchers using PubChem.

You’ve identified a chemical that you’d like to use in your research as a chemical probe for a receptor or as an enzyme inhibitor. However, chemicals can bind to multiple protein targets, a phenomenon commonly known as “cross-reactivity”. In biological activity assays, this can make it difficult to measure the activity of a specific protein or pathway. If the chemical is used as a medication in living organisms, interactions with molecules other than the intended target can cause “side effects”.

At NCBI, the PubChem BioAssay database stores biological activity assay information that makes it possible to find experimentally measured targets for millions of chemicals. This blog post describes a workflow to download a table of gene/protein targets for a particular chemical.
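For readers who prefer scripting, a similar table can be requested programmatically. The sketch below builds a URL for PubChem's PUG REST service that asks for the assay summary of a compound looked up by name. This is an alternative route and not the web workflow described in this post; the helper name `assay_summary_url` is hypothetical, and the exact columns returned should be checked against the PUG REST documentation.

```python
from urllib.parse import quote

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def assay_summary_url(compound_name, output_format="CSV"):
    """Build a PUG REST URL for a compound's bioassay summary table.

    Hypothetical helper for illustration; it only builds the URL and
    makes no network request.
    """
    name = compound_name.strip()
    if not name:
        raise ValueError("compound_name must not be empty")
    # Percent-encode the name so spaces and slashes are safe in the URL path.
    return (
        f"{PUG_REST_BASE}/compound/name/{quote(name, safe='')}"
        f"/assaysummary/{output_format}"
    )


if __name__ == "__main__":
    print(assay_summary_url("tamoxifen"))
```

The resulting URL can be opened in a browser or fetched with any HTTP client to download the table.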

Figure 1. Tamoxifen compound page.

Continue reading

NIHMS Users: Do You Know How Often Your Paper is Being Accessed Via PMC? Here’s How to Find Out.

If you’re reading this, you probably already know that NIH and some other institutions have public access policies that require that peer-reviewed publications resulting from their funding be made available to the public. But did you know that if you complied with your funding agency’s public access policy by depositing your author manuscript in NIH’s PubMed Central (PMC) archive via the NIH Manuscript Submission (NIHMS) system, you can easily obtain statistics on how frequently your paper is being accessed?

Continue reading

Exploring Entrez Direct: Parsing the XML Output of E-utilities

Entrez Direct is a UNIX/Linux command-line interface to E-utilities, the API to the NCBI Entrez system. One of Entrez Direct’s most useful features is its ability to parse and reformat the complex XML data returned by EFetch. In this post, we will explore how to use these features to parse, reformat and process specific data from PubMed records downloaded in XML using EFetch. Though this post focuses on PubMed, the technique is universal and applies to any XML returned by E-utilities from any database. The example explored here is also presented briefly in the Entrez Direct documentation; here we’ll dive in a bit deeper to see how it works. Let’s get started!
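To give a feel for the kind of parsing involved, here is a small Python sketch that pulls the PMID, title, and author names out of PubMed-style XML using only the standard library. It is an analogy to what Entrez Direct does on the command line, not Entrez Direct itself; the sample record is invented for illustration and contains only a few of the elements a real EFetch record would have, and the function name `extract_rows` is hypothetical.

```python
import xml.etree.ElementTree as ET

# A made-up, heavily trimmed record in the shape of PubMed XML.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000001</PMID>
      <Article>
        <ArticleTitle>An illustrative title</ArticleTitle>
        <AuthorList>
          <Author><LastName>Doe</LastName><Initials>J</Initials></Author>
          <Author><LastName>Roe</LastName><Initials>RA</Initials></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def extract_rows(xml_text):
    """Return one (pmid, title, [authors]) tuple per PubmedArticle element."""
    root = ET.fromstring(xml_text)
    rows = []
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID", default="")
        title = article.findtext("MedlineCitation/Article/ArticleTitle", default="")
        authors = []
        for author in article.iter("Author"):
            last = author.findtext("LastName", default="")
            initials = author.findtext("Initials", default="")
            authors.append(f"{last} {initials}".strip())
        rows.append((pmid, title, authors))
    return rows


if __name__ == "__main__":
    # Print one tab-delimited line per record.
    for pmid, title, authors in extract_rows(SAMPLE_XML):
        print("\t".join([pmid, title, ", ".join(authors)]))
```

Each record becomes one tab-delimited row, which is the same general idea as selecting elements from the XML and arranging them into columns.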

Continue reading

My Bibliography and SciENcv: How to Delegate Authority to Others to Edit/Create Your Profile and Collections

As a My NCBI account holder, you can invite other individuals to act as your delegate and grant them the ability to view and edit your My Bibliography collection (including Other Citations), as well as the ability to view, edit, and create profiles in your SciENcv.

Inviting a Delegate

The first step is to send a delegate invitation from your NCBI Account Settings page. After you’ve logged in to your NCBI account, click on your username in the top right corner of the screen to access your Account Settings. Then, under the “Delegates” section, click “Add a delegate” and enter the email address for your intended recipient. You can have multiple delegates on your account, and you can control what each delegate has access to from the Delegates section of your Account Settings page.

Acting as a Delegate

If a colleague invites you to become a delegate on their NCBI account, you will receive an email invitation. After you’ve accepted the delegation invitation, you will see your colleague’s Bibliography appear in your Collections list on your My NCBI landing page:

Continue reading

Designing exon-specific primers for the human genome

A common task facing geneticists is to assay for sequence changes at particular locations in genes. These assays often look for changes in the coding exons of genes, and the target sequences are typically amplified from genomic DNA by PCR using a pair of specific primers. In this article, we will show you how to use NCBI Reference Sequences and Primer-BLAST, NCBI’s primer designer and specificity checker, to design a pair of primers that will amplify a single exon (exon 15) of the human breast cancer 1 (BRCA1) gene.

Here are the steps to follow to design primers to amplify exon 15 from human BRCA1:

Continue reading

Advice for NIH Grantees: How to comply with the NIH Public Access Policy

“The NIH public access policy requires scientists to submit final peer-reviewed journal manuscripts that arise from NIH funds to PubMed Central immediately upon acceptance for publication.” –

To comply with NIH Public Access Policy, here are the steps you should take:

Determine if the Public Access Policy applies to your publication

Generally, the NIH Public Access Policy applies to any peer-reviewed journal article that was accepted for publication on or after April 7, 2008 and that arose from NIH funding in Fiscal Year 2008 or later.
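The general rule above can be written as a simple check. This is only a sketch of the rule of thumb as stated here, with hypothetical names (`policy_applies`, `POLICY_START`, `FIRST_FISCAL_YEAR`); it does not capture what counts as a journal or any other detail of the policy.

```python
from datetime import date

POLICY_START = date(2008, 4, 7)   # acceptance date cutoff stated above
FIRST_FISCAL_YEAR = 2008          # earliest NIH funding fiscal year covered


def policy_applies(peer_reviewed, accepted_on, funding_fiscal_year):
    """Rule-of-thumb check: does the policy generally apply to this article?"""
    return bool(
        peer_reviewed
        and accepted_on >= POLICY_START
        and funding_fiscal_year >= FIRST_FISCAL_YEAR
    )


if __name__ == "__main__":
    print(policy_applies(True, date(2009, 1, 15), 2008))
    print(policy_applies(True, date(2008, 4, 6), 2008))
```

All three conditions must hold, so an article accepted the day before the cutoff, or funded only from an earlier fiscal year, falls outside the general rule.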

Determine Applicability for Your Publication

What does the NIH consider to be a ‘journal’?

Continue reading