This blog post is directed toward researchers using PubChem.
You’ve identified a chemical that you’d like to use in your research as a chemical probe for a receptor or as an enzyme inhibitor. However, a chemical can bind to multiple protein targets, a behavior commonly known as “cross-reactivity”. In biological activity assays, this can interfere with measuring the activity of a specific protein or pathway. If the chemical is used as a medication in living organisms, interactions with molecules other than the intended target can cause “side effects”.
At NCBI, the PubChem BioAssay database stores biological activity assay information that makes it possible to find experimentally measured targets for millions of chemicals. This blog post describes a workflow to download a table of gene/protein targets for a particular chemical.
Starting on the compound page for a chemical, like Tamoxifen:
1. In the Table of Contents on the left side of the page, click the link for the Biological Test Results section or scroll down to that section. For Tamoxifen, this is section number 15.
2. Click “Refine/Analyze”, then click “BioActivity Analysis Tool” to see a summary table of biological activity assay data. NOTE: The total number of BioAssays is shown above the table, along with the number of Data Rows included. These two numbers may not be the same, as there are sometimes replicates within a particular assay.
3. You can filter the table to include only active BioActivity Outcomes. To do this, click “Active”. Please note that each assay has its own definition of “active”, which is supplied by the assay submitter.
If you want to, you can also slide the bar over and narrow the list to a single BioActivity Type measured (Ki, IC50, AC50, Potency), defined here.
4. To download the table, click the “Data download” link on the right-hand side of the page. This will download the full table in CSV format with a file name in this format: CID_2733526_assaydata.csv.
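If you would rather script the download than click through the page, PubChem's PUG REST service offers an assay summary for a compound. The sketch below is not the workflow described in this post; it is a minimal alternative that assumes the PUG REST `assaysummary` operation with CSV output. The function names and the output file name are hypothetical.

```python
import urllib.request

# Base URL of PubChem's PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def assay_summary_url(cid: int) -> str:
    """Build the PUG REST URL for a compound's assay summary in CSV form."""
    return f"{PUG_REST}/compound/cid/{cid}/assaysummary/CSV"


def download_assay_summary(cid: int, path: str) -> None:
    """Fetch the assay summary for `cid` and save the raw bytes to `path`."""
    with urllib.request.urlopen(assay_summary_url(cid), timeout=60) as resp:
        data = resp.read()
    with open(path, "wb") as fh:
        fh.write(data)


# Example call for Tamoxifen (hypothetical output file name):
# download_assay_summary(2733526, "CID_2733526_assaysummary.csv")
```

A file fetched this way may not have the same columns or layout as the table downloaded from the BioActivity Analysis Tool, so check its header row before relying on it.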
The first row of the downloaded table states that the table shows PubChem BioAssay data for the chemical, giving its name (such as Tamoxifen) and its CID (for example, 2733526). Columns include these types of data:
- Row #
- Outcome – as defined by the assay submitter
- Activity Type Measured
- Activity Concentration [in uM]
- BioAssay Title
- BioAssay Type
- Gene/Protein Target (with Protein sequence ID) – as provided by the assay submitter. NOTE: Top Targets lists the Gene/Protein targets identified in each particular assay, as submitted by the assay labs. These are often not standardized with Official Gene symbols. The full Gene/Protein name and a Protein sequence ID are listed in the “Target” column of the table.
- Relevant PMID – as provided by the assay submitter
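Once the file is on disk, a few lines of Python can pull out the targets from the active rows. This is a minimal sketch under stated assumptions: the first line of the file is the descriptive sentence, the second line is the header, and the header names `Outcome` and `Target` are guesses based on the column list above, so adjust them to match your file. The sample rows are made-up placeholders, not real PubChem data.

```python
import csv
import io


def read_active_rows(handle, outcome_column="Outcome"):
    """Return rows whose outcome is 'Active'.

    Assumes line 1 is a descriptive sentence and line 2 is the CSV header.
    The column name is an assumption; check it against your own file.
    """
    next(handle, None)  # skip the descriptive first line
    reader = csv.DictReader(handle)
    return [
        row for row in reader
        if (row.get(outcome_column) or "").strip().lower() == "active"
    ]


# Placeholder data that only mimics the assumed layout (not real results).
sample = io.StringIO(
    "Description line (placeholder)\n"
    "Outcome,Activity Type Measured,Target\n"
    "Active,IC50,PLACEHOLDER_TARGET_A\n"
    "Inactive,Potency,PLACEHOLDER_TARGET_B\n"
)

rows = read_active_rows(sample)
# Collect the distinct targets reported for active rows.
targets = sorted({row["Target"] for row in rows})
print(targets)
```

For a real download, pass a file handle instead of the sample, for example `open("CID_2733526_assaydata.csv", newline="", encoding="utf-8")`. Because targets are often not standardized to Official Gene symbols, the same protein may appear under more than one name in the result.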
Next week, we’ll post another example of exploring biological assay data in PubChem BioAssay. In that post, we’ll start with a specific gene and show you how to download a table containing experimental data on “active” compounds targeting the gene product. This information can be useful in the discovery of new chemical probes or lead compounds for drug design.