You may have heard that NCBI is changing the way we handle GI numbers for sequence records in September 2016. Well, you heard right! Here’s the announcement, in case you missed it.
There are a number of issues raised by these changes, but we’re going to answer two questions in this post:
- What pieces of your code will break in September?
- Are GI numbers gone for good?
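If you want a quick way to spot the identifiers in your own scripts that might be affected, here is a minimal Python sketch. It assumes the usual convention that a GI number is a bare integer and that its replacement is an accession.version string such as `U49845.1`. The function name `classify_identifier` and both patterns are illustrative heuristics, not an official NCBI validation rule.

```python
import re

# Assumption: a GI number is a bare run of digits.
GI_PATTERN = re.compile(r"^\d+$")
# Assumption: accession.version is an accession followed by ".<version>".
ACCVER_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9_]*\.\d+$")


def classify_identifier(identifier):
    """Roughly classify a sequence identifier as 'gi', 'accession.version' or 'other'."""
    text = identifier.strip()
    if GI_PATTERN.match(text):
        return "gi"
    if ACCVER_PATTERN.match(text):
        return "accession.version"
    return "other"


def find_gi_identifiers(identifiers):
    """Return only the identifiers that look like GI numbers."""
    return [i for i in identifiers if classify_identifier(i) == "gi"]
```

Running `find_gi_identifiers` over the identifier lists your pipeline stores or passes to NCBI services gives you a first list of places to review. An unversioned accession such as `U49845` lands in "other", so treat the result as a starting point rather than a complete audit.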
Professors, you’re busy – really busy. You have to develop and teach your courses and laboratory sessions, coordinate your lab’s research efforts, write grants and publications, and stay current on everything related to your teaching and research topics.
NCBI has information that would help most of these efforts – but there are so many interesting records and so little time to organize them for efficient use. Sign up for a free NCBI Account and let us help you organize your important lists!
Figure 1. The My NCBI login page.
Sign up for an NCBI Account – or sign in to your account if you already have one – and:
- Store and automate your searches;
- Save and manage collections of important records for use in coursework, research projects and federal grants;
- Create public lists for students in your courses and your own Faculty Profile;
- And keep track of everything – right on your My NCBI dashboard.
Read on to find out how to do all of these things and more!
The Sequence Read Archive (SRA), NCBI’s largest growing repository of molecular data, archives raw sequencing data and alignment information from high-throughput sequencing platforms, including Roche 454 GS Systems®, Illumina’s Genome Analyzer®, and Complete Genomics® systems.
Researchers commonly use SRA data to make discoveries via comparison of data sets. Data sets can be compared through the SRA web interface, but if you want to integrate these downloads and file conversions into an already existing pipeline, or you simply prefer using a command-line interface, we recommend using the SRA Toolkit.
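To give a feel for how the SRA Toolkit can slot into a pipeline, here is a small Python sketch that builds the command lines for two Toolkit programs: `prefetch` to download a run and `fasterq-dump` to convert it to FASTQ. It assumes the Toolkit is installed and on your PATH. The helper name `build_sra_commands`, the default output directory, and the run-accession pattern are our own illustrative choices.

```python
import re
import shlex

# Assumption: run accessions look like SRR/ERR/DRR followed by digits.
RUN_ACCESSION = re.compile(r"^[SED]RR\d+$")


def build_sra_commands(accession, outdir="fastq"):
    """Return the argument lists for downloading and converting one SRA run."""
    if not RUN_ACCESSION.match(accession):
        raise ValueError(f"not a run accession: {accession!r}")
    return [
        ["prefetch", accession],                          # download the run
        ["fasterq-dump", accession, "--outdir", outdir],  # write FASTQ files
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so you can inspect them first.
    for command in build_sra_commands("SRR000001"):
        print(shlex.join(command))
```

Each argument list can be passed to `subprocess.run(command, check=True)` once you are happy with it. Keeping the commands as lists rather than shell strings avoids quoting problems when accessions come from a file.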
Run Selector is a tool available through the Sequence Read Archive (SRA) that allows you to fine-tune your web-based search results. There are over two dozen fields that can be used to filter SRA data in Run Selector. For example, if you need to look at data from a particular sequencing platform and genome assembly, you can use these fields as filters.
After running a web-based search for any keyword in the SRA database, you can send all the results (up to a maximum of 20,000 experiments) to Run Selector for fine-tuning. In addition, Run Selector shows you how many runs fall into each category even before you select a filter, so you can see what the database contains before narrowing your results.
Figure 1. After searching with SRA, click on “Send to” to open the drop-down menu. Then click on the radio button labeled “Run Selector” to send your search results to Run Selector. Note that you can already see how many runs are in each of the categories to the left.
This article is intended for GenBank data submitters with a basic knowledge of BLAST who submit sequence data from protein-coding genes.
One of the most common problems when submitting DNA or RNA sequence data from protein-coding genes to GenBank is failing to add information about the coding region (often abbreviated as CDS) or incorrectly defining the CDS. Incomplete or incorrect CDS information will prevent you from having accession numbers assigned to your submission data set, but there is a procedure that will help you troubleshoot any problems with the CDS feature annotation: doing a BLAST analysis with your sequences before you submit your data.
Here’s how to use nucleotide BLAST (blastn) and the formatting options menu to analyze, interpret and troubleshoot your submissions:
1. To start the BLAST analysis, go to the BLAST homepage and select “nucleotide blast”.
Figure 1. Select “nucleotide blast”.
This blog post is a continuation of last week’s blog on finding biological assay data; it is intended for researchers who use PubChem.
Your research focuses on a protein (receptor or enzyme) for which you’d like to identify a chemical probe or modulator. The probe could help to identify the subcellular location of a protein. A modulator may help to determine the biological effects of a particular protein’s activity. Additionally, finding a novel chemical that binds to your protein might assist you in exploring the use of a new class of therapeutics in drug design.
At NCBI, the PubChem BioAssay database stores biological activity assay information, which makes it possible to find experimentally measured targets for millions of chemicals. This blog post shows a simple workflow to download a table (with raw and kinetic data) of chemicals that have been determined to bind to a particular gene/protein target.
This blog post is directed toward researchers using PubChem.
You’ve identified a chemical that you’d like to use in your research as a chemical probe for a receptor or as an enzyme inhibitor. However, a chemical can bind to multiple protein targets, a phenomenon commonly known as “cross-reactivity”. In biological activity assays, this can cause problems with measuring the activity of a specific protein or pathway. If the chemical is used as a medication in living organisms, interactions with molecules other than the intended target can cause “side effects”.
At NCBI, the PubChem BioAssay database stores biological activity assay information that makes it possible to find experimentally measured targets for millions of chemicals. This blog post describes a workflow to download a table of gene/protein targets for a particular chemical.
Figure 1. Tamoxifen compound page.
If you’re reading this, you probably already know that NIH and some other institutions have public access policies that require that peer-reviewed publications resulting from their funding be made available to the public. But did you know that if you complied with your funding agency’s public access policy by depositing your author manuscript in NIH’s PubMed Central (PMC) archive via the NIH Manuscript Submission (NIHMS) system, you can easily obtain statistics on how frequently your paper is being accessed?
Entrez Direct is a UNIX/LINUX command-line interface to E-utilities, the API to the NCBI Entrez system. One of Entrez Direct’s most useful features is its ability to parse and reformat the complex XML data returned by EFetch. In this post, we will explore how to use these features to parse, reformat and process specific data from PubMed records downloaded in XML using EFetch. Though this post focuses on PubMed, the technique is universal and applies to any XML returned by E-utilities from any database. The example explored here is also presented briefly in the Entrez Direct documentation; here we’ll dive in a bit deeper to see how it works. Let’s get started!
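If you’d like to see the same idea outside the shell, here is a minimal Python sketch of pulling specific fields out of PubMed XML. It is not Entrez Direct itself; it uses Python’s standard `xml.etree.ElementTree` module on a tiny made-up record. The element names follow the PubMed XML layout, but the record contents, `SAMPLE_XML`, and the helper name `pubmed_rows` are placeholders for illustration only.

```python
import xml.etree.ElementTree as ET

# Made-up record for illustration; real EFetch output has many more elements.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <ArticleTitle>Placeholder title one</ArticleTitle>
        <AuthorList>
          <Author><LastName>Doe</LastName><Initials>J</Initials></Author>
          <Author><LastName>Roe</LastName><Initials>RA</Initials></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def pubmed_rows(xml_text):
    """Return one tab-delimited row (PMID, title, authors) per PubmedArticle."""
    root = ET.fromstring(xml_text)
    rows = []
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID", default="")
        title = article.findtext("MedlineCitation/Article/ArticleTitle", default="")
        authors = []
        for author in article.iterfind("MedlineCitation/Article/AuthorList/Author"):
            last = author.findtext("LastName", default="")
            initials = author.findtext("Initials", default="")
            authors.append(f"{last} {initials}".strip())
        rows.append("\t".join([pmid, title, ", ".join(authors)]))
    return rows


if __name__ == "__main__":
    for row in pubmed_rows(SAMPLE_XML):
        print(row)
```

The pattern is the same one the post describes: choose the repeating element that defines a row (here `PubmedArticle`), then pick out the child elements you want as columns.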
As a My NCBI account holder, you can invite other individuals to act as your delegate and grant them the ability to view and edit your My Bibliography collection (including Other Citations), as well as the ability to view, edit, and create profiles in your SciENcv.
Inviting a Delegate
The first step is to send a delegate invitation from your NCBI Account Settings page. After you’ve logged in to your NCBI account, click on your username in the top right corner of the screen to access your Account Settings. Then, under the “Delegates” section, click “Add a delegate” and enter the email address for your intended recipient. You can have multiple delegates on your account, and you can control what each delegate has access to from the Delegates section of your Account Settings page.
Acting as a Delegate
If a colleague invites you to become a delegate on their NCBI account, you will receive an email invitation. After you’ve accepted the delegation invitation, you will see your colleague’s Bibliography appear in your Collections list on your My NCBI landing page: