The first, “Using the SRA RunSelector to Find NGS Datasets”, shows you how to filter the SRA database using metadata details from submitted datasets.
Looking for an old NCBI News story? Check the NCBI News Archive on the Bookshelf. You can browse or search the archive for every News posting we’ve ever made!
As you may have read in previous posts, NCBI is in the process of changing the way we handle GI numbers for sequence records. In short, we are moving to a time when accession.version identifiers, rather than GI numbers, will be the primary identifiers for sequence records.
In a previous post, we outlined a method for converting GI numbers (used to identify sequence records) to accession.version identifiers. That method used the E-utility EFetch and is capable of handling cases where you have no more than a few thousand GI numbers to convert.
We now have a bulk conversion resource that will allow you to handle very large jobs. The resource consists of a Python script coupled with a database file (about 40 GB uncompressed). You’ll need to download both of these files (gi2accession.py and gi2acc_lmdb.gz) to local disk, and then you can process as needed.
As you may have read in previous posts, NCBI is in the process of changing the way we handle GI numbers for sequence records.
In short, we are moving to a time when accession.version identifiers, rather than GI numbers, will be the primary identifiers for sequence records.
As part of this transition, an obvious question for any of you currently using GI numbers is how to convert a GI number to an accession.version, so that you can make appropriate updates. The good news is that it’s pretty easy if you have no more than a few thousand GIs to convert.
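For small jobs like this, the EFetch-based approach mentioned above can be sketched in a few lines of Python. This is a minimal sketch, not NCBI's own script: the helper names (`chunked`, `build_efetch_params`, `build_efetch_url`, `gi_to_accession`) and the batch size of 200 are illustrative assumptions, and the code assumes that EFetch with `rettype=acc` returns one accession.version per line in the same order as the submitted GIs.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Public E-utilities EFetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_efetch_params(gis, db="nuccore"):
    """Return the EFetch parameters that request accession.version output."""
    return {"db": db, "id": ",".join(str(gi) for gi in gis), "rettype": "acc"}


def build_efetch_url(gis, db="nuccore"):
    """Return a GET URL for a short list of GIs (handy for checking by hand)."""
    return EFETCH_URL + "?" + urlencode(build_efetch_params(gis, db))


def gi_to_accession(gis, db="nuccore", batch_size=200):
    """Map each GI to its accession.version (performs network requests).

    Assumption: results come back one per line, in input order. If the
    counts differ (for example, because a GI is invalid), we stop rather
    than risk pairing GIs with the wrong accessions.
    """
    mapping = {}
    for batch in chunked(list(gis), batch_size):
        # POST keeps long ID lists out of the URL.
        data = urlencode(build_efetch_params(batch, db)).encode("ascii")
        with urlopen(Request(EFETCH_URL, data=data)) as response:
            accessions = response.read().decode("utf-8").split()
        if len(accessions) != len(batch):
            raise ValueError("EFetch returned a different number of records")
        mapping.update(zip(batch, accessions))
    return mapping
```

Calling `build_efetch_url([663070995, 568815587])` gives a URL you can paste into a browser to check the output format before running `gi_to_accession` on a longer list. For lists well beyond a few thousand GIs, the bulk conversion resource is the better fit.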
This blog post is intended for people who refer to chemical names/symbols and synonyms in databases like PubMed and PubChem, or in their own scientific papers. There is a similar post for gene symbols and names.
During the research and publishing process, scientists need to refer to their chemicals-of-interest. While there are standardized nomenclatures (IUPAC, SMILES, InChI™, etc.), different labs sometimes use different names for the same chemical.
The NCBI PubChem project has set up a system to identify and correlate these various names as well as ‘alias’, ‘synonym’, or ‘also known as’ terms that have been used in the literature.
This blog post is intended for people who refer to gene symbols or names in databases such as Gene, ClinVar, or PubMed. There is a similar post for chemical names and symbols.
During the research and publishing process, scientists need to refer to their genes-of-interest. However, different labs sometimes use different gene symbols to refer to the same gene. As you can imagine, this leads to confusion.
To standardize the use of terms, the HUGO Gene Nomenclature Committee (HGNC) sets official gene symbols and names. The NCBI Gene resource reports these official gene symbols and names, as well as additional symbols and names that are included on related sequence records for the same gene or from submitted GeneRIFs.
You may have heard that NCBI is changing the way we handle GI numbers for sequence records in September 2016. Well, you heard right! Here’s the announcement, in case you missed it.
There are a number of issues raised by these changes, but we’re going to answer two questions in this post:
Professors, you’re busy – really busy. You have to develop and teach your courses and laboratory sessions, coordinate your lab’s research efforts, write grants and publications, and stay current on everything related to your teaching and research topics.
NCBI has information that would help most of these efforts – but there are so many interesting records and so little time to organize them for efficient use. Sign up for a free NCBI Account and let us help you organize your important lists!
Sign up for an NCBI Account – or sign in to your account if you already have one – and:
Read on to find out how to do all of these things and more!
The Sequence Read Archive (SRA), NCBI’s largest growing repository of molecular data, archives raw sequencing data and alignment information from high-throughput sequencing platforms, including Roche 454 GS Systems®, Illumina’s Genome Analyzer®, and Complete Genomics® systems.
Researchers commonly use SRA data to make discoveries by comparing data sets. Data sets can be compared through the SRA web interface, but if you want to integrate data downloads and file conversions into an existing pipeline, or you simply prefer a command-line interface, we recommend using the SRA Toolkit.
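As a rough illustration of what pipeline integration might look like, here is a minimal Python sketch that wraps two SRA Toolkit commands, `prefetch` (download a run) and `fastq-dump` (convert it to FASTQ). The function names, the output directory, and the run accession in the usage note are placeholders chosen for this example, and the sketch assumes the toolkit is already installed and on your `PATH`.

```python
import shutil
import subprocess


def build_sra_commands(accession, outdir="."):
    """Return the toolkit commands to download one run and convert it to FASTQ."""
    return [
        # Download the run into the toolkit's local cache.
        ["prefetch", accession],
        # Convert to FASTQ, writing paired reads to separate files.
        ["fastq-dump", "--split-files", "--outdir", outdir, accession],
    ]


def run_sra_download(accession, outdir="."):
    """Run each command in order, stopping at the first failure."""
    for command in build_sra_commands(accession, outdir):
        if shutil.which(command[0]) is None:
            raise RuntimeError(f"{command[0]} not found; is the SRA Toolkit installed?")
        subprocess.run(command, check=True)
```

For example, `run_sra_download("SRR000001", outdir="reads")` would fetch that run and write its FASTQ files to a `reads` directory. Keeping command construction separate from execution makes it easy to log or review the commands before anything is downloaded.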