June 20 NCBI Minute: Getting the Genomic Context for BLAST Protein Matches


Do you ever want to see the flanking genes of a protein match from a BLAST search? On June 20th, we’ll show you how to see the genomic context of bacterial proteins using the identical protein report and the graphical sequence viewer. You will also learn how to read these reports in detail and how to get genomic contexts in batch for a set of protein matches using the identical protein report and EDirect.

Date and time: Wed, June 20, 2018 12:00 PM – 12:30 PM EDT

Click to register.

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

February 14th NCBI Minute: How to quickly retrieve a sequence from NCBI


On Wednesday, February 14, 2018, NCBI will present a webinar that will show you how to quickly retrieve sequences in any format from NCBI.

Date & time: Wed, Feb 14, 2018 12:00 PM – 12:30 PM EST

Ever need to quickly grab a protein or nucleotide sequence in FASTA or another format from NCBI? This NCBI Minute will show you how to accomplish this using the nucleotide and protein web pages, an NCBI URL, and, most flexibly, the command-line EDirect client that accesses the E-utilities API.
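As a rough illustration of the "NCBI URL" route, the sketch below builds an E-utilities EFetch URL for a single record and, optionally, downloads it. The function names and the example accession are placeholders chosen for illustration, not something taken from the webinar; the `db`, `id`, `rettype`, and `retmode` parameters are standard EFetch parameters.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the E-utilities EFetch service.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nuccore", rettype="fasta"):
    """Return an EFetch URL that requests one record as plain text.

    `accession` is any accession.version the chosen database understands;
    use db="protein" for protein records.
    """
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH_BASE}?{urlencode(params)}"


def fetch_sequence(accession, db="nuccore", rettype="fasta"):
    """Download the record as text (requires network access)."""
    with urlopen(build_efetch_url(accession, db, rettype)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Example accession used purely as a placeholder.
    print(build_efetch_url("NM_000546.6"))
```

Pasting the printed URL into a browser should return the same text that `fetch_sequence` would download; changing `rettype` (for example to `gb`) requests a different format.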

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

Converting Lots of GI Numbers to Accession.version


As you may have read in previous posts, NCBI is in the process of changing the way we handle GI numbers for sequence records. In short, we are moving to a time when accession.version identifiers, rather than GI numbers, will be the primary identifiers for sequence records.

In a previous post, we outlined a method for converting GI numbers (used to identify sequence records) to accession.version identifiers. That method used the E-utility EFetch and is capable of handling cases where you have no more than a few thousand GI numbers to convert.

What if you have more?

We now have a bulk conversion resource that will allow you to handle very large jobs. The resource consists of a Python script coupled with a database file (about 40 GB uncompressed). You’ll need to download both of these files (gi2accession.py and gi2acc_lmdb.gz) to a local disk, and then you can run conversions locally as needed.

Continue reading

Converting GI Numbers to Accession.version


As you may have read in previous posts, NCBI is in the process of changing the way we handle GI numbers for sequence records.

In short, we are moving to a time when accession.version identifiers, rather than GI numbers, will be the primary identifiers for sequence records.

As part of this transition, an obvious question for any of you currently using GI numbers is how to convert a GI number to an accession.version, so that you can make appropriate updates. The good news is that it’s pretty easy if you have no more than a few thousand GIs to convert.
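For a sense of what a small-scale conversion can look like, here is a minimal sketch that splits a list of GI numbers into batches and builds one EFetch URL per batch, using `rettype=acc` to request accession.version identifiers. The helper names and the batch size of 200 are assumptions made for this illustration; the full post describes the recommended procedure.

```python
from urllib.parse import urlencode

# Base address of the E-utilities EFetch service.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def gi_to_accession_urls(gis, db="nuccore", batch_size=200):
    """Build one EFetch URL per batch of GI numbers.

    Each URL asks for rettype=acc, which returns accession.version
    identifiers, one per line. The batch size is an assumed value
    chosen to keep each URL reasonably short.
    """
    urls = []
    for batch in chunked(list(gis), batch_size):
        params = {
            "db": db,
            "id": ",".join(str(gi) for gi in batch),  # comma-separated GI list
            "rettype": "acc",
        }
        urls.append(f"{EFETCH_BASE}?{urlencode(params)}")
    return urls
```

Each URL can then be fetched in turn, pausing between requests to stay within NCBI's usage guidelines. For lists far beyond a few thousand GIs, this approach becomes slow, which is where a bulk resource is the better fit.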

Continue reading

Identifying and Correlating Chemical Names & Synonyms


This blog post is intended for people who refer to chemical names/symbols and synonyms in databases like PubMed and PubChem, or in their own scientific papers. There is a similar post for gene symbols and names.

During the research and publishing process, scientists need to refer to their chemicals-of-interest. While there are standardized nomenclatures (IUPAC, SMILES, InChI™, etc.), different labs sometimes use different names for the same chemical.

The NCBI PubChem project has set up a system to identify and correlate these various names as well as ‘alias’, ‘synonym’, or ‘also known as’ terms that have been used in the literature.

Continue reading

Clearing Up Confusion with Human Gene Symbols & Names Using NCBI Gene Data


This blog post is intended for people who refer to gene symbols or names in databases such as Gene, ClinVar, or PubMed. There is a similar post for chemical names and symbols.

During the research and publishing process, scientists need to refer to their genes-of-interest. However, different labs sometimes use different gene symbols to refer to the same gene. As you can imagine, this leads to confusion.

To standardize the use of terms, the HUGO Gene Nomenclature Committee (HGNC) sets official gene symbols and names. The NCBI Gene resource reports these official gene symbols and names, as well as additional symbols and names that are included on related sequence records for the same gene or from submitted GeneRIFs.

Continue reading