NLM Webinar series: “Insider’s Guide to Accessing NLM Data: EDirect for PubMed”


Beginning February 21, 2017, the National Library of Medicine (NLM) will present the three-part webinar series “Insider’s Guide to Accessing NLM Data: EDirect for PubMed.”

This series of workshops will introduce new users to the basics of using EDirect to access exactly the PubMed data you need, in the format you need. Over the course of three 90-minute sessions, you will learn how to use EDirect commands in a Unix environment to access PubMed, design custom output formats, create basic data pipelines to get data quickly and efficiently, and develop simple strategies for solving real-world PubMed data-gathering challenges. No prior Unix knowledge is required; novice users are welcome!

Continue reading

SmartBLAST updated to provide more information, database matches


The SmartBLAST service has recently been updated to emphasize matches to the landmark database, which comprises the proteomes from 26 well-curated genomic assemblies. The display also now presents more information about conserved domains and details about the query.

SmartBLAST quickly finds the closest relatives to a protein query and evaluates the phylogenetic relationship among the query and matched sequences. You can start a SmartBLAST search from the SmartBLAST page or the BLAST home page. Read more about SmartBLAST on NCBI Insights.

New Web Services for Comparing and Grouping Sequence Variants


This blog post is intended for geneticists and dataflow engineers who need to compare genetic variants.

Have you ever tried to determine if two genetic variants are the same? If so, you’re not alone. The task is complicated by competing ways to represent variants, ambiguous assignments, and updates to the underlying sequence models that must be reconciled. To help you with these problems, we’re introducing a new set of web services for comparing and grouping variants.
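To make the "ambiguous assignments" problem concrete, here is a minimal Python sketch. It is an illustration only and does not use the new web services or claim to reproduce how they compare variants. The reference string and the helper names `apply_deletion` and `same_variant` are made up for this example. It shows that a two-base deletion inside a short repeat can be reported at several different positions and still produce the same resulting sequence.

```python
def apply_deletion(reference, start, length):
    """Return the sequence left after deleting `length` bases at 0-based `start`."""
    return reference[:start] + reference[start + length:]


def same_variant(reference, deletion_a, deletion_b):
    """Treat two deletions as equivalent if they yield the same sequence.

    Each deletion is a (start, length) tuple. This is a naive comparison on a
    toy reference, not the method used by any NCBI service.
    """
    return apply_deletion(reference, *deletion_a) == apply_deletion(reference, *deletion_b)


# Toy reference containing a "CA" repeat (hypothetical, not a real sequence record).
reference = "GCACACAT"

# The same biological event described at two different positions in the repeat.
left_aligned = (1, 2)   # delete bases 1-2 ("CA")
right_aligned = (5, 2)  # delete bases 5-6 ("CA")

print(apply_deletion(reference, *left_aligned))
print(apply_deletion(reference, *right_aligned))
print(same_variant(reference, left_aligned, right_aligned))
```

Both deletions leave the same sequence, so a comparison based only on the reported position would wrongly treat them as different variants.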

Continue reading

Visualize and Interpret Alignment Data with the Multiple Sequence Alignment Viewer


The NCBI Multiple Sequence Alignment Viewer (MSAV) is a versatile web application that helps you visualize and interpret MSAs for both nucleotide and amino acid sequences. You can display alignment data from many sources, and the viewer is easily embedded into your own web pages with customizable options. An even simpler way to use MSAV is to use our page, upload your data, and share the link to a fully functional viewer displaying your results.

Continue reading

Converting Lots of GI Numbers to Accession.version


As you may have read in previous posts, NCBI is in the process of changing the way we handle GI numbers for sequence records. In short, we are moving to a time when accession.version identifiers, rather than GI numbers, will be the primary identifiers for sequence records.

In a previous post, we outlined a method for converting GI numbers (used to identify sequence records) to accession.version identifiers. That method used the E-utility EFetch and is capable of handling cases where you have no more than a few thousand GI numbers to convert.

What if you have more?

We now have a bulk conversion resource that will allow you to handle very large jobs. The resource consists of a Python script coupled with a database file (about 40 GB uncompressed). You’ll need to download both of these files (gi2accession.py and gi2acc_lmdb.gz) to local disk, and then you can process as needed.

Continue reading

Converting GI Numbers to Accession.version


As you may have read in previous posts, NCBI is in the process of changing the way we handle GI numbers for sequence records.

In short, we are moving to a time when accession.version identifiers, rather than GI numbers, will be the primary identifiers for sequence records.

As part of this transition, an obvious question for any of you currently using GI numbers is how to convert a GI number to an accession.version, so that you can make appropriate updates. The good news is that it’s pretty easy if you have no more than a few thousand GIs to convert.
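As a rough idea of what a small conversion can look like, here is a minimal Python sketch that sends GI numbers to the EFetch E-utility and asks for accession.version identifiers (`rettype=acc`). The batch size of 200, the default database `nuccore`, and the helper names `batches`, `build_efetch_request`, and `fetch_accessions` are assumptions chosen for illustration, not values taken from the full post.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_efetch_request(gis, db="nuccore"):
    """Return (url, body) for an EFetch POST requesting accession.version output."""
    params = {
        "db": db,                                # assumed default; use "protein" for protein GIs
        "id": ",".join(str(gi) for gi in gis),   # comma-separated list of GI numbers
        "rettype": "acc",                        # ask for accession.version identifiers
    }
    return EFETCH_URL, urlencode(params).encode("ascii")


def fetch_accessions(gis, db="nuccore", batch_size=200):
    """Convert GI numbers to accession.version identifiers, one batch at a time.

    The batch size is an illustrative assumption, not an NCBI recommendation.
    """
    accessions = []
    for batch in batches(list(gis), batch_size):
        url, body = build_efetch_request(batch, db)
        with urlopen(Request(url, data=body)) as response:
            # The response is plain text with one identifier per line.
            accessions.extend(response.read().decode("utf-8").split())
    return accessions
```

Calling `fetch_accessions` with a list of your own GI numbers performs the network requests; `build_efetch_request` can be used on its own to inspect the request before anything is sent.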

Continue reading

Identifying and Correlating Chemical Names & Synonyms


This blog post is intended for people who refer to chemical names/symbols and synonyms in databases like PubMed and PubChem, or in their own scientific papers. There is a similar post for gene symbols and names.

During the research and publishing process, scientists need to refer to their chemicals-of-interest. While there are standardized nomenclatures (IUPAC, SMILES, InChI™, etc.), different labs sometimes use different names for the same chemical.

The NCBI PubChem project has set up a system to identify and correlate these various names as well as ‘alias’, ‘synonym’, or ‘also known as’ terms that have been used in the literature.

Continue reading

Clearing Up Confusion with Human Gene Symbols & Names Using NCBI Gene Data


This blog post is intended for people who refer to gene symbols or names in databases such as Gene, ClinVar, or PubMed. There is a similar post for chemical names and symbols.

During the research and publishing process, scientists need to refer to their genes-of-interest. However, different labs sometimes use different gene symbols to refer to the same gene. As you can imagine, this leads to confusion.

To standardize the use of terms, the HUGO Gene Nomenclature Committee (HGNC) sets official gene symbols and names. The NCBI Gene resource reports these official gene symbols and names, as well as additional symbols and names that are included on related sequence records for the same gene or from submitted GeneRIFs.

Continue reading

NLM In Focus blog profiles Dr. Kim Pruitt, NCBI Staff Scientist


The inaugural article in NLM In Focus’s new series on NLM scientists features Kim Pruitt, PhD. Dr. Pruitt is a staff scientist at NCBI; she heads the Reference Sequence Database, better known as RefSeq.

In the article, Dr. Pruitt shares her career trajectory as well as pearls of wisdom for young scientists.

Click through to read NLM's profile on Kim Pruitt, PhD.


Introducing Magic-BLAST


Magic-BLAST is a new tool for mapping large sets of next-generation RNA or DNA sequencing runs against a whole genome or transcriptome. Magic-BLAST executables for Linux, Mac OS X, and Windows, as well as the source files, are available on the FTP site.

Each alignment optimizes a composite score that takes both reads of a pair into account simultaneously and, in the case of RNA-Seq, locates the candidate introns and adds up the scores of all exons. Sequencing reads can be provided as NCBI SRA accessions or as FASTA or SRA files.

Magic-BLAST implements ideas developed in the NCBI Magic pipeline using the NCBI BLAST libraries. Magic-BLAST is under active development, and we expect the next few releases to occur on a monthly basis. Read more about Magic-BLAST on the FTP site.