Institutional Repositories in PubMed: a new quick way to free full texts


New icons to click-through to free full texts are starting to appear in PubMed. They take you directly to the publication uploaded in an institutional repository (IR). Here’s an example:

DeepBlue

This one is from Deep Blue, University of Michigan’s Library IR. When you see it on a publication like this one on Ebola, you can get free access to the publication there.

The icons only appear when there is no free full text via the journal or PMC (PubMed Central). So far, only four IRs with eligible publications are participating – you can see which ones they are here. Together, they already expand access to around 25,000 publications.

The NCBI program that enables this is LinkOut. You can read more about it in the NLM Technical Bulletin. IRs can apply by email to join LinkOut. And if you are an author at an institution with a repository, support your IR and enable more people to read your work.

Complete RefSeq genome annotation results represented in UCSC genome browser


NCBI’s RefSeq project provides comprehensive annotation of the human and other eukaryotic genomes through a combination of curation and an evidence-based eukaryotic genome annotation pipeline. Our curated records, ‘Known RefSeqs’, can be identified by the accession prefix (NM_, NR_, NG_, NP_). Model RefSeq records (XM_, XR_, and XP_ accession prefixes) are predicted based on transcript evidence (RNA-Seq and more) and protein support from Known RefSeqs, Swiss-Prot, and select INSDC records.
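As a small illustration of the prefix convention described above, here is a minimal Python sketch that labels an accession as a Known or Model RefSeq. The function name `classify_refseq` and the example accession strings are hypothetical and only for illustration. The prefix lists are taken from this post; other RefSeq prefixes exist and are reported here simply as "unknown".

```python
# Prefix groups exactly as listed in the post; any other prefix is "unknown".
KNOWN_PREFIXES = {"NM_", "NR_", "NG_", "NP_"}   # curated 'Known RefSeqs'
MODEL_PREFIXES = {"XM_", "XR_", "XP_"}          # predicted 'Model RefSeqs'


def classify_refseq(accession: str) -> str:
    """Return 'known', 'model', or 'unknown' based on the accession prefix."""
    prefix = accession.strip().upper()[:3]
    if prefix in KNOWN_PREFIXES:
        return "known"
    if prefix in MODEL_PREFIXES:
        return "model"
    return "unknown"


if __name__ == "__main__":
    # Illustrative accession strings only.
    for acc in ["NM_000000.1", "XP_000000.1", "AB000000.1"]:
        print(acc, classify_refseq(acc))
```

A check like this can be handy when filtering an annotation file down to curated records before comparing placements across browsers.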

We recognize that many scientists access genome annotation data from one of three sources – NCBI, Ensembl, or UCSC. NCBI provides access to the human (and other) genome annotation results in the Genome Data Viewer, by BLAST and FTP, and per gene in NCBI’s Gene resource. Ensembl provides RefSeq annotation information based directly on the FTP content that NCBI releases. In the past, UCSC has provided a partial dataset of RefSeq human genome annotation content by aligning Known RefSeq transcripts to the genome using BLAT. With this approach, additional Model RefSeq transcript variants, non-transcribed pseudogenes, and immunoglobulin and T-cell receptor regions were not available through UCSC services. In rare cases, the independent alignment method produced small differences in exon structure compared with NCBI’s placement details, as well as some ambiguous placements for transcripts originating from very similar paralogs that are uniquely placed within the NCBI dataset.

Continue reading

PubMed Citations: A New, Faster Process for Correcting Errors


This blog post is directed toward all authors who have articles in PubMed.

Have you ever discovered that your name isn’t spelled correctly in the citation on a PubMed record, or that there are mistakes in your affiliation, the title of the abstract, or other citation data?

We have good news: NLM recently released the PubMed Data Management System (PMDM), which allows publishers to correct PubMed citation data directly. If you’re an author who has found citation mistakes in PubMed, you should contact the publisher of the journal, and they will make the changes. Changes made in PMDM should appear in PubMed within 1-2 days.

Authors who report citation errors to NLM will be asked to contact the publisher directly. However, NLM will continue to investigate and address error reports that relate to our value-added data, such as MeSH Headings.

We hope that PMDM will make correcting citation errors shorter and simpler. You can read more about PMDM in the NLM Technical Bulletin. Please let us know if you have questions or comments. We’re looking forward to more error-free citations!

Bottlenose dolphin annotation release 101


Annotation Release 101 for the bottlenose dolphin (Tursiops truncatus) is out in RefSeq! This annotation is based on the NIST Tur_tru v1 assembly, which has a four-fold increase in contiguity over the assembly used in the previous annotation. Over four billion RNA-Seq reads from skin and blood tissue were used for gene prediction. As a result of these improvements, the percentage of partially represented protein-coding genes went down from 24% to 4%. More than 2,500 genes that were fragmented in the previous assembly were merged into complete genes. A total of 24,026 genes were annotated, of which 17,096 are protein-coding. A full report on the annotation can be found here.

Continue reading

New video on YouTube: Embed the NCBI Sequence Viewer into Your Pages


The newest video on the NCBI YouTube channel introduces the Sequence Viewer embedding API. A few quick examples illustrate how easy it is to embed Sequence Viewer into your own pages.

Sequence Viewer is a graphical view of sequences and color-coded annotations on regions of sequences stored in the Nucleotide and Protein databases.

Subscribe to the NCBI YouTube channel to receive alerts about new videos ranging from quick tips to full webinar presentations.

NLM Webinar series: “Insider’s Guide to Accessing NLM Data: EDirect for PubMed”


Beginning February 21, 2017, the National Library of Medicine (NLM) will present the three-part webinar series “Insider’s Guide to Accessing NLM Data: EDirect for PubMed.”

This series of workshops will introduce new users to the basics of using EDirect to access exactly the PubMed data they need, in the format they need. Over the course of three 90-minute sessions, participants will learn how to use EDirect commands in a Unix environment to access PubMed, design custom output formats, create basic data pipelines to get data quickly and efficiently, and develop simple strategies for solving real-world PubMed data-gathering challenges. No prior Unix knowledge is required; novice users are welcome!

Continue reading

SmartBLAST updated to provide more information, database matches


The SmartBLAST service has recently been updated to emphasize matches to the landmark database, which comprises the proteomes from 26 well-curated genomic assemblies. The display also now presents more information about conserved domains and details about the query.

SmartBLAST quickly finds the closest relatives to a protein query and evaluates the phylogenetic relationship among the query and matched sequences. You can start a SmartBLAST search from the SmartBLAST page or the BLAST home page. Read more about SmartBLAST on NCBI Insights.

New Web Services for Comparing and Grouping Sequence Variants


This blog post is intended for geneticists and dataflow engineers who need to compare genetic variants.

Have you ever tried to determine if two genetic variants are the same? If so, you’re not alone. The task involves dealing with competing ways to represent variants, handling ambiguous assignments, and reconciling updates to underlying sequence models. To help you with these problems, we’re introducing a new set of web services for comparing and grouping variants.

Continue reading

Visualize and Interpret Alignment Data with the Multiple Sequence Alignment Viewer


The NCBI Multiple Sequence Alignment Viewer (MSAV) is a versatile web application that helps you visualize and interpret MSAs for both nucleotide and amino acid sequences. You can display alignment data from many sources, and the viewer is easily embedded into your own web pages with customizable options. An even simpler way to use MSAV is to use our page, upload your data, and share the link to a fully functional viewer displaying your results.

Continue reading

Converting Lots of GI Numbers to Accession.version


As you may have read in previous posts, NCBI is in the process of changing the way we handle GI numbers for sequence records. In short, we are moving to a time when accession.version identifiers, rather than GI numbers, will be the primary identifiers for sequence records.

In a previous post, we outlined a method for converting GI numbers (used to identify sequence records) to accession.version identifiers. That method used the E-utility EFetch and is capable of handling cases where you have no more than a few thousand GI numbers to convert.
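For readers who want a feel for the EFetch approach, the following is a rough Python sketch, not the exact method from the earlier post. It assumes the public E-utilities `efetch.fcgi` endpoint with `rettype=acc`, which is documented to return accession.version identifiers. The helper names, the `nuccore` default, and the batch size of 200 are illustrative assumptions. Because GI numbers are being phased out, the service's behavior may differ from what is sketched here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Public E-utilities EFetch endpoint (assumed; check current NCBI documentation).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def batches(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_query(gis, db="nuccore"):
    """Encode one batch of GI numbers as EFetch form parameters."""
    return urlencode({
        "db": db,
        "id": ",".join(str(gi) for gi in gis),
        "rettype": "acc",  # request accession.version, one per line
    })


def gi_to_accession(gis, db="nuccore", batch_size=200):
    """Convert GI numbers to accession.version strings, one request per batch.

    The batch size is an arbitrary illustrative choice, not an NCBI limit.
    """
    accessions = []
    for chunk in batches(list(gis), batch_size):
        # Passing `data` makes this an HTTP POST, which suits long ID lists.
        with urlopen(EFETCH_URL, data=build_query(chunk, db).encode("ascii")) as resp:
            accessions.extend(resp.read().decode("utf-8").split())
    return accessions
```

Calling `gi_to_accession(my_gi_list)` would issue one network request per batch, so a sketch like this is only reasonable for lists of up to a few thousand GI numbers.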

What if you have more?

We now have a bulk conversion resource that will allow you to handle very large jobs. The resource consists of a Python script coupled with a database file (about 40 GB uncompressed). You’ll need to download both of these files (gi2accession.py and gi2acc_lmdb.gz) to local disk, and then you can process as needed.

Continue reading