August 14 Webinar: An updated PubMed is on its way!

On Wednesday, August 14, 2019 at 11 AM, NCBI staff will show you PubMed Labs, a test site that will become the default PubMed early next year. You will get a preview of the new, modern interface and its updated features, including advanced search, the clipboard, options for sharing results, and the new “cite” button. You’ll also learn about features that are still under development and how to give us your feedback on the new PubMed.

The August 14 webinar session is full. We will make the recording available and are offering an encore session on August 28, 2019. 

Register for the August 28 session.

Date: Wed, Aug 14, 2019
Time: 11:00 AM – 11:45 AM EDT


After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

The UniGene web pages are now retired

As we previously announced, we planned to retire the UniGene web pages at the end of July 2019. All UniGene pages now redirect to this post. We have also removed links to UniGene from the NCBI home page and other resources.

Although the web pages are no longer available, you will still be able to download the final UniGene builds as static content from the FTP site. You will also be able to match UniGene cluster numbers to Gene records by searching Gene with UniGene cluster numbers. For best results, restrict to the “UniGene Cluster Number” field rather than all fields in Gene. For example, a search with Mm.2108[UniGene Cluster Number] finds the mouse transthyretin Gene record (Ttr). You can use the advanced search page to help construct these searches. Keep in mind that the Gene record contains selected Reference Sequences and GenBank mRNA sequences rather than the larger set of expressed sequences in the UniGene cluster.
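If you prefer to script this lookup, the same fielded query can be sent to the E-utilities ESearch endpoint. The sketch below only builds the request URL and makes no network call. It assumes that ESearch accepts the “UniGene Cluster Number” field name exactly as the Gene web search does; the helper name unigene_to_gene_url is hypothetical.

```python
from urllib.parse import urlencode

# Public E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def unigene_to_gene_url(cluster: str) -> str:
    """Build an ESearch URL that looks up a UniGene cluster number in Gene.

    Hypothetical helper: assumes the field name used on the web site,
    "UniGene Cluster Number", is valid in an E-utilities query as well.
    """
    term = f"{cluster}[UniGene Cluster Number]"
    return f"{ESEARCH}?{urlencode({'db': 'gene', 'term': term})}"


if __name__ == "__main__":
    # Mm.2108 is the example cluster from the text above.
    print(unigene_to_gene_url("Mm.2108"))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return an XML list of matching Gene IDs.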

Please write to us with any comments or concerns, or if you need help using UniGene data.

Evidence for naming the protein now on non-redundant RefSeq records (WP_ accessions)

We are now showing the curated evidence used for assigning names and, if possible, gene symbols, publications, and Enzyme Commission numbers on nearly 70% (83 million) of microbial RefSeq proteins. This evidence includes a hierarchical collection of curated Hidden Markov Model (HMM)-based and BLAST-based protein families, and conserved domain architectures.

On a protein record such as WP_004152100.1, you can follow the link (NF033727.1) in the Evidence Accession field of the Evidence-For-Name-Assignment comment block (Figure 1) to find out more about the naming evidence, including the thresholds used for defining a match and access to all the prokaryotic proteins that match the evidence (Figure 2).

Figure 1: The Evidence-For-Name-Assignment block on WP_004152100.1. The name “arsenite efflux transporter metallochaperone ArsD” is based on its match to the evidence NF033727.1, a Hidden Markov model that defines a family of arsenite efflux transporter metallochaperones. Proteins named for this evidence also inherit publications and a gene symbol (arsD) from NF033727.1.

Figure 2: Naming evidence NF033727.1, a Hidden Markov model. The top part of the page contains a short text description of the protein family defined by the evidence, the thresholds for inclusion in that family, and the publications associated with the protein family. The lower part of the page lists the RefSeq proteins in the family, named either by the present evidence (left tab) or by higher-precedence evidence (right tab). You can filter and download the list too!

Sixty-nine percent of available prokaryotic RefSeq proteins now have the Evidence-For-Name-Assignment comment block. The remaining 31% are not yet covered by the evidence system and are named based on BLAST hits to a non-curated collection of protein cluster representatives.

What does this mean for you?

  • You can better differentiate proteins whose functional annotation is based on curated evidence from those named by BLAST hits to a non-curated database. The query “Evidence-For-Name-Assignment[Properties]” in the Protein resource returns all proteins with names based on curated evidence.
  • You can find and download all archaeal and bacterial proteins that are matched to the same evidence.
  • You can get your publication cited on protein records by providing NCBI with better names for a protein.
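To check from a script how many Protein records carry curated naming evidence, you could send the query quoted above to E-utilities ESearch and ask only for the hit count. This is a minimal sketch that builds the URL without contacting NCBI. The function name evidence_count_url is hypothetical, and the query string is taken verbatim from the list above on the assumption that E-utilities accepts it as the web interface does.

```python
from urllib.parse import urlencode

# Public E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Query string quoted in the post; assumed to work unchanged in E-utilities.
EVIDENCE_QUERY = "Evidence-For-Name-Assignment[Properties]"


def evidence_count_url(query: str = EVIDENCE_QUERY) -> str:
    """Build an ESearch URL that requests only the number of matching proteins."""
    params = {
        "db": "protein",      # search the Protein database
        "term": query,        # fielded Entrez query
        "rettype": "count",   # ESearch returns just the <Count> element
    }
    return f"{ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(evidence_count_url())
```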

We welcome your input! Please send your suggestions and feedback to the NCBI Help Desk.

EST and GSS databases now retired

In July 2018, NCBI announced plans to retire the EST and GSS databases, and we have now implemented these changes. We will continue to accept submissions of EST and GSS sequences, but will no longer provide special processes for these sequence types. If you want to submit EST and GSS data, please use tbl2asn. For further details, please visit or or contact

We thank all past and present submitters of EST and GSS data for the invaluable benefit these data have provided to numerous genomic sequencing projects over the years. Please let us know if you have any questions or concerns about these changes!

A new way to find an expanded set of similar genes

We recently showed you a new way to search for and view sets of orthologous genes from vertebrates. You can now get an additional set of search results that we are calling similar genes. These are related through protein architecture to the orthologous gene set and include genes from all metazoans and selected plant, fungal, and protist species. You can quickly find related genes within a species, compare them to those from other annotated metazoan genomes, and have access to other useful gene resources. To find a set of similar genes, enter a gene symbol or select the gene symbol + orthologs option from the selections menu.

For example, if you search for ‘AGO2 orthologs’, in addition to the link to orthologs from vertebrates, you’ll get a link to a set of similar genes (Genes with similar protein architectures) across a broad evolutionary spectrum that includes genes from invertebrates, fungi, and green plants (Figure 1).

Figure 1. Genes with similar protein architectures to AGO2. The original search was AGO2 orthologs, which brings up the suggestion box with the links to similar genes as well as the AGO2 vertebrate orthologs. The similar genes include entries from a broad taxonomic range of eukaryotic organisms.

If you search for ‘GH1’, you’ll get a link to similar genes that includes members of the growth hormone family that are not part of NCBI’s vertebrate ortholog set.

Figure 2. The human subset of genes with similar protein architectures to GH1 showing other members (paralogs) of the GH1 gene family (GH2, CSH1, CSH2, CSHL1). These are not included in the ortholog set.

Try out the following searches and follow the links to the Genes with similar protein architectures.

Please let us know what you think!

Attention GEO users: Use new GEO FTP subdirectories

On February 1, 2020, NCBI will decommission the following FTP subdirectories for GEO:


Continue reading

RefSeq release 95: naming evidence added to all relevant WP proteins

RefSeq release 95 is accessible online, via FTP, and through NCBI’s Entrez Programming Utilities (E-utilities).

This full release incorporates genomic, transcript, and protein data available as of July 8, 2019, and contains 206,416,381 records, including 146,381,777 proteins, 27,212,750 RNAs, and sequences from 93,618 organisms.

Primer-BLAST now offers help with irrelevant off-target matches

Primer-BLAST, NCBI’s primer-designer and specificity-checker, now offers a way to help you with irrelevant off-target matches.

Sometimes Primer-BLAST can’t design specific primers for your target sequence because of similar non-target sequences in the database. In some cases, you may know that these non-target matches are not important to your research and are safe to ignore. Examples may include tissue-specific splice variants, redundant entries, and predicted sequences. To help in these cases, you can now choose to allow certain off-target matches. This gives Primer-BLAST greater freedom in primer selection and a better chance of finding highly specific primers.

Virus hunting in the cloud: A hackathon story at ASV 2019

Are you going to ASV 2019?

If you are, join us in a few days for a workshop on the virus hunting hackathon we helped run earlier this year.

Session: Workshop #19: Virus Discovery

Program Number: W-19-8

Time: Sunday, July 21, 7:00 PM CDT

Location: Mayo Auditorium

In this workshop, Dr. Rodney Brister will talk about how 41 scientists from 21 organizations worked to improve the usability of SRA data, identifying datasets that included known viruses and viral signals. Not only is that information now being integrated into a public search interface, but the approach used is also being refined in future hackathons so it can be applied to all SRA datasets.

We hope to see you there!

Have you tried OSIRIS, NCBI’s STR analysis tool?

More than 5 years ago, NCBI brought you OSIRIS (Open Source Independent Review and Interpretation System), a free, open-access tool for powerful and intelligent Short Tandem Repeat (STR) analysis.

Short Tandem Repeats (STRs) are repeated short stretches of DNA and are analyzed by measuring the length of the repeated region. They vary from individual to individual and are passed from parent to child. STR analysis is broadly used in medicine, research, and law enforcement – for stem cell transplants, diseases like Huntington’s, verifying research cell lines and samples, determining family relationships, and in criminal cases. In this blog post, we explore how you use OSIRIS in the real world and how your feedback has helped us improve this product.
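To make “measuring the length of the repeated region” concrete, here is a toy sketch that counts the longest run of back-to-back copies of a motif in a DNA string. It is only an illustration of the idea and is not how OSIRIS works. The function name, the GATA motif, and the example sequence are all made up for this example.

```python
import re


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of consecutive copies of `motif` in `sequence`.

    A lookahead is used so that runs are tested from every start position,
    which matters for motifs that can overlap themselves (for example "ATA").
    """
    if not motif:
        raise ValueError("motif must not be empty")
    pattern = re.compile(f"(?=((?:{re.escape(motif)})+))")
    runs = [len(m.group(1)) // len(motif) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


if __name__ == "__main__":
    # Made-up sequence: two flanking bases, four GATA copies, two flanking bases.
    toy = "TTGATAGATAGATAGATACC"
    copies = longest_tandem_run(toy, "GATA")
    print(copies, "copies,", copies * len("GATA"), "bases")  # 4 copies, 16 bases
```

Two individuals who differ at such a locus would show different copy counts, and therefore different lengths for the repeated region.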