Tag: Gene Ontology (GO)

NCBI hidden Markov models (HMM) release 10.0 now available!

Release 10.0 of the NCBI hidden Markov models (HMMs) used by the Prokaryotic Genome Annotation Pipeline (PGAP) is now available for download. Using the HMMER sequence analysis package, you can search this collection against your favorite prokaryotic proteins to identify their function.
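
As a rough illustration of such a search, the sketch below assembles an hmmsearch command line in Python without running it. The file names are hypothetical placeholders for the downloaded HMM library and your own protein FASTA file, and the E-value threshold and thread count are arbitrary assumptions, not values recommended in this post.

```python
import shlex


def build_hmmsearch_command(hmm_file, protein_fasta, table_out, evalue=1e-5, cpus=4):
    """Return the argument list for an hmmsearch run (nothing is executed here)."""
    return [
        "hmmsearch",
        "--tblout", table_out,   # write one line per reported protein to this table
        "-E", str(evalue),       # report proteins at or below this E-value (assumed threshold)
        "--cpu", str(cpus),      # number of worker threads (assumed value)
        hmm_file,                # the HMM collection: these are the query profiles
        protein_fasta,           # your protein sequences: this is the target database
    ]


if __name__ == "__main__":
    # All three file names are hypothetical placeholders.
    cmd = build_hmmsearch_command("ncbi_hmms.lib", "my_proteins.faa", "hits.tbl")
    print(shlex.join(cmd))
```

Printing the command rather than executing it lets you inspect it first; passing the same list to subprocess.run would start the search, provided HMMER is installed.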

The 10.0 release contains 15,360 models maintained by NCBI, including 228 that are new since 9.0, 99 that were modified significantly, and 205 that were assigned better names, EC numbers, Gene Ontology (GO) terms, gene symbols or publications. You can search and view the details for these in the Protein Family Model collection, which also includes conserved domain architectures and BlastRules, and find all RefSeq proteins they name.

GO terms associated with HMMs are now propagated to CDSs and proteins annotated with PGAP. In case you missed it, see our previous blog post on this topic.

Announcing new links and annotations on Conserved Domain Search results!

Conserved Domain Search (CD Search) results now show domain architecture information and other annotations that further characterize predicted domain and protein function. These include links to PubMed, Gene Ontology (GO) terms, Enzyme Commission (EC) numbers, and the SPARCLE Domain Architecture Viewer. You can use these links on the results to find literature (PubMed), assign biological roles and protein function (GO and EC), and find proteins with the same domain architecture (Domain Architecture Viewer). These annotations are currently available for a limited number of architectures, but we will continue to add them as part of our curation effort.

Figure 1 shows the results of an example CD Search with these new links. Note that you can use the GO and EC information provided to retrieve protein models with these annotations from the Protein Family Models database, for example GO:0030246[GOTermId] (molecular function: carbohydrate binding) or 2.7.11.1[ECNumber] (non-specific serine/threonine protein kinase).
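
For readers who prefer to script such queries, here is a minimal sketch that formats fielded search terms like the two above and wraps them in an E-utilities esearch URL. The field names GOTermId and ECNumber come from the examples in this post; the database name "protfam" is an assumption about how the Protein Family Models database is exposed through E-utilities and should be checked before use.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def fielded_term(value, field):
    """Format a fielded search term, e.g. GO:0030246[GOTermId]."""
    return f"{value}[{field}]"


def build_esearch_url(terms, db="protfam", operator="OR"):
    """Combine fielded terms with a Boolean operator and build an esearch URL.

    The default db name "protfam" is an assumption, not confirmed by this post.
    """
    query = f" {operator} ".join(terms)
    return f"{ESEARCH_URL}?{urlencode({'db': db, 'term': query})}"


if __name__ == "__main__":
    terms = [
        fielded_term("GO:0030246", "GOTermId"),
        fielded_term("2.7.11.1", "ECNumber"),
    ]
    print(build_esearch_url(terms))
```

The same fielded terms can also be pasted directly into the search box of the Protein Family Models web page.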

Figure 1. Conserved Domain Database search results for a hypothetical protein (XP_007132600.1) from the common bean (Phaseolus vulgaris). The results classify the protein as a plant receptor-like protein kinase. The results also show the EC number and the GO terms associated with this domain architecture, a link to a PubMed citation for the protein family (receptor-like protein kinases), and a link to the Domain Architecture Viewer for G-type lectin S-receptor-like serine/threonine-protein kinases. The Domain Architecture Viewer shows other proteins from the NCBI databases with the same domain architecture (order, number and types of domains).

New in RAPT: Better taxonomic assignment and GO annotation

We are excited to announce two improvements to the Read assembly and Annotation Pipeline Tool (RAPT), which allows you to assemble genomic reads for bacterial or archaeal isolates and annotate their genes at the click of a button.

Improved taxonomic assignment

Now RAPT verifies the scientific name you provide with the reads, and corrects it as needed with the Average Nucleotide Identity (ANI) tool, which compares your genome to type strain assemblies in GenBank to place it in the taxonomic tree. So, even if you only have a rough idea of the species you have sequenced, input datasets tailored to your genome will be used for the annotation and you will get the best possible gene set from RAPT.

NCBI hidden Markov models (HMM) release 8.0 now available!

Release 8.0 of the NCBI hidden Markov models (HMMs), used by the Prokaryotic Genome Annotation Pipeline (PGAP), is now available for download. Using the HMMER sequence analysis package, you can search this collection against your favorite prokaryotic proteins to identify their function.
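
Once a search has run, the hits still need to be read back. The sketch below parses the per-target table that hmmsearch writes with its --tblout option and keeps the highest-scoring model for each protein. Picking the single best full-sequence score is a simplification chosen for illustration; it is not how PGAP itself assigns names.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    protein: str          # target sequence name
    model: str            # query HMM name
    model_accession: str  # query HMM accession
    evalue: float         # full-sequence E-value
    score: float          # full-sequence bit score


def parse_tblout(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield one Hit per data line of an hmmsearch --tblout table."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the commented header/footer
        # The table has 18 whitespace-separated columns plus a free-text description.
        fields = line.split(maxsplit=18)
        yield Hit(
            protein=fields[0],
            model=fields[2],
            model_accession=fields[3],
            evalue=float(fields[4]),
            score=float(fields[5]),
        )


def best_hit_per_protein(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep only the highest-scoring hit for each protein."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.protein not in best or hit.score > best[hit.protein].score:
            best[hit.protein] = hit
    return best
```

To use it, open the table file and pass the file object to parse_tblout, then feed the result to best_hit_per_protein.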

The 8.0 release contains 15,358 models, including 160 that are new since 7.0. In addition, we have added better names, EC numbers, Gene Ontology (GO) terms, gene symbols or publications to over 550 existing HMMs. You can search and view the details for these in the Protein Family Model collection, which also includes conserved domain architectures and BlastRules, and find all RefSeq proteins they name.

GO terms associated with HMMs are now propagated to coding sequences and proteins annotated with PGAP. In case you missed it, see our previous blog post on this topic.

New version of PGAP available now!

We are happy to announce the release of a new version of the stand-alone Prokaryotic Genome Annotation Pipeline (PGAP).

This version of PGAP offers a more streamlined experience to users who are uncertain about the taxonomic classification of the genomes they wish to annotate. Adding one flag to the command (--auto-correct-tax) results in the override of the species name provided on input if the taxonomy verification process predicts a different organism with high confidence.
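
As a sketch of what adding the flag might look like in a scripted run, the Python helper below assembles a PGAP command line without executing it. Only --auto-correct-tax is taken from this post; the script name pgap.py, the -o, -g and -s options, and all file and organism names are assumptions for illustration and may differ in your PGAP version, so check the tool's own help output.

```python
import shlex


def build_pgap_command(genome_fasta, organism, output_dir, auto_correct_tax=True):
    """Return the argument list for a stand-alone PGAP run (nothing is executed here)."""
    cmd = [
        "./pgap.py",          # assumed name of the PGAP launcher script
        "-o", output_dir,     # assumed option: output directory
        "-g", genome_fasta,   # assumed option: genome FASTA file
        "-s", organism,       # assumed option: species name provided on input
    ]
    if auto_correct_tax:
        # Flag named in this post: lets a high-confidence taxonomy prediction
        # override the species name given above.
        cmd.append("--auto-correct-tax")
    return cmd


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    cmd = build_pgap_command("my_genome.fasta", "Escherichia coli", "my_results")
    print(shlex.join(cmd))
```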