The new Protein Family Model resource (Figure 1) lets you search across the evidence that the NCBI annotation pipelines use to name and classify proteins. You can find protein families by gene symbol, protein function, and many other terms, and each family record links to related proteins in the family and to publications describing its members. The Protein Family Models collection includes protein profile hidden Markov models (HMMs) and BlastRules for prokaryotes, and conserved domain architectures for both prokaryotes and eukaryotes. The HMMs in the collection include Pfam models, TIGRFAMs, and models developed at NCBI, either de novo or from NCBI Protein Clusters. Each BlastRule (PMCID: 5753331) consists of one or more model proteins of known biological function, together with BLAST identity and coverage cutoffs. The conserved domain architectures are based on the BLAST-compatible position-specific score matrices (PSSMs) that make up the NCBI Conserved Domain Database (CDD).

Figure 1. Protein Family Model resource pages. Top panel, home page. Middle panel, selected result summaries from a fielded search for the DnaK gene product (DnaK[Gene Symbol]). Bottom panel, a portion of the HMM record for DnaK derived from NCBI Protein Clusters (NF009946). The record also includes PubMed citations and HMMER analyses showing the RefSeq proteins named by this model.
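To make the identity and coverage cutoffs of a BlastRule more concrete, here is a minimal Python sketch of how such a check might be applied to a single BLAST+ tabular hit. The column list, the decision to require coverage of both the query and the model protein, and the cutoff values in the example are assumptions for illustration only; each real BlastRule carries its own cutoffs, and PGAP may compute coverage differently.

```python
from dataclasses import dataclass

# Assumed column order, as requested from blastp with:
#   -outfmt "6 qseqid sseqid pident qstart qend qlen sstart send slen"
FIELDS = ["qseqid", "sseqid", "pident", "qstart", "qend", "qlen",
          "sstart", "send", "slen"]


@dataclass
class Hit:
    query: str             # your protein
    model: str             # BlastRule model protein it aligned to
    identity: float        # percent identity reported by BLAST
    query_coverage: float  # percent of the query covered by the alignment
    model_coverage: float  # percent of the model protein covered


def parse_hit(line: str) -> Hit:
    """Parse one tab-separated BLAST line in the column order given by FIELDS."""
    row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
    qspan = abs(int(row["qend"]) - int(row["qstart"])) + 1
    sspan = abs(int(row["send"]) - int(row["sstart"])) + 1
    return Hit(
        query=row["qseqid"],
        model=row["sseqid"],
        identity=float(row["pident"]),
        query_coverage=100.0 * qspan / int(row["qlen"]),
        model_coverage=100.0 * sspan / int(row["slen"]),
    )


def passes_rule(hit: Hit, min_identity: float, min_coverage: float) -> bool:
    """Hypothetical rule check: identity and both coverages must meet the cutoffs."""
    return (hit.identity >= min_identity
            and hit.query_coverage >= min_coverage
            and hit.model_coverage >= min_coverage)
```

For instance, a hit with 95.2% identity whose alignment spans residues 1-600 of a 620-residue query and of a 638-residue model protein covers about 96.8% and 94.0% of them, so it passes cutoffs of 90% identity and 80% coverage but fails a 95% coverage cutoff.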
All of these models support the annotation of whole genomes by the Prokaryotic Genome Annotation Pipeline (PGAP). PGAP propagates model attributes, such as protein name, publication, gene symbol, and EC number, to the RefSeq proteins that match the model. You can navigate back and forth between RefSeq proteins and family models: proteins that derive their names from these expert-curated models carry an Evidence-For-Name-Assignment comment block that links to the family model, and each family model record lists the RefSeq proteins in the family. See our previous post for more details.
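The propagation step can be pictured with a small sketch: when a protein matches a family model, the model's attributes are copied onto the protein together with the model accession, which plays the role of the Evidence-For-Name-Assignment link. The class and field names below are hypothetical and this is not PGAP code; in particular, how PGAP chooses among several matching models is not modeled here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FamilyModel:
    """Hypothetical record of the attributes a family model can supply."""
    accession: str
    protein_name: str
    gene_symbol: Optional[str] = None
    ec_numbers: List[str] = field(default_factory=list)
    pmids: List[int] = field(default_factory=list)  # supporting publications


@dataclass
class ProteinAnnotation:
    """Hypothetical annotation for one predicted protein."""
    protein_id: str
    name: str = "hypothetical protein"  # name used when no evidence applies
    gene_symbol: Optional[str] = None
    ec_numbers: List[str] = field(default_factory=list)
    pmids: List[int] = field(default_factory=list)
    name_evidence: Optional[str] = None  # accession of the model that supplied the name


def propagate(protein: ProteinAnnotation, model: FamilyModel) -> ProteinAnnotation:
    """Copy the matched model's attributes onto the protein and record the evidence."""
    protein.name = model.protein_name
    protein.gene_symbol = model.gene_symbol
    protein.ec_numbers = list(model.ec_numbers)  # copy so records stay independent
    protein.pmids = list(model.pmids)
    protein.name_evidence = model.accession
    return protein
```

Keeping the model accession on the protein is what makes the two-way navigation possible: from the protein you can reach the model, and by collecting proteins per accession you can list a model's members.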
You can download the Protein Family Model profiles and use them in your own analyses. The full HMM collection, updated to release 4.0, is available for searching your own set of proteins with HMMER. Similarly, you can use the BlastRules definitions to build a database for BLAST searches. You can also search your own proteins against the NCBI conserved domain PSSMs online with the Conserved Domain Database search service (CD-Search). Alternatively, you can download the conserved domain PSSMs and classify proteins by conserved domain architecture with the standalone NCBI tools Reverse PSI-BLAST (RPS-BLAST), part of the BLAST+ package, and the Subfamily Protein Architecture Labeling Engine (SPARCLE).
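As an example of working with the downloaded HMMs locally, an hmmsearch run with the --tblout option writes one line per protein-model hit in HMMER's tabular format. The Python sketch below parses that table and keeps the highest-scoring model for each protein after an E-value filter. The file names, the E-value threshold, and the best-score selection rule are assumptions for illustration; the family models may carry their own score cutoffs, which HMMER can apply through options such as --cut_tc.

```python
from typing import Dict, Iterable, Iterator, NamedTuple

# Assumed workflow (file names are placeholders):
#   hmmsearch --tblout hits.tbl models.hmm my_proteins.faa
# In hmmsearch --tblout output the target is the protein and the query is the HMM.


class TblHit(NamedTuple):
    protein: str    # column 1: target name
    model: str      # column 3: query name
    model_acc: str  # column 4: query accession ("-" if none)
    evalue: float   # column 5: full-sequence E-value
    score: float    # column 6: full-sequence bit score


def parse_tblout(lines: Iterable[str]) -> Iterator[TblHit]:
    """Yield per-sequence hits from HMMER3 --tblout text, skipping comment lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # 18 whitespace-delimited fields, followed by a free-text description.
        cols = line.split(None, 18)
        yield TblHit(cols[0], cols[2], cols[3], float(cols[4]), float(cols[5]))


def best_model_per_protein(hits: Iterable[TblHit],
                           max_evalue: float = 1e-10) -> Dict[str, TblHit]:
    """Keep the highest-scoring hit per protein among hits passing the E-value filter."""
    best: Dict[str, TblHit] = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue
        current = best.get(hit.protein)
        if current is None or hit.score > current.score:
            best[hit.protein] = hit
    return best
```

To use it on a real run, open the table file and pass the file handle straight to parse_tblout, since it accepts any iterable of lines.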