Tag: Hidden Markov Models (HMM)

NCBI Hidden Markov Models (HMM) Release 13.0 Now Available!

Release 13.0 of the NCBI protein profile Hidden Markov models (HMMs) used by the Prokaryotic Genome Annotation Pipeline (PGAP) is now available for download. You can search this collection against your favorite prokaryotic proteins to identify their function using the HMMER sequence analysis package.
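As a rough illustration of working with the results of such a search, the sketch below parses HMMER's tabular output (the file written by hmmsearch with the --tblout option) and keeps the highest-scoring model for each protein. The file names in the comment, the example rows, and the model names and accessions are placeholders invented for this illustration, not real NCBI models.

```python
from typing import Dict, Iterable, Iterator, NamedTuple

# Assumed workflow (file names are hypothetical):
#   hmmsearch --tblout hits.tbl <downloaded HMM library> proteins.faa
# In hmmsearch output the "target" is the protein and the "query" is the HMM.


class Hit(NamedTuple):
    protein: str    # target sequence name (column 1)
    model: str      # query HMM name (column 3)
    accession: str  # query HMM accession (column 4)
    evalue: float   # full-sequence E-value (column 5)
    score: float    # full-sequence bit score (column 6)


def parse_tblout(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield one Hit per data row, skipping comments and blank lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # 18 fixed columns, then a free-text description that may contain spaces.
        fields = line.split(None, 18)
        if len(fields) < 18:
            continue  # not a complete data row
        yield Hit(fields[0], fields[2], fields[3],
                  float(fields[4]), float(fields[5]))


def best_hit_per_protein(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep the hit with the highest full-sequence bit score for each protein."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.protein)
        if current is None or hit.score > current.score:
            best[hit.protein] = hit
    return best


# Made-up rows in the --tblout layout, for demonstration only.
EXAMPLE = """\
# target name accession query name accession E-value score bias ...
prot_1 - modelA ACC0001 1.2e-50 170.3 0.1 1.5e-50 170.0 0.1 1.0 1 0 0 1 1 1 1 example protein one
prot_1 - modelB ACC0002 3.0e-80 265.9 0.0 3.5e-80 265.7 0.0 1.0 1 0 0 1 1 1 1 example protein one
prot_2 - modelA ACC0001 4.4e-12 45.2 0.3 5.0e-12 45.0 0.3 1.1 1 0 0 1 1 1 1 example protein two
"""

if __name__ == "__main__":
    for protein, hit in best_hit_per_protein(parse_tblout(EXAMPLE.splitlines())).items():
        print(protein, hit.model, hit.accession, hit.evalue, hit.score)
```

Picking the top bit score is only one possible policy. Model-specific cutoffs or other evidence may be more appropriate for real annotation work.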

What’s new?

The 13.0 release contains:

  • 16,143 HMMs maintained by NCBI
  • 315 new HMMs since release 12.0
  • 286 HMMs with better names, EC numbers, Gene Ontology (GO) terms, gene symbols or publications

New! May 2023 Release of Stand-Alone PGAP

We are happy to announce the release of a new version of the stand-alone Prokaryotic Genome Annotation Pipeline (PGAP) with many exciting new features.

Improved user interface

This version has an improved user interface that takes the genome FASTA file and associated organism name directly on the command line. For example, to annotate a Vibrio cholerae genome sequence in the file Vchol.fasta:

pgap.py -r -g Vchol.fasta -s 'Vibrio cholerae' -o Vchol.annot

For more details visit our Quick Start page.

NCBI Hidden Markov Models (HMM) Release 12.0 Now Available!

Release 12.0 of the NCBI protein profile Hidden Markov models (HMMs) used by the Prokaryotic Genome Annotation Pipeline (PGAP) is now available for download. You can search this collection against your favorite prokaryotic proteins to identify their function using the HMMER sequence analysis package.

What’s new?

The 12.0 release contains:

  • 15,849 HMMs maintained by NCBI
  • 271 new HMMs since release 11.0
  • 1,248 HMMs with better names, EC numbers, Gene Ontology (GO) terms, gene symbols or publications

NCBI hidden Markov models (HMM) release 11.0 now available!

Release 11.0 of the NCBI protein profile Hidden Markov models (HMMs) used by the Prokaryotic Genome Annotation Pipeline (PGAP) is now available for download. You can search this collection against your favorite prokaryotic proteins to identify their function using the HMMER sequence analysis package.

NCBI hidden Markov models (HMM) release 10.0 now available!

Release 10.0 of the NCBI Hidden Markov models (HMM) used by the Prokaryotic Genome Annotation Pipeline (PGAP) is now available for download. You can search this collection against your favorite prokaryotic proteins to identify their function using the HMMER sequence analysis package.

The 10.0 release contains 15,360 models maintained by NCBI, including 228 that are new since 9.0, 99 that were modified significantly, and 205 that were assigned better names, EC numbers, Gene Ontology (GO) terms, gene symbols or publications. You can search and view the details for these in the Protein Family Model collection, which also includes conserved domain architectures and BlastRules, and find all RefSeq proteins they name.

GO terms associated with HMMs are now propagated to CDSs and proteins annotated with PGAP. In case you missed it, see our previous blog post on this topic.

NCBI hidden Markov models (HMM) release 8.0 now available!

Release 8.0 of the NCBI Hidden Markov models (HMM), used by the Prokaryotic Genome Annotation Pipeline (PGAP), is now available for download. You can search this collection against your favorite prokaryotic proteins to identify their function using the HMMER sequence analysis package.

The 8.0 release contains 15,358 models, including 160 that are new since 7.0. In addition, we have added better names, EC numbers, Gene Ontology (GO) terms, gene symbols or publications to over 550 existing HMMs. You can search and view the details for these in the Protein Family Model collection, which also includes conserved domain architectures and BlastRules, and find all RefSeq proteins they name.

GO terms associated with HMMs are now propagated to coding sequences and proteins annotated with PGAP. In case you missed it, see our previous blog post on this topic.

New version of PGAP available now!

We are happy to announce the release of a new version of the stand-alone Prokaryotic Genome Annotation Pipeline (PGAP).

This version of PGAP offers a more streamlined experience to users who are uncertain about the taxonomic classification of the genomes they wish to annotate. Adding one flag to the command (--auto-correct-tax) overrides the species name provided as input if the taxonomy verification process predicts a different organism with high confidence.
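A minimal sketch of how the flag might be added when PGAP is launched from a script. It assumes that --auto-correct-tax can be combined with the command-line form shown in the May 2023 post (-r, -g, -s, -o), which is not confirmed here. The helper name build_pgap_command is hypothetical, and the file and organism names are reused from that example.

```python
import shlex
from typing import List


def build_pgap_command(fasta: str, organism: str, outdir: str,
                       auto_correct_tax: bool = False) -> List[str]:
    """Assemble a pgap.py argument list (hypothetical helper, not part of PGAP)."""
    cmd = ["pgap.py", "-r", "-g", fasta, "-s", organism, "-o", outdir]
    if auto_correct_tax:
        # Assumption: the flag is simply appended to the usual invocation.
        cmd.append("--auto-correct-tax")
    return cmd


if __name__ == "__main__":
    cmd = build_pgap_command("Vchol.fasta", "Vibrio cholerae", "Vchol.annot",
                             auto_correct_tax=True)
    # Print a shell-quoted form for inspection. To actually run PGAP, the
    # list could be passed to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building the command as a list avoids quoting problems with organism names that contain spaces.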

New PGAP release: Structural and functional annotation improvements

A new version of the Prokaryotic Genome Annotation Pipeline (PGAP) is available on GitHub. With this release, you can expect:

  • Incremental improvements in structural annotation, driven by increased weight of GeneMarkS2+ ab initio models at loci with only weak evidence, such as low-identity and low-coverage protein alignments or partial HMM signatures.
  • Better structural annotation and more specific functional annotation as a result of the incorporation of Pfam 34 and extensive curation of HMMs, BlastRules and conserved domain architectures by NCBI experts.
  • Fewer overly stringent calls by the taxonomy verification module for several species, including the human pathogens Listeria monocytogenes, Campylobacter lari, and Vibrio vulnificus. This is a result of manual review and adjustment of the minimum percent identity thresholds used by the Average Nucleotide Identity tool.
  • Multiple bug fixes. Notably, users of Azure Debian 10 machines can now run PGAP successfully, as we have incorporated GeneMarkS2+ compiled under Linux kernel 3 into the PGAP image.

Please try this release and send us your feedback!

New models added to the NCBI Hidden Markov models (HMM) collection with release 7.0

Release 7.0 of the NCBI Hidden Markov models (HMM), used by the Prokaryotic Genome Annotation Pipeline (PGAP), is now available for download. You can search this collection against your favorite prokaryotic proteins to identify their function using the HMMER sequence analysis package.

Figure 1. Recently added HMM-based Protein Family Model for the histidine-histamine antiporter family (NF040512), with GO terms (framed in red).

RefSeq release 208 is available!

RefSeq release 208 is now available online, from the FTP site and through NCBI’s Entrez programming utilities, E-utilities.

This full release incorporates genomic, transcript, and protein data available as of September 7, 2021, and contains 288,903,207 records, including 210,703,648 proteins, 40,213,945 RNAs, and sequences from 113,002 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.
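For programmatic access through E-utilities, a query can be expressed as an ESearch URL. The sketch below only builds such a URL and makes no network request. The helper name build_esearch_url, the choice of the protein database, and the example organism are assumptions made for illustration.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(organism: str, db: str = "protein", retmax: int = 20) -> str:
    """Build an ESearch URL for RefSeq records of one organism (illustrative helper)."""
    # srcdb_refseq[PROP] restricts the search to RefSeq records.
    term = f"{organism}[Organism] AND srcdb_refseq[PROP]"
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # The URL can be opened in a browser or fetched with any HTTP client.
    print(build_esearch_url("Vibrio cholerae"))
```

The identifiers returned by ESearch can then be passed to other E-utilities, such as EFetch, to retrieve the records themselves.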