Tag: RefSeq

An Updated Bacterial and Archaeal Reference Genome Collection is Available!

Download the updated bacterial and archaeal reference genome collection! We built this collection of 22,082 genomes by selecting the “best” genome assembly for each species among the 440,000+ prokaryotic genomes in RefSeq. 

What’s new? 
  • 28 species are represented in this collection for the first time 
  • 228 species are represented by a better assembly 
  • Six species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment 

Now Available! NCBI Hidden Markov Models (HMM) Release 18.0

Download release 18.0 of the NCBI protein profile Hidden Markov models (HMMs) used by the Prokaryotic Genome Annotation Pipeline (PGAP). Search this collection against your favorite prokaryotic proteins to identify their function using the HMMER sequence analysis package. 
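As a rough illustration of the search described above, here is a minimal Python sketch that assembles an `hmmsearch` command line from the HMMER package. The file names (`hmm_PGAP.LIB`, `my_proteins.faa`, `hits.tbl`), the E-value threshold, and the CPU count are assumptions chosen for the example, not values from this post; `--tblout`, `-E`, and `--cpu` are standard `hmmsearch` options.

```python
import shutil
import subprocess


def build_hmmsearch_command(hmm_library, proteins_fasta, table_out,
                            evalue=1e-5, cpus=4):
    """Return the argument list for an hmmsearch run (nothing is executed here)."""
    return [
        "hmmsearch",
        "--tblout", str(table_out),  # per-sequence hits as a parseable table
        "-E", str(evalue),           # report hits at or below this E-value (assumed cutoff)
        "--cpu", str(cpus),          # number of worker threads (assumed value)
        str(hmm_library),            # the downloaded HMM collection (hypothetical file name)
        str(proteins_fasta),         # your protein sequences in FASTA format
    ]


def run_hmmsearch(command):
    """Run the command, failing early if HMMER is not installed."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError("hmmsearch not found on PATH; install HMMER first")
    # Discard the verbose alignment report; the hits are written to the --tblout file.
    subprocess.run(command, check=True, stdout=subprocess.DEVNULL)


if __name__ == "__main__":
    cmd = build_hmmsearch_command("hmm_PGAP.LIB", "my_proteins.faa", "hits.tbl")
    print(" ".join(cmd))
    # run_hmmsearch(cmd)  # uncomment once HMMER and the input files are in place
```

Building the command separately from running it makes it easy to inspect or log the exact invocation before starting a long search.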

What’s new? 

Release 18.0 contains: 

  • 18,057 HMMs maintained by NCBI 
  • 631 new HMMs since release 17.0 

Now Available: Updated Bacterial and Archaeal Reference Genome Collection

Download the updated bacterial and archaeal reference genome collection! We built this collection of 21,794 genomes, 536 more than in the January release, by selecting the “best” genome assembly for each species among the 400,000+ prokaryotic genomes in RefSeq.

RefSeq Release 229 is Now Available!

Check out RefSeq release 229, now available online and from the FTP site. You can access RefSeq data through NCBI Datasets. The release is provided in several directories, both as a complete dataset and divided into logical groupings.

What’s included in this release?

As of March 3, 2025, this full release incorporates genomic, transcript, and protein data containing:

  • 522,879,448 records
  • 399,577,538 proteins
  • 68,985,910 RNAs
  • Sequences from 164,117 organisms 

An updated bacterial and archaeal reference genome collection is available!

Download the updated bacterial and archaeal reference genome collection! We built this collection of 21,258 genomes by selecting the “best” genome assembly for each species among the 400,000+ prokaryotic genomes in RefSeq.

What’s new?

As previously announced, we updated our release process:

  1. The release process is now incremental. In addition to quarterly releases, weekly updates will create references for new species that do not yet have a reference genome and correct any inconsistencies in the set of references caused by taxonomic merges. As a result, the reference set may be updated more frequently.
  2. A history tracking file is now available under the ASSEMBLY_REPORTS path on FTP. It lists the history of reference genome selection for both prokaryotes and eukaryotes.

NCBI Resources Highlighted in 2025 Nucleic Acids Research Database Issue

The 2025 Nucleic Acids Research Database Issue features papers from NCBI staff on ClinVar, PubChem, GenBank, RefSeq, and more. The citations are available in PubMed, with full text available in PubMed Central (PMC). To read an article, click on the PMCID number listed below.

Database resources of the National Center for Biotechnology Information in 2025

PMCID: PMC11701734

NCBI provides online information resources for biology, including the GenBank® nucleic acid sequence repository and the PubMed® repository of citations and abstracts published in life science journals. NCBI is currently developing the NIH Comparative Genomics Resource (CGR) to facilitate reliable comparative genomics analyses with an NCBI Toolkit and community collaboration.

RefSeq Release 228 is Available!

Check out RefSeq release 228, now available online and from the FTP site. You can access RefSeq data through NCBI Datasets. The release is provided in several directories, both as a complete dataset and divided into logical groupings.

What’s included in this release?

As of January 3, 2025, this full release incorporates genomic, transcript, and protein data containing:

  • 513,096,240 records, including
  • 391,903,900 proteins
  • 67,997,702 RNAs
  • Sequences from 162,138 organisms 

Now Available! NCBI Hidden Markov Models (HMM) Release 17.0

Download release 17.0 of the NCBI protein profile Hidden Markov models (HMMs) used by the Prokaryotic Genome Annotation Pipeline (PGAP). Search this collection against your favorite prokaryotic proteins to identify their function using the HMMER sequence analysis package.

What’s new?

Release 17.0 contains:

  • 17,433 HMMs maintained by NCBI
  • 386 new HMMs since release 16.0

RefSeq Release 227 is Available!

Check out RefSeq release 227, now available online and from the FTP site. You can access RefSeq data through NCBI Datasets. The release is provided in several directories, both as a complete dataset and divided into logical groupings.

What’s included in this release?

As of November 4, 2024, this full release incorporates genomic, transcript, and protein data containing:

  • 497,549,107 records, including
  • 377,783,847 proteins
  • 66,987,567 RNAs
  • Sequences from 159,324 organisms 

Expansion of Ortholog Data for RefSeq Arthropods

250K+ new Hymenoptera orthologs added 

NCBI is excited to announce the expansion of ortholog data for RefSeq arthropods. This update expands the breadth of arthropod orthology information, offering new insights into evolutionary biology, gene function, and shared pathways. Whether you’re studying insect genetics, developmental biology, or comparative genomics, the expanded ortholog data opens up new possibilities for research. Check out our previous blog post to learn how to access the orthologs using NCBI Datasets.