Now Available! NCBI Hidden Markov Models (HMM) Release 20.0

Download release 20.0 of the NCBI protein profile Hidden Markov models (HMMs) used by the Prokaryotic Genome Annotation Pipeline (PGAP). Using the HMMER sequence analysis package, you can search this collection against your favorite prokaryotic proteins to help identify their function.
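
As a rough sketch of what such a search could look like, the Python snippet below assembles a command line for `hmmsearch`, the HMMER program that searches profile HMMs against a protein sequence file. The helper `build_hmmsearch_command`, the file names `pgap_hmms.hmm`, `my_proteins.faa`, and `hits.tbl`, and the E-value threshold are all illustrative assumptions; none of them are prescribed by the release.

```python
import shlex


def build_hmmsearch_command(hmm_library, protein_fasta, table_out, max_evalue=1e-5):
    """Return an argument list for HMMER's hmmsearch (hypothetical helper)."""
    return [
        "hmmsearch",
        "--tblout", table_out,   # write a parseable per-sequence hit table
        "-E", str(max_evalue),   # report hits at or below this E-value (assumed cutoff)
        hmm_library,             # profile HMM file: the query
        protein_fasta,           # protein FASTA file: the target
    ]


# Placeholder file names; substitute the downloaded HMM file and your own proteins.
cmd = build_hmmsearch_command("pgap_hmms.hmm", "my_proteins.faa", "hits.tbl")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where HMMER is installed. A suitable E-value cutoff depends on your data, so treat the default here as a starting point only.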

What’s new? 

Release 20.0 contains: 

  • 18,950 HMMs maintained by NCBI 
  • 497 new HMMs since release 19.0 

Standards for Sequence Read Archive (SRA) Data Submission

What you need to know for 2026 and beyond! 

The volume of genomic sequencing data is growing rapidly, and we want to ensure that publicly shared datasets remain useful, findable, and scientifically trustworthy. To support this goal, the Sequence Read Archive (SRA) is implementing a set of data submission standards that improve data consistency, quality, and long-term value, benefiting both the research community and data submitters.

Why new standards? 

High-quality data and metadata ensure that researchers can effectively reuse sequencing data and that SRA can process submissions quickly and accurately. These standards align SRA submissions with evolving INSDC expectations and best practices in genomic data stewardship by: 

    • Preventing common formatting and metadata errors 
    • Ensuring that sequencing runs remain useful for future scientific research 
    • Speeding up SRA data processing and public release timelines 

These standards are expected to go into effect at the end of 2026. Here’s what to expect:

New Minimum Requirements for BioProject and BioSample Fields

BioProject and BioSample support the discovery, access, and reuse of nucleotide sequence data at NCBI. BioProject records provide a centralized place for accessing diverse data generated as part of a single research initiative. BioSample records describe the biological source materials used to generate those data. To ensure data quality, NCBI, in conjunction with the International Nucleotide Sequence Database Collaboration (INSDC), has established minimum criteria for submission acceptance. These changes are part of NCBI’s broader effort to implement INSDC minimum specifications across sequence data resources and submission workflows. 

What’s new? 

Starting early 2027, new validation requirements will be applied during BioProject and BioSample submissions.

New Minimum Requirements for Prokaryotic and Eukaryotic Genome Submissions

High-quality submissions make genome data more valuable for the entire research community. To support data quality and usability, the International Nucleotide Sequence Database Collaboration (INSDC) has established minimum criteria for genome submission acceptance. Genome submissions to GenBank must meet the applicable INSDC minimum requirements to be processed, assigned accession numbers, and released publicly. These changes are part of NCBI’s broader effort to implement INSDC minimum specifications across sequence data resources and submission workflows. 

What’s new? 

Starting January 2027, the following new requirements will be validated at the time of submission in Submission Portal-Genome:

GenBank Release 272.0 is Available!

GenBank release 272.0 (6/12/2026) is now available on the NCBI FTP site. This release has 57.69 trillion bases and 6.52 billion records. 

The current release has:  

  • 261,460,182 traditional records containing 7,289,942,983,522 base pairs of sequence data
  • 4,756,526,485 WGS records containing 45,628,511,953,497 base pairs of sequence data
  • 1,058,643,373 bulk-oriented TSA records containing 898,651,493,967 base pairs of sequence data
  • 191,365,090 bulk-oriented TLS records containing 79,162,820,303 base pairs of sequence data
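
To see how the itemized figures relate to the headline totals, the short Python sketch below adds up the four categories listed above and compares them with the rounded totals from the announcement. The variable names are illustrative, and the headline values are used only as rounded in the text (6.52 billion records, 57.69 trillion bases).

```python
# Figures copied from the list above: (records, base pairs) per category.
listed = {
    "traditional": (261_460_182, 7_289_942_983_522),
    "WGS": (4_756_526_485, 45_628_511_953_497),
    "TSA": (1_058_643_373, 898_651_493_967),
    "TLS": (191_365_090, 79_162_820_303),
}

listed_records = sum(records for records, _ in listed.values())
listed_bases = sum(bases for _, bases in listed.values())

# Headline totals, as rounded in the announcement.
headline_records = 6.52e9
headline_bases = 57.69e12

print(f"listed records: {listed_records:,}")
print(f"listed bases:   {listed_bases:,}")
print(f"share of headline records: {listed_records / headline_records:.1%}")
print(f"share of headline bases:   {listed_bases / headline_bases:.1%}")
```

The four listed categories sum to 6,267,995,130 records and 53,896,269,251,289 base pairs, so they account for only part of the headline totals; the remainder is not itemized in this summary.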

BankIt Submitters: GenBank Submission Update

As previously announced, major changes are being made to enhance your GenBank submission experience. As of April 2026, you can submit to GenBank using our new simplified wizards in Submission Portal-GenBank. 

What now? 
  • Use Submission Portal-GenBank for all GenBank sequence submissions, except for sequence alignments 
  • Use BankIt only if you are submitting aligned sequences for feature propagation 

Now Available: RefSeq Release 235

RefSeq release 235 is now available online and from the FTP site! You can access RefSeq data through NCBI Datasets. The release is provided in several directories, both as a complete dataset and divided into logical groupings.

What’s included in this release? 

As of May 11, 2026, this full release incorporates genomic, transcript, and protein data containing:   

  • 616,942,961 records  
  • 473,570,633 proteins  
  • 81,124,747 RNAs  
  • Sequences from 180,620 organisms  

New Data Available! Access Hantavirus Sequences at NCBI

Sequence data from the recent Andes hantavirus outbreak are now available through two of NLM’s NCBI resources: the NCBI Virus web interface and the NCBI Datasets command-line tool. These data were submitted by the University Hospitals of Geneva.

Access through NCBI Virus 

To find sequence records from 2026, search for “Orthohantavirus andesense” in NCBI Virus and apply the “Collection Date” filter. For a quick overview of the Andes hantavirus data available through GenBank, visit the NCBI Virus Outbreak Statistics page and select Andes hantavirus; the page shows the collection location and host for recently collected samples.

Protocol Registration and Results System (PRS) Modernization

Phasing out the Classic PRS 

The National Library of Medicine (NLM) has completed modernization of the Protocol Registration and Results System (PRS) and the public website, ClinicalTrials.gov. The Modernized PRS is now the primary system for protocol registration and results submission, and we will continue to add new features based on your feedback!  

Modernized PRS updates and new features 

Highlights of updated and upcoming features:

New: May 2026 Release of Stand-Alone PGAP

With New Evidence Source for Protein Naming 

We are happy to announce the release of a new version of the stand-alone Prokaryotic Genome Annotation Pipeline (PGAP)! 

What’s new? 
  • Software updated to CheckM 1.2.5 
  • Continued improvements to the protein family model data used for annotation 
  • Pfam release 38 is being used for structural and functional annotation 
  • New evidence source: superfamilies 

As with previous releases, curators at NCBI continue to expand the library of Protein Family Models (PFMs) used by PGAP for structural and functional annotation.