Category: What’s New

NCBI hidden Markov models (HMM) release 11.0 now available!

Release 11.0 of the NCBI protein profile hidden Markov models (HMMs) used by the Prokaryotic Genome Annotation Pipeline (PGAP) is now available for download. Using the HMMER sequence analysis package, you can search your favorite prokaryotic proteins against this collection to identify their function.
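
As a rough illustration of the kind of search described above, the Python sketch below builds an hmmsearch command line and reads HMMER's tabular (--tblout) output. The file names, the E-value cutoff, the CPU count, and both helper functions are assumptions made for this example; only the hmmsearch program and its --tblout, -E, and --cpu options come from HMMER itself.

```python
import shlex


def build_hmmsearch_command(hmm_library, proteins_fasta, tblout_path,
                            evalue=1e-5, cpus=4):
    """Return the argument list for an hmmsearch run (nothing is executed)."""
    return [
        "hmmsearch",
        "--tblout", tblout_path,   # one whitespace-delimited line per hit
        "-E", str(evalue),         # report hits at or below this E-value
        "--cpu", str(cpus),
        hmm_library,               # the downloaded HMM collection
        proteins_fasta,            # your protein sequences
    ]


def parse_tblout(lines):
    """Yield (protein, hmm_name, hmm_accession, evalue, score) for each hit.

    In hmmsearch --tblout output the target columns describe the sequence
    and the query columns describe the HMM; comment lines start with '#'.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        yield fields[0], fields[2], fields[3], float(fields[4]), float(fields[5])


# Hypothetical file names, for illustration only.
command = build_hmmsearch_command("hmm_library.hmm", "my_proteins.faa", "hits.tbl")
print(shlex.join(command))
# To actually run it (requires HMMER to be installed):
#   import subprocess; subprocess.run(command, check=True)
```

After a run, passing the open hits.tbl file to parse_tblout would give one tuple per protein-to-HMM hit, which can then be sorted by score or E-value.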

Announcing GenBank Release 253.0

GenBank release 253.0 (12/20/2022) is now available on the NCBI FTP site. This release has 21.38 trillion bases and 3.25 billion records. The current release has 241,015,745 traditional records containing 1,635,594,138,493 base pairs of sequence data. There are also 2,241,439,349 WGS records containing 19,086,596,616,569 base pairs of sequence data, 649,918,843 bulk-oriented TSA records containing 611,850,391,049 base pairs of sequence data, and 115,552,377 bulk-oriented TLS records containing 44,009,657,455 base pairs of sequence data. 
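
The headline totals can be checked against the per-division figures quoted above. The short Python sketch below simply sums them; the variable names are illustrative.

```python
# (records, base pairs) per division, as quoted for GenBank release 253.0
divisions = {
    "traditional": (241_015_745, 1_635_594_138_493),
    "WGS": (2_241_439_349, 19_086_596_616_569),
    "TSA": (649_918_843, 611_850_391_049),
    "TLS": (115_552_377, 44_009_657_455),
}

total_records = sum(records for records, _ in divisions.values())
total_bases = sum(bases for _, bases in divisions.values())

print(f"{total_records:,} records")   # 3,247,926,314 records
print(f"{total_bases:,} bases")       # 21,378,050,803,566 bases
print(f"{total_bases / 1e12:.2f} trillion bases, "
      f"{total_records / 1e9:.2f} billion records")
```

Rounded, the sums agree with the 21.38 trillion bases and 3.25 billion records stated for the release.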

Growth between releases

During the 62 days between the close dates for GenBank Releases 252.0 and 253.0, the traditional portion of GenBank grew by 72,630,771,642 base pairs and by 476,463 sequence records. We updated 252,865 records during that same period. We added and/or updated an average of 11,763 traditional records per day!
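
The daily average follows directly from the figures in this paragraph, as the small Python calculation below shows; the variable names are illustrative.

```python
days_between_releases = 62
records_added = 476_463
records_updated = 252_865
base_pairs_added = 72_630_771_642

records_per_day = (records_added + records_updated) / days_between_releases
base_pairs_per_day = base_pairs_added / days_between_releases

print(f"{records_per_day:,.0f} records added or updated per day")  # 11,763
print(f"{base_pairs_per_day / 1e9:.2f} billion base pairs added per day")
```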

New RefSeq Annotations!

In October and November, the NCBI Eukaryotic Genome Annotation Pipeline released thirty-one new annotations in RefSeq for the following organisms:

  • Acanthochromis polyacanthus (spiny chromis)
  • Acomys russatus (golden spiny mouse)
  • Andrographis paniculata (eudicot)
  • Antechinus flavipes (yellow-footed antechinus)
  • Apodemus sylvaticus (European woodmouse)
  • Apus apus (common swift)
  • Arachis duranensis (eudicot)

Improving how SRA data is distributed

NCBI will be incrementally streamlining the Sequence Read Archive (SRA) data distribution model over the next year as SRA Lite becomes the standard SRA file format. This simplified format reduces the average file size for more efficient analysis and storage of large datasets. SRA is the largest publicly available repository of high-throughput sequencing data and is available through cloud providers and NCBI servers. Depending on the way you currently access SRA data, your experience may change. If you are using the SRA Toolkit, you can continue to set your location and file format preferences and allow the toolkit to select the best distribution point given your location.

Find your favorite gene in aligned assemblies!

New search feature in the Comparative Genome Viewer (CGV)

You asked, we listened! We are pleased to announce that you can now search for a gene in NCBI’s Comparative Genome Viewer (CGV) and navigate directly to its location in the viewer.

Maybe you’re studying a particular gene or gene family, and you want to see if that gene is annotated in the assemblies you’re viewing. Or maybe you know that a gene is annotated on one of the assemblies, but you want to obtain the coordinates of the aligned region on an unannotated assembly. These coordinates may help you find the ortholog for the gene in the aligned assembly.

To start, simply enter a gene symbol, name, or description in the search box at the top of the alignment in CGV and click ‘Search’.

Join NCBI at PAG 30

San Diego, January 13-18, 2023 

NCBI is looking forward to seeing you in person at the International Plant and Animal Genome Conference (PAG 30), January 13-18, 2023 in San Diego, California.  

We’re especially excited to share our recent efforts on the NIH Comparative Genomics Resource (CGR), a multi-year National Library of Medicine (NLM) project to maximize the impact of eukaryotic research organisms and their genomic data resources on biomedical research.  

We also want to hear from you! If you’re interested in sharing your feedback on your needs and experiences involving comparative genomics tools to inform CGR, consider joining our Feedback Session.

Check out NCBI’s schedule of activities and events.

Announcing the NCBI SARS-CoV-2 Variant Calling Pipeline and Related Data Products

Still waiting for an analysis pipeline that can uniformly process raw sequence data produced by a variety of sequencing platforms? Your wait is over! Announcing the SARS-CoV-2 Variant Calling Pipeline, which is now operational and optimized to support multiple sequencing platforms, including Illumina, Oxford Nanopore, and PacBio.

This new pipeline can call variants at allele frequencies of 15% or higher. See our preprint and our GitHub repository for more details. This optimized pipeline is a result of the efforts of the COVID-19 research community, led by the NIH Accelerating COVID-19 Therapeutic Interventions and Vaccines (ACTIV) initiative, a public-private partnership for a coordinated research strategy to support and speed up the development of COVID-19 treatments and vaccines.

New Proximity Search Feature Available in PubMed

PubMed, a free National Library of Medicine (NLM) resource supporting the search and retrieval of biomedical and life sciences literature, has a brand-new feature! With proximity search, you can now search for multiple terms appearing in any order within a specified distance of one another in the [Title] or [Title/Abstract] fields. 

Proximity search adds another useful tool to your search toolkit. Proximity searching can be particularly helpful when seeking concepts that may be represented in multiple ways, or to capture variations of a phrase. There is often more than one way to search for a concept. You may try searching for the same terms using a variety of techniques (e.g., combining terms with AND, searching for an exact phrase) and compare the results to help you decide which option(s) to use. You can also build queries that combine proximity searches with other search terms using Boolean operators.
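
To make the query format concrete, here is a minimal Python sketch that assembles a proximity search string in PubMed's "search terms"[field:~N] syntax, where N is the maximum number of words allowed between the terms. The helper name, the example terms, and the extra Boolean clause are assumptions chosen for illustration.

```python
# Fields named above as supporting proximity search.
PROXIMITY_FIELDS = {"Title", "Title/Abstract"}


def proximity_query(terms, field="Title/Abstract", distance=2):
    """Build a PubMed proximity clause of the form "terms"[field:~N]."""
    if field not in PROXIMITY_FIELDS:
        raise ValueError(f"proximity search is not available for field {field!r}")
    if distance < 0:
        raise ValueError("distance must be zero or a positive integer")
    return f'"{terms}"[{field}:~{distance}]'


# A proximity clause on its own ...
clause = proximity_query("hip pain", field="Title/Abstract", distance=2)
print(clause)   # "hip pain"[Title/Abstract:~2]

# ... and combined with another search term using a Boolean operator.
combined = f"{clause} AND review[Publication Type]"
print(combined)
```

The resulting string can be pasted into the PubMed search box or passed as the term of an E-utilities search.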

Updated PubMed E-Utilities Now Live!

We’ve launched the updated version of the E-utilities API for PubMed. Thank you to all who tested the updated API on the test server and provided feedback.

This updated version now aligns the functions of the E-utilities with the web version of PubMed released in 2020. For example, search results returned by the updated ESearch E-utility will now match those of web PubMed. To be clear, this update only affects E-utility calls with &db=pubmed. The behavior of all other Entrez databases will not change. 

Why did NCBI do this? 

NCBI released this new API version to provide consistent behavior across the web and API PubMed interfaces, as well as more reliable performance. To accomplish this, we transferred all E-utility functions to the technology stack that supports web PubMed, so that all PubMed requests use the same stack. This means that the previous version of the PubMed E-utilities is no longer available, but the new version provides the benefits listed above.

Have the URLs for PubMed E-utility calls changed? 

Previous E-utility URLs for PubMed (&db=pubmed) will continue to function with this updated release, with one exception, which concerns large result sets: to obtain more than 10,000 PubMed records, consider using EDirect, which now contains additional logic to batch PubMed search results automatically so that an arbitrary number can be retrieved. See our updated documentation for more details.

Has the output of PubMed E-utility calls changed? 

Again, in almost all cases, no. Here are the exceptions:  

  • ESearch now returns the same PubMed IDs (PMIDs) that are currently returned by web PubMed.
  • EFetch now returns XML data rather than ASN.1 by default (that is, when &retmode is not set). In other words, the default value of &retmode is now “xml”.
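
For readers who build E-utility calls by hand, the Python sketch below shows one way to construct ESearch and EFetch URLs for &db=pubmed. Setting retmode explicitly keeps a script independent of whatever the default is. The helper names, the example search term, and the placeholder PMIDs are assumptions for illustration; the sketch only builds URLs and sends no requests.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, retmax=20):
    """URL for a PubMed ESearch call returning up to retmax PMIDs."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_url(pmids, retmode="xml"):
    """URL for a PubMed EFetch call; retmode is always stated explicitly."""
    params = {
        "db": "pubmed",
        "id": ",".join(str(pmid) for pmid in pmids),
        "retmode": retmode,
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


print(esearch_url("comparative genomics[Title]"))
# Placeholder PMIDs, not real references.
print(efetch_url([11111111, 22222222]))
```

The URLs can then be fetched with any HTTP client, for example urllib.request.urlopen from the standard library.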

What should I do if I have trouble using the new API? 

Write to us if you have any questions or concerns.

NEW! Streamlining ClinVar Submission of Assertion Criteria

ClinVar is a freely available, submission-driven database for information about genomic variation and its relationship to human health. ClinVar holds more than 1.5 million variants and is powered by submitters around the world, who provide us with their assessments, the evidence, and the criteria they use to guide their interpretation process and come to their conclusions. To streamline the ClinVar submission process, we are simplifying how submitters provide their assertion criteria. In the past, assertion criteria were provided for each variant. Moving forward, a single set of assertion criteria will be associated with an entire submission, regardless of the number of variants.