Category: What’s New

Updated PubMed E-Utilities Now Live!

We’ve launched the updated version of the E-utilities API for PubMed. Thank you to all who tested the updated API on the test server and provided feedback.

This updated version now aligns the functions of the E-utilities with the web version of PubMed released in 2020. For example, search results returned by the updated ESearch E-utility will now match those of web PubMed. To be clear, this update only affects E-utility calls with &db=pubmed. The behavior of all other Entrez databases will not change. 
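As a rough illustration of what an affected call looks like, the sketch below builds an ESearch request URL with db=pubmed using only the Python standard library. The helper name build_esearch_url and the example search term are assumptions made for illustration; the endpoint and the db, term, and retmax parameters are the standard E-utilities ones. No request is sent.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term: str, retmax: int = 20) -> str:
    """Return an ESearch URL for PubMed (hypothetical helper, illustration only)."""
    params = {
        "db": "pubmed",    # only db=pubmed calls are affected by this update
        "term": term,      # the same query string you would type into web PubMed
        "retmax": retmax,  # number of PMIDs to return in this response
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # The search term here is an arbitrary example.
    print(build_esearch_url("asthma[mesh] AND 2020[pdat]", retmax=5))
```

Running the same term in web PubMed and through a URL built this way is one simple way to check that the two result sets agree.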

Why did NCBI do this? 

NCBI released this new API version to provide consistent behavior across the web and API PubMed interfaces, as well as more reliable performance. To accomplish this, we transferred all E-utility functions to the technology stack that supports web PubMed, so that all PubMed requests use the same stack. This means that the previous version of the PubMed E-utilities is no longer available, but the new version provides the benefits listed above.

Have the URLs for PubMed E-utility calls changed? 

Previous E-utility URLs for PubMed (&db=pubmed) will continue to function with this updated release, with one exception, which concerns retrieving more than 10,000 PubMed records. To obtain more than 10,000 PubMed records, consider using EDirect, which now contains additional logic to batch PubMed search results automatically so that an arbitrary number can be retrieved. See our updated documentation for more details.
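For scripts that page through results themselves, the sketch below computes retstart/retmax windows and flags result sets that exceed the 10,000-record figure mentioned above. It assumes that figure acts as a per-query ceiling; the names PUBMED_CAP, page_windows, and needs_splitting are hypothetical, and this is not the batching logic that EDirect uses.

```python
from typing import Iterator, Tuple

# Assumed per-query ceiling, taken from the 10,000-record figure in the text.
PUBMED_CAP = 10_000


def page_windows(total: int, page_size: int = 500,
                 cap: int = PUBMED_CAP) -> Iterator[Tuple[int, int]]:
    """Yield (retstart, retmax) pairs covering the reachable part of a result set."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    reachable = min(total, cap)  # records beyond the cap are not paged here
    for start in range(0, reachable, page_size):
        yield start, min(page_size, reachable - start)


def needs_splitting(total: int, cap: int = PUBMED_CAP) -> bool:
    """True when a query matches more records than one query can page through."""
    return total > cap


if __name__ == "__main__":
    for retstart, retmax in page_windows(1200, page_size=500):
        print(retstart, retmax)
```

When needs_splitting returns True, the query would have to be narrowed into smaller sub-queries, or handed to EDirect as suggested above.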

Has the output of PubMed E-utility calls changed? 

Again, in almost all cases, no. Here are the exceptions:  

  • ESearch will now return the same PubMed IDs (PMIDs) that are currently returned by web PubMed 
  • EFetch will now return XML data by default (when &retmode is not set) rather than ASN.1. In other words, the default value of &retmode is now “xml”.
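Because the EFetch default has changed, one defensive option is to always pass &retmode explicitly so a script does not depend on the default. The sketch below builds such a URL with the standard library; the helper name build_efetch_url is an assumption, and the PMIDs in the example are placeholders rather than references to specific articles. No request is sent.

```python
from typing import Iterable
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_efetch_url(pmids: Iterable[int], retmode: str = "xml") -> str:
    """Return an EFetch URL for PubMed with retmode set explicitly (hypothetical helper)."""
    params = {
        "db": "pubmed",
        "id": ",".join(str(pmid) for pmid in pmids),  # comma-separated PMID list
        "retmode": retmode,  # stated explicitly instead of relying on the default
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder PMIDs, for illustration only.
    print(build_efetch_url([12345678, 23456789]))
```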

What should I do if I have trouble using the new API? 

Write to us if you have any questions or concerns.

NEW! Streamlining ClinVar Submission of Assertion Criteria

ClinVar is a freely available submission-driven database for information about genomic variation and its relationship to human health. ClinVar holds more than 1.5 million variants, and is powered by submitters around the world, who provide us with their assessments, the evidence, and the criteria they use to guide their interpretation process and come to their conclusions. To streamline the ClinVar submission process, we are simplifying how submitters provide their assertion criteria. In the past, assertion criteria were provided for each variant. Moving forward, one single set of assertion criteria will be associated with an entire submission regardless of the number of variants.

Re-evaluating the BLAST Nucleotide Database (nt)

The ongoing sequencing revolution has resulted in exponential growth of the NCBI BLAST databases. The default BLAST nucleotide database (nt), the most popular Web BLAST database, is currently 903 billion letters and continues to grow rapidly – doubling in size in the last year. This growth will cause longer search times, reduced capacity, and more delays in updating the database. In the not-too-distant future, searching the entire nt database on the web will no longer be possible unless we modify the database scope and composition.

Because of the above concerns, we want to make the default Web BLAST nucleotide database smaller and more efficient. Some options are to:

    • Change its composition to improve the quality of sequence entries included
    • Take steps to slow its growth rate
    • Divide it into several databases by biological or functional categories

RefSeq Release 215

RefSeq release 215 is now available online, from the FTP site and through NCBI’s Entrez programming utilities, E-utilities.

This full release incorporates genomic, transcript, and protein data available as of November 7, 2022, and contains 335,372,031 records, including 244,583,657 proteins and sequences from 125,116 organisms. The release is provided in several directories as a complete dataset and also as divided by logical groupings.

Prokaryotic phylum name changes coming soon!

Beginning in the first week of January 2023, NCBI Taxonomy will initiate changes to prokaryote phylum names in accordance with the recent inclusion of the rank ‘phylum’ in the International Code of Nomenclature for Prokaryotes (ICNP). We first announced this update, which involves changes to 42 NCBI taxa, about a year ago. We will change several names that have long been in use (e.g., Firmicutes, Proteobacteria) to newly formalized names (e.g., Bacillota, Pseudomonadota) that may be unfamiliar to some.

You will still see the previous names on records and can search using them, but they will not be displayed as prominently as before. The organism names on Entrez records will not change (e.g., Bacillus subtilis). However, we will update the phylum names on the displayed lineages for ~276 million records (see an example in Figure 1 below).

New and improved SciENcv experience starting January 2023!

Science Experts Network Curriculum Vitae (SciENcv) is an electronic system that helps you assemble the professional information needed to apply for federal grants. Starting January 2023, we will be introducing a new and improved SciENcv experience!

SciENcv helps you gather and compile information on expertise, employment, education, and professional accomplishments. You can use SciENcv to create and maintain financial documents and biosketches that are submitted with grant applications.

Why should I use SciENcv?

  • Eliminates the need to repeatedly enter biosketch and financial document information
  • Reduces the administrative burden associated with federal grant submission and reporting requirements
  • Allows you to describe your scientific contributions in your own words

Submit your data to dbGaP in 3 easy steps!

Do you have human genetic data from a large-scale study? Submit your data to NCBI’s Database of Genotypes and Phenotypes (dbGaP) to contribute to meaningful discoveries about health. dbGaP contains data from more than 2.8 million study participants who have provided over 3.3 million molecular samples.

How do I submit data to dbGaP?

Step 1: Register your study

Step 2: Submit your data and get your study accession (phs#)

Step 3: Release your data

CCDS Release 24

An updated dataset of human protein-coding regions from the Consensus Coding Sequence (CCDS) collaboration

Are you interested in a set of high-quality human coding regions (CDS) with equivalent annotation in NCBI’s RefSeq and EMBL-EBI’s (European Molecular Biology Laboratory-European Bioinformatics Institute) Ensembl annotations? Check out the new CCDS Release 24! This CCDS set was generated by comparing RefSeq Annotation Release 110 and Ensembl Release 108.

This update adds 2,746 new CCDS IDs and 237 new genes compared to the last human CCDS build (Release 22, 2018). CCDS Release 24 includes a total of 35,608 CCDS IDs that correspond to 19,107 genes, with 48,062 protein sequences from RefSeq and 47,762 from Ensembl.

The new CCDS release is available on FTP for bulk download and on the CCDS webpage in case you are looking for data on individual genes.

New annotations in RefSeq!

In August and September, the NCBI Eukaryotic Genome Annotation Pipeline released thirty-eight new annotations in RefSeq for the following organisms:

  • Adelges cooleyi (spruce gall adelgid)
  • Aethina tumida (small hive beetle)
  • Anopheles aquasalis (mosquito)
  • Anopheles maculipalpis (mosquito)
  • Anthonomus grandis grandis (boll weevil)
  • Aphis gossypii (cotton aphid)
  • Bactrocera neohumeralis (fly)
  • Bombus affinis (bee)
  • Bombus huntii (bee)
  • Cataglyphis hispanica (ant)
  • Cygnus atratus (black swan) (pictured)

dbGaP: Data and analyses from millions of study participants, samples, and trillions of genotypes!

Are you familiar with the well-known Framingham Heart Study, a multi-generation study of residents of Framingham, Massachusetts that began in 1948? Much of what is now known about the impact of genetics, lifestyle, and diet on cardiovascular health and disease has come from this research study. (See PMC4159698 for a historical perspective.) Did you know that data from this study and over 2,000 other studies that demonstrate the relationship between genetic and medical outcomes and other phenotypes are available from NCBI’s Database of Genotypes and Phenotypes (dbGaP)?

dbGaP was established in 2007 as a repository of human data from large-scale studies. You can access data from more than 2.8 million study participants who have provided over 3.3 million molecular samples. You can retrieve patient-level phenotypic (e.g., demographic, clinical, exposure) data and molecular (e.g., called genotypes, omics, sequence) data, and the results of association analyses from genome-scale case-control and longitudinal studies of heritable diseases.

What types of studies and data are available in dbGaP?

dbGaP contains a wide range of studies and types of data, all relating to human genetic and phenotypic measurements. Most dbGaP data are from NIH-funded research, but recently we have expanded to include non-NIH funded studies. An easy way to find dbGaP Studies, Phenotype and Molecular Datasets, Variables, Analyses and Documents is through the dbGaP Advanced Search (Figure 1). The interface allows you to filter results by different characteristics depending on the tab you choose.

Figure 1. The dbGaP Advanced Search interface. Tabs that appear at the top of the web interface allow you to select the studies, datasets, analyses, etc. of interest. Filters (facets) appear on the left (see inset). Click on filters to select values. Links on the study summary pages provide direct access to data. Top panel: Studies tab and the corresponding filter categories. Bottom panel: Molecular data tab results with Study (Framingham SHARe), Markerset Source (Affymetrix) filters applied.
