Coming Soon! Updates to the ClinVar Website

To support the inclusion of submitted somatic variation data, we are updating the ClinVar website. You will begin to see some changes in early 2024.

What will change?

Variant (VCV) record pages will have an updated look and feel:

  • Simpler layout with no tabs
  • New sections will display somatic classifications
    • The Summary section will be divided into the germline classification and the two types of classification for somatic variants: somatic clinical impact and oncogenicity
    • Sections for conditions, submitted records, functional evidence, and citations will be provided for both germline and somatic classifications
    • A toggle will allow you to select and show just information pertaining to germline or somatic data 

Gene Expression Counts on NCBI RefSeq Eukaryotic Genomes

We’re rolling out exciting new features to NCBI RefSeq’s Eukaryote Genome Annotation Pipeline (EGAP)! Now you can get a better understanding of gene expression observed in different RNA-seq datasets with our newly added gene expression counts. These are determined using featureCounts based on the EGAP-produced RefSeq annotation and the set of RNA-seq runs aligned with the STAR aligner as part of the annotation process.

Gene Ontology (GO) Terms for NCBI RefSeq Eukaryotic Genomes

Are you interested in more functional information about protein-coding genes? We’ve expanded NCBI RefSeq’s Eukaryote Genome Annotation Pipeline (EGAP) to include Gene Ontology (GO) terms computed for most protein-coding genes. We are running the latest version of InterProScan, which now includes analysis based on PANTHER reference trees, on all NCBI RefSeq eukaryotic genomes. The result is comprehensive GO data, with inferred biological process, molecular function, and cellular component terms matched with high-quality RefSeq annotations across hundreds of taxa, to help drive your research. The data is available on individual records in NCBI’s Gene resource, on the NCBI Gene FTP site, or in community-standard .gaf-formatted files with each RefSeq genome release on our FTP site.
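As a rough illustration of working with .gaf files, the sketch below tallies annotations by GO aspect. It assumes the standard tab-delimited GAF layout, in which column 5 holds the GO ID and column 9 holds the aspect code (P, F, or C). The sample rows, gene IDs, symbols, and source names are invented placeholders, not records from an NCBI file.

```python
from collections import Counter

# Aspect codes used in column 9 of a GAF file.
ASPECT_NAMES = {
    "P": "biological process",
    "F": "molecular function",
    "C": "cellular component",
}


def count_go_aspects(lines):
    """Count annotations per GO aspect in an iterable of GAF lines."""
    counts = Counter()
    for line in lines:
        # Header and comment lines start with "!"; skip them and blank lines.
        if line.startswith("!") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 15:
            continue  # not a complete annotation row
        counts[ASPECT_NAMES.get(fields[8], "unknown")] += 1
    return counts


def gaf_row(gene_id, symbol, qualifier, go_id, aspect):
    """Build a 17-column GAF row; every value here is a placeholder."""
    return "\t".join([
        "ExampleDB", gene_id, symbol, qualifier, go_id, "REF:example", "IEA", "",
        aspect, "", "", "protein", "taxon:0", "20231106", "ExampleSource", "", "",
    ])


SAMPLE = [
    "!gaf-version: 2.2",
    gaf_row("0000001", "toyA", "enables", "GO:0005524", "F"),
    gaf_row("0000001", "toyA", "involved_in", "GO:0006468", "P"),
    gaf_row("0000002", "toyB", "located_in", "GO:0005634", "C"),
]

if __name__ == "__main__":
    for aspect, n in sorted(count_go_aspects(SAMPLE).items()):
        print(f"{aspect}: {n}")
```

To run this against a downloaded file, pass an open file handle in place of `SAMPLE` (for a gzip-compressed file, one opened in text mode with `gzip.open`).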

RefSeq Release 221

RefSeq release 221 is now available online and from the FTP site. You can access RefSeq data through NCBI Datasets.

What’s included in this release?

As of November 6, 2023, this full release incorporates genomic, transcript, and protein data containing:

  • 404,657,610 records
  • 300,054,945 proteins
  • 57,882,313 RNAs
  • sequences from 143,819 organisms 

NCBI Pathogen Detection Plays Key Role in Identification of a Novel Shiga Toxin Subtype

Using the Pathogen Detection pipeline, we recently found new Shiga toxin 2 (stx2) subtypes in isolates from the United States collected as part of the Centers for Disease Control and Prevention (CDC) routine disease surveillance. Our pipeline relies on AMRFinderPlus to identify antimicrobial resistance (AMR), stress-resistance, and virulence genes. We screened over 60,000 E. coli and Shigella genomes for Shiga toxin, a factor associated with foodborne illness. The results of this analysis, along with full AMRFinderPlus results for what is now over 270,000 E. coli and Shigella genomes, are available in the MicroBIGG-E browser.

Now Available! Compare NCBI RefSeq and UniProt Datasets

Do you need to compare and combine data based on NCBI RefSeq and UniProt datasets, and aren’t sure which proteins are comparable? For many years, NCBI Gene has provided information about the relationships between RefSeq and UniProt accessions courtesy of data imported from UniProt, but the tremendous growth of both datasets has led to large gaps in the data. We have developed a new process to compare the two datasets, first looking for 100% identical proteins and then checking the remaining sequences for similar matches in related taxa. The result is mapping information now covering over 170 million RefSeq proteins across the tree of life. 

You can find links to related UniProt accessions on individual NCBI Gene records. The entire dataset is available on our FTP site.
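The two-pass idea described above (exact matches first, then similarity checks on whatever is left) can be sketched in a few lines. This is only a toy illustration, not NCBI's actual process: the similarity measure (`difflib.SequenceMatcher`), the 0.9 threshold, the function name, and all accessions and sequences are assumptions made for the example, and the restriction to related taxa is omitted.

```python
from difflib import SequenceMatcher


def map_proteins(refseq, uniprot, min_ratio=0.9):
    """Map RefSeq-like accessions to UniProt-like accessions in two passes.

    refseq, uniprot: dicts of accession -> protein sequence.
    Returns a dict of refseq accession -> (uniprot accession, match kind).
    """
    # Pass 1: index the second dataset by sequence for exact lookups.
    by_sequence = {}
    for acc, seq in uniprot.items():
        by_sequence.setdefault(seq, acc)

    mapping = {}
    remaining = []
    for acc, seq in refseq.items():
        if seq in by_sequence:
            mapping[acc] = (by_sequence[seq], "identical")
        else:
            remaining.append(acc)

    # Pass 2: for unmatched sequences, keep the best hit above the threshold.
    # (A stand-in similarity measure; real pipelines use alignment tools.)
    for acc in remaining:
        best_acc, best_ratio = None, 0.0
        for u_acc, u_seq in uniprot.items():
            ratio = SequenceMatcher(None, refseq[acc], u_seq, autojunk=False).ratio()
            if ratio > best_ratio:
                best_acc, best_ratio = u_acc, ratio
        if best_acc is not None and best_ratio >= min_ratio:
            mapping[acc] = (best_acc, "similar")
    return mapping


# Invented toy data: one exact match, one near match, one with no match.
TOY_REFSEQ = {
    "XP_TOY_1": "MKTAYIAKQRQISFVKSHFSRQ",
    "XP_TOY_2": "MKTAYIAKQRQISFVKSHFSRA",
    "XP_TOY_3": "GGGGGGGG",
}
TOY_UNIPROT = {
    "U_TOY_A": "MKTAYIAKQRQISFVKSHFSRQ",
    "U_TOY_B": "MSDNELLQ",
}

if __name__ == "__main__":
    for acc, hit in map_proteins(TOY_REFSEQ, TOY_UNIPROT).items():
        print(acc, hit)
```

Doing the exact-match pass first keeps the expensive pairwise comparison limited to sequences that have no identical counterpart.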

GenBank Release 258.0 is Available!

GenBank release 258.0 (11/2/2023) is now available on the NCBI FTP site. This release has 26.74 trillion bases and 3.85 billion records.

The current release has:

  • 247,777,761 traditional records containing 2,433,391,164,875 base pairs of sequence data
  • 2,775,205,599 WGS records containing 23,600,199,887,231 base pairs of sequence data
  • 701,336,089 bulk-oriented TSA records containing 659,924,904,311 base pairs of sequence data
  • 130,654,568 bulk-oriented TLS records containing 50,868,407,906 base pairs of sequence data 
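As a quick consistency check, the four categories listed above can be summed and compared with the headline totals of 26.74 trillion bases and 3.85 billion records. The short script below does that arithmetic; the dictionary name and category labels are just illustrative.

```python
# (records, base pairs) for each category in GenBank release 258.0,
# copied from the list above.
RELEASE_258 = {
    "traditional": (247_777_761, 2_433_391_164_875),
    "WGS": (2_775_205_599, 23_600_199_887_231),
    "TSA": (701_336_089, 659_924_904_311),
    "TLS": (130_654_568, 50_868_407_906),
}

total_records = sum(records for records, _ in RELEASE_258.values())
total_bases = sum(bases for _, bases in RELEASE_258.values())

if __name__ == "__main__":
    print(f"{total_records:,} records (~{total_records / 1e9:.2f} billion)")
    print(f"{total_bases:,} bases (~{total_bases / 1e12:.2f} trillion)")
```

The sums come to 3,854,974,017 records and 26,744,384,364,323 bases, which round to the figures quoted for the release.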

Using NCBI Resources for Genotype-Based Medication Optimization

NCBI offers a variety of clinical genetic resources to help you research, diagnose, and treat diseases and conditions. You can quickly and easily access our data and tools through the Medical Genetics and Human Variation page of the NCBI website.  

How and why should you use our resources? Consider the example below.

Your patient is a 58-year-old woman who has been diagnosed with acute coronary syndrome and is scheduled for an angioplasty. She will need to take clopidogrel for at least three months. She mentions that her father died of a stroke while taking the drug, and she is concerned. You look into pharmacogenetic influences on clopidogrel response and use the results of your patient’s genetic test to determine if a change in the prescription is needed.

New Annotations in RefSeq!

In July, August, and September, the NCBI Eukaryotic Genome Annotation Pipeline released fifty-six new annotations in RefSeq!

New Annotations
  • Achroia grisella (moth)
  • Acipenser ruthenus (sterlet)
  • Ahaetulla prasina (snake)
  • Alligator mississippiensis (American alligator)
  • Ammospiza caudacuta (bird)
  • Ammospiza nelsoni (bird)
  • Anopheles bellator (mosquito)
  • Anopheles coustani (mosquito)
  • Anopheles ziemanni (mosquito)
  • Arachis stenosperma (eudicot)
  • Carassius carassius (crucian carp)
  • Centropristis striata (black seabass)
  • Cornus florida (flowering dogwood) (pictured)
  • Corylus avellana (European hazelnut)
  • Corythoichthys intestinalis (scribbled pipefish)