Tag: Comparative Genomics Resource (CGR)

Gene Ontology (GO) Terms for NCBI RefSeq Eukaryotic Genomes

Are you interested in more functional information about protein-coding genes? We’ve expanded NCBI RefSeq’s Eukaryotic Genome Annotation Pipeline (EGAP) to include Gene Ontology (GO) terms computed for most protein-coding genes. We now run the latest version of InterProScan, which includes analysis based on PANTHER reference trees, on all NCBI RefSeq eukaryotic genomes. That gives you comprehensive GO data, with inferred biological process, molecular function, and cellular component terms, matched with high-quality RefSeq annotations across hundreds of taxa to help drive your research. The data is available on individual records in NCBI’s Gene resource, through NCBI Gene FTP, and in community-standard .gaf files provided with each RefSeq genome release on our FTP site.
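As a rough illustration of working with .gaf files, the sketch below groups GO IDs by gene symbol and GO aspect. It assumes the standard tab-separated GAF layout (comment lines start with "!", gene symbol in column 3, GO ID in column 5, aspect code in column 9). The function name and the sample rows are made up for this example and are not taken from an NCBI file.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

# GAF aspect codes (column 9) and the GO branches they stand for.
ASPECT_NAMES = {
    "P": "biological_process",
    "F": "molecular_function",
    "C": "cellular_component",
}


def go_terms_by_gene(lines: Iterable[str]) -> Dict[str, Dict[str, Set[str]]]:
    """Group GO IDs by gene symbol and aspect from GAF-formatted lines."""
    terms = defaultdict(lambda: defaultdict(set))
    for line in lines:
        # Skip blank lines and "!" header/comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:  # GAF rows have at least 15 mandatory columns
            continue
        symbol, go_id, aspect = cols[2], cols[4], cols[8]
        terms[symbol][ASPECT_NAMES.get(aspect, aspect)].add(go_id)
    # Return plain dicts so lookups of missing keys fail loudly.
    return {gene: dict(aspects) for gene, aspects in terms.items()}


# Hypothetical rows for illustration only (invented gene and identifiers).
SAMPLE = (
    "!gaf-version: 2.2\n"
    "ExampleDB\t1\tgeneA\tenables\tGO:0005524\tGO_REF:0000002\tIEA\t\tF"
    "\t\t\tprotein\ttaxon:1\t20231101\tExample\t\t\n"
    "ExampleDB\t1\tgeneA\tlocated_in\tGO:0005634\tGO_REF:0000002\tIEA\t\tC"
    "\t\t\tprotein\ttaxon:1\t20231101\tExample\t\t\n"
)

result = go_terms_by_gene(SAMPLE.splitlines())
print(result)
```

For a real file, the same function can be given an open file handle in place of SAMPLE.splitlines(); decompress the file first if it is distributed gzipped.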

RefSeq Release 221

RefSeq release 221 is now available online and from the FTP site. You can access RefSeq data through NCBI Datasets.

What’s included in this release?

As of November 6, 2023, this full release incorporates genomic, transcript, and protein data containing:

  • 404,657,610 records
  • 300,054,945 proteins
  • 57,882,313 RNAs
  • sequences from 143,819 organisms 

Now Available! Compare NCBI RefSeq and UniProt Datasets

Do you need to compare and combine data based on NCBI RefSeq and UniProt datasets, but aren’t sure which proteins are comparable? For many years, NCBI Gene has provided information about the relationships between RefSeq and UniProt accessions courtesy of data imported from UniProt, but the tremendous growth of both datasets has led to large gaps in the data. We have developed a new process to compare the two datasets, first looking for 100% identical proteins and then checking the remaining sequences for similar matches in related taxa. The result is mapping information now covering over 170 million RefSeq proteins across the tree of life.
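The first pass of a comparison like this, pairing 100% identical proteins, can be sketched by hashing each sequence and looking for shared digests. This is only an illustration under our own assumptions, not NCBI's actual pipeline: the function names, accessions, and sequences are invented, taxonomy is ignored, and the second pass (similar matches in related taxa) is left out.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


def sequence_key(seq: str) -> str:
    """Digest of a protein sequence, ignoring case, whitespace, and a trailing '*'."""
    normalized = "".join(seq.split()).upper().rstrip("*")
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def map_identical(
    refseq: Dict[str, str], uniprot: Dict[str, str]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Pair each RefSeq accession with UniProt accessions of identical sequence.

    Returns (matched, unmatched); unmatched accessions are the ones a
    similarity search would need to examine next.
    """
    by_key = defaultdict(list)
    for acc, seq in uniprot.items():
        by_key[sequence_key(seq)].append(acc)

    matched: Dict[str, List[str]] = {}
    unmatched: List[str] = []
    for acc, seq in refseq.items():
        hits = by_key.get(sequence_key(seq))
        if hits:
            matched[acc] = sorted(hits)
        else:
            unmatched.append(acc)
    return matched, unmatched


# Toy data: accessions and sequences are made up for illustration.
REFSEQ = {"RS_1": "MKTAYIAKQR", "RS_2": "MKTAYIAKQL"}
UNIPROT = {"UP_A": "mktayiakqr", "UP_B": "MSSSSS"}

matched, unmatched = map_identical(REFSEQ, UNIPROT)
print(matched, unmatched)
```

Hashing keeps memory use proportional to the number of sequences rather than their total length, which matters when both sides hold hundreds of millions of proteins.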

You can find links to related UniProt accessions on individual NCBI Gene records. The entire dataset is available on our FTP site.

New Annotations in RefSeq!

In July, August, and September, the NCBI Eukaryotic Genome Annotation Pipeline released fifty-six new annotations in RefSeq!

New Annotations
  • Achroia grisella (moth)
  • Acipenser ruthenus (sterlet)
  • Ahaetulla prasina (snake)
  • Alligator mississippiensis (American alligator)
  • Ammospiza caudacuta (bird)
  • Ammospiza nelsoni (bird)
  • Anopheles bellator (mosquito)
  • Anopheles coustani (mosquito)
  • Anopheles ziemanni (mosquito)
  • Arachis stenosperma (eudicot)
  • Carassius carassius (crucian carp)
  • Centropristis striata (black seabass)
  • Cornus florida (flowering dogwood)
  • Corylus avellana (European hazelnut)
  • Corythoichthys intestinalis (scribbled pipefish)

NCBI Datasets: Easily Access and Download Sequence Data and Metadata

Effective May 2024, NCBI Datasets will replace legacy Genome and Assembly web resources 

As part of our ongoing effort to enhance your experience and modernize our services, NCBI will gradually replace the legacy Genome and Assembly resources with the newly introduced NCBI Datasets resource. NCBI Datasets is a continually evolving platform designed to provide easy and intuitive access to NCBI’s sequence data and metadata. 

  • The legacy Genome and Assembly web resources will no longer be available after May 2024
  • There will be no changes to how you access the databases using E-Utilities or EDirect 
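For readers who script against E-Utilities, a request to the ESearch endpoint can be assembled as in the sketch below. It only builds the URL and sends nothing; the helper name and the example search term are our own choices for illustration, and the db, term, retmode, and retmax parameters are the standard ESearch ones.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base address of the NCBI E-Utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Return an ESearch URL for the given Entrez database and query term."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example query against the Assembly database (term chosen for illustration).
url = build_esearch_url("assembly", "Saccharomyces cerevisiae[Organism]")
print(url)
```

The resulting URL can be passed to any HTTP client; NCBI's usage guidelines on request rates and API keys still apply.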

Introducing the New NCBI Datasets Genome Annotation Table

As part of our ongoing effort to modernize and improve your experience, we are excited to introduce the new NCBI Datasets genome annotation table. You can now quickly and easily access gene and protein sequences annotated by NCBI RefSeq or GenBank submitters.

Features & Benefits
  • Search and download data for annotated genes more easily than ever
  • Download gene, transcript, and protein sequences, and metadata
  • Annotation tables are available for ~7,500 eukaryotic and ~1.5M prokaryotic annotated genomes
  • Annotation data is now available for both RefSeq and GenBank-submitted annotations
  • Filter by gene type, gene name, and chromosome or location on the genome

Comparing Yeast Species Used in Beer Brewing and Bread Making

Using the NIH Comparative Genomics Resource (CGR) to gain knowledge about less-researched organisms 

The scientific community relies heavily on model organism research to gain knowledge and make discoveries. However, focusing solely on these species misses valuable variation. Comparative genomics allows us to use knowledge from a model species, such as Saccharomyces cerevisiae, to understand traits in other, related organisms, such as Saccharomyces pastorianus or Saccharomyces eubayanus. Applying this information may provide valuable insight for other less-researched organisms. The National Institutes of Health (NIH) Comparative Genomics Resource (CGR) offers a cutting-edge NCBI toolkit of high-quality genomics data and tools to help you do just that.

Join NCBI at ASHG 2023

November 1-5 in Washington, D.C. 

We look forward to seeing you in person at the American Society of Human Genetics Annual Meeting (ASHG 2023), November 1-5, 2023, in Washington, D.C. We will participate in a variety of activities and events, including hosting an exhibit booth where you can stop by to meet NCBI experts, ask questions, provide feedback, or just chat! We’re especially excited to share our recent efforts on our clinical and human genetic resources and provide an update on the NIH Comparative Genomics Resource (CGR).

Check out NCBI’s schedule of activities and events: 

BLAST ClusteredNR Database is Now Available for Download!

Now available! You can download the ClusteredNR protein database, previously available only through the BLAST web application. The ClusteredNR database gives you quicker BLAST results and shows how your hits are distributed across a wider range of organisms and evolutionary distances. The package includes the ClusteredNR BLAST database, an SQLite3 database, and several scripts for accessing cluster information and members.

Features & Benefits
  • Reduced redundancy 
  • Faster searches 
  • More diverse proteins and organisms in your BLAST results 

New Fungal Alignments Available in the Comparative Genome Viewer (CGV)

Recognizing Fungal Disease Awareness Week 

Fungal pathogens are a growing threat to global public health. To promote awareness of this issue, the Centers for Disease Control and Prevention (CDC) has established September 18-22 as Fungal Disease Awareness Week.

In honor of this week, we’re highlighting whole genome alignments for fungal pathogens that are now available in the Comparative Genome Viewer (CGV), NCBI’s latest genome visualization tool. Alignment displays in CGV help you identify rearrangements and differences in genomic structure such as deletions, inversions, and translocations. These differences can be important for understanding genome plasticity, genetic diversity within species (PMC8640552), and the response to environmental stresses such as exposure to anti-fungal drugs (PMC5555451).