Are you interested in more functional information about protein-coding genes? We’ve expanded NCBI RefSeq’s Eukaryote Genome Annotation Pipeline (EGAP) to include Gene Ontology (GO) terms computed for most protein-coding genes. We are running the latest version of InterProScan, which now includes analysis based on PANTHER reference trees, on all NCBI RefSeq eukaryotic genomes. The result is comprehensive GO data, with inferred biological process, molecular function, and cellular component terms, matched to high-quality RefSeq annotations across hundreds of taxa to help drive your research. The data is available on individual records in NCBI’s Gene resource, from NCBI Gene FTP, and as community-standard GAF (.gaf) files provided with each RefSeq genome release on our FTP site.
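If you download one of the .gaf files, you can group its annotations per gene with a few lines of Python. The sketch below is a minimal example, not an NCBI tool. It assumes the files follow the GO Consortium’s tab-separated GAF 2.x column layout (column 2: object ID, column 4: qualifier, column 5: GO ID, column 9: aspect P/F/C). `parse_gaf` is a hypothetical helper name.

```python
from collections import defaultdict

# Assumption: the .gaf files follow the GO Consortium GAF 2.x layout
# (tab-separated columns). Indices below are 0-based.
COL_OBJECT_ID = 1   # column 2: DB Object ID
COL_QUALIFIER = 3   # column 4: Qualifier, e.g. "enables" or "NOT|enables"
COL_GO_ID = 4       # column 5: GO ID
COL_ASPECT = 8      # column 9: Aspect (P, F, or C)

ASPECT_NAMES = {
    "P": "biological_process",
    "F": "molecular_function",
    "C": "cellular_component",
}


def parse_gaf(lines):
    """Group GO IDs by gene and aspect.

    Returns {object_id: {aspect_name: set_of_go_ids}}. Header lines
    (starting with '!'), blank lines, lines with too few columns, and
    negated (NOT) annotations are skipped.
    """
    terms = defaultdict(lambda: defaultdict(set))
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("!"):
            continue
        cols = line.split("\t")
        if len(cols) <= COL_ASPECT:
            continue  # too few columns to read the aspect; skip
        if "NOT" in cols[COL_QUALIFIER].split("|"):
            continue  # negated annotation: the gene is NOT linked to this term
        aspect = ASPECT_NAMES.get(cols[COL_ASPECT], cols[COL_ASPECT])
        terms[cols[COL_OBJECT_ID]][aspect].add(cols[COL_GO_ID])
    # Convert to plain dicts so lookups of missing keys raise KeyError.
    return {gene: dict(aspects) for gene, aspects in terms.items()}
```

To use it on a downloaded release file, such as the GCF_022539665.2-RS_2023_09_gene_ontology.gaf file mentioned in Figure 1, pass an open file handle, for example `parse_gaf(open(path))`. If your copy is gzip-compressed, open it with `gzip.open(path, "rt")` instead.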
Features & Benefits
GO terms allow you to quickly and easily:
- Understand the molecular function, biological processes, and cellular components associated with these protein-coding genes.
- Compare genes across different species and understand the functional relationships between different genes within the same or different organisms.
- Identify potential protein-protein interaction partners and metabolic pathways that involve these genes.
- Find genes that contribute to cellular processes, such as metabolism, transport, and signaling.
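One simple way to compare two genes, within or across species, is to measure how many GO IDs their annotation sets share. The sketch below uses the Jaccard index (shared terms divided by all distinct terms). This is an illustrative choice, not an NCBI method. It counts only exact GO ID matches, so it ignores the parent/child relationships in the ontology that dedicated semantic-similarity measures take into account. `go_overlap` is a hypothetical helper name.

```python
def go_overlap(terms_a, terms_b):
    """Return (shared GO IDs, Jaccard index) for two collections of GO IDs.

    Only exact GO ID matches count; the GO hierarchy is not considered.
    Two empty collections give a Jaccard index of 0.0.
    """
    a, b = set(terms_a), set(terms_b)
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard
```

With placeholder IDs, `go_overlap({"GO:0000001", "GO:0000002"}, {"GO:0000002"})` returns `({"GO:0000002"}, 0.5)`: one shared term out of two distinct terms.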
GO data is already available for more than 450 species, including 280 vertebrates, and we’re working on adding data for most of the more than 1,750 eukaryotic species in the NCBI RefSeq collection. Data for human, mouse, rat, fly, and other core model organisms are imported from community sources provided through the Gene Ontology Consortium.
Figure 1: GO terms added to a Daphnia carinata Gene record (LOC132088345 nonsense-mediated mRNA decay factor SMG7-like, Gene ID: 132088345). Top panel: GO terms section of the Gene page; Bottom panel: excerpt from the GCF_022539665.2-RS_2023_09_gene_ontology.gaf file, available from the assembly FTP directory, showing GO terms for the same gene.
Stay up to date
This new feature was developed as part of the NIH Comparative Genomics Resource (CGR). CGR facilitates reliable comparative genomics analyses for all eukaryotic organisms through an NCBI Toolkit and community collaboration.
Join our mailing list to keep up to date with RefSeq and other CGR news.
If you have questions or would like to provide feedback, please write to our help desk.