Tag: RefSeq

Comparing Yeast Species Used in Beer Brewing and Bread Making

Using the NIH Comparative Genomics Resource (CGR) to gain knowledge about less-researched organisms 

The scientific community relies heavily on model organism research to gain knowledge and make discoveries. However, focusing solely on these species misses valuable variation. Comparative genomics allows us to use knowledge from a model species, such as Saccharomyces cerevisiae, to understand traits in other, related organisms, such as Saccharomyces pastorianus or Saccharomyces eubayanus. Applying this information may provide valuable insight for other less-researched organisms. The National Institutes of Health (NIH) Comparative Genomics Resource (CGR) offers a cutting-edge NCBI toolkit of high-quality genomics data and tools to help you do just that.

RefSeq Release 220

RefSeq release 220 is now available online and from the FTP site. You can access RefSeq data through NCBI Datasets.

What’s included in this release?

As of September 5, 2023, this full release incorporates genomic, transcript, and protein data containing:

  • 391,350,361 records
  • 289,333,423 proteins
  • 56,423,426 RNAs
  • sequences from 141,099 organisms 
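
As a rough illustration of how the release 220 counts relate to each other, the sketch below computes each category's share of the total. It assumes that the protein and RNA counts are disjoint subsets of the 391,350,361 records and that the remainder is genomic and other record types; the announcement does not state this explicitly, and the names `RELEASE_220` and `composition` are made up for this example.

```python
# Counts quoted in the RefSeq release 220 announcement.
RELEASE_220 = {
    "records": 391_350_361,
    "proteins": 289_333_423,
    "rnas": 56_423_426,
}


def composition(counts):
    """Return each category's share of the total record count, in percent.

    Assumption: proteins and RNAs are disjoint subsets of "records";
    whatever is left is reported as "other" (genomic and remaining types).
    """
    total = counts["records"]
    other = total - counts["proteins"] - counts["rnas"]
    parts = {
        "proteins": counts["proteins"],
        "rnas": counts["rnas"],
        "other": other,
    }
    return {name: 100 * n / total for name, n in parts.items()}


shares = composition(RELEASE_220)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}% of records")
```

Under that assumption, proteins make up roughly three quarters of the release.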

Now Available! Updated Bacterial and Archaeal Reference Genomes Collection

An updated bacterial and archaeal reference genome collection is available! This collection of 18,343 genomes was built by selecting exactly one genome assembly for each species among the 312,000+ prokaryotic genomes in RefSeq, except for E. coli, for which two assemblies were selected as references.

The criteria for selecting the reference assembly for a given species include assembly contiguity and completeness and quality of the RefSeq annotation. 

What’s new?
  • 790 species were added to the collection
  • 199 species are represented by a better assembly (compared to the April 2023 release)
  • 70 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment 
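
The size of the collection can be cross-checked against these changes. The sketch below assumes that the 17,623-genome collection described in the earlier announcement on this page is the April 2023 baseline, and that swapping in a better assembly for a species leaves the genome count unchanged; `updated_size` is a hypothetical helper written for this example.

```python
def updated_size(previous_size, added, removed):
    """Collection size after adding and removing species.

    Species whose assembly was replaced by a better one are not counted,
    because a replacement does not change the number of genomes.
    """
    return previous_size + added - removed


# Assumption: 17,623 genomes is the April 2023 baseline (from the earlier
# announcement); 790 added and 70 removed come from the list above.
PREVIOUS_GENOMES = 17_623
expected = updated_size(PREVIOUS_GENOMES, added=790, removed=70)

print(f"Expected collection size: {expected:,}")
print("Matches the announced 18,343:", expected == 18_343)
```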

New Annotations in RefSeq!

In April, May, and June, the NCBI Eukaryotic Genome Annotation Pipeline released eighty-two new annotations in RefSeq!

Highlights:

  • Homo sapiens (human) T2T-CHM13v2.0 now includes many more alternative splice variants
  • Homo sapiens (human) GRCh38.p14 includes all transcripts from MANE v1.2, and includes over 78,000 new RefSeq Functional Element (RefSeqFE) features added since our last annotation in 2022
  • Mus musculus (house mouse) GRCm39 integrates curation for over 3,000 genes and 14,000 transcripts since September 2020
  • Rattus norvegicus (Norway rat) mRatBN7.2 includes curation of over 5,000 genes since our last annotation in 2021

RefSeq Release 219

RefSeq release 219 is now available online and from the FTP site. You can access RefSeq data through NCBI Datasets.

What’s included in this release?

As of July 18, 2023, this full release incorporates genomic, transcript, and protein data containing:

  • 371,291,248 records
  • 3,752,372,037,103 nucleotide bases
  • 106,842,615,422 amino acids
  • sequences from 138,491 organisms
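
To put these figures next to the release 220 figures reported further up this page, a small sketch of the release-to-release growth follows. Only the record and organism counts appear in both announcements, so only those two are compared; `percent_change` is a helper written for this example.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100 * (new - old) / old


# Record and organism counts quoted in the two release announcements.
records_219, records_220 = 371_291_248, 391_350_361
organisms_219, organisms_220 = 138_491, 141_099

records_growth = percent_change(records_219, records_220)
organisms_growth = percent_change(organisms_219, organisms_220)

print(f"Records:   +{records_220 - records_219:,} ({records_growth:.2f}%)")
print(f"Organisms: +{organisms_220 - organisms_219:,} ({organisms_growth:.2f}%)")
```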

The release is provided in several directories as a complete dataset and divided by logical groupings.

Updates & announcements

Now Available! Access to Historical Human Transcript Alignments

Do you need to work with variant data mapped to historical human RefSeq transcript versions? To make it easier to map your data to the current GRCh38 reference genome and MANE transcripts, we’re now providing a collection of RefSeq transcript alignments including both the latest versions in the GCF_000001405.40-RS_2023_03 annotation release, and older transcripts going back to 1999. The data are available for download from the FTP site.  

Example

As shown in the example below (Image 1), you can view these alignments in the Genome Data Viewer by loading the remote BAM track (GCF_00001405-RS_2023_03_knownrefseqs_aln.bam) from the FTP site.

Gene Ontology (GO) Terms on 100M+ RefSeq Prokaryotic Protein Sequence Records

Do you work with or study prokaryotic proteins? As previously announced, we’ve been adding Gene Ontology (GO) terms to RefSeq prokaryotic protein sequence records (example below) to standardize the language used to describe the functions of genes and their products. Over 100 million RefSeq proteins from prokaryotes now have at least one GO term, a 55% increase since we started propagating GO terms from Conserved Domain Database (CDD) architectures in March.

RefSeq Release 218

RefSeq release 218 is now available online and from the FTP site. You can access RefSeq data through NCBI Datasets.

What’s included in this release?

As of May 1, 2023, this full release incorporates genomic, transcript, and protein data containing:

New Release! Updated Bacterial and Archaeal Reference Genomes Collection Now Available

As previously announced, we are continuously curating a better Prokaryotic Reference Genomes Collection. An updated bacterial and archaeal reference genome collection is now available! This collection of 17,623 genomes was built by selecting exactly one genome assembly for each species among the 283,000+ prokaryotic genomes in RefSeq, except for E. coli, for which two assemblies were selected as references.

What’s new?
  • 480 species were added to this collection 
  • 178 species are represented by a better assembly 
  • 17 species were removed due to changes in NCBI Taxonomy or uncertainty in their species assignment 

New annotations in RefSeq!

In February and March, the NCBI Eukaryotic Genome Annotation Pipeline released forty-two new annotations in RefSeq for the organisms listed below. Additionally, interim builds for over sixty species were run during that time period to fix some issues with gene symbol assignment.