Are you interested in high-quality genomic annotations for human and mouse? Check out the Consensus Coding Sequence (CCDS) project! Release 23 of the CCDS project is now available in Entrez Gene. This release compares NCBI’s Mus musculus annotation release 108 to Ensembl’s annotation release 98. The update adds 1,570 new CCDS records and 175 genes to the mouse CCDS dataset. In total, release 23 includes 27,219 CCDS records that correspond to 20,486 genes.
This full release incorporates genomic, transcript, and protein data available as of September 9, 2019, and contains 213,863,503 records, including 152,910,397 proteins, 28,017,380 RNAs, and sequences from 94,946 organisms.
The release is provided as a complete dataset and also in several directories divided by logical groupings.
1. New Mus musculus (house mouse) Annotation Release 108
The latest annotation run for Mus musculus, 108, is a complete re-annotation of the mouse GRCm38.p6 assembly that incorporates ongoing curation work and new computed models based on extensive long-read transcriptome data.
See the annotation report for details. You can access these annotation products through the sequence databases and on the FTP site.
2. Updated Homo sapiens Annotation Release 109.20190905
Annotation Release 109.20190905 is an update of NCBI Homo sapiens Annotation Release 109. The annotation report has details. You can access the annotation products from the sequence databases or download the data from the FTP site. We will continue to update the human genome annotation frequently so that we can incorporate ongoing curation work, including the MANE project and other curation activities. See our post on the increased frequency of annotation for more information on the new schedule.
3. dbSNP Human Build 153
NCBI announces Annotation Release 100 of the Pacific white shrimp (Penaeus vannamei) genome in RefSeq, based on the assembly (GCF_003789085.1) submitted by the Institute of Oceanology, Chinese Academy of Sciences. The Pacific white shrimp is one of the most important shrimp species in fisheries and aquaculture and is the first decapod to have its genome annotated by NCBI. We predicted 24,987 protein-coding genes with evidence from the alignment of six billion RNA-Seq reads and homology with invertebrate proteins. This annotation will enable genomic research in this commercially important species.
Please visit our Eukaryotic RefSeq Genome Annotation Status page to see more annotations in progress.
Next week, NCBI staff will attend the Plant and Animal Genome (PAG) Conference. We have several activities planned, including 1 booth (#223), 4 workshops, 1 talk and 2 posters.
Read on to learn more about what you can look forward to if you’re attending PAG this year. (Note: The listed times are Pacific time.)
Highlights in release 109:
- A total of 20,203 protein-coding genes and 17,871 non-coding genes were annotated.
- The number of annotated curated transcripts increased by 17% and genes with two or more curated alternative variants increased by 8%.
- The annotation includes 6,862 features and 2,075 GeneIDs for non-genic functional elements, such as regulatory regions and known structural elements. For example, see the opsin locus control region (OPSIN-LCR).
- Papio anubis (olive baboon)
- Prunus avium (sweet cherry)
- Aedes aegypti (yellow fever mosquito)
- Chenopodium quinoa (quinoa)
- Hevea brasiliensis (rubber tree)
- Manihot esculenta (cassava)
- Carlito syrichta (Philippine tarsier)
See more details on the Eukaryotic RefSeq Genome Annotation Status page.
Annotation Release 101 for the bottlenose dolphin (Tursiops truncatus) is out in RefSeq! This annotation is based on the NIST Tur_tru v1 assembly, which has a four-fold increase in contiguity over the assembly used in the previous annotation. Over four billion RNA-Seq reads from skin and blood tissue were used for gene prediction. As a result of these improvements, the percentage of partially represented protein-coding genes went down from 24% to 4%. Over 2,500 genes that were fragmented in the previous assembly were merged into complete genes. A total of 24,026 genes were annotated, of which 17,096 are protein-coding. See the annotation report for full details.
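To put the dolphin numbers in perspective, here is a minimal Python sketch of the arithmetic behind them. It uses only the figures quoted in the announcement; the constant and function names are illustrative assumptions and are not part of any NCBI tool or report.

```python
# Figures quoted in the Tursiops truncatus Annotation Release 101 announcement.
TOTAL_GENES = 24_026
PROTEIN_CODING_GENES = 17_096

# Percentage of partially represented protein-coding genes,
# in the previous annotation and in this one.
PARTIAL_PCT_PREVIOUS = 24
PARTIAL_PCT_CURRENT = 4


def share_pct(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole` (hypothetical helper)."""
    return 100.0 * part / whole


protein_coding_share = share_pct(PROTEIN_CODING_GENES, TOTAL_GENES)
# Genes in the annotation that are not protein-coding.
other_genes = TOTAL_GENES - PROTEIN_CODING_GENES
# The improvement is a difference of percentages, so it is in percentage points.
partial_drop_points = PARTIAL_PCT_PREVIOUS - PARTIAL_PCT_CURRENT

print(f"Protein-coding share of annotated genes: {protein_coding_share:.1f}%")
print(f"Annotated genes that are not protein-coding: {other_genes:,}")
print(f"Drop in partially represented genes: {partial_drop_points} percentage points")
```

In other words, roughly seven in ten annotated dolphin genes are protein-coding, and the share of partially represented protein-coding genes fell by 20 percentage points between the two annotations.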
In an earlier blog post, we discussed how sequence updates in GRCh38, the most recent version of the human reference genome, filled in a gap in human chromosome 17 near position 21,300K and expanded the region by 500K (500,000 base pairs). In this post, we look at the same region again, this time focusing on how GRCh38 also improved the gene annotation.
In late December 2013, the Genome Reference Consortium (GRC) released an updated version of the human reference genome assembly, GRCh38, and submitted these new sequences to GenBank. This is the first time in four years that a new major version of the human genome has become available to the genomics community.
Perhaps you’ve been working on data mapped to the previous assembly (GRCh37), which became available in March 2009, or maybe you are still using an even earlier version, such as NCBI36 from March 2006. Is there a way to reduce the time and effort required to reanalyze your data in the context of the new assembly?
Yes! It’s NCBI’s Genome Remapping Service, or NCBI Remap for short.