You can now download human annotation release 109 from the FTP site or explore it in the Genome Data Viewer, in the Gene database, and with BLAST.
Highlights in release 109:
- A total of 20,203 protein-coding genes and 17,871 non-coding genes were annotated.
- The number of annotated curated transcripts increased by 17%, and the number of genes with two or more curated alternative variants increased by 8%.
- The annotation includes 6,862 features and 2,075 GeneIDs for non-genic functional elements, such as regulatory regions and known structural elements. For example, see the opsin locus control region (OPSIN-LCR).
In July, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:
- Papio anubis (olive baboon)
- Prunus avium (sweet cherry)
- Aedes aegypti (yellow fever mosquito)
- Chenopodium quinoa (quinoa)
- Hevea brasiliensis (rubber tree)
- Manihot esculenta (cassava)
- Carlito syrichta (Philippine tarsier)
Image: Papio anubis (olive or anubis baboon). Source: United States Fish and Wildlife Service, Digital Library System.
See more details on the Eukaryotic RefSeq Genome Annotation Status page.
In June, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for several organisms, including Danio rerio (zebrafish).
Annotation Release 101 for the bottlenose dolphin (Tursiops truncatus) is out in RefSeq! This annotation is based on the NIST Tur_tru v1 assembly, which is four times as contiguous as the assembly used for the previous annotation. Over four billion RNA-Seq reads from skin and blood tissue were used for gene prediction. As a result of these improvements, the percentage of partially represented protein-coding genes dropped from 24% to 4%, and over 2,500 genes that were fragmented in the previous assembly were merged into complete genes. A total of 24,026 genes were annotated, 17,096 of them protein-coding. A full report on the annotation can be found here.
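To put the dolphin numbers in proportion, here is a minimal Python sketch that derives a few ratios from the figures quoted above. The constant and function names are our own, chosen for illustration, and the "relative drop" is one possible way to summarize the change from 24% to 4%; it is not a statistic reported by the annotation pipeline.

```python
# Figures quoted in the post for Annotation Release 101 (Tursiops truncatus).
TOTAL_GENES = 24_026
PROTEIN_CODING = 17_096
PARTIAL_BEFORE_PCT = 24.0  # partially represented protein-coding genes, previous annotation
PARTIAL_AFTER_PCT = 4.0    # same measure in Annotation Release 101


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Share of annotated genes that are protein-coding.
coding_share = percent(PROTEIN_CODING, TOTAL_GENES)

# Genes in the total that are not protein-coding.
other_genes = TOTAL_GENES - PROTEIN_CODING

# Change in partially represented genes, in percentage points and relative terms.
point_drop = PARTIAL_BEFORE_PCT - PARTIAL_AFTER_PCT
relative_drop = 100.0 * point_drop / PARTIAL_BEFORE_PCT

print(f"Protein-coding share of annotated genes: {coding_share:.1f}%")
print(f"Genes that are not protein-coding: {other_genes:,}")
print(f"Partial genes: down {point_drop:.0f} points ({relative_drop:.1f}% relative)")
```

Roughly seven in ten annotated genes are protein-coding, and the share of partially represented protein-coding genes fell by 20 percentage points.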
In an earlier blog post, we discussed how sequence updates in GRCh38, the most recent version of the human reference genome, filled in a gap in human chromosome 17 near position 21,300K and expanded the region by 500K (500,000 base pairs). In this post, we revisit the same region, this time focusing on how GRCh38 also improved the gene annotation.
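If the "K" shorthand for coordinates is unfamiliar, the conversion to base pairs can be sketched in a few lines of Python. The helper name `to_base_pairs` is hypothetical, and support for an "M" suffix is our assumption; the post itself only uses "K".

```python
# Hypothetical helper for illustration; the "M" suffix is an assumption,
# since the post only uses "K" (thousands of base pairs).
_SUFFIX_BP = {"K": 1_000, "M": 1_000_000}


def to_base_pairs(label: str) -> int:
    """Convert a label such as '21,300K' into a number of base pairs."""
    text = label.replace(",", "").strip().upper()
    if not text:
        raise ValueError("empty position label")
    if text[-1] in _SUFFIX_BP:
        # Split off the suffix and scale the numeric part.
        return int(text[:-1]) * _SUFFIX_BP[text[-1]]
    # No suffix: the label is already in base pairs.
    return int(text)


gap_position = to_base_pairs("21,300K")  # approximate location of the filled gap
expansion = to_base_pairs("500K")        # size of the expansion in GRCh38

print(f"Gap near position {gap_position:,} bp; region expanded by {expansion:,} bp")
```

So "21,300K" corresponds to a position about 21.3 million base pairs along chromosome 17.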
Figure 1. Annotation of a region of chromosome 17 near the KCNJ12 and KCNJ18 genes. Top panel: Annotation release 105 on GRCh37.p13 represented by a configured graphic display of sequence record NC_000017.10. Bottom panel: Annotation release 106 on assembly GRCh38 represented by a configured graphic display of sequence record NC_000017.11. New gene models are circled.