A total of 20,203 protein-coding genes and 17,871 non-coding genes were annotated.
The number of annotated curated transcripts increased by 17%, and the number of genes with two or more curated alternative variants increased by 8%.
The annotation includes 6,862 features and 2,075 GeneIDs for non-genic functional elements, such as regulatory regions and known structural elements. For example, see the opsin locus control region (OPSIN-LCR).
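Counts like the ones above can be reproduced from an annotation file. The following is a minimal sketch, assuming a GFF3 file in which each `gene` feature carries a `gene_biotype` attribute (as NCBI GFF3 files generally do). The function name and the example records are made up for illustration and are not taken from the release.

```python
from collections import Counter


def count_gene_biotypes(gff3_lines):
    """Count `gene` features per `gene_biotype` attribute value."""
    counts = Counter()
    for line in gff3_lines:
        # Skip blank lines, directives, and comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # A GFF3 record has 9 tab-separated columns; column 3 is the feature type.
        if len(cols) != 9 or cols[2] != "gene":
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        counts[attrs.get("gene_biotype", "unknown")] += 1
    return counts


# Hypothetical records, for illustration only.
example = [
    "##gff-version 3",
    "chr1\tExample\tgene\t100\t900\t.\t+\t.\tID=gene0;gene_biotype=protein_coding",
    "chr1\tExample\tmRNA\t100\t900\t.\t+\t.\tID=rna0;Parent=gene0",
    "chr1\tExample\tgene\t1500\t1650\t.\t-\t.\tID=gene1;gene_biotype=lncRNA",
    "chr1\tExample\tgene\t2000\t2100\t.\t+\t.\tID=gene2;gene_biotype=protein_coding",
]
counts = count_gene_biotypes(example)
for biotype, n in sorted(counts.items()):
    print(biotype, n)
```

To run this on a real file, pass an open file handle instead of the example list. Note that this sketch only counts features of type `gene`, so pseudogenes annotated with their own feature type are not included.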
The NCBI Eukaryotic Genome Annotation Pipeline now predicts more types of non-coding RNAs. Starting with software release 8.0, rRNAs, snRNAs and snoRNAs are predicted by searching eukaryotic genomes with HMMs from Rfam. Below is an example of an rRNA cassette predicted in maize Annotation Release 102. These new RNA types are in addition to the miRNAs and tRNAs that the pipeline has long annotated.
Fig. 1: rRNA cassette on maize scaffold NW_017972167.1 of assembly B73 RefGen_v4. The top track displays the 18S, 5.8S and 28S rRNA subunits annotated in Annotation Release 102. These three genes were missing from the previous annotation; they replace incorrect non-coding gene predictions (see Annotation Release 101, middle track). The bottom track shows the repeats identified by RepeatMasker. The boundaries of the rRNA repeats precisely match those of the predicted 18S and 28S rRNA genes.
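The kind of boundary agreement described in the caption can be checked programmatically. The following is a minimal sketch, assuming gene and repeat intervals are available as (name, start, end) tuples on the same scaffold. The coordinates, the repeat labels, and the function name are hypothetical and are not the values shown in the figure.

```python
def matching_boundaries(genes, repeats, tolerance=0):
    """Return (gene, repeat) name pairs whose start and end coordinates
    agree within `tolerance` bases."""
    matches = []
    for g_name, g_start, g_end in genes:
        for r_name, r_start, r_end in repeats:
            same_start = abs(g_start - r_start) <= tolerance
            same_end = abs(g_end - r_end) <= tolerance
            if same_start and same_end:
                matches.append((g_name, r_name))
    return matches


# Hypothetical coordinates, for illustration only.
genes = [("18S", 1000, 2800), ("5.8S", 3100, 3260), ("28S", 3500, 6900)]
repeats = [("SSU-rRNA", 1000, 2800), ("LSU-rRNA", 3500, 6900)]

pairs = matching_boundaries(genes, repeats)
print(pairs)
```

With these made-up intervals, the 18S and 28S genes each pair with one repeat, while the 5.8S gene has no matching repeat. Setting `tolerance` above zero allows for small offsets between the two tracks.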
See what we are annotating now on the Eukaryotic RefSeq Genome Annotation Status page.