On Wednesday, December 11, 2019, at 12 PM, NCBI staff will present a webinar that will show you how to use NCBI’s PGAP (https://github.com/ncbi/pgap) on your own data to predict genes on bacterial and archaeal genomes using the same inputs and applications used inside NCBI. You can run PGAP on your own machine, a compute farm, or in the Cloud. Plus, you can now submit genome sequences annotated by your copy of PGAP to GenBank. Attend the webinar to learn more!
Date and time: Wed, Dec 11, 2019 12:00 PM – 12:45 PM EST
After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
Are you interested in high-quality genomic annotations for human and mouse? Check out the Consensus Coding Sequence (CCDS) project! Release 23 of the CCDS project is now available in Entrez Gene. This release compares NCBI’s Mus musculus annotation release 108 to Ensembl’s annotation release 98. This update adds 1,570 new CCDS records and 175 genes to the mouse CCDS dataset. In total, release 23 includes 27,219 CCDS records that correspond to 20,486 genes.
You can now access RefSeq release 96 online, from the FTP site, and through NCBI’s Entrez programming utilities (E-utilities).
This full release incorporates genomic, transcript, and protein data available as of September 9, 2019, and contains 213,863,503 records, including 152,910,397 proteins, 28,017,380 RNAs, and sequences from 94,946 organisms.
The release is provided as a complete dataset and also in several directories divided by logical groupings.
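As a minimal sketch of the E-utilities route, the Python snippet below builds an EFetch request URL for a RefSeq accession and, optionally, downloads the record as FASTA. The helper names `build_efetch_url` and `fetch_record` are hypothetical, and the accession and coordinates in the usage lines are chosen only for illustration; the endpoint and the `db`, `id`, `rettype`, `retmode`, `seq_start`, and `seq_stop` parameters are standard EFetch parameters.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of NCBI's Entrez programming utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_efetch_url(accession, db="nuccore", rettype="fasta",
                     retmode="text", seq_start=None, seq_stop=None):
    """Return an EFetch URL for one accession (hypothetical helper).

    seq_start and seq_stop are optional 1-based coordinates that
    restrict the returned sequence to a subrange.
    """
    params = {"db": db, "id": accession,
              "rettype": rettype, "retmode": retmode}
    if seq_start is not None:
        params["seq_start"] = seq_start
    if seq_stop is not None:
        params["seq_stop"] = seq_stop
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


def fetch_record(accession, **kwargs):
    """Download the record text over HTTPS (requires network access)."""
    with urlopen(build_efetch_url(accession, **kwargs)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Illustrative accession and coordinates; only the URL is printed here.
    print(build_efetch_url("NC_000017.11", seq_start=21300000,
                           seq_stop=21300100))
```

Calling `fetch_record("NC_000017.11", seq_start=21300000, seq_stop=21300100)` would retrieve just that 101-base slice rather than the whole chromosome. For repeated or bulk requests, NCBI's usage guidelines (request rate limits, API keys) apply, and the FTP site is likely the better fit for downloading the full release.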
1. New Mus musculus (house mouse) Annotation Release 108
The latest annotation run for Mus musculus, 108, is a complete re-annotation of the mouse GRCm38.p6 assembly that incorporates ongoing curation work and new computed models based on extensive long-read transcriptome data.
See the annotation report for details. You can access these annotation products through the sequence databases and on the FTP site.
2. Updated Homo sapiens Annotation Release 109.20190905
Annotation Release 109.20190905 is an update of NCBI Homo sapiens Annotation Release 109. The annotation report has details. You can access the annotation products from the sequence databases or download the data from the FTP site. We will continue to update the human genome annotation frequently so that we can incorporate ongoing curation work, including the MANE project and other curation activities. See our post on the increased frequency of annotation for more information on the new schedule.
3. dbSNP Human Build 153
The short variations (SNPs) annotated on human RefSeq transcripts and RefSeqGene records now incorporate data from dbSNP build 153.
NCBI announces Annotation Release 100 of the Pacific white shrimp (Penaeus vannamei) genome in RefSeq, based on the assembly (GCF_003789085.1) submitted by the Institute of Oceanology, Chinese Academy of Sciences. The Pacific white shrimp is one of the most important shrimp species in fisheries and aquaculture and is the first decapod to have its genome annotated by NCBI. We predicted 24,987 protein-coding genes with evidence from alignment of six billion RNA-Seq reads and homology with invertebrate proteins. This annotation will enable genomic research in this commercially important species.
A total of 20,203 protein-coding genes and 17,871 non-coding genes were annotated.
The number of annotated curated transcripts increased by 17% and genes with two or more curated alternative variants increased by 8%.
The annotation includes 6,862 features and 2,075 GeneIDs for non-genic functional elements, such as regulatory regions and known structural elements. For example, see the opsin locus control region (OPSIN-LCR).
In an earlier blog post, we discussed how sequence updates in GRCh38, the most recent version of the human reference genome, filled in a gap in human chromosome 17 near position 21,300K and expanded the region by 500K (500,000 base pairs). In this post, we revisit the same region, this time focusing on how GRCh38 also improved the gene annotation.
Figure 1. Annotation of a region of chromosome 17 near the KCNJ12 and KCNJ18 genes. Top panel: Annotation release 105 on GRCh37.p13 represented by a configured graphic display of sequence record NC_000017.10. Bottom panel: Annotation release 106 on assembly GRCh38 represented by a configured graphic display of sequence record NC_000017.11. New gene models are circled.