NCBI and EBI have been hard at work on our joint MANE collaboration, providing a set of representative transcripts for human protein-coding genes that are identically annotated in the NCBI RefSeq and Ensembl/GENCODE annotation sets and exactly match the GRCh38 reference assembly. We’re pleased to announce MANE v0.92, now covering 16,865 genes, or ~88% of known human protein-coding genes.
In particular, we’ve focused on clinically relevant genes, and MANE Select now includes 99% of genes with high gene-disease validity. This release also includes 43 extra transcripts labeled “MANE Plus Clinical” that we’ve chosen to aid clinical reporting, for example where additional pathogenic variants are not covered by the MANE Select transcript. While it’s critical to consider other alternatively spliced transcripts for variant interpretation or functional analyses, the MANE Select and MANE Plus Clinical transcripts provide a common foundation for clinical reporting and for other analyses that benefit from using just one well-supported transcript or protein per gene.
Continue reading “NCBI RefSeq and Ensembl/GENCODE taking MANE mainstream with v0.92!”
You can now access RefSeq release 96 online, from the FTP site, and through NCBI’s Entrez programming utilities (E-utilities).
This full release incorporates genomic, transcript, and protein data available as of September 9, 2019, and contains 213,863,503 records, including 152,910,397 proteins, 28,017,380 RNAs, and sequences from 94,946 organisms.
The release is provided as a complete dataset and also in several directories divided by logical groupings.
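For readers who want to try the E-utilities route, here is a minimal sketch of how a request for a single RefSeq record could be put together. It only builds the URL and does not contact NCBI. The helper name `build_efetch_url` is hypothetical, and the accession `NM_000546` is just an illustrative RefSeq transcript accession, not one singled out by this release.

```python
from urllib.parse import urlencode

# Base address of NCBI's Entrez programming utilities (E-utilities).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_efetch_url(accession, db="nuccore", rettype="fasta", retmode="text"):
    """Return an EFetch URL for one record (hypothetical helper for illustration).

    db      -- Entrez database to query ("nuccore" for nucleotide records)
    rettype -- record format to return ("fasta" here)
    retmode -- encoding of the response ("text" here)
    """
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": retmode}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# Illustrative accession only; substitute the record you need.
url = build_efetch_url("NM_000546")
print(url)
```

Opening the printed URL in a browser, or passing it to an HTTP client, should return the record in FASTA format. For protein records, the same sketch would presumably be called with `db="protein"`.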
1. New Mus musculus (house mouse) Annotation Release 108
The latest annotation run for Mus musculus, 108, is a complete re-annotation of the mouse GRCm38.p6 assembly that incorporates ongoing curation work and new computed models based on extensive long-read transcriptome data.
See the annotation report for details. You can access these annotation products through the sequence databases and on the FTP site.
2. Updated Homo sapiens Annotation Release 109.20190905
Annotation Release 109.20190905 is an update of NCBI Homo sapiens Annotation Release 109. The annotation report has details. You can access the annotation products from the sequence databases or download the data from the FTP site. We will continue to update the human genome annotation frequently so that we can incorporate ongoing curation work including the MANE project and other curation activities. See our post on the increased frequency of annotation for more information on the new schedule.
3. dbSNP Human Build 153
The short variations (SNPs) annotated on human RefSeq transcripts and RefSeqGene records now incorporate data from dbSNP build 153.
RefSeq release 95 is accessible online, via FTP, and through NCBI’s Entrez programming utilities (E-utilities).
This full release incorporates genomic, transcript, and protein data available as of July 8, 2019, and contains 206,416,381 records, including 146,381,777 proteins, 27,212,750 RNAs, and sequences from 93,618 organisms.
Continue reading “RefSeq release 95: naming evidence added to all relevant WP proteins”
RefSeq release 94 is now available through NCBI web services, by FTP, and through NCBI’s Entrez programming utilities (E-utilities).
This full release incorporates genomic, transcript, and protein data available as of May 13, 2019, and contains 200,311,267 records, including 141,839,334 proteins, 26,534,602 RNAs, and sequences from 91,873 organisms. The release is provided as a complete dataset and also in several directories divided by logical groupings.
Continue reading “RefSeq release 94 with MANE and RefSeq Select markup, protein name evidence, and improved [Candida] auris assembly”
In October last year, we announced the launch of an exciting new collaboration between NCBI and EMBL-EBI called MANE (Matched Annotation from the NCBI and EMBL-EBI). As a first step, we began generating the MANE Select set, comprising a matched representative transcript for every human protein-coding gene. Now that our genome resources are integrated into a high-quality transcript set, you don’t need to choose between RefSeq and Ensembl/GENCODE datasets for genomic analyses.
Not only does the MANE Select set make it easier for you to exchange data or translate coordinates between RefSeq and Ensembl annotation results, but you’ll also be able to use the set with NGS-based sequencing technologies and other resources that use the latest and highest-quality reference human genome assembly available.
Continue reading “MANE Select v0.5 is now available!”