Join us March 14-18 in Salt Lake City, Utah
We are excited to celebrate ClinVar’s 10th anniversary and look forward to seeing you in person at the 2023 ACMG Annual Clinical Genetics Meeting, March 14-18, 2023, in Salt Lake City, Utah. We will participate in a variety of events and activities featuring our clinical and human genetic resources.
Check out NCBI’s schedule:
Continue reading “NCBI at ACMG 2023” →
An updated dataset of human protein-coding regions from the Consensus Coding Sequence (CCDS) collaboration
Are you interested in a set of high-quality human coding regions (CDS) with equivalent annotation in NCBI’s RefSeq and in Ensembl from EMBL-EBI (the European Molecular Biology Laboratory’s European Bioinformatics Institute)? Check out the new CCDS Release 24! This CCDS set was generated by comparing RefSeq Annotation Release 110 and Ensembl Release 108.
This update adds 2,746 new CCDS IDs and 237 new genes compared to the last human CCDS build (Release 22, 2018). CCDS Release 24 includes a total of 35,608 CCDS IDs that correspond to 19,107 genes, with 48,062 protein sequences from RefSeq and 47,762 from Ensembl.
The new CCDS release is available on FTP for bulk download and on the CCDS webpage for data on individual genes. Continue reading “CCDS Release 24” →
Join us October 25-29 in Los Angeles, CA
We are looking forward to seeing you in person at the American Society of Human Genetics (ASHG) annual meeting, October 25-29, 2022, in Los Angeles, California.
We will present a variety of talks and posters featuring our clinical and human genetic resources, as well as genome products and tools. We are excited to introduce the NIH Comparative Genomics Resource (CGR), a multi-year National Library of Medicine (NLM) project to maximize the impact of eukaryotic research organisms and their genomic data resources on biomedical research. If you’re interested in providing feedback that will help drive CGR forward, consider joining our round table discussion.
Check out NCBI’s schedule of activities and events:
Continue reading “Connect with NCBI at ASHG 2022” →
The annotation of human assemblies GRCh38.p14 and T2T-CHM13v2.0
We are happy to announce the first de novo annotation of human T2T-CHM13v2.0, the gapless assembly generated by the T2T Consortium, and the full re-annotation of the human reference assembly, GRCh38.p14. We hope the results will serve both those eager to explore newly sequenced regions of the genome, including telomeres and centromeres, and those interested in refreshing their interpretation of the human reference in light of recently curated transcripts and the new transcriptomic and other data incorporated in the annotation. Continue reading “Announcing Human Annotation Release 110” →
NCBI and EBI have been hard at work on our joint MANE collaboration, providing a set of representative transcripts for human protein-coding genes that are identically annotated in the NCBI RefSeq and Ensembl/GENCODE annotation sets and exactly match the GRCh38 reference assembly. We’re pleased to announce MANE v0.92, now covering 16,865 genes or ~88% of known human protein-coding genes.
In particular, we’ve focused on clinically relevant genes, and MANE Select now includes 99% of genes with high gene-disease validity. This release also includes 43 extra transcripts labeled “MANE Plus Clinical” that we’ve chosen to aid in clinical reporting, for example, when there are additional pathogenic variants not covered by the MANE Select transcript. While it’s critical to consider other alternatively spliced transcripts for variant interpretation or functional analyses, the MANE Select and MANE Plus Clinical transcripts provide a common foundation for clinical reporting and for other analyses that benefit from using just one well-supported transcript or protein per gene.
Continue reading “NCBI RefSeq and Ensembl/GENCODE taking MANE mainstream with v0.92!” →
You can now access RefSeq release 96 online, from the FTP site, and through NCBI’s Entrez programming utilities (E-utilities).
This full release incorporates genomic, transcript, and protein data available as of September 9, 2019, and contains 213,863,503 records, including 152,910,397 proteins, 28,017,380 RNAs, and sequences from 94,946 organisms.
The release is provided as a complete dataset and also in several directories divided by logical groupings.
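As a rough illustration of the E-utilities route mentioned above, the sketch below builds an EFetch request URL for a single RefSeq record. The helper name `build_efetch_url`, its default parameters, and the example accession NM_000546 are assumptions chosen for illustration, not details from this announcement. The snippet only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the EFetch utility in NCBI's E-utilities.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nuccore", rettype="fasta", retmode="text"):
    """Return an EFetch URL for one record (hypothetical helper).

    accession: a sequence accession, e.g. a RefSeq transcript ID.
    db:        Entrez database to query (nuccore for nucleotide records).
    rettype / retmode: requested record format, here plain-text FASTA.
    """
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": retmode}
    return f"{EFETCH_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    # Example accession picked for illustration only.
    print(build_efetch_url("NM_000546"))
```

The resulting URL can be opened in a browser or passed to any HTTP client. For bulk access to a whole release, the FTP site is likely the better fit than per-record requests.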
1. New Mus musculus (house mouse) Annotation Release 108
The latest annotation run for Mus musculus, 108, is a complete re-annotation of the mouse GRCm38.p6 assembly that incorporates ongoing curation work and new computed models based on extensive long-read transcriptome data.
See the annotation report for details. You can access these annotation products through the sequence databases and on the FTP site.
2. Updated Homo sapiens Annotation Release 109.20190905
Annotation Release 109.20190905 is an update of NCBI Homo sapiens Annotation Release 109. The annotation report has details. You can access the annotation products from the sequence databases or download the data from the FTP site. We will continue to update the human genome annotation frequently so that we can incorporate ongoing curation work including the MANE project and other curation activities. See our post on the increased frequency of annotation for more information on the new schedule.
3. dbSNP Human Build 153
The short variations (SNPs) annotated on human RefSeq transcripts and RefSeqGene records now incorporate data from dbSNP build 153.
RefSeq release 95 is accessible online, via FTP and through NCBI’s Entrez programming utilities, E-utilities.
This full release incorporates genomic, transcript, and protein data available as of July 8, 2019, and contains 206,416,381 records, including 146,381,777 proteins, 27,212,750 RNAs, and sequences from 93,618 organisms.
Continue reading “RefSeq release 95: naming evidence added to all relevant WP proteins” →
RefSeq release 94 is now available through NCBI web services, via FTP, and through NCBI’s Entrez programming utilities, E-utilities.
This full release incorporates genomic, transcript, and protein data available as of May 13, 2019, and contains 200,311,267 records, including 141,839,334 proteins, 26,534,602 RNAs, and sequences from 91,873 organisms. The release is provided as a complete dataset and also in several directories divided by logical groupings.
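To put the release sizes in perspective, here is a small sketch that compares the totals quoted in the announcements for RefSeq releases 94, 95, and 96. The figures are copied from those posts. The `RELEASES` table and the `growth` helper are illustrative names of our own, not part of any NCBI tool.

```python
# Totals as quoted in the RefSeq release 94, 95, and 96 announcements.
RELEASES = {
    94: {"records": 200_311_267, "proteins": 141_839_334,
         "rnas": 26_534_602, "organisms": 91_873},
    95: {"records": 206_416_381, "proteins": 146_381_777,
         "rnas": 27_212_750, "organisms": 93_618},
    96: {"records": 213_863_503, "proteins": 152_910_397,
         "rnas": 28_017_380, "organisms": 94_946},
}


def growth(older, newer, field="records"):
    """Return (absolute increase, percent increase) between two releases."""
    before = RELEASES[older][field]
    after = RELEASES[newer][field]
    added = after - before
    return added, 100.0 * added / before


if __name__ == "__main__":
    for older, newer in [(94, 95), (95, 96)]:
        for field in ("records", "proteins", "rnas", "organisms"):
            added, pct = growth(older, newer, field)
            print(f"release {older} -> {newer}: {field} +{added:,} ({pct:.2f}%)")
```

Running the script prints the increase in each category between consecutive releases, which makes it easy to see which part of the collection grew the most.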
Continue reading “RefSeq release 94 with MANE and RefSeq Select markup, protein name evidence, and improved [Candida] auris assembly” →
In October last year, we announced the launch of an exciting new collaboration between NCBI and EMBL-EBI called MANE (Matched Annotation from the NCBI and EMBL-EBI). As a first step, we began generating the MANE Select set, comprising a matched representative transcript for every human protein-coding gene. Now that our genome resources are integrated into a high-quality transcript set, you don’t need to choose between RefSeq and Ensembl/GENCODE datasets for genomic analyses.
Not only does the MANE Select set make it easier for you to exchange data or translate coordinates between RefSeq and Ensembl annotation results, but you’ll also be able to use the set with NGS technologies and other resources that use the latest and highest-quality reference human genome assembly available.
Continue reading “MANE Select v0.5 is now available!” →