Tag: RefSeq Functional Elements

The latest in COVID-19 related human gene annotation now in NCBI RefSeq and Gene

Interested in human genes involved in COVID-19 biology? NCBI’s RefSeq group has been hard at work compiling a set of human genes with roles in coronavirus infection and disease. You can now see and search for these genes and their regulatory elements in NCBI Gene and RefSeq.

Figure 1. Top section of the human ACE2 record in the Gene database. COVID-19 information can be found in the Summary and Annotation information sections.

Continue reading “The latest in COVID-19 related human gene annotation now in NCBI RefSeq and Gene”

RefSeq Release 202 is public

RefSeq release 202 is accessible online, via FTP and through NCBI’s Entrez programming utilities, E-utilities.

This full release incorporates genomic, transcript, and protein data available as of September 8, 2020, and contains 255,571,455 records, including 186,755,483 proteins, 33,077,068 RNAs, and sequences from 104,969  organisms. The release is provided in several directories as a complete dataset and also as divided by logical groupings.

Updated human genome Annotation Release 109.20200815
Updated Annotation Release 109.2020815 is an update of NCBI Homo sapiens Annotation Release 109. The annotation report is available here.

The annotation products are available in the sequence databases and on the FTP site.

This update includes around 15,000 updated RefSeq transcripts revised to use CAGE and polyA data to define 5′ and 3′ ends, and match the reference GRCh38 sequence.

Coronavirus host gene regulatory elements now annotated by RefSeq Functional Elements
The RefSeq Functional Elements project at NCBI has prioritized curation of experimentally validated regulatory elements for human host genes associated with SARS-CoV-2 entry into cells. The annotations include several enhancers, promoters, cis-regulatory elements and protein binding sites, among other feature types. We annotated 236 regulatory features for 27 distinct biological regions, including regulatory elements for the ABO, ACE2, ANPEP, CD209, CLEC4G, CLEC4M, CTSL, DPP4, and TMPRSS2 genes. More information can be found here.

New eukaryotic genome annotations
This release includes new annotations generated by NCBI’s eukaryotic genome annotation pipeline for 27 species, including:

  • maize annotation release 103, based on the new assembly Zm-B73-REFERENCE-NAM-5.0 (GCF_902167145.1)
  • marmoset annotation release 105, based on the new assembly Callithrix_jacchus_cj1700_1.1 (GCF_009663435.1)
  • Chinese hamster annotation release 104, based on the assembly CriGri_1.0 (GCF_000223135.1) and the new assembly CriGri-PICRH-1.0 (GCF_003668045.3)
  • Asian giant hornet annotation release 100, based on the new assembly V.mandarinia_Nanaimo_p1.0 (GCF_014083535.2)
  • Florida lancelet annotation release 100, based on the new assembly Bfl_VNyyK (GCF_000003815.2)
  • Anopheles stephensi annotation release 100, based on the new assembly UCI_ANSTEP_V1.0 (GCF_013141755.1)

Updated and improved collection of RefSeq representative genome assemblies now available
The collection of representative genome assemblies for Bacteria and Archaea contains 11,727 prokaryotic assemblies to represent their respective species. More information can be found here.

Updated protein family models used by PGAP available for download
Release 3.0 of the NCBI protein family models used by the Prokaryotic Genome Annotation Pipeline (PGAP) is now available.

This release contains 17,350 models: 12,864 HMMs built at NCBI (111 more than in release 2.0) and 4,486 TIGRFAM HMMs. In addition, since release 2.0, we have assigned product names to over 2,000 Pfam HMMs, bringing the total to 6,698 Pfam HMMs with names that can be transferred by PGAP to the annotated proteins they hit. More information can be found here.

Future change: Mouse Reference Assembly Update
RefSeq annotation of the new mouse GRCm39 assembly is in progress, and is expected to be included in the next release.

Coronavirus host gene regulatory elements now annotated by RefSeq Functional Elements

The COVID-19 pandemic has drawn attention to the human host genes associated with SARS-CoV-2 entry and to the elements that regulate expression of these genes. At NCBI, we have prioritized curation of experimentally validated regulatory elements for these genes in the RefSeq Functional Elements project. Our annotations include several enhancers, promoters, cis-regulatory elements and protein binding sites, among other feature types.  We have annotated 236 regulatory features for 27 distinct biological regions in the latest human Annotation Release (109.20200522) including regulatory elements for the ABOACE2, ANPEPCD209CLEC4GCLEC4MCTSL, DPP4,and TMPRSS2 genes

You can view our regulatory element to target gene linkages in the regulatory interactions track using our new track hub that we recently announced.  You can also see the biological regions and features tracks. These have functional and descriptive metadata, including biological region summaries, experimental evidence types, publication support and more.

The example in Figure 1 shows RefSeq Functional Element feature annotation in NCBI’s Genome Data Viewer (GDV) for the ABO gene region (GRCh38, NW_009646201.1: 73,864-103,789) the determiner of the human ABO blood group. A genome-wide association study recently identified non-coding  ABO variants associated with COVID-19 disease severity (PMID:32558485), which map to some of the RefSeq Functional Elements in this region.ABO region showing biological regions in GDVFigure 1. The human ABO gene region in the NCBI GDV displaying the RefSeq Functional Element features.  The biological regions aggregate track shows underlying feature annotation for an ABO upstream enhancer (LOC112637023),  promoter region (LOC112679202),  +5.8 intron 1 enhancer (LOC112679198),  a 3′ regulatory region (LOC112639999), and a +36.0 downstream enhancer (LOC112637025).  Functional Element features include numerous enhancers, promoters, cis-regulatory elements and protein / transcription factor binding sites.

We have more information about RefSeq Functional Elements on our website, including data download and extraction options. Stay tuned to NCBI Insights and other NCBI social media for future announcements about RefSeq Functional Elements!

New interaction data, downloads and track hub available for RefSeq Functional Elements 

We’ve added several new enhancements to the RefSeq Functional Elements dataset, which provides genome annotation and richly annotated RefSeq and Gene records for experimentally validated non-genic functional regions in human and mouse. Read on to see what we’ve done!

Continue reading “New interaction data, downloads and track hub available for RefSeq Functional Elements “

Major update for the NCBI RefSeq mouse GRCm38.p6 annotation

We have updated our annotation for the mouse reference genome, GRCm38.p6. It includes:

  • Markup for RefSeq Select, which identifies one representative transcript and protein for every protein-coding gene. Find features with the ‘tag=RefSeq Select’ attribute in GFF3 for those analyses where you need just a single transcript or protein for each coding gene. You can also find these RefSeqs in Entrez using the query ‘refseq_select[filter].’
  • Annotation updates made in the last year for over 2000 genes, including over 4000 new or revised curated transcripts. This includes targeted curation to ensure we are representing well-expressed and conserved transcripts for inclusion in RefSeq Select.
  • Annotation of over 2300 regulatory and other functional element features from over 900 biological regions. These are now identified with the source “RefSeqFE” in GFF3 column 2 for easy parsing.

When citing, please refer to this annotation as NCBI Mus musculus Annotation Release 108.20200622. You can find the data in:

This is our last update before upgrading to the new major assembly version just released by the Genome Reference Consortium, GRCm39. We expect to be cranking up our compute farm in the next few weeks to produce a full annotation based on our latest curation and extensive short (Illumina) and long (PacBio IsoSeq and nanopore) RNA-seq data, which should be released later this summer. Stay tuned!

October 11 NCBI Minute: Introducing the New RefSeq Functional Elements Project

October 11 NCBI Minute: Introducing the New RefSeq Functional Elements Project

On October 11, 2017, NCBI will present a webinar on RefSeq Functional Elements. This NCBI Minute will introduce you to this project and its scope, describe how functional elements are curated and displayed, demonstrate how to access the data, and provide information on the current progress of the project.

Date and time: Wed, Oct 11, 2017 12:00 PM – 12:30 PM EDT

After registering, you will receive a confirmation email with information about attending the webinar. After the live presentation, the webinar will be uploaded to the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

The new RefSeq Functional Elements project is an expansion of the NCBI RefSeq project to include non-genic functional genomic regions in human and mouse that have been experimentally validated and described in the scientific literature.

RefSeq Functional Elements now public

RefSeq Functional Elements now public

NCBI is pleased to announce the initial data release of RefSeq Functional Elements, a resource that provides RefSeq and Gene records for experimentally validated human and mouse non-genic functional elements. Data can be accessed via GeneNucleotideBLASTBioProjectGraphical Displays and FTP.

Continue reading “RefSeq Functional Elements now public”