Tag: Eukaryotic genome annotation

January-February 2021 RefSeq annotations include dog, fly, rat

Figure 1. This is Tasha, the female boxer used for one of the assemblies annotated for dog (GCF_000002285.5). Image courtesy of the National Human Genome Research Institute.

This January and February, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

  • Benincasa hispida (wax gourd)
  • Canis lupus familiaris (dog)
  • Corvus cornix cornix (hooded crow)
  • Crotalus tigris (tiger rattlesnake)
  • Culex pipiens pallens (northern house mosquito)
  • Dioscorea cayenensis subsp. rotundata (Guinea yam)
  • Drosophila santomea (fly)
  • Drosophila simulans (fly)
  • Drosophila yakuba (fly)
  • Eucalyptus grandis (rose gum)
  • Hibiscus syriacus (Rose-of-Sharon)
  • Hyaena hyaena (striped hyena)
  • Maniola hyperantus (ringlet)
  • Mauremys reevesii (Reeves’s turtle)
  • Nilaparvata lugens (brown planthopper)

Continue reading “January-February 2021 RefSeq annotations include dog, fly, rat”

Improvements to NCBI Assembly

NCBI’s genome Assembly has a number of significant improvements!

Assembly records now have a link to Primer-BLAST, making it easy to design primers in the context of a specific eukaryotic genome assembly. Figure 1 shows the Assembly page for the Genome Reference Consortium Mouse Build 39 (GRCm39) with the link to Primer-BLAST.

Figure 1. The Assembly page for the mouse reference genome (GCF_000001635.27), showing the new Run Primer-BLAST link, which loads the assembly as a database in the Primer-BLAST search (bottom), and the new expandable note sections, Genome-Annotation-Data in this case.
Continue reading “Improvements to NCBI Assembly”

View intron feature evidence in the Genome Data Viewer and Sequence Viewer

Are you a researcher who works on gene biology and is interested in alternative splice patterns in your gene or genes of interest? If so, be sure to explore the intron feature evidence available in graphics views of genome assemblies annotated by NCBI. You can view the NCBI evidence used for calling splice variants for genes, add other intron feature evidence tracks, and use new display and filter options that make it easier to interpret the data.

Figure 1. Graphical view of the monoamine oxidase gene (MAOA, MAOB) region on the human X chromosome showing intron feature tracks (‘RNA-seq intron features, aggregate’ and ‘Intropolis RNA-Seq intron features’). Mousing over an intron feature activates a tooltip that shows details such as the number of reads with the splice site, the location on the chromosome, the length of the intron, and the donor and acceptor bases at the splice site. The Intropolis track was added through the search feature of the Configure Tracks menu and configured (bottom menu) so that the features are sorted by strand and filtered to show only those with more than 500 reads.

Continue reading “View intron feature evidence in the Genome Data Viewer and Sequence Viewer”

October-December eukaryotic genome annotations in RefSeq

Since October, the NCBI Eukaryotic Genome Annotation Pipeline has released new annotations in RefSeq for a large number of organisms. We’ve separated them by group; click on “details” to see the full list for each.

Mammals

  • Artibeus jamaicensis (Jamaican fruit-eating bat)
  • Arvicola amphibius (Eurasian water vole)
  • Balaenoptera musculus (Blue whale)
  • Cebus imitator (Panamanian white-faced capuchin)
  • Chlorocebus sabaeus (green monkey)
  • Homo sapiens (human)
  • Manis javanica (Malayan pangolin)
  • Manis pentadactyla (Chinese pangolin)
  • Ochotona princeps (American pika)
  • Peromyscus leucopus (white-footed mouse)
  • Pipistrellus kuhlii (Kuhl’s pipistrelle)
  • Sturnira hondurensis (bat)
  • Talpa occidentalis (Iberian mole)
  • Trichosurus vulpecula (common brushtail)

Continue reading “October-December eukaryotic genome annotations in RefSeq”

RefSeq release 204 is now available

RefSeq release 204 is now available online, from the FTP site and through NCBI’s Entrez programming utilities, E-utilities.

This full release incorporates genomic, transcript, and protein data available as of January 4, 2021, and contains 262,714,372 records, including 191,411,721 proteins, 35,353,412 RNAs, and sequences from 106,581 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.
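As a rough illustration of the E-utilities access route mentioned above, the sketch below builds an ESearch request URL for the Assembly database. It only constructs the URL and sends no request. The helper name `esearch_url` is hypothetical, and the accession is the mouse reference assembly (GCF_000001635.27) quoted elsewhere on this page, used here only as an example search term.

```python
from urllib.parse import urlencode

# Base address of NCBI's Entrez programming utilities (E-utilities).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str) -> str:
    """Build an ESearch URL for the given database and search term.

    Hypothetical helper for illustration; it does not contact NCBI.
    """
    query = urlencode({"db": db, "term": term, "retmode": "json"})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


# Example: look up an assembly record by its RefSeq assembly accession.
url = esearch_url("assembly", "GCF_000001635.27")
print(url)
```

The printed URL can be opened in a browser or passed to any HTTP client; the JSON response lists the matching record identifiers.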

Updated human genome Annotation Release 109.20201120
Updated Annotation Release 109.20201120 is an update of NCBI Homo sapiens Annotation Release 109.

The annotation report for 109.20201120 is available here. The annotation products are available in the sequence databases and on the FTP site.

Continue reading “RefSeq release 204 is now available”

New RefSeq annotations for mouse, maize, sunflower and more!

In August and September, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

  • Amphiprion ocellaris (clown anemonefish)
  • Anopheles stephensi (Asian malaria mosquito)
  • Aplysia californica (California sea hare)
  • Bactrocera oleae (olive fruit fly)
  • Branchiostoma floridae (Florida lancelet)
  • Egretta garzetta (little egret)
  • Folsomia candida (springtail)
  • Fundulus heteroclitus (mummichog)
  • Halichoerus grypus (gray seal)
  • Helianthus annuus (common sunflower)
  • Homo sapiens (human)
  • Lynx canadensis (Canada lynx)
  • Molossus molossus (Pallas’s mastiff bat)
  • Monomorium pharaonis (pharaoh ant)
  • Mus musculus (house mouse)
  • Myotis myotis (bat)
  • Neolamprologus brichardi (lyretail cichlid)
  • Oncorhynchus keta (chum salmon)
  • Onychomys torridus (southern grasshopper mouse)
  • Oryzias melastigma (Indian medaka)
  • Phyllostomus discolor (pale spear-nosed bat)
  • Rousettus aegyptiacus (Egyptian rousette)
  • Sander lucioperca (pike-perch)
  • Zea mays (maize)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.

Learn more about the annotation of the new mouse reference assembly, GRCm39, here. This is the first coordinate-changing update to the mouse reference since the 2012 release of GRCm38.

RefSeq Release 202 is public

RefSeq release 202 is accessible online, via FTP and through NCBI’s Entrez programming utilities, E-utilities.

This full release incorporates genomic, transcript, and protein data available as of September 8, 2020, and contains 255,571,455 records, including 186,755,483 proteins, 33,077,068 RNAs, and sequences from 104,969 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.

Updated human genome Annotation Release 109.20200815
Updated Annotation Release 109.20200815 is an update of NCBI Homo sapiens Annotation Release 109. The annotation report is available here.

The annotation products are available in the sequence databases and on the FTP site.

This update includes around 15,000 RefSeq transcripts that were revised to use CAGE and polyA data to define their 5′ and 3′ ends and to match the reference GRCh38 sequence.

Coronavirus host gene regulatory elements now annotated by RefSeq Functional Elements
The RefSeq Functional Elements project at NCBI has prioritized curation of experimentally validated regulatory elements for human host genes associated with SARS-CoV-2 entry into cells. The annotations include several enhancers, promoters, cis-regulatory elements and protein binding sites, among other feature types. We annotated 236 regulatory features for 27 distinct biological regions, including regulatory elements for the ABO, ACE2, ANPEP, CD209, CLEC4G, CLEC4M, CTSL, DPP4, and TMPRSS2 genes. More information can be found here.

New eukaryotic genome annotations
This release includes new annotations generated by NCBI’s eukaryotic genome annotation pipeline for 27 species, including:

  • maize annotation release 103, based on the new assembly Zm-B73-REFERENCE-NAM-5.0 (GCF_902167145.1)
  • marmoset annotation release 105, based on the new assembly Callithrix_jacchus_cj1700_1.1 (GCF_009663435.1)
  • Chinese hamster annotation release 104, based on the assembly CriGri_1.0 (GCF_000223135.1) and the new assembly CriGri-PICRH-1.0 (GCF_003668045.3)
  • Asian giant hornet annotation release 100, based on the new assembly V.mandarinia_Nanaimo_p1.0 (GCF_014083535.2)
  • Florida lancelet annotation release 100, based on the new assembly Bfl_VNyyK (GCF_000003815.2)
  • Anopheles stephensi annotation release 100, based on the new assembly UCI_ANSTEP_V1.0 (GCF_013141755.1)

Updated and improved collection of RefSeq representative genome assemblies now available
The collection of representative genome assemblies for Bacteria and Archaea contains 11,727 prokaryotic assemblies to represent their respective species. More information can be found here.

Updated protein family models used by PGAP available for download
Release 3.0 of the NCBI protein family models used by the Prokaryotic Genome Annotation Pipeline (PGAP) is now available.

This release contains 17,350 models: 12,864 HMMs built at NCBI (111 more than in release 2.0) and 4,486 TIGRFAM HMMs. In addition, since release 2.0, we have assigned product names to over 2,000 Pfam HMMs, bringing the total to 6,698 Pfam HMMs with names that can be transferred by PGAP to the annotated proteins they hit. More information can be found here.

Future change: Mouse Reference Assembly Update
RefSeq annotation of the new mouse GRCm39 assembly is in progress, and is expected to be included in the next release.

New annotations in RefSeq: white-tufted-ear marmoset, ruddy duck, and more

In June and July, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

  • Acipenser ruthenus (sterlet)
  • Anguilla anguilla (European eel)
  • Aphantopus hyperantus (ringlet)
  • Callithrix jacchus (white-tufted-ear marmoset)
  • Chelonus insularis (wasp)
  • Cricetulus griseus (Chinese hamster)
  • Cygnus atratus (black swan)
  • Drosophila subobscura (fly)
  • Electrophorus electricus (electric eel)
  • Etheostoma cragini (Arkansas darter)
  • Hippoglossus stenolepis (Pacific halibut)
  • Mirounga leonina (Southern elephant seal)
  • Morone saxatilis (striped sea-bass)
  • Mus musculus (house mouse)
  • Oxyura jamaicensis (ruddy duck)
  • Pan paniscus (pygmy chimpanzee)
  • Populus alba (eudicot)
  • Scophthalmus maximus (turbot)
  • Spodoptera frugiperda (fall armyworm)
  • Stegodyphus dumicola (spider)
  • Vitis riparia (eudicot)
  • Zootoca vivipara (common lizard)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.

New annotations in RefSeq: budgerigar, bony fish, fly and more

In May, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

  • Acipenser ruthenus (sterlet)
  • Arvicanthis niloticus (African grass rat)
  • Cannabis sativa (eudicot)
  • Crassostrea gigas (Pacific oyster)
  • Cyclopterus lumpus (lumpfish)
  • Drosophila albomicans (fly)
  • Drosophila guanche (fly)
  • Drosophila innubila (fly)
  • Esox lucius (northern pike)
  • Gymnodraco acuticeps (bony fish)
  • Hippoglossus hippoglossus (Atlantic halibut)
  • Marmota flaviventris (yellow-bellied marmot)
  • Melopsittacus undulatus (budgerigar)
  • Osmia lignaria (orchard mason bee)
  • Pangasianodon hypophthalmus (striped catfish)
  • Pantherophis guttatus (snake)
  • Periophthalmus magnuspinnatus (bony fish)
  • Prunus dulcis (almond)
  • Pseudochaenichthys georgianus (South Georgia icefish)
  • Setaria viridis (monocot)
  • Thalassophryne amazonica (bony fish)
  • Thrips palmi (thrips)
  • Trematomus bernacchii (emerald rockcod)
  • Zea mays (maize)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.