Tag: eukaryotic genome annotation

New RefSeq annotations for mouse, maize, sunflower and more!

In August and September, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

  • Amphiprion ocellaris (clown anemonefish)
  • Anopheles stephensi (Asian malaria mosquito)
  • Aplysia californica (California sea hare)
  • Bactrocera oleae (olive fruit fly)
  • Branchiostoma floridae (Florida lancelet)
  • Egretta garzetta (little egret)
  • Folsomia candida (springtail)
  • Fundulus heteroclitus (mummichog)
  • Halichoerus grypus (gray seal)
  • Helianthus annuus (common sunflower)
  • Homo sapiens (human)
  • Lynx canadensis (Canada lynx)
  • Molossus molossus (Pallas’s mastiff bat)
  • Monomorium pharaonis (pharaoh ant)
  • Mus musculus (house mouse)
  • Myotis myotis (bat)
  • Neolamprologus brichardi (lyretail cichlid)
  • Oncorhynchus keta (chum salmon)
  • Onychomys torridus (southern grasshopper mouse)
  • Oryzias melastigma (Indian medaka)
  • Phyllostomus discolor (pale spear-nosed bat)
  • Rousettus aegyptiacus (Egyptian rousette)
  • Sander lucioperca (pike-perch)
  • Zea mays (maize)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.

Learn more about the annotation of the new mouse reference assembly, GRCm39, here. This is the first coordinate-changing update to the mouse reference since the 2012 release of GRCm38.

RefSeq Release 202 is public

RefSeq release 202 is accessible online, via FTP and through NCBI’s Entrez programming utilities, E-utilities.
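As a minimal sketch of the E-utilities route mentioned above, the snippet below builds an EFetch request URL for a single RefSeq record in FASTA format. The helper name `efetch_url` and the example accession `NM_000546` are illustrative choices, not part of the release announcement, and the snippet only constructs the URL rather than sending a request.

```python
from urllib.parse import urlencode

# Base address of NCBI's Entrez programming utilities (E-utilities).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(accession: str, db: str = "nuccore", rettype: str = "fasta") -> str:
    """Return an EFetch URL for one record (hypothetical helper for illustration)."""
    params = {
        "db": db,            # Entrez database to query, e.g. nuccore or protein
        "id": accession,     # RefSeq accession, e.g. an NM_ transcript
        "rettype": rettype,  # requested record format
        "retmode": "text",   # plain-text output
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # NM_000546 is used here only as an example accession.
    print(efetch_url("NM_000546"))
```

The resulting URL can be passed to any HTTP client. For bulk downloads of a full release, the FTP directories are usually the more practical choice.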

This full release incorporates genomic, transcript, and protein data available as of September 8, 2020, and contains 255,571,455 records, including 186,755,483 proteins, 33,077,068 RNAs, and sequences from 104,969 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.

Updated human genome Annotation Release 109.20200815
Updated Annotation Release 109.20200815 is an update of NCBI Homo sapiens Annotation Release 109. The annotation report is available here.

The annotation products are available in the sequence databases and on the FTP site.

This update includes around 15,000 RefSeq transcripts that were revised to use CAGE and polyA data to define their 5′ and 3′ ends and to match the reference GRCh38 sequence.

Coronavirus host gene regulatory elements now annotated by RefSeq Functional Elements
The RefSeq Functional Elements project at NCBI has prioritized curation of experimentally validated regulatory elements for human host genes associated with SARS-CoV-2 entry into cells. The annotations include several enhancers, promoters, cis-regulatory elements and protein binding sites, among other feature types. We annotated 236 regulatory features for 27 distinct biological regions, including regulatory elements for the ABO, ACE2, ANPEP, CD209, CLEC4G, CLEC4M, CTSL, DPP4, and TMPRSS2 genes. More information can be found here.

New eukaryotic genome annotations
This release includes new annotations generated by NCBI’s eukaryotic genome annotation pipeline for 27 species, including:

  • maize annotation release 103, based on the new assembly Zm-B73-REFERENCE-NAM-5.0 (GCF_902167145.1)
  • marmoset annotation release 105, based on the new assembly Callithrix_jacchus_cj1700_1.1 (GCF_009663435.1)
  • Chinese hamster annotation release 104, based on the assembly CriGri_1.0 (GCF_000223135.1) and the new assembly CriGri-PICRH-1.0 (GCF_003668045.3)
  • Asian giant hornet annotation release 100, based on the new assembly V.mandarinia_Nanaimo_p1.0 (GCF_014083535.2)
  • Florida lancelet annotation release 100, based on the new assembly Bfl_VNyyK (GCF_000003815.2)
  • Anopheles stephensi annotation release 100, based on the new assembly UCI_ANSTEP_V1.0 (GCF_013141755.1)

Updated and improved collection of RefSeq representative genome assemblies now available
The collection of representative genome assemblies for Bacteria and Archaea contains 11,727 prokaryotic assemblies to represent their respective species. More information can be found here.

Updated protein family models used by PGAP available for download
Release 3.0 of the NCBI protein family models used by the Prokaryotic Genome Annotation Pipeline (PGAP) is now available.

This release contains 17,350 models: 12,864 HMMs built at NCBI (111 more than in release 2.0) and 4,486 TIGRFAM HMMs. In addition, since release 2.0, we have assigned product names to over 2,000 Pfam HMMs, bringing the total to 6,698 Pfam HMMs with names that can be transferred by PGAP to the annotated proteins they hit. More information can be found here.

Future change: Mouse Reference Assembly Update
RefSeq annotation of the new mouse GRCm39 assembly is in progress, and is expected to be included in the next release.

New annotations in RefSeq: white-tufted-ear marmoset, ruddy duck, and more

In June and July, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

  • Acipenser ruthenus (sterlet)
  • Anguilla anguilla (European eel)
  • Aphantopus hyperantus (ringlet)
  • Callithrix jacchus (white-tufted-ear marmoset)
  • Chelonus insularis (wasp)
  • Cricetulus griseus (Chinese hamster)
  • Cygnus atratus (black swan)
  • Drosophila subobscura (fly)
  • Electrophorus electricus (electric eel)
  • Etheostoma cragini (Arkansas darter)
  • Hippoglossus stenolepis (Pacific halibut)
  • Mirounga leonina (Southern elephant seal)
  • Morone saxatilis (striped sea-bass)
  • Mus musculus (house mouse)
  • Oxyura jamaicensis (ruddy duck)
  • Pan paniscus (pygmy chimpanzee)
  • Populus alba (eudicot)
  • Scophthalmus maximus (turbot)
  • Spodoptera frugiperda (fall armyworm)
  • Stegodyphus dumicola (spider)
  • Vitis riparia (eudicot)
  • Zootoca vivipara (common lizard)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.

New annotations in RefSeq: budgerigar, bony fish, fly and more

In May, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

  • Acipenser ruthenus (sterlet)
  • Arvicanthis niloticus (African grass rat)
  • Cannabis sativa (eudicot)
  • Crassostrea gigas (Pacific oyster)
  • Cyclopterus lumpus (lumpfish)
  • Drosophila albomicans (fly)
  • Drosophila guanche (fly)
  • Drosophila innubila (fly)
  • Esox lucius (northern pike)
  • Gymnodraco acuticeps (bony fish)
  • Hippoglossus hippoglossus (Atlantic halibut)
  • Marmota flaviventris (yellow-bellied marmot)
  • Melopsittacus undulatus (budgerigar)
  • Osmia lignaria (orchard mason bee)
  • Pangasianodon hypophthalmus (striped catfish)
  • Pantherophis guttatus (snake)
  • Periophthalmus magnuspinnatus (bony fish)
  • Prunus dulcis (almond)
  • Pseudochaenichthys georgianus (South Georgia icefish)
  • Setaria viridis (monocot)
  • Thalassophryne amazonica (bony fish)
  • Thrips palmi (thrip)
  • Trematomus bernacchii (emerald rockcod)
  • Zea mays (maize)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.

Recent RefSeq annotations: barn owl, monarch butterfly and more

In February and March, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

  • Amblyraja radiata (thorny skate)
  • Catharus ustulatus (Swainson’s thrush)
  • Chelonoidis abingdonii (Abingdon island giant tortoise)
  • Chiroxiphia lanceolata (lance-tailed manakin)
  • Danaus plexippus plexippus (monarch butterfly)
  • Daphnia magna (crustacean)
  • Drosophila grimshawi (fly)
  • Drosophila mojavensis (fly)
  • Drosophila sechellia (fly)
  • Homo sapiens (human)
  • Hylobates moloch (silvery gibbon)
  • Lontra canadensis (Northern American river otter)
  • Lynx canadensis (Canada lynx)
  • Nasonia vitripennis (jewel wasp)
  • Odontomachus brunneus (ant)
  • Petromyzon marinus (sea lamprey)
  • Phocoena sinus (vaquita)
  • Rattus rattus (black rat)
  • Rhinolophus ferrumequinum (greater horseshoe bat)
  • Strigops habroptila (Kakapo)
  • Taeniopygia guttata (zebra finch)
  • Tyto alba (Barn owl)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.

The next RefSeq FTP release number will skip to 200

NCBI’s Reference Sequence (RefSeq) FTP release numbers will increment to 200 for the next release and skip over the numbers 100-199. The current release, from March 2020, is release 99. The next bi-monthly release in May 2020 will be release 200. This change avoids overlap with the release numbers of the completely independent RefSeq annotation releases for the eukaryotic genomes we annotate, which are currently in the range 100-109 (for example, Mus musculus Annotation Release 108).
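The numbering rule described above can be sketched in a few lines. The function name `next_ftp_release` is hypothetical and the logic is only an illustration of the announced scheme (99 is followed by 200, and 100-199 are never used for FTP releases), not NCBI's own implementation.

```python
def next_ftp_release(current: int) -> int:
    """Return the RefSeq FTP release number that follows `current`.

    Illustrative only: encodes the announced rule that release 99 is
    followed by release 200, so that FTP release numbers never fall in
    the 100-199 range used by eukaryotic annotation releases.
    """
    if 100 <= current <= 199:
        # Under the announced scheme, these numbers are not FTP releases.
        raise ValueError(f"{current} is not a valid RefSeq FTP release number")
    if current == 99:
        return 200
    return current + 1


if __name__ == "__main__":
    for release in (98, 99, 200):
        print(release, "->", next_ftp_release(release))
```

Keeping the two ranges disjoint means a bare number such as 108 can be read as an annotation release, while numbers from 200 upward refer to FTP releases.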

Fifteen new NCBI annotations in RefSeq: flies, harbor seal and more

In January and February, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

  • Aythya fuligula (tufted duck)
  • Camelus ferus (Wild Bactrian camel)
  • Corvus moneduloides (New Caledonian crow)
  • Coturnix japonica (Japanese quail)
  • Drosophila ananassae (fly)
  • Drosophila virilis (fly)
  • Etheostoma spectabile (orangethroat darter)
  • Hylobates moloch (silvery gibbon)
  • Mustela erminea (ermine)
  • Nematostella vectensis (starlet sea anemone)
  • Nomia melanderi (Alkali bee)
  • Phoca vitulina (harbor seal)
  • Sapajus apella (Tufted capuchin)
  • Thamnophis elegans (Western terrestrial garter snake)
  • Xiphophorus hellerii (green swordtail)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.

Important changes to the genomes FTP site in February

We have added the latest NCBI Eukaryotic Genome Annotation Pipeline results for the more than 580 species that we annotate to the genomes/refseq directory on the genomes FTP area. As we announced in December, we will stop publishing annotation results to the genus_species directories (example: genomes/Xenopus_tropicalis) on the genomes FTP site effective February 1, 2020. We will also move existing genus_species directories to genomes/archive/old_refseq during the month of February.

Figure 1. The Assembly page for the Xenopus tropicalis UCB Xtro 10.0 (GCF_000004195.4) showing the blue download button. Annotation results such as the RefSeq transcript alignments that can be downloaded from the web page are now also under the genomes/refseq directory on the FTP site. The FTP path to the .bam alignment files is in red.

These FTP changes do not affect the Assembly download function. As always, you can download assembly data using the blue Download button on the web pages (Figure 1).
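For scripts that still point at the old genus_species directories, the relocation described above can be sketched as a simple path rewrite. This assumes each directory keeps its name when moved under genomes/archive/old_refseq, which the announcement implies but does not state explicitly. The helper `archived_path` is hypothetical.

```python
from pathlib import PurePosixPath

# Destination named in the announcement for retired genus_species directories.
ARCHIVE_ROOT = PurePosixPath("genomes/archive/old_refseq")


def archived_path(old_dir: str) -> str:
    """Map a retired genus_species directory to its assumed archive location.

    Assumption: the directory name is preserved under the archive root.
    """
    path = PurePosixPath(old_dir)
    # Expect exactly "genomes/<Genus_species>".
    if len(path.parts) != 2 or path.parts[0] != "genomes":
        raise ValueError(f"not a genus_species directory: {old_dir}")
    return str(ARCHIVE_ROOT / path.name)


if __name__ == "__main__":
    print(archived_path("genomes/Xenopus_tropicalis"))
```

Current annotation results should be read from the genomes/refseq directory. The archive location is only relevant for reproducing analyses that used the older layout.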

December 2019 RefSeq annotations: human, Tasmanian devil and more

In December, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

  • Anarrhichthys ocellatus (wolf-eel)
  • Apis florea (little honeybee)
  • Contarinia nasturtii (swede midge)
  • Cucumis sativus (cucumber)
  • Galleria mellonella (greater wax moth)
  • Homo sapiens (human)
  • Nasonia vitripennis (jewel wasp)
  • Oncorhynchus kisutch (coho salmon)
  • Oreochromis aureus (blue tilapia)
  • Piliocolobus tephrosceles (Ugandan red Colobus)
  • Sarcophilus harrisii (Tasmanian devil)
  • Xenopus tropicalis (tropical clawed frog)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.