May – July annotations in RefSeq: ants, Chinese alligator & more


In recent months, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

  • Alligator sinensis (Chinese alligator)
  • Athalia rosae (coleseed sawfly)
  • Bubalus bubalis (water buffalo)
  • Camponotus floridanus (Florida carpenter ant)
  • Canis lupus dingo (dingo)
  • Harpegnathos saltator (Jerdon’s jumping ant)
  • Melanaphis sacchari (aphid)
  • Pelodiscus sinensis (Chinese soft-shelled turtle)
  • Pogonomyrmex barbatus (red harvester ant)
  • Pomacea canaliculata (gastropod)
  • Sipha flava (yellow sugarcane aphid)
  • Theropithecus gelada (gelada)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.

RefSeq release 89 is public


RefSeq release 89 is accessible online, via FTP and through NCBI’s programming utilities. This full release incorporates genomic, transcript, and protein data available as of July 9, 2018. It contains 163,859,625 records, including 113,429,348 proteins, 23,029,67 RNAs, and sequences from 81,345 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.
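To put the release size in context, the total record counts quoted in the release 86 through 89 announcements on this page can be compared directly. The following is only a small arithmetic sketch in Python; the `releases` list and the `growth` helper are illustrative names, not part of any NCBI tool, and the figures are copied from the announcements.

```python
# (release number, total records) as quoted in the announcements on this page.
releases = [
    (86, 149_493_466),
    (87, 155_118_991),
    (88, 160_224_355),
    (89, 163_859_625),
]


def growth(releases):
    """Yield (release, records added, percent growth) for consecutive releases."""
    for (_, prev), (rel, cur) in zip(releases, releases[1:]):
        added = cur - prev
        yield rel, added, 100 * added / prev


for rel, added, pct in growth(releases):
    print(f"release {rel}: +{added:,} records ({pct:.2f}%)")
```

Each printed line gives the number of records added since the previous full release and the relative increase.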

April and May annotations in RefSeq: cow, bonobo and more


In April and May, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

  • Bos taurus (cattle)
  • Cephus cinctus (wheat stem sawfly)
  • Citrus sinensis (sweet orange)
  • Cynara cardunculus cardunculus (eudicot)
  • Cynoglossus semilaevis (tongue sole)
  • Gallus gallus (chicken)
  • Kryptolebias marmoratus (mangrove rivulus)
  • Macaca nemestrina (pig-tailed macaque)
  • Maylandia zebra (zebra mbuna)
  • Medicago truncatula (barrel medic)
  • Pan paniscus (pygmy chimpanzee)
  • Pteropus alecto (black flying fox)
  • Python bivittatus (Burmese python)
  • Ricinus communis (castor bean)
  • Temnothorax curvispinosus (ant)
  • Tetranychus urticae (two-spotted spider mite)
  • Ziziphus jujuba (common jujube)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.

Improved annotation of Streptomyces RefSeq genomes


We’ve completed the RefSeq reannotation of over 1,000 Streptomyces genomes! The genomes were reannotated using the Prokaryotic Genome Annotation Pipeline (PGAP). Using a set of over 30 hidden Markov models (HMMs) built by RefSeq biocurators, PGAP detected nearly 100% of the genes from known families that encode ribosomally synthesized and post-translationally modified peptide (RiPP) natural products, despite their small size. Of the 354 lasso peptides now present in Streptomyces RefSeq genomes, 251 (over 70%) were annotated for the first time.

If you are aware of any class of RiPP precursor in Streptomyces that was not found in our recent reannotation, please contact us through the NCBI Help Desk, and we will add new HMMs to the rules we use to find and annotate RiPP precursor genes.

RefSeq release 88 available


RefSeq release 88 is now accessible online, via FTP and through NCBI’s programming utilities. This full release incorporates genomic, transcript, and protein data available as of May 14, 2018. It contains 160,224,355 records, including 110,333,800 proteins, 22,461,378 RNAs, and sequences from 79,448 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.

This release incorporates dbSNP release 151, which nearly doubles the number of SNPs annotated on the human GRCh38 genome, with matching increases in the size of the human nucleotide flatfile (.gbff) records.

Starting in November 2018, SNP variation features will no longer be in RefSeq genome assembly records.  The RefSeq release notes have information about this change.

March & April annotations in RefSeq: chimpanzee, human & more


The NCBI Eukaryotic Genome Annotation Pipeline has recently released new annotations in RefSeq for the following organisms:

  • Bombus impatiens (common eastern bumble bee)
  • Brachypodium distachyon (stiff brome)
  • Cimex lectularius (bed bug)
  • Desmodus rotundus (common vampire bat)
  • Halyomorpha halys (brown marmorated stink bug)
  • Homo sapiens (human, more information can be found here)
  • Lingula anatina (brachiopod)
  • Neophocaena asiaeorientalis asiaeorientalis (Yangtze finless porpoise)
  • Oncorhynchus tshawytscha (Chinook salmon)
  • Oryzias melastigma (Indian medaka)
  • Pan troglodytes (chimpanzee)
  • Physcomitrella patens (moss)
  • Populus trichocarpa (black cottonwood)
  • Rosa chinensis (China rose)
  • Selaginella moellendorffii (club-moss)
  • Terrapene mexicana triunguis (three-toed box turtle)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.

Human annotation release 109 for GRCh38.p12 is available in RefSeq


You can now download human annotation release 109 on FTP or explore it in the Genome Data Viewer, in the Gene database, and with BLAST.

Highlights in release 109:

  • A total of 20,203 protein-coding genes and 17,871 non-coding genes were annotated.
  • The number of annotated curated transcripts increased by 17% and genes with two or more curated alternative variants increased by 8%.
  • The annotation includes 6,862 features and 2,075 GeneIDs for non-genic functional elements, such as regulatory regions and known structural elements. For example, see the opsin locus control region (OPSIN-LCR).

RefSeq release 87 available


RefSeq release 87 is now accessible online, via FTP and through NCBI’s programming utilities. This full release incorporates genomic, transcript, and protein data available as of March 5, 2018, and contains 155,118,991 records, including 106,245,682 proteins, 21,923,574 RNAs, and sequences from 77,225 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.

Starting in July 2018, SNP variation features will no longer be in RefSeq genome assembly records – chromosome and contig records with NC_, NT_, NW_ and AC_ accession prefixes.  The RefSeq release notes have more information about this change.
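If you process RefSeq records in your own scripts, the accession prefixes listed above are enough to tell which records fall under this change. The sketch below is a hypothetical helper, not an NCBI utility; the function name and the sample accession strings are used only for illustration.

```python
# Prefixes named in the announcement for chromosome and contig records.
GENOME_ASSEMBLY_PREFIXES = ("NC_", "NT_", "NW_", "AC_")


def is_genome_assembly_record(accession: str) -> bool:
    """Return True if the accession starts with one of the listed prefixes."""
    return accession.strip().upper().startswith(GENOME_ASSEMBLY_PREFIXES)


# Sample accession strings, chosen only to exercise the check.
accessions = ["NC_000001.11", "NM_000546.6", "NW_003315950.2", "NP_000537.3"]
affected = [acc for acc in accessions if is_genome_assembly_record(acc)]
print(affected)
```

Transcript and protein accessions (for example, those beginning with NM_ or NP_) do not match any of the listed prefixes and are left out of `affected`.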

RefSeq release 86 is now public


RefSeq release 86 is now accessible online, via FTP and through NCBI’s programming utilities. This full release incorporates genomic, transcript, and protein data available as of January 8, 2018, and contains 149,493,466 records, including 102,133,844 proteins, 21,370,778 RNAs, and sequences from 75,218 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.

Two important notes follow; please see the RefSeq release notes for more information.

Non-human SNP data dropped

Non-human SNPs were dropped from the daily RefSeq FTP files starting in December 2017 and from this full release (January 2018).

HPRD features removed

We have dropped a set of features, originally imported from HPRD, from human transcript and protein RefSeq records.