RefSeq Release 96: complete re-annotation of mouse genome and new human annotation


You can now access RefSeq release 96 online, from the FTP site, and through NCBI’s Entrez programming utilities (E-utilities).

This full release incorporates genomic, transcript, and protein data available as of September 9, 2019. It contains 213,863,503 records, including 152,910,397 proteins, 28,017,380 RNAs, and sequences from 94,946 organisms.

The release is provided as a complete dataset and also in several directories divided by logical groupings.
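For readers who want to retrieve individual RefSeq records through E-utilities, the following is a minimal sketch of how an EFetch request URL could be assembled in Python. The helper name `efetch_url` and the accession `NM_000546.6` are illustrative choices and do not come from this announcement; the sketch only builds the URL and leaves the network request to the caller.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(accession, db="nuccore", rettype="fasta", retmode="text"):
    """Build an EFetch URL for one record (hypothetical helper, not an NCBI API).

    accession: a RefSeq accession.version string (illustrative input)
    db:        Entrez database, e.g. "nuccore" or "protein"
    rettype:   record format requested, e.g. "fasta"
    retmode:   encoding of the response, e.g. "text"
    """
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": retmode}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Print the URL; pass it to urllib.request.urlopen to download the record.
    print(efetch_url("NM_000546.6"))
```

Keeping URL construction separate from the download makes it easy to check the request before sending it and to add rate limiting around the actual fetch.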

Special announcements:

1. New Mus musculus (house mouse) Annotation Release 108

The latest annotation run for Mus musculus, 108, is a complete re-annotation of the mouse GRCm38.p6 assembly that incorporates ongoing curation work and new computed models based on extensive long-read transcriptome data.
See the annotation report for details. You can access these annotation products through the sequence databases and on the FTP site.

2. Updated Homo sapiens Annotation Release 109.20190905

Annotation Release 109.20190905 is an update of NCBI Homo sapiens Annotation Release 109. The annotation report has details. You can access the annotation products from the sequence databases or download the data from the FTP site. We will continue to update the human genome annotation frequently so that we can incorporate ongoing curation work, including the MANE project and other curation activities. See our post on the increased frequency of annotation for more information on the new schedule.

3. dbSNP Human Build 153

The short variations (SNPs) annotated on human RefSeq transcripts and RefSeqGene records now incorporate data from dbSNP build 153.

Evidence for protein naming now shown on non-redundant RefSeq records (WP_ accessions)


We now show the curated evidence used for assigning names and, where possible, gene symbols, publications, and Enzyme Commission numbers for nearly 70% (83 million) of microbial RefSeq proteins. This evidence includes a hierarchical collection of curated Hidden Markov Model (HMM)-based and BLAST-based protein families, and conserved domain architectures.
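When working with a mixed list of protein accessions, it can help to pick out the non-redundant WP_ records that carry this naming evidence. The sketch below is one possible way to do that in Python. The pattern assumes only that a WP_ accession is the prefix `WP_` followed by digits and an optional `.version` suffix; the exact digit count is deliberately not enforced, and the function name and sample accessions are hypothetical.

```python
import re

# Assumption: "WP_" + digits + optional ".version"; digit count is not checked.
WP_PATTERN = re.compile(r"^WP_\d+(\.\d+)?$")


def is_wp_accession(accession):
    """Return True if the string looks like a non-redundant WP_ accession."""
    return bool(WP_PATTERN.match(accession.strip()))


def select_wp(accessions):
    """Keep only the WP_ accessions from an iterable, preserving order."""
    return [acc for acc in accessions if is_wp_accession(acc)]


if __name__ == "__main__":
    # Sample identifiers are illustrative placeholders.
    sample = ["WP_000000001.1", "NP_000537.3", "YP_009724390.1", "WP_123456789.2"]
    print(select_wp(sample))
```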

Continue reading

RefSeq release 95: naming evidence added to all relevant WP proteins


RefSeq release 95 is accessible online, via FTP and through NCBI’s Entrez programming utilities, E-utilities.

This full release incorporates genomic, transcript, and protein data available as of July 8, 2019. It contains 206,416,381 records, including 146,381,777 proteins, 27,212,750 RNAs, and sequences from 93,618 organisms.

Continue reading

RefSeq release 94 with MANE and RefSeq Select markup, protein name evidence, and improved [Candida] auris assembly


RefSeq release 94 is now available through NCBI web services, via FTP, and through NCBI’s Entrez programming utilities, E-utilities.

This full release incorporates genomic, transcript, and protein data available as of May 13, 2019. It contains 200,311,267 records, including 141,839,334 proteins, 26,534,602 RNAs, and sequences from 91,873 organisms. The release is provided as a complete dataset and also in several directories divided by logical groupings.

Continue reading

Human genome annotation will be updated every 2 months


NCBI will be updating the human genome RefSeq annotation more frequently to incorporate improvements made to genes and transcripts by RefSeq curation experts. Faster updates will allow us to include the latest datasets.

In the past, we’ve produced a full re-annotation of the human genome about once a year. The last full annotation, Homo sapiens Annotation Release 109, was in March 2018. A full annotation is produced by two main processes:

Continue reading

Expanded accession formats appear in RefSeq release 93


RefSeq release 93 is accessible online, via FTP and through NCBI’s Entrez programming utilities, E-utilities.

This full release incorporates genomic, transcript, and protein data available as of March 13, 2019. It contains 192,722,653 records, including 135,670,032 proteins, 25,840,272 RNAs, and sequences from 88,816 organisms.

Continue reading

New RefSeq annotations for big brown bat, peregrine falcon and more


Image: hibernating brown bat

In January and February, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

  • Aphis gossypii (cotton aphid)
  • Balaenoptera acutorostrata scammoni (minke whale)
  • Bombyx mandarina (wild silkworm)
  • Chelonia mydas (green sea turtle)
  • Corapipo altera (white-ruffed manakin)
  • Empidonax traillii (willow flycatcher)
  • Eptesicus fuscus (big brown bat)
  • Eumetopias jubatus (Steller sea lion)
  • Falco cherrug (saker falcon)
  • Falco peregrinus (peregrine falcon)
  • Marmota flaviventris (yellow-bellied marmot)
  • Monomorium pharaonis (pharaoh ant)
  • Neopelma chrysocephalum (saffron-crested tyrant-manakin)
  • Ovis aries (sheep)
  • Pipra filicauda (wire-tailed manakin)
  • Rhopalosiphum maidis (corn leaf aphid)
  • Solanum pennellii (eudicot)
  • Tupaia chinensis (Chinese tree shrew)
  • Vigna unguiculata (cowpea)
  • Vombatus ursinus (common wombat)
  • Xiphophorus couchianus (Monterrey platyfish)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.

RefSeq release 92 updates 10,000 human transcripts


RefSeq release 92 is accessible online, via FTP and through NCBI’s Entrez programming utilities, E-utilities.

This full release incorporates genomic, transcript, and protein data available as of January 4, 2019. It contains 185,738,687 records, including 130,366,644 proteins, 25,088,890 RNAs, and sequences from 86,867 organisms. The release is provided as a complete dataset and also in several directories divided by logical groupings.

Continue reading