Tag: RefSeq

RefSeq release 91 is public

RefSeq release 91 is accessible online, via FTP and through NCBI’s Entrez programming utilities, E-utilities.

This full release incorporates genomic, transcript, and protein data available as of November 5, 2018. It contains 179,672,083 records, including 125,530,811 proteins, 24,447,570 RNAs, and sequences from 85,308 organisms.

The release is provided in several directories, both as a complete dataset and divided into logical groupings.
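As a rough illustration of the E-utilities access route mentioned above, the sketch below builds an ESearch query URL limited to RefSeq records for one organism. The helper name `build_refseq_search_url`, the choice of the protein database, and the example organism are assumptions made for illustration and are not part of the release announcement. The code only constructs the URL; fetching it requires network access and should follow NCBI's usage guidelines.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_refseq_search_url(organism: str, db: str = "protein", retmax: int = 5) -> str:
    """Return an ESearch URL restricted to RefSeq records for one organism.

    Hypothetical helper for illustration only.
    """
    # srcdb_refseq[PROP] limits results to RefSeq records;
    # [ORGN] limits results to the named organism.
    term = f"{organism}[ORGN] AND srcdb_refseq[PROP]"
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_refseq_search_url("Apis mellifera")
print(url)
```

The resulting URL can be opened in a browser or passed to any HTTP client; the JSON response lists matching record identifiers.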

October 2018 RefSeq annotations include honey bee, butterfly & more

In October, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for several organisms, including honey bee and butterfly.

August and September annotations in RefSeq

In August and September, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for a number of organisms.

Matched Annotation by NCBI and EMBL-EBI (MANE): a new joint venture to define a set of representative transcripts for human protein-coding genes

The RefSeq project at the NCBI and the Ensembl/GENCODE project at EMBL-EBI have provided independent high-quality human reference gene datasets to biologists since the sequencing of the human genome.

Now we’re joining together on an exciting new project, Matched Annotation from the NCBI and EMBL-EBI (MANE), to provide a matched set of well-supported transcripts for human protein-coding genes and to define one representative transcript for each gene. Both RefSeq and Ensembl will continue to provide a rich set of alternate transcripts per gene.

RefSeq release 90 is public

RefSeq release 90 is accessible online, via FTP and through NCBI’s programming utilities.

This full release incorporates genomic, transcript, and protein data available as of September 10, 2018. It contains 173,956,003 records, including 121,138,769 proteins, 23,838,676 RNAs, and sequences from 84,276 organisms.

The release is provided in several directories, both as a complete dataset and divided into logical groupings.

May – July annotations in RefSeq: ants, Chinese alligator & more

In recent months, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

Alligator sinensis (Chinese alligator)
Athalia rosae (coleseed sawfly)
Bubalus bubalis (water buffalo)
Camponotus floridanus (Florida carpenter ant)
Canis lupus dingo (dingo)
Harpegnathos saltator (Jerdon’s jumping ant)
Melanaphis sacchari (aphid)
Pelodiscus sinensis (Chinese soft-shelled turtle)
Pogonomyrmex barbatus (red harvester ant)
Pomacea canaliculata (gastropod)
Sipha flava (yellow sugarcane aphid)
Theropithecus gelada (gelada)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.

RefSeq release 89 is public

RefSeq release 89 is accessible online, via FTP and through NCBI’s programming utilities. This full release incorporates genomic, transcript, and protein data available as of July 9, 2018. It contains 163,859,625 records, including 113,429,348 proteins, 23,029,67 RNAs and sequences from 81,345 organisms. The release is in several directories as a complete dataset and as divided by logical groupings.

April and May annotations in RefSeq: cow, bonobo and more

In April and May, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

Bos taurus (cattle)
Cephus cinctus (wheat stem sawfly)
Citrus sinensis (sweet orange)
Cynara cardunculus cardunculus (eudicot)
Cynoglossus semilaevis (tongue sole)
Gallus gallus (chicken)
Kryptolebias marmoratus (mangrove rivulus)
Macaca nemestrina (pig-tailed macaque)
Maylandia zebra (zebra mbuna)
Medicago truncatula (barrel medic)
Pan paniscus (pygmy chimpanzee)
Pteropus alecto (black flying fox)
Python bivittatus (Burmese python)
Ricinus communis (castor bean)
Temnothorax curvispinosus (ant)
Tetranychus urticae (two-spotted spider mite)
Ziziphus jujuba (common jujube)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.

Improved annotation of Streptomyces RefSeq genomes

We’ve completed the RefSeq reannotation of over 1,000 Streptomyces genomes! The genomes were reannotated using the Prokaryotic Genome Annotation Pipeline (PGAP). Despite the small size of these genes, PGAP detected nearly 100% of the genes from known families that encode ribosomally synthesized and post-translationally modified peptide (RiPP) natural products, using a set of over 30 hidden Markov models (HMMs) built by RefSeq biocurators. Of the 354 lasso peptides now present in Streptomyces RefSeq genomes, 251 (over 70%) were annotated for the first time.

If you are aware of any class of RiPP precursor in Streptomyces that was not found in our recent reannotation, please contact us through the NCBI Help Desk, and we will add new HMMs to the rules we use to find and annotate RiPP precursor genes.

RefSeq release 88 available

RefSeq release 88 is now accessible online, via FTP and through NCBI’s programming utilities. This full release incorporates genomic, transcript, and protein data available as of May 14, 2018. It contains 160,224,355 records, including 110,333,800 proteins, 22,461,378 RNAs, and sequences from 79,448 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.

This release incorporates dbSNP release 151, which nearly doubles the number of SNPs annotated on the human GRCh38 genome, with matching increases in the size of the human nucleotide flatfile (.gbff) records.

Starting in November 2018, SNP variation features will no longer be in RefSeq genome assembly records. The RefSeq release notes have information about this change.