Tag: RefSeq

Announcing an updated prokaryotic representative genomes collection with 706 new species!

An updated bacterial and archaeal representative genomes collection is available! A total of 16,105 assemblies among the 249,000 prokaryotic assemblies in RefSeq were selected to represent their respective species. The collection has grown by 3.7% since January 2022. A total of 706 species are represented for the first time. In addition, 186 species are represented by a better assembly, and 124 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment.

We updated the database on the Microbial Nucleotide BLAST page as well as the basic nucleotide BLAST RefSeq Representative genomes database (fourth in the menu) to reflect these changes. Finally, remember that you can now run BLAST searches against the proteins annotated on representative genomes (second in the menu). See more info here.

New in RAPT: Better taxonomic assignment and GO annotation

We are excited to announce two improvements to the Read assembly and Annotation Pipeline Tool (RAPT), which allows you to assemble genomic reads for bacterial or archaeal isolates and annotate their genes at the click of a button.

Improved taxonomic assignment

Now RAPT verifies the scientific name you provide with the reads, and corrects it as needed with the Average Nucleotide Identity (ANI) tool, which compares your genome to type strain assemblies in GenBank to place it in the taxonomic tree. So, even if you only have a rough idea of the species you have sequenced, input datasets tailored to your genome will be used for the annotation and you will get the best possible gene set from RAPT.

Announcing Human Annotation Release 110

The annotation of human assemblies GRCh38.p14 and T2T-CHM13v2.0

We are happy to announce the first de novo annotation of human T2T-CHM13v2.0, the gapless assembly generated by the T2T Consortium, and the full re-annotation of the human reference assembly, GRCh38.p14. We hope the results will serve the needs of both those eager to explore newly sequenced regions of the genome, including telomeres and centromeres, and those interested in refreshing their interpretation of the human reference in light of recently curated transcripts and the new transcriptomic and other data incorporated in the annotation.

Gapless Telomere to Telomere human genome (T2T-CHM13) now available

On April 1, 2022, Science published the first complete sequence of a human genome, known as T2T-CHM13. This notable scientific achievement comes two decades after the first human genome release from the Human Genome Project and offers an in situ look at biologically important regions, such as centromeres, telomeres, and segmental duplications, that were previously unassembled. Read on to learn more about how you can access this assembly and related resources at NCBI, or to access any one of the more than 1000 human genome assemblies now in GenBank.

RefSeq release 212 is available!

RefSeq release 212 is now available online, from the FTP site and through NCBI’s Entrez programming utilities, E-utilities.

This full release incorporates genomic, transcript, and protein data available as of May 2, 2022, and contains 314,915,153 records, including 229,417,182 proteins, 44,805,833 RNAs, and sequences from 119,373 organisms. The release is provided in several directories as a complete dataset and also as divided by logical groupings.
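As a rough illustration of the E-utilities route mentioned above, the sketch below only builds an ESearch request URL and does not contact NCBI. The helper name `build_esearch_url`, the choice of the protein database, and the example query are assumptions made for this illustration; they are not taken from the release notes.

```python
from urllib.parse import urlencode

# Public base address of the E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, retmax=20):
    """Return an ESearch URL for the given Entrez database and query.

    build_esearch_url is a hypothetical helper written for this example.
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example query (an assumption): human protein records restricted to RefSeq.
url = build_esearch_url(
    "protein", "srcdb_refseq[PROP] AND Homo sapiens[ORGN]", retmax=5
)
print(url)
```

The resulting URL can be opened in a browser or passed to any HTTP client. For bulk access to a full release, the FTP directories are likely the better fit.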

Human genome Annotation Release 110

Annotation Release 110 is the first new annotation of the human genome in four years. It includes all the latest curated RefSeqs and models recalculated using over 80M long reads and 9B Illumina RNA-seq reads. AR 110 includes annotation of two human assemblies, GRCh38.p14 and T2T-CHM13v2.0.

New RefSeq annotations!

In February and March, the NCBI Eukaryotic Genome Annotation Pipeline released thirty-seven new annotations in RefSeq for the following organisms:

  • Belonocnema kinseyi (wasp)
  • Daphnia pulex (common water flea)
  • Daphnia pulicaria (crustacean)
  • Dermatophagoides farinae (American house dust mite)
  • Diprion similis (hymenopteran)
  • Drosophila willistoni (fly)
  • Equus quagga burchellii (Burchell’s zebra) (pictured)
  • Gallus gallus (chicken)
  • Haliotis rubra (blacklip abalone)
  • Haliotis rufescens (red abalone)
  • Helicoverpa zea (corn earworm)
  • Homalodisca vitripennis (glassy-winged sharpshooter)
  • Hydra vulgaris (swiftwater hydra)
  • Hypomesus transpacificus (delta smelt)
  • Ictalurus punctatus (channel catfish)
  • Ischnura elegans (damselfly)
  • Lolium rigidum (monocot)
  • Lucilia cuprina (Australian sheep blowfly)
  • Lynx rufus (bobcat)
  • Marmota monax (woodchuck)
  • Meles meles (Eurasian badger)
  • Micropterus dolomieu (smallmouth bass)
  • Neodiprion fabricii (hymenopteran)
  • Neodiprion lecontei (redheaded pine sawfly)
  • Neodiprion pinetum (white pine sawfly)
  • Neodiprion virginiana (hymenopteran)
  • Oncorhynchus gorbuscha (pink salmon)
  • Osmia bicornis bicornis (red mason bee)
  • Scatophagus argus (bony fish)
  • Schistocerca americana (American grasshopper)
  • Schistocerca piceifrons (Central American locust)
  • Silurus meridionalis (bony fish)
  • Ursus americanus (American black bear)
  • Vanessa cardui (painted lady)
  • Vespa crabro (European hornet)
  • Vigna umbellata (eudicot)
  • Xenia sp. Carnegie-2017 (soft coral)

View the full list of annotated eukaryotes available in the Genome Data Viewer (GDV) browser.

Bacterial and archaeal genomes with GO terms in RefSeq!

RefSeq prokaryotic genomes and proteins are now annotated with Gene Ontology (GO) terms. Over the years we have received many requests to add GO terms to the annotations we provide. We heard you!

We are embarking on this adventure and starting to assign terms from the Biological Process, Molecular Function and Cellular Component ontologies to the genomes and proteins we annotate with the Prokaryotic Genome Annotation Pipeline (PGAP). Because the Gene Ontologies are hierarchical, these annotations will make it easier to compare gene content across genomes at variable levels of specificity and will eventually allow GO term enrichment analysis. GO terms are now associated with coding sequence (CDS) features on newly submitted genomes (see Figure 1). They will progressively appear on genomes that are already in RefSeq as these get reannotated (about once a year). We expect all RefSeq genomes to have some GO terms by the spring of 2023.
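To make the point about hierarchy concrete, here is a minimal sketch of comparing two genomes at a coarser level by rolling each annotated term up to its ancestors. The parent map is a toy, heavily simplified stand-in for the Molecular Function ontology, and the two "genomes" are invented for illustration; neither reflects actual PGAP output.

```python
# Toy child -> parents map (an assumption; the real ontology is much larger
# and has additional intermediate terms and relationships).
PARENTS = {
    "serine-type peptidase activity": ["peptidase activity"],
    "metallopeptidase activity": ["peptidase activity"],
    "peptidase activity": ["catalytic activity"],
    "catalytic activity": ["molecular_function"],
    "molecular_function": [],
}


def with_ancestors(term):
    """Return the term together with every ancestor reachable in PARENTS."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def rolled_up(terms):
    """Union of each term and its ancestors for a set of annotated terms."""
    result = set()
    for term in terms:
        result |= with_ancestors(term)
    return result


# Invented annotations for two hypothetical genomes.
genome_a = {"serine-type peptidase activity"}
genome_b = {"metallopeptidase activity"}

exact_overlap = genome_a & genome_b
shared_after_rollup = rolled_up(genome_a) & rolled_up(genome_b)
print(sorted(exact_overlap))
print(sorted(shared_after_rollup))
```

In this toy case the two genomes share no term exactly, but both are found to carry peptidase activity once the terms are rolled up, which is the kind of comparison at variable specificity described above.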

New RefSeq annotations!

In December and January, the NCBI Eukaryotic Genome Annotation Pipeline released twenty-four new annotations in RefSeq for the following organisms:

  • Aegilops tauschii (monocot)
  • Camelus bactrianus (Bactrian camel)
  • Colias croceus (clouded yellow)
  • Echinops telfairi (small Madagascar hedgehog)
  • Harmonia axyridis (beetle)
  • Lemur catta (ring-tailed lemur)
  • Leopardus geoffroyi (Geoffroy’s cat)
  • Macaca fascicularis (crab-eating macaque)
  • Maniola jurtina (meadow brown)
  • Meles meles (Eurasian badger)
  • Melitaea cinxia (Glanville fritillary) (pictured)

Continue reading “New RefSeq annotations!”

New Gene Information from the Alliance of Genome Resources

NCBI Gene now has descriptive information about genes from the Alliance of Genome Resources for organisms including Caenorhabditis elegans, Danio rerio, Drosophila melanogaster, Homo sapiens, Mus musculus, Rattus norvegicus, and Saccharomyces cerevisiae.

Figure 1. The gene summary section of the Drosophila melanogaster slmb Gene Full Report showing the link to the corresponding record at the Alliance of Genome Resources.

The Summary section of the Gene Full Report page has links to gene pages at the Alliance of Genome Resources (Figure 1). These are also in the right-hand sidebar of the Links to other resources section. For genes that don’t have a RefSeq summary, we use the textual gene descriptions from the Alliance of Genome Resources.

The Drosophila slmb gene record shows the enhancements provided by the Alliance of Genome Resources. The gene_info.gz files on the Gene FTP site also include AllianceGenome references in the dbXrefs column.
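A minimal sketch of pulling the AllianceGenome references out of a gene_info row might look like the following. It assumes the file is tab-delimited, that dbXrefs is the sixth column, that cross-references are separated by "|" in a database:identifier form, and that "-" marks an empty field. The helper names and the sample row, including its GeneID and FlyBase identifiers, are made-up placeholders rather than real records.

```python
def parse_dbxrefs(field):
    """Split a dbXrefs value into (database, identifier) pairs."""
    if field in ("", "-"):
        return []
    pairs = []
    for entry in field.split("|"):
        # Split on the first colon only, since identifiers may contain colons.
        database, _, identifier = entry.partition(":")
        pairs.append((database, identifier))
    return pairs


def alliance_ids(gene_info_line, dbxrefs_index=5):
    """Return the AllianceGenome identifiers found in one gene_info row."""
    columns = gene_info_line.rstrip("\n").split("\t")
    return [
        identifier
        for database, identifier in parse_dbxrefs(columns[dbxrefs_index])
        if database == "AllianceGenome"
    ]


# Placeholder row: tax_id, GeneID, Symbol, LocusTag, Synonyms, dbXrefs.
sample = "\t".join(
    ["7227", "0000000", "slmb", "-", "-",
     "FLYBASE:FBgn0000000|AllianceGenome:FB:FBgn0000000"]
)
print(alliance_ids(sample))
```

To apply the same idea to a downloaded file, the rows could be read with `gzip.open(path, "rt")`, skipping the header line that starts with "#".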

Updated prokaryotic representative genomes collection includes 685 new species!

We are happy to announce an updated bacterial and archaeal representative genomes collection. The current collection contains a total of 15,507 assemblies selected from 236,000 prokaryotic RefSeq assemblies to represent their respective species. The collection has grown by five percent since August 2021. A total of 685 species are represented for the first time. In addition, 370 species are represented by a better assembly, and 84 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment.

We updated the database on the Microbial Nucleotide BLAST page as well as the basic nucleotide BLAST RefSeq Representative genomes database (fourth in the menu) to reflect these changes. Finally, remember that you can now run BLAST searches against the proteins annotated on representative genomes (second in the menu). Find more information here.