On April 1, 2022, Science published the first complete sequence of a human genome, known as T2T-CHM13. This notable scientific achievement comes two decades after the first human genome release from the Human Genome Project and offers an in-depth look at biologically important regions, such as centromeres, telomeres, and segmental duplications, that were previously unassembled. Read on to learn more about how you can access this assembly and related resources at NCBI, or to access any one of the more than 1,000 human genome assemblies now in GenBank.
This full release incorporates genomic, transcript, and protein data available as of May 2, 2022, and contains 314,915,153 records, including 229,417,182 proteins, 44,805,833 RNAs, and sequences from 119,373 organisms. The release is provided in several directories as a complete dataset and also as divided by logical groupings.
Human genome Annotation Release 110
Annotation Release 110 is the first new annotation of the human genome in four years. It incorporates the latest curated RefSeqs and models recalculated using over 80 million long reads and 9 billion Illumina RNA-seq reads. AR 110 includes annotation of two human assemblies.
- Belonocnema kinseyi (wasp)
- Daphnia pulex (common water flea)
- Daphnia pulicaria (crustacean)
- Dermatophagoides farinae (American house dust mite)
- Diprion similis (hymenopteran)
- Drosophila willistoni (fly)
- Equus quagga burchellii (Burchell’s zebra)
- Gallus gallus (chicken)
- Haliotis rubra (blacklip abalone)
- Haliotis rufescens (red abalone)
- Helicoverpa zea (corn earworm)
- Homalodisca vitripennis (glassy-winged sharpshooter)
- Hydra vulgaris (swiftwater hydra)
- Hypomesus transpacificus (delta smelt)
- Ictalurus punctatus (channel catfish)
- Ischnura elegans (damselfly)
- Lolium rigidum (monocot)
- Lucilia cuprina (Australian sheep blowfly)
- Lynx rufus (bobcat)
- Marmota monax (woodchuck)
- Meles meles (Eurasian badger)
- Micropterus dolomieu (smallmouth bass)
- Neodiprion fabricii (hymenopteran)
- Neodiprion lecontei (redheaded pine sawfly)
- Neodiprion pinetum (white pine sawfly)
- Neodiprion virginiana (hymenopteran)
- Oncorhynchus gorbuscha (pink salmon)
- Osmia bicornis bicornis (red mason bee)
- Scatophagus argus (bony fish)
- Schistocerca americana (American grasshopper)
- Schistocerca piceifrons (Central American locust)
- Silurus meridionalis (bony fish)
- Ursus americanus (American black bear)
- Vanessa cardui (painted lady)
- Vespa crabro (European hornet)
- Vigna umbellata (eudicot)
- Xenia sp. Carnegie-2017 (soft coral)
View the full list of annotated eukaryotes available in the Genome Data Viewer (GDV) browser.
We are embarking on a new adventure and starting to assign terms from the Biological Process, Molecular Function, and Cellular Component ontologies to the genomes and proteins we annotate with the Prokaryotic Genome Annotation Pipeline (PGAP). Because the Gene Ontology is hierarchical, these annotations will make it easier to compare gene content across genomes at varying levels of specificity and will eventually allow GO term enrichment analysis. GO terms are now associated with coding sequence (CDS) features on newly submitted genomes (see Figure 1). They will progressively appear on genomes that are already in RefSeq as these are reannotated (about once a year). We expect all RefSeq genomes to have some GO terms by the spring of 2023.
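To illustrate why a hierarchy helps with comparison at different levels of specificity, here is a minimal sketch. It assumes a toy "is a" hierarchy with made-up term names (they are placeholders, not real GO identifiers) and two hypothetical genomes. It is not how PGAP assigns or stores GO terms; it only shows that two genomes with no identical terms can still share broader ancestor terms.

```python
# Toy "is a" hierarchy: child term -> list of parent terms.
# All term names are hypothetical placeholders, not real GO identifiers.
PARENTS = {
    "TERM:process": [],
    "TERM:transport": ["TERM:process"],
    "TERM:ion_transport": ["TERM:transport"],
    "TERM:sugar_transport": ["TERM:transport"],
}


def with_ancestors(term, parents=PARENTS):
    """Return the term together with every ancestor reachable from it."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def expand(terms):
    """Roll a set of annotated terms up to include all of their ancestors."""
    expanded = set()
    for term in terms:
        expanded |= with_ancestors(term)
    return expanded


# Hypothetical annotations for two genomes.
genome_a = {"TERM:ion_transport"}
genome_b = {"TERM:sugar_transport"}

exact_shared = genome_a & genome_b                   # most specific level
rolled_shared = expand(genome_a) & expand(genome_b)  # any level of the hierarchy

print(sorted(exact_shared))   # prints []
print(sorted(rolled_shared))  # prints ['TERM:process', 'TERM:transport']
```

The two genomes share nothing at the most specific level, but rolling each annotation up to its ancestors shows that both encode some kind of transport function.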
- Aegilops tauschii (monocot)
- Camelus bactrianus (Bactrian camel)
- Colias croceus (clouded yellow)
- Echinops telfairi (small Madagascar hedgehog)
- Harmonia axyridis (beetle)
- Lemur catta (ring-tailed lemur)
- Leopardus geoffroyi (Geoffroy’s cat)
- Macaca fascicularis (crab-eating macaque)
- Maniola jurtina (meadow brown)
- Meles meles (Eurasian badger)
- Melitaea cinxia (Glanville fritillary)
NCBI Gene now has descriptive information about genes from the Alliance of Genome Resources for organisms including Caenorhabditis elegans, Danio rerio, Drosophila melanogaster, Homo sapiens, Mus musculus, Rattus norvegicus, and Saccharomyces cerevisiae.
Figure 1. The gene summary section of the Drosophila melanogaster slmb Gene Full Report showing the link to the corresponding record at the Alliance of Genome Resources.
The Summary section of the Gene Full Report page has links to gene pages at the Alliance of Genome Resources (Figure 1). These links also appear in the right-hand sidebar, in the Links to other resources section. For genes that don’t have a RefSeq summary, we use the textual gene descriptions from the Alliance of Genome Resources.
The Drosophila slmb gene record shows the enhancements provided by the Alliance of Genome Resources. The gene_info.gz files on the Gene FTP site also include AllianceGenome references in the dbXrefs column.
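If you want to pull these references out of a gene_info file, a minimal sketch might look like the following. It assumes the file is tab-delimited with a header line naming the GeneID and dbXrefs columns, and that dbXrefs holds pipe-separated database:identifier pairs with "-" for an empty value. The function name and the sample rows are invented for illustration and are not real Gene records.

```python
import csv
import io


def alliance_xrefs(lines, prefix="AllianceGenome"):
    """Yield (GeneID, identifier) for each dbXrefs entry with the given prefix.

    `lines` is any iterable of text lines whose first line is the header.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    header[0] = header[0].lstrip("#")  # the header line starts with '#'
    gene_col = header.index("GeneID")
    xref_col = header.index("dbXrefs")
    for row in reader:
        field = row[xref_col]
        if field == "-":  # assumed placeholder for "no cross-references"
            continue
        for entry in field.split("|"):
            # Split on the first colon only; the identifier may contain colons.
            database, _, identifier = entry.partition(":")
            if database == prefix:
                yield row[gene_col], identifier


# Made-up sample rows for illustration only (not real Gene records).
SAMPLE = (
    "#tax_id\tGeneID\tSymbol\tdbXrefs\n"
    "7227\t1\tgeneA\tFLYBASE:FBgn0000001|AllianceGenome:FB:FBgn0000001\n"
    "7227\t2\tgeneB\t-\n"
)

result = list(alliance_xrefs(io.StringIO(SAMPLE)))
print(result)  # prints [('1', 'FB:FBgn0000001')]
```

For a downloaded file, the same function can be given an open text handle, for example `gzip.open("gene_info.gz", "rt")` from the standard library, in place of the in-memory sample.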
We are happy to announce an updated bacterial and archaeal representative genomes collection. The current collection contains a total of 15,507 assemblies selected from 236,000 prokaryotic RefSeq assemblies to represent their respective species. The collection has grown by five percent since August 2021. A total of 685 species are represented for the first time. In addition, 370 species are represented by a better assembly, and 84 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment.
We updated the database on the Microbial Nucleotide BLAST page as well as the basic nucleotide BLAST RefSeq Representative genomes database (fourth in the menu) to reflect these changes. Finally, remember that you can now run BLAST searches against the proteins annotated on representative genomes (second in the menu). Find more information here.
If you’re curious about genome annotation beyond the genes, then read on! We previously blogged about our RefSeq Functional Elements resource, which provides annotation of experimentally validated, non-genic functional elements in human and mouse. Now, to kick off 2022, we’re delighted to announce a new publication in the January issue of Genome Research:
Farrell CM, Goldfarb T, Rangwala SH, Astashyn A, Ermolaeva OD, Hem V, Katz KS, Kodali VK, Ludwig F, Wallin CL, Pruitt KD, Murphy TD. RefSeq Functional Elements as experimentally assayed nongenic reference standards and functional interactions in human and mouse. Genome Res. 2022 Jan;32(1):175-188. doi: 10.1101/gr.275819.121. Epub 2021 Dec 7. PMID: 34876495.
Figure 1. Workflow for production of the RefSeq Functional Elements dataset. Full cylinders represent databases, the half-cylinder represents the indicated data source, and rectangles represent actions. Further details can be found in the publication.
This full release incorporates genomic, transcript, and protein data available as of January 3, 2022, and contains 302,482,881 records, including 220,595,192 proteins, 42,453,222 transcripts, and sequences from 115,929 organisms. The release is provided in several directories as a complete dataset and also as divided by logical groupings.
Introducing the NIH Comparative Genomics Resource (CGR)
NCBI is looking forward to seeing you in person at the International Plant and Animal Genome Conference (PAG XXIX), January 8-12, 2022, in San Diego, California. We’re especially excited to introduce our newest endeavor, the NLM initiative known as the NIH Comparative Genomics Resource (CGR), a platform we are developing to support comparative analyses of sequenced eukaryotic research organisms. Understanding and supporting the needs of researchers is a fundamental element in the development of CGR and is critical to its future success in supporting a large and diverse collection.
Please join NCBI for the following events to learn more about CGR and how you can inform its development: