On April 1, 2022, Science published the first complete sequence of a human genome, known as T2T-CHM13. This notable scientific achievement comes two decades after the first human genome release from the Human Genome Project and offers an in-depth look at biologically important regions, such as centromeres, telomeres, and segmental duplications, that were previously unassembled. Read on to learn more about how you can access this assembly and related resources at NCBI, or to access any one of the more than 1000 human genome assemblies now in GenBank.
Tag: BioProject
Retrieve genome data by BioProject using the Datasets command-line tool
You can now retrieve genome data using the NCBI Datasets command-line tool and API by simply providing a BioProject accession. You can go directly from a BioProject accession to genome data even when the BioProject accession is the parent of multiple BioProjects (Figure 1).
Figure 1. Command lines using BioProject accessions with the datasets command-line tool, and sample metadata. Top panel: command line for downloading genome metadata for the Sanger 25 Genomes Project (PRJEB33226). Middle panel: a portion of the metadata in JSON format for the 25 Genomes Project. Bottom panel: command line for downloading sequence data and annotation metadata for a component BioProject for the king scallop (PRJEB35331).
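As a rough illustration of the commands described in the figure, the sketch below builds argument lists for the datasets tool from a BioProject accession. It assumes the `datasets summary genome accession` and `datasets download genome accession` subcommands and the `--filename` option; these may differ between versions of the tool, so check `datasets --help` for your installation. The helper name `datasets_command`, the accession pattern, and the output file name `king_scallop.zip` are made up for this example. The script only prints the commands and does not run them.

```python
import re
import shlex

# Assumed accession shape: "PRJ", a database letter (E, D, or N),
# one more letter, then digits (for example PRJEB33226 or PRJNA177353).
BIOPROJECT_RE = re.compile(r"PRJ[EDN][A-Z]\d+")


def datasets_command(action, bioproject, filename=None):
    """Return an argument list for `datasets <action> genome accession <bioproject>`.

    `action` is "summary" (metadata only) or "download" (data package).
    `filename` is an optional output path, passed with --filename for downloads.
    """
    if action not in ("summary", "download"):
        raise ValueError(f"unsupported action: {action!r}")
    if not BIOPROJECT_RE.fullmatch(bioproject):
        raise ValueError(f"not a BioProject accession: {bioproject!r}")
    argv = ["datasets", action, "genome", "accession", bioproject]
    if filename is not None:
        if action != "download":
            raise ValueError("filename only applies to downloads")
        argv += ["--filename", filename]
    return argv


if __name__ == "__main__":
    # Parent BioProject: metadata for every genome in the 25 Genomes Project.
    print(shlex.join(datasets_command("summary", "PRJEB33226")))
    # Component BioProject: data package for the king scallop.
    print(shlex.join(datasets_command("download", "PRJEB35331",
                                      filename="king_scallop.zip")))
```

To run a command for real, the returned list can be passed to `subprocess.run(argv, check=True)` on a machine where the datasets tool is installed.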
Researchers: Now it’s easier to find the data you want in BioProject
We’ve improved BioProject to give you a better way to find all data from a specific project. We think you’ll love the new interface that lets you quickly choose the right BioProject with links to the data you want in other NCBI databases.
The updated BioProject browser makes it easier than ever to filter the data by a variety of attributes so you can quickly pick BioProjects that interest you.

RefSeq Functional Elements now public
NCBI is pleased to announce the initial data release of RefSeq Functional Elements, a resource that provides RefSeq and Gene records for experimentally validated human and mouse non-genic functional elements. Data can be accessed via Gene, Nucleotide, BLAST, BioProject, Graphical Displays and FTP.
Accessing the Hidden Kingdom: Fungal ITS Reference Sequences
This post is geared toward fungi researchers as well as RefSeq and BLAST users.
Fungi have unique characteristics that can make it difficult to identify and classify species based on morphology. To address these issues, Conrad Schoch, NCBI’s fungi taxonomist, and Barbara Robbertse, NCBI’s fungi RefSeq curator, in collaboration with outside mycology experts, are curating a set of fungal sequences from internal transcribed spacer (ITS) regions of the nuclear ribosomal RNA genes. This set of standard DNA sequences for fungal taxa not only addresses these difficulties in identifying and classifying fungal species by morphology, but is also essential for analyzing environmental (metagenomics) sequencing studies. The curated ITS sequences, described in a recent article in Database (PMC Free Article), all have associated specimen data and, when possible, are taken from sequences from type materials, ensuring correct species identification and tracking of name changes. This article will show you how to access these ITS sequences and search them using the specialized Targeted Loci BLAST service.
The fungal ITS sequences are a RefSeq Targeted Loci BioProject (PRJNA177353). As you may know, a BioProject is a collection of biological data related to a single initiative; in this case, the goal is to collect and curate fungal sequences from targeted loci – specific molecular markers such as protein coding or ribosomal RNA genes used for phylogenetic analysis.
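One possible way to list the records in this BioProject programmatically is through the Entrez E-utilities. The sketch below only builds an ESearch URL; it assumes that the Nucleotide database (`nuccore`) accepts a `[BioProject]` field qualifier in the search term, and the function name `bioproject_esearch_url` is made up for this example. It is not necessarily the route the full post describes.

```python
from urllib.parse import urlencode

# Public E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def bioproject_esearch_url(bioproject, db="nuccore", retmax=20):
    """Build an ESearch URL for records in `db` linked to a BioProject.

    Assumes the target database supports the [BioProject] search field.
    """
    params = {
        "db": db,                             # Entrez database to search
        "term": f"{bioproject}[BioProject]",  # restrict to one BioProject
        "retmax": retmax,                     # number of IDs to return
        "retmode": "json",                    # ask for a JSON response
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Fungal ITS RefSeq Targeted Loci project from the text above.
    print(bioproject_esearch_url("PRJNA177353"))
```

Fetching the printed URL returns a list of matching record IDs, which can then be passed to EFetch to retrieve the sequences.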
The Tasmanian Devil 2: The tumor and Tasmanian devil mitochondrial genomes
The Tasmanian devil (Sarcophilus harrisii), the last remaining large marsupial carnivore, now faces extinction because of a strange and deadly infection, a transmissible cancer known as Devil Facial Tumor Disease (DFTD). In a previous NCBI Insights post, we discussed gene expression data from the tumors that established their neural origin and showed the tumors were likely derived from Schwann cells. In this post, we’ll consider some of the genome sequencing projects in the NCBI databases and explore evidence that the tumor originated in a different individual than the affected animal, supporting the idea that the tumor cells themselves are infectious agents.