Tag: BioProject

Scrubbing human sequence contamination from Sequence Read Archive (SRA) submissions

Do you work with human-derived sequence data? Do you often struggle with the need to determine if your data is free of human sequence and therefore suitable for public distribution? We encourage submitters to screen for and remove contaminating human reads from data files prior to submission to SRA. To support investigators in this effort, we offer a tool to remove human sequence contamination from your SRA submissions!

Human Read Removal Tool (HRRT)

The Human Read Removal Tool (HRRT; also known as the Human Scrubber) is available on GitHub and DockerHub. The HRRT is based on the SRA Taxonomy Analysis Tool (STAT). It takes a fastq file as input and produces a fastq.clean file as output, in which all reads identified as potentially of human origin are masked with ‘N’.
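
To make the masking step concrete, here is a minimal Python sketch of what "masked with ‘N’" could look like for a four-line FASTQ record. This is not the HRRT's code: the function name `mask_flagged_reads` and the `flagged_ids` set are hypothetical, the sketch assumes the human reads have already been identified by some other means, and leaving the quality line untouched is also an assumption.

```python
def mask_flagged_reads(fastq_lines, flagged_ids):
    """Yield FASTQ lines, replacing the bases of flagged reads with 'N'.

    fastq_lines: iterable of lines without trailing newlines, in well-formed
                 four-line records (header, sequence, '+', quality).
    flagged_ids: hypothetical set of read IDs already judged to be human.
    """
    lines = iter(fastq_lines)
    for header in lines:
        seq = next(lines)
        plus = next(lines)
        qual = next(lines)
        # The read ID is the first whitespace-delimited token after '@'.
        read_id = header[1:].split()[0]
        if read_id in flagged_ids:
            # Keep the read length so sequence and quality stay aligned.
            seq = "N" * len(seq)
        yield from (header, seq, plus, qual)


records = [
    "@read1 sample", "ACGT", "+", "IIII",
    "@read2", "GGCCA", "+", "IIIII",
]
cleaned = list(mask_flagged_reads(records, {"read2"}))
```

In this sketch `read1` passes through unchanged, while the five bases of `read2` become `NNNNN`.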

Gapless Telomere to Telomere human genome (T2T-CHM13) now available

On April 1, 2022, Science published the first complete sequence of a human genome, known as T2T-CHM13. This notable scientific achievement comes two decades after the first human genome release from the Human Genome Project and offers an in situ look at biologically important regions, such as centromeres, telomeres, and segmental duplications, that were previously unassembled. Read on to learn more about how you can access this assembly and related resources at NCBI, or to access any one of the more than 1000 human genome assemblies now in GenBank.

Researchers: Now it’s easier to find the data you want in BioProject

We’ve improved BioProject to give you a better way to find all data from a specific project. We think you’ll love the new interface that lets you quickly choose the right BioProject with links to the data you want in other NCBI databases.

The updated BioProject browser makes it easier than ever to filter the data by a variety of attributes so you can quickly pick BioProjects that interest you.

Figure 1. The BioProject home page showing links to the BioProject browser. To use the new browser, click the ‘Browse by Project Attributes’ link below the search bar on any BioProject page or the ‘By Project attributes’ link on the BioProject home page.

RefSeq Functional Elements now public

NCBI is pleased to announce the initial data release of RefSeq Functional Elements, a resource that provides RefSeq and Gene records for experimentally validated human and mouse non-genic functional elements. Data can be accessed via Gene, Nucleotide, BLAST, BioProject, Graphical Displays and FTP.

Accessing the Hidden Kingdom: Fungal ITS Reference Sequences

This post is geared toward fungi researchers as well as RefSeq and BLAST users.

Fungi have unique characteristics that can make it difficult to identify and classify species based on morphology. To address these issues, Conrad Schoch, NCBI’s fungi taxonomist, and Barbara Robbertse, NCBI’s fungi RefSeq curator, in collaboration with outside mycology experts, are curating a set of fungal sequences from internal transcribed spacer (ITS) regions of the nuclear ribosomal RNA genes. This set of standard DNA sequences for fungal taxa not only addresses these difficulties in identifying and classifying fungal species by morphology, but is also essential for analyzing environmental (metagenomics) sequencing studies. The curated ITS sequences, described in a recent article in Database (PMC Free Article), all have associated specimen data and, when possible, are taken from sequences from type materials, ensuring correct species identification and tracking of name changes. This article will show you how to access these ITS sequences and search them using the specialized Targeted Loci BLAST service.

The fungal ITS sequences are a RefSeq Targeted Loci BioProject (PRJNA177353). As you may know, a BioProject is a collection of biological data related to a single initiative; in this case, the goal is to collect and curate fungal sequences from targeted loci – specific molecular markers such as protein coding or ribosomal RNA genes used for phylogenetic analysis.
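
For readers who prefer scripted access, the project can also be looked up through NCBI's E-utilities. The sketch below only builds an ESearch URL for the accession and sends no request; the helper name `bioproject_search_url` and the `retmax` default are illustrative assumptions, and searching the `bioproject` database by plain accession is one reasonable query rather than the only one.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch endpoint.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def bioproject_search_url(accession, retmax=20):
    """Build an ESearch URL that looks up a BioProject by its accession.

    The helper name and the retmax default are illustrative choices.
    """
    params = {
        "db": "bioproject",   # search the BioProject database
        "term": accession,    # e.g. the fungal ITS project accession
        "retmode": "json",    # ask for a JSON response
        "retmax": retmax,     # cap the number of returned IDs
    }
    return f"{EUTILS_BASE}?{urlencode(params)}"


url = bioproject_search_url("PRJNA177353")
```

The resulting URL can be opened in a browser or fetched with any HTTP client; NCBI's usage guidelines for E-utilities apply to automated requests.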

The Tasmanian Devil 2: The tumor and Tasmanian devil mitochondrial genomes

The Tasmanian devil (Sarcophilus harrisii), the last remaining large marsupial carnivore, now faces extinction because of a strange and deadly infection, a transmissible cancer known as Transmissible Devil Facial Tumor Disease (TDFTD). In a previous NCBI Insights post, we discussed gene expression data from the tumors that established their neural origin and showed the tumors were likely derived from Schwann cells. In this post, we’ll consider some of the genome sequencing projects in the NCBI databases and explore evidence that the tumor originated in a different individual than the affected animal, supporting the idea that the tumor cells themselves are infectious agents.