In a previous blog post, we explained several important concepts about the human reference genome. We presented a region of human chromosome 17 as an example of a location where the genome sequence was not fully assembled. In this post, we are going to revisit the same gapped region to see how the Genome Reference Consortium (GRC) changed this part of the genome in GRCh38, the updated human reference assembly released in December 2013. This region represents just one of the more than 1,000 changes and improvements that the GRC introduced in GRCh38.
First, we’ll examine corresponding regions of chromosome 17 in the previous (GRCh37.p13) and current (GRCh38) reference assemblies (Figure 1). In GRCh37.p13, the region spanning 21,200K to 21,700K contained a gap of unknown size (the blank area with no components in the figure), which was arbitrarily set to 100K. The ‘K’ stands for kilobase pairs (1K = 1,000 bp).
Figure 1: Updates to the reference human genome assembly in a region of chromosome 17. Top panel: A region of chromosome 17 (NC_000017.10) from the GRCh37.p13 assembly showing the components. Bottom panel: The corresponding region of chromosome 17 (NC_000017.11) in the GRCh38 assembly showing the new components. Components shared between the two builds are marked with checks. Components not present in GRCh38 are marked with an “X”. The labeled components AC233702.5 and ABBA01006765.1 are two of the 11 new components in GRCh38.
GRCh38 contains new information in this region, including 11 new components that expanded the area by about 500K. This expansion resulted in a change in coordinates for all downstream genes and features on chromosome 17. Moreover, three of the components in GRCh37.p13 within this region are no longer in GRCh38, and therefore any genes or features annotated on those components have moved somewhere else in the new build or have been deleted.
Major changes such as these mean that researchers should take a moment to assess how the updates affect their genes of interest, because many regions have been updated in GRCh38. To make this easier, NCBI offers a Genome Remapping Service that converts mapping data from one build to another, as described in a previous post. You can use this tool to confirm that the region on chromosome 17 between 21,200K and 21,700K in GRCh37.p13 roughly corresponds to the region between 21,300K and 22,200K on chromosome 17 in GRCh38.
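To see why filling a gap shifts the coordinates of everything downstream, here is a minimal sketch of coordinate conversion using ungapped alignment blocks between an old and a new assembly. The `Block` type, the `remap` function, and the toy coordinates are hypothetical illustrations, not real GRCh37/GRCh38 alignment data, and this is not a description of how NCBI's remapping service is implemented.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    """Hypothetical ungapped alignment block: source positions
    src_start .. src_start + length - 1 map one-to-one onto
    tgt_start .. tgt_start + length - 1 in the target assembly."""
    src_start: int
    tgt_start: int
    length: int


def remap(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a source coordinate into the target assembly.

    Assumes blocks are sorted by src_start and do not overlap.
    Returns None when pos falls outside every block, e.g. in a gap
    or on a component that has no counterpart in the target.
    """
    starts = [b.src_start for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    block = blocks[i]
    offset = pos - block.src_start
    if offset >= block.length:
        return None  # pos lies past the end of this block (gap or unaligned)
    return block.tgt_start + offset


# Toy data (hypothetical): source bases 1,500-1,599 are a gap; in the target,
# that gap has been replaced by 600 bp of new sequence, so every position
# downstream of the gap shifts by +500.
TOY_BLOCKS = [Block(1_000, 1_000, 500), Block(1_600, 2_100, 400)]

if __name__ == "__main__":
    for p in (1_200, 1_550, 1_600, 1_999):
        print(p, "->", remap(p, TOY_BLOCKS))
```

Real assembly-to-assembly alignments also include reverse-strand blocks and features that span block boundaries, which this sketch ignores; that is one reason remapped regions only "roughly" correspond.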
In a future post, we will take a closer look at this region of chromosome 17 to see how the gene annotations changed as a result of the new components.
The Tasmanian devil (Sarcophilus harrisii), the last remaining large marsupial carnivore, now faces extinction because of a strange and deadly infection, a transmissible cancer known as Transmissible Devil Facial Tumor Disease (TDFTD). In a previous NCBI Insights post, we discussed gene expression data from the tumors that established their neural origin and showed that the tumors were likely derived from Schwann cells. In this post, we’ll consider some of the genome sequencing projects in the NCBI databases and explore evidence that the tumor originated in an individual other than the affected animal, supporting the idea that the tumor cells themselves are infectious agents.
NCBI, in partnership with the National Library of Medicine Training Center (NTC), will offer the Librarian’s Guide to NCBI course on the NIH campus in April 2014. This will be the second presentation of the course; it was first offered in the spring of 2013 (NCBI Insights, April 11 and May 6, 2013). After the course, we will post lecture slides and hands-on practical exercises in the education area of the NCBI FTP site, and video tutorials of the course lectures will be available on the NCBI YouTube channel. Materials from the 2013 course are available, as well as lecture videos for the expression module.
This month marks a major event in the realm of human genome research: the release of a new assembly of the genome, GRCh38. It has been over four years since the last major release (GRCh37 in March 2009), and we are going to explore several aspects of this new assembly in a series of blog posts over the coming weeks. In this initial post, we will give an overview of the data flow so that you will understand how NCBI received the data, where the data are located at NCBI, and what genome annotations you can expect from NCBI in the near future.
An easy way to speed up your BLAST analysis is to search a smaller database targeted to sequences of interest. We’ll describe here a few ways to create such custom databases on the BLAST web pages. For this Quick Tip we’ll use the pages in the Basic BLAST section of the BLAST home page.
BLAST parent databases
Generating a custom database begins with selecting the appropriate parent database. The BLAST Guide provides database descriptions to help with choosing a database. You select the parent in the Database pull-down menu, shown in Figure 1. Selecting the database is really your first opportunity to customize.
Figure 1. The database selection pull-down lists: top panel, nucleotide databases; bottom panel, protein databases
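The steps above use the BLAST web forms. For scripted searches, the same idea of a parent database narrowed to sequences of interest can be sketched against the BLAST Common URL API, where the DATABASE parameter names the parent database and the optional ENTREZ_QUERY parameter restricts it. This minimal sketch only builds a submission (CMD=Put) URL and sends nothing; the function name `build_blast_put_url`, the example query fragment, and the organism limit are illustrative assumptions, and NCBI's URL API documentation and usage guidelines should be checked before submitting real jobs.

```python
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blast_put_url(program: str, database: str, query: str,
                        entrez_query: str = "") -> str:
    """Build (but do not send) a BLAST URL API submission request.

    database     -- the parent database, e.g. "nt" or "refseq_protein"
    entrez_query -- optional Entrez query that limits the parent database
                    to a subset, similar to a custom database
    """
    params = {"CMD": "Put", "PROGRAM": program,
              "DATABASE": database, "QUERY": query}
    if entrez_query:
        params["ENTREZ_QUERY"] = entrez_query
    # urlencode percent-encodes brackets and turns spaces into '+'
    return BLAST_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical example: a short protein fragment searched against
    # RefSeq proteins, limited to human sequences.
    print(build_blast_put_url("blastp", "refseq_protein", "MTEYKLVVVG",
                              "Homo sapiens[Organism]"))
```

Submitting the request returns a request ID (RID) that is then polled with CMD=Get; that step is deliberately left out of this sketch.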
Do you regularly perform PubMed searches to find new articles on your topic of interest?
Would you like to know when new sequence records become available for your gene?
Is it important for you to be alerted when new bioactivity assays with inhibitor data become available for your enzyme?
With a free My NCBI account, you can easily set up a series of e-mail alerts to notify you of such new information. You can read more about the many other functions of My NCBI.
Here’s how to set up these alerts:
November 2013 marks 25 years since the founding of the National Center for Biotechnology Information (NCBI).
In honor of NCBI’s 25th anniversary, United States Senator Ben Cardin read a statement into the Congressional Record recognizing the Center’s years of service in providing access to biomedical and genomic information to enhance the world’s science and health.
On November 1, an awards and recognition program was held on the NIH campus in Bethesda, Maryland, to commemorate the occasion.
Tony Hey, Ph.D., Vice President of Microsoft Research, presenting the Jim Gray eScience award to David Lipman, M.D., Director of the NCBI.
At this event, Tony Hey, PhD, Vice President of Microsoft Research, presented NCBI Director David Lipman, MD, with the Jim Gray eScience Award, which recognizes researchers who have made outstanding contributions to the field of data-intensive computing in the pursuit of open, supportive, and collaborative research models.
It’s been an exciting and productive time since the PubMed Commons beta launch. We’ve learned a great deal, both here working under the hood and from the conversations in social media and blog posts.
We are answering the questions people are asking via our Twitter account, and we will soon revise and expand the information on the PubMed Commons page. We will also try out a Twitter chat, so keep an eye on @PubMedCommons for the announcement.
There are now about 1,000 people signed up for the Commons. Remember, any author in PubMed can join, from anywhere in the world. Check out our step-by-step guide. Once you are in, you can invite others. So please spread the word!
In our previous post, we wrote about a new service called PubMed Commons that allows researchers to add comments to individual PubMed records. As we described in that post, PubMed Commons is currently a beta pilot release and requires interested people to join the system before they can view or add comments. This post will describe how to join PubMed Commons.