Bald eagle and other bird genome sequence and annotation data publicly available at NCBI


A series of press releases, including one by Science Publishing, recently announced the first findings of the Avian Phylogenomics Consortium, who analyzed genome sequences and annotation data for 48 bird genomes representing all of the bird taxonomic orders. All of the sequenced genomes, along with any annotation provided by the submitter, are available in NCBI resources including Assembly, Nucleotide, Protein, the Sequence Read Archive (SRA), and BLAST, or from species-specific GenBank genomes FTP directories. RNA-Seq data for some of the bird species can be found in SRA.
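For readers who prefer scripted access to the Assembly resource mentioned above, here is a minimal sketch that builds a search URL for NCBI's E-utilities ESearch service. It is an illustration, not part of the original announcement: the helper name `assembly_search_url`, the choice of the `[Organism]` field, and the `retmax` default are assumptions made for this example. The sketch only constructs the URL and makes no network request.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def assembly_search_url(organism: str, retmax: int = 20) -> str:
    """Build an ESearch URL that looks up an organism in the Assembly database.

    Hypothetical helper for illustration; it does not fetch anything.
    """
    params = {
        "db": "assembly",                 # search the Assembly database
        "term": f"{organism}[Organism]",  # restrict the term to the organism field
        "retmode": "json",                # ask for a JSON response
        "retmax": retmax,                 # cap the number of returned IDs
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Bald eagle, by its scientific name.
    print(assembly_search_url("Haliaeetus leucocephalus"))
```

Opening the printed URL in a browser, or passing it to an HTTP client, should return the matching Assembly record IDs, which can then be used with the other E-utilities.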

With the exception of three very fragmented assemblies, NCBI annotated the genome assemblies submitted by the Avian Phylogenomics Consortium using NCBI’s Eukaryotic Genome Annotation Pipeline, and these annotations are now part of the RefSeq project. The RefSeq project also generated annotations for an additional 6 bird assemblies, for a total of 51 RefSeq genomes. A summary of all the bird genomes that have RefSeq annotation is here.

Figure 1. A selection of the bird genomes with RefSeq annotation. At the top right is a legend describing resource links for each bird genome. Detailed annotation reports, accessible through the "AR" link in the far right column, are available for those genomes annotated in 2014. RefSeq annotation is on organism-specific BLAST pages (the "B" link) and on FTP (the "F" link). Click on the picture to go to the summary table.

RNA-Seq data were used to generate annotations for 12 of the 51 bird assemblies. The number of protein-coding genes per genome ranges from more than 13,300 to more than 21,100 (chicken), with an average of 14,932. Orthology to human proteins was also calculated using simple metrics of local synteny and sequence similarity; on average, roughly 11,000 orthologous proteins were identified per avian genome. These results are shown in the Homology section of NCBI Gene records (see Figure 2 below).

Figure 2. A portion of the NCBI Gene report for the bald eagle ACO2 gene. The graphical display includes information about the gene structure, the RefSeq transcript and protein models, and RNA-Seq coverage graphs produced by the annotation pipeline. The Homology section is highlighted, showing 139 organisms, including the bald eagle, with orthology to the human ACO2 gene.

Designing exon-specific primers for the human genome


A common task facing geneticists is to assay for sequence changes at particular locations in genes. These assays often look for changes in the coding exons of genes, and the target sequences are typically amplified from genomic DNA by PCR with a pair of specific primers. In this article, we will show you how to use NCBI Reference Sequences and Primer-BLAST, NCBI’s primer designer and specificity checker, to design a pair of primers that will amplify a single exon (exon 15) of the human breast cancer 1 (BRCA1) gene.

Here are the steps to follow to design primers to amplify exon 15 from human BRCA1. Continue reading

Sequence updates in human assembly GRCh38: improving gene annotation


In an earlier blog post, we discussed how sequence updates in GRCh38, the most recent version of the human reference genome, filled in a gap in human chromosome 17 near position 21,300K and expanded the region by 500K (500,000 base pairs). In this post, we revisit the same region, this time focusing on how GRCh38 also improved the gene annotation.

Figure 1. Annotation of a region of chromosome 17 near the KCNJ12 and KCNJ18 genes. Top panel: Annotation release 105 on GRCh37.p13 represented by a configured graphic display of sequence record NC_000017.10. Bottom panel: Annotation release 106 on assembly GRCh38 represented by a configured graphic display of sequence record NC_000017.11. New gene models are circled. 

Figure 1 shows a narrower area that corresponds to components AC068418.5 and AC233702.5 on GRCh38. The graphic display is configured so that it shows annotated gene models without the corresponding transcripts and proteins. The two assemblies share component AC068418.5 along with the five gene models annotated on it.  That the same sequence would have the same annotation over time might seem an obvious outcome, but this is not always the case. Annotations on the same sequence (same assembly) can change from one annotation release to another if new transcript data support a new gene model, and this process of gathering and presenting new evidence for gene models is one of the major purposes of new annotation releases on a given assembly. Continue reading

NCBI’s 3 Newest Medical Genetics Resources: GTR, MedGen & ClinVar


NCBI has three relatively new online resources for information about genetic tests, genetic conditions, and genetic variations:

  • The Genetic Testing Registry, or GTR – a registry of genetic tests for heritable and somatic changes in humans
  • MedGen – a medical genetics portal that focuses on information about medical conditions with a genetic component
  • ClinVar – an archival database that contains reported assertions about the relationship between genetic variations and phenotypes

This blog post gives a very brief overview of the three resources by outlining some of their content features. For a more thorough introduction, including the types of information available in each resource and how to use them, we recommend viewing this roughly hour-long webinar that we conducted in June 2014.

The GTR, MedGen and ClinVar databases are all integrated, making it simple to navigate between them to find related information. They are also integrated with a number of other databases, such as OMIM, GeneReviews, PubMed, Genetics Home Reference, and others.  This integration provides a rich information space for exploration, but it is nonetheless helpful to know where you might want to start based on the type of information you are seeking. Continue reading

Advice for NIH Grantees: How to comply with the NIH Public Access Policy


“The NIH public access policy requires scientists to submit final peer-reviewed journal manuscripts that arise from NIH funds to PubMed Central immediately upon acceptance for publication.” – http://publicaccess.nih.gov/

To comply with the NIH Public Access Policy, here are the steps you should take:

Determine if the Public Access Policy applies to your publication

Generally, the NIH Public Access Policy applies to any peer-reviewed journal article that was accepted for publication on or after April 7, 2008 and that arose from NIH funding in Fiscal Year 2008 or later.
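The general rule above can be written as a short check. This is only a sketch of the three conditions stated in this post (peer-reviewed, accepted on or after April 7, 2008, and arising from NIH funding in Fiscal Year 2008 or later). The function name `policy_applies` and its parameters are hypothetical, and the sketch does not replace the official applicability guidance, which covers cases not modeled here.

```python
from datetime import date
from typing import Optional

# Cutoffs taken from the general rule stated in this post.
ACCEPTANCE_CUTOFF = date(2008, 4, 7)
FIRST_COVERED_FISCAL_YEAR = 2008


def policy_applies(
    peer_reviewed: bool,
    accepted_on: date,
    nih_funding_fiscal_year: Optional[int],
) -> bool:
    """Return True if the general rule says the policy applies.

    Hypothetical helper: nih_funding_fiscal_year is the fiscal year of the
    NIH funding behind the article, or None if there was no NIH funding.
    """
    if not peer_reviewed:
        return False
    if nih_funding_fiscal_year is None:
        return False
    return (
        accepted_on >= ACCEPTANCE_CUTOFF
        and nih_funding_fiscal_year >= FIRST_COVERED_FISCAL_YEAR
    )


if __name__ == "__main__":
    # An article accepted on the cutoff date with FY2008 funding is covered.
    print(policy_applies(True, date(2008, 4, 7), 2008))
    # An article accepted the day before the cutoff is not.
    print(policy_applies(True, date(2008, 4, 6), 2008))
```

The word "generally" in the rule matters: use a check like this as a first pass only, then confirm against the applicability guidance linked below.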

Determine Applicability for Your Publication

What does the NIH consider to be a ‘journal’?

Review your publication agreement

Before you sign a publication agreement or similar copyright transfer agreement, first make sure that the agreement allows the paper to be posted to PubMed Central (PMC) in accordance with the NIH Public Access Policy.

Continue reading

New SciENcv Features Allow Users To Create and Download Multiple Biosketches


NCBI’s recent update to the SciENcv feature in MyNCBI gives researchers the ability to create multiple biosketches for grants from federal agencies engaged in scientific research, allowing a more tailored and convenient approach to the grant application process.

What is SciENcv?

SciENcv (Science Experts Network Curriculum Vitae) is designed to help researchers assemble an NIH biosketch by extracting information from NIH eRA Commons and PubMed. The SciENcv interagency working group includes NIH, as well as DOD, DOE, EPA, NSF, USDA and the Smithsonian. You can access SciENcv if you have a My NCBI account. My NCBI accounts are free and offer many useful features, such as saving searches, automated e-mail alerts and My Bibliography.

Create your biosketch

Based on user suggestions, we’ve made it possible to create biosketches in three ways: from scratch, from an external source, or by duplicating an existing profile (see Figure 1). While the eRA Commons data feed is currently the only external data option, we plan on adding other external data sources in a future release of SciENcv.

Figure 1. Three ways to create your NIH biosketches in SciENcv

Continue reading

The Second Offering of “A Librarian’s Guide to NCBI” at NIH


NCBI, in collaboration with NLM and the National Network of Libraries of Medicine NLM Training Center (NTC) at the University of Utah, recently presented the second offering of A Librarian’s Guide to NCBI. Health sciences librarians from 17 universities and two federal agencies attended the five-day intensive course on the NIH campus. This second offering continues to prepare health sciences librarians to support NCBI molecular databases and tools and to train patrons at their own institutions in the use of NCBI resources.

Participants and instructors from the 2014 “A Librarian’s Guide to NCBI” outside the National Library of Medicine.

As before, all the course materials are available online. Feel free to learn from them, adapt them for your own teaching, and share them with others. You can use the links below to access the updated 2014 course materials. These include the slide sets with demonstrations and practice problems.

Continue reading

Sequence updates in human genome assembly GRCh38: filling in the gaps


In a previous blog post, we explained several important concepts about the human reference genome.  We presented a region of human chromosome 17 as an example of a location where the genome sequence was not fully assembled.  In this post, we are going to revisit the same gapped region to see how the Genome Reference Consortium (GRC) changed this part of the genome in GRCh38, the updated human reference assembly released in December 2013.  This region represents just one of the more than 1,000 changes and improvements that the GRC introduced in GRCh38.

Continue reading

The Tasmanian Devil 2: The tumor and Tasmanian devil mitochondrial genomes


The Tasmanian devil (Sarcophilus harrisii), the last remaining large marsupial carnivore, now faces extinction because of a strange and deadly infection, a transmissible cancer known as Transmissible Devil Facial Tumor Disease (TDFTD). In a previous NCBI Insights post, we discussed gene expression data from the tumors that established their neural origin and showed that the tumors were likely derived from Schwann cells. In this post, we’ll consider some of the genome sequencing projects in the NCBI databases and explore evidence that the tumor originated in a different individual than the affected animal, supporting the idea that the tumor cells themselves are infectious agents. Continue reading

NCBI’s Genome Remapping Service assists in the transition to the new human genome reference assembly (GRCh38)


In late December 2013, the Genome Reference Consortium (GRC) released an updated version of the human reference genome assembly, GRCh38, and submitted these new sequences to GenBank. This is the first time in four years that a new major version of the human genome has become available to the genomics community.

Perhaps you’ve been working on data mapped to the previous assembly, GRCh37, which became available in March 2009, or maybe you are still using an even earlier version, such as NCBI36 from March 2006. Is there a way to reduce the time and effort required to reanalyze your data in the context of the new assembly?

Yes! It’s NCBI’s Genome Remapping Service, or NCBI Remap for short.

Continue reading