Tag: CGR

Try out the latest BLAST ClusteredNR database results. Now with in-cluster analyses!

As we previously announced, we are offering a ClusteredNR protein database on the web BLAST service that provides faster searches, greater taxonomic reach, and easier-to-interpret results than the traditional nr database. We’ve added some new features to the results that make ClusteredNR even more useful by allowing analyses within each cluster, including the ability to:

    • Align the query to the members of the cluster.
    • Display Tree View and MSA View of the cluster alignment.
    • Submit the cluster to COBALT to generate a true multiple sequence alignment of the members.
    • Display a BLAST Taxonomy Report to see the taxonomic distribution of the sources of the members.

Figure 1 shows you how to access these in-cluster analysis options. The new Cluster Taxonomy report is shown in Figure 2. Try ClusteredNR yourself: follow this link to set up a search!

Continue reading “Try out the latest BLAST ClusteredNR database results. Now with in-cluster analyses!”

Fungal species identification using DNA: an NCBI and USDA-APHIS collaboration with a focus on Colletotrichum

As reported in the journal Plant Disease, a recent collaboration between the National Library of Medicine’s NCBI and the U.S. Department of Agriculture’s Animal and Plant Health Inspection Service (APHIS) analyzed public sequence records for the fungal genus Colletotrichum, an important group of fungal plant pathogens that are a significant threat to food production. Colletotrichum species are challenging to identify accurately, and public sequences may contain out-of-date taxonomic information. The study improved the accuracy of species names assigned to Colletotrichum database sequences, verified a comprehensive set of reliable reference markers for the genus, and produced a multi-marker tree as well as the genome-based interactive tree shown in Figure 1.

Figure 1. Views from the genome-assembly-derived multi-protein distance tree that shows the analysis of publicly available Colletotrichum genomes. The interactive tree is available online. You can browse, search, download, and export the tree. As an example search, you can demonstrate that assembly GCA_002901105.1 was incorrectly labeled as Colletotrichum gloeosporioides. Searching the tree for the name “Colletotrichum gloeosporioides” highlights two clades. Clicking the node for the Truncatum species complex and clicking “Show descendants” expands the clade and shows that assembly GCA_002901105.1, which was labeled as gloeosporioides, clusters with the Truncatum species complex. You can find more details on the tree building process in the supplementary material for the publication and on GitHub.

Continue reading “Fungal species identification using DNA: an NCBI and USDA-APHIS collaboration with a focus on Colletotrichum”

Save the Date: NCBI at the Bioinformatics Open Science Conference (BOSC), July 2022

Come visit NCBI at the Bioinformatics Open Science Conference (BOSC), part of the Intelligent Systems for Molecular Biology Conference (ISMB), July 13-16, taking place both in person in Madison, Wisconsin and virtually! We’ll be presenting talks and posters on the latest updates to the NCBI Datasets, BLAST, and Protein resources. You can also join us at the Birds of a Feather (BoF) discussion and the BOSC CollaborationFest (CoFest) to explore these resources and discuss workflows with NCBI staff.

Continue reading “Save the Date: NCBI at the Bioinformatics Open Science Conference (BOSC), July 2022”

NCBI Posters at the Biology of Genomes Meeting

May 10-14, 2022

We are looking forward to the Biology of Genomes meeting, which will focus on “DNA sequence variation and its role in molecular evolution, population genetics and complex diseases, comparative genomics, large-scale studies of gene and protein expression, and genomic approaches to ecological systems.”

NCBI will present three posters to highlight our Comparative Genomics Resource (CGR) and the Allele Frequency Aggregator (ALFA):

  1. The NIH Comparative Genomics Resource: Amplifying the biology of genomes, presented by Valerie Schneider, PhD

On behalf of NIH, NLM is developing the NIH Comparative Genomics Resource (CGR) at NCBI to facilitate organism-spanning data connections and promote new research discoveries. This initiative aims to connect NCBI genomics-associated data types and tools with resources external to NCBI to provide a foundation for reliable comparative analysis for all eukaryotic research organisms.

Continue reading “NCBI Posters at the Biology of Genomes Meeting”

New ClusteredNR database: faster searches and more informative BLAST results

Reduced redundancy. Faster searches. More diverse proteins and organisms in your BLAST results. Check out our new ClusteredNR database – derived from the default BLAST protein nr database by clustering sequences at 90% identity / 90% length (details below).  Get quicker results and access to information about the distribution of your hits across a wider range of organisms and evolutionary distances.
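To make the "90% identity / 90% length" criterion concrete, here is a minimal sketch of a pairwise threshold check. The exact definitions NCBI uses are not given in this excerpt, so the formulas below are assumptions for illustration only: identity is taken as identical positions divided by alignment length, and the length criterion as the alignment covering at least 90% of the longer sequence. The function name `meets_cluster_threshold` and its parameters are hypothetical.

```python
def meets_cluster_threshold(identical, aligned_len, len_a, len_b,
                            min_identity=0.9, min_length=0.9):
    """Return True if a pairwise alignment satisfies both thresholds.

    Assumed definitions (not taken from NCBI documentation):
      identity = identical positions / alignment length
      coverage = alignment length / length of the longer sequence
    """
    identity = identical / aligned_len
    coverage = aligned_len / max(len_a, len_b)
    return identity >= min_identity and coverage >= min_length


# 95 identical positions over a 100-residue alignment of 105- and 110-residue proteins
print(meets_cluster_threshold(95, 100, 105, 110))
# Same alignment, but the longer protein has 120 residues, so coverage falls below 90%
print(meets_cluster_threshold(95, 100, 100, 120))
```

Under these assumed definitions, the first pair would be eligible to share a cluster and the second would not, because the alignment spans too little of the longer sequence.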

Searching ClusteredNR

You can choose the ClusteredNR database in the Choose Search Set section of the BLAST submission form where you normally pick the BLAST database. Simply select the Experimental databases radio button. You can also select the checkbox to search both ClusteredNR and the standard nr at the same time, allowing you to compare results (Figure 1).

Figure 1. The ‘Choose Search Set’ section of the BLAST submission form. Selecting the Experimental databases radio button chooses ClusteredNR. You can also perform simultaneous searches against the clustered and the standard nr by checking ‘Select to compare standard and experimental database.’

Continue reading “New ClusteredNR database: faster searches and more informative BLAST results”

New Gene Information from the Alliance of Genome Resources

NCBI Gene now has descriptive information about genes from the Alliance of Genome Resources for organisms including Caenorhabditis elegans, Danio rerio, Drosophila melanogaster, Homo sapiens, Mus musculus, Rattus norvegicus, and Saccharomyces cerevisiae.

Figure 1. The gene summary section of the Drosophila melanogaster slmb Gene Full Report showing the link to the corresponding record at the Alliance of Genome Resources.

The Summary section of the Gene Full Report page has links to gene pages at the Alliance of Genome Resources (Figure 1). These links also appear in the Links to other resources section in the right-hand sidebar. For genes that don’t have a RefSeq summary, we use the textual gene descriptions from the Alliance of Genome Resources.

The Drosophila slmb gene record shows the enhancements provided by the Alliance of Genome Resources. The gene_info.gz files on the Gene FTP site also include AllianceGenome references in the dbXrefs column.
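As a rough illustration of how the AllianceGenome references might be pulled out of a gene_info file, the sketch below reads tab-delimited rows and extracts entries from the dbXrefs column. It assumes the file has a header line naming the GeneID, Symbol, and dbXrefs columns, and that cross-references are pipe-separated `database:identifier` pairs; the function name `alliance_ids` and the sample rows (including the identifiers) are made up for the example.

```python
import csv
import io


def alliance_ids(lines):
    """Yield (GeneID, Symbol, Alliance identifier) for rows that carry
    an AllianceGenome entry in the dbXrefs column."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    header[0] = header[0].lstrip("#")  # header line is assumed to start with '#'
    col = {name: i for i, name in enumerate(header)}
    for row in reader:
        for xref in row[col["dbXrefs"]].split("|"):
            # Split on the first colon only: the identifier itself may contain colons.
            db, _, value = xref.partition(":")
            if db == "AllianceGenome":
                yield row[col["GeneID"]], row[col["Symbol"]], value


# Made-up rows in the assumed layout; real files contain more columns.
sample = (
    "#tax_id\tGeneID\tSymbol\tdbXrefs\n"
    "7227\t1\tgeneA\tFLYBASE:FBgn0000001|AllianceGenome:FB:FBgn0000001\n"
    "7227\t2\tgeneB\t-\n"
)
print(list(alliance_ids(io.StringIO(sample))))
# prints [('1', 'geneA', 'FB:FBgn0000001')]
```

For a downloaded file, the same function could be fed a handle from `gzip.open(path, "rt")`, where `path` points to a local copy of a gene_info.gz file.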

Join NCBI at PAG XXIX

Introducing the NIH Comparative Genomics Resource (CGR)

NCBI is looking forward to seeing you in person at the International Plant and Animal Genome Conference (PAG XXIX), January 8-12, 2022 in San Diego, California. We’re especially excited to introduce our newest endeavor, the NLM initiative known as the NIH Comparative Genomics Resource (CGR), a platform we are developing to support comparative analyses of sequenced eukaryotic research organisms. Understanding and supporting the needs of researchers is a fundamental element in the development of CGR and is critical to its future success in supporting a large and diverse collection.

Please join NCBI for the following events to learn more about CGR and how you can inform its development:

Continue reading “Join NCBI at PAG XXIX”

Save the Date: NCBI at Plant and Animal Genome (PAGXXIX), Jan 2022

Come see NCBI in person at the International Plant and Animal Genome (PAG) Conference (PAGXXIX), January 9-12 in San Diego, California. Learn about new ways that we are supporting the data management and analysis needs of scientists working across the tree of life. We’re excited to be back after a year of unprecedented circumstances!

As we described in our NLM Director’s featured blog articles, A Journey to Spur Innovation and Discovery and Using Comparative Genomics to Advance Scientific Discoveries, NCBI has recently embarked on the NIH-supported NLM initiative known as the NIH Comparative Genomics Resource (CGR). This initiative will modernize resources and infrastructure in order to promote comparative genomic analyses for all eukaryotic organisms. CGR will connect common data elements for genomic-related content with standard structures and mechanisms that will help you uncover previously unrecognized relationships. It will also provide tools that promote the quality of genomic-related data in sequence archives.

When you are at PAG, please check out our NCBI workshops and other sessions, swing by our booth, and visit our posters to learn more about ongoing CGR-related developments and additional NCBI resources related to your genomic research. We especially invite you to join our CGR Listening Session where you can offer valuable input on how NCBI can best provide a resource to support your analyses.

As PAG nears, stay tuned for more details and upcoming announcements from NCBI!

NCBI’s Genome Data Viewer now displays both NCBI RefSeq and submitted assemblies

NCBI’s Genome Data Viewer (GDV) now supports visualization and analysis of nearly 400 submitter-annotated chromosome-level assemblies from the INSDC (GenBank/ENA/DDBJ). These submitter-annotated assemblies join more than 1,200 NCBI RefSeq-annotated assemblies available in GDV for hundreds of eukaryotes, spanning fungi, plants, fish, insects, and all major model organisms.

Figure 1 shows a GenBank apple assembly (GCA_004115385) displayed in GDV.

Figure 1. Submitter-annotated Malus domestica (apple) assembly displayed in GDV. GDV provides submitter-provided gene annotation, as well as some additional tracks including interspersed repeats identified by RepeatMasker and six-frame translations (not shown). Red boxes indicate useful tools and panels including a search box, an exon navigator, and interfaces to add user data and conduct NCBI BLAST searches. 

Continue reading “NCBI’s Genome Data Viewer now displays both NCBI RefSeq and submitted assemblies”

A new service to evaluate the quality of your assembled genome!

Are you wondering about the quality of a human, mouse or rat genome that you have assembled?

We offer a new service for evaluating the completeness, correctness, and base accuracy of your human, mouse or rat genome assembly compared to a reference assembly. You simply provide NCBI with one or more assemblies in FASTA format and we will do an annotation-based evaluation of the genome(s) using the expert-curated, high-confidence RefSeq transcripts for the species.

Continue reading “A new service to evaluate the quality of your assembled genome!”