Using the NIH Comparative Genomics Resource (CGR) to gain knowledge about less-researched organisms
The scientific community relies heavily on model organism research to gain knowledge and make discoveries. However, focusing solely on these species misses valuable variation. Comparative genomics allows us to use knowledge from a model species, such as Saccharomyces cerevisiae, to understand traits in other, related organisms, such as Saccharomyces pastorianus or Saccharomyces eubayanus. Applying this information may provide valuable insight for other less-researched organisms. The National Institutes of Health (NIH) Comparative Genomics Resource (CGR) offers a cutting-edge NCBI toolkit of high-quality genomics data and tools to help you do just that. Continue reading “Comparing Yeast Species Used in Beer Brewing and Bread Making”
November 1-5 in Washington, D.C.
We look forward to seeing you in person at the American Society of Human Genetics Annual Meeting (ASHG 2023), November 1-5, 2023, in Washington, D.C. We will participate in a variety of activities and events including hosting an exhibit booth where you can stop by to meet NCBI experts, ask questions, provide feedback, or just chat! We’re especially excited to share our recent efforts on our clinical and human genetic resources and provide an update on the NIH Comparative Genomics Resource (CGR).
Check out NCBI’s schedule of activities and events:
Continue reading “Join NCBI at ASHG 2023”
Now available! You can download the ClusteredNR protein database, previously only available on the BLAST web application. As recently introduced, our ClusteredNR database allows you to get quicker BLAST results and access to information about the distribution of your hits across a wider range of organisms and evolutionary distances. The package includes the ClusteredNR BLAST database, an SQLite3 database, and several scripts for accessing cluster information and members.
Features & Benefits
- Reduced redundancy
- Faster searches
- More diverse proteins and organisms in your BLAST results
Continue reading “BLAST ClusteredNR Database is Now Available for Download!”
Variant Call Format (VCF) files provide a crucial way to record and share information about genetic variants across samples. NCBI joined forces with the National Institute of Allergy and Infectious Diseases (NIAID) to co-host the VCF Files for Population Genomics Codeathon (July 31 – August 4). The codeathon focused on innovative methods for harnessing VCF files to analyze large datasets using the COVID-19 Genome Sequence Dataset, sourced from the National Library of Medicine (NLM) and NCBI’s SARS-CoV-2 Variant Calling Pipeline. This virtual event was a booming success and brought together experts in viral evolution, molecular epidemiology, and population genomics.
We received outstanding participation and engagement!
- 62 participants from academia, government, and industry across the world
- 8 teams collaborated and worked on the projects listed below
- 5,000+ views of final presentations
- 100+ strong applicants
- 21 different countries represented
Continue reading “Successful NCBI-NIAID Codeathon Explored VCF Files in Population Genomics”
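For readers new to the format, here is a minimal sketch of how the fixed columns of a VCF data line can be read and tallied in Python. It assumes plain, uncompressed VCF text with the eight standard tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The function names and the example records are illustrative only and are not taken from the codeathon dataset or projects.

```python
from collections import Counter

# The eight fixed, tab-separated columns of a VCF data line.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line):
    """Parse one VCF data line into a dict (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])        # positions are 1-based integers
    record["ALT"] = record["ALT"].split(",")  # ALT may list several alleles
    return record


def count_variants(lines):
    """Count how often each (CHROM, POS, REF, ALT) variant appears."""
    counts = Counter()
    for line in lines:
        # Skip meta-information (##...), the header (#CHROM...), and blanks.
        if line.startswith("#") or not line.strip():
            continue
        rec = parse_vcf_line(line)
        for alt in rec["ALT"]:
            counts[(rec["CHROM"], rec["POS"], rec["REF"], alt)] += 1
    return counts


# Illustrative records only (assumed for this example).
example_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "NC_045512.2\t23403\t.\tA\tG\t.\tPASS\t.",
    "NC_045512.2\t23403\t.\tA\tG\t.\tPASS\t.",
    "NC_045512.2\t241\t.\tC\tT\t.\tPASS\t.",
]
variant_counts = count_variants(example_lines)
```

For large or compressed files, a dedicated VCF library would likely be a better fit than line-by-line string splitting.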
Do you currently use or submit clinical variation data? NCBI now has a new mechanism to improve ClinVar data quality. Since ClinVar’s founding over 10 years ago, the amount of information in this free resource has expanded dramatically with submissions from research and clinical laboratories all over the world. Because of the large volume of data and the importance of data quality, we are working with ClinGen biocurators to address problematic records for variants that do not require the efforts of an expert panel.
ClinVar and ClinGen have established a new process for ClinGen biocurators to review submitted records in ClinVar. A problematic record will be curated by ClinGen as a candidate to be flagged in ClinVar. We will notify relevant submitters giving them an opportunity to review and update their data. If the submitter does not provide an update, the problematic record will be flagged in ClinVar, so that it does not contribute to the overall classification. The record, however, will remain accessible in the database (Figure 1). This will reduce the number of variants with a conflict in the classification and improve the accuracy of the ClinVar dataset. Continue reading “ClinVar Partners with ClinGen to Review & Curate Submitted Records”
Recognizing Fungal Disease Awareness Week
Fungal pathogens are a growing threat to global public health. To promote awareness of this issue, the Centers for Disease Control and Prevention (CDC) has established September 18-22 as Fungal Disease Awareness Week.
In honor of this week, we’re highlighting whole genome alignments for fungal pathogens that are now available in the Comparative Genome Viewer (CGV) – NCBI’s latest genome visualization tool. Alignment displays in CGV help you identify rearrangements and differences in genomic structure such as deletions, inversions, and translocations. These differences can be important for understanding genome plasticity, genetic diversity within species (PMC8640552) and the response to environmental stresses such as exposure to anti-fungal drugs (PMC5555451). Continue reading “New Fungal Alignments Available in the Comparative Genome Viewer (CGV)”
NCBI is excited to introduce Pebblescout, a pilot web service that allows you to search for sequence matches in very large nucleotide databases, such as runs in the NIH Sequence Read Archive (SRA) and assemblies for whole genome shotgun sequencing projects in GenBank – faster and more efficiently!
Pebblescout uses short segments of your query sequences to identify database records with matches. Matches are based on the frequency of a segment’s occurrence in a database. The result produced for each query is a ranked list of matching records, where the ranking uses the informativeness of the matching segments. Continue reading “Introducing Pebblescout: Index and Search Petabyte-Scale Sequence Resources Faster than Ever”
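To make the general idea concrete, here is a toy Python sketch of segment-based matching with frequency-weighted ranking. It is not Pebblescout's implementation. The fixed segment length `k`, the in-memory index, and the log-based weight are all assumptions chosen for illustration, with rarer segments treated as more informative.

```python
import math
from collections import Counter, defaultdict


def segments(seq, k):
    """Return the set of distinct length-k segments in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(records, k):
    """Map each segment to the set of record IDs that contain it."""
    index = defaultdict(set)
    for rec_id, seq in records.items():
        for seg in segments(seq, k):
            index[seg].add(rec_id)
    return index


def rank_matches(query, index, n_records, k):
    """Rank records by the summed weight of segments shared with the query.

    The weight is an assumed IDF-style score: a segment found in few
    records contributes more than one found in many records.
    """
    scores = Counter()
    for seg in segments(query, k):
        hits = index.get(seg, ())
        if not hits:
            continue
        weight = math.log(1 + n_records / len(hits))
        for rec_id in hits:
            scores[rec_id] += weight
    # Highest score first; ties broken by record ID for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy database with hypothetical record IDs and sequences.
toy_records = {
    "rec1": "ACGTACGTAA",
    "rec2": "ACGTTTTTTT",
    "rec3": "GGGGGGGGGG",
}
toy_index = build_index(toy_records, k=4)
ranked = rank_matches("ACGTACG", toy_index, n_records=len(toy_records), k=4)
```

In this toy case, `rec1` shares all four query segments and ranks first, `rec2` shares only the common segment `ACGT` and ranks second, and `rec3` shares none and is omitted from the list.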
Do you rely on ClinVar XML files for your application or analytical pipeline? We are significantly updating ClinVar’s XML format to support the inclusion of new somatic variation data provided by submitters. In the coming months, you may need to make changes to your tool or pipeline code to continue to use the ClinVar XML.
What will change?
Continue reading “Coming Soon to ClinVar! Somatic Variants & Changes to the XML”
RefSeq release 220 is now available online and from the FTP site. You can access RefSeq data through NCBI Datasets.
What’s included in this release?
As of September 5, 2023, this full release incorporates genomic, transcript, and protein data containing:
- 391,350,361 records
- 289,333,423 proteins
- 56,423,426 RNAs
- sequences from 141,099 organisms
Continue reading “RefSeq Release 220”
As of March 2024, NCBI’s Genome Workbench, a desktop software suite of tools for visualizing and analyzing molecular sequence data, will no longer be available for download. Due to low usage of this product, we are focusing our effort on newer and more popular resources and tools.
If you have an existing version of Genome Workbench, you can continue to use it, but we will no longer provide customer support, software updates, or tutorial documentation. We will make no additional public releases or updates after March 2024. Continue reading “NCBI’s Genome Workbench to Retire March 2024”