We are gearing up for the Tools for Sharable Protein Analysis Codeathon, which will take place Sunday, July 10th through Thursday, July 14th at the International Society for Computational Biology ISMB 2022 Conference. This event is the third installment of the ISMB codeathon. In previous years, teams led a series of developments on in-depth and systematic analysis of molecular interactions, effects of mutations, protein flexibility, annotations of topological domains, and more! This year, projects will span the following themes:
ANALYSIS OF LARGE DATASETS
ANALYSIS OF BIOMOLECULAR INTERACTIONS AND MUTATIONS
BIOMOLECULAR REPRESENTATIONS AND VISUALIZATION
USER INTERFACES AND DATA SHARING MECHANISMS
The event will focus in part on our interactive, web-based 3D structure viewer, iCn3D. NCBI experts Jiyao Wang, Ph.D., Tom Madej, Ph.D., and Alexa Salsbury, Ph.D. will be on hand to help teams throughout the event.
This hybrid codeathon will support teams in person and virtually. We encourage researchers working in all areas of computational biology to join us for the event! To apply for the event, please fill out this form before July 9th!
As reported in the journal Plant Disease, a recent collaboration between the National Library of Medicine’s NCBI and the U.S. Department of Agriculture’s Animal and Plant Health Inspection Service (APHIS) analyzed public sequence records for Colletotrichum, an important genus of fungal plant pathogens that pose a significant threat to food production. Colletotrichum species are challenging to identify accurately, and public sequences may contain out-of-date taxonomic information. The study improved the accuracy of species names assigned to Colletotrichum database sequences, verified a comprehensive set of reliable reference markers for the genus, and produced a multi-marker tree as well as the genome-based interactive tree shown in Figure 1.
Figure 1. Views from the genome-assembly-derived multi-protein distance tree that shows the analysis of publicly available Colletotrichum genomes. The interactive tree is available online. You can browse, search, download, and export the tree. As an example search, you can demonstrate that assembly GCA_002901105.1 was incorrectly labeled as Colletotrichum gloeosporioides. Searching the tree for the name “Colletotrichum gloeosporioides” highlights two clades. Clicking the node for the Truncatum species complex and then clicking “Show descendants” expands the clade and shows that assembly GCA_002901105.1, which was labeled as gloeosporioides, clusters with the Truncatum species complex. You can find more details on the tree-building process in the supplementary material for the publication and on GitHub.
NCBI had the pleasure of attending and participating in this year’s American Society for Microbiology (ASM) Microbe conference, June 9-13 in Washington, D.C. NCBI staff participated in activities and events throughout the conference. Over 4,500 attendees gathered in the exhibit hall and joined a variety of poster presentations and talks!
Reflections from a few of our NCBI experts
“It was a great honor for me to receive the ASM Elizabeth O. King Lecturer Award. Thank you to my colleagues, without whom so much of my work would not have been possible, and to all of those who attended my presentation on Making Genomics Accessible to Aid Public Health and Research.”
As we previously announced, we will be moving to an updated version of the E-utilities API for PubMed. In preparation for this launch, a test server is currently available to allow you to test your API calls on the new service and report issues. Thank you for trying out the test server and continuing to submit your feedback!
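For readers who want to replay an existing call against the test server, here is a minimal sketch of how a PubMed ESearch request URL can be assembled so that only the base URL changes between servers. The production base URL and the `esearch.fcgi` parameters (`db`, `term`, `retmax`, `retmode`) are standard E-utilities usage; the test server address is not given in this post, so the `base` argument is left for you to fill in, and the helper name `build_esearch_url` is purely illustrative.

```python
from urllib.parse import urlencode

# Production E-utilities base URL. The test server's address is not stated in
# this post; pass it as `base` once you have it from the announcement.
PRODUCTION_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, base=PRODUCTION_BASE, retmax=20):
    """Return an ESearch URL for a PubMed query against the given base URL."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    # urlencode percent-encodes characters such as the brackets in field tags.
    return f"{base}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Same query, ready to be compared across servers by swapping `base`.
    print(build_esearch_url("Colletotrichum[Title]"))
```

Building the URL separately from sending it makes it easy to run the same query against both servers and compare the responses before reporting an issue.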
An updated bacterial and archaeal representative genomes collection is available! A total of 16,105 assemblies among the 249,000 prokaryotic assemblies in RefSeq were selected to represent their respective species. The collection has grown by 3.7% since January 2022. A total of 706 species are represented for the first time. In addition, 186 species are represented by a better assembly, and 124 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment.
Join us on June 15, 2022, at 12 PM US Eastern time to learn about the NCBI Virus resource, a community portal for viral sequence data that has been important in supporting SARS-CoV-2 research and management of the COVID-19 pandemic. Enhancements to NCBI Virus that support these efforts include SARS-CoV-2-specific filters, a dedicated web interface that reports on the geotemporal prevalence of sequence records for SARS-CoV-2 lineages, and details on NCBI’s lineage-defining mutations.
Date and time: Wed, June 15, 2022 12:00 PM – 12:45 PM EDT
The American Society for Microbiology (ASM) Microbe conference is back and is scheduled to take place in person, June 9th-13th in Washington, D.C.
NCBI staff member Dr. Michael Feldgarden will be recognized by ASM with an award for his research. Other NCBI staff will present posters on NCBI resources and will also be available at our booth (#1128) to address your questions. Drop by to see what’s new and provide your feedback. We hope to see you there! Check out NCBI’s schedule of activities: Continue reading “Come see NCBI at the ASM Microbe Conference 2022”→
Validating genome assemblies submitted to GenBank using an ANI-based workflow
Average Nucleotide Identity (ANI) analysis is a useful tool to verify taxonomic identities in prokaryotic genomes. As part of the NCBI bacterial genome submission process, GenBank performs ANI analyses to compare submitted prokaryotic genome assemblies against reference data generated from type strains. You can learn more about the relevant workflow and about type strain curation in our publications (PMC6978984 and PMC4383940).
We use genomes obtained from type strains (type assemblies) in computational comparisons, for example using ANI to reclassify or modify existing taxonomy with reasonable confidence. The taxonomy check status for all 1.3 million bacterial genome assemblies is summarized in the ANI_report_prokaryotes.txt file available from the ASSEMBLY_REPORTS FTP directory. The README file describes the contents of the report in detail. You can run ANI on your genome on its own or in the context of annotation. Find more information here. Continue reading “Average Nucleotide Identity (ANI) for assembly validation”→
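As a rough illustration of working with the ANI report mentioned above, the sketch below tallies the values in one column of a tab-delimited file. It assumes the first line is a header (optionally prefixed with `#`) and that a column named `taxonomy-check-status` exists; both are assumptions to confirm against the README. The sample rows, accessions, and status values are made up for the example and are not taken from the real report.

```python
import csv
import io
from collections import Counter

# Made-up sample in the assumed layout: a '#'-prefixed, tab-delimited header
# followed by one row per assembly. Accessions and values are placeholders.
SAMPLE = (
    "# genbank-accession\tspecies-name\ttaxonomy-check-status\n"
    "GCA_000000001.1\tExample species A\tOK\n"
    "GCA_000000002.1\tExample species B\tInconclusive\n"
    "GCA_000000003.1\tExample species A\tOK\n"
)


def tally_column(report_lines, column="taxonomy-check-status"):
    """Count the values found in one named column of a tab-delimited report."""
    reader = csv.reader(report_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Strip the leading '# ' that may precede the first header name.
    header = [name.lstrip("# ").strip() for name in next(reader)]
    idx = header.index(column)  # raises ValueError if the column is absent
    return Counter(row[idx] for row in reader if len(row) > idx)


if __name__ == "__main__":
    for status, count in tally_column(io.StringIO(SAMPLE)).most_common():
        print(f"{status}\t{count}")
```

To run this on the real report, download ANI_report_prokaryotes.txt and pass an open file handle instead of the in-memory sample, adjusting the column name if the README lists it differently.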