Next week, NCBI staff will attend AGBT in Marco Island, Florida. On Tuesday, February 25, 2020, three posters from NCBI staff will be on display from 4:40 p.m. – 6:10 p.m. during the Poster Session and Wine Reception in the Banyan and Calusa Ballroom Foyers, Levels 1 and 3. Read on to learn a little bit about what we’ll be presenting.
The latest dbVar data release includes the Genome in a Bottle benchmark structural variant (SV) callset (pre-print Zook et al. 2019) – a highly scrutinized, carefully curated set of 12,745 sequence-resolved deletions, insertions, and delins variants from Personal Genome Project Ashkenazi trio son HG002. The callset serves as a robust benchmark standard for measuring the performance of sequencing and variant-calling pipelines. It “reliably identifies both false negatives and false positives in high-quality SV callsets” (pre-print Zook et al. 2019) that are based on short-, linked-, and long-read sequencing as well as optical mapping.
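For readers new to benchmarking, false positives and false negatives are typically summarized as precision and recall. The Python sketch below shows only that standard arithmetic. The function name and the example counts are hypothetical, and it assumes the calls have already been matched against the benchmark by a separate comparison tool; it is not the method used by Genome in a Bottle or dbVar.

```python
def benchmark_metrics(true_positives: int, false_positives: int, false_negatives: int) -> dict:
    """Summarize a callset's performance against a benchmark.

    true_positives:  calls that match a benchmark variant
    false_positives: calls with no matching benchmark variant
    false_negatives: benchmark variants that the pipeline missed
    """
    called = true_positives + false_positives      # everything the pipeline reported
    expected = true_positives + false_negatives    # everything in the benchmark

    # Guard against division by zero for empty callsets or benchmarks.
    precision = true_positives / called if called else 0.0
    recall = true_positives / expected if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0

    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
metrics = benchmark_metrics(true_positives=90, false_positives=10, false_negatives=30)
print(metrics)
```

High precision means few false positives among the reported calls; high recall means few benchmark variants were missed.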
To ensure that taxonomic information on genome assemblies is as accurate as possible, NCBI will use average nucleotide identity (ANI) analysis to correct existing public records in GenBank.
We will contact submitters of records found to be misidentified and provide reports with ANI information based on comparison to type strains. If there is no objection, the taxonomic change will be made, and a structured comment will be added to the record.
In cases where a genome assembly was not submitted with a binomial name (e.g., Bacillus sp. 123) but was found to match a known species with high confidence, the strain will be merged with the binomial in the taxonomy database. This will occur as part of the normal maintenance of merged taxonomic names. The submitter will not be contacted, but a structured comment indicating the change will be added to the record.
A paper in the International Journal of Systematic and Evolutionary Microbiology presents the method NCBI scientists used to review all prokaryotic genome assemblies in GenBank, as well as the current status of GenBank verifications and recent developments in confirming species assignments in new genome submissions.
MutaGene is a new, freely available resource for understanding the mutagenic factors contributing to tumor development.
Cancer arises from multiple changes in the DNA that can be caused by various extrinsic factors, such as sunlight and tobacco smoking, and intrinsic factors, such as the body’s own defense mechanisms against viral infection or faulty molecular machinery for DNA copying and repair. Knowing which factors contribute to the accumulation of mutations in a given cancer patient can be crucial for prognosis and for identifying the correct treatment.