Category: Science Features

ClinVar Celebrates 1 Million Submissions

ClinVar is proud to announce the submission of the one millionth record to its database.

The millionth submission was published on Friday, December 20, 2019, a milestone in giving the clinical genetics and research communities open access to human variant data with asserted consequences.

ClinVar extends its thanks to the many laboratories, partners, and members of the community whose efforts and adoption of the practice of data-sharing paved the way for this achievement. All organizations that contributed to ClinVar’s genetics resources share in this accomplishment, with special recognition reserved for ClinGen and several of their members, including EGL Genetic Diagnostics/Eurofins Clinical Diagnostics, GeneDx, Invitae, and Laboratory for Molecular Medicine/Partners HealthCare Personalized Medicine, whose early submissions helped jump-start ClinVar’s database.


Track pathogenic organisms promptly with the National Database of Antibiotic Resistant Organisms

In response to the rising threat of antimicrobial resistance (AMR), NCBI built the National Database of Antibiotic Resistant Organisms (NDARO). With NDARO, you can:

Browse a curated database of AMR genes
Identify AMR genes in bacterial genomes with AMRFinder
Identify bacterial genomes with AMR genes in the Isolates Browser
Submit sequence and phenotype data related to AMR

FIG 1 — **Figure 1.** Filter your Isolates Browser results by date, location, and antibiotic resistance (whether the isolate has any AMR genes or any submitted Antimicrobial Susceptibility Testing (AST) phenotype).


500 organisms annotated with the Eukaryotic Genome Annotation Pipeline

This month, the NCBI Eukaryotic Genome Annotation Pipeline annotated its 500th organism! The lucky winner is Pocillopora damicornis, a stony reef-building coral frequently used as an experimental model, whose larval dispersal and development are affected by environmental changes in the oceans.


Magic-BLAST (v1.4.0), an accurate DNA and RNA-Seq aligner

What is Magic-BLAST and why are we excited about it?

Magic-BLAST is a BLAST tool, but it’s unlike any other.

It aligns next-generation sequencing reads, both DNA and RNA-seq. It implements the aligner algorithm from MAGIC [1], a trusted pipeline, but uses the well-tested and supported BLAST infrastructure. We think it’s like putting two great things together, like having your favorite ice cream in your morning coffee.

We’re so excited about it that we even wrote an article that compares Magic-BLAST to a few other aligners on several data sets.

If you look at the figures in our article, we think you’ll see that Magic-BLAST excels at finding introns and processing ultra-long sequences. It can also handle high levels of mismatches as well as compositionally biased DNA. Finally, you’ll see that Magic-BLAST works in a lot of relevant situations in which current aligners won’t. If our results got your attention, here is our documentation, which includes a cookbook with a few examples.


New International Protein Naming Guidelines promote clarity and consistency

Consistent protein nomenclature is indispensable for communication, literature searching and entry retrieval. NCBI, the European Bioinformatics Institute (EMBL-EBI), the Protein Information Resource (PIR) and the Swiss Institute of Bioinformatics (SIB) revised and reorganized previous guidelines from UniProt and NCBI. This joint effort produced universal guidelines for nomenclature and protein naming to promote clarity in communication and improve consistency in data retrieval across databases.

These guidelines are exclusively focused on nomenclature, providing rules about universal formatting and protein naming choices; they do not include best practices for identifying or predicting function. They cover usage of language, abbreviations, symbols, punctuation, notation, terms and style. Sources of protein names and options for protein naming are also discussed.

During the 2018 INSDC annual meeting, the three collaborating sequence databases (DDBJ, EBI and GenBank) agreed to recommend these guidelines to their submitters. The Protein Naming Guidelines working group plans to write a peer-reviewed publication about protein naming and to track future changes to this document in GitHub.

NCBI scientists verify taxonomic identities in prokaryotic genomes

As of March 2018, there were 141,000 prokaryotic genomes in the Assembly database. As this database grows, misassigned prokaryotic genomes become a serious problem. Taxonomy misassignment can occur through simple submission error or can accumulate as new information adds greater specificity to the taxonomic tree.

A paper in the International Journal of Systematic and Evolutionary Microbiology presents the method NCBI scientists used to verify taxonomic identities in prokaryotic genomes. The authors used an Average Nucleotide Identity method with optimum threshold ranges for prokaryotic taxa to review all prokaryotic genome assemblies in GenBank. This method relies on Type strain information and is one outcome of a 2015 workshop involving several important parties in the bacteriology community.
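
As a rough illustration of the idea behind an ANI check, the sketch below computes a length-weighted average identity over aligned fragments and compares it with a cutoff. It is a simplified sketch, not the method from the paper: the `AlignedFragment` type, the function names, and the `PLACEHOLDER_CUTOFF` value are all assumptions made for this example, and the taxon-specific threshold ranges the authors derived are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedFragment:
    """One aligned region between a query genome and a type-strain genome."""
    length: int        # aligned length in base pairs
    identity: float    # percent identity over that aligned region


# Placeholder cutoff for illustration only; NOT a value taken from the paper.
PLACEHOLDER_CUTOFF = 95.0


def average_nucleotide_identity(fragments):
    """Length-weighted mean percent identity across aligned fragments."""
    total_length = sum(f.length for f in fragments)
    if total_length == 0:
        raise ValueError("no aligned bases to average over")
    weighted = sum(f.length * f.identity for f in fragments)
    return weighted / total_length


def consistent_with_declared_species(ani, cutoff=PLACEHOLDER_CUTOFF):
    """True when the ANI to the declared species' type strain meets the cutoff."""
    return ani >= cutoff


# Made-up fragments comparing a submitted genome with a type-strain assembly.
fragments = [
    AlignedFragment(length=1000, identity=99.0),
    AlignedFragment(length=1000, identity=97.0),
    AlignedFragment(length=2000, identity=98.0),
]
ani = average_nucleotide_identity(fragments)
flag = consistent_with_declared_species(ani)
```

A production pipeline would also need to consider how much of each genome is covered by the alignments, which this toy version ignores.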

The NCBI Bookshelf offers resources related to opioid crisis

By now, the opioid epidemic is a familiar topic to many Americans. According to the National Institute on Drug Abuse (NIDA), “every day, more than 115 Americans die after overdosing on opioids.” The National Institutes of Health (NIH) is committed to the fight against opioid misuse and addiction. In a May 2017 address, NIH Director Dr. Francis Collins and NIDA Director Dr. Nora Volkow outlined research priorities for ending the opioid crisis, such as finding new ways to treat opioid addiction and improving overdose prevention and reversal. The NCBI Bookshelf, an archive of books and documents in life science and healthcare, offers a variety of resources related to enacting such solutions.


Bioinformatics paper uses NCBI open data to analyze drug response

A study (PMID: 28158543) published in the July 2017 issue of Bioinformatics collects, classifies and analyzes single nucleotide variants (SNVs) that may affect response to currently approved drugs. The authors identified 2,640 SNVs of interest, most of which occur rarely in populations (minor allele frequency <0.01).
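
To make the rarity criterion concrete, here is a minimal sketch of filtering variants by minor allele frequency. Only the 0.01 cutoff comes from the text above; the `Snv` type, the function names, and the example records are hypothetical placeholders, not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snv:
    """A single nucleotide variant with its population frequency."""
    label: str                 # placeholder label, not a real dbSNP identifier
    minor_allele_freq: float   # fraction between 0 and 0.5


RARE_MAF_CUTOFF = 0.01  # the "minor allele frequency <0.01" criterion


def is_rare(snv, cutoff=RARE_MAF_CUTOFF):
    """A variant counts as rare when its MAF is strictly below the cutoff."""
    return snv.minor_allele_freq < cutoff


def rare_fraction(snvs):
    """Share of variants in the collection that are rare."""
    if not snvs:
        return 0.0
    return sum(is_rare(s) for s in snvs) / len(snvs)


# Made-up records for illustration only.
example_snvs = [
    Snv("example_1", 0.0004),
    Snv("example_2", 0.008),
    Snv("example_3", 0.01),   # exactly at the cutoff, so not rare
    Snv("example_4", 0.27),
]
rare_labels = [s.label for s in example_snvs if is_rare(s)]
fraction = rare_fraction(example_snvs)
```

Because the comparison is strict, a variant with a frequency of exactly 0.01 is not counted as rare in this sketch.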

The researchers used protein sequence alignment tools and mined open data from multiple information resources accessed through E-utilities, including PubChem Compound (Kim et al., 2016 PMID: 26400175), NCBI Gene (Maglott et al., 2014 PMID: 25355515), NCBI Protein (Sayers, 2013), MMDB (Madej et al., 2012 PMID: 22135289), PDB (Berman et al., 2000 PMID: 10592235), dbSNP (Sherry et al., 2001 PMID: 11125122), and ClinVar (Landrum et al., 2016 PMID: 26582918).
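
For readers unfamiliar with E-utilities, the sketch below shows how a request URL for the ESummary endpoint can be assembled. It is only an illustration of the general request pattern, not the queries the study ran: the helper names are made up for this example, and the PMIDs are simply two of those cited above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public base address of the Entrez Programming Utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esummary_url(db, ids, retmode="json"):
    """Build an ESummary request URL for one Entrez database and a list of IDs."""
    params = {"db": db, "id": ",".join(ids), "retmode": retmode}
    # Keep commas unescaped so the ID list stays readable in the URL.
    return f"{EUTILS_BASE}/esummary.fcgi?{urlencode(params, safe=',')}"


def fetch_summary(db, ids):
    """Fetch and decode the JSON summary (requires network access)."""
    with urlopen(esummary_url(db, ids)) as response:
        return json.load(response)


# Building the URL needs no network access; fetch_summary is not called here.
url = esummary_url("pubmed", ["28158543", "26582918"])
```

Passing a different `db` value, such as `clinvar` or `gene`, targets another Entrez database with the same pattern. NCBI asks heavy users to limit their request rate and to identify themselves with an API key, which this sketch leaves out.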

Questions, comments, and other feedback may be sent to Yanli Wang.

Cake, Poetry and Success Stories: NCBI Celebrates 10 Years of dbGaP

For the past decade, dbGaP, the database of Genotypes and Phenotypes, has been the worldwide resource for genome-wide data. To celebrate this milestone, NCBI threw a party!


NLM In Focus blog profiles Dr. Kim Pruitt, NCBI Staff Scientist

The inaugural article in NLM In Focus’s new series on NLM scientists features Kim Pruitt, PhD. Dr. Pruitt is a staff scientist at NCBI; she heads the Reference Sequence Database, better known as RefSeq.

In the article, Dr. Pruitt shares her career trajectory as well as pearls of wisdom for young scientists.
