Magic-BLAST (v1.4.0), an accurate DNA and RNA-Seq aligner

What is Magic-BLAST and why are we excited about it?

Magic-BLAST is a BLAST tool, but it’s unlike any other.

It aligns next-generation sequencing reads, both DNA and RNA-seq. It implements the alignment algorithm from MAGIC [1], a trusted pipeline, but uses the well-tested and supported BLAST infrastructure. We think it’s like putting two great things together, like having your favorite ice cream in your morning coffee.

We’re so excited about it that we even wrote an article that compares Magic-BLAST to a few other aligners on several data sets.

If you look at the figures in our article, we think you’ll see that Magic-BLAST excels at finding introns and processing ultra-long sequences. It can also handle high levels of mismatches as well as compositionally biased DNA. Finally, you’ll see that Magic-BLAST works in many relevant situations in which current aligners won’t. If our results got your attention, here is our documentation, which includes a cookbook with a few examples.

NCBI to retire Clone DB web interface

Starting in April 2019, the sequence content of Clone DB will be frozen, and its web interfaces will no longer be available. NCBI will continue to produce and make genomic clone placements available as annotations in NCBI’s Genome Data Viewer (GDV) using the sequence data currently in Clone DB. These placements and their corresponding underlying (now static) library and sequence data will also be accessible on the Clone DB FTP site.

The collection of Clone DB records for cell-based (integrated) gene targeting and gene trap libraries will also be retired in January. These data were provided to Clone DB by MGI.  Clone DB users should refer to MGI for their continuing research needs.

Please contact us with any comments or concerns, or if you need help using Clone DB data.

Clone DB was originally implemented as the Clone Registry during the human and mouse genome projects. In subsequent years, it expanded to represent clone-associated data for a broad range of organisms. Clone DB has been a valuable resource connecting users with information and reagents for genomic and cell-based clones. However, with the advent of short read sequencing, fewer and fewer genomic clone end and insert sequences are submitted to NCBI every year, and the usage of and need for Clone DB has dropped significantly.

Tenure-Track Investigator Recruitment in Data Science, Biomedical Informatics, and Computational Biology

Posted Date: August 13, 2018; Closing Date: Until position is filled.

The National Library of Medicine (NLM) of the National Institutes of Health (NIH) is seeking 2-3 tenure-track investigators to lead world-class research programs within its Intramural Research Program. The goal of this search is to identify candidates with the potential to develop a dynamic, innovative, and independent computational research program that will enhance NLM’s collaborative environment by performing novel, cutting-edge research and, thereby, advance the objectives of NLM’s new Strategic Plan, 2017-2027 to accelerate data-driven discovery. As part of this plan, and in alignment with NIH’s new Strategic Plan for Data Science, NLM is embarking on a major expansion of its Intramural Research Program, with a specific emphasis on data science methodologies, analytics, visualization, computer-assisted curation, and applications of novel methods for basic biological and biomedical discovery.

Tenure-Eligible Senior Investigator Recruitment in Data Science, Biomedical Informatics, and Computational Biology

Posted Date: August 13, 2018; Closing Date: Until position is filled.

The National Library of Medicine (NLM) of the National Institutes of Health (NIH) is seeking to recruit a tenure-eligible Senior Investigator to lead a world-class research program within its Intramural Research Program. The goal of this search is to identify candidates to develop a dynamic, innovative, and independent computational research program that will enhance NLM’s collaborative environment by performing novel, cutting-edge research and, thereby, advance the objectives of NLM’s new Strategic Plan, 2017-2027 to accelerate data-driven discovery. As part of this plan, and in alignment with NIH’s new Strategic Plan for Data Science, NLM is embarking on a major expansion of its Intramural Research Program, with a specific emphasis on data science methodologies, analytics, visualization, computer-assisted curation, and applications of novel methods for basic biological and biomedical discovery.

Release Plan for E-utility API Keys

As promised in our post this past spring, we are now announcing the scheduled release of API keys for the E-utilities API. If you’ve missed some of our original discussion of these keys, or have questions about how to get a key, you may want to check out this post.

In this post, we’ll be discussing three things:

  • The current status of API keys
  • Upcoming testing periods in September
  • Final public release on December 1, 2018.
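
For readers who have not yet used a key, here is a minimal sketch of how one is typically attached to an E-utilities request: as an `api_key` query parameter on the request URL. The helper name `build_esearch_url`, the search term, and the key value are all hypothetical placeholders; the snippet only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, api_key=None, retmax=20):
    """Build an ESearch URL, adding the api_key parameter only if a key is given.

    This is an illustrative helper, not part of any NCBI library.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    if api_key:
        # The key travels as an ordinary query parameter.
        params["api_key"] = api_key
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder; substitute the key from your NCBI account.
    print(build_esearch_url("pubmed", "magic-blast", api_key="YOUR_API_KEY"))
```

Keeping the key optional means the same helper can be used before and after you obtain a key, which may be convenient during the testing periods.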

Join NCBI-style hackathons at NIH, Cold Spring Harbor Labs, and UT Southwestern Medical Center

Applications are open for three NCBI-style hackathons:

  • 12th NIH Research Festival Collaborative Data Science and Machine Learning Hackathon (September 10-12)
  • Post-Biological Data Science meeting hackathon at Cold Spring Harbor Labs (November 10-12)
  • U-HACK MED, the pre-SuperComputing hackathon at UTSW (November 9-10)

The application period for each hackathon ends this month, August 2018. See our Biohackathons GitHub page for details on each hackathon, including how to apply.

Aug 8 NCBI Minute: Hey Professors! Get your free personal assistant – an NCBI Account!

Next Wednesday, Aug 8, 2018, NCBI staff will show you how to use an NCBI account to help with research and teaching tasks including:

  • Making custom collections of important records for use in coursework and research projects
  • Creating lists of publications or database records to send to your courses, journal clubs and research teams
  • Setting automated updates when new publications or database records are available
  • Maintaining your bibliography and sharing it on your Faculty Profile
  • Formatting your U.S. Gov’t BioSketch with a click of a mouse
  • And keeping track of everything – right on your My NCBI dashboard!

Date and time: Wed, Aug 8, 2018 12:00 PM – 12:30 PM EDT

Register now!

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

New International Protein Naming Guidelines promote clarity and consistency

Consistent protein nomenclature is indispensable for communication, literature searching and entry retrieval. NCBI, the European Bioinformatics Institute (EMBL-EBI), the Protein Information Resource (PIR) and the Swiss Institute of Bioinformatics (SIB) revised and reorganized previous guidelines from UniProt and NCBI. This joint effort produced universal guidelines for nomenclature and protein naming to promote clarity in communication and improve consistency in data retrieval across databases.

These guidelines are exclusively focused on nomenclature, providing rules about universal formatting and protein naming choices; they do not include best practices for identifying or predicting function. They cover usage of language, abbreviations, symbols, punctuation, notation, terms and style. Sources of protein names and options for protein naming are also discussed.

During the 2018 INSDC annual meeting, the three collaborating sequence databases (DDBJ, EBI and GenBank) agreed to recommend these guidelines to their submitters. The Protein Naming Guidelines working group plans to write a peer-reviewed publication about protein naming and to track future changes to this document in GitHub.

PubMed Health to be discontinued October 31, 2018; content will continue to be available at NLM

Update #2: As announced July 31, 2018, the PubMed Health website has been shut down as of October 31, 2018.

NLM thanks you for using PubMed Health over the years.


Update #1: As reported previously, the PubMed Health website will shut down on October 31, 2018. This decision was made so the National Library of Medicine (NLM) can consolidate its consumer health and comparative effectiveness resources to make them easier to find.


In an effort to consolidate similar resources and make information easier to find, the National Library of Medicine will be retiring its PubMed Health website, effective October 31, 2018, and providing the same or similar content through more widely used NLM resources, namely PubMed, MedlinePlus, and Bookshelf.

PubMed Health content falls into two general categories: consumer health resources and systematic reviews/comparative effectiveness research (CER). A similar range of consumer health information to that in PubMed Health is available from NLM’s MedlinePlus, while the systematic reviews and CER in PubMed Health are searchable through PubMed, which links to the full text (when available) in Bookshelf, journals, and/or PubMed Central.

Upcoming Changes to EST and GSS Databases

Update: NCBI is now in the process of merging EST and GSS records into the Nucleotide database, and we expect to complete this process in early 2019. Accession.version and GI identifiers will not change during this process.

As of December 1, 2018, all records from the databases for Expressed Sequence Tags (EST) and Genome Survey Sequences (GSS) will reside in NCBI’s Nucleotide database. This change will provide a single point of access for all GenBank sequence data with a common look and feel.

Read more to learn about how this change affects these resources:

  • Websites (Entrez)
  • APIs (E-utilities)
  • FTP sites
  • Submission procedures
  • BLAST
  • TSA (have a look if you’re not familiar!)
