GenBank will start using expanded accession formats by December 2018


By the end of 2018, GenBank and other INSDC members will expand the accession formats used for sequencing projects. We have assigned almost all the possible accession numbers using the current, shorter formats. Using these longer formats will allow us to expand accession ranges and give us greater capacity.

The expanded format for Whole Genome Shotgun (WGS), Transcriptome Shotgun Assembly (TSA), and Targeted Locus Study (TLS) sequencing projects will use a six-letter Project Code prefix and a two-digit Assembly-Version number followed by 7, 8, or 9 digits (for example, AAAAAA020000001).

Non-WGS/TLS/TSA nucleotide sequences currently use a “2+6” format: a two-letter prefix followed by six digits. This format will be expanded to a two-letter prefix followed by eight digits.

Protein sequences currently use a “3+5” accession format: a three-letter prefix followed by five digits. By the end of 2018, this format will be expanded to a three-letter prefix followed by seven digits.

You will need to adjust any processing methods that parse or validate accessions to accommodate these new identifier formats. Please write to the helpdesk with any questions about the new formats.
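As a rough illustration of the kind of adjustment this may involve, here is a minimal Python sketch that recognizes the formats described in this post. The patterns are built only from the descriptions above; the names `ACCESSION_PATTERNS` and `classify_accession` are hypothetical, and the sketch assumes uppercase letter prefixes and accessions without a trailing `.version` suffix. It does not cover formats this post does not describe.

```python
import re

# Patterns derived from the formats described in this post.
# Assumption: prefixes are uppercase ASCII letters, and the accession
# has no ".version" suffix attached.
ACCESSION_PATTERNS = {
    # Six-letter project code, two-digit assembly version, 7-9 digits
    "wgs_tsa_tls_expanded": re.compile(r"[A-Z]{6}\d{2}\d{7,9}"),
    # "2+6" nucleotide format and its eight-digit expansion
    "nucleotide_current": re.compile(r"[A-Z]{2}\d{6}"),
    "nucleotide_expanded": re.compile(r"[A-Z]{2}\d{8}"),
    # "3+5" protein format and its seven-digit expansion
    "protein_current": re.compile(r"[A-Z]{3}\d{5}"),
    "protein_expanded": re.compile(r"[A-Z]{3}\d{7}"),
}


def classify_accession(accession):
    """Return the name of the first matching format, or None if none match."""
    for name, pattern in ACCESSION_PATTERNS.items():
        if pattern.fullmatch(accession):
            return name
    return None


if __name__ == "__main__":
    # Example accession taken from this post
    print(classify_accession("AAAAAA020000001"))
```

Using `fullmatch` rather than `match` means that an identifier with extra leading or trailing characters is rejected instead of being silently accepted as a shorter format.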

Improved Search Now Available Across NCBI Databases


Earlier this year, we announced the release of a new and improved search feature that interprets plain language to give better results for common searches. This feature, originally developed in NCBI Labs and later released on the NCBI All Databases search, is now available across several NCBI resources: Nucleotide, Protein, Gene, Genome, and Assembly. Whether you are searching for a specific gene or for a whole genome, you will now retrieve NCBI’s best results regardless of the database you search.

The image below shows the results for a search for human INS in the Nucleotide database. Even though this is a Nucleotide search, the results include relevant information from Gene, Protein, and Taxonomy, plus links to the NCBI reference sequences (RefSeq), as well as access to BLAST and the insulin gene region in NCBI’s genome browser, the Genome Data Viewer.

Figure 1. The new natural language search result in the Nucleotide database from a search for human INS.

Try out this new search capability and let us know what you think. And keep visiting the NCBI Labs search page to try our latest experiments, which we’ll also announce here on NCBI Insights.


September 12 NCBI Minute: Release Plan for NCBI API Keys


Update: Webinar is now on September 12!

If you already registered for the September 5 date, you are automatically registered for September 12. You do not need to re-register. We welcome anyone else who would like to register.

As previously announced, NCBI has introduced API keys for the E-utilities. You will soon want to start using API keys in your E-utilities calls, as these will allow the fastest access to NCBI databases. In this webinar, we will review how API keys work and provide a schedule of brief testing periods and the timing of the full release of API key functionality.

Date and time: Wed, Sep 12, 2018 12:00 PM – 12:30 PM EDT

Register here: https://bit.ly/2v0wFMl

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

(Webinar re-scheduled to September 12 because the presenter was called away unexpectedly.)
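
If you are updating scripts ahead of the release, the sketch below shows one way an API key might be attached to an E-utilities request URL. It assumes the key is passed as the `api_key` query parameter to the `esearch.fcgi` endpoint; `build_esearch_url` is a hypothetical helper, the key value is a placeholder, and the code only builds the URL without sending a request.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities; esearch.fcgi is used here as an example endpoint.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_esearch_url(db, term, api_key=None):
    """Build an ESearch URL, adding the API key only when one is supplied."""
    params = {"db": db, "term": term}
    if api_key:
        # Assumption: the key is sent as the "api_key" query parameter.
        params["api_key"] = api_key
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder, not a real key.
    print(build_esearch_url("nucleotide", "human INS", api_key="YOUR_API_KEY"))
```

Keeping the key optional lets the same helper work before and after you obtain a key, so existing calls do not need to change all at once.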

Hey Professors! Get your free personal assistant — an NCBI Account!


Professors, we know you’re busy – really, really busy. You have to develop and teach your courses and labs, coordinate and run your journal clubs and seminars, direct your lab’s research efforts, write grants and publications, counsel and mentor your students, and stay current on everything related to your teaching and research topics.

NCBI has information that can help with all of this, but there are so many interesting records and so little time to organize them. Sign up for, or log in to, your free NCBI Account and let us help you get started and get organized!

Read on – or watch the video embedded below – to learn more about what you can do with your NCBI Account.


Improved search for prokaryotic assemblies and genes


We now have many improvements to our search functionality on NCBI’s global search page that will benefit users trying to find prokaryotic assemblies and genes. These improvements aim to highlight the best results and provide links to related NCBI content, so you don’t have to sift through pages of results and navigate between different NCBI resources.


Standalone variation services replace Variation Reporter


As of July 2018, a new set of standalone variation services replaces the variant matching functions of Variation Reporter. Variation Reporter was a tool designed to search human sequence variation data by location and to report matching variants found in dbSNP, dbVar, and ClinVar.

The new services are faster, better at handling variants in repeat regions, and scalable to accommodate the continued explosive growth of variation volume. You can find more information about the services in the initial blog post and online SPDI document.

If you would like to report any issues related to these new services and/or would like to provide comments, please write to snp-admin@ncbi.nlm.nih.gov.

If you have any specific questions about the NCBI site in general, contact us at info@ncbi.nlm.nih.gov.

We appreciate your continued support and interaction with the NCBI tools.

NCBI’s Genome Data Viewer now displays data from track hubs


The Genome Data Viewer’s (GDV) browser display now supports content provided in track hubs. This new GDV feature, summarized in this short video, extends the genome browser’s ability to display user-supplied data tracks alongside NCBI-provided tracks. You now have multiple options for analyzing your data: uploading it (as a file or URL), streaming individual files from a remote location, or connecting to a track hub. In all instances, GDV recognizes a variety of popular file formats, with support for additional file formats planned. In the display, you can now also easily distinguish user-supplied tracks by their green-tinted track labels.

Magic-BLAST (v1.4.0), an accurate DNA and RNA-Seq aligner


What is Magic-BLAST and why are we excited about it?

Magic-BLAST is a BLAST tool, but it’s unlike any other.

It aligns next-generation sequencing reads, both DNA and RNA-seq. It implements the aligner algorithm from MAGIC [1], a trusted pipeline, but uses the well-tested and supported BLAST infrastructure. We think it’s like putting two great things together, like having your favorite ice cream in your morning coffee.

We’re so excited about it that we even wrote an article that compares Magic-BLAST to a few other aligners on several data sets.

If you look at the figures in our article, we think you’ll see that Magic-BLAST excels at finding introns and processing ultra-long sequences. It can also handle high levels of mismatches as well as compositionally biased DNA. Finally, you’ll see that Magic-BLAST works in many relevant situations in which current aligners won’t. If our results got your attention, here is our documentation, which includes a cookbook with a few examples.


NCBI to retire Clone DB web interface


Starting in January 2019, the sequence content of Clone DB will be frozen, and its web interfaces will no longer be available. NCBI will continue to produce and make genomic clone placements available as annotations in NCBI’s Genome Data Viewer (GDV) using the sequence data currently in Clone DB. These placements and their corresponding underlying (now static) library and sequence data will also be accessible on the Clone DB FTP site.

The collection of Clone DB records for cell-based (integrated) gene targeting and gene trap libraries will also be retired in January. These data were provided to Clone DB by MGI. Clone DB users should refer to MGI for their continuing research needs.

Please contact us with any comments or concerns, or if you need help using Clone DB data.

Clone DB was originally implemented as the Clone Registry during the human and mouse genome projects. In subsequent years, it expanded to represent clone-associated data for a broad range of organisms. Clone DB has been a valuable resource connecting users with information and reagents for genomic and cell-based clones. However, with the advent of short-read sequencing, fewer and fewer genomic clone end and insert sequences are submitted to NCBI every year, and the usage of and need for Clone DB have dropped significantly.

Tenure-Track Investigator Recruitment in Data Science, Biomedical Informatics, and Computational Biology


Posted Date: August 13, 2018; Closing Date: Until position is filled.

The National Library of Medicine (NLM) of the National Institutes of Health (NIH) is seeking 2-3 tenure-track investigators to lead world-class research programs within its Intramural Research Program. The goal of this search is to identify candidates with the potential to develop a dynamic, innovative, and independent computational research program that will enhance NLM’s collaborative environment by performing novel, cutting-edge research and thereby advance the objectives of NLM’s new Strategic Plan, 2017-2027, to accelerate data-driven discovery. As part of this plan, and in alignment with NIH’s new Strategic Plan for Data Science, NLM is embarking on a major expansion of its Intramural Research Program, with a specific emphasis on data science methodologies, analytics, visualization, computer-assisted curation, and applications of novel methods for basic biological and biomedical discovery.
