The Genome Workbench team is proud to present version 2.13.0, with the latest usability improvements and bug fixes. See the full list of changes in the Genome Workbench release notes.
Some of the improvements include:
- New SNP tracks using the most recent dbSNP release
- Improved alignment statistics table to correctly account for introns
- Alignment tooltips report introns separately from gaps
- Fixes for several interface issues to make MAFFT and BLAST alignments easier to use
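The release notes don't describe how Genome Workbench computes these statistics. As a hedged illustration of why introns need separate treatment, here is a minimal sketch that assumes a spliced transcript-to-genome alignment encoded as a SAM-style CIGAR string. In SAM, `N` marks a skipped region of the reference (an intron in mRNA-to-genome alignments), `=` and `X` are matches and mismatches, and `I`/`D` are ordinary gaps. The function name `alignment_stats` is hypothetical.

```python
import re

# Each CIGAR element is a length followed by one operation code.
# Assumed subset (SAM conventions): '=' match, 'X' mismatch, 'I'/'D' gap, 'N' intron.
CIGAR_RE = re.compile(r"(\d+)([=XIDN])")


def alignment_stats(cigar: str) -> dict:
    """Summarize a spliced alignment, keeping introns separate from gaps."""
    ops = CIGAR_RE.findall(cigar)
    # Reject strings containing anything outside the assumed operation subset.
    if "".join(length + op for length, op in ops) != cigar:
        raise ValueError(f"unsupported CIGAR string: {cigar!r}")

    stats = {"match": 0, "mismatch": 0, "gap": 0, "intron": 0}
    for length, op in ops:
        n = int(length)
        if op == "=":
            stats["match"] += n
        elif op == "X":
            stats["mismatch"] += n
        elif op in ("I", "D"):
            stats["gap"] += n
        else:  # op == "N": skipped genomic region, i.e. an intron
            stats["intron"] += n

    # Identity is computed over aligned columns only; intron lengths are excluded
    # so that a long intron is not counted as a huge alignment gap.
    aligned = stats["match"] + stats["mismatch"] + stats["gap"]
    stats["identity"] = stats["match"] / aligned if aligned else 0.0
    return stats


stats = alignment_stats("50=2X3D45=1200N100=")
print(stats)
```

For this example, the 1,200-base intron is reported on its own and identity is 195 of 200 aligned columns (0.975). If the intron were counted as a gap, identity would fall to roughly 0.14.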
Genome Workbench is an integrated application for viewing and analyzing sequences. Genome Workbench can be used to browse and import data from NCBI and combine it with your own private data.
In late May, we introduced a new type of search experience in NCBI Labs that uses natural language queries to make common tasks easier. The experience at NCBI Labs – where we experiment with potential new features and tools – proved successful. We’re pleased to announce that we added this simplified search capability to NCBI’s global search page. Some natural language queries now work in the “All Databases” search from the NCBI home page!
We know it’s not always easy to find the sequence data you’re after at NCBI. Maybe it’s because you’re no expert at constructing queries, and you end up with no results or too many results. Or maybe you’re an Entrez wizard, but creating a query full of Booleans and filters seems like overkill when you could just write a short natural language query, like you’re used to doing in Google. The next time you search for a gene, transcript or genome assembly for a given organism, try the new search experience we’re piloting in NCBI Labs.
In NCBI Labs, you can now search for sequences using natural language and get the best results.
Figure 1. The new interface for specified transcript search.
The improved search experience now available in NCBI Labs addresses three types of queries that commonly fail in searches at NCBI: organism-gene (e.g. human BRCA1), organism-transcript (e.g. mouse p53 transcripts) and organism-assembly (e.g. dog reference genome). For each of these query types, NCBI Labs now returns NCBI's highest-quality sequence sets or reference and representative assemblies in an easy-to-view panel.
Example queries are shown below to get you started.
A paper in the January 2018 issue of Database describes the NCBI BioCollections database, a curated dataset of metadata for culture collections, museums, herbaria and other natural history collections connected to sequence records in GenBank. The BioCollections database was established to allow the association of specimen vouchers and related sequence records to their home institutions. This process also allows back-linking from the home institution for quick identification of all records originating from each collection.
The rapidly growing set of GenBank submissions frequently includes records that are derived from specimen vouchers. Correct identification of the specimens studied, along with a method to associate the sample with its institution, is critical to the outcome of related studies and analyses.
New repository records are added to the database when they are submitted to the International Nucleotide Sequence Database Collaboration (INSDC) along with sequence data. Each record now provides information about the institution that houses the collection, including its standard Institution Code, mailing address and, if available, associated webpage.
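Sequence records point to a collection through the INSDC `/specimen_voucher` qualifier, whose structured form is `[institution-code:[collection-code:]]specimen_id`. The sketch below is a minimal illustration that assumes that format. It splits a voucher string and tallies records per institution code, which is roughly the kind of grouping that makes back-linking possible. It does not validate codes against BioCollections, and the helper names and sample strings are hypothetical.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Voucher(NamedTuple):
    institution: Optional[str]  # institution code, if given
    collection: Optional[str]   # optional collection code
    specimen_id: str            # catalog or field number of the specimen


def parse_specimen_voucher(value: str) -> Voucher:
    """Split a voucher of the form [institution:[collection:]]specimen_id."""
    # Split at most twice so that any further colons stay inside the specimen ID.
    parts = [part.strip() for part in value.split(":", 2)]
    if len(parts) == 3:
        return Voucher(parts[0], parts[1], parts[2])
    if len(parts) == 2:
        # A collection code can only follow an institution code, so two parts
        # mean institution:specimen_id.
        return Voucher(parts[0], None, parts[1])
    return Voucher(None, None, parts[0])


def count_by_institution(vouchers):
    """Tally voucher strings per institution code (None = no code given)."""
    return Counter(parse_specimen_voucher(v).institution for v in vouchers)


# Illustrative voucher strings only; not taken from real GenBank records.
examples = ["UAM:Mamm:52179", "AMCC:101706", "UAM:Mamm:11470", "52179"]
for v in examples:
    print(v, "->", parse_specimen_voucher(v))
print(count_by_institution(examples))
```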
The BioCollections database is maintained and curated by the Taxonomy group at NCBI.
UniVec, NCBI's non-redundant database of vector sequences, has been updated to build 10.0, which lets NCBI's VecScreen tool detect more of the foreign sequences introduced during cloning or sequencing. UniVec build 10.0 is also available via FTP.
This build added 174 complete vector sequences and 214 adapter, primer and other sequences, including 133 RNA Spike-In sequences, bringing the total number of sequences represented in the UniVec database to 3,039.
IgBLAST 1.7.0 release
A new version of IgBLAST is now available on FTP, with the following new features:
- Specify whether overlapping nucleotides at VDJ junctions are allowed when matching V, D, and J genes
- Set a custom J gene mismatch penalty
- Report the CDR3 start and stop positions in the sub-region table
- Use alignment length instead of percent identity as the tie-breaker for hits with identical BLAST scores, improving the accuracy of V, D, and J gene assignment
IgBLAST was developed at the NCBI to facilitate the analysis of immunoglobulin and T cell receptor variable domain sequences.
The NCBI Multiple Sequence Alignment Viewer (MSAV) is a versatile web application that helps you visualize and interpret MSAs for both nucleotide and amino acid sequences. You can display alignment data from many sources, and you can embed the viewer in your own web pages with customizable options. An even simpler way to use MSAV is to go to our page, upload your data, and share the link to a fully functional viewer displaying your results.
As you may have read in previous posts, NCBI is in the process of changing the way we handle GI numbers for sequence records.
In short, we are moving to a time when accession.version identifiers, rather than GI numbers, will be the primary identifiers for sequence records.
As part of this transition, an obvious question for any of you currently using GI numbers is how to convert a GI number to an accession.version, so that you can make appropriate updates. The good news is that it’s pretty easy if you have no more than a few thousand GIs to convert.
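For a batch of GIs, one approach is the E-utilities EFetch endpoint with `rettype=acc`, which returns accession.version identifiers as plain text. The sketch below is hedged and rests on a few assumptions. It assumes the response has one line per GI in request order, and it stops if the counts don't match. It uses a batch size of 200 GIs per request, which is an arbitrary choice. It pauses between requests to stay under the E-utilities limit of about three requests per second without an API key. The function names are hypothetical.

```python
import sys
import time
import urllib.parse
import urllib.request

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(gis, db="nuccore"):
    """Build an EFetch URL asking for accession.version text (rettype=acc)."""
    params = {
        "db": db,  # "nuccore" for nucleotide GIs, "protein" for protein GIs
        "id": ",".join(str(gi) for gi in gis),
        "rettype": "acc",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urllib.parse.urlencode(params)


def parse_acc_response(text):
    """Return the non-empty lines of a rettype=acc response."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def gis_to_accessions(gis, db="nuccore", batch_size=200):
    """Map each GI to its accession.version, batch_size GIs per request."""
    mapping = {}
    for start in range(0, len(gis), batch_size):
        batch = gis[start:start + batch_size]
        with urllib.request.urlopen(build_efetch_url(batch, db)) as response:
            accessions = parse_acc_response(response.read().decode("utf-8"))
        # Assumption: one line per GI, in request order. If the counts differ
        # (e.g. an invalid GI), stop rather than pair GIs with the wrong accessions.
        if len(accessions) != len(batch):
            raise RuntimeError(f"expected {len(batch)} accessions, got {len(accessions)}")
        mapping.update(zip(batch, accessions))
        time.sleep(0.34)  # stay under ~3 requests/second without an API key
    return mapping


if __name__ == "__main__":
    # Usage: python gi2acc.py GI [GI ...]   (prints GI<TAB>accession.version)
    for gi, acc in gis_to_accessions(sys.argv[1:]).items():
        print(gi, acc, sep="\t")
```

For protein GIs, pass `db="protein"`.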
NCBI has announced that we will be changing the way we handle GI numbers for sequence records in September 2016. (Read more, in case you missed it.)
In this post, we’ll address a key question:
What is the future of existing GI numbers?
The short answer is that nothing is happening to these GI numbers.
If a nucleotide or protein record already has a GI, it will continue to have that GI indefinitely. You will also be able to retrieve such a record using its GI either on the NCBI web site or using the E-utilities.
Moreover, GIs will remain part of the XML and ASN.1 formats of sequence records.
If not GIs, then what?
Accession.version identifiers. All sequence records, both new and old, will have a unique accession.version identifier.
Existing records will keep the accessions they already have; new sequences will only receive an accession.version identifier.
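When auditing your own data for identifiers that may need converting, it can help to separate bare GIs (plain integers) from accession.version identifiers. The sketch below is a minimal illustration under a stated assumption: a simplified accession pattern of an uppercase letter prefix, optionally with a RefSeq-style underscore, then digits, a dot, and an integer version. It is not an official validator, and some real accession formats may not match it. The helper names are hypothetical.

```python
import re
from typing import NamedTuple

# GIs are plain positive integers.
GI_RE = re.compile(r"\d+")
# Simplified, assumed accession.version pattern, e.g. "U12345.1", "NM_000546.6".
ACC_VER_RE = re.compile(r"([A-Z]+(?:_[A-Z]*)?\d+)\.(\d+)")


class AccessionVersion(NamedTuple):
    accession: str
    version: int


def classify_identifier(ident: str) -> str:
    """Label an identifier as 'gi', 'accession.version', or 'unknown'."""
    ident = ident.strip()
    if GI_RE.fullmatch(ident):
        return "gi"
    if ACC_VER_RE.fullmatch(ident):
        return "accession.version"
    return "unknown"  # e.g. an accession with no version suffix


def split_accession_version(ident: str) -> AccessionVersion:
    """Split an identifier such as 'NM_000546.6' into accession and version."""
    match = ACC_VER_RE.fullmatch(ident.strip())
    if match is None:
        raise ValueError(f"not an accession.version: {ident!r}")
    return AccessionVersion(match.group(1), int(match.group(2)))


for ident in ["12345", "NM_000546.6", "NM_000546"]:
    print(ident, classify_identifier(ident))
```

The sample identifiers are only illustrative. The sketch says nothing about which GI corresponds to which accession.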
So what’s all the fuss about?
Stay tuned for additional posts about this topic, and please contact us if you have questions.
You may have heard that NCBI is changing the way we handle GI numbers for sequence records in September 2016. Well, you heard right! Here’s the announcement, in case you missed it.
These changes raise a number of issues, but in this post we're going to answer two questions:
- What pieces of your code will break in September?
- Are GI numbers gone for good?