IgBLAST 1.9.0 release includes AIRR rearrangement reporting


IgBLAST 1.9.0 supports the adaptive immune receptor repertoire (AIRR) standard for sequence analysis results. The AIRR format is available on web IgBLAST as well as in the standalone IgBLAST tool, with the -outfmt 19 option.

Supporting this new schema in IgBLAST will benefit the growing number of repertoire studies that use next-generation sequencing technology to generate very large sets of Ig/T-cell receptor rearrangement analysis data.

IgBLAST facilitates the analysis of immunoglobulin and T cell receptor variable domain sequences. Get IgBLAST on FTP. A new manual is on GitHub.
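As a rough illustration of what working with the AIRR output might look like, the sketch below reads a small tab-separated rearrangement table and lists the V gene calls of productive sequences. It assumes the file came from the standalone tool run with -outfmt 19. The sample rows are invented for illustration, only a handful of AIRR columns (sequence_id, productive, v_call, j_call, junction_aa) are shown, and the "T"/"F" encoding of the productive flag is an assumption to check against your own output.

```python
import csv
import io

# Hypothetical two-record sample; real -outfmt 19 output has many more columns.
SAMPLE_AIRR_TSV = (
    "sequence_id\tproductive\tv_call\tj_call\tjunction_aa\n"
    "read1\tT\tIGHV3-23*01\tIGHJ4*02\tCAKDRGYFDYW\n"
    "read2\tF\tIGHV1-69*01\tIGHJ6*02\t\n"
)

def read_rearrangements(handle):
    """Return a list of dicts, one per row of a tab-separated AIRR table."""
    return list(csv.DictReader(handle, delimiter="\t"))

def productive_v_calls(rows):
    """Return V gene calls for rows flagged productive (assumed to be 'T')."""
    return [row["v_call"] for row in rows if row["productive"] == "T"]

rows = read_rearrangements(io.StringIO(SAMPLE_AIRR_TSV))
print(productive_v_calls(rows))  # ['IGHV3-23*01']
```

To read a real file instead of the sample string, pass an open file handle, for example `open("results.tsv", newline="")`, where the file name is a placeholder.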

Test drive a new sequence search experience at NCBI Labs


We know it’s not always easy to find the sequence data you’re after at NCBI. Maybe it’s because you’re no expert at constructing queries, and you end up with no results or too many results. Or maybe you’re an Entrez wizard, but creating a query full of Booleans and filters seems like overkill when you could just write a short natural language query, like you’re used to doing in Google. The next time you search for a gene, transcript or genome assembly for a given organism, try the new search experience we’re piloting in NCBI Labs.

In NCBI Labs, you can now search for sequences using natural language and get the best results.


Figure 1. The new interface for specified transcript search.

The improved search experience now available in NCBI Labs addresses three types of queries that commonly fail in searches at NCBI: organism-gene (e.g., human BRCA1), organism-transcript (e.g., mouse p53 transcripts), and organism-assembly (e.g., dog reference genome). For each of these query types, NCBI Labs now returns NCBI’s highest-quality sequence sets or reference and representative assemblies in an easy-to-view panel.

Example queries are shown below to get you started.


GenBank release 225: Over 1 billion sequence records stored!


GenBank release 225.0 (April 14, 2018) has 208,452,303 traditional records (including non-bulk-oriented TSA) containing 260,189,141,631 base pairs of sequence data. In addition, there are 621,379,029 WGS records containing 2,784,740,996,536 base pairs of sequence data, 227,364,990 TSA records containing 205,232,396,043 base pairs of sequence data, and 14,782,654 TLS records containing 5,612,769,448 base pairs of sequence data.

During the 60 days between the close dates for GenBank releases 224.0 and 225.0, the traditional portion of GenBank grew by 6,558,433,533 base pairs and by 1,411,748 sequence records. During that same period, 86,960 records were updated – an average of 24,978 records added or updated per day.
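The figures above can be checked with a few lines of arithmetic. This sketch uses only the numbers stated in this post and assumes the four record categories do not overlap, so that they can simply be summed to get the overall record count.

```python
# Record counts as stated for GenBank release 225.0.
records = {
    "traditional": 208_452_303,
    "WGS": 621_379_029,
    "TSA": 227_364_990,
    "TLS": 14_782_654,
}
# Assumes the categories are disjoint, so the total is a plain sum.
total_records = sum(records.values())

# Growth of the traditional portion between releases 224.0 and 225.0.
new_records = 1_411_748
updated_records = 86_960
days_between_releases = 60
per_day = (new_records + updated_records) / days_between_releases

print(f"{total_records:,}")  # 1,071,978,976
print(int(per_day))          # 24978
```

The sum is what puts the release over the 1 billion record mark, and the daily figure matches the stated average of 24,978 records added or updated per day.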


May 16 webinar: Improved Standalone BLAST database and programs: now with taxonomic information


Next Wednesday, May 16, 2018, we’ll show you how to download and use the latest version of the standalone BLAST databases, dbv5. You’ll learn how to use BLASTdbv5 and the new BLAST programs to limit searches to taxonomic groups and to retrieve sequences from the database by taxonomy.

Date and time: Wednesday, May 16, 2018 12:00 – 12:30 PM EDT

Register here: https://bit.ly/2qW7LLy

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
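For a sense of what a taxonomy-limited search might look like, here is a minimal sketch that builds a blastn command line without running it. It assumes a BLAST+ build that supports version 5 databases and the -taxids option. The database name `nt_v5`, the file names, and the helper function are placeholders for illustration, and 9606 is the NCBI Taxonomy ID for human.

```python
import shlex

def taxid_limited_blastn(query_fasta, database, taxids, out_path):
    """Build (but do not run) a blastn argument list limited to the given taxids."""
    return [
        "blastn",
        "-db", database,
        "-query", query_fasta,
        "-taxids", ",".join(str(t) for t in taxids),  # comma-separated taxonomy IDs
        "-outfmt", "6",                               # tabular output
        "-out", out_path,
    ]

# Placeholder inputs: a query FASTA file, a v5 database, and human (taxid 9606).
cmd = taxid_limited_blastn("query.fa", "nt_v5", [9606], "hits.tsv")
print(shlex.join(cmd))
# blastn -db nt_v5 -query query.fa -taxids 9606 -outfmt 6 -out hits.tsv
```

If BLAST+ is installed and the database has been downloaded, the list could be passed to `subprocess.run(cmd, check=True)` to run the search.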

May 9 NCBI Minute: Integrating PubChem into Your Chemistry Teaching


Next Wednesday, May 9, 2018, NCBI staff will show you how to use PubChem as a cheminformatics education resource. In addition to learning about tools and services for chemical information search, analysis, and download, you will also see examples of how instructors incorporate PubChem in Cheminformatics OLCC (On-Line Chemistry Courses), an intercollegiate hybrid course.

Date and time: Wednesday, May 9, 2018 12:00 – 12:30 PM EDT

Register here: https://bit.ly/2q5wtsF

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

March & April annotations in RefSeq: chimpanzee, human & more


The NCBI Eukaryotic Genome Annotation Pipeline has recently released new annotations in RefSeq for the following organisms:

  • Bombus impatiens (common eastern bumble bee)
  • Brachypodium distachyon (stiff brome)
  • Cimex lectularius (bed bug)
  • Desmodus rotundus (common vampire bat)
  • Halyomorpha halys (brown marmorated stink bug)
  • Homo sapiens (human, more information can be found here)
  • Lingula anatina (brachiopod)
  • Neophocaena asiaeorientalis asiaeorientalis (Yangtze finless porpoise)
  • Oncorhynchus tshawytscha (Chinook salmon)
  • Oryzias melastigma (Indian medaka)
  • Pan troglodytes (chimpanzee)
  • Physcomitrella patens (moss)
  • Populus trichocarpa (black cottonwood)
  • Rosa chinensis (China rose)
  • Selaginella moellendorffii (club-moss)
  • Terrapene mexicana triunguis (three-toed box turtle)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.

The NCBI BioCollections database links specimen vouchers and sequence records to home institutions


A paper in the January 2018 issue of Database describes the NCBI BioCollections database, a curated dataset of metadata for culture collections, museums, herbaria and other natural history collections connected to sequence records in GenBank. The BioCollections database was established to allow the association of specimen vouchers and related sequence records to their home institutions. This process also allows back-linking from the home institution for quick identification of all records originating from each collection.

The rapidly growing set of GenBank submissions frequently includes records that are derived from specimen vouchers. Correct identification of the specimens studied, along with a method to associate the sample with its institution, is critical to the outcome of related studies and analyses.

New repository records are added to the database if they are submitted to the International Nucleotide Sequence Database Collaboration (INSDC) along with sequence data. Each record now provides information about the institution that houses the collection, its standard Institution Code, its mailing address, and its associated webpage, if available.

The BioCollections database is maintained and curated by the Taxonomy group at NCBI.

Researchers: Now it’s easier to find the data you want in BioProject


We’ve improved BioProject to give you a better way to find all data from a specific project. We think you’ll love the new interface that lets you quickly choose the right BioProject with links to the data you want in other NCBI databases.

The updated BioProject browser makes it easier than ever to filter the data by a variety of attributes so you can quickly pick BioProjects that interest you.


Figure 1. The BioProject home page showing links to the BioProject browser. To use the new browser, click the ‘Browse by Project Attributes’ link below the search bar on any BioProject page or the ‘By Project attributes’ link on the BioProject home page.
