RefSeq release 88 available


RefSeq release 88 is now available online, via FTP, and through NCBI’s programming utilities. This full release incorporates genomic, transcript, and protein data available as of May 14, 2018. It contains 160,224,355 records, including 110,333,800 proteins, 22,461,378 RNAs, and sequences from 79,448 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.

This release incorporates dbSNP release 151, which nearly doubles the number of SNPs annotated on the human GRCh38 genome, with matching increases in the size of the human nucleotide flatfile (.gbff) records.

Starting in November 2018, SNP variation features will no longer be included in RefSeq genome assembly records. The RefSeq release notes have more information about this change.

IgBLAST 1.9.0 release includes AIRR rearrangement reporting


IgBLAST 1.9.0 supports the Adaptive Immune Receptor Repertoire (AIRR) standard for reporting sequence analysis results. AIRR-format output is available in web IgBLAST as well as in the standalone IgBLAST tool, through the -outfmt 19 option.
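Because AIRR rearrangement output is a tab-delimited table with a header row, it can be read with ordinary TSV tooling. The sketch below is illustrative only: it assumes the AIRR rearrangement column names `sequence_id`, `productive`, `v_call`, and `j_call`, assumes booleans are written as `T`/`F`, and uses a made-up three-row sample rather than real IgBLAST output. The function name `productive_v_usage` is hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in AIRR rearrangement TSV layout. A real file written by
# standalone IgBLAST with -outfmt 19 contains many more columns.
SAMPLE = (
    "sequence_id\tproductive\tv_call\tj_call\n"
    "read1\tT\tIGHV1-2*02\tIGHJ4*02\n"
    "read2\tF\tIGHV3-23*01\tIGHJ4*02\n"
    "read3\tT\tIGHV1-2*02,IGHV1-2*04\tIGHJ6*02\n"
)


def productive_v_usage(handle):
    """Count V gene calls among productive rearrangements in an AIRR TSV."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter()
    for row in reader:
        # Assumed AIRR TSV convention: booleans are encoded as T/F.
        if row.get("productive") != "T":
            continue
        # v_call may list several equally good calls separated by commas;
        # this sketch simply keeps the first one.
        first_call = row["v_call"].split(",")[0]
        counts[first_call] += 1
    return counts


usage = productive_v_usage(io.StringIO(SAMPLE))
print(dict(usage))  # {'IGHV1-2*02': 2}
```

To use it on real results, open the file IgBLAST wrote (for example, `open("results.tsv", newline="")`, a placeholder name) and pass the handle in place of the `io.StringIO` sample.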

Supporting this new schema in IgBLAST will benefit the growing number of repertoire studies that use next-generation sequencing technology to generate very large sets of Ig/T-cell receptor rearrangement data.

IgBLAST facilitates the analysis of immunoglobulin and T-cell receptor variable domain sequences. Get IgBLAST on FTP. A new manual is on GitHub.

Test drive a new sequence search experience at NCBI Labs


We know it’s not always easy to find the sequence data you’re after at NCBI. Maybe you’re not an expert at constructing queries, and you end up with no results or too many results. Or maybe you’re an Entrez wizard, but building a query full of Boolean operators and filters seems like overkill when you could just type a short natural-language query, as you’re used to doing in Google. The next time you search for a gene, transcript, or genome assembly for a given organism, try the new search experience we’re piloting in NCBI Labs.

In NCBI Labs, you can now search for sequences using natural language and get back NCBI’s highest-quality matching sequence data.


Figure 1. The new interface for specified transcript search.

The improved search experience now available in NCBI Labs addresses three types of queries that commonly fail in searches at NCBI: organism-gene (e.g., human BRCA1), organism-transcript (e.g., mouse p53 transcripts), and organism-assembly (e.g., dog reference genome). For each of these query types, NCBI Labs now returns NCBI’s highest-quality sequence sets, or reference and representative assemblies, in an easy-to-view panel.

Example queries are shown below to get you started.


GenBank release 225: Over 1 billion sequence records stored!


GenBank release 225.0 (4/14/2018) has 208,452,303 traditional records (including non-bulk-oriented TSA) containing 260,189,141,631 base pairs of sequence data. In addition, there are 621,379,029 WGS records containing 2,784,740,996,536 base pairs of sequence data, 227,364,990 TSA records containing 205,232,396,043 base pairs of sequence data, and 14,782,654 TLS records containing 5,612,769,448 base pairs of sequence data.

During the 60 days between the close dates for GenBank releases 224.0 and 225.0, the traditional portion of GenBank grew by 6,558,433,533 base pairs and by 1,411,748 sequence records. During that same period, 86,960 records were updated – an average of 24,978 records added or updated per day.
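The headline figure and the per-day average can be checked directly from the numbers quoted above. This sketch assumes that the traditional, WGS, bulk TSA, and TLS record counts do not overlap (the traditional count already includes non-bulk TSA), and that "records added or updated" means new traditional records plus updated records over the 60-day window.

```python
# Record counts quoted in the GenBank release 225.0 announcement.
traditional = 208_452_303  # includes non-bulk-oriented TSA
wgs = 621_379_029
tsa = 227_364_990
tls = 14_782_654

# Assumption: the four categories are disjoint, so they can be summed.
total_records = traditional + wgs + tsa + tls

# Growth between the close dates of releases 224.0 and 225.0.
days = 60
records_added = 1_411_748
records_updated = 86_960
per_day = round((records_added + records_updated) / days)

print(f"{total_records:,} records in total")        # 1,071,978,976 records in total
print(f"{per_day:,} records added or updated/day")  # 24,978 records added or updated/day
```

The total of roughly 1.07 billion records is what supports the "over 1 billion" headline.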


May 16 webinar: Improved Standalone BLAST database and programs: now with taxonomic information


Next Wednesday, May 16, 2018, we’ll show you how to download and use the latest standalone BLAST databases, version 5 (BLASTdbv5). You’ll learn how to use BLASTdbv5 and the new BLAST programs to limit searches to taxonomic groups and to retrieve sequences from the database by taxonomy.
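As a rough preview of taxonomy-limited searching, the sketch below assembles BLAST+ command lines and runs them only if the executables are installed. It assumes a BLAST+ release that supports version 5 databases, where `blastn` accepts `-taxids` and `blastdbcmd` accepts `-taxids` with `-outfmt %f` for FASTA output; check your installed version's `-help` before relying on it. The database name `mydb_v5`, the file `query.fa`, and the helper function names are hypothetical placeholders. In the versions we are aware of, `-taxids` matches the taxonomy IDs stored with each sequence, so a higher-level group may first need to be expanded to species-level IDs (for example, with the `get_species_taxids.sh` script distributed with BLAST+).

```python
import shlex
import shutil
import subprocess
from typing import List, Optional, Sequence


def taxid_arg(taxids: Sequence[int]) -> str:
    """Format taxonomy IDs as the comma-separated list BLAST+ expects."""
    return ",".join(str(t) for t in taxids)


def blastn_cmd(query: str, db: str, taxids: Sequence[int]) -> List[str]:
    """Hypothetical helper: blastn search limited to the given taxids (v5 database)."""
    return ["blastn", "-query", query, "-db", db,
            "-taxids", taxid_arg(taxids), "-outfmt", "6"]


def blastdbcmd_cmd(db: str, taxids: Sequence[int]) -> List[str]:
    """Hypothetical helper: dump all sequences for the given taxids as FASTA."""
    return ["blastdbcmd", "-db", db, "-taxids", taxid_arg(taxids), "-outfmt", "%f"]


def run_if_available(cmd: List[str]) -> Optional[str]:
    """Run the command only if its executable is on PATH; return stdout."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # 9606 = Homo sapiens, 10090 = Mus musculus; "mydb_v5" is a placeholder name.
    print(shlex.join(blastn_cmd("query.fa", "mydb_v5", [9606, 10090])))
    print(shlex.join(blastdbcmd_cmd("mydb_v5", [9606])))
```

Building the argument list separately from running it makes the exact command easy to inspect before pointing it at a large database.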

Date and time: Wednesday, May 16, 2018 12:00 – 12:30 PM EDT

Register here: https://bit.ly/2qW7LLy

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

May 9 NCBI Minute: Integrating PubChem into Your Chemistry Teaching


Next Wednesday, May 9, 2018, NCBI staff will show you how to use PubChem as a cheminformatics education resource. In addition to learning about tools and services for chemical information search, analysis, and download, you will also see examples of how instructors incorporate PubChem in Cheminformatics OLCC (On-Line Chemistry Courses), an intercollegiate hybrid course.

Date and time: Wednesday, May 9, 2018 12:00 – 12:30 PM EDT

Register here: https://bit.ly/2q5wtsF

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

March & April annotations in RefSeq: chimpanzee, human & more


The NCBI Eukaryotic Genome Annotation Pipeline has recently released new annotations in RefSeq for the following organisms:

  • Bombus impatiens (common eastern bumble bee)
  • Brachypodium distachyon (stiff brome)
  • Cimex lectularius (bed bug)
  • Desmodus rotundus (common vampire bat)
  • Halyomorpha halys (brown marmorated stink bug)
  • Homo sapiens (human, more information can be found here)
  • Lingula anatina (brachiopod)
  • Neophocaena asiaeorientalis asiaeorientalis (Yangtze finless porpoise)
  • Oncorhynchus tshawytscha (Chinook salmon)
  • Oryzias melastigma (Indian medaka)
  • Pan troglodytes (chimpanzee)
  • Physcomitrella patens (moss)
  • Populus trichocarpa (black cottonwood)
  • Rosa chinensis (China rose)
  • Selaginella moellendorffii (club-moss)
  • Terrapene mexicana triunguis (three-toed box turtle)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.