November 8 NCBI Minute: New API keys for better E-utilities & EDirect access to NCBI data


On Wednesday, November 8, 2017, we will present a webinar on API keys for E-utilities. In this webinar, you’ll learn how to get and start using your API key with the E-utilities and the command line EDirect programs.

Date and time: Wednesday, November 8, 2017 12:00-12:30PM EST

After registering, you will receive a confirmation email with information about attending the webinar. After the live presentation, the webinar will be uploaded to the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

New API Keys for the E-utilities


If you regularly use the E-utilities API, we have important news for you: NCBI is now providing API keys for the E-utilities! After May 1, 2018, NCBI will limit your access to the E-utilities unless you have one of these keys. Obtaining an API key is quick and simple, and will allow you to access NCBI data faster. If you don’t have an API key, E-utilities will still work, but you may be limited to fewer requests than allowed with an API key.

What is an API key?

An API key is a unique string that you include in your HTTP requests that identifies you to NCBI servers. Think of the API key as a ‘turbocharger’ that lets you get more data, faster, from NCBI.
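As a rough illustration of what "including the key in your HTTP requests" can look like, here is a minimal Python sketch that builds an ESearch URL with the key attached as a query parameter. The base URL and the `api_key` parameter name are taken from the public E-utilities documentation rather than from this post, and the helper name `build_esearch_url` and the placeholder key are hypothetical.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities service (assumed from the public E-utilities docs).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, api_key=None):
    """Return an ESearch URL, adding the api_key parameter only if a key is given."""
    params = {"db": db, "term": term}
    if api_key:
        params["api_key"] = api_key
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# "YOUR_API_KEY" is a placeholder; substitute the key from your NCBI account.
print(build_esearch_url("pubmed", "influenza", api_key="YOUR_API_KEY"))
```

The sketch only constructs the URL; sending the request is left to whichever HTTP client you already use.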


Variation feature changes in NCBI Reference Sequences coming in 2018


Starting in March 2018, SNP variation features will no longer be in RefSeq genome assembly records – chromosome and contig records with NC_, NT_, NW_ and AC_ accession prefixes. This change affects both the ASN.1 and flatfile records. Because the number of variants is already enormous and still growing, removing SNP features from these large genomic records will significantly reduce the size of RefSeq FTP files and make downloading and processing easier. We will continue to include SNPs on NG_-prefixed genomic records, and transcript (NM_, NR_, XM_, XR_) and protein (NP_, XP_, YP_) sequences.
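If you process RefSeq records in bulk, a small helper like the one below may help you decide which records to expect SNP features on after the change. It is only a sketch based on the accession prefixes listed above; the function name and the example accessions are illustrative, and it makes no claim about how NCBI implements the change.

```python
# Prefixes of genome assembly records that lose SNP features (from the announcement).
SNP_FEATURES_REMOVED = ("NC_", "NT_", "NW_", "AC_")
# Prefixes of records that continue to carry SNP features (from the announcement).
SNP_FEATURES_KEPT = ("NG_", "NM_", "NR_", "XM_", "XR_", "NP_", "XP_", "YP_")


def snp_features_expected(accession):
    """Return False, True, or None (prefix not covered by the announcement)."""
    acc = accession.strip().upper()
    if acc.startswith(SNP_FEATURES_REMOVED):
        return False
    if acc.startswith(SNP_FEATURES_KEPT):
        return True
    return None


for acc in ("NC_000001.11", "NM_000546.6", "AB000001.1"):
    print(acc, snp_features_expected(acc))
```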

Reminder: As of September 2017, NCBI has stopped accepting submissions for non-human SNPs in dbSNP and dbVar. RefSeq flatfiles will stop presenting non-human variant data in November 2017.

Subscribe to the refseq-announce listserv for regular updates on RefSeq.

BLAST+ 2.7.1 now available


In the new version (2.7.1) of the BLAST+ executables, blastdbcmd can look up taxonomic names (e.g., scientific or common name) faster. We have also made some low-level improvements that allow BLAST to multithread more efficiently, especially when available memory is not sufficient for the database.

Note: Some Linux and Mac OS X users may find that they need to increase the number of open file descriptors allowed for a process. You can change this limit with “ulimit -n” (under bash). We suggest setting the limit to at least 1024.
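If you launch BLAST+ from a Python script rather than from a bash session, the same limit can be inspected and raised with the standard-library `resource` module (Unix only). The sketch below is one possible approach, not part of BLAST+ itself; the function name `ensure_open_file_limit` is hypothetical, and the new limit applies only to the Python process and the child processes it starts.

```python
import resource


def ensure_open_file_limit(minimum=1024):
    """Raise the soft open-file limit to at least `minimum`, capped by the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY
    # Nothing to do if the soft limit is already unlimited or high enough.
    if soft == unlimited or soft >= minimum:
        return soft
    # An unprivileged process cannot raise the soft limit above the hard limit.
    target = minimum if hard == unlimited else min(minimum, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


print("Soft open-file limit:", ensure_open_file_limit(1024))
```

Call the function before starting BLAST with `subprocess`, so that the child process inherits the raised limit.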

See the BLAST+ release notes for more information.

IgBLAST 1.8.0 release


A new version of IgBLAST is now available on FTP, along with a new manual on GitHub. This release has the following improvements:

  1. The igblastn executable can now multi-thread much more efficiently for large sets of queries. The default number of threads is now four, but can be changed with the -num_threads option.
  2. The igblastn executable can now take an SRA accession as the query input. The search runs on the local machine, but the queries are retrieved from the SRA repository at NCBI. To enable this, use the -sra option instead of the -query option.
  3. The default nucleotide mismatch penalty values for finding D and J genes are now lower (changed from -4 to -2 and from -3 to -2, respectively). This improves accuracy in finding the best D and J gene hits for moderately mutated sequences.
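To show how the -num_threads, -query and -sra options mentioned above fit together, here is a small Python sketch that assembles an igblastn argument list. Only those three flags come from this post; the helper name, the placeholder accession `SRR0000000`, and the `extra_args` pass-through are assumptions for illustration. A real run needs further options (such as the germline databases) that are described in the IgBLAST manual and are not shown here.

```python
def build_igblastn_command(query=None, sra=None, num_threads=4, extra_args=()):
    """Build an igblastn argument list using either -query or -sra, never both."""
    if (query is None) == (sra is None):
        raise ValueError("Provide exactly one of query or sra")
    cmd = ["igblastn"]
    # -sra takes an SRA accession; -query takes a local sequence file.
    cmd += ["-sra", sra] if sra is not None else ["-query", query]
    cmd += ["-num_threads", str(num_threads)]
    # Any other options (not covered in this post) are passed through unchanged.
    cmd += list(extra_args)
    return cmd


# "SRR0000000" is a placeholder accession, not a real dataset.
print(" ".join(build_igblastn_command(sra="SRR0000000", num_threads=8)))
```

The resulting list can be handed to `subprocess.run` once the remaining required options have been added through `extra_args`.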

Our web IgBLAST page also uses the new default nucleotide mismatch penalty values (i.e., -2 for finding both D and J genes).

IgBLAST facilitates the analysis of immunoglobulin and T cell receptor variable domain sequences.

New Influenza Virus Submission Wizard Makes Flu Sequence Submissions Easier


NCBI now offers a flu sequence submission wizard that makes submissions easier and will provide you with accession numbers sooner. To get started, sign in to NCBI, go to the Submission Portal and choose the link for “Ribosomal RNA (rRNA), rRNA-ITS or Influenza sequences” from the GenBank section.

Figure: Submission Portal page with the GenBank link


November 1 webinar: Introducing the Genome Data Viewer (GDV)


On Wednesday, November 1, 2017, we will present a webinar on GDV, NCBI’s full-featured genome browser. In this webinar, you’ll learn how to explore and analyze sequences and annotations for eukaryotic RefSeq genome assemblies. We’ll show you how to:

  • Search across the entire assembly for genes, products and other markers, or jump to a specific position or range.
  • Display any of seven preselected track sets highlighting various aspects of the assembly, or create and load your own custom track sets from your NCBI account.
  • Load and display submitted alignment data from NCBI’s GEO or SRA.
  • Upload your own annotation and variant data.
  • Display BLAST or Primer-BLAST results on the assembly in the browser.

Date and time: Wednesday, November 1, 2017 12:00-12:30PM EDT

After registering, you will receive a confirmation email with information about attending the webinar. After the live presentation, the webinar will be uploaded to the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

NCBI’s Genome Data Viewer (GDV) to replace Map Viewer


The Genome Data Viewer (GDV) is now the main genome browser at NCBI, replacing the Map Viewer, our original genome browser. GDV is a modern genome browser with essential improvements over Map Viewer. These include sequence-level details and an automated update process that keeps up with the rapid pace of genome sequencing, assembly and annotation.


The Genome Data Viewer homepage (top panel) and browser view (bottom panel)


GenBank release 222.0 is available via FTP, Entrez and BLAST


GenBank release 222.0 (10/14/2017) has 203,953,682 traditional records (including non-bulk-oriented TSA) containing 244,914,705,468 base pairs of sequence data. In addition, there are 508,825,331 WGS records containing 2,318,156,361,999 base pairs of sequence data, 192,754,804 TSA records containing 172,909,268,535 base pairs of sequence data, and 9,479,460 TLS records containing 2,993,818,315 base pairs of sequence data.


Sequence Viewer 3.23 now available


Sequence Viewer 3.23 has several new features, improvements and bug fixes, including performance optimization for alignment renderings and improved tooltips in uploaded VCF files. For a full list of changes, see the Sequence Viewer release notes.

Sequence Viewer is a graphical view of sequences and color-coded annotations on regions of sequences stored in the Nucleotide and Protein databases.