Month: January 2020

Adjust your scripts: new arrangement and naming for BLAST databases on the FTP site!

As we previously announced, the new default database version for BLAST+ is dbV5. To complete the transition to the new version, we will modify the directory structure and naming conventions of the BLAST FTP database directory. We expect to make this change around February 4th, 2020.

Here is a list of what we will change:

  1. All databases at the base of the blastdb directory (/blast/db/) will be the dbV5 versions.
  2. The version 5 databases will no longer have “_v5” as part of the archive or database names.
  3. We will move the dbV4 databases to a v4 subdirectory (/blast/db/v4/).
  4. The now-legacy dbV4 database archives will have “_v4” in their names (e.g., nr_v4.00.tar.gz); we will not rename the files within the archives.
  5. We will no longer update the dbV4 databases.
  6. We will freeze the cloud directory (/blast/db/cloud/) with no new entries after January 13, 2020.
  7. We will provide only nr, nt, swissprot, and pdbaa files in the FASTA directory (/blast/db/FASTA/).

Please adjust your scripts or procedures to accommodate the changes!
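
As a starting point for adjusting a download script, here is a minimal Python sketch that maps an archive name from the old layout to its location under the new one. It assumes that old archive names follow the pattern `<db>[_v5][.NN].tar.gz` (for example, nr_v5.00.tar.gz for dbV5 and nr.00.tar.gz for dbV4); that pattern and the function name `new_ftp_path` are illustrative assumptions, not part of the announcement, so check them against the archives you actually download.

```python
import re

# Assumed old naming pattern: <db>[_v5][.NN].tar.gz
# (e.g. "nr_v5.00.tar.gz" for dbV5, "nr.00.tar.gz" for dbV4).
_ARCHIVE = re.compile(r"^(?P<db>.+?)(?P<v5>_v5)?(?P<vol>\.\d+)?\.tar\.gz$")


def new_ftp_path(old_archive: str) -> str:
    """Map an old-layout archive name to its path in the new layout.

    Hypothetical helper: dbV5 archives lose "_v5" and stay in /blast/db/;
    dbV4 archives gain "_v4" and move to /blast/db/v4/.
    """
    match = _ARCHIVE.match(old_archive)
    if match is None:
        raise ValueError(f"not a recognized database archive: {old_archive!r}")
    db = match["db"]
    vol = match["vol"] or ""  # volume suffix such as ".00", if present
    if match["v5"]:
        return f"/blast/db/{db}{vol}.tar.gz"
    return f"/blast/db/v4/{db}_v4{vol}.tar.gz"


if __name__ == "__main__":
    for name in ("nr_v5.00.tar.gz", "nr.00.tar.gz", "swissprot_v5.tar.gz"):
        print(name, "->", new_ftp_path(name))
```

Note that only the archive names change for dbV4; the files inside those archives keep their original names, so any code that opens the extracted dbV4 files by name should not need the "_v4" suffix.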

If you have any questions or concerns, please contact us.

Improving the Display of Type Material in the NCBI TaxBrowser

Have you ever been confused by multiple taxonomic names for a single organism? You’re not alone! It’s one of the challenges in maintaining any biological database. Recently we updated the NCBI TaxBrowser to assist with this.

Let’s start with a brief word about how investigators name species in the first place. For any new species, the reporting author declares a “type.” They then deposit a specimen, or “type material,” in a publicly available biorepository. This type material is tied to the new species name and serves as a reference for future comparisons. Researchers can then use DNA sequences obtained from type material to identify other samples from the same species. NCBI currently uses such an approach to verify the taxonomic assignment of prokaryotic genomes.

Our Taxonomy group has been curating type material records in the Taxonomy database since 2013 using a common vocabulary accepted by our international partners (the INSDC). For example, the Entrez query “type material[prop]” in the Taxonomy database will return all type material at NCBI.
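
If you would rather run that query from a script, the sketch below builds the corresponding E-utilities ESearch URL. It only constructs the URL and makes no network request; the function name `type_material_search_url` and the `retmax` default are choices made for this illustration.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def type_material_search_url(retmax: int = 20) -> str:
    """Build an ESearch URL for the type material query in Taxonomy.

    Illustrative helper: retmax limits how many IDs the search returns.
    """
    params = {
        "db": "taxonomy",
        "term": "type material[prop]",
        "retmax": retmax,
    }
    # urlencode escapes the space and square brackets in the query term.
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(type_material_search_url())
```

Fetching the resulting URL returns the matching Taxonomy IDs; if you send many requests, follow NCBI's E-utilities usage guidelines.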

So what are the improvements to the TaxBrowser?

Continue reading “Improving the Display of Type Material in the NCBI TaxBrowser”

December 2019 RefSeq annotations: human, Tasmanian devil and more

In December, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

  • Anarrhichthys ocellatus (wolf-eel)
  • Apis florea (little honeybee)
  • Contarinia nasturtii (swede midge)
  • Cucumis sativus (cucumber)
  • Galleria mellonella (greater wax moth)
  • Homo sapiens (human)
  • Nasonia vitripennis (jewel wasp)
  • Oncorhynchus kisutch (coho salmon)
  • Oreochromis aureus (blue tilapia)
  • Piliocolobus tephrosceles (Ugandan red Colobus)
  • Sarcophilus harrisii (Tasmanian devil)
  • Xenopus tropicalis (tropical clawed frog)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.

A new version of IgBLAST (1.15.0) is here!

IgBLAST is a popular NCBI package for classifying and analyzing immunoglobulin (IG) and T cell receptor (TCR) variable domain sequences. We’ve released a new version (1.15.0) of IgBLAST with four improvements and bug fixes:

  1. Support for the new framework region 4 (FWR4) annotation feature in the standard alignment formats and AIRR format.
  2. Renamed the previous “-penalty” parameter to -V_penalty to be consistent with other IgBLAST penalty options.
  3. Restored constant internal BLAST search parameters for domain annotation (i.e., FWR/CDR) so that this process is not influenced by user-provided parameters.
  4. Corrected FWR/CDR annotations for certain mouse VK and rat VH germline genes.

IgBLAST 1.15 is available for download from the BLAST FTP area. See the manual on GitHub for information about setting up and running IgBLAST.
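
If you launch IgBLAST from a script, the parameter rename in item 2 means that existing command lines need a small update. The following Python sketch rewrites an argument list accordingly; the helper name `migrate_igblast_args` and the example file name are hypothetical, and the sketch assumes that arguments are passed as separate tokens.

```python
from typing import List


def migrate_igblast_args(args: List[str]) -> List[str]:
    """Return a copy of args with "-penalty" renamed to "-V_penalty".

    Hypothetical helper: only exact "-penalty" tokens are rewritten, so
    other options and all option values pass through unchanged.
    """
    return ["-V_penalty" if arg == "-penalty" else arg for arg in args]


if __name__ == "__main__":
    # "reads.fasta" is a placeholder query file for this example.
    old_cmd = ["igblastn", "-query", "reads.fasta", "-penalty", "-1"]
    print(migrate_igblast_args(old_cmd))
```

Matching whole tokens rather than substrings keeps the rewrite from touching option values or an argument list that has already been migrated.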

Dengue virus submission improvements now live!

During an outbreak of dengue fever, it’s critical that researchers submit viral genomic sequence data and that the data become available for analysis as soon as possible. You can now submit Dengue virus sequences to GenBank using a new workflow (Figure 1) in the Submission Portal designed to help make these data available quickly. The streamlined process, similar to the one described in a previous post for animal mitochondrial COX1 sequences, has an improved interface, enhanced validation, and automatic annotation that save you time and effort.


Figure 1. The Submission Portal pages for targeted sequence submission workflows. Top panel. The new submission page for entering the workflow. Bottom panel. Submission Portal page with the Dengue virus submission option selected (boxed in red).  The service has options for other targeted submissions including mitochondrial COX1 from multicellular animals (metazoa), ribosomal RNA (rRNA), rRNA-ITS, Influenza virus, and Norovirus sequences.

This update is part of a larger, ongoing effort to consolidate GenBank submissions in a central location. In addition to Dengue virus data, you can submit Influenza A, B, and C and Norovirus sequences, as well as other targeted sequences including mitochondrial COX1 genes from multicellular animals (metazoa), ribosomal RNA (rRNA), and rRNA-ITS, through the options on the Submission Portal. You should submit other types of sequence data, including other virus sequences, to GenBank using BankIt or tbl2asn.

You can use the search feature on the Submission Portal to find the appropriate submission tool for your data.

The new PubMed is here!

The updated interface includes a responsive design to improve the mobile experience as well as improved search capabilities using a best match sort.

PubMed includes the features you rely on for searching, saving, and sharing your results.

  • Access the same trusted database of more than 30 million citations for biomedical literature.
  • Use the default filters or customize the filter menu to meet your needs.
  • Save your search results to a file, email your results to yourself or a colleague, or send your results to a clipboard, collection, or your NCBI My Bibliography.
  • Save your search and create an email alert.

This version of PubMed will become the default in early 2020 and will eventually replace the legacy PubMed.  NLM will continue adding features and improving the user experience, ensuring that PubMed remains a trusted and accessible source of biomedical literature today and in the future.

We want to hear from you! What do you think of the new PubMed? Please submit your comments, questions, or concerns using the “Feedback” button available on each page of the new PubMed.

New in ClinVar – notifications for changes in the clinical interpretation of variants

We have added a new feature to ClinVar that allows you to follow a particular variant and be notified if the overall clinical interpretation in ClinVar changes, for example from a pathogenic category to a non-pathogenic one. This service will let you know about changes that may require you to update your analysis reports and contact your patients and ordering physicians. You can follow a variant from its variation page (Figure 1). Simply click the “Follow” button to begin receiving notifications.

Figure 1. A ClinVar variant page (VCV000541155.1) showing the ‘Follow’ button. The text on the button changes to ‘Following’ after you add the variant to your followed variants. Clicking ‘Following’ presents the option to ‘Unfollow’, which removes the variant from the followed list when clicked.

Continue reading “New in ClinVar – notifications for changes in the clinical interpretation of variants”

RefSeq Release 98 is public

RefSeq release 98 is accessible online, via FTP and through NCBI’s Entrez programming utilities, E-utilities.

This full release incorporates genomic, transcript, and protein data available as of January 6, 2020, and contains 223,560,051 records, including 161,133,441 proteins, 29,134,515 RNAs, and sequences from 98,406 organisms.

The release is provided in several directories as a complete dataset and as divided by logical groupings.

Read on for several important announcements.

Continue reading “RefSeq Release 98 is public”

NCBI on YouTube: Get the most out of NCBI resources with these videos

Check out the latest videos on YouTube to learn how to best use NCBI graphical viewers, SRA, PGAP, and other resources.

Genome Data Viewer: Analyzing Remote BAM Alignment Files and Other Tips

This video shows you how to upload remote BAM files, demonstrates handy viewer settings such as Pileup display options, and highlights the helpful tooltips in the Genome Data Viewer (GDV). There’s also a brief blog post on the same topic.

Continue reading “NCBI on YouTube: Get the most out of NCBI resources with these videos”

Novel coronavirus complete genome from the Wuhan outbreak now available in GenBank


Get rapid access to Wuhan coronavirus (2019-nCoV) sequence data from the current outbreak as it becomes available. We will continue to update the page with newly released data.

The complete annotated genome sequence of the novel coronavirus associated with the outbreak of pneumonia in Wuhan, China is now available from GenBank for free and easy access by the global biomedical community. Figure 1 shows the relationship of the Wuhan virus to selected coronaviruses.


Figure 1. Phylogenetic tree showing the relationship of Wuhan-Hu-1 (circled in red) to selected coronaviruses. Nucleotide alignment was done with MUSCLE 3.8. The phylogenetic tree was estimated with MrBayes 3.2.6 with parameters for GTR+G+I. The scale bar indicates estimated substitutions per site, and all branch support values are 99.3% or higher.

Continue reading “Novel coronavirus complete genome from the Wuhan outbreak now available in GenBank”