About NCBI Staff

The National Center for Biotechnology Information (NCBI), a division of the U.S. National Library of Medicine, provides access to scientific and biomedical databases and software tools for analyzing molecular data, and performs research in computational biology.

Proposed changes to AGP files for genome assemblies

If you are a consumer or producer of AGP (A Golden Path) files for genome assemblies, please read on.  We’d like your feedback on the proposed changes described here.

As you know, AGP files are used to describe the structure of certain genome assemblies. The AGP file format has not kept up with changes in sequencing technology or International Sequence Database Collaboration (INSDC) feature usage. NCBI is therefore proposing to extend the current AGP v2.0 specification to add new linkage evidence types and a gap type of “contamination” as detailed below and described in the AGP v2.1 proposed specification.

Proposed changes from AGP v2.0 to AGP v2.1:

  • Add ‘proximity-ligation’ and ‘pcr’ to the set of accepted linkage evidence values
  • Drop ‘strobe’ from the set of accepted linkage evidence values
  • Expand the definition of ‘paired-end’ linkage evidence to include ‘mate-pairs’ and molecular-barcode techniques
  • Add a gap-type of ‘contamination’
    • definition: a gap inserted in place of foreign sequence to maintain the coordinates
    • usage: treated as linked to preserve the original scaffold but with linkage evidence ‘unspecified’
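The proposed changes can be read as a small update to two controlled vocabularies plus one usage rule. The following is a minimal Python sketch of that reading, not an official validator. The function names are hypothetical, the value spellings are copied from this post (the tokens in the AGP specification itself may differ, for example by using underscores), and the "yes" linkage flag is an assumption about how a linked gap is recorded.

```python
# Values as spelled in this post; check the AGP v2.1 specification for the
# exact tokens before using these in a real validator.
ADDED_EVIDENCE = {"proximity-ligation", "pcr"}
DROPPED_EVIDENCE = {"strobe"}
ADDED_GAP_TYPES = {"contamination"}


def upgrade_vocabulary(v20_evidence, v20_gap_types):
    """Apply the proposed v2.1 changes to caller-supplied v2.0 value sets."""
    evidence = (set(v20_evidence) - DROPPED_EVIDENCE) | ADDED_EVIDENCE
    gap_types = set(v20_gap_types) | ADDED_GAP_TYPES
    return evidence, gap_types


def contamination_gap_ok(gap_type, linkage, evidence):
    """Check the proposed usage rule for contamination gaps.

    A contamination gap is treated as linked, with linkage evidence
    'unspecified'. Other gap types are not checked here.
    """
    if gap_type != "contamination":
        return True
    return linkage == "yes" and evidence == "unspecified"
```

Passing the v2.0 sets in as arguments keeps the sketch from hard-coding the full v2.0 vocabulary, which this post does not list.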


April 16 – May 7: Comment period
May 8 – May 10: AGP v2.1 proposal finalized
May 12 – May 16: AGP v2.1 approved at the annual INSDC meeting
Summer 2019: NCBI begins accepting the new linkage-evidence types and using the contamination gap type

Note: NCBI would continue to accept genome submissions in AGP v2.0 format.

We are seeking your input on these proposed changes. Please comment on this post or write to suggest@ncbi.nlm.nih.gov if you have any comments or suggestions.

Recent enhancements to BLAST+ (2.9.0): built-in taxonomy and access to proteins from the Pathogen Detection Project

We have made some recent improvements to the BLAST+ applications that take full advantage of the version 5 BLAST databases (BLASTDBv5), which include built-in taxonomic information for sequences and no longer rely on the integer sequence identifiers (gi numbers).

With the latest version of BLAST, you can now:

  • Limit your searches by taxonomy using information built into the BLAST databases
  • Limit searches more efficiently when using a list of sequence accessions
  • Retrieve sequences by taxonomy from the BLAST database with blastdbcmd
  • Search PDB proteins with identifiers up to four characters long. You can read more about PDB changes in our Structure database documentation.

Only BLASTDBv5 supports these new features. These new BLAST databases also contain accession-based (gi-less) proteins from important high-throughput genome sequencing projects that are not available in the previous version of the BLAST databases. These include proteins from the annotation of assemblies from large-scale pathogen surveillance efforts that are part of the NCBI Pathogen Detection Project, as well as those coming from large-scale metagenomics surveillance. With the v5 databases, you can perform BLAST searches of all proteins from these assemblies to find the proteins of interest.
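As an illustration of the taxonomy features, here is a small Python sketch that assembles BLAST+ command lines without running them. It assumes the `-taxids` option of `blastp` and `blastdbcmd` as it is available with version 5 databases. The helper names, file names, and database name are hypothetical placeholders, and you should confirm the options against `blastp -help` for your installed version.

```python
def _taxid_arg(taxids):
    """Join taxonomy IDs into the comma-separated form used on the command line."""
    return ",".join(str(t) for t in taxids)


def blastp_by_taxonomy(query, db, taxids, out):
    """Build a blastp command that limits the search to the given taxonomy IDs."""
    return ["blastp", "-db", db, "-query", query,
            "-taxids", _taxid_arg(taxids), "-out", out]


def blastdbcmd_by_taxonomy(db, taxids, out):
    """Build a blastdbcmd command that retrieves sequences for the given taxonomy IDs."""
    return ["blastdbcmd", "-db", db,
            "-taxids", _taxid_arg(taxids), "-out", out]


# Example: restrict a protein search to human (taxonomy ID 9606).
# The list can be passed to subprocess.run(cmd, check=True) on a machine
# where BLAST+ and a version 5 database are installed.
cmd = blastp_by_taxonomy("query.fa", "nr_v5", [9606], "hits.txt")
```

Building the command as a list of arguments rather than a single string avoids shell quoting problems when it is later handed to `subprocess.run`.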

For more information on the new database version, BLASTDBv5 (download), see the previous NCBI Insights article and the recording of our webinar. We will continue to update the BLAST databases in their current version (BLASTDBv4) until September 2019.

NCBI on YouTube: Request access to controlled data in dbGaP

Do you need access to controlled data in the database of Genotypes and Phenotypes (dbGaP)? This short video will show you how to request data today!

dbGaP archives and distributes the data and results from studies that have investigated the interaction of genotype and phenotype in humans. Responsible stewardship of controlled-access data subject to the NIH GDS Policy is shared among the NIH, the investigators approved to access the data, and the investigators’ institutions.

Conserved Domain Database (CDD) 3.17 is now available

The latest version of the Conserved Domain Database contains 3,272 new or updated NCBI-curated domains and now mirrors Pfam version 31 as well as models from NCBIfams, a collection of protein family hidden Markov models (HMMs) for improving bacterial genome annotation. A fine-grained classification of the major facilitator superfamily has also been added. You can find this updated content on the CDD FTP site.

Genome Workbench 2.13.0 now available

The Genome Workbench team is proud to present version 2.13.0, with the latest usability improvements and bug fixes.  See the full list of changes in the Genome Workbench release notes.

Some of the improvements include:

  • New SNP tracks using the most recent dbSNP release
  • Improved alignment statistics table to correctly account for introns
  • Alignment tooltips report introns separately from gaps
  • Fixes for several interface issues to make MAFFT and BLAST alignments easier to use

Genome Workbench is an integrated application for viewing and analyzing sequences. Genome Workbench can be used to browse and import data from NCBI and combine it with your own private data.

Easily configure gene feature modes in NCBI’s graphical sequence displays

Did you know you can easily switch between gene feature modes in NCBI’s graphical sequence displays like Sequence Viewer and GDV? You may need to configure gene tracks to suit your needs if, for example, you need to conduct analyses or present quality images.

Use one of two easy access points to the gene configuration menu to show the gene bar, the single-line gene model, or the expanded modes that show transcripts and CDSs.

New BLAST results page in NCBI LABS

NCBI Labs is showcasing an experiment to improve the BLAST results page. The goal is to provide a more useful BLAST output that better meets your needs and integrates with your workflows. The new results incorporate feedback from surveys and interviews with BLAST users. We think you’ll find the new results are more compact, easier to navigate, and expose useful formatting and other features that you may not have known about.

The results page has organism, percent identity, and E value filters in plain view and easily accessible. The Descriptions and Graphic Summary are on separate tabs, and the popular taxonomy view is on a fourth tab rather than on a separate web page. These changes along with other enhancements make the display more concise and easier to navigate. The figure below shows the new output format.

Figure 1. The new BLAST results with filters directly on the page and a more concise tabbed output that includes the taxonomy report. The Back to Traditional Results Page link reloads the results in the standard format.

Advanced search comes to PubMed Labs

Advanced Search is now available in PubMed Labs!

Figure 1. PubMed Advanced Search.

The tools included with Advanced Search help you:

  • Search for terms in a specific field (such as Author)
  • Combine searches and build large, complex search strings
  • See how your query was translated by PubMed
  • Compare number of results for different queries
  • Download your search history
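To make the first two points concrete, here is a minimal Python sketch that composes a fielded, combined query string of the kind Advanced Search helps you build. It assumes the usual PubMed conventions of square-bracket field tags such as `[Author]` and the Boolean operators AND and OR. The helper names and the example terms are hypothetical.

```python
def fielded(term, field):
    """Attach a field tag to a search term, e.g. 'smith j[Author]'."""
    return f"{term}[{field}]"


def combine(operator, *parts):
    """Join query parts with a Boolean operator and wrap them in parentheses."""
    joined = f" {operator} ".join(parts)
    return f"({joined})"


# An author search combined with either of two title terms.
query = combine(
    "AND",
    fielded("smith j", "Author"),
    combine("OR", fielded("genome", "Title"), fielded("assembly", "Title")),
)
```

Wrapping every combination in parentheses keeps the grouping explicit when searches are nested into larger strings.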

NCBI at Experimental Biology next week (Apr 6-9) in Orlando

We’ll be exhibiting next week at the 2019 Experimental Biology conference in Orlando. Stop by the NCBI booth (#446) (April 7-9, 9 AM – 4 PM) to meet NCBI staff, see live demonstrations of NCBI molecular and literature databases and tools, ask questions, and provide feedback. We’ll also be showcasing important updates to BLAST, PubChem, and PubMed!