Sequence Viewer 3.23 now available


Sequence Viewer 3.23 has several new features, improvements and bug fixes, including performance optimization for alignment renderings and improved tooltips in uploaded VCF files. For a full list of changes, see the Sequence Viewer release notes.

Sequence Viewer is a graphical view of sequences and color-coded annotations on regions of sequences stored in the Nucleotide and Protein databases.

CNVs from Exome Aggregation Consortium (ExAC) added to dbVar in September 2017 data release


Copy number variants (CNVs) from ExAC’s publication are now available at dbVar as nstd151. The data include approximately 50,000 CNV regions identified from 60,000 human exomes, providing a deep survey of common and rare copy number variation affecting protein-coding sequences in the human genome.

dbVar provides FTP files in VCF, GVF, and CSV formats. The files include placements on GRCh37 as well as remapped placements on GRCh38. Tutorials for working with the different formats are also available.
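As a rough illustration of working with the VCF format, here is a minimal sketch that pulls the coordinates and variant type out of one structural-variant record. It assumes the record uses the standard VCF `SVTYPE` and `END` INFO keys; the function name `parse_sv_record` and the sample line (including its identifier) are made up for illustration and are not taken from the nstd151 files.

```python
def parse_sv_record(line):
    """Parse one tab-delimited VCF data line into a small dict.

    Assumes the 8 fixed VCF columns are present and that structural
    variants carry SVTYPE and END in the INFO column (hypothetical input).
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, alt = fields[0], int(fields[1]), fields[2], fields[4]

    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info = {}
    for item in fields[7].split(";"):
        key, _, value = item.partition("=")
        info[key] = value if value else True

    # Fall back to the start position when END is absent.
    end = int(info["END"]) if "END" in info else pos
    return {
        "chrom": chrom,
        "start": pos,
        "end": end,
        "id": var_id,
        "alt": alt,
        "svtype": info.get("SVTYPE"),
    }


# Hypothetical record, for illustration only.
example = "1\t10001\texample_id\tN\t<DEL>\t.\t.\tSVTYPE=DEL;END=20000;IMPRECISE"
record = parse_sv_record(example)
```

For real analyses, an established VCF library is a safer choice than hand-rolled parsing, since it also handles headers and escaping.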

Follow the dbVar RSS feed for information on monthly releases.

Updated HIV-1 interaction datasets in Gene


We recently updated the HIV-1 interaction datasets in Gene with data provided by the Southern Research Institute (SRI).

The protein interactions dataset now has:

  • 8,005 interactions,
  • 16,215 interaction descriptions,
  • 3,859 proteins encoded by 3,757 human genes,
  • and 6,822 publications.

The replication interactions dataset now has:

  • 1,595 interactions,
  • 1,854 interaction descriptions,
  • 1,583 proteins encoded by 1,583 human genes,
  • and 229 publications.

Data are also available at the RefSeq HIV-1 website and the GeneRIF FTP site.

GRAF, a new tool for finding duplicates and closely related samples in large genomic datasets


Genome-wide association studies (GWAS) usually rely on the assumption that different samples aren’t from closely related individuals. If you’re using combined datasets that have been genotyped on different platforms, though, how do you detect duplicates and close relatives?

The dbGaP team at NCBI developed a new software tool and rapid statistical method called Genetic Relationship and Fingerprinting (GRAF) to do exactly that. At NCBI, we use GRAF as a quality assurance tool in dbGaP data processing. We’re presenting this tool publicly so any researcher can check the quality of their own data.

GRAF uses two statistical metrics to determine subject relationships directly from the observed genotypes, without estimating probabilities of identity by descent (IBD) or kinship coefficients. It then compares the predicted relationships with those reported in the pedigree files. Please see the PLOS ONE article published in July 2017 for a detailed description of GRAF.
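To give a feel for how relationships can be read directly from genotypes, here is a minimal sketch that computes two genotype mismatch rates for a pair of samples. It assumes the two metrics are a homozygous genotype mismatch rate (HGMR) and an all genotype mismatch rate (AGMR) of the kind described in the GRAF article; the function name, the 0/1/2 genotype encoding, and the example data are illustrative assumptions, not GRAF's actual code, and no relationship thresholds are implied.

```python
def mismatch_rates(g1, g2):
    """Return (hgmr, agmr) for two samples genotyped on the same SNPs.

    Genotypes are encoded as the count of alternate alleles (0, 1, 2),
    with None for a missing call (an assumed encoding for this sketch).
    A rate is None when no SNP qualifies for it.
    """
    # SNPs where both samples have a call.
    both_called = [(a, b) for a, b in zip(g1, g2)
                   if a is not None and b is not None]
    # Subset where both samples are homozygous (0 or 2).
    both_hom = [(a, b) for a, b in both_called if a != 1 and b != 1]

    def rate(pairs):
        if not pairs:
            return None
        return sum(a != b for a, b in pairs) / len(pairs)

    return rate(both_hom), rate(both_called)


# Made-up genotypes for two samples across five SNPs.
sample_a = [0, 1, 2, 2, None]
sample_b = [0, 2, 0, 2, 1]
hgmr, agmr = mismatch_rates(sample_a, sample_b)
```

The intuition: duplicate samples should show both rates near zero, while a parent and child share at least one allele at every site, so (apart from genotyping errors) they rarely carry opposite homozygous genotypes even though their overall genotypes often differ.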

A recent update to GRAF adds the ability to determine subject ancestries. For more information on this addition, visit Poster #1322T, “Quickly determining subject ancestries in large datasets using genotypes of dbGaP fingerprint SNPs”, on Thursday, October 19th from 3-4 in the Exhibit Hall at ASHG.

Genome Workbench v2.12.5


Recent updates to Genome Workbench include a new navigation tutorial for Graphical Sequence View, and various bug fixes and improvements. You can see the full list of changes in the Genome Workbench release notes.

Genome Workbench is an integrated application for viewing and analyzing sequences. Genome Workbench can be used to browse data in GenBank and combine this data with your own private data.

October 11 NCBI Minute: Introducing the New RefSeq Functional Elements Project


On October 11, 2017, NCBI will present a webinar on RefSeq Functional Elements. This NCBI Minute will introduce you to this project and its scope, describe how functional elements are curated and displayed, demonstrate how to access the data, and provide information on the current progress of the project.

Date and time: Wed, Oct 11, 2017 12:00 PM – 12:30 PM EDT

After registering, you will receive a confirmation email with information about attending the webinar. After the live presentation, the webinar will be uploaded to the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

The new RefSeq Functional Elements project is an expansion of the NCBI RefSeq project to include non-genic functional genomic regions in human and mouse that have been experimentally validated and described in the scientific literature.

October 4th NCBI Minute: Create, link and share your bibliography (PubMed & ORCID)


On October 4, 2017, NCBI staff will present a webinar on author disambiguation and the advantages of using an ORCID ID.

Disambiguating common author names is tough in any field, but if your published research is cited in PubMed, we can help you find your citations, create a bibliography, and share your publication list with others.

In this webinar, we’ll also talk about the advantage of quickly registering for a free, unique identifier that will remain constant – even if your name changes.

Date & time: Wednesday, October 4, 2017 12:00 PM – 12:30 PM EDT

After registering, you will receive a confirmation email with information about attending the webinar. After the live presentation, the webinar will be uploaded to the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

Magic-BLAST 1.3.0 released with new features and improvements


The newest version of Magic-BLAST (v. 1.3.0) offers improved sensitivity and faster run times, as well as a number of other new features and improvements. These include the ability to set the alignment cut-off score as a function of read length, a maximum edit distance option, and optional local caching for SRA files. For more information on these and other improvements, see the release notes. You can download the new executables from the NCBI FTP site.
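To illustrate what a cut-off score that depends on read length means in practice, here is a minimal sketch using a simple linear rule. The linear form, the parameter names (`slope`, `intercept`, `minimum`), and their default values are assumptions chosen for illustration; they are not Magic-BLAST's defaults or its command-line syntax, which the release notes describe.

```python
def cutoff_score(read_length, slope=0.5, intercept=0.0, minimum=20):
    """Score threshold that grows linearly with read length.

    All parameter values here are hypothetical; a floor ('minimum')
    keeps very short reads from passing with trivially low scores.
    """
    return max(minimum, int(slope * read_length + intercept))


def passes_cutoff(alignment_score, read_length):
    """True if an alignment's score meets the length-dependent threshold."""
    return alignment_score >= cutoff_score(read_length)


# A 100-base read needs a higher score than a 30-base read.
long_read_cutoff = cutoff_score(100)
short_read_cutoff = cutoff_score(30)
```

Scaling the threshold this way lets a single setting work across runs with mixed read lengths, where a fixed score would be too strict for short reads or too lenient for long ones.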

Magic-BLAST is a tool for mapping large next-generation RNA or DNA sequencing runs against a whole genome or transcriptome. Read more here.

RefSeq release 84 available


RefSeq release 84 is now accessible online, via FTP and through NCBI’s programming utilities.

This full release incorporates genomic, transcript, and protein data available as of September 11, 2017, and contains 140,627,690 records, including 95,563,598 proteins, 20,356,598 RNAs, and sequences from 72,965 organisms.

The release is provided in several directories as a complete dataset and as divided by logical groupings. See the RefSeq release notes for more information.

Phasing out support for non-human organisms

As of September 1, 2017, the dbSNP and dbVar databases have stopped accepting submissions for non-human organisms. Submissions for non-human variation will now be accepted by the European Variation Archive, one of our partners in the International Nucleotide Sequence Database Collaboration (INSDC).

NCBI releases newly designed dbSNP RefSNP Report – Alpha version


NCBI dbSNP is pleased to announce a newly designed Reference SNP (RefSNP, rs) Report webpage to provide enhanced performance and presentation for access to individual RefSNP records. This Alpha version of the report enables browsing of submitted and computed RefSNP variant data from the redesigned dbSNP build system.

Figure 1. The dbSNP RefSNP Report Alpha for rs268.