Author: NCBI Staff

New version of PGAP available now!

We are happy to announce that a new version of PGAP is available. This version will annotate 20 to 25% more genes with symbols (e.g. recA) on the assembled genomes of key species, compared to previous versions.

You will see more symbols when you annotate the genomes of Escherichia coli, Campylobacter jejuni, and a few other species. As several users have requested, this feature will facilitate the comparison of gene content across multiple genomes. It is made possible by a new PGAP workflow that identifies orthologs between the genome being annotated and the reference genome of the same species. The reference genomes are Escherichia coli str. K-12 substr. MG1655, Bacillus subtilis subsp. subtilis str. 168, Campylobacter jejuni subsp. jejuni NCTC 11168, Mycobacterium tuberculosis H37Rv, and Acinetobacter pittii PHEA-2. Symbols of reference genes with defined function are propagated to their orthologs in the genome annotated with PGAP.


Structure viewer iCn3D version 3 featuring analysis of 3D structures!

The NCBI structure viewer iCn3D version 3 is now available on the NCBI web site and from GitHub.

Analysis of 3D Structures

You can use the current version with the icn3d package at npm to write scripts that call functions in iCn3D. For example, this script on GitHub can calculate the change in interactions due to a mutation. The results of this analysis for the structure (6M0J) of the SARS-CoV-2 spike protein bound to the ACE2 receptor are displayed in Figure 1. These show the predicted changes in interactions with other residues in the SARS-CoV-2 spike protein and in the ACE2 receptor when the asparagine (N) at position 501 of the spike protein is changed to a tyrosine (Y). You can also run these scripts from the command line to process a list of 3D structures and to retrieve and analyze annotations.

Figure 1. iCn3D viewer showing the predicted interactions with other residues in the spike protein and in the ACE2 target when the asparagine (N) at position 501 of the SARS-CoV-2 spike protein is substituted with tyrosine (Y), highlighted in yellow. Interactions were calculated using the script interactions2.js.


Automate your workflow with the ClinVar Submission API

ClinVar and our scientific and patient-care community rely on your submissions. With our new Application Programming Interface (API) for submissions, we’ve made it even easier for you to provide us with your most up-to-date classification of variants. The new RESTful API allows you to automate your submission workflow so that you can submit new records and update existing records faster. Setting up your account to use the API requires three one-time activities:

ClinVar Submission API Setup

Click on each of the steps in order to set up your account to use the API!


ClinVar Reaches One Million Variants!

A counter ticks up to 1,000,000. Text reads "Celebrating 1,000,000 variants in ClinVar"

ClinVar has become a go-to resource for the clinical genetics community. You have come to ClinVar to look for the reported clinical significance of human genetic variants that you’ve identified in clinical testing or through your research. You have researched the supporting evidence and publications to the benefit of the health and genetic science community. You have surveyed all available variants within a gene to understand the spectrum of variation for that gene and to curate gene-disease relationships.

We know how critical this information is to you on a daily basis.

We keep ClinVar free and publicly available and work closely with our submitters to add more variants and supporting information, so that you can continue to benefit from this reliable information at your fingertips.

Today, we are proud to announce that ClinVar has passed the milestone of one million variants in our database.

GenBank release 243.0

GenBank release 243.0 (5/26/2021) is now available on the NCBI FTP site. This release has 14.03 trillion bases and 2.40 billion records.

The current release has 227,123,201 traditional records containing 832,400,799,511 base pairs of sequence data. There are also 1,590,670,459 WGS records containing 12,732,048,052,023 base pairs of sequence data, 481,154,920 bulk-oriented TSA records containing 425,076,483,459 base pairs of sequence data, and 102,395,753 bulk-oriented TLS records containing 37,998,534,461 base pairs of sequence data. 
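
As a quick arithmetic check, the per-division counts quoted above can be summed to confirm the headline totals. The short Python sketch below uses only the figures from this announcement; the variable names are illustrative.

```python
# Record and base-pair counts per division, copied from the
# GenBank release 243.0 announcement above.
divisions = {
    "traditional": (227_123_201, 832_400_799_511),
    "WGS": (1_590_670_459, 12_732_048_052_023),
    "TSA": (481_154_920, 425_076_483_459),
    "TLS": (102_395_753, 37_998_534_461),
}

# Sum records and bases across all four divisions.
total_records = sum(records for records, _ in divisions.values())
total_bases = sum(bases for _, bases in divisions.values())

print(f"{total_records / 1e9:.2f} billion records")   # 2.40 billion records
print(f"{total_bases / 1e12:.2f} trillion bases")     # 14.03 trillion bases
```

The sums agree with the rounded totals of 14.03 trillion bases and 2.40 billion records stated for the release.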


A more modern PMC is coming – let us know what you think in PMC Labs!

We’re updating PubMed Central (PMC) to give you a more modern and easier-to-use site, and we want your feedback. The first phase of this work is now on PMC Labs for you to explore and provide feedback.

In the first phase we have focused on modernizing PMC’s platform to create a more stable and easy-to-update environment. This also includes some initial changes to the homepage (Figure 1), site organization, and article pages (Figure 2). Many of the updates you see on the Labs site create a similar look and feel for PMC and PubMed, reorganizing documentation to highlight the most accessed and important content first and consolidating redundant features to provide a smoother experience. Please visit PMC Labs to try out the PMC updates and provide feedback using the buttons on the lower right-hand side of the Labs pages (Figure 1). We will update the current PMC website with new features once we gather your input on the Labs site.

Figure 1. The PMC Labs homepage featuring the PMC full text search bar, links to the most heavily used documentation, information for distinct groups of PMC users (Authors, Publishers, and Developers), statistics on deposits, an updated “New in PMC” section (not shown), and a prominent Feedback link (circled) for you to provide comments and suggestions.

Work(shops) from home – NCBI North Texas Workshops and Codeathon 2021

The NCBI Education team worked with universities in the greater Dallas area to host and present four online workshops and a codeathon, May 11th-20th. These events helped attendees from a variety of educational backgrounds and interests incorporate NCBI data and tools into their work. The NCBI North Texas Workshops spanned topics of clinical genetics, human genome research, coding, and cloud computing and brought together nearly 100 participants. This is the first of three posts describing these events. The current post focuses on the two more traditional workshops: NCBI Resources for Genetic Disease Discovery and Clinical Support and NCBI Resources for Human Genome Research. Subsequent posts will highlight the two technical workshops and the codeathon.

Figure 1. Sample materials from the NCBI North Texas Workshops presented May 11th-14th.


Announcing RefSeq Release 206!

RefSeq Release 206 is now available. This release includes the following:

Updated human genome Annotation Release 109.20210514
Updated Annotation Release 109.20210514 is an update of NCBI Homo sapiens Annotation Release 109. The annotation report is available here. The annotation products are available in the sequence databases and on the FTP site.

Other new eukaryotic genome annotations
This release includes new annotations generated by NCBI’s eukaryotic genome annotation pipeline for 45 additional species.

The wait is over… NIH’s Public Sequence Read Archive is now open access on the cloud

The NIH NCBI Sequence Read Archive (SRA) on AWS, containing all public SRA data, is now live! This data is hosted on Amazon Web Services (AWS) under the Open Data Sponsorship Program (ODP) with support from NIH’s Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) initiative.


Introducing GaPTools, a stand-alone data validation tool for dbGaP submissions

We have just launched GaPTools, a stand-alone data validation tool for NCBI’s database of Genotype and Phenotype (dbGaP) submissions. You can use GaPTools to validate your dbGaP submissions or submissions to other genomic data repositories. GaPTools checks for common data inconsistency and integrity issues and validates subject-sample ID mapping, subject consents, data dictionaries, and phenotype and genotype data. GaPTools is available as a docker image on Docker Hub.
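
To illustrate the kind of subject-sample ID mapping check mentioned above, here is a minimal Python sketch. It is not GaPTools code and does not use the GaPTools interface. The function name, the input shapes, and the two rules it checks (every mapped subject must exist, and no sample may map to two different subjects) are assumptions made purely for illustration.

```python
def check_subject_sample_mapping(subject_ids, mapping):
    """Return a list of problems found in a subject-sample mapping.

    Hypothetical helper, not part of GaPTools.
    subject_ids: iterable of known subject IDs.
    mapping: iterable of (subject_id, sample_id) pairs.
    """
    known_subjects = set(subject_ids)
    first_subject_for_sample = {}
    problems = []

    for subject_id, sample_id in mapping:
        # Assumed rule 1: a sample must point at a known subject.
        if subject_id not in known_subjects:
            problems.append(
                f"sample {sample_id}: unknown subject {subject_id}"
            )

        # Assumed rule 2: a sample must not map to two different subjects.
        previous = first_subject_for_sample.setdefault(sample_id, subject_id)
        if previous != subject_id:
            problems.append(
                f"sample {sample_id}: mapped to both {previous} and {subject_id}"
            )

    return problems


# Example with made-up IDs: one unknown subject, one conflicting sample.
issues = check_subject_sample_mapping(
    ["SUBJ1", "SUBJ2"],
    [("SUBJ1", "SAMP1"), ("SUBJ3", "SAMP2"), ("SUBJ2", "SAMP1")],
)
for issue in issues:
    print(issue)
```

Running checks of this sort locally, before a formal submission, is the general idea behind a stand-alone validator; the actual rules and file formats are defined by dbGaP and GaPTools themselves.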

Why Use GaPTools?

GaPTools will validate files before you submit (see Figure 1). This means that by the time you formally submit, some of the pre-validation steps are already addressed. This tool allows you to prepare your data quickly and ensures a faster processing cycle and a faster release of your individual-level research data.

Figure 1. Flow chart depicting data submission and GaPTools validation.
