Author: NCBI Staff

NCBI to present on SRA and cloud computing at the 2021 Galaxy Community Conference

We’re bringing exciting developments to our user community at the 2021 Galaxy Community Conference (GCC 2021), which is virtual this year!

Dr. Jon Trow, SRA Subject Matter Expert
Dr. Adelaide Rhodes, Cloud Subject Matter Expert

We start by hosting NCBI’s first-ever GCC training week tutorial, co-written by Jon Trow, Ph.D., Sequence Read Archive (SRA) Subject Matter Expert, and Adelaide Rhodes, Ph.D., Cloud Subject Matter Expert. This tutorial will become a permanent addition to the Galaxy Training Network. The tutorial, “SRA Aligned Read Format (SARF) to Speed Up SARS-CoV-2 Data Analysis”, has detailed instructions and a video demonstration on how to search SRA metadata for SARFs and download them into Galaxy workflows. We will be available via Slack during Office Hours for live virtual interactions.

GenBank release 244.0

GenBank release 244.0 (6/26/2021) is now available on the NCBI FTP site. This release has 14.78 trillion bases and 2.46 billion records.

The current release has 227,888,889 traditional records containing 866,009,790,959 base pairs of sequence data. There are also 1,632,796,606 WGS records containing 13,442,974,346,437 base pairs of sequence data, 494,641,358 bulk-oriented TSA records containing 436,594,941,165 base pairs of sequence data, and 102,662,929 bulk-oriented TLS records containing 38,198,113,354 base pairs of sequence data.
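As a quick sanity check, the headline totals for release 244.0 can be reproduced by summing the four divisions listed above. This is only a minimal sketch: the dictionary layout and the function name `totals` are illustrative choices, and the numbers are copied directly from the text.

```python
# Record and base-pair counts per division, copied from the release 244.0 text.
RELEASE_244 = {
    "traditional": (227_888_889, 866_009_790_959),
    "WGS": (1_632_796_606, 13_442_974_346_437),
    "TSA": (494_641_358, 436_594_941_165),
    "TLS": (102_662_929, 38_198_113_354),
}


def totals(divisions):
    """Return (total_records, total_bases) summed over all divisions."""
    records = sum(r for r, _ in divisions.values())
    bases = sum(b for _, b in divisions.values())
    return records, bases


records, bases = totals(RELEASE_244)
print(f"{records / 1e9:.2f} billion records")  # 2.46 billion records
print(f"{bases / 1e12:.2f} trillion bases")    # 14.78 trillion bases
```

The sums come to 2,457,989,782 records and 14,783,777,191,915 bases, which round to the 2.46 billion and 14.78 trillion figures quoted for the release.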

Announcing the re-annotation of RefSeq genome assemblies for E. coli and four other species!

We have re-annotated all RefSeq genomes for Escherichia coli, Mycobacterium tuberculosis, Bacillus subtilis, Acinetobacter pittii, and Campylobacter jejuni using the most recent release of PGAP. You will find that more genes now have gene symbols (e.g. recA). Your feedback indicated that the lack of symbols was an impediment to comparative analysis, so we hope that this improvement will help.

The number of re-annotated genomes is 25,619 for E. coli, 470 for B. subtilis, 6,828 for M. tuberculosis, 316 for A. pittii, and 1,829 for C. jejuni. On average, the increase in gene symbols is 30% in E. coli, 110% in B. subtilis, 57% in M. tuberculosis, 94% in A. pittii, and 62% in C. jejuni (see Figure 1). After re-annotation, on average, 73% of PGAP-annotated E. coli genes and 79% of B. subtilis genes have symbols (35% for M. tuberculosis, 40% for A. pittii, and 46% for C. jejuni). We assigned symbols to the annotated genes by calculating the orthologs between the genome of interest and the reference assembly for the species, and then transferring the symbols from the reference genes to their orthologs in the annotated genomes.

Figure 1: Average and standard deviation of the number of genes annotated with symbols per genome, in the previous (blue) and the current annotation (orange). 
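The symbol transfer described above can be pictured with a small sketch. This is not PGAP's implementation: it assumes the ortholog pairs have already been computed, and the function names, gene IDs, and toy data are all hypothetical.

```python
def transfer_symbols(reference_symbols, orthologs):
    """Copy gene symbols from reference genes to their orthologs.

    reference_symbols: dict mapping reference gene ID -> symbol (e.g. "recA")
    orthologs: dict mapping annotated gene ID -> reference gene ID
    Returns a dict mapping annotated gene ID -> transferred symbol.
    """
    return {
        gene: reference_symbols[ref]
        for gene, ref in orthologs.items()
        if ref in reference_symbols  # skip reference genes without a symbol
    }


def symbol_coverage(symbols, all_genes):
    """Fraction of annotated genes that carry a symbol."""
    return len(symbols) / len(all_genes) if all_genes else 0.0


# Hypothetical toy data, not real locus tags or PGAP output.
reference_symbols = {"ref_0001": "recA", "ref_0002": "dnaA"}
orthologs = {"gene_A": "ref_0001", "gene_B": "ref_0002", "gene_C": "ref_0999"}
all_genes = ["gene_A", "gene_B", "gene_C", "gene_D"]

symbols = transfer_symbols(reference_symbols, orthologs)
print(symbols)                              # {'gene_A': 'recA', 'gene_B': 'dnaA'}
print(symbol_coverage(symbols, all_genes))  # 0.5
```

In this toy genome, two of four genes receive symbols. The per-species percentages quoted above are the same kind of coverage figure, averaged over the genomes of each species.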

June 30 Webinar: Using NCBI Datasets to download sequence and annotation for genomes and genes

Join us on June 30, 2021 at 12 PM Eastern Time to learn how to use the new NCBI Datasets resource to find and download gene, genome, and SARS-CoV-2 sequence and annotation data. You will learn how to access these datasets through either the web interface or the new command-line tools that allow you to incorporate these data in your bioinformatic workflows.

  • Date and time: Wed, June 30, 2021 12:00 PM – 12:45 PM EDT
  • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI webinars playlist on the NLM YouTube channel. You can learn about future webinars on the Webinars and Courses page.
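To give a flavor of the command-line access mentioned above, here is a minimal Python sketch that assembles a call to the `datasets` tool for a single genome. The subcommand layout (`download genome accession`) and the `--filename` flag reflect our understanding of the CLI and may differ between versions, so please check `datasets download genome --help`. The helper name `datasets_genome_command` is made up for this example.

```python
import shlex


def datasets_genome_command(accession, filename="ncbi_dataset.zip"):
    """Build the argument list for downloading one genome by assembly accession.

    The subcommand and flag names are assumptions; verify them against the
    installed version of the datasets CLI.
    """
    return [
        "datasets", "download", "genome", "accession", accession,
        "--filename", filename,
    ]


# GCF_000005845.2 is the RefSeq assembly for E. coli K-12 MG1655.
cmd = datasets_genome_command("GCF_000005845.2", "ecoli_k12.zip")
print(shlex.join(cmd))

# To actually run it (requires the datasets CLI on PATH and network access):
# import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string keeps it safe to pass to `subprocess.run` without shell quoting issues.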

Getting Started with Python and Cloud Computing — NCBI North Texas Workshops and Codeathon 2021

Learning to use computational tools and techniques is increasingly important for life scientists. But knowing where to start when learning relevant data-centric skills such as coding and cloud computing can be a big challenge. The NCBI education team is here to help! As a part of the NCBI North Texas Workshops and Codeathon, we presented workshops that helped novice users learn about coding in Python, Jupyter Notebooks, and cloud computing.

Attendees from the greater Dallas area logged on to webinars and NCBI-provided cloud accounts to learn about programming in Python, Jupyter Notebooks, and cloud computing services, and to perform relevant research tasks.

New version of PGAP available now!

We are happy to announce that a new version of PGAP is available. This version will annotate 20 to 25% more genes with symbols (e.g. recA) on the assembled genomes of key species, compared to previous versions.

You will observe an increase in symbols when you annotate the genomes of Escherichia coli, Campylobacter jejuni and a few other species. As several users have requested, this feature will facilitate the comparison of gene content across multiple genomes. It is made possible by a new PGAP workflow that identifies orthologs between a reference genome and genomes of the same species being annotated. The reference genomes are Escherichia coli str. K-12 substr. MG1655, Bacillus subtilis subsp. subtilis str. 168, Campylobacter jejuni subsp. jejuni NCTC 11168, Mycobacterium tuberculosis H37Rv, and Acinetobacter pittii PHEA-2. Symbols of reference genes with defined function are propagated to their orthologs in the genome annotated with PGAP.
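One way to picture which genomes benefit from the new workflow is a simple lookup from species to reference genome. The table below is taken from the list above; the lookup function `reference_for` and its rule of reading the species from the first two words of the organism name are illustrative assumptions, not how PGAP selects a reference internally.

```python
# Species -> reference genome, as listed in this post.
REFERENCE_GENOMES = {
    "Escherichia coli": "Escherichia coli str. K-12 substr. MG1655",
    "Bacillus subtilis": "Bacillus subtilis subsp. subtilis str. 168",
    "Campylobacter jejuni": "Campylobacter jejuni subsp. jejuni NCTC 11168",
    "Mycobacterium tuberculosis": "Mycobacterium tuberculosis H37Rv",
    "Acinetobacter pittii": "Acinetobacter pittii PHEA-2",
}


def reference_for(organism_name):
    """Return the reference genome for an organism, or None if there is none.

    Assumes the species is given by the first two words of the name
    (genus and species epithet), which is a simplification.
    """
    species = " ".join(organism_name.split()[:2])
    return REFERENCE_GENOMES.get(species)


print(reference_for("Escherichia coli O157:H7 str. Sakai"))
print(reference_for("Salmonella enterica"))  # None: no reference in this list
```

Genomes from species outside this list are still annotated, but would not be expected to gain symbols through ortholog propagation.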

Structure viewer iCn3D version 3 featuring analysis of 3D structures!

The NCBI structure viewer iCn3D version 3 is now available on the NCBI web site and from GitHub.

Analysis of 3D Structures

You can use the current version with the icn3d package on npm to write scripts that call functions in iCn3D. For example, this script on GitHub can calculate the change in interactions due to a mutation. The results of this analysis for the structure (6M0J) of the SARS-CoV-2 spike protein bound to the ACE2 receptor are displayed in Figure 1. These show the predicted changes in interactions with other residues in the SARS-CoV-2 spike protein and in the ACE2 receptor when the asparagine (N) at position 501 of the spike protein is changed to a tyrosine (Y). You can also run these scripts from the command line to process a list of 3D structures to get and analyze annotations.

Figure 1. iCn3D viewer showing the predicted interactions with other residues in the spike protein and in the ACE2 target when the asparagine (N) at position 501 of the SARS-CoV-2 spike protein is substituted with tyrosine (Y), highlighted in yellow. Interactions were calculated using the script interactions2.js.

Automate your workflow with the ClinVar Submission API

ClinVar and our scientific and patient-care community rely on your submissions. With our new Application Programming Interface (API) for submissions, we’ve made it even easier for you to provide us with your most up-to-date classification of variants. The new RESTful API allows you to automate your submission workflow so that you can submit new records and update existing records faster. Setting up your account to use the API requires three one-time activities:

ClinVar Submission API Setup

Click on each of the steps in order to set up your account to use the API!
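Once an account is set up, an automated submission is essentially an authenticated HTTP POST of a JSON document. The sketch below only builds such a request and does not send it. The endpoint URL and the API-key header name are assumptions that should be confirmed against the ClinVar Submission API documentation, the helper name `build_submission_request` is made up, and the payload is a placeholder because the submission schema is not described here.

```python
import json
import urllib.request

# Assumed values: confirm both against the ClinVar Submission API documentation.
SUBMISSION_URL = "https://submit.ncbi.nlm.nih.gov/api/v1/submissions/"
API_KEY_HEADER = "SP-API-KEY"


def build_submission_request(api_key, payload):
    """Build (but do not send) a POST request carrying a JSON submission."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        SUBMISSION_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            API_KEY_HEADER: api_key,
        },
    )


# Placeholder payload; the real schema is defined by ClinVar and not shown here.
request = build_submission_request("YOUR-API-KEY", {"placeholder": True})
print(request.get_method(), request.full_url)

# Sending would look like: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it easy to inspect or log what a pipeline is about to submit before anything reaches ClinVar.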

ClinVar Reaches One Million Variants!

A counter ticks up to 1,000,000. Text reads "Celebrating 1,000,000 variants in ClinVar"

ClinVar has become a go-to resource for the clinical genetics community. You have come to ClinVar to look for the reported clinical significance of human genetic variants that you’ve identified in clinical testing or through your research. You have researched the supporting evidence and publications to the benefit of the health and genetic science community. You have surveyed all available variants within a gene to understand the spectrum of variation for that gene and to curate gene-disease relationships.

We know how critical this information is to you on a daily basis.

We keep ClinVar free and publicly available and work closely with our submitters to add more variants and supporting information, so that you can continue to benefit from this reliable information at your fingertips.

Today, we are proud to announce that ClinVar has passed the milestone of one million variants in our database.

GenBank release 243.0

GenBank release 243.0 (5/26/2021) is now available on the NCBI FTP site. This release has 14.03 trillion bases and 2.40 billion records.

The current release has 227,123,201 traditional records containing 832,400,799,511 base pairs of sequence data. There are also 1,590,670,459 WGS records containing 12,732,048,052,023 base pairs of sequence data, 481,154,920 bulk-oriented TSA records containing 425,076,483,459 base pairs of sequence data, and 102,395,753 bulk-oriented TLS records containing 37,998,534,461 base pairs of sequence data. 
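Because the per-division counts are published for each release, growth between releases is simple subtraction. The sketch below compares release 243.0 with release 244.0 using the figures quoted in the two release announcements. The data layout and the function name `growth` are illustrative choices.

```python
# (records, base pairs) per division, copied from the two release announcements.
RELEASE_243 = {
    "traditional": (227_123_201, 832_400_799_511),
    "WGS": (1_590_670_459, 12_732_048_052_023),
    "TSA": (481_154_920, 425_076_483_459),
    "TLS": (102_395_753, 37_998_534_461),
}
RELEASE_244 = {
    "traditional": (227_888_889, 866_009_790_959),
    "WGS": (1_632_796_606, 13_442_974_346_437),
    "TSA": (494_641_358, 436_594_941_165),
    "TLS": (102_662_929, 38_198_113_354),
}


def growth(old, new):
    """Return {division: (added_records, added_bases)} between two releases."""
    return {
        name: (new[name][0] - old[name][0], new[name][1] - old[name][1])
        for name in old
    }


delta = growth(RELEASE_243, RELEASE_244)
added_records = sum(r for r, _ in delta.values())
added_bases = sum(b for _, b in delta.values())

for name, (records, bases) in delta.items():
    print(f"{name}: +{records:,} records, +{bases:,} bases")
print(f"total: +{added_records:,} records, +{added_bases:,} bases")
```

By this calculation, the month between the two releases added 56,645,449 records and 756,253,322,461 bases, with WGS accounting for most of the growth in both.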
