Tag: GenBank

GenBank release 238 is available

GenBank release 238.0 (6/19/2020) is now available on the NCBI FTP site. This release has 8.93 trillion bases and 2 billion records.

The current release has 217,122,233 traditional records containing 427,823,258,901 base pairs of sequence data. There are also 1,302,852,615 WGS records containing 8,114,046,262,158 base pairs of sequence data, 409,725,050 bulk-oriented TSA records containing 359,947,709,062 base pairs of sequence data, and 75,063,181 bulk-oriented TLS records containing 27,500,635,128 base pairs of sequence data.
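The headline totals can be cross-checked against the per-component figures. The following is a minimal sketch that uses only the numbers quoted in this post; the dictionary layout and variable names are our own.

```python
# Record and base-pair counts quoted above for GenBank release 238.0,
# stored as (records, base pairs) per component.
components = {
    "traditional": (217_122_233, 427_823_258_901),
    "WGS": (1_302_852_615, 8_114_046_262_158),
    "TSA": (409_725_050, 359_947_709_062),
    "TLS": (75_063_181, 27_500_635_128),
}

total_records = sum(records for records, _ in components.values())
total_bases = sum(bases for _, bases in components.values())

# Print exact totals alongside the rounded values used in the summary.
print(f"{total_records:,} records (~{total_records / 1e9:.2f} billion)")
print(f"{total_bases:,} bases (~{total_bases / 1e12:.2f} trillion)")
```

The sums round to the 2 billion records and 8.93 trillion bases stated in the summary.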

New GenBank submission options for SARS-CoV-2 submitters

NCBI is pleased to announce ongoing enhancements to submission of SARS-CoV-2 assembled genomes to GenBank, including a streamlined workflow on the web and a new API option. Both new options mean that you can receive accessions for SARS-CoV-2 data submissions more quickly!

A streamlined workflow with an improved interface and enhanced validation on both web and API saves you time and effort and, most importantly, makes it possible to get SARS-CoV-2 accession numbers and public release of data within hours. In addition, we automatically annotate all SARS-CoV-2 genomes to produce standardized, consistent annotation, which saves you time and benefits researchers who find your data valuable.

Expanded average nucleotide identity analysis now available for prokaryotic genome assemblies

As we described in an earlier post, GenBank uses average nucleotide identity (ANI) analysis to find and correct misidentified prokaryotic genome assemblies. You can now access ANI data for the more than 600,000 GenBank bacterial and archaeal genome assemblies through a downloadable report (ANI_report_prokaryotes.txt) available from the genomes/ASSEMBLY_REPORTS area of the FTP site. The README describes the contents of the report in detail. You can use the ANI data to evaluate the taxonomic identity of genome assemblies of interest for yourself.

The new ANI_report_prokaryotes.txt replaces the older ANI_report_bacteria.txt in the same directory. We are no longer updating the ANI_report_bacteria.txt file and will remove it after 31st May 2020.
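Once downloaded, a report like this can be screened with a few lines of Python. The sketch below is illustrative only: it assumes the report is tab-delimited with a single header line that may start with "#", and the column names, accessions, values, and the 95.0 cutoff in the sample are invented for the example. Consult the README for the real column layout before adapting it.

```python
import csv
import io


def read_tab_report(handle):
    """Yield one dict per data row of a tab-delimited report.

    Assumption (not confirmed by this post): the first line is a header,
    optionally prefixed with '#', and fields are separated by tabs.
    """
    header = handle.readline().lstrip("#").strip().split("\t")
    reader = csv.DictReader(handle, fieldnames=header, delimiter="\t")
    for row in reader:
        yield row


# Invented sample data; the real report has different columns and values.
sample = io.StringIO(
    "# accession\torganism\tani\n"
    "GCA_000000001.1\tExample organism\t99.1\n"
    "GCA_000000002.1\tAnother organism\t82.4\n"
)

rows = list(read_tab_report(sample))

# Flag rows whose (hypothetical) ANI column falls below a chosen cutoff.
cutoff = 95.0
low_identity = [row["accession"] for row in rows if float(row["ani"]) < cutoff]
print(low_identity)
```

To run this against the real file, pass an open file handle instead of the `io.StringIO` sample and replace the column names with those documented in the README.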

GenBank release 237 is available

GenBank release 237.0 (4/21/2020) is now available on the NCBI FTP site. This release has over 8.58 trillion bases and 1.95 billion records.

The release has 216,531,829 traditional records containing 415,770,027,949 base pairs of sequence data. There are also 1,267,547,429 WGS records containing 7,788,133,221,338 base pairs of sequence data, 396,392,280 bulk-oriented TSA records containing 349,692,751,528 base pairs of sequence data, and 65,521,132 bulk-oriented TLS records containing 24,615,270,313 base pairs of sequence data.

During the 63 days between the close dates for GenBank Releases 236.0 and 237.0, the ‘traditional’ portion of GenBank grew by 16,393,173,077 base pairs and by 317,614 sequence records. During that same period, 55,268 records were updated. An average of 5,919 ‘traditional’ records were added and/or updated per day.

Between releases 236.0 and 237.0, the WGS component of GenBank grew by 819,141,955,586 base pairs and by 60,826,741 sequence records. The TSA component of GenBank grew by 8,698,462,463 base pairs and by 9,747,409 sequence records. The TLS component of GenBank grew by 10,945,592,117 base pairs and by 31,483,761 sequence records.

The total number of sequence data files increased by 59 with this release. The divisions are as follows:

  • BCT: 14 new files, now a total of 432
  • CON: 1 new file, now a total of 217
  • ENV: 1 new file, now a total of 60
  • INV: 6 new files, now a total of 86
  • MAM: 15 new files, now a total of 64
  • PLN: 8 new files, now a total of 212
  • VRT: 14 new files, now a total of 175
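The daily average and the file total quoted above follow directly from the other figures in this post. A minimal sketch of the arithmetic, using only those numbers (variable names are our own):

```python
# Figures quoted above for the 63 days between releases 236.0 and 237.0.
days_between_releases = 63
records_added = 317_614
records_updated = 55_268

# Average number of 'traditional' records added and/or updated per day.
records_per_day = (records_added + records_updated) / days_between_releases
print(f"{records_per_day:,.0f} records per day")

# New sequence data files per division, as listed above.
new_files = {
    "BCT": 14,
    "CON": 1,
    "ENV": 1,
    "INV": 6,
    "MAM": 15,
    "PLN": 8,
    "VRT": 14,
}
total_new_files = sum(new_files.values())
print(f"{total_new_files} new files")
```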

For downloading purposes, the uncompressed GenBank release 237.0 flat files require roughly 1142 GB, including the sequence files and the *.txt files. The ASN.1 data files require approximately 844 GB.

More information about GenBank release 237.0 is available in the Release Notes, as well as in the README files in the GenBank and ASN.1 (ncbi-asn1) directories on FTP.

Read about NCBI resources in 2020 Nucleic Acids Research database issue

The 2020 Nucleic Acids Research database issue features papers from NCBI staff on GenBank, ClinVar and more. These papers are also available on PubMed. To read an article, click on the PMID number listed below.

“Database resources of the National Center for Biotechnology Information”

by Eric W Sayers, Jeff Beck, J Rodney Brister, Evan E Bolton, Kathi Canese et al. (PMID: 31602479)

The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 38 distinct databases. This article provides a brief overview of the NCBI Entrez system of databases, followed by a summary of resources that were either introduced or significantly updated in the past year, including PubMed, PMC, Bookshelf, BLAST databases and more!

Rapid access to SARS-CoV-2 data from the current public health emergency

As the global health emergency around severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2, formerly 2019-nCoV) continues, we continue to play a key role in providing the biomedical community with free and easy access to genome sequences from the coronavirus. You can quickly access these data through the NCBI search (Figure 1).

Figure 1. NCBI search results for the term “SARS-COV-2” showing the schematic map of the viral assembly and annotation and buttons that link to the data in the NCBI Virus resource, a specialized BLAST page that searches Betacoronavirus sequences, and the reference assembly download. The bottom panel provides links to the CDC website for COVID-19 information and a link to GenBank®/SRA sequence data.

Dengue virus submission improvements now live!

When there is an outbreak of dengue fever in the world, it’s critical that viral genomic sequence data be submitted by researchers and made available to analyze as soon as possible.  You can now submit Dengue virus sequences to GenBank using a new workflow (Figure 1) in the Submission Portal designed to help make these data available as soon as possible.  The streamlined process, similar to the one described in a previous post for animal mitochondrial COX1 sequences, has an improved interface, enhanced validation, and automatic annotation that saves you time and effort.

Figure 1. The Submission Portal pages for targeted sequence submission workflows. Top panel. The new submission page for entering the workflow. Bottom panel. Submission Portal page with the Dengue virus submission option selected (boxed in red).  The service has options for other targeted submissions including mitochondrial COX1 from multicellular animals (metazoa), ribosomal RNA (rRNA), rRNA-ITS, Influenza virus, and Norovirus sequences.

This update is part of a larger, ongoing effort to consolidate GenBank submissions in a central location. In addition to Dengue virus data, you can also submit Influenza A, B, C and Norovirus sequences as well as other targeted sequences including mitochondrial COX1 genes from multicellular animals (metazoa), ribosomal RNA (rRNA), and rRNA-ITS through the options on the Submission Portal. You should submit other types of sequence data, including other virus sequences, to GenBank using BankIt or tbl2asn.

You can use the search feature on the Submission Portal to find the appropriate submission tool for your data.

Novel coronavirus complete genome from the Wuhan outbreak now available in GenBank

Updated!

Get rapid access to Wuhan coronavirus (2019-nCoV) sequence data from the current outbreak as it becomes available. We will continue to update the page with newly released data.

The complete annotated genome sequence of the novel coronavirus associated with the outbreak of pneumonia in Wuhan, China is now available from GenBank for free and easy access by the global biomedical community. Figure 1 shows the relationship of the Wuhan virus to selected coronaviruses.

Figure 1.  Phylogenetic tree showing the relationship of Wuhan-Hu-1 (circled in red) to selected coronaviruses. Nucleotide alignment was done with MUSCLE 3.8. The phylogenetic tree was estimated with MrBayes 3.2.6 with parameters for GTR+g+i.  The scale bar indicates estimated substitutions per site, and all branch support values are 99.3% or higher.

GenBank release 235

GenBank release 235.0 (12/11/2019) is now available on the NCBI FTP site. This release has 7 trillion bases and 1.74 billion records.

The current release has 215,333,020 traditional records containing 388,417,258,009 base pairs of sequence data. There are also 1,127,023,870 WGS records containing 6,277,551,200,690 base pairs of sequence data, 367,193,844 bulk-oriented TSA records containing 325,433,016,129 base pairs of sequence data, and 28,227,180 bulk-oriented TLS records containing 11,280,596,614 base pairs of sequence data.

Mitochondrial COX1 submission improvements now live in submission portal!

GenBank submitters, you can now submit mitochondrial COX1 (cytochrome oxidase subunit I; COI) sequence data from multicellular animals (metazoa) using a new workflow (Figure 1) with an improved interface, enhanced validation, and automatic COX1 CDS feature annotation.  Once you have submitted mitochondrial COX1 data using this tool, you’ll have a single, helpful page to reference your submission information: accession number(s), COX1 submission status, relevant files and more. Plus, you can also fix any errors from this page.

Figure 1. Submission Portal page with the mitochondrial COX1 submission option selected (boxed in red).  The service has options for other targeted submissions including ribosomal RNA (rRNA), rRNA-ITS, Influenza virus, and Norovirus sequences.
