Announcing GenBank Release 249.0
The current release has 237,520,318 traditional records containing 1,266,154,890,918 base pairs of sequence data. There are also 1,781,374,217 WGS records containing 16,071,520,702,170 base pairs of sequence data, 534,770,586 bulk-oriented TSA records containing 474,421,076,448 base pairs of sequence data, and 109,820,387 bulk-oriented TLS records containing 41,324,192,343 base pairs of sequence data.
The National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM) has released a new resource, called the SARS-CoV-2 Variants Overview, that aggregates data related to SARS-CoV-2 variants from sequences available in NCBI’s GenBank and Sequence Read Archive (SRA) databases.
SARS-CoV-2 Variants Overview, a freely available online dashboard, was developed with guidance from the TRACE Working Group as part of NLM’s participation in the National Institutes of Health (NIH) Accelerating COVID-19 Therapeutic Interventions and Vaccines (ACTIV) initiative, a public-private partnership for a coordinated research strategy to support and speed up the development of COVID-19 treatments and vaccines.
One impetus for development of the dashboard is that unassembled SRA data cannot be processed through Pango tools, and many SARS-CoV-2 samples are only represented in SRA. The Pango nomenclature is being used by researchers and public health agencies worldwide to track the transmission and spread of SARS-CoV-2, including variants of concern. Thus, we developed a uniform approach to making variant calls from SRA records and assigning Pangolin lineages on the basis of these results. This means that submission groups do not have to go through the effort of creating assemblies.
GenBank Release 248.0
The current release has 236,338,284 traditional records containing 1,173,984,081,721 base pairs of sequence data. There are also 1,750,505,007 WGS records containing 15,428,122,140,820 base pairs of sequence data, 524,464,601 bulk-oriented TSA records containing 465,013,156,502 base pairs of sequence data, and 109,809,966 bulk-oriented TLS records containing 41,321,107,981 base pairs of sequence data.
GenBank release 247.0 (12/19/2021) is now available on the NCBI FTP site. This release has 16.47 trillion bases and 2.59 billion records.
The current release has 234,557,297 traditional records containing 1,053,275,115,030 base pairs of sequence data. There are also 1,734,664,952 WGS records containing 14,922,033,922,302 base pairs of sequence data, 514,158,576 bulk-oriented TSA records containing 455,870,853,358 base pairs of sequence data, and 109,379,021 bulk-oriented TLS records containing 41,143,480,750 base pairs of sequence data.
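As a quick sanity check, the headline figures for release 247.0 (16.47 trillion bases and 2.59 billion records) can be reproduced by summing the four divisions listed above. The sketch below is only an illustration: the dictionary layout and the `totals` helper are our own naming, not an NCBI format or tool, and the numbers are copied by hand from this post.

```python
# Per-division (records, base pairs) for GenBank release 247.0,
# copied from the release announcement text. The structure and names
# here are illustrative assumptions, not an official NCBI format.
RELEASE_247 = {
    "traditional": (234_557_297, 1_053_275_115_030),
    "WGS": (1_734_664_952, 14_922_033_922_302),
    "TSA": (514_158_576, 455_870_853_358),
    "TLS": (109_379_021, 41_143_480_750),
}


def totals(divisions):
    """Return (total_records, total_bases) summed over all divisions."""
    records = sum(r for r, _ in divisions.values())
    bases = sum(b for _, b in divisions.values())
    return records, bases


records, bases = totals(RELEASE_247)
print(f"{records / 1e9:.2f} billion records")   # 2.59 billion records
print(f"{bases / 1e12:.2f} trillion bases")     # 16.47 trillion bases
```

The same helper should work for any other release if you substitute that release's four pairs of numbers.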
Introducing the NIH Comparative Genomics Resource (CGR)
NCBI is looking forward to seeing you in person at the International Plant and Animal Genome Conference (PAG XXIX), January 8-12, 2022, in San Diego, California. We’re especially excited to introduce our newest endeavor, the NLM initiative known as the NIH Comparative Genomics Resource (CGR), a platform we are developing to support comparative analyses of sequenced eukaryotic research organisms. Understanding and supporting the needs of researchers is a fundamental element in the development of CGR and is critical to its future success in supporting a large and diverse collection.
Please join NCBI for the following events to learn more about CGR and how you can inform its development:
GenBank release 246.0 (11/2/2021) is now available on the NCBI FTP site. This release has 16.1 trillion bases and 2.57 billion records.
The current release has 233,642,893 traditional records containing 1,014,763,752,113 base pairs of sequence data. There are also 1,721,064,101 WGS records containing 14,599,101,574,547 base pairs of sequence data, 508,319,391 bulk-oriented TSA records containing 449,891,016,597 base pairs of sequence data, and 107,569,935 bulk-oriented TLS records containing 40,168,874,815 base pairs of sequence data.
We have recently added several exciting improvements to the SARS-CoV-2 GenBank submission process based on community feedback. To save you time, NCBI completes feature annotation for you, which means SARS-CoV-2 GenBank submission only requires a FASTA file and source metadata. Here are other new features to ease and simplify your submission workflow.
Automatically remove failed sequences from a submission: On the web, a single click lets you opt in to automatic removal of failed sequences (Figure 1) so that the rest of your sequences can be swiftly accessioned! A report provided after the submission lists your failed sequences and points out potential sequence problems so that you can take a closer look after your error-free sequences are released. This option is also available for submission via FTP.
Need to set up FTP submissions? The NCBI team is here to help. Contact email@example.com.
Figure 1. GenBank submission page showing the option to remove sequences with processing errors.
The current release has 231,982,592 traditional records containing 940,513,260,726 base pairs of sequence data. There are also 1,653,427,055 WGS records containing 13,888,187,863,722 base pairs of sequence data, 498,305,045 bulk-oriented TSA records containing 440,578,422,611 base pairs of sequence data, and 106,995,218 bulk-oriented TLS records containing 39,930,167,315 base pairs of sequence data.
GenBank release 244.0
The current release has 227,888,889 traditional records containing 866,009,790,959 base pairs of sequence data. There are also 1,632,796,606 WGS records containing 13,442,974,346,437 base pairs of sequence data, 494,641,358 bulk-oriented TSA records containing 436,594,941,165 base pairs of sequence data, and 102,662,929 bulk-oriented TLS records containing 38,198,113,354 base pairs of sequence data.
The current release has 227,123,201 traditional records containing 832,400,799,511 base pairs of sequence data. There are also 1,590,670,459 WGS records containing 12,732,048,052,023 base pairs of sequence data, 481,154,920 bulk-oriented TSA records containing 425,076,483,459 base pairs of sequence data, and 102,395,753 bulk-oriented TLS records containing 37,998,534,461 base pairs of sequence data.