We are excited to introduce updates to the NCBI Datasets genome table that let you quickly find and download a genome data package, including genome, transcript, and protein sequences, annotation, and a data report.
The redesigned genome table includes many new features (see Figure 1). With it, you can:
- Find all current genomes, including metagenomes
- View multiple taxa such as birds and bees, or polyphyletic groups like fish
- Easily find genomes with NCBI RefSeq annotations
- Get more accurate genome counts, since each row now represents a single genome with GenBank and RefSeq accessions for that genome in the same row
- Customize your downloads to include either GenBank or RefSeq files, or both
- Download tables or data packages
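Why does one row per genome give more accurate counts? If the GenBank and RefSeq copies of an assembly appear as separate rows, a simple row count double-counts every genome that has both. The Python sketch below illustrates the idea and is not the Datasets implementation. It assumes that paired GenBank (GCA_) and RefSeq (GCF_) accessions share the same 9-digit identifier while their versions may differ, and the accessions in it are illustrative only.

```python
import re

# Illustrative assembly accessions, listed the way an older table might show
# them: GenBank (GCA_) and RefSeq (GCF_) copies of one genome in separate rows.
rows = [
    "GCA_000001405.29",
    "GCF_000001405.40",
    "GCA_000005845.2",
    "GCF_000005845.2",
    "GCA_012345678.1",  # hypothetical GenBank-only assembly
]

# Prefix, 9-digit identifier, and version, e.g. GCF_000001405.40.
ACCESSION = re.compile(r"^(GC[AF])_(\d{9})\.(\d+)$")

def merge_by_genome(accessions):
    """Collapse GCA_/GCF_ accessions that share an identifier into one row.

    Assumes paired GenBank and RefSeq assemblies share the numeric identifier.
    """
    genomes = {}  # dicts keep insertion order (Python 3.7+)
    for acc in accessions:
        match = ACCESSION.match(acc)
        if match is None:
            raise ValueError(f"unexpected accession format: {acc}")
        prefix, number, _version = match.groups()
        row = genomes.setdefault(number, {"genbank": None, "refseq": None})
        row["genbank" if prefix == "GCA" else "refseq"] = acc
    return list(genomes.values())

merged = merge_by_genome(rows)
print(f"{len(rows)} rows -> {len(merged)} genomes")  # 5 rows -> 3 genomes
for row in merged:
    print(row)
```

Keying on the identifier rather than the full accession is what lets a GenBank version and a RefSeq version with different version numbers land in the same row.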
Continue reading “NLM’s all-new NCBI Datasets genome table is now available” →
GenBank release 250.0 (6/17/2022) is now available on the NCBI FTP site. This release has 18.63 trillion bases and 2.69 billion records.
The current release has 239,017,893 traditional records containing 1,395,628,631,187 base pairs of sequence data. There are also 1,796,349,114 WGS records containing 16,710,373,006,600 base pairs of sequence data, 546,991,572 bulk-oriented TSA records containing 485,056,129,761 base pairs of sequence data, and 111,142,107 bulk-oriented TLS records containing 41,999,358,847 base pairs of sequence data.
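As a quick consistency check, the four per-division figures add up to the headline totals for release 250.0. This minimal Python sketch uses only the numbers quoted above; the division labels are shorthand chosen for the example.

```python
# Per-division (records, bases) for GenBank release 250.0, copied from the text.
release_250 = {
    "traditional": (239_017_893, 1_395_628_631_187),
    "WGS": (1_796_349_114, 16_710_373_006_600),
    "bulk TSA": (546_991_572, 485_056_129_761),
    "bulk TLS": (111_142_107, 41_999_358_847),
}

total_records = sum(records for records, _ in release_250.values())
total_bases = sum(bases for _, bases in release_250.values())

# records: 2,693,500,686 (~2.69 billion)
print(f"records: {total_records:,} (~{total_records / 1e9:.2f} billion)")
# bases:   18,633,057,126,395 (~18.63 trillion)
print(f"bases:   {total_bases:,} (~{total_bases / 1e12:.2f} trillion)")
```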
Continue reading “GenBank Release 250.0 is available!” →
NCBI had the pleasure of attending and participating in this year’s American Society for Microbiology (ASM) Microbe conference, June 9-13 in Washington, D.C. NCBI staff took part in activities and events throughout the conference. Over 4,500 attendees gathered in the exhibit hall and attended a variety of poster presentations and talks!
Reflections from a few of our NCBI experts
“It was a great honor for me to receive the ASM Elizabeth O. King Lecturer Award. Thank you to my colleagues, without whom so much of my work would not have been possible, and to all of those who attended my presentation on Making Genomics Accessible to Aid Public Health and Research.”
~Michael Feldgarden, Ph.D. Continue reading “ASM Microbe 2022 was a success!” →
The American Society for Microbiology (ASM) Microbe conference is back and will take place in person, June 9th-13th in Washington, D.C.
NCBI staff member Dr. Michael Feldgarden will be recognized by ASM with an award for his research. Other NCBI staff will present posters on NCBI resources and will also be available at our booth (#1128) to address your questions. Drop by to see what’s new and provide your feedback. We hope to see you there! Check out NCBI’s schedule of activities: Continue reading “Come see NCBI at the ASM Microbe Conference 2022” →
Validating genome assemblies submitted to GenBank using an ANI-based workflow
Average Nucleotide Identity (ANI) analysis is a useful tool to verify taxonomic identities in prokaryotic genomes. As part of the NCBI bacterial genome submission process, GenBank performs ANI analyses to compare submitted prokaryotic genome assemblies against reference data generated from type strains. You can learn more about the relevant workflow and about type strain curation in our publications (PMC6978984 and PMC4383940).
We use genomes obtained from type strains (type assemblies) in computational comparisons, for example using ANI to reclassify or modify existing taxonomy with reasonable confidence. The taxonomy check status for all 1.3 million bacterial genome assemblies is summarized in the ANI_report_prokaryotes.txt file available from the ASSEMBLY_REPORTS FTP directory. The README file describes the contents of the report in detail. You can run ANI on your genome on its own or in the context of annotation. Find more information here. Continue reading “Average Nucleotide Identity (ANI) for assembly validation” →
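Once alignments exist, the ANI calculation itself is simple: split the query genome into fragments, align each one against a type assembly, and average the identities of fragments that align well. The Python sketch below shows only that final averaging step, under stated assumptions. It is not NCBI's workflow (described in the publications above); the fragment filters follow the widely used ANIb convention, the 95% species cut-off is a commonly cited value, and the hit values are invented for illustration.

```python
from dataclasses import dataclass

# Assumed thresholds: fragment filters follow the ANIb convention
# (Goris et al., 2007); the 95% species cut-off is a commonly cited value.
# Neither is claimed to be NCBI's production setting.
MIN_FRAGMENT_IDENTITY = 30.0   # percent
MIN_FRAGMENT_COVERAGE = 0.70   # aligned fraction of the fragment
SPECIES_ANI_THRESHOLD = 95.0   # percent

@dataclass
class FragmentHit:
    """Best hit of one query-genome fragment against a type assembly."""
    identity: float       # percent identity of the alignment (0-100)
    aligned_length: int   # aligned bases in the best hit
    fragment_length: int  # length of the query fragment

def average_nucleotide_identity(hits):
    """Mean identity over fragments passing both filters, plus the fraction
    of fragments that contributed. Returns (None, 0.0) if none pass."""
    kept = [
        h.identity
        for h in hits
        if h.identity > MIN_FRAGMENT_IDENTITY
        and h.aligned_length / h.fragment_length >= MIN_FRAGMENT_COVERAGE
    ]
    if not kept:
        return None, 0.0
    return sum(kept) / len(kept), len(kept) / len(hits)

# Toy alignment results, invented for illustration (1,020 bp fragments).
hits = [
    FragmentHit(98.5, 1020, 1020),
    FragmentHit(97.0, 1000, 1020),
    FragmentHit(99.1, 900, 1020),
    FragmentHit(45.0, 300, 1020),  # too little of the fragment aligns: dropped
]
ani, used = average_nucleotide_identity(hits)
print(f"ANI = {ani:.2f}% from {used:.0%} of fragments")
print("within species threshold" if ani >= SPECIES_ANI_THRESHOLD
      else "below species threshold")
```

Reporting the fraction of fragments used alongside the ANI value matters in practice: a high ANI computed from only a small part of the genome is weak evidence.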
The first complete genome sequence of the current monkeypox virus (MPXV) outbreak (isolate name MPXV_USA_2022_MA001) is now available with accession ON563414 in GenBank, a public database of DNA sequences hosted by the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM).
Cases of monkeypox have been identified in countries spread across a wide geographic area. Monkeypox is a zoonotic disease, usually transmitted through contact between animals and humans. Genetically, monkeypox viruses cluster into two clades: the Congo Basin clade and the West African clade. The virus responsible for this outbreak belongs to the West African clade, which is often associated with milder disease; in this outbreak, human-to-human spread is suspected. Continue reading “Monkeypox virus: Complete genome from the current outbreak now available in GenBank” →
We are excited to announce two improvements to the Read assembly and Annotation Pipeline Tool (RAPT), which allows you to assemble genomic reads for bacterial or archaeal isolates and annotate their genes at the click of a button.
Improved taxonomic assignment
Now RAPT verifies the scientific name you provide with the reads, and corrects it as needed with the Average Nucleotide Identity (ANI) tool, which compares your genome to type strain assemblies in GenBank to place it in the taxonomic tree. So, even if you only have a rough idea of the species you have sequenced, input datasets tailored to your genome will be used for the annotation and you will get the best possible gene set from RAPT. Continue reading “New in RAPT: Better taxonomic assignment and GO annotation” →
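The name check described above reduces to a small decision rule. The Python sketch below is a hypothetical illustration, not RAPT's code. It assumes ANI values against type-strain assemblies have already been computed, that the closest type strain wins, and that an assumed 95% cut-off decides whether to override the submitted name; the species names are invented.

```python
# Assumed species-level ANI cut-off; RAPT's actual criteria may differ.
SPECIES_ANI_THRESHOLD = 95.0

def verify_scientific_name(submitted_name, ani_to_type_strains,
                           threshold=SPECIES_ANI_THRESHOLD):
    """Return (name_to_use, was_corrected).

    ani_to_type_strains maps species names to the ANI (%) between the new
    assembly and that species' type-strain assembly.
    """
    if not ani_to_type_strains:
        return submitted_name, False
    best_name, best_ani = max(ani_to_type_strains.items(), key=lambda kv: kv[1])
    if best_ani < threshold:
        # No type strain is close enough to justify overriding the submitter.
        return submitted_name, False
    return best_name, best_name != submitted_name

# Hypothetical species names and ANI values, for illustration only.
print(verify_scientific_name("Exempla alpha",
                             {"Exempla alpha": 88.4, "Exempla beta": 98.7}))
# ('Exempla beta', True)
print(verify_scientific_name("Exempla alpha", {"Exempla alpha": 99.2}))
# ('Exempla alpha', False)
print(verify_scientific_name("Exempla alpha", {"Exempla gamma": 81.0}))
# ('Exempla alpha', False)
```

Falling back to the submitted name when nothing clears the threshold keeps the rule conservative: a rough species guess is only replaced when a type strain clearly matches.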
The National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM) has released a new resource, called the SARS-CoV-2 Variants Overview, that aggregates data related to SARS-CoV-2 variants from sequences available in NCBI’s GenBank and Sequence Read Archive (SRA) databases.
SARS-CoV-2 Variants Overview, a freely available online dashboard, was developed with guidance from the TRACE Working Group as part of NLM’s participation in the National Institutes of Health (NIH) Accelerating COVID-19 Therapeutic Interventions and Vaccines (ACTIV) initiative, a public-private partnership for a coordinated research strategy to support and speed up the development of COVID-19 treatments and vaccines.
One impetus for development of the dashboard is that unassembled SRA data cannot be processed through Pango tools, and many SARS-CoV-2 samples are only represented in SRA. The Pango nomenclature is being used by researchers and public health agencies worldwide to track the transmission and spread of SARS-CoV-2, including variants of concern. Thus, we developed a uniform approach to making variant calls from SRA records and assigning Pangolin lineages on the basis of these results. This means that submission groups do not have to go through the effort of creating assemblies. Continue reading “Introducing SARS-CoV-2 Variants Overview, NLM’s latest tool in the fight against COVID-19” →
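To make the idea of calling variants from reads and then assigning a lineage concrete, here is a toy Python sketch. It is a simplified illustration, not the dashboard's pipeline: the real Pango tools use a different, more sophisticated assignment method. The depth and allele-fraction thresholds, the 10-base reference, the read counts, and the lineage definitions are all hypothetical, and the sketch assumes reads have already been aligned and summarized as per-position base counts.

```python
# Assumed thresholds for a confident substitution call (illustrative only).
MIN_DEPTH = 10
MIN_ALT_FRACTION = 0.5

def call_substitutions(pileup, reference):
    """pileup maps 1-based positions to {base: read_count}. Returns calls
    such as 'C2T' where the most common non-reference base is well supported."""
    calls = set()
    for pos, counts in pileup.items():
        depth = sum(counts.values())
        if depth < MIN_DEPTH:
            continue  # too few reads to call anything
        ref_base = reference[pos - 1]
        alt_base, alt_count = max(
            ((base, n) for base, n in counts.items() if base != ref_base),
            key=lambda item: item[1],
            default=(None, 0),
        )
        if alt_base is not None and alt_count / depth >= MIN_ALT_FRACTION:
            calls.add(f"{ref_base}{pos}{alt_base}")
    return calls

def assign_lineage(calls, defining_mutations, min_fraction=0.8):
    """Pick the lineage whose defining mutations are best covered by calls."""
    best, best_score = None, 0.0
    for lineage, mutations in defining_mutations.items():
        score = len(calls & mutations) / len(mutations)
        if score > best_score:
            best, best_score = lineage, score
    return (best, best_score) if best_score >= min_fraction else (None, best_score)

# Toy 10-base reference, read counts, and lineage definitions (all hypothetical).
reference = "ACGTACGTAC"
pileup = {
    2: {"C": 3, "T": 27},  # 90% T: called
    5: {"A": 20, "G": 2},  # 9% G: not called
    7: {"G": 2, "A": 3},   # depth 5: skipped
    9: {"A": 1, "G": 19},  # 95% G: called
}
lineages = {
    "toy_lineage_1": {"C2T", "A9G"},
    "toy_lineage_2": {"C2T", "A5G"},
}
calls = call_substitutions(pileup, reference)
print(sorted(calls))                    # ['A9G', 'C2T']
print(assign_lineage(calls, lineages))  # ('toy_lineage_1', 1.0)
```

Working directly from read counts is what removes the assembly step: the lineage decision only needs well-supported calls at informative positions, not a full consensus genome.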
GenBank release 248.0 has 236,338,284 traditional records containing 1,173,984,081,721 base pairs of sequence data. There are also 1,750,505,007 WGS records containing 15,428,122,140,820 base pairs of sequence data, 524,464,601 bulk-oriented TSA records containing 465,013,156,502 base pairs of sequence data, and 109,809,966 bulk-oriented TLS records containing 41,321,107,981 base pairs of sequence data. Continue reading “GenBank Release 248.0” →
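To see how the base pairs in release 248.0 are distributed, a short Python sketch can total the per-division figures above and compute each division's share of all bases. The division labels are shorthand chosen for this example.

```python
# Per-division (records, bases) for GenBank release 248.0, copied from the text.
release_248 = {
    "traditional": (236_338_284, 1_173_984_081_721),
    "WGS": (1_750_505_007, 15_428_122_140_820),
    "bulk TSA": (524_464_601, 465_013_156_502),
    "bulk TLS": (109_809_966, 41_321_107_981),
}

total_records = sum(records for records, _ in release_248.values())
total_bases = sum(bases for _, bases in release_248.values())

# Fraction of all base pairs contributed by each division.
shares = {name: bases / total_bases for name, (_, bases) in release_248.items()}

print(f"total: {total_records:,} records, {total_bases:,} bp")
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>12}: {share:6.2%} of bases")
```

With these figures, WGS records account for roughly 90% of the base pairs in the release.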