GenBank release 249.0 (4/19/2022) is now available on the NCBI FTP site. This release has 17.85 trillion bases and 2.66 billion records.
The current release has 237,520,318 traditional records containing 1,266,154,890,918 base pairs of sequence data. There are also 1,781,374,217 WGS records containing 16,071,520,702,170 base pairs of sequence data, 534,770,586 bulk-oriented TSA records containing 474,421,076,448 base pairs of sequence data, and 109,820,387 bulk-oriented TLS records containing 41,324,192,343 base pairs of sequence data.
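As a quick sanity check, the headline totals can be reproduced by summing the four per-division figures quoted above. This is only a short arithmetic sketch; the dictionary layout and variable names are ours, not part of any NCBI tool.

```python
# Per-division figures quoted in the announcement: (records, base pairs).
divisions = {
    "traditional": (237_520_318, 1_266_154_890_918),
    "WGS": (1_781_374_217, 16_071_520_702_170),
    "TSA": (534_770_586, 474_421_076_448),
    "TLS": (109_820_387, 41_324_192_343),
}

# Sum records and base pairs across all divisions.
total_records = sum(records for records, _ in divisions.values())
total_bases = sum(bases for _, bases in divisions.values())

print(f"{total_records / 1e9:.2f} billion records")
print(f"{total_bases / 1e12:.2f} trillion bases")
```

The sums come to 2,663,485,508 records and 17,853,420,861,879 bases, consistent with the rounded 2.66 billion and 17.85 trillion in the summary.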
Join NCBI at the Bio-IT World 2022 Hackathon on May 4-5, 2022 to learn about and work with data from our ALFA project! The primary goal of this hackathon project is to develop a novel tool, app, or approach to explore and visualize NCBI ALFA variants and allele frequency for 12 different human populations. We aspire to create a new helpful variant interpretation resource for the clinical and research communities.
One impetus for development of the dashboard is that unassembled SRA data cannot be processed through Pango tools, and many SARS-CoV-2 samples are only represented in SRA. The Pango nomenclature is being used by researchers and public health agencies worldwide to track the transmission and spread of SARS-CoV-2, including variants of concern. Thus, we developed a uniform approach to making variant calls from SRA records and assigning Pangolin lineages on the basis of these results. This means that submission groups do not have to go through the effort of creating assemblies.
We’re reading and incorporating your feedback! As requested, you can now search for sequences in our Multiple Sequence Alignment (MSA) Viewer. You can search the anchor or consensus sequence of a multiple alignment for short sequence strings. This new feature allows you to:
Look for a sequence motif in DNA or protein alignments in order to confirm the presence of a probe or PCR primer.
Check whether your sequence has matches in multiple locations on the anchor or consensus.
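To illustrate the idea behind the two uses above, here is a minimal sketch of finding every location of a short string in an anchor or consensus sequence. It is not the MSA Viewer's implementation: the function name `find_motif` and the example sequence are made up for illustration, and the sketch does only exact, case-insensitive matching with no handling of gap characters or ambiguity codes.

```python
def find_motif(sequence: str, motif: str) -> list[int]:
    """Return 1-based start positions of every exact match, overlaps included."""
    seq, pat = sequence.upper(), motif.upper()
    if not pat:
        return []
    positions = []
    start = seq.find(pat)
    while start != -1:
        positions.append(start + 1)  # report 1-based coordinates
        # Resume one character later so overlapping matches are not skipped.
        start = seq.find(pat, start + 1)
    return positions


# Made-up consensus sequence and primer-like query, for illustration only.
consensus = "ATGGCGTACGTTAGCGTACGA"
hits = find_motif(consensus, "GCGTACG")
print(hits)
```

More than one position in the result means the query matches at multiple locations, which is the situation the second use case is meant to catch.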
We are delighted to announce that three and a half years of hard work by the collaborative team that brought you the Matched Annotation from NCBI and EMBL-EBI (MANE) dataset has culminated in a full article in the April 14 issue of Nature! We invite you to read the online article to learn more about the goals of the MANE collaboration, MANE offerings and how to access them, and the methods used in generating MANE data. And of course, now you have a paper to cite MANE data!
Morales, J., Pujar, S., Loveland, J.E. et al. A joint NCBI and EMBL-EBI transcript set for clinical genomics and research. Nature (2022). DOI: 10.1038/s41586-022-04558-8
Launched in October 2018, MANE is a collaboration between the National Library of Medicine’s (NLM) National Center for Biotechnology Information (NCBI) and the EMBL’s European Bioinformatics Institute (EMBL-EBI), the two major groups that provide whole-genome annotation for a broad range of organisms, including human. Our initial offering, MANE Select, is intended to be used as a universal standard to report clinical variants and for browser display in genome resources. Starting with MANE v0.92, we added MANE Plus Clinical transcripts for a small set of genes where MANE Select alone was not sufficient to report known clinical variants (Figure 1).
Figure 1. The Sequence Viewer showing the MANE Project track and the NCBI Genes track for the human SCN5A gene region on chromosome 3. The MANE track has the MANE Select Transcript, NM_000335, and the MANE Plus Clinical transcript, NM_001099404, providing two standard transcripts to represent the gene.
Release 8.0 of the NCBI Hidden Markov models (HMM), used by the Prokaryotic Genome Annotation Pipeline (PGAP), is now available for download. You can search this collection against your favorite prokaryotic proteins to identify their function using the HMMER sequence analysis package.
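As a rough sketch of what such a search could look like, the snippet below assembles an `hmmsearch` command line (`hmmsearch` is the HMMER program for searching profile HMMs against a protein sequence file). The file names are hypothetical placeholders for the downloaded HMM library and your own protein FASTA, and the helper function is ours. Only the `--tblout` option, which writes a per-target summary table, is used.

```python
import shlex


def hmmsearch_command(hmm_file: str, protein_fasta: str, table_out: str) -> list[str]:
    """Build an hmmsearch invocation: options first, then HMM file, then sequences."""
    return ["hmmsearch", "--tblout", table_out, hmm_file, protein_fasta]


# Hypothetical file names; substitute the paths on your own system.
cmd = hmmsearch_command("ncbi_hmms.hmm", "my_proteins.faa", "hits.tbl")
print(shlex.join(cmd))

# To actually run the search (requires HMMER to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems if a path contains spaces.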
The 8.0 release contains 15,358 models, including 160 that are new since 7.0. In addition, we have added better names, EC numbers, Gene Ontology (GO) terms, gene symbols or publications to over 550 existing HMMs. You can search and view the details for these in the Protein Family Model collection, which also includes conserved domain architectures and BlastRules, and find all RefSeq proteins they name.
GO terms associated with HMMs are now propagated to coding sequences and proteins annotated with PGAP. In case you missed it, see our previous blog post on this topic.
BLAST+ 2.13.0 adds several important features, including SRA BLAST programs, ARM Linux executables, and the ability to produce database metadata, along with some important improvements and a few bug fixes. You can download the new BLAST release from the FTP site.
SRA / WGS BLAST (blastn_vdb, tblastn_vdb)
Beginning with this release, the BLAST distribution includes the SRA BLAST programs blastn_vdb and tblastn_vdb, which can search SRA and WGS projects directly without the need to build a BLAST database. See the BLAST documentation for how to use these programs with WGS projects.
Starting with BLAST+ 2.13.0, the makeblastdb program generates an additional file with the file extension .njs for nucleotide databases or .pjs for protein databases. These files contain BLAST database metadata in JSON format. See the BLAST database metadata section in the BLAST User Manual for an example. Because JSON can be read easily by many tools, these files make BLAST databases more compliant with FAIR principles.
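Because the metadata is plain JSON, any JSON library can load it. The sketch below assumes the metadata file sits next to the other database files and shares their base name. The function name and the `mydb` database name are hypothetical, and the code deliberately makes no assumption about which fields the file contains (see the BLAST User Manual for the actual layout).

```python
import json
from pathlib import Path


def load_blast_db_metadata(db_path: str, protein: bool = False) -> dict:
    """Load the JSON metadata written by makeblastdb (.njs or .pjs)."""
    suffix = ".pjs" if protein else ".njs"
    metadata_file = Path(db_path + suffix)
    with metadata_file.open(encoding="utf-8") as handle:
        return json.load(handle)


def describe(metadata: dict) -> list[str]:
    """Format each top-level field as 'key: value' without assuming field names."""
    return [f"{key}: {value}" for key, value in metadata.items()]


# Example usage with a hypothetical nucleotide database named "mydb":
#   for line in describe(load_blast_db_metadata("mydb")):
#       print(line)
```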
See the release notes for more details on improvements and bug fixes for the release.
Important reminder about usage reporting
As we announced previously, BLAST can report limited usage information back to NCBI. This information shows us whether BLAST+ is being used by the community and is therefore worth maintaining and developing. It also allows us to focus our development efforts on the most used aspects of BLAST+. Please help us improve BLAST by allowing BLAST to share information about your search. The BLAST privacy statement provides details on the information collected, how it is used, and how to opt out of reporting if you don’t want to participate.
NCBI offers a portfolio of medical genetics resources to help you research, diagnose, and treat diseases and conditions. You can easily access our data and tools through the Medical Genetics and Human Variation page of the NCBI website. We also encourage you to join our community of thousands of submitters and share your germline and/or somatic data to advance discovery and optimize clinical care.
How and why should you use our resources? Consider the example below.
Your patient is a 40-year-old mother of two presenting with changes in bathroom habits, bleeding, and belly pain. She has a medical history of colonic polyps. Her family history reveals that her maternal grandmother, mother, and uncle had several forms of cancer, including colon, breast, and endometrial cancer.