GenBank Release 249.0 contains 237,520,318 traditional records with 1,266,154,890,918 base pairs of sequence data. It also includes 1,781,374,217 WGS records with 16,071,520,702,170 base pairs, 534,770,586 bulk-oriented TSA records with 474,421,076,448 base pairs, and 109,820,387 bulk-oriented TLS records with 41,324,192,343 base pairs.
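Dividing base pairs by record count gives the average record length for each record category. The sketch below uses only the release figures stated above. The dictionary keys are our own shorthand for the categories.

```python
# Record and base-pair counts from the GenBank release figures,
# keyed by record category: (records, base_pairs).
RELEASE_STATS = {
    "traditional": (237_520_318, 1_266_154_890_918),
    "WGS": (1_781_374_217, 16_071_520_702_170),
    "TSA": (534_770_586, 474_421_076_448),
    "TLS": (109_820_387, 41_324_192_343),
}


def mean_record_length(stats: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Average base pairs per record for each record category."""
    return {name: bp / records for name, (records, bp) in stats.items()}


def total_base_pairs(stats: dict[str, tuple[int, int]]) -> int:
    """Sum of base pairs across all record categories."""
    return sum(bp for _, bp in stats.values())


if __name__ == "__main__":
    for name, mean_bp in mean_record_length(RELEASE_STATS).items():
        print(f"{name:>12}: {mean_bp:,.0f} bp per record")
    print(f"{'total':>12}: {total_base_pairs(RELEASE_STATS):,} bp")
```

Traditional records average roughly 5.3 kb each. Bulk-oriented TSA and TLS records are much shorter, under 1 kb on average.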
Author: NCBI Staff
Announcing the Allele Frequency Aggregator (ALFA) Project as part of the Bio-IT World 2022 Hackathon: Visualization of NCBI ALFA Variants
Join NCBI at the Bio-IT World 2022 Hackathon on May 4–5, 2022, to learn about and work with data from our ALFA project! The primary goal of this hackathon project is to develop a novel tool, app, or approach to explore and visualize NCBI ALFA variants and their allele frequencies across 12 human populations. We aim to create a helpful new variant interpretation resource for the clinical and research communities.
We hope to see you there! More information and registration are available on the hackathon website.
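As a starting point for a visualization, the sketch below turns per-population allele counts into allele frequencies (count divided by the population's total allele count) and renders a simple text bar chart. It assumes the counts have already been retrieved into a nested dictionary. The input layout and the population names are hypothetical and do not reflect the actual ALFA file or API format.

```python
def allele_frequencies(counts: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Convert {population: {allele: count}} into {population: {allele: frequency}}."""
    freqs = {}
    for population, alleles in counts.items():
        total = sum(alleles.values())  # total observed alleles in this population
        freqs[population] = {
            allele: (n / total if total else 0.0) for allele, n in alleles.items()
        }
    return freqs


def text_bar_chart(freqs: dict[str, dict[str, float]], allele: str, width: int = 40) -> list[str]:
    """One line per population: a bar of '#' scaled to `width`, then the frequency."""
    lines = []
    for population in sorted(freqs):
        f = freqs[population].get(allele, 0.0)
        bar = "#" * round(f * width)
        lines.append(f"{population:<20} {bar:<{width}} {f:.3f}")
    return lines


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not real ALFA data.
    demo = {"pop_1": {"A": 30, "G": 10}, "pop_2": {"A": 5, "G": 5}}
    print("\n".join(text_bar_chart(allele_frequencies(demo), "G")))
```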
The National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM) has released a new resource, called the SARS-CoV-2 Variants Overview, that aggregates data related to SARS-CoV-2 variants from sequences available in NCBI’s GenBank and Sequence Read Archive (SRA) databases.
SARS-CoV-2 Variants Overview, a freely available online dashboard, was developed with guidance from the TRACE Working Group as part of NLM’s participation in the National Institutes of Health (NIH) Accelerating COVID-19 Therapeutic Interventions and Vaccines (ACTIV) initiative, a public-private partnership for a coordinated research strategy to support and speed up the development of COVID-19 treatments and vaccines.
One impetus for developing the dashboard is that unassembled SRA data cannot be processed with Pango tools, and many SARS-CoV-2 samples are represented only in SRA. Researchers and public health agencies worldwide use the Pango nomenclature to track the transmission and spread of SARS-CoV-2, including variants of concern. We therefore developed a uniform approach for calling variants from SRA records and assigning Pango lineages based on those calls, so submitting groups do not have to create assemblies first.
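To illustrate the general idea of assigning a lineage from called variants rather than from an assembly, here is a deliberately simplified sketch. It picks the lineage whose defining mutations are best covered by the calls and prefers the more specific lineage on ties. This is not how pangolin or NCBI's pipeline assigns lineages. The lineage names, mutation labels, and the 0.8 threshold are all hypothetical.

```python
from typing import Optional


def assign_lineage(
    called: set[str],
    lineage_defs: dict[str, set[str]],
    min_fraction: float = 0.8,
) -> Optional[str]:
    """Return the lineage whose defining mutations are best covered by `called`.

    Ranking key is (fraction of defining mutations present, number of defining
    mutations), so a more specific lineage wins when coverage is equal.
    Returns None if no lineage reaches `min_fraction` coverage.
    """
    best_name: Optional[str] = None
    best_key = (0.0, 0)
    for name, defining in lineage_defs.items():
        if not defining:
            continue
        fraction = len(called & defining) / len(defining)
        key = (fraction, len(defining))
        if key > best_key:
            best_name, best_key = name, key
    if best_name is None or best_key[0] < min_fraction:
        return None
    return best_name
```

Real lineage assignment must also handle low coverage, ambiguous calls, and a phylogenetic lineage hierarchy, which this toy ranking ignores.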
Eukaryotic genomes annotated by NCBI include:
- Belonocnema kinseyi (wasp)
- Daphnia pulex (common water flea)
- Daphnia pulicaria (crustacean)
- Dermatophagoides farinae (American house dust mite)
- Diprion similis (hymenopteran)
- Drosophila willistoni (fly)
- Equus quagga burchellii (Burchell’s zebra) (pictured)
- Gallus gallus (chicken)
- Haliotis rubra (blacklip abalone)
- Haliotis rufescens (red abalone)
- Helicoverpa zea (corn earworm)
- Homalodisca vitripennis (glassy-winged sharpshooter)
- Hydra vulgaris (swiftwater hydra)
- Hypomesus transpacificus (delta smelt)
- Ictalurus punctatus (channel catfish)
- Ischnura elegans (damselfly)
- Lolium rigidum (monocot)
- Lucilia cuprina (Australian sheep blowfly)
- Lynx rufus (bobcat)
- Marmota monax (woodchuck)
- Meles meles (Eurasian badger)
- Micropterus dolomieu (smallmouth bass)
- Neodiprion fabricii (hymenopteran)
- Neodiprion lecontei (redheaded pine sawfly)
- Neodiprion pinetum (white pine sawfly)
- Neodiprion virginiana (hymenopteran)
- Oncorhynchus gorbuscha (pink salmon)
- Osmia bicornis bicornis (red mason bee)
- Scatophagus argus (bony fish)
- Schistocerca americana (American grasshopper)
- Schistocerca piceifrons (Central American locust)
- Silurus meridionalis (bony fish)
- Ursus americanus (American black bear)
- Vanessa cardui (painted lady)
- Vespa crabro (European hornet)
- Vigna umbellata (eudicot)
- Xenia sp. Carnegie-2017 (soft coral)
View the full list of annotated eukaryotes available in the Genome Data Viewer (GDV) browser.
We’re reading and incorporating your feedback! As requested, you can now search for sequences in our Multiple Sequence Alignment (MSA) Viewer. You can search the anchor or consensus sequence of a multiple alignment for short sequence strings. This new feature allows you to:
- Look for a sequence motif in DNA or protein alignments in order to confirm the presence of a probe or PCR primer.
- Check whether your sequence has matches in multiple locations on the anchor or consensus.
- Examine sequence similarity across the alignment at the location of your match.
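The sketch below shows the kind of search described above. It builds a majority-rule consensus from aligned rows, then finds every occurrence of a short motif, including overlapping matches, in the ungapped anchor or consensus. Each hit is reported as alignment columns so matches that span gaps can still be located in the alignment. It is an illustration under our own assumptions (majority-rule consensus, '-' as the gap character, exact case-insensitive matching), not the MSA Viewer's implementation.

```python
from collections import Counter


def consensus(rows: list[str]) -> str:
    """Majority-rule consensus; a column where the gap is most common yields '-'."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    return "".join(
        Counter(ch.upper() for ch in column).most_common(1)[0][0]
        for column in zip(*rows)
    )


def find_motif(aligned_seq: str, motif: str) -> list[tuple[int, int]]:
    """All (start, end) alignment columns, 0-based and inclusive, where `motif`
    occurs in the ungapped sequence. Overlapping matches are reported."""
    if not motif:
        raise ValueError("motif must be non-empty")
    motif = motif.upper()
    # Alignment column of each residue once gaps are removed.
    columns = [i for i, ch in enumerate(aligned_seq) if ch != "-"]
    ungapped = "".join(aligned_seq[i] for i in columns).upper()
    hits = []
    start = ungapped.find(motif)
    while start != -1:
        hits.append((columns[start], columns[start + len(motif) - 1]))
        start = ungapped.find(motif, start + 1)  # step by one to allow overlaps
    return hits


if __name__ == "__main__":
    rows = ["ACGT-ACGT", "ACGTTACGA", "ACGT-ACGT"]
    cons = consensus(rows)
    print(cons, find_motif(cons, "acg"))
```

A motif that returns more than one hit answers the "multiple locations" question directly, and each column range can be used to inspect the other rows at that spot.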
We are delighted to announce that three and a half years of hard work by the collaborative team that brought you the Matched Annotation from NCBI and EMBL-EBI (MANE) dataset has culminated in a full article in the April 14 issue of Nature! We invite you to read the online article to learn more about the goals of the MANE collaboration, MANE offerings and how to access them, and the methods used in generating MANE data. And of course, now you have a paper to cite MANE data!
Morales, J., Pujar, S., Loveland, J.E. et al. A joint NCBI and EMBL-EBI transcript set for clinical genomics and research. Nature (2022). DOI: 10.1038/s41586-022-04558-8
Launched in October 2018, MANE is a collaboration between the National Library of Medicine’s (NLM) National Center for Biotechnology Information (NCBI) and EMBL’s European Bioinformatics Institute (EMBL-EBI), the two major groups that provide whole-genome annotation for a broad range of organisms, including human. Our initial offering, MANE Select, is intended as a universal standard for reporting clinical variants and for display in genome browsers and other genome resources. Starting with MANE v0.92, we added MANE Plus Clinical transcripts for a small set of genes for which MANE Select alone was not sufficient to report known clinical variants (Figure 1).
Figure 1. The Sequence Viewer showing the MANE Project track and the NCBI Genes track for the human SCN5A gene region on chromosome 3. The MANE track has the MANE Select transcript, NM_000335, and the MANE Plus Clinical transcript, NM_001099404, providing two standard transcripts to represent the gene.
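If you work with the downloadable MANE summary table rather than a browser, the sketch below groups a gene's RefSeq transcripts by MANE status, which would reproduce the SCN5A pairing shown in Figure 1. It assumes a tab-separated, gzip-compressed summary file whose header includes `symbol`, `RefSeq_nuc`, and `MANE_status` columns. Check these column names against the release you download. The file path is hypothetical.

```python
import csv
import gzip
from collections import defaultdict
from typing import Iterable


def mane_transcripts(lines: Iterable[str], symbol: str) -> dict[str, list[str]]:
    """Group RefSeq transcript accessions for one gene by MANE status.

    Assumes tab-separated rows with 'symbol', 'RefSeq_nuc', and 'MANE_status'
    columns (column names are an assumption about the summary file layout).
    """
    by_status: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(lines, delimiter="\t"):
        if row["symbol"] == symbol:
            by_status[row["MANE_status"]].append(row["RefSeq_nuc"])
    return dict(by_status)


def load_summary(path: str, symbol: str) -> dict[str, list[str]]:
    """Read a gzip-compressed summary file in text mode and query one gene."""
    with gzip.open(path, "rt", newline="") as handle:
        return mane_transcripts(handle, symbol)


if __name__ == "__main__":
    # Hypothetical local path to a downloaded MANE summary file.
    print(load_summary("MANE.summary.txt.gz", "SCN5A"))
```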
Release 8.0 of the NCBI Hidden Markov Models (HMMs) used by the Prokaryotic Genome Annotation Pipeline (PGAP) is now available for download. You can search your favorite prokaryotic proteins against this collection with the HMMER sequence analysis package to help identify their function.
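A typical workflow is to run HMMER's `hmmscan` with `--tblout` (after indexing the model library with `hmmpress`) and then pick the best-scoring model for each protein. The sketch below parses HMMER's whitespace-delimited per-sequence table, where the first 18 fields are fixed and the 19th holds the free-text description, and keeps the lowest-E-value hit per protein. It assumes `hmmscan` output, in which the target is the model and the query is your protein. The model names in the example are hypothetical.

```python
def parse_tblout(lines):
    """Parse HMMER --tblout lines into dicts; comment and blank lines are skipped."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(None, 18)  # 18 fixed columns + free-text description
        hits.append({
            "target": fields[0],        # model name (hmmscan)
            "target_acc": fields[1],
            "query": fields[2],         # protein name (hmmscan)
            "evalue": float(fields[4]), # full-sequence E-value
            "score": float(fields[5]),  # full-sequence bit score
            "description": fields[18].strip() if len(fields) > 18 else "",
        })
    return hits


def best_hits(hits):
    """Keep the hit with the lowest full-sequence E-value for each query."""
    best = {}
    for hit in hits:
        q = hit["query"]
        if q not in best or hit["evalue"] < best[q]["evalue"]:
            best[q] = hit
    return best
```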
The 8.0 release contains 15,358 models, 160 of which are new since release 7.0. We have also improved the names of, or added EC numbers, Gene Ontology (GO) terms, gene symbols, or publications to, over 550 existing HMMs. You can search and view details for these models in the Protein Family Model collection, which also includes conserved domain architectures and BlastRules, and find all RefSeq proteins that each model names.
GO terms associated with HMMs are now propagated to coding sequences and proteins annotated with PGAP. In case you missed it, see our previous blog post on this topic.
BLAST+ 2.13.0 includes several important new features, including SRA BLAST programs, ARM Linux executables, and the ability to produce database metadata, along with other improvements and a few bug fixes. You can download the new BLAST release from the FTP site.
SRA / WGS BLAST (blastn_vdb, tblastn_vdb)
Beginning with this release, the BLAST distribution includes the SRA BLAST programs blastn_vdb and tblastn_vdb, which can search SRA and WGS projects directly, without building a BLAST database. See the BLAST documentation for how to use these programs with WGS projects.
ARM Linux executables
This release also includes, for the first time, executables compiled for ARM Linux. Please let us know if you find any issues with the ARM Linux programs.
Database metadata in JSON format
Starting with BLAST+ 2.13.0, the makeblastdb program generates an additional file with the extension .njs for nucleotide databases or .pjs for protein databases. These files contain BLAST database metadata in JSON format. See the BLAST database metadata section in the BLAST User Manual for an example. Because JSON is widely supported, many tools can read this file easily, which makes BLAST databases more compliant with the FAIR (Findable, Accessible, Interoperable, Reusable) principles.
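Because the metadata is plain JSON, the standard library can read it. The sketch below loads a .njs or .pjs file and summarizes a few fields. The key names (`dbname`, `dbtype`, `number-of-sequences`, `number-of-letters`, `last-updated`) are our assumption about the file's layout. Confirm them against the example in the BLAST User Manual. Missing keys simply come back as None or 0.

```python
import json
from pathlib import Path
from typing import Any


def load_metadata(path: str) -> dict[str, Any]:
    """Read a BLAST database metadata file (.njs or .pjs) as JSON."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_metadata(meta: dict[str, Any]) -> dict[str, Any]:
    """Pick out a few fields; key names are assumed, not guaranteed."""
    n_seqs = meta.get("number-of-sequences", 0)
    n_letters = meta.get("number-of-letters", 0)
    return {
        "name": meta.get("dbname"),
        "type": meta.get("dbtype"),
        "sequences": n_seqs,
        "letters": n_letters,
        "mean_length": n_letters / n_seqs if n_seqs else None,
        "last_updated": meta.get("last-updated"),
    }


if __name__ == "__main__":
    # Hypothetical file produced by makeblastdb for a nucleotide database.
    print(summarize_metadata(load_metadata("mydb.njs")))
```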
See the release notes for more details on improvements and bug fixes for the release.
Clinical Genetics Information at Your Fingertips
NCBI offers a portfolio of medical genetics resources to help you research, diagnose, and treat diseases and conditions. You can easily access our data and tools through the Medical Genetics and Human Variation page of the NCBI website. We also encourage you to join our community of thousands of submitters and share your germline and/or somatic data to advance discovery and optimize clinical care.
How and why should you use our resources? Consider the example below.
Your patient is a 40-year-old mother of two who presents with changes in bowel habits, bleeding, and abdominal pain. She has a history of colonic polyps. Her family history reveals that her maternal grandmother, mother, and uncle had several types of cancer, including colon, breast, and endometrial cancer.
Official update scheduled to launch June 2022
As previously announced, we will be moving to an updated version of the E-utilities API for PubMed. We are planning to delay this change until June 2022 to give you time to test your API calls on the new service, report issues, and provide your feedback. Don’t wait until launch! A test server is available now, ahead of the release, for you to try.
How do I use the test server?
The test server is available through the following URL:
Test server: https://eutilspreview.ncbi.nlm.nih.gov/entrez/eutils/
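One low-effort way to test is to make the base URL configurable, so the same ESearch call can point at either the test server or production. The sketch below builds a PubMed ESearch URL with the standard `db`, `term`, `retmax`, and `retmode` parameters. It assumes the test server accepts the same parameters as production, which is exactly what the preview period is meant to confirm. The search term is only an example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

PRODUCTION_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
TEST_BASE = "https://eutilspreview.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term: str, base: str = TEST_BASE, retmax: int = 20) -> str:
    """Build a PubMed ESearch URL against the chosen E-utilities base."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{base}esearch.fcgi?{urlencode(params)}"


def fetch(url: str, timeout: float = 30.0) -> str:
    """Perform the HTTP GET and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Compare the two services' responses for the same query.
    for base in (TEST_BASE, PRODUCTION_BASE):
        print(fetch(esearch_url("allele frequency", base=base))[:200])
```

Switching `base` back to `PRODUCTION_BASE` is then the only change needed after launch. NCBI's E-utilities guidelines also ask clients to identify themselves with `tool` and `email` parameters, which can be added to `params`.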