SARS-CoV-2 genomic data are critical for monitoring the spread and evolution of the virus during the COVID-19 pandemic, identifying newly emerging variants, and developing and evaluating countermeasures. As of September 2022, over 13 million SARS-CoV-2 genomes have been sequenced across the world, making it the most sequenced pathogen ever. A cornerstone of genomic analysis is building a phylogeny, which shows how individual isolates relate to the rest of the sequenced genomes. However, the volume of SARS-CoV-2 genomes presents novel opportunities beyond phylogenies, as well as computational challenges to traditional methods of genomic analysis and visualization. Continue reading “NCBI-NIAID Beyond Phylogenies Codeathon was a success!”
Data on GCP include:
- The tables from the MicroBIGG-E database of antimicrobial resistance (AMR), stress response, and virulence genes and genomic elements, and from the Pathogen Isolates Browser, both of which are accessible through Google BigQuery.
- The MicroBIGG-E sequences in FASTA format that are available from Google Cloud Storage.
Features & Benefits
Pathogen Detection data on GCP gives you larger-scale access than is currently available through the web or FTP. Notably, there is no FTP access to MicroBIGG-E; the web interface is limited to 100K rows and sequence downloads are restricted. There are no such restrictions on GCP. MicroBIGG-E on BigQuery also allows you to download all AMRFinderPlus results. Currently there are more than 20 million rows of antimicrobial resistance, virulence, and stress response genes, and point mutations, identified in more than 1 million pathogen isolates.
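As a rough illustration of what BigQuery access looks like in practice, the Python sketch below builds a query that counts rows per gene symbol. The table path and the `element_symbol` column are assumptions, not details from this post, so check them against the published schema first. The helper names are hypothetical. `run_query` needs the `google-cloud-bigquery` package and GCP credentials.

```python
# Assumed table path and column name; confirm both against the published
# BigQuery schema before relying on them.
MICROBIGGE_TABLE = "ncbi-pathogen-detect.pdbrowser.microbigge"


def build_element_count_query(table: str = MICROBIGGE_TABLE, limit: int = 20) -> str:
    """Return SQL that counts rows per element symbol, most frequent first."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT element_symbol, COUNT(*) AS n_rows\n"
        f"FROM `{table}`\n"
        "GROUP BY element_symbol\n"
        "ORDER BY n_rows DESC\n"
        f"LIMIT {int(limit)}"
    )


def run_query(sql: str) -> list:
    """Run the SQL with the google-cloud-bigquery client and return rows as dicts."""
    # Imported lazily so the query builder can be used without the package.
    from google.cloud import bigquery

    client = bigquery.Client()
    return [dict(row) for row in client.query(sql).result()]
```

Calling `run_query(build_element_count_query(limit=10))` would return the ten most frequent symbols, provided the assumed names match the real schema.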
Here are two examples where researchers have used MicroBIGG-E and AMRFinderPlus data to advance research on antimicrobial resistance:
Want to submit federal grant applications quickly and easily? Check out our new and improved SciENcv experience! Science Experts Network Curriculum Vitae (SciENcv) is an electronic system that helps you assemble professional information needed to apply for federal grant support.
SciENcv helps you gather and compile information on expertise, employment, education, and professional accomplishments. You can use SciENcv to create and maintain financial documents and biographical sketches that are submitted as part of grant application packages. Continue reading “Introducing a new and improved SciENcv experience!”
Easily distinguish reverse orientation alignments
We are excited to announce an update to NCBI’s Comparative Genome Viewer (CGV) that allows you to quickly determine the relative orientation of aligned segments. CGV displays whole genome alignments between two different eukaryotic assemblies (Figure 1).
In the viewer, individual alignment regions are connected by colored bands between two chromosomes. These alignments are now colored differently depending on whether the aligned sequences on the two assemblies are in the same orientation (forward) or reverse orientation relative to one another. Forward orientation alignments are connected by green bands, whereas reverse alignments are connected by purple bands. Reverse alignments represent local genome inversions or inverted translocations and may point to areas of significant biological difference between the two assemblies. Continue reading “New feature in the Comparative Genome Viewer!”
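The coloring rule described above can be sketched in a few lines of Python. This is only an illustration, not CGV's actual implementation: the `Alignment` record and its strand fields are hypothetical, and the sketch assumes each aligned segment carries a "+" or "-" strand on both assemblies.

```python
from dataclasses import dataclass

# Band colors as described in the post.
BAND_COLORS = {"forward": "green", "reverse": "purple"}


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record for one aligned segment between two assemblies."""
    strand_a: str  # "+" or "-" on the first assembly
    strand_b: str  # "+" or "-" on the second assembly


def orientation(aln: Alignment) -> str:
    """Return 'forward' if both segments share a strand, else 'reverse'."""
    for strand in (aln.strand_a, aln.strand_b):
        if strand not in ("+", "-"):
            raise ValueError(f"unexpected strand value: {strand!r}")
    return "forward" if aln.strand_a == aln.strand_b else "reverse"


def band_color(aln: Alignment) -> str:
    """Map an alignment to the color of its connecting band."""
    return BAND_COLORS[orientation(aln)]
```

Under these assumptions, `band_color(Alignment("+", "-"))` gives the purple band that flags a candidate inversion or inverted translocation.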
Interested in understanding how sequence data are submitted, processed, and made publicly available in GenBank and the Sequence Read Archive (SRA)? Announcing the GenBank and SRA Data Processing webpage!
Here you can learn about procedures that the National Center for Biotechnology Information (NCBI), part of the National Library of Medicine (NLM), uses for processing submitted data and public posting, as well as key definitions of data status. Continue reading “Announcing the GenBank and SRA Data Processing Webpage”
Phase 2 expands the scope of the preprints included in PubMed and PMC
Last month, the National Library of Medicine (NLM) announced plans to extend its NIH Preprint Pilot in PubMed Central (PMC) and PubMed beyond COVID-19 to encompass all preprints reporting on NIH-funded research. The second phase of the pilot, launching later this month, will include preprints supported by an NIH award, contract, or intramural program and posted to an eligible preprint server on or after January 1, 2023. Continue reading “Next Phase of the NIH Preprint Pilot Launching Soon”
An updated bacterial and archaeal reference genome collection is available! This collection of 17,163 genomes was built by selecting exactly one genome assembly for each species among the 272,000+ prokaryotic genomes in RefSeq, except for E. coli, for which two assemblies were selected as references.
A total of 497 species are included in this collection for the first time. In addition, compared to the October 2022 set, 174 species are represented by a better assembly and 15 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment. The criteria for selecting one assembly for a given species from all RefSeq assemblies available for that species include assembly contiguity, completeness, and the quality of the RefSeq annotation. See the documentation for details.
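To make the idea of "one assembly per species" concrete, here is a minimal Python sketch that keeps the top-ranked assembly in each species. It is not NCBI's actual procedure. The metric fields, their ranking order, and the placeholder identifiers are all assumptions for illustration, and it ignores the E. coli exception.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Assembly(NamedTuple):
    """Hypothetical per-assembly summary; field names are illustrative only."""
    accession: str           # placeholder identifier, not a real accession
    species: str
    completeness: float      # fraction of expected genes found, 0.0 to 1.0
    contig_n50: int          # contiguity proxy: larger is better
    annotation_score: float  # annotation quality proxy: larger is better


def rank_key(asm: Assembly) -> Tuple[float, int, float]:
    """Assumed ordering: completeness, then contiguity, then annotation quality."""
    return (asm.completeness, asm.contig_n50, asm.annotation_score)


def pick_references(assemblies: Iterable[Assembly]) -> Dict[str, Assembly]:
    """Keep the highest-ranked assembly for each species."""
    best: Dict[str, Assembly] = {}
    for asm in assemblies:
        current = best.get(asm.species)
        if current is None or rank_key(asm) > rank_key(current):
            best[asm.species] = asm
    return best
```

The tuple comparison means contiguity only breaks ties in completeness, which is a design choice of this sketch, not a documented rule.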
We have updated the nucleotide BLAST RefSeq reference genomes database (fourth in the menu) as well as the database on the Microbial Nucleotide BLAST page to reflect these changes. You can also run BLAST searches against the proteins annotated on these reference genomes (RefSeq Select proteins database, second in the menu).
The potential impact of emerging model organisms on human health
Comparative genomics is a science that compares genomic data either within a species or across species to answer questions in biomedicine. Laboratory experiments can then investigate the functional impact of those genomic similarities and differences. The history of comparative genomics goes back to the mid-1990s, but the field is now accelerating. A flood of new data is emerging as DNA sequencing technology becomes cheaper and commoditized. While this growth poses many challenges to current tools and approaches, it also offers immense opportunity for scientific research and understanding. These insights continue to reveal novel model organisms that can further the impact of comparative genomics on human health. Continue reading “NIH Comparative Genomics Resource project”
ClusteredNR, the new protein database that provides results with a better overview of protein homologs in a wider range of organisms, is now available for blastx (translated nucleotide query) and PSI-BLAST (Position Specific Iterative BLAST) searches (Figure 1). Simply select ClusteredNR in the database section of the BLAST form. You can even search standard nr at the same time to compare results.
Figure 1. Composite image from the BLAST search forms. The ClusteredNR database is now available for blastx and PSI-BLAST searches in addition to blastp. For all types of searches, you can choose to search both ClusteredNR and standard nr at the same time so you can compare results.
ClusteredNR is especially useful with blastx for finding more distant homologs when searching with queries from over-represented groups. For PSI-BLAST, the greater taxonomic scope of the ClusteredNR database allows you to work more effectively with the default number of target sequences in the first round. The two searches described below highlight these advantages of ClusteredNR.