Full-scale access to microbial Pathogen Detection data in the Cloud!

NCBI’s Pathogen Detection resource now provides selected data on the Google Cloud Platform (GCP), giving you better access to more than 1 million bacterial isolates.

Data on GCP include:

  1. The tables from the MicroBIGG-E database of antimicrobial resistance (AMR), stress response, and virulence genes and genomic elements, and the tables from the Pathogen Isolates Browser. Both are accessible through Google BigQuery.
  2. The MicroBIGG-E sequences in FASTA format, available from Google Cloud Storage.

Features & Benefits

Pathogen Detection data on GCP allow larger-scale access than is currently available through the web or FTP. Notably, there is no FTP access to MicroBIGG-E, the web interface is limited to 100K rows, and sequence downloads are restricted. There are no such restrictions on GCP. MicroBIGG-E in BigQuery also allows you to download all AMRFinderPlus results. Currently there are more than 20 million rows of antimicrobial resistance, virulence, and stress response genes, and point mutations, identified in more than 1 million pathogen isolates.

Here are two examples where researchers have used MicroBIGG-E and AMRFinderPlus data to advance research on antimicrobial resistance:

    • Identifying conserved functional regions in erythromycin resistance methyltransferases (PMID: 34795028).
    • Assessing the health risks of antibiotic resistance genes (PMCID: PMC8346589).
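
The BigQuery access described above could be scripted roughly as follows. This is a minimal sketch, not an official recipe: the table path `ncbi-pathogen-detect.pdbrowser.microbigge` and the column name `element_symbol` are assumptions that do not appear in this post, so check the Pathogen Detection documentation for the actual names. Running the query also assumes the `google-cloud-bigquery` package is installed and that you have GCP credentials configured.

```python
from typing import Dict, List, Optional

# ASSUMPTION: this table path is not stated in the post; verify it first.
MICROBIGGE_TABLE = "ncbi-pathogen-detect.pdbrowser.microbigge"


def build_element_count_query(table: str = MICROBIGGE_TABLE, limit: int = 20) -> str:
    """Build SQL that counts MicroBIGG-E rows per element symbol.

    ASSUMPTION: `element_symbol` is a hypothetical column name used here
    for illustration only.
    """
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    if "`" in table:
        raise ValueError("table name must not contain backticks")
    return (
        "SELECT element_symbol, COUNT(*) AS n_rows\n"
        f"FROM `{table}`\n"
        "GROUP BY element_symbol\n"
        "ORDER BY n_rows DESC\n"
        f"LIMIT {limit}"
    )


def run_query(sql: str, project: Optional[str] = None) -> List[Dict[str, object]]:
    """Run the SQL in BigQuery and return the rows as plain dictionaries."""
    # Imported lazily so the query builder works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    return [dict(row) for row in client.query(sql).result()]


if __name__ == "__main__":
    # Printing the SQL needs no credentials; pass it to run_query() to execute.
    print(build_element_count_query(limit=10))
```

Keeping the query builder separate from the client call lets you inspect the SQL before it is billed against your GCP project.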

RefSeq Release 216

RefSeq release 216 is now available online, from the FTP site, and through NCBI’s new resource, Datasets.

This full release incorporates genomic, transcript, and protein data available as of January 9, 2023, and contains 342,395,932 records, including 249,868,639 proteins, 49,869,497 RNAs, and sequences from 128,299 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.

Introducing a new and improved SciENcv experience!

Want to submit federal grant applications quickly and easily? Check out our new and improved SciENcv experience! Science Experts Network Curriculum Vitae (SciENcv) is an electronic system that helps you assemble professional information needed to apply for federal grant support.  

SciENcv helps you gather and compile information on expertise, employment, education, and professional accomplishments. You can use SciENcv to create and maintain financial documents and biographical sketches that are submitted as part of grant application packages.

New feature in the Comparative Genome Viewer!

Easily distinguish reverse orientation alignments

We are excited to announce an update to NCBI’s Comparative Genome Viewer (CGV) that allows you to quickly determine the relative orientation of aligned segments. CGV displays whole genome alignments between two different eukaryotic assemblies (Figure 1). 

In the viewer, individual alignment regions are connected by colored bands between two chromosomes. These alignments are now colored differently depending on whether the aligned sequences on the two assemblies are in the same orientation (forward) or reverse orientation relative to one another. Forward orientation alignments are connected by green bands, whereas reverse alignments are connected by purple bands. Reverse alignments represent local genome inversions or inverted translocations and may point to areas of significant biological difference between the two assemblies.

Announcing the GenBank and SRA Data Processing Webpage

Interested in understanding how sequence data are submitted, processed, and made publicly available in GenBank and the Sequence Read Archive (SRA)? Announcing the GenBank and SRA Data Processing webpage!

Here you can learn about procedures that the National Center for Biotechnology Information (NCBI), part of the National Library of Medicine (NLM), uses for processing submitted data and public posting, as well as key definitions of data status.

Next Phase of the NIH Preprint Pilot Launching Soon

Phase 2 expands the scope of the preprints included in PubMed and PMC

Last month, the National Library of Medicine (NLM) announced plans to extend its NIH Preprint Pilot in PubMed Central (PMC) and PubMed beyond COVID-19 to encompass all preprints reporting on NIH-funded research. The second phase of the pilot, launching later this month, will include preprints supported by an NIH award, contract, or intramural program and posted to an eligible preprint server on or after January 1, 2023.

Updated bacterial and archaeal reference genomes collection now available!

An updated bacterial and archaeal reference genome collection is available! This collection of 17,163 genomes was built by selecting exactly one genome assembly for each species among the 272,000+ prokaryotic genomes in RefSeq, except for E. coli, for which two assemblies were selected as references.

A total of 497 species are included in this collection for the first time. In addition, compared with the October 2022 set, 174 species are represented by a better assembly and 15 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment. The criteria for selecting one assembly for a given species from all assemblies available in RefSeq for that species include assembly contiguity and completeness, and the quality of the RefSeq annotation. See the documentation for details.

We have updated the nucleotide BLAST RefSeq reference genomes database (fourth in the menu) as well as the database on the Microbial Nucleotide BLAST page to reflect these changes. You can also run BLAST searches against the proteins annotated on these reference genomes (RefSeq Select proteins database, second in the menu).

NIH Comparative Genomics Resource project

The potential impact of emerging model organisms on human health

Comparative genomics is a science that compares genomic data either within a species or across species to answer questions in biomedicine. Laboratory experiments can then investigate the functional impact of those genomic similarities and differences. The history of comparative genomics goes back to the mid-1990s, but comparative genomics is now accelerating. A flood of new data is emerging as DNA sequencing technology becomes cheaper and commoditized. While this growth poses many challenges to current tools and approaches, it also offers immense opportunity for scientific research and understanding. These insights continue to reveal novel model organisms that can further the impact of comparative genomics on human health.

Now Available! BLAST ClusteredNR database for blastx and PSI-BLAST searches

ClusteredNR, the new protein database that provides results with a better overview of protein homologs in a wider range of organisms, is now available for blastx (translated nucleotide query) and PSI-BLAST (Position Specific Iterative BLAST) searches (Figure 1). Simply select ClusteredNR in the database section of the BLAST form. You can even search standard nr at the same time to compare results.

Figure 1. Composite image from the BLAST search forms. The ClusteredNR database is available now for blastx and PSI-BLAST searches in addition to blastp. For all types of searches, you can choose to search both ClusteredNR and standard nr at the same time so you can compare results.

ClusteredNR is especially useful with blastx for finding more distant homologs when searching with queries from over-represented groups. For PSI-BLAST, the greater taxonomic scope of the ClusteredNR database allows you to work more effectively with the default number of target sequences in the first round. The two searches described below highlight these advantages of ClusteredNR.

NCBI hidden Markov models (HMM) release 11.0 now available!

Release 11.0 of the NCBI protein profile hidden Markov models (HMMs) used by the Prokaryotic Genome Annotation Pipeline (PGAP) is now available for download. You can search this collection against your favorite prokaryotic proteins to identify their function using the HMMER sequence analysis package.
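
A search like the one described above might be scripted along these lines. This is a hedged sketch that assumes HMMER 3 is installed and that `hmmsearch` is on your PATH. The file names are hypothetical, and the E-value threshold of 1e-5 is an illustrative choice, not a cutoff recommended by NCBI or used by PGAP.

```python
import subprocess
from typing import Dict, Iterable, List


def hmmsearch_command(hmm_file: str, protein_fasta: str, tblout: str,
                      evalue: float = 1e-5, cpu: int = 1) -> List[str]:
    """Build an hmmsearch command line that writes a per-target table."""
    return [
        "hmmsearch",
        "--tblout", tblout,   # tabular per-sequence hits
        "--noali",            # skip alignments in the main output
        "-E", str(evalue),    # illustrative reporting threshold (assumption)
        "--cpu", str(cpu),
        hmm_file,
        protein_fasta,
    ]


def parse_tblout(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Parse hmmsearch --tblout lines into dictionaries.

    In hmmsearch output the target is the protein sequence and the query
    is the HMM. Only the full-sequence E-value and score are kept.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split(None, 18)  # 18 fixed columns, then description
        if len(fields) < 18:
            continue  # not a complete hit line
        hits.append({
            "protein": fields[0],
            "hmm": fields[2],
            "hmm_accession": fields[3],
            "evalue": float(fields[4]),
            "score": float(fields[5]),
        })
    return hits


def run_hmmsearch(hmm_file: str, protein_fasta: str, tblout: str) -> List[Dict[str, object]]:
    """Run hmmsearch and return the parsed hits (requires HMMER installed)."""
    cmd = hmmsearch_command(hmm_file, protein_fasta, tblout)
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    with open(tblout) as handle:
        return parse_tblout(handle)
```

For example, `run_hmmsearch("models.hmm", "proteins.faa", "hits.tbl")` would search a hypothetical `proteins.faa` file against a downloaded HMM file named `models.hmm` and return one dictionary per reported hit.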