Introducing MicroBIGG-E, a browser for microbial AMR genes and other stress and resistance elements

The Pathogen Detection project now offers the Microbial Browser for Identification of Genetic and Genomic Elements (MicroBIGG-E), which lets you browse antimicrobial resistance (AMR), stress response, and virulence genes and genomic elements found in GenBank published isolate genomes from the NCBI Isolates Browser. Unlike the Isolates Browser, which provides only a strain-level view of both published and unpublished genomes, MicroBIGG-E shows the location of these genes, how they were identified, and phenotypic information (Figure 1).

Figure 1. Top panel. Portion of the MicroBIGG-E table display showing the results of a search (genes_on_contig:blaTEM-1 AND genes_on_contig:blaKPC*) for isolates that contain two different beta-lactamase genes (blaTEM-1 and any of the carbapenem-hydrolyzing blaKPC* genes) on a single contig. Available columns include the element’s type, subtype, and class as well as information about how the element was identified and supporting evidence. Bottom panel. Graphical view of the annotation on a contig from one of the isolates, the assembled Serratia marcescens record NZ_CP020507, showing the two beta-lactamases in the search (blaTEM-1 and blaKPC-3) as well as an oxacillin-hydrolyzing gene (blaOXA-9). All three genes and some other AMR and stress response genes are part of a mobile element on the assembled contig.

Allele Frequency Aggregator (ALFA) Release 2 is available!

We are excited to announce the NCBI Allele Frequency Aggregator (ALFA) Release 2 (version 20201027095038), one of the largest and most comprehensive open-access aggregated variant datasets with allele frequencies. This release draws on 79 dbGaP studies covering 192 thousand subjects and 5.8 trillion combined genotypes, which yielded allele frequencies for 904 million variants, 316 million of them novel (previously unknown in dbSNP Build 154).

NCBI on YouTube: RAPT and BLAST+ on the Cloud, SARS-CoV-2 genome data in Datasets

It’s time we do another roundup of what’s been happening on YouTube!

First up, the NCBI YouTube channel has merged with the NLM YouTube channel. You’ll now be able to find diverse content all on one channel, from tips on using resources to fascinating moments in the history of medicine and more!

RefSeq release 204 is now available

RefSeq release 204 is now available online, from the FTP site and through NCBI’s Entrez programming utilities, E-utilities.
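As a rough illustration of E-utilities access, the sketch below only builds an EFetch request URL and does not send it. The helper name efetch_url is hypothetical, and the choices of the nuccore database, FASTA output, and the RefSeq accession NZ_CP020507 (borrowed from the MicroBIGG-E figure caption) are assumptions made for the example, not part of the release announcement.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(accession, db="nuccore", rettype="fasta", retmode="text"):
    """Build (but do not send) an EFetch URL for one sequence record.

    efetch_url is a hypothetical helper; the default database and
    output format are assumptions chosen for this example.
    """
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": retmode}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# Example accession taken from the figure caption earlier on this page.
url = efetch_url("NZ_CP020507")
print(url)
```

The resulting URL could be passed to urllib.request.urlopen or a similar HTTP client. For heavier use, NCBI's usage guidelines on request rates and API keys apply.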

This full release incorporates genomic, transcript, and protein data available as of January 4, 2021, and contains 262,714,372 records, including 191,411,721 proteins, 35,353,412 RNAs, and sequences from 106,581 organisms. The release is provided in several directories as a complete dataset and also as divided by logical groupings.

Updated human genome Annotation Release 109.20201120
Updated Annotation Release 109.20201120 is an update of NCBI Homo sapiens Annotation Release 109.

The annotation report for 109.20201120 is available here. The annotation products are available in the sequence databases and on the FTP site.

Prokaryotic representative genomes updated — now over 13 thousand assemblies!

We have updated the bacterial and archaeal representative genome collection! The current collection contains over 13,000 assemblies selected from the 203,000 prokaryotic RefSeq assemblies to represent their respective species. The collection has increased by 11% since August 2020. We’ve included about 1,400 species for the first time, have used better assemblies for 1,177 species, and have removed 65 species because of changes in NCBI Taxonomy or uncertainty in their species assignment.

We have also updated the Representative Genomes Database on the Microbial Nucleotide BLAST page, as well as the RefSeq Representative Genome Database on basic nucleotide BLAST, to reflect these changes.

GenBank release 241.0

GenBank release 241.0 (12/21/2020) is now available on the NCBI FTP site. This release has 12.98 trillion bases and 2.27 billion records.

The current release has 221,467,827 traditional records containing 723,003,822,007 base pairs of sequence data. There are also 1,517,995,689 WGS records containing 11,830,842,428,018 base pairs of sequence data, 446,397,378 bulk-oriented TSA records containing 392,206,975,386 base pairs of sequence data, and 88,039,152 bulk-oriented TLS records containing 33,036,509,446 base pairs of sequence data.
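The headline totals can be checked against the per-division figures with a few lines of arithmetic. The sketch below copies the numbers from the paragraph above; the dictionary layout and variable names are just for illustration.

```python
# (records, base pairs) per division, copied from the release figures above.
divisions = {
    "traditional": (221_467_827, 723_003_822_007),
    "WGS": (1_517_995_689, 11_830_842_428_018),
    "TSA": (446_397_378, 392_206_975_386),
    "TLS": (88_039_152, 33_036_509_446),
}

# Sum each column across the four divisions.
total_records = sum(records for records, _ in divisions.values())
total_bases = sum(bases for _, bases in divisions.values())

print(f"{total_records / 1e9:.2f} billion records")   # 2.27 billion records
print(f"{total_bases / 1e12:.2f} trillion bases")     # 12.98 trillion bases
```

The sums come to 2,273,900,046 records and 12,979,089,734,857 base pairs, which round to the 2.27 billion records and 12.98 trillion bases quoted for the release.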

Important Changes to NCBI Accounts Coming in 2021

Do you log in to NCBI to use MyNCBI, SciENcv, or MyBibliography? Do you submit data to NCBI? If so, you’ll want to read further to get a first glimpse at some important changes to NCBI accounts that will be coming in 2021.

What’s happening?

In brief, NCBI will be transitioning to federated account credentials. NCBI-managed credentials are the username and password you set at NCBI — these will be going away. Federated account credentials are those set through eRA Commons, Google, or a university or institutional point of access.

Why is this happening?

NIH, NLM, and NCBI take your privacy and security very seriously. As part of our normal reviews, we have determined that this change is needed to raise the security of your accounts to the level we consider necessary.

When is this happening?

After June 1, 2021, you will no longer be able to use NCBI-managed credentials to log in to NCBI.

Expanding access to coronavirus-related literature: the COVID-19 Initiative in PMC reaches 100K articles!

One important way the National Library of Medicine (NLM) is responding to the ongoing public health emergency is through the COVID-19 Initiative. This public-private cooperation between NLM and more than 50 scholarly publishers and societies allows you to access over 100,000 articles on COVID-19, SARS-CoV-2, and other coronaviruses through PubMed Central (PMC). This collection includes recently published discoveries, a history of coronavirus reports for comparison, and international (globally comprehensive) content, and it captures the breadth of research, analysis, and commentary. We make these articles available in human- and machine-readable formats to support public accessibility and analysis by researchers.

You can search this public health emergency collection in PMC or download the collection through the PMC Open Access Subset. The collection spans:

    • More than half a century of research, including articles from the 1960s through the present (more than 60% of the articles included thus far were published in 2020; Figure 1, top panel);
    • Several languages, including content in English (~95%), German, French, and Spanish;
    • Many publication types, more than half of them research or review articles (Figure 1, bottom panel).

Figure 1. The Public Health Emergency Collection articles by decade of publication (top panel) and by publication type (bottom panel).

People have viewed or downloaded articles in this PMC collection more than 80 million times since March, reflecting the great demand for such an open and centralized collection. Artificial intelligence organizations, such as the Allen Institute for AI, builders of the COVID-19 Open Research Dataset (CORD-19), have also used the collection to develop new text and data mining techniques that can help answer high-priority scientific questions related to COVID-19.

To learn more about the initiative and NLM’s collaborators, see the Public Health Emergency COVID-19 Initiative overview and related FAQs.

NCBI hidden Markov models (HMM) release 4.0 now available!

Release 4.0 of the NCBI hidden Markov models (HMM) used by the Prokaryotic Genome Annotation Pipeline (PGAP) is now available from our FTP site. You can search this collection against your favorite prokaryotic proteins to identify their function using the HMMER sequence analysis package.
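A search of this kind could be scripted roughly as follows. This sketch only assembles an hmmsearch command line (hmmsearch is the HMMER program for searching profiles against a protein sequence file) and does not run it. The helper name and all file names are hypothetical placeholders, and the E-value cutoff is an arbitrary assumption rather than a recommended setting.

```python
import shlex


def hmmsearch_command(hmm_file, protein_fasta, table_out, evalue=1e-5):
    """Assemble an hmmsearch invocation as an argument list.

    --tblout writes a per-target tabular summary, and -E sets the
    reporting E-value threshold. The default cutoff here is an
    assumption for illustration only.
    """
    return [
        "hmmsearch",
        "--tblout", table_out,
        "-E", str(evalue),
        hmm_file,
        protein_fasta,
    ]


# Hypothetical file names: substitute the HMM file downloaded from the
# FTP site and your own protein FASTA file.
cmd = hmmsearch_command("pgap_models.hmm", "my_proteins.faa", "hits.tbl")
print(shlex.join(cmd))
```

With HMMER installed, the list can be handed to subprocess.run(cmd, check=True) to perform the search.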

This release contains 17,443 models, including 94 new models since the last release. We have also updated names and added EC numbers and gene symbols to over 100 models. You can search and view the details of these HMMs in the newly deployed Protein Family Model collection, which also includes conserved domain architectures and BlastRules and lets you find all RefSeq proteins named by these profiles. See our recent post for more details.