Important Changes to NCBI Accounts Coming in 2021

Update: Please see our FAQ page for more information and updates.

Do you log in to NCBI to use MyNCBI, SciENcv, or MyBibliography? Do you submit data to NCBI? If so, read on for a first look at some important changes to NCBI accounts coming in 2021.

What’s happening?

In brief, NCBI will be transitioning to federated account credentials. NCBI-managed credentials (the username and password you set at NCBI) will be going away. Federated account credentials are those set through eRA Commons, Google, or a university or institutional point of access.

Why is this happening?

NIH, NLM, and NCBI take your privacy and security very seriously. As part of our normal reviews we have determined that making this change will increase the security of your accounts to a level that we feel is necessary.

When is this happening?

After June 1, 2021, you will no longer be able to use NCBI-managed credentials to log in to NCBI.

Expanding access to coronavirus-related literature: the COVID-19 Initiative in PMC reaches 100K articles!

One important way the National Library of Medicine (NLM) is responding to the ongoing public health emergency is through the COVID-19 Initiative. This public-private cooperation between NLM and more than 50 scholarly publishers and societies allows you to access over 100,000 articles on COVID-19, SARS-CoV-2, and other coronaviruses through PubMed Central (PMC). The collection includes recently published discoveries, a history of coronavirus reports for comparison, and international (globally comprehensive) content, and it captures the breadth of research, analysis, and commentary. We make these articles available in human- and machine-readable formats to support public accessibility and analysis by researchers.

You can search this public health emergency collection in PMC or download the collection through the PMC Open Access Subset. The collection spans:

    • More than half a century of research, including articles from the 1960s through the present (more than 60% of the articles included thus far were published in 2020; Figure 1, top panel);
    • Several languages, including content in English (~95%), German, French, and Spanish;
    • Many publication types, more than half of them research or review articles (Figure 1, bottom panel).

Figure 1. The Public Health Emergency Collection articles by decade of publication (top panel) and by publication type (bottom panel).

People have viewed or downloaded articles in this PMC collection more than 80 million times since March, reflecting the great demand for such an open and centralized collection. Artificial intelligence organizations, such as the Allen Institute for AI, builders of the COVID-19 Open Research Dataset (CORD-19), have also used the collection to develop new text and data mining techniques that can help answer high-priority scientific questions related to COVID-19.

To learn more about the initiative and NLM’s collaborators, see the Public Health Emergency COVID-19 Initiative overview and related FAQs.

NCBI hidden Markov models (HMM) release 4.0 now available!

Release 4.0 of the NCBI hidden Markov models (HMM) used by the Prokaryotic Genome Annotation Pipeline (PGAP) is now available from our FTP site. You can search this collection against your favorite prokaryotic proteins to identify their function using the HMMER sequence analysis package.
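As a rough illustration of what you might do after such a search, the sketch below reads the per-sequence table that HMMER's `hmmsearch` writes with its `--tblout` option and keeps the highest-scoring model for each protein. The names `HmmHit`, `parse_tblout`, and `best_hit_per_protein` are hypothetical, and the flat E-value threshold is an arbitrary assumption for the example, not the cutoff logic used by PGAP.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class HmmHit(NamedTuple):
    """One row of an hmmsearch --tblout table (hypothetical container)."""
    protein: str      # target name: the protein sequence that was searched
    model: str        # query name: the HMM that matched
    accession: str    # query accession, "-" when the model has none
    evalue: float     # full-sequence E-value
    score: float      # full-sequence bit score


def parse_tblout(lines: Iterable[str]) -> Iterator[HmmHit]:
    """Yield hits from tblout lines, skipping comments and blank lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # The first 18 columns are whitespace-delimited; the rest is the
        # free-text target description, which may itself contain spaces.
        fields = line.split(maxsplit=18)
        if len(fields) < 18:
            continue  # not a complete data row
        yield HmmHit(
            protein=fields[0],
            model=fields[2],
            accession=fields[3],
            evalue=float(fields[4]),
            score=float(fields[5]),
        )


def best_hit_per_protein(
    hits: Iterable[HmmHit], max_evalue: float = 1e-5
) -> Dict[str, HmmHit]:
    """Keep the highest-scoring hit per protein among hits that pass
    an illustrative E-value threshold (an assumption for this sketch)."""
    best: Dict[str, HmmHit] = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue
        current = best.get(hit.protein)
        if current is None or hit.score > current.score:
            best[hit.protein] = hit
    return best
```

To try it, pass an open file handle for a table you produced yourself, for example `best_hit_per_protein(parse_tblout(open("hits.tbl")))`, where `hits.tbl` is a placeholder file name.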

This release contains 17,443 models, including 94 new models since the last release. We have also updated names and added EC numbers and gene symbols to over 100 models. You can search and view the details of these HMMs in the newly deployed Protein Family Model collection, which also includes conserved domain architectures and BlastRules and lets you find all RefSeq proteins named by these profiles. See our recent post for more details.

The Protein Family Model resource is now available!

The new Protein Family Model resource (Figure 1) provides a way for you to search across the evidence used by the NCBI annotation pipelines to name and classify proteins. You can find protein families by gene symbol, protein function, and many other terms. You have access to related proteins in the family and publications describing members. Protein Family Models includes protein profile hidden Markov models (HMMs) and BlastRules for prokaryotes, and conserved domain architectures for prokaryotes and eukaryotes. The HMMs in the collection include Pfam models and TIGRFAMs, as well as models developed at NCBI either de novo or from NCBI protein clusters. Each of the BlastRules (PMCID: 5753331) consists of one or more model proteins of known biological function with BLAST identity and coverage cutoffs. The conserved domain architectures are based on BLAST-compatible position-specific scoring matrices (PSSMs) that constitute the NCBI Conserved Domain database.

Figure 1. Protein Family Model resource pages. Top panel, home page. Middle panel, selected results summaries from a fielded search for the DnaK gene product (DnaK[Gene Symbol]). Bottom panel, a portion of an HMM record for DnaK derived from NCBI Protein Clusters (NF009946). The record also includes PubMed citations and HMMER analyses showing the RefSeq proteins named by this method.
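The idea behind a BlastRule, a model protein paired with identity and coverage cutoffs, can be sketched in a few lines. This is only an illustration: the class names, the field names, the cutoff values, and the choice to require the coverage cutoff on both the query and the model protein are assumptions made for the example and do not reproduce NCBI's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastRule:
    """Hypothetical rule: a protein name plus the cutoffs a hit must meet."""
    name: str
    min_identity: float   # percent identity cutoff
    min_coverage: float   # percent coverage cutoff (query and model protein)


@dataclass(frozen=True)
class BlastHit:
    """Hypothetical summary of one BLAST alignment to a model protein."""
    percent_identity: float
    query_start: int      # 1-based, inclusive
    query_end: int
    query_length: int
    model_start: int      # 1-based, inclusive
    model_end: int
    model_length: int


def coverage(start: int, end: int, length: int) -> float:
    """Percent of a sequence spanned by the aligned region."""
    return 100.0 * (end - start + 1) / length


def satisfies(rule: BlastRule, hit: BlastHit) -> bool:
    """True when the hit meets the identity cutoff and covers enough of
    both the query and the model protein (an assumption of this sketch)."""
    return (
        hit.percent_identity >= rule.min_identity
        and coverage(hit.query_start, hit.query_end, hit.query_length) >= rule.min_coverage
        and coverage(hit.model_start, hit.model_end, hit.model_length) >= rule.min_coverage
    )
```

Requiring coverage on both sequences is one way to avoid naming a protein from a short local match; a real rule may define coverage differently.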

NCBI Virus: Test drive our new SARS-CoV-2 interactive data dashboard!

Are you looking for SARS-CoV-2 sequence data? Look no further! The NCBI Virus SARS-CoV-2 Data Hub now has an interactive data dashboard (Figure 1) that shows the collection location (country and US state), the date of collection, and the date of public availability for SARS-CoV-2 sequence data. You can view available nucleotide and protein sequences based on criteria you select and send these to a data table. You can further filter by normalized source information, including sequence length, protein content, host, and anatomical isolation source. The sequence records have links to related SRA records and publications in PubMed when available. You can download the data as FASTA-formatted sequences with customizable titles, as accession lists, or as a table including data descriptors. See the help documentation for more details.
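Once you have downloaded FASTA-formatted sequences, you may want to apply a similar length filter locally. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and it makes no assumption about how you customized the title lines.

```python
from typing import Iterable, Iterator, List, Tuple

FastaRecord = Tuple[str, str]  # (title without the leading ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[FastaRecord]:
    """Yield (title, sequence) pairs from FASTA-formatted lines."""
    title = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new title line closes the previous record, if any.
            if title is not None:
                yield title, "".join(chunks)
            title = line[1:]
            chunks = []
        elif title is not None:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


def filter_by_length(records: Iterable[FastaRecord], min_length: int) -> List[FastaRecord]:
    """Keep records whose sequence has at least min_length residues."""
    return [(title, seq) for title, seq in records if len(seq) >= min_length]
```

For example, `filter_by_length(read_fasta(open("sequences.fasta")), 29000)` would keep only near-complete genomes, where `sequences.fasta` is a placeholder file name and 29,000 is an illustrative threshold.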

The sequences in NCBI Virus were submitted to members of the International Nucleotide Sequence Database Collaboration (INSDC): GenBank, EMBL, and DDBJ. This collaborative effort ensures that data is freely available to the scientific and public health communities, where it can be used to understand the biology, evolution, and spread of SARS-CoV-2.

Figure 1. The NCBI Virus SARS-CoV-2 Data Hub Dashboard. You can narrow down sequence data using collection location, collection date, or the public release date. After making your selections, click “View results, Analyze, or Download” near the top of the page to see your dataset in the results table, which shows nucleotide, protein, and RefSeq sequences as well as associated metadata.

December 9 Webinar: Using BLAST+ in Docker and on the cloud

Join us on December 9, 2020 to learn about containerized BLAST+ in Docker that is ready to use locally and in the cloud. We are staging BLAST databases with some cloud providers, which makes running containerized BLAST as part of a cloud pipeline even easier. In this webinar you will learn about the advantages of containerized BLAST and see how to use it in some practical examples. You will also learn about Elastic BLAST, a cloud application that is useful for aligning extremely large numbers of sequences against BLAST databases.

  • Date and time: Wed, December 9, 2020 12:00 PM – 12:45 PM EST
  • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

Read assembly and Annotation Pipeline Tool (RAPT) is available for use and testing

We are excited to launch a beta version of RAPT, the Read assembly and Annotation Pipeline Tool, a one-step application for the genome assembly and gene annotation of archaeal and bacterial isolates. Start from an Illumina run in SRA or on your local machine and get a fully annotated genome!

A RAPT Docker container includes SKESA, a high-accuracy assembler for short reads; PGAP, the annotation pipeline written in the Common Workflow Language (CWL) and used by RefSeq; and cwltool, the reference implementation for CWL. A RAPT release also includes a set of reference data that are critical for a quality annotation. RAPT can be executed with Docker, Singularity, or Podman on any local or remote machine meeting basic requirements. For users of the Google Cloud Platform, RAPT can be launched from the Google Cloud Shell without configuring a virtual machine in advance.

To learn more about RAPT, register for our upcoming webinar.

Questions? Interested in becoming a beta tester? Contact us!

RAPT is available here.

New columns added to the web BLAST Descriptions Table

In response to your requests, we have added new columns to the Descriptions Table for the web BLAST output. The new columns are Scientific Name, Common Name, Taxid, and Accession Length. Common Name and Accession Length are now part of the default display. You can click ‘Select columns’ or ‘Manage columns’ to add or remove columns from the display (Figure 1). Your preferences will be saved for your next visit to BLAST, and when you download your results, whatever columns you have displayed will be saved.

Figure 1. The web BLAST Descriptions Table with all possible columns. You can remove columns through the ‘Manage columns’ menu. If you are not displaying any non-default columns, the same menu is titled ‘Select columns’, and you can use it to add them.

Customize columns in NCBI’s Multiple Sequence Alignment Viewer

We’re excited to report that researchers using the NCBI Multiple Sequence Alignment Viewer (MSAV) can now add or remove columns from the alignment view. In this way, you can choose to show only columns with data relevant for analysis of the sequences in your alignment.

When you arrive at an MSA alignment view, you’ll see columns for the Sequence ID (e.g., sequence accession number), Start and End of the alignment, and the organism (species name).

Sometimes, the information in these default columns isn’t the most useful information for sorting through the alignment. In the example above, all the sequences are from the same organism, so looking at the Organism column won’t help in figuring out the differences among the different sequences in the alignment.
