Update on My NCBI log-in changes and the Password Retirement Wizard

As we announced previously, the way you log into your My NCBI account will change from your My NCBI username and password to a 3rd-party login. On June 22, we disabled the ability to create new My NCBI passwords, and in July we launched the Password Retirement Wizard, which will activate when you log in here with a native NCBI password (Figure 1).

Figure 1. The Password Retirement Wizard screens. The wizard will offer you the option (opt-in) to change your password to a 3rd-party login when you sign in at https://account.ncbi.nlm.nih.gov/migrate/ with a native NCBI password. You may choose from any of the available 3rd-party accounts. Clicking on an option will take you to the sign-in screen on the 3rd-party website.

Announcing the RefSeq annotation of sheep ARS-UI_Ramb_v2.0!

The new reference assembly for sheep is now annotated! Assembly ARS-UI_Ramb_v2.0 is made of 142 scaffolds, a drop from 2,640 in the 2017 assembly Oar_rambouillet_v1.0. With a contig N50 of 43 Mb, ARS-UI_Ramb_v2.0 is 15 times more contiguous than the first assembly of the Rambouillet breed.
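For readers less familiar with the N50 statistic quoted above, here is a minimal sketch of how it is commonly computed: sort the contig lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name and the toy lengths are illustrative assumptions, not values taken from either sheep assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one length")


# Toy example with made-up contig lengths (not real assembly data).
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

A higher N50 means that more of the assembly sits in long, unbroken pieces, which is the sense in which the new assembly is described as more contiguous.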

Annotation Release 104 (AR 104) of ARS-UI_Ramb_v2.0 reflects these improvements. Nearly 200 more coding genes have a 1:1 ortholog in the human genome than in the annotation of Oar_rambouillet_v1.0 (AR 103). The number of coding models annotated as partial is down 35%, from 165 to 107, and the number of coding models labeled low quality due to suspected indels or base substitutions in the underlying genomic sequence decreased by 51% (from 1,646 to 796). Based on BUSCO analysis, 99.1% of the models (cetartiodactyla_odb10) are complete in AR 104 versus 98.8% in AR 103. Details of this annotation, including statistics on the annotation products, the input data used in the pipeline and intermediate alignment results, can be found here.

BioSystems retiring March 2022: use PubChem Pathways

The NCBI BioSystems Database will be retired in March 2022. This retirement includes the representation of BioSystems records in the NCBI Entrez system and viewers of BioSystems content.

NCBI now provides metabolic pathway and other biosystems data through the regularly updated PubChem Pathways resource that offers a fresh, extended, and more modern interface. See the PubChem blog for more details on PubChem Pathways.

Table 1 presents a snapshot of the richer and more extensive data coverage in PubChem Pathways.

Source database    BioSystems    PubChem Pathways (as of 8/10/21)
BioCyc 15,259 records 15,037 records
PlantCyc 64,851 records
Reactome 20,478 records 26,407 records
Plant Reactome 18,910 records
NCI Pathway Interaction DB 188 records 745 records
Wiki Pathways 1,478 records 1,823 records
Lipid Maps 14 records 15 records
PharmGKB 147 records
PathBank 110,315 records
COVID-19 Disease Map 20 records
INOH 511 records
GO 44,525 records
KEGG 902,026 records

Table 1: Coverage for BioSystems records in PubChem Pathways. In anticipation of retiring BioSystems, we will begin directing relevant queries to PubChem Pathways to provide users with a richer dataset and enhanced user experience.

Search the NCBI Hidden Markov models collection against your favorite prokaryotic proteins

The NCBI Hidden Markov models (HMM) 6.0 release, available on our FTP site, has 15,247 models supported at NCBI. We created 80 new HMMs and consolidated the collection by removing 2,151 HMMs that were nearly identical to another model. Release 6.0 also incorporates 12,656 Pfam models from release 34 that apply to prokaryotic proteins. You can use the HMMER sequence analysis package to search the collection against your favorite prokaryotic proteins to identify their function. We have also added more specific names or associated EC numbers, gene symbols, and publications to over 500 HMMs.
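As a rough sketch of what such a HMMER search could look like when driven from Python, the helpers below assemble an hmmsearch command line and read back the tabular output that HMMER writes with --tblout. The file names, the E-value threshold, and both function names are placeholders chosen for illustration; they are not NCBI recommendations, and the sample hit line is made up to show the column layout.

```python
def hmmsearch_command(hmm_file, protein_fasta, table_out, evalue=1e-5):
    """Build an hmmsearch command as an argument list.

    hmm_file, protein_fasta and table_out are hypothetical paths;
    -E sets the E-value reporting threshold and --tblout requests the
    per-sequence hit table.
    """
    return ["hmmsearch", "--tblout", table_out, "-E", str(evalue),
            hmm_file, protein_fasta]


def parse_tblout(text):
    """Parse hmmsearch --tblout text into a list of hit dictionaries.

    For hmmsearch, the "target" columns describe the protein sequence
    and the "query" columns describe the HMM.
    """
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # 18 fixed columns, then a free-text description of the target
        fields = line.split(maxsplit=18)
        hits.append({
            "protein": fields[0],
            "model": fields[2],
            "model_accession": fields[3],
            "evalue": float(fields[4]),
            "score": float(fields[5]),
        })
    return hits


# Made-up example line in the tblout column order (not a real result).
sample = (
    "# target name  accession  query name  accession  E-value ...\n"
    "prot_1 - NtcA_fam TIGR03697.1 1.2e-50 170.3 0.1 "
    "1.5e-50 170.0 0.1 1.0 1 0 0 1 1 1 1 hypothetical example protein\n"
)
print(hmmsearch_command("ncbi_hmms.hmm", "my_proteins.faa", "hits.tbl"))
print(parse_tblout(sample))
```

With HMMER installed, the argument list could be passed to subprocess.run(cmd, check=True), and the resulting table file read and handed to parse_tblout.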

Gene Ontology (GO) term attributes are now available for 20% of HMMs (see Figure 1 below). We added most of these based on existing mappings, but our experts are working on creating more associations. In the fall, we’ll start propagating GO terms from HMMs to annotated genomes and proteins!

Figure 1. Example Protein Family Model, TIGR03697.1 for the global nitrogen regulator NtcA protein family, with newly shown GO terms (framed in red).

Tackling Petabyte Scale Sequence Search Challenges

The volume of biological data being generated by the scientific community is growing exponentially, reflecting technological advances and research activities. This increase in available data has great promise for pushing scientific discovery but also introduces new challenges that scientific communities need to address. The National Institutes of Health’s (NIH) Sequence Read Archive (SRA), which is maintained by the National Library of Medicine’s National Center for Biotechnology Information (NCBI), is a rapidly growing public database that researchers use to improve scientific discovery across all domains of life. As part of the Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative, over 36 petabytes of “next generation” (raw and SRA-formatted) sequencing data is accessible to anybody via two cloud service providers.

To help address the challenges of conducting large-scale analysis of -omic data in the SRA and similar databases, the Department of Energy (DOE) Office of Biological and Environmental Research (BER), the NIH Office of Data Science Strategy (ODSS), and NCBI held a virtual workshop on June 8, 2021, on Emerging Solutions in Petabyte Scale Sequence Search. The workshop brought together experts from DOE national labs, research institutions, and universities across the world.

SRA data growth over time. Databases like the NIH Sequence Read Archive are growing rapidly and are used extensively by scientific communities. As these databases grow, so does their potential scientific value, but work must be done to ensure ease of access.

Aug 18 Webinar: Finding Data for your Research Organism: Plants and RNA-Seq data

Join us on August 18, 2021 at 12 PM Eastern Time for the second webinar on finding data for your non-model research organism. In this webinar, you will learn how to use NCBI’s web resources to get data for a plant species, the black cottonwood. You will see how to find, access, and analyze gene and sequence data from Datasets and other NCBI web resources, as well as sample metadata and gene expression RNA-Seq data from SRA and the SRA Run Selector. You will also see an example of how to use and analyze these data in a typical workflow, set up in a Jupyter notebook, that uses the NCBI next-gen aligner Magic-BLAST to get relative gene expression levels across samples.

  • Date and time: Wed, August 18, 2021 12:00 PM – 12:45 PM EDT
  • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI webinars playlist on the NLM YouTube channel. You can learn about future webinars on the Webinars and Courses page.

NCBI events at the Bioinformatics Open Science Conference 2021 (BOSC 2021)

Come visit us virtually to learn about new NCBI data access, tools, and best practices at the Bioinformatics Open Science Conference (BOSC 2021), part of the ISMB/ECCB online conference, from July 29 – 30, 2021. We will be presenting virtual posters on NCBI resources, offering a Birds of a Feather discussion, and participating in the BOSC CollaborationFest (CoFest) following the conference, where you can take part in a hands-on evaluation of ElasticBLAST.

NCBI Posters, July 29, 2021, 11:20 AM – 12:20 PM EDT

All posters will be presented on Thursday afternoon. You can see complete abstracts on the ISMB/ECCB BOSC schedule.

Nuala O’Leary will talk about NCBI Datasets, a new resource for fast, easy access to NCBI sequence data. You will learn about the new interface and new tools to access reference genomes, genes, and orthologs using web-based and programmatic tools.

Adelaide Rhodes will present Open access NCBI cloud resources to accelerate scientific insights, where you can learn about recent developments in transferring more than 20 petabytes of NCBI Sequence Read Archive (SRA) data to the cloud.

Deacon Sweeney will describe the web RAPT service for assembling and annotating bacterial genomes at the click of a button in RAPT, The Read assembly and Annotation Pipeline Tool: building a prokaryotic genome annotation package for users of all backgrounds.

Roberto Vera Alvarez will talk about best practices for using cloud tools for transcriptomics in his poster Transcriptome annotation in the cloud: complexity, best practices, and cost.

Greg Boratyn will discuss improvements to the BLAST-based short read aligner, Magic-BLAST, in Recent improvements in Magic-BLAST 1.6.

Visit Christiam Camacho’s poster ElasticBLAST: Using the power of the cloud to speed up science to get an introduction to ElasticBLAST, a Kubernetes-based approach for high throughput BLAST tasks. Join us following the conference in the CoFest to try out ElasticBLAST yourself and provide input. See the section on the CoFest below and our companion post.

Birds of a Feather, July 29, 2021, 11:20 AM – 12:20 PM EDT

We will host a Birds of a Feather public feedback session on Thursday, where you can provide feedback and participate in discussions on all aspects of NCBI’s new data access options: NCBI Datasets, SRA, BLAST, and the Genome Data Viewer (GDV), our genome browser for sequence visualization. We welcome your input! Come and see us!

CollaborationFest (CoFest), July 31 – August 1, 2021

The ElasticBLAST team will attend the BOSC CoFest following the conference. Sign up to participate on July 31 and August 1 to get an in-depth orientation and an opportunity to test the capabilities of ElasticBLAST on the Amazon Web Services (AWS) cloud. You do not have to register for the conference to attend the CoFest. See our post on the CoFest for more information.

 

Try out ElasticBLAST at the BOSC2021 CoFest!

Join the BLAST team at the virtual CollaborationFest (July 31 – August 1, 2021) after the BOSC 2021 conference to help test and improve ElasticBLAST, a new cloud-based tool designed to speed up high throughput BLAST searches. We would love to have your help with real-world testing of our alpha release of ElasticBLAST with your own workflows and data. You may sign up for the CoFest even if you aren’t registered for BOSC 2021.

Here are suggestions for how you can participate. See the FAQs below for additional information.

  1. Try it out and let us know how well it works. You can be blunt.
  2. Help us improve the documentation.
  3. Write a script to make ElasticBLAST part of your workflow.
  4. Try to process ElasticBLAST results with cloud-native tools. Here is an example.
  5. Bring your own high throughput BLAST search problem to use with ElasticBLAST! Please discuss it with us first to make sure you don’t blow our budget and get the ElasticBLAST team in trouble!

Participating labs contribute over 70 tests for COVID-19 to the NIH Genetic Testing Registry

During the COVID-19 pandemic, an often-heard refrain in the arena of public health was “Testing, testing, testing!” Testing for the presence of the SARS-CoV-2 virus in patients with symptoms or potential exposure, or for the presence of antibodies to the virus in patients who had recovered from the disease, took on vital importance in efforts to curb its spread. Last fall, the NIH Genetic Testing Registry (GTR) expanded its scope to include molecular and serology tests for microorganisms impacting human health and disease. It now contains 70+ tests for COVID-19.

There are 54 molecular genetic tests that detect viral RNA from individual samples or pools using nucleic acid amplification technologies. While most of the tests detect the SARS-CoV-2 viral RNA alone, 8 tests detect multiple bacterial or viral markers as part of a panel. Two tests detect viral variants in a targeted variant analysis of the whole viral genome. Sixteen serologic tests detect antibodies to SARS-CoV-2.

New RefSeq annotations for human, zebra finch, great white shark and more!

In May and June, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for 27 organisms.

This release includes new annotations for human, zebra finch, golden eagle, sea urchin, snowfinch, Arctic fox, clawed frog, great white shark, and more.