Category: What’s New

Fungal Disease Awareness Week: fungal pathogen data and literature at NCBI

This post is in support of the CDC’s Fungal Disease Awareness Week — September 20-24, 2021.

The impact of fungal diseases on human health has often been neglected, but increased association of fungal infections with severe illness and death during the COVID-19 pandemic has brought fungal diseases into the spotlight.

According to the CDC, the most common fungal co-infections in patients with COVID-19 include aspergillosis and invasive candidiasis, including healthcare-associated infection from Candida auris. Other reported diseases are mucormycosis, coccidioidomycosis, and cryptococcosis. Aspergillosis is commonly caused by Aspergillus fumigatus, mucormycosis by Rhizopus species, coccidioidomycosis by Coccidioides immitis and C. posadasii, and cryptococcosis by Cryptococcus neoformans.

This post explores several NCBI resources that have relevant information about the fungal pathogens implicated in these COVID-19 related illnesses.

Assembled genomes

Correctly identified and annotated genome assemblies available for the fungal taxa implicated in co-infections in COVID-19 patients are summarized in the table below. These and many other fungi are also available as curated RefSeq genome assemblies.

RefSeq Release 208 is available!

RefSeq release 208 is now available online, from the FTP site and through NCBI’s Entrez programming utilities, E-utilities.

This full release incorporates genomic, transcript, and protein data available as of September 7, 2021, and contains 288,903,207 records, including 210,703,648 proteins, 40,213,945 RNAs, and sequences from 113,002 organisms. The release is provided in several directories as a complete dataset and also divided into logical groupings.

New in NCBI Datasets: Species pages and species browser

NCBI Datasets introduces species pages and species browser! The species pages summarize taxon information and provide access to genomic data, including reference genomes. For example, see Figure 1, the Nothobranchius furzeri (turquoise killifish) species page.

Figure 1: Nothobranchius furzeri species page. The browse species button will take you to the species browser. 

Learn the best way to find data in NIH’s Sequence Read Archive (SRA) on the cloud

NCBI will present a workshop at the American Society of Human Genetics (ASHG) 2021 meeting as part of the conference activities. The workshop is scheduled for Wednesday, September 15, 2021.

Register now!

Adelaide Rhodes, Ph.D., from the Customer Experience team and Adam Stine, SRA Curator, will co-lead the workshop, which will introduce attendees to powerful metadata searches using BigQuery on Google Cloud Platform (GCP) and Athena on Amazon Web Services (AWS) to speed up analytic workflows that use the NIH’s Sequence Read Archive (SRA).

Cloud-based query services with expanded metadata options for SRA help researchers to find the target data more quickly than ever before. The workshop will be a mix of training in Structured Query Language (SQL), demos on the cloud console and hands-on exercises in Jupyter notebooks with examples to help researchers understand how to build searches in SQL. Researchers who attend this workshop will learn how to extract specific data sets as well as how to conduct exploratory analysis of the entirety of the SRA data available in the cloud.

Both BigQuery and Athena use SQL, but no prior SQL experience is required. By the end of this workshop, you will know how to run cloud metadata queries using SQL to find SRA data based on parameters that are of interest to you.
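As a rough illustration of the kind of metadata search described above, the following Python sketch composes a SQL query string. It is not taken from the workshop materials. The table name (`nih-sra-datastore.sra.metadata`), the column names (`acc`, `organism`, `assay_type`, `mbases`), and the BigQuery-style `@name` placeholders are assumptions, and `build_sra_query` is a hypothetical helper; check the current SRA cloud documentation for the actual schema before running a query like this in either service.

```python
# Sketch only: composes a metadata query string; it does not contact any cloud service.
# ASSUMPTIONS: the table name, column names, and @name placeholder style are
# illustrative and may not match the current SRA metadata schema.
SRA_METADATA_TABLE = "nih-sra-datastore.sra.metadata"  # assumed table name


def build_sra_query(limit: int = 10) -> str:
    """Return a SQL string that filters runs by organism and assay type."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT acc, organism, assay_type, mbases\n"
        f"FROM `{SRA_METADATA_TABLE}`\n"
        "WHERE organism = @organism\n"      # value supplied as a query parameter
        "  AND assay_type = @assay_type\n"  # value supplied as a query parameter
        "ORDER BY mbases DESC\n"
        f"LIMIT {limit}"
    )


# Example parameter values (hypothetical search of interest).
params = {"organism": "Candida auris", "assay_type": "WGS"}
query = build_sra_query(limit=5)
print(query)
print(params)
```

Keeping the search values in separate parameters, instead of pasting them into the SQL text, makes the same query easy to reuse for other organisms or assay types.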

Adam Stine, Ph.D., SRA Curator
Adelaide Rhodes, Ph.D., Customer Experience

 

Sept 22 Webinar: Using NCBI Datasets command-line tools to access data and metadata for genomes

Join us on September 22, 2021, at 12 PM Eastern Time to learn how to use the datasets command-line tools (datasets and dataformat) to access, filter, download, and format data and metadata for genomes. Through examples from eukaryotes and the SARS-CoV-2 coronavirus, you will see how to use metadata to filter for genome sequences with desired properties, such as genomes with high contig N50 values.

  • Date and time: Wed, September 22, 2021 12:00 PM – 12:45 PM EDT
  • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI webinars playlist on the NLM YouTube channel. You can learn about future webinars on the Webinars and Courses page.

NCBI’s GI sequence identifiers will soon exceed 32-bit numbers. Are you and your software ready?

In 2016, NCBI announced that it was curtailing its display of its numeric ‘GI’ in popular sequence data formats such as FASTA and GenBank flatfiles. Due to the continued growth of GenBank, NCBI will soon begin assigning GIs exceeding the signed 32-bit threshold of 2,147,483,647 for those remaining sequence types that still receive these identifiers.

NCBI has updated products including the Entrez system, GenBank (Nucleotide), BLAST™, and the C++ Toolkit to prepare for that moment by upgrading GI-related code and APIs to accept 64-bit integers. This changeover is projected for late 2021. Stay tuned for additional communications from NCBI and take note of the following information if you think you may be impacted.

For a seamless transition, all organizations and developers using our products should review their software for any remaining reliance on GIs and for compatibility with these larger identifiers. Note that this update requires no changes to submission procedures or assignment of accessions.
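To make the compatibility concern concrete, here is a minimal Python sketch of what happens when a GI larger than 2,147,483,647 is written into a signed 32-bit field, and how a 64-bit field handles the same value. The GI value and the helper names (`fits_in_int32`, `pack_gi`) are hypothetical and are not part of any NCBI API; the sketch only illustrates the integer-width issue.

```python
import struct

INT32_MAX = 2_147_483_647  # signed 32-bit threshold cited in the announcement


def fits_in_int32(gi: int) -> bool:
    """Return True if the GI can be stored in a signed 32-bit integer."""
    return -INT32_MAX - 1 <= gi <= INT32_MAX


def pack_gi(gi: int) -> bytes:
    """Serialize a GI as a signed 64-bit little-endian integer."""
    return struct.pack("<q", gi)


large_gi = INT32_MAX + 1  # hypothetical GI just past the 32-bit limit

# Packing into a 32-bit field ("<i") fails for the larger identifier.
try:
    struct.pack("<i", large_gi)
    overflowed = False
except struct.error:
    overflowed = True

# A 64-bit field round-trips the same value without loss.
round_tripped = struct.unpack("<q", pack_gi(large_gi))[0]

print(fits_in_int32(large_gi), overflowed, round_tripped == large_gi)
```

The same kind of check applies to database columns, binary file formats, and fixed-width integer types in compiled code that still store GIs.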

PubMed Central Article Datasets are Now Available on the Cloud

To enhance machine access to biomedical literature and drive impactful analyses and reuse, the National Library of Medicine (NLM) is pleased to announce the availability of the PubMed Central (PMC) Article Datasets on Amazon Web Services (AWS) Registry of Open Data as part of AWS’s Open Data Sponsorship Program (ODP). These datasets collectively span 4 million of PMC’s 7 million (total) full-text scientific articles.

Figure 1. NCBI PMC Article Datasets on Registry of Open Data on AWS.

GenBank release 245.0

GenBank release 245.0 (8/18/2021) is now available on the NCBI FTP site. This release has 15.31 trillion bases and 2.49 billion records.

The current release has 231,982,592 traditional records containing 940,513,260,726 base pairs of sequence data. There are also 1,653,427,055 WGS records containing 13,888,187,863,722 base pairs of sequence data, 498,305,045 bulk-oriented TSA records containing 440,578,422,611 base pairs of sequence data, and 106,995,218 bulk-oriented TLS records containing 39,930,167,315 base pairs of sequence data.
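The headline totals can be checked against the per-division figures with a few lines of Python. The numbers are copied from the paragraph above; the dictionary layout and variable names are just one way to organize them.

```python
# (records, base pairs) per division, as stated for GenBank release 245.0
divisions = {
    "traditional": (231_982_592, 940_513_260_726),
    "WGS": (1_653_427_055, 13_888_187_863_722),
    "TSA": (498_305_045, 440_578_422_611),
    "TLS": (106_995_218, 39_930_167_315),
}

total_records = sum(records for records, _ in divisions.values())
total_bases = sum(bases for _, bases in divisions.values())

print(f"{total_records:,} records ({total_records / 1e9:.2f} billion)")
print(f"{total_bases:,} bases ({total_bases / 1e12:.2f} trillion)")
```

The sums come to 2,490,709,910 records and 15,309,209,714,374 base pairs, which round to the 2.49 billion records and 15.31 trillion bases quoted for the release.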

Gene filtering in NCBI Sequence Viewer

We are excited to announce new track display options for gene annotation tracks in the NCBI Genome Data Viewer genome browser and other instances of the NCBI Sequence Viewer!

Now, you can simplify gene annotation tracks to show only the genes and transcripts that you care about most. For instance, you can choose to hide non-coding transcripts, including pseudogenes, so that only protein-coding transcript variants are shown in your view. You can also hide any transcript models predicted using NCBI’s Gnomon algorithm.

Update on My NCBI log-in changes and the Password Retirement Wizard

As we announced previously, the way you log into your My NCBI account will change from your My NCBI username and password to a 3rd-party login. On June 22, we disabled the ability to create new My NCBI passwords, and in July we launched the Password Retirement Wizard, which will activate when you log in here with a native NCBI password (Figure 1).

Figure 1. The Password Retirement Wizard screens. The wizard will offer you the option (opt-in) to change your password to a 3rd-party login when you sign in at https://account.ncbi.nlm.nih.gov/migrate/ with a native NCBI password. You may choose from any of the available 3rd-party accounts. Clicking on an option will take you to the sign-in screen on the 3rd-party website.
