Try out Datasets and ElasticBLAST at the BOSC 2022 CoFest!

Join NLM’s NCBI at the virtual CollaborationFest on July 15 from 08:00 – 11:00 CDT and 12:00 – 16:00 CDT following the BOSC 2022 conference. Get an in-depth orientation and an opportunity to test the capabilities of Datasets and ElasticBLAST.

What is Datasets?

Datasets is a new resource that lets you easily gather data from across NCBI databases. Find and download gene, transcript, protein and genome sequences, annotation and metadata. We invite you to try the Datasets command line tool in your bioinformatics workflows!
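As a rough illustration of calling the command line tool from a script, the Python sketch below assembles a `datasets download genome accession` command for the chicken reference assembly GCF_016699485.2. The helper `datasets_genome_cmd`, the `RUN_DOWNLOAD` switch, and the output file name are hypothetical; the subcommand and the `--filename` option reflect the Datasets CLI as we understand it, so check `datasets download genome accession --help` for your installed version before relying on them.

```python
import shutil
import subprocess


def datasets_genome_cmd(accession: str, filename: str = "ncbi_dataset.zip") -> list[str]:
    """Build a `datasets download genome accession` command (hypothetical helper)."""
    # Assumed CLI syntax: datasets download genome accession <ACC> --filename <ZIP>
    return ["datasets", "download", "genome", "accession", accession, "--filename", filename]


cmd = datasets_genome_cmd("GCF_016699485.2", "chicken_GRCg7b.zip")
print(" ".join(cmd))

# Hypothetical switch: only download when explicitly enabled and the CLI is on PATH.
RUN_DOWNLOAD = False
if RUN_DOWNLOAD and shutil.which("datasets"):
    subprocess.run(cmd, check=True)
```

The download only runs when `RUN_DOWNLOAD` is set to `True` and the `datasets` binary is found on your PATH, so the sketch can be executed as-is to inspect the command line.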

GenBank Release 250.0 is available!

GenBank release 250.0 (6/17/2022) is now available on the NCBI FTP site. This release has 18.63 trillion bases and 2.69 billion records. 

The current release has 239,017,893 traditional records containing 1,395,628,631,187 base pairs of sequence data. There are also 1,796,349,114 WGS records containing 16,710,373,006,600 base pairs of sequence data, 546,991,572 bulk-oriented TSA records containing 485,056,129,761 base pairs of sequence data, and 111,142,107 bulk-oriented TLS records containing 41,999,358,847 base pairs of sequence data.
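The headline totals follow directly from the per-division figures. This short Python check, using only the numbers quoted above, sums records and base pairs across the four categories:

```python
# Per-division figures for GenBank Release 250.0, as (records, base pairs).
divisions = {
    "traditional": (239_017_893, 1_395_628_631_187),
    "WGS": (1_796_349_114, 16_710_373_006_600),
    "bulk TSA": (546_991_572, 485_056_129_761),
    "bulk TLS": (111_142_107, 41_999_358_847),
}

total_records = sum(records for records, _ in divisions.values())
total_bases = sum(bases for _, bases in divisions.values())

print(f"records: {total_records:,} (~{total_records / 1e9:.2f} billion)")
print(f"bases:   {total_bases:,} (~{total_bases / 1e12:.2f} trillion)")
```

It reports 2,693,500,686 records and 18,633,057,126,395 base pairs, matching the rounded 2.69 billion records and 18.63 trillion bases in the announcement.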

Introducing NLM’s new NCBI Datasets genome page!

As part of an ongoing effort to modernize and improve your experience, NLM’s NCBI Datasets is introducing all-new genome pages. These pages make it easier for you to browse and download genome sequence and metadata, and navigate to tools such as the Genome Data Viewer (GDV) and BLAST.

To get started, search NCBI Datasets by assembly accession (e.g., GCF_016699485.2), assembly name (e.g., bGalGal1.mat.broiler.GRCg7b), WGS accession (e.g., JAENSK01), or species name + genome (e.g., chicken genome), and click on the title in the box. See the top red arrow in Figure 1 below where we search for ‘chicken genome’.

Figure 1: Finding the chicken reference assembly. A search for ‘chicken genome’ returns a box that provides a quick link to the new genome page (middle red arrow). From there, the download button (bottom red arrow) allows you to select the files you need (see the ‘Download Package’ window on the left) along with a detailed metadata report that includes all the metadata on the web page.
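The search box accepts several kinds of identifiers, and NCBI Datasets works out which kind you typed. Purely to illustrate those formats, the Python sketch below is a hypothetical helper (not part of Datasets) that guesses the query type; its regular expressions are assumptions inferred from the examples above, namely that assembly accessions start with GCF_ or GCA_ and that WGS accessions are letters followed by two digits.

```python
import re

# Hypothetical patterns, inferred from the example identifiers in the post.
PATTERNS = [
    ("assembly accession", re.compile(r"^GC[AF]_\d{9}\.\d+$")),
    ("WGS accession", re.compile(r"^[A-Z]{4,6}\d{2}$")),
]


def classify_query(query: str) -> str:
    """Guess which kind of search string a query is (illustrative only)."""
    q = query.strip()
    for label, pattern in PATTERNS:
        if pattern.match(q):
            return label
    if q.lower().endswith(" genome"):
        return "species name + genome"
    # Anything else, e.g. an assembly name such as bGalGal1.mat.broiler.GRCg7b.
    return "assembly name or other text"


for q in ["GCF_016699485.2", "bGalGal1.mat.broiler.GRCg7b", "JAENSK01", "chicken genome"]:
    print(f"{q!r}: {classify_query(q)}")
```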

Join Us at the ISMB Codeathon: Tools for Sharable Protein Analysis

NLM’s NCBI is gearing up for the Tools for Sharable Protein Analysis Codeathon, which will take place July 10 – July 14, 2022 at the Intelligent Systems for Molecular Biology (ISMB) conference. This event is the third installment of the ISMB codeathon. In previous years, teams developed tools for in-depth and systematic analysis of molecular interactions, effects of mutations, protein flexibility, annotation of topological domains, and more! This year, projects will span the following themes:

  1. ANALYSIS OF LARGE DATASETS
  2. ANALYSIS OF BIOMOLECULAR INTERACTIONS AND MUTATIONS
  3. BIOMOLECULAR REPRESENTATIONS AND VISUALIZATION
  4. USER INTERFACES AND DATA SHARING MECHANISMS

The event will focus in part on our interactive, web-based 3D structure viewer, iCn3D. NCBI experts Jiyao Wang, Ph.D., Tom Madej, Ph.D., and Alexa Salsbury, Ph.D. will be on hand to help teams throughout the event.

This hybrid codeathon will support teams in person and virtually. We encourage researchers working in all areas of computational biology to join us for the event! To apply for the event, please fill out this form before July 9, 2022!

More information on the event can be found here.

New RefSeq annotations are available!

In April and May, the NCBI Eukaryotic Genome Annotation Pipeline released twenty-eight new annotations in RefSeq for the following organisms:

Fungal species identification using DNA: an NCBI and USDA-APHIS collaboration with a focus on Colletotrichum

As reported in the journal Plant Disease, a recent collaboration between the National Library of Medicine’s NCBI and the U.S. Department of Agriculture’s Animal and Plant Health Inspection Service (APHIS) analyzed public sequence records for the fungal genus Colletotrichum, an important group of plant pathogens that pose a significant threat to food production. Colletotrichum species are challenging to identify accurately, and public sequences may carry out-of-date taxonomic information. The study improved the accuracy of species names assigned to Colletotrichum database sequences, verified a comprehensive set of reliable reference markers for the genus, and produced a multi-marker tree as well as the genome-based interactive tree shown in Figure 1.

Figure 1. Views from a multi-protein distance tree, derived from genome assemblies, that shows the analysis of publicly available Colletotrichum genomes. The interactive tree is available online, where you can browse, search, download, and export it. For example, a search shows that assembly GCA_002901105.1 was incorrectly labeled as Colletotrichum gloeosporioides. Searching the tree for the name “Colletotrichum gloeosporioides” highlights two clades. Clicking the node for the Truncatum species complex and then clicking “Show descendants” expands the clade and shows that assembly GCA_002901105.1, which was labeled as C. gloeosporioides, clusters with the Truncatum species complex. You can find more details on the tree-building process in the supplementary material for the publication and on GitHub.

ASM Microbe 2022 was a success!

NCBI had the pleasure of attending and participating in this year’s American Society for Microbiology (ASM) Microbe conference, June 9-13 in Washington, D.C. NCBI staff took part in activities and events throughout the conference. Over 4,500 attendees gathered in the exhibit hall and attended a variety of poster presentations and talks!

Reflections from a few of our NCBI experts

“It was a great honor for me to receive the ASM Elizabeth O. King Lecturer Award. Thank you to my colleagues, without whom so much of my work would not have been possible, and to all of those who attended my presentation on Making Genomics Accessible to Aid Public Health and Research.”

~Michael Feldgarden, Ph.D.

PubMed API launch is pushed back

As we previously announced, we will be moving to an updated version of the E-utilities API for PubMed. In preparation for this launch, a test server is currently available to allow you to test your API calls on the new service and report issues. Thank you for trying out the test server and continuing to submit your feedback!

To address your comments, finalize updates, and give you more time to prepare for the API update, we are pushing back the release of the new API until later this year.
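For scripts that will need to move between the current service and the updated one, it can help to keep the E-utilities base URL configurable. The sketch below builds a PubMed ESearch request with Python's standard library. `esearch_url` is a hypothetical helper, the default base is the long-standing production E-utilities endpoint, and the test server's address is not given in this post, so pass it in as `base` if you are testing against it.

```python
from urllib.parse import urlencode

# Production E-utilities endpoint; the test server address is not stated here.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term: str, retmax: int = 20, base: str = EUTILS_BASE) -> str:
    """Build a PubMed ESearch request URL (hypothetical helper)."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    # urlencode percent-escapes characters such as the [ ] in field tags.
    return f"{base}esearch.fcgi?{urlencode(params)}"


url = esearch_url("Colletotrichum[Title]", retmax=5)
print(url)
```

Running the same queries with two different `base` values is one simple way to compare results from the current and updated services.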

Announcing an updated prokaryotic representative genomes collection with 706 new species!

An updated bacterial and archaeal representative genomes collection is available! A total of 16,105 assemblies among the 249,000 prokaryotic assemblies in RefSeq were selected to represent their respective species. The collection has grown by 3.7% since January 2022. A total of 706 species are represented for the first time. In addition, 186 species are represented by a better assembly, and 124 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment.
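The 3.7% growth figure can be reconciled with the other counts. Assuming the collection holds one representative assembly per species, the 186 replacements leave the total unchanged and only the 706 additions and 124 removals move it. The post does not give the January 2022 total, so this Python sketch back-calculates it under that assumption:

```python
# Counts quoted in the announcement above.
current_total = 16_105  # representative assemblies in the updated collection
newly_added = 706       # species represented for the first time
removed = 124           # species dropped (taxonomy changes or uncertain assignment)

# Assumption: one representative per species, so replacements do not change the total.
net_change = newly_added - removed
previous_total = current_total - net_change
growth_pct = 100 * net_change / previous_total

print(f"net change: {net_change}, implied January 2022 total: {previous_total:,}")
print(f"growth: {growth_pct:.1f}%")
```

This gives a net gain of 582 assemblies over an implied 15,523, or about 3.7%, consistent with the announcement.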

We updated the database on the Microbial Nucleotide BLAST page, as well as the RefSeq Representative genomes database (fourth in the menu) on the basic nucleotide BLAST page, to reflect these changes. Finally, remember that you can now run BLAST searches against the proteins annotated on representative genomes (second in the menu). See more information here.

Save the Date: NCBI at the Bioinformatics Open Science Conference (BOSC), July 2022

Come visit NCBI at the Bioinformatics Open Science Conference (BOSC), part of the Intelligent Systems for Molecular Biology (ISMB) conference, July 13-16, taking place both in person in Madison, Wisconsin, and virtually! We’ll be presenting talks and posters on the latest updates to the NCBI Datasets, BLAST, and Protein resources. You can also join us at the Birds of a Feather (BoF) discussion and the BOSC CollaborationFest (CoFest) to explore these resources and discuss workflows with NCBI staff.