Author: NCBI Staff

New GenBank submission options for SARS-CoV-2 submitters

NCBI is pleased to announce ongoing enhancements to submission of SARS-CoV-2 assembled genomes to GenBank, including a streamlined workflow on the web and a new API option. Both new options mean that you can receive accessions for SARS-CoV-2 data submissions more quickly!

A streamlined workflow with an improved interface and enhanced validation on both web and API saves you time and effort and, most importantly, makes it possible to get SARS-CoV-2 accession numbers and public release of data within hours. In addition, we automatically annotate all SARS-CoV-2 genomes to produce standardized, consistent annotation, which saves you time and benefits researchers who find your data valuable.

New viral protein domain models for annotation of coronaviruses

NLM’s Conserved Domain Database (CDD) has expanded its scope to include 153 new viral protein domain family models for the annotation of coronaviruses. These include models for the S1 subunit of coronavirus Spike proteins (cd21527), the nucleocapsid (N) protein of coronavirus (cd21595), and the coronavirus RNA-dependent RNA polymerase (cd21530).

Each curated domain model consists of a multiple sequence alignment containing conserved sequence features that may have been confirmed experimentally, plus links to relevant publications. When available, the domain models include 3D structures with links to interactive 3D views and interacting partners.

Check out this tabular summary of SARS-CoV-2 gene products for links to matching conserved domain models and representative 3D protein structures.

Want to view these alignments in 3D space? We’ve updated iCn3D, a web-based 3D structure viewer, with new rendering, annotation, and alignment features. Read more about how you can use iCn3D to view and analyze SARS-CoV-2-related structures.

Don’t forget to review our SARS-CoV-2 resources page to keep up to date on other coronavirus data at NCBI!

The New and Improved PubMed® — We Are Listening

Today marks 5 weeks since the new PubMed was made the default version. Throughout this process, we promised to listen, and we heard from you!

This was a huge change

We know change isn’t always easy, especially with major changes to a familiar service or product. We are staunch believers in making incremental changes whenever possible: releasing small improvements, observing the effects, gathering user feedback, and then using that data to make further modifications. This time, an incremental approach to improving PubMed wasn’t feasible. We needed to make major changes under the hood (new databases, cloud delivery, new web architecture, etc.) for PubMed to be sustainable going forward.

User feedback is invaluable: it has played an enormous role in updates over the 24 years PubMed has been in existence, and it continues to do so. To prepare for the new PubMed, we launched the beta version in 2017, then called PubMed Labs, as a way to set up the new framework and solicit feedback from our users. During development and since, we reached out to our stakeholders with presentations, webinars, handouts, FAQs, toolkits, and tutorials, including a series of four 90-minute online classes, How PubMed® Works, many of which continue to be available.

We understand that not everyone had a chance to put the new PubMed through its paces, and we’re grateful to those of you who provided feedback along the way, whether it was by sending questions or comments using the feedback button, by discussing with us how you accomplish your work with PubMed, or by filling out a survey.

For some, when the new version of PubMed became the default last month, it was a huge shift. The ways in which you were accustomed to working with the system changed. We heard from some of you that you were used to a particular feature being available on PubMed and now you don’t know where to find it.

New BLAST default parameters and search limits coming in September

To provide a more efficient BLAST experience for everyone, we’re changing some parameters and limits on the web BLAST service on September 8, 2020. The new settings, listed below, will improve overall performance and make search times more consistent.

  1. The Expect Value Threshold default setting will be reduced to 0.05.
  2. The maximum number of target sequences (Max target sequences) will be capped at 5,000.
  3. The maximum allowed query length will be 1,000,000 for nucleotide queries (blastn, blastx, and tblastx) and 100,000 for protein queries (blastp and tblastn).

These changes will help keep the BLAST service running smoothly as the already very large databases continue to grow rapidly. If you have any questions or concerns, please email us at blast-help@ncbi.nlm.nih.gov.
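For anyone who prepares searches with a script before pasting them into web BLAST, the new limits can be sketched as a small pre-flight check. This is only an illustration: the function and constant names are hypothetical, it does not call any NCBI service, and it assumes "query length" means the total number of bases or residues submitted in one search.

```python
# Limits quoted from the announcement above. All names here are illustrative,
# not part of any NCBI API.
MAX_QUERY_LENGTH = {
    "blastn": 1_000_000,   # nucleotide queries
    "blastx": 1_000_000,
    "tblastx": 1_000_000,
    "blastp": 100_000,     # protein queries
    "tblastn": 100_000,
}
MAX_TARGET_SEQUENCES = 5_000
DEFAULT_EXPECT_THRESHOLD = 0.05


def check_search(program, query_length, max_target_seqs):
    """Return a list of problems; an empty list means the search fits the limits."""
    problems = []
    limit = MAX_QUERY_LENGTH.get(program)
    if limit is None:
        problems.append(f"unknown program: {program}")
    elif query_length > limit:
        problems.append(
            f"query length {query_length} exceeds {limit} for {program}"
        )
    if max_target_seqs > MAX_TARGET_SEQUENCES:
        problems.append(
            f"max target sequences {max_target_seqs} exceeds {MAX_TARGET_SEQUENCES}"
        )
    return problems


# A 150,000-residue protein query is over the blastp limit,
# while the same length is fine for blastn.
print(check_search("blastp", 150_000, 100))
print(check_search("blastn", 150_000, 5_000))
```

A search that returns an empty list fits within the announced limits. Anything else would need to be split into smaller queries or run with a lower Max target sequences setting.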

dbSNP human build 154 release + ALFA data

dbSNP human build 154, now available, includes new ALFA (Allele Frequency Aggregator) variants and allele frequency. This build contains over two billion Submitted SNP (ss) records and 730 million Reference SNP (rs) records.

New features include:

See the release notes for more information about what’s new in build 154.

New annotations in RefSeq: budgerigar, bony fish, fly and more

In May, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

  • Acipenser ruthenus (sterlet)
  • Arvicanthis niloticus (African grass rat)
  • Cannabis sativa (eudicot)
  • Crassostrea gigas (Pacific oyster)
  • Cyclopterus lumpus (lumpfish)
  • Drosophila albomicans (fly)
  • Drosophila guanche (fly)
  • Drosophila innubila (fly)
  • Esox lucius (northern pike)
  • Gymnodraco acuticeps (bony fish)
  • Hippoglossus hippoglossus (Atlantic halibut)
  • Marmota flaviventris (yellow-bellied marmot)
  • Melopsittacus undulatus (budgerigar)
  • Osmia lignaria (orchard mason bee)
  • Pangasianodon hypophthalmus (striped catfish)
  • Pantherophis guttatus (snake)
  • Periophthalmus magnuspinnatus (bony fish)
  • Prunus dulcis (almond)
  • Pseudochaenichthys georgianus (South Georgia icefish)
  • Setaria viridis (monocot)
  • Thalassophryne amazonica (bony fish)
  • Thrips palmi (thrip)
  • Trematomus bernacchii (emerald rockcod)
  • Zea mays (maize)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.

Enhanced prokaryote type strain report now with details on needed type strain data

The prokaryote type strain report provides information on type strains for over 18,000 species. We revised and expanded the report to make it easier to identify cases where sequencing or establishing type material would have the biggest impact on improving prokaryote taxonomy and accurate identification. These cases include species with designated type strains but without a sequenced type strain assembly, and species without designated type material. We hope that the community will prioritize sequencing type strains for the former set of species (Table 1) and establishing a neotype or reftype, where applicable (as defined in Ciufo et al. 2018), for the latter set (Table 2).

Other changes from the old format file are detailed in a recent genomes announce post.

Scientific Name             Type material/co-identical strains    Assemblies
Burkholderia ubonensis      CCUG:48852, CIP:1070, …               308
Escherichia albertii        Albert 19982, BCCM/LMG:20976, …       181
Xanthomonas perforans       ATCC:BAA-983, DSM:18975, …            153
Listeria innocua            ATCC:33090, BCCM/LMG:11387, …         106
Streptococcus iniae         ATCC:29178, BCCM/LMG:14520, …         94
Vibrio lentus               CECT:5110, CIP:107166, …              87
Vibrio cyclitrophicus       ATCC:700982, BCCM/LMG:21359, …        83
Pseudomonas coronafaciens   BCCM/LMG:5060, CFPB:2216, …           77
Aliivibrio fischeri         ATCC:7744, BCCM/LMG:4414, …           66
Xanthomonas fragariae       ATCC:33239, BCCM/LMG:708, …           61

Table 1. The top 10 candidate species for sequencing type-strains sorted by the number of assemblies. These have designated type strains but no type strain assembly. We generated the list by sorting by “number of assemblies from type materials per species”, then by decreasing “number of assemblies per taxon”, then filtering out “type materials and coidentical strains” = “na”.

Table 2. The top 10 candidates for proposing a reftype assembly, or neotype where applicable, sorted by the number of assemblies. These species have no designated type strain. We generated the list by selecting for “type materials and coidentical strains” = “na” and “number of assemblies from type materials per species” = 0, sorting by decreasing “number of assemblies per taxon”, then filtering out Candidatus.
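The two selections described in the table captions can be sketched in a few lines of Python. This is one reading of those captions, not the script NCBI used. The three quoted column names come from the captions; the "scientific name" column name, the plain list-of-dicts input (for example, rows read with csv.DictReader), and the "Examplea" rows are assumptions made for illustration.

```python
# Column names quoted in the table captions.
TYPE_COL = "type materials and coidentical strains"
TYPE_ASM_COL = "number of assemblies from type materials per species"
ASM_COL = "number of assemblies per taxon"
NAME_COL = "scientific name"  # hypothetical column name for the species


def sequencing_candidates(rows, top=10):
    """Table 1 logic: a type strain is designated but has no assembly."""
    picked = [
        r for r in rows
        if r[TYPE_COL] != "na" and int(r[TYPE_ASM_COL]) == 0
    ]
    return sorted(picked, key=lambda r: int(r[ASM_COL]), reverse=True)[:top]


def reftype_candidates(rows, top=10):
    """Table 2 logic: no designated type material, Candidatus excluded."""
    picked = [
        r for r in rows
        if r[TYPE_COL] == "na"
        and int(r[TYPE_ASM_COL]) == 0
        and not r[NAME_COL].startswith("Candidatus")
    ]
    return sorted(picked, key=lambda r: int(r[ASM_COL]), reverse=True)[:top]


# Two rows echo Table 1; the "Examplea" rows are made up for the demo.
rows = [
    {NAME_COL: "Listeria innocua", TYPE_COL: "ATCC:33090, BCCM/LMG:11387, …",
     TYPE_ASM_COL: "0", ASM_COL: "106"},
    {NAME_COL: "Burkholderia ubonensis", TYPE_COL: "CCUG:48852, CIP:1070, …",
     TYPE_ASM_COL: "0", ASM_COL: "308"},
    {NAME_COL: "Examplea sequenced", TYPE_COL: "XX:1",
     TYPE_ASM_COL: "2", ASM_COL: "500"},
    {NAME_COL: "Examplea untyped", TYPE_COL: "na",
     TYPE_ASM_COL: "0", ASM_COL: "40"},
    {NAME_COL: "Candidatus Examplea", TYPE_COL: "na",
     TYPE_ASM_COL: "0", ASM_COL: "90"},
]

print([r[NAME_COL] for r in sequencing_candidates(rows)])
print([r[NAME_COL] for r in reftype_candidates(rows)])
```

The real report may encode missing counts differently (for instance as "na" rather than 0), in which case the int() conversions would need a guard.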

Please contact info@ncbi.nlm.nih.gov if you want to provide information about missing type-strains.

June 24 webinar: An insider’s guide to creating Federal grant BioSketches

Join us on June 24 to learn how to use My Bibliography and SciENcv, My NCBI applications that help you create biographical sketches for grant applications to the National Institutes of Health (NIH), the National Science Foundation (NSF), and the Institute of Education Sciences (IES). You will learn how to create a profile, add citations, and import information from third-party accounts like ORCID.

  • Date and time: Wed, June 24, 2020 12:00 PM – 12:45 PM EDT
  • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

New NCBI SARS-CoV-2 Resources Page

Are you trying to keep up with the rapidly growing number of biological resources associated with the SARS-CoV-2 virus and the related disease, COVID-19? There’s a new page to help you find SARS-CoV-2-related content available at NCBI (Figure 1). This new site will help bench scientists, bioinformaticians, clinicians, and others connect with the information they need to study SARS-CoV-2 and end the COVID-19 pandemic.

Figure 1. The new SARS-CoV-2 resources page providing access to data submissions, literature, molecular information, and clinical resources.

June 17 webinar: Updated BLAST RefSeq rRNA databases for identification and phylogenetic analysis

Join us on June 17 to learn about NCBI’s curated marker rRNA sequences (targeted loci) for Bacteria and Archaea (16S) and Fungi (18S, 28S and ITS) from type strains, which are now available as a distinct set of BLAST databases. You will learn how to access these data and use these databases and BLAST to help identify organisms and explore their diversity.

  • Date and time: Wed, June 17, 2020 12:00 PM – 12:45 PM EDT
  • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.