The new PubMed has been the default since May, and more than 99% of you are using the new site. The recent NLM Technical Bulletin has details on features that we have added to the new PubMed based on your requests.
Legacy PubMed, which has been available in parallel with the new PubMed, will finally be taken down after October 31, 2020. We will continue to provide API access to PubMed through the E-utilities, which use the legacy system, for the foreseeable future, until we can transition to an API that accesses the new system.
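If you have not used the E-utilities before, here is a minimal sketch of how a PubMed search request can be put together. It only builds the ESearch URL and does not send it. The helper name `build_esearch_url` and the example search term are our own illustrations, not part of the API.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities; esearch.fcgi is the search endpoint.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, retmax=20):
    """Return an ESearch URL that searches PubMed for `term` (hypothetical helper)."""
    params = {
        "db": "pubmed",     # search the PubMed database
        "term": term,       # the query string
        "retmax": retmax,   # maximum number of PMIDs to return
        "retmode": "json",  # ask for JSON instead of the default XML
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Illustrative search term only.
url = build_esearch_url("electrostatic potential[Title]", retmax=5)
print(url)
```

You can paste the printed URL into a browser or fetch it with any HTTP client to get back a list of matching PMIDs.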
The NCBI structure viewer iCn3D 2.20.0 is now available on the NCBI web site and from GitHub. You can now view the electrostatic potential map for any subset of 3D structures with up to 30,000 atoms. The potential is calculated using the DelPhi program by solving a linear Poisson-Boltzmann equation. You can show the potential on a surface or show an equipotential map. The potential map gives a qualitative view of the effect of charges on molecular interactions.
The example in Figure 1 below shows the electrostatic potential for the binding of Gleevec to the human Abl2 protein. This new feature can be accessed from the menu “Analysis > DelPhi Potential.” You can also download the PQR file format with assigned partial charges.
Figure 1: 3GVU: The crystal structure of human ABL2 in complex with GLEEVEC. The ligand shows the -25 mV (red) and +25 mV (blue) equipotential map with a grid size of 65, a salt concentration of 0.15 M, and pH 7. The protein shows the surface potential with a gradient from -75 mV (red) to +75 mV (blue).
Two up-and-coming NCBI resources will be featured in videos, surveys, and live events at the American Society of Human Genetics (ASHG) 2020 Annual Meeting. Come and watch on-demand videos in the CoLab Theater. Then let us know what you think and how you use, or might use, these resources by either taking an online survey or joining us for the CoLab Live! events on Thursday, October 29, 2020.
RefSeq Select offers a high-quality set of transcript and protein sequences for human and mouse; for prokaryotes, RefSeq Select is the set of proteins annotated on RefSeq reference and representative genomes. You can now search these sets on the nucleotide and protein BLAST services. The new databases are RefSeq Select RNA sequences and RefSeq Select proteins (Figure 1).
Figure 1. The new BLAST RefSeq Select RNA (top panel) and protein (bottom panel) databases and results. The searches used the erythroid hexokinase 1 transcript (NM_033496.3) and protein (NP_277031.1) as queries. The protein BLAST search was limited to Clostridia (taxid:186801) to highlight prokaryote matches.
Searches against these more compact databases run faster and give search results that are better defined and easier to interpret. You can also download the pre-formatted refseq_select_rna and refseq_select_protein databases from the BLAST db FTP directory for use with a local BLAST installation, or get them on the Amazon and Google clouds with the BLAST+ Docker image.
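For a local installation, the download and search steps might look like the sketch below. It assumes the BLAST+ command-line tools (`update_blastdb.pl`, `blastn`, `blastp`) are on your PATH. The helper `build_blast_commands`, the query file name, and the output file name are hypothetical. The code only assembles the commands so you can inspect them before running anything.

```python
import shlex


def build_blast_commands(query_fasta, molecule="rna", out_path="hits.tsv"):
    """Return (download_cmd, search_cmd) as argument lists (hypothetical helper)."""
    if molecule == "rna":
        db, program = "refseq_select_rna", "blastn"
    elif molecule == "protein":
        # Database name as given in the text above; if the download fails,
        # list the available names with `update_blastdb.pl --showall`.
        db, program = "refseq_select_protein", "blastp"
    else:
        raise ValueError("molecule must be 'rna' or 'protein'")

    # Fetch and unpack the pre-formatted database into the current directory.
    download_cmd = ["update_blastdb.pl", "--decompress", db]
    # Search the query against it and write tabular (format 6) output.
    search_cmd = [program, "-db", db, "-query", query_fasta,
                  "-outfmt", "6", "-out", out_path]
    return download_cmd, search_cmd


# "NM_033496.3.fasta" is an assumed local file holding the query transcript.
download_cmd, search_cmd = build_blast_commands("NM_033496.3.fasta")
print(shlex.join(download_cmd))
print(shlex.join(search_cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once you are happy with it.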
NCBI RefSeq has finished its initial annotation of the new mouse reference assembly, GRCm39, recently released by the Genome Reference Consortium. This is the first coordinate-changing update to the mouse reference since the 2012 release of GRCm38, resolving over 400 issues, almost doubling the scaffold N50, closing almost half the gaps, and adding 1.9 Mb of sequence. It’s a big deal!
Figure 1. The Genome Data Viewer showing the annotation for the mouse pseudoautosomal region that includes annotations of four genes that were previously missing: Sts, Nlgn4l, Akap17a, and 2510022D24Rik.
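In case the scaffold N50 mentioned above is unfamiliar: it is the length of the scaffold at which the running total, counted from the longest scaffold down, first reaches half of the total assembly length. Here is a minimal sketch of that calculation. The lengths are made-up toy values, not GRCm38 or GRCm39 data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length that brings the running sum to half the total is the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy example: joining the two longest scaffolds (40 + 30 -> 70)
# leaves the total length unchanged but raises the N50.
toy_before = [40, 30, 20, 10, 10, 10]
toy_after = [70, 20, 10, 10, 10]
print(n50(toy_before))  # 30
print(n50(toy_after))   # 70
```

This shows why closing gaps and joining scaffolds can raise N50 substantially even when little new sequence is added.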
NCBI Datasets now offers Gene tables: customizable tables of the genes you specify, with key gene information, and the ability to easily download a dataset of genomic, transcript and protein sequences.
Drag and drop a list of Gene IDs or gene symbols, and the data table shows your genes with up to 15 columns of metadata, including genomic coordinates, RefSeq transcript and protein accessions, Ensembl IDs and UniProt accessions, and other gene information. You can browse and select items in your table on the web, or download everything to your computer for later analysis (Figure 1).
Second, we made tooltip improvements in the Graphical Sequence View to include information about insertions and unaligned data.
Bug fixes and Improvements
We made a number of other fixes and improvements. In Text View, we fixed a crash when showing certain AGP data. With AGP export, we fixed an issue where sequence IDs from the AGP file did not match sequence IDs from the FASTA file (when sequence ranges were used).
In Tree View, we fixed a crash on search and a tooltip issue where tooltip meta-information disappeared when a custom label was set. We also improved startup time and fixed some visual issues in tooltips in the Graphical Sequence View.
Finally, in the Editing Package, we modified the control layout in the Table Reader dialog to fit onto small screens, improved the speed of Table Reader, and fixed several cases where Undo did not work after importing a feature table.
Interested in human genes involved in COVID-19 biology? NCBI’s RefSeq group has been hard at work compiling a set of human genes with roles in coronavirus infection and disease. You can now see and search for these genes and their regulatory elements in NCBI Gene and RefSeq.
Figure 1. Top section of the human ACE2 record in the Gene database. COVID-19 information can be found in the Summary and Annotation information sections.
Join us on October 14 to learn how to use Athena on AWS to quickly search the Sequence Read Archive (SRA) in the cloud, speed up your bioinformatics research and discovery projects, and explore a new SRA SARS-CoV-2 dataset. In this webinar, we’ll introduce you to a way to search SRA submitter-supplied metadata and the results of SRA taxonomic analysis with the native AWS tool, Athena, which explores cloud-based data tables using SQL-like queries. You’ll see a real-world case study demonstrating how to find key information about SRA runs and identify data sets for your own analysis pipelines. The example will highlight a new data format designed to help advance SARS-CoV-2 research. The SRA aligned reads format is a new compressed data object. This data format includes the raw reads aligned to pre-assembled, contiguous data assemblies (contigs), which will help you more easily determine what’s really in a sequence data sample.
Date and time: Wed, Oct 14, 2020 12:00 PM – 1:00 PM EDT
After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
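To give a feel for the SQL-like metadata queries the webinar covers, here is a small sketch that runs a comparable query against an in-memory SQLite table as a local stand-in for Athena. The table name, column names, and run accessions are all hypothetical placeholders, so check the actual SRA metadata schema before adapting the query.

```python
import sqlite3

# Local stand-in for a cloud metadata table. All names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata (acc TEXT, organism TEXT, platform TEXT, mbytes INTEGER)"
)
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?, ?)",
    [
        ("RUN0001", "Severe acute respiratory syndrome coronavirus 2", "ILLUMINA", 120),
        ("RUN0002", "Homo sapiens", "ILLUMINA", 5400),
        ("RUN0003", "Severe acute respiratory syndrome coronavirus 2", "OXFORD_NANOPORE", 80),
    ],
)

# Count runs and total size per sequencing platform for one organism.
QUERY = """
SELECT platform, COUNT(*) AS runs, SUM(mbytes) AS total_mbytes
FROM metadata
WHERE organism = ?
GROUP BY platform
ORDER BY platform
"""

rows = conn.execute(
    QUERY, ("Severe acute respiratory syndrome coronavirus 2",)
).fetchall()
for row in rows:
    print(row)
```

The same SELECT, WHERE, and GROUP BY pattern is what you would type into a SQL query editor to narrow millions of runs down to the data sets you want to analyze.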