New RefSeq annotations for mouse, maize, sunflower and more!

In August and September, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms:

  • Amphiprion ocellaris (clown anemonefish)
  • Anopheles stephensi (Asian malaria mosquito)
  • Aplysia californica (California sea hare)
  • Bactrocera oleae (olive fruit fly)
  • Branchiostoma floridae (Florida lancelet)
  • Egretta garzetta (little egret)
  • Folsomia candida (springtail)
  • Fundulus heteroclitus (mummichog)
  • Halichoerus grypus (gray seal)
  • Helianthus annuus (common sunflower)
  • Homo sapiens (human)
  • Lynx canadensis (Canada lynx)
  • Molossus molossus (Pallas’s mastiff bat)
  • Monomorium pharaonis (pharaoh ant)
  • Mus musculus (house mouse)
  • Myotis myotis (greater mouse-eared bat)
  • Neolamprologus brichardi (lyretail cichlid)
  • Oncorhynchus keta (chum salmon)
  • Onychomys torridus (southern grasshopper mouse)
  • Oryzias melastigma (Indian medaka)
  • Phyllostomus discolor (pale spear-nosed bat)
  • Rousettus aegyptiacus (Egyptian rousette)
  • Sander lucioperca (pike-perch)
  • Zea mays (maize)

See more details on the Eukaryotic RefSeq Genome Annotation Status page.

Learn more about the annotation of the new mouse reference assembly, GRCm39, in the announcement below. This is the first coordinate-changing update to the mouse reference since the 2012 release of GRCm38.

New PubMed updates and retirement of legacy PubMed on October 31

The new PubMed has been the default since May, and more than 99% of you are now using it. The recent NLM Technical Bulletin describes the features we have added to the new PubMed based on your requests.

Legacy PubMed, which has been available in parallel with the new PubMed, will be taken down after October 31, 2020. We will continue to provide API access to PubMed through the E-utilities, which use the legacy system, for the foreseeable future, until we can transition to an API that accesses the new system.
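Because the E-utilities remain the programmatic route to PubMed, here is a minimal Python sketch of building an ESearch request that returns PubMed IDs (PMIDs). It only constructs the URL; the query term and the helper name build_pubmed_esearch_url are illustrative choices, and NCBI's E-utilities usage guidelines (request rate, optional API key) still apply when you send it.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_esearch_url(term: str, retmax: int = 20) -> str:
    """Return an ESearch URL that finds PMIDs matching `term` (hypothetical helper)."""
    params = {
        "db": "pubmed",      # search the PubMed database
        "term": term,        # standard PubMed query syntax
        "retmax": retmax,    # maximum number of PMIDs to return
        "retmode": "json",   # ask for JSON instead of the default XML
    }
    # urlencode percent-encodes the term, e.g. "[" -> %5B and "/" -> %2F
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    url = build_pubmed_esearch_url("GRCm39[Title/Abstract]", retmax=5)
    print(url)
    # To run the search, fetch the URL (e.g. urllib.request.urlopen(url))
    # and read the "esearchresult" -> "idlist" field of the JSON response.
```

The same pattern works for the other E-utilities; only the endpoint and parameters change.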

We understand that it can take time to adapt to changes and find favorite features in a new interface. Several learning and training resources are available to help you use the new PubMed.

Structure viewer iCn3D 2.20.0 is available with new features including viewing an electrostatic potential map!

The NCBI structure viewer iCn3D 2.20.0 is now available on the NCBI website and from GitHub. You can now view the electrostatic potential map for any subset of a 3D structure with up to 30,000 atoms. The potential is calculated with the DelPhi program, which solves the linear Poisson-Boltzmann equation. You can show the potential on a surface or as an equipotential map. The potential map gives a qualitative picture of how charges affect molecular interactions.
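For reference, one common SI-unit form of the linear Poisson-Boltzmann equation is shown below, assuming a 1:1 salt. Here φ is the electrostatic potential, ε(r) = ε0·εr(r) the position-dependent permittivity, ρ_f the fixed (atomic partial) charge density, I the ionic strength in mol/m³, and κ the inverse Debye length, taken as zero inside the solute where there are no mobile ions. DelPhi's own documentation may write it in different but equivalent units.

$$
\nabla \cdot \left[ \varepsilon(\mathbf{r})\, \nabla \phi(\mathbf{r}) \right] - \varepsilon(\mathbf{r})\, \kappa^{2}(\mathbf{r})\, \phi(\mathbf{r}) = -\rho_f(\mathbf{r}),
\qquad
\kappa^{2} = \frac{2 N_A e^{2} I}{\varepsilon_0 \varepsilon_r k_B T}
$$

The second term comes from linearizing the Boltzmann distribution of the salt ions, which is reasonable when the potential energy eφ is small compared with k_B T.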

The example in Figure 1 below shows the electrostatic potential for the binding of Gleevec to the human Abl2 protein. You can access this new feature from the menu “Analysis > DelPhi Potential.” You can also download a PQR file with the assigned partial charges.

Figure 1. 3GVU: The crystal structure of human ABL2 in complex with Gleevec. The ligand shows the -25 mV (red) and +25 mV (blue) equipotential map with a grid size of 65, a salt concentration of 0.15 M, and pH 7. The protein shows the surface potential with a gradient from -75 mV (red) to +75 mV (blue).
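The salt concentration in this example matters because it sets how quickly the potential is screened in solution. As a rough illustration (not DelPhi's code), this Python sketch computes the Debye screening length 1/κ from the standard formula, assuming a 1:1 salt such as NaCl, water with a relative permittivity of 78.5, and a temperature of 298.15 K.

```python
import math

# Exact SI values (2019 redefinition) plus the CODATA 2018 vacuum permittivity
EPSILON_0 = 8.8541878128e-12   # vacuum permittivity, F/m
K_B = 1.380649e-23             # Boltzmann constant, J/K
N_A = 6.02214076e23            # Avogadro constant, 1/mol
E_CHARGE = 1.602176634e-19     # elementary charge, C


def debye_length_nm(ionic_strength_molar: float,
                    eps_r: float = 78.5,
                    temperature_k: float = 298.15) -> float:
    """Debye screening length 1/kappa in nanometres.

    Assumes a uniform solvent dielectric (water by default). For a 1:1 salt,
    the ionic strength equals the salt concentration in mol/L.
    """
    ionic_strength_si = ionic_strength_molar * 1000.0  # mol/L -> mol/m^3
    kappa_sq = (2.0 * N_A * E_CHARGE**2 * ionic_strength_si) / (
        EPSILON_0 * eps_r * K_B * temperature_k)
    return 1e9 / math.sqrt(kappa_sq)  # metres -> nanometres


if __name__ == "__main__":
    # 0.15 M salt, as in the Figure 1 settings
    print(f"Debye length at 0.15 M: {debye_length_nm(0.15):.2f} nm")
```

At 0.15 M the screening length is a little under 1 nm, so charged groups mainly influence their immediate surroundings; because 1/κ scales as 1/√I, quadrupling the salt concentration halves it.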

NCBI Presents Two Online CoLabs at ASHG 2020!

Two up-and-coming NCBI resources will be featured in videos, surveys, and live events at the American Society of Human Genetics (ASHG) 2020 Annual Meeting. Come watch on-demand videos in the CoLab Theater. Then let us know what you think, and how you use or might use these resources, by taking an online survey or joining us for the CoLab Live! events on Thursday, October 29, 2020.

BLAST now offers RefSeq Select databases for faster searches and better results

RefSeq Select offers a high-quality set of transcript and protein sequences for human and mouse; for prokaryotes, RefSeq Select is the set of proteins annotated on RefSeq reference and representative genomes. You can now search these sets with the nucleotide and protein BLAST services. The new databases are RefSeq Select rna sequences and RefSeq Select proteins (Figure 1).

Figure 1. The new BLAST RefSeq Select rna (top panel) and protein (bottom panel) databases and results. The searches used the erythroid hexokinase 1 transcript (NM_033496.3) and protein (NP_277031.1) as queries. The protein BLAST search was limited to Clostridia (taxid:186801) to highlight prokaryote matches.

Searches against these more compact databases run faster and give results that are better defined and easier to interpret. You can download the pre-formatted refseq_select_rna and refseq_select_protein databases from the BLAST db FTP directory for use with a local BLAST installation, or use them on the Amazon and Google clouds with the BLAST+ Docker image.
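For local searches, a minimal Python sketch that assembles a BLAST+ command line against the refseq_select_rna database might look like this. It assumes BLAST+ is installed and that the database has already been downloaded (for example with BLAST+'s update_blastdb.pl --decompress refseq_select_rna) into a directory BLAST can find; the helper name and the query file name are hypothetical.

```python
import shlex


def build_local_blast_command(query_fasta: str,
                              db: str = "refseq_select_rna",
                              program: str = "blastn",
                              out_tsv: str = "results.tsv",
                              max_target_seqs: int = 10) -> list[str]:
    """Build a BLAST+ argument list for a search against a local database."""
    return [
        program,
        "-query", query_fasta,          # FASTA file with the query sequence
        "-db", db,                      # pre-formatted database name
        "-outfmt", "6",                 # tabular output, one hit per line
        "-max_target_seqs", str(max_target_seqs),
        "-out", out_tsv,
    ]


if __name__ == "__main__":
    # Hypothetical query file, e.g. holding the NM_033496.3 transcript
    cmd = build_local_blast_command("hk1_transcript.fa")
    print(shlex.join(cmd))
    # To run it: import subprocess; subprocess.run(cmd, check=True)
```

For the protein database, pass program="blastp" together with the protein database name listed in the FTP directory.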

Please write to us at blast-help@ncbi.nlm.nih.gov and let us know what you think of these new databases.

Announcing the RefSeq annotation of mouse GRCm39!

NCBI RefSeq has finished its initial annotation of the new mouse reference assembly, GRCm39, recently released by the Genome Reference Consortium. This is the first coordinate-changing update to the mouse reference since the 2012 release of GRCm38, resolving over 400 issues, almost doubling the scaffold N50, closing almost half the gaps, and adding 1.9 Mb of sequence. It’s a big deal!

Figure 1. The Genome Data Viewer showing the annotation for the mouse pseudoautosomal region that includes annotations of four genes that were previously missing: Sts, Nlgn4l, Akap17a, and 2510022D24Rik.

NCBI Datasets now provides downloads of gene data for more than 30 thousand organisms

NCBI Datasets now offers Gene tables: customizable tables of the genes you specify, with key gene information, and the ability to easily download a dataset of genomic, transcript and protein sequences.

Drag and drop a list of Gene IDs or gene symbols, and the data table shows your genes with up to 15 columns of metadata, including genomic coordinates, RefSeq transcript and protein accessions, Ensembl IDs and UniProt accessions, and other gene information. You can browse and select items in your table on the web, or download everything to your computer for later analysis (Figure 1).

Figure 1. The Data tables web download. Top panel: enter or upload a list of gene identifiers or symbols. Bottom panel: the resulting table display lets you browse results and download the table or the sequence data for the genes (genomic, transcripts, proteins).

Recent enhancements in Genome Workbench version 3.5.0

New Features

Version 3.5.0 of Genome Workbench, NCBI’s sequence annotation and analysis platform, includes two new features. First, we improved the phylogenetic reconstruction algorithm so that it can include additional sequence meta-information, such as isolation source, collection date, and country. This is useful, for example, for analyzing coronaviruses. For more information on this feature, check out our new tutorials: “Creating a phylogenetic tree starting from a search” and “Creating a phylogenetic tree from a multiple alignment.”

Second, we improved the tooltips in the Graphical Sequence View so that they include information about insertions and unaligned data.

Bug Fixes and Improvements

We made a number of other fixes and improvements. In Text View, we fixed a crash when displaying certain AGP data. In AGP export, we fixed an issue where sequence IDs from the AGP file did not match sequence IDs from the FASTA file when sequence ranges were used.

In Tree View, we fixed a crash on search and a tooltip issue where tooltip meta-information disappeared when a custom label was set.  We also improved startup time and fixed some visual issues in tooltips in the Graphical Sequence View.

Finally, in the Editing Package, we modified the control layout in the Table Reader dialog to fit on small screens, improved the speed of Table Reader, and fixed several cases in which Undo did not work after importing a feature table.

The latest in COVID-19 related human gene annotation now in NCBI RefSeq and Gene

Interested in human genes involved in COVID-19 biology? NCBI’s RefSeq group has been hard at work compiling a set of human genes with roles in coronavirus infection and disease. You can now see and search for these genes and their regulatory elements in NCBI Gene and RefSeq.

Figure 1. Top section of the human ACE2 record in the Gene database. COVID-19 information can be found in the Summary and Annotation information sections.

Oct 14 webinar: Exploring SRA Metadata with AWS Athena and a new dataset for SARS-CoV-2

Join us on Oct 14 to learn how to use Athena on AWS to quickly search the Sequence Read Archive (SRA) in the cloud, speed up your bioinformatics research and discovery projects, and explore a new SRA SARS-CoV-2 dataset. In this webinar, we’ll introduce a way to search SRA submitter-supplied metadata and the results of SRA taxonomic analysis with Athena, the native AWS tool that explores cloud-based data tables using SQL-like queries. You’ll see a real-world case study demonstrating how to find key information about SRA runs and identify datasets for your own analysis pipelines. The example will highlight a new data format designed to help advance SARS-CoV-2 research: the SRA aligned reads format, a new compressed data object. This format includes the raw reads aligned to pre-assembled contiguous sequences (contigs), which will help you more easily determine what’s really in a sequence data sample.
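To give a flavor of this kind of metadata search, here is a minimal Python sketch that builds an Athena SQL query for SRA runs of one organism and assay type. The database, table, and column names (sra, metadata, acc, organism, assay_type, mbases) are assumptions for illustration only; check the actual table schema in your Athena console before running anything.

```python
# Hypothetical schema: database "sra", table "metadata", and the column
# names used below are assumptions; verify them in the Athena console.
DEFAULT_TABLE = '"sra"."metadata"'


def quote_sql_literal(value: str) -> str:
    """Quote a string as an SQL literal, escaping embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_sra_metadata_query(organism: str, assay_type: str,
                             limit: int = 100,
                             table: str = DEFAULT_TABLE) -> str:
    """Build a query listing the largest SRA runs for one organism and assay."""
    return (
        "SELECT acc, organism, assay_type, mbases\n"
        f"FROM {table}\n"
        f"WHERE organism = {quote_sql_literal(organism)}\n"
        f"  AND assay_type = {quote_sql_literal(assay_type)}\n"
        "ORDER BY mbases DESC\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    sql = build_sra_metadata_query(
        "Severe acute respiratory syndrome coronavirus 2", "RNA-Seq", limit=20)
    print(sql)
    # Paste the printed query into the Athena query editor, or submit it with
    # boto3's Athena client (start_query_execution) and an S3 output location.
```

Escaping the string values, rather than pasting them directly into the SQL, keeps organism names that contain apostrophes from breaking the query.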

  • Date and time: Wed, Oct 14, 2020 12:00 PM – 1:00 PM EDT
  • Register

After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.