RefSeq release 99 is accessible online, via FTP and through NCBI’s Entrez programming utilities, E-utilities.
This full release incorporates genomic, transcript, and protein data available as of March 2, 2020, and contains 231,402,293 records, including 167,278,920 proteins, 29,869,155 RNAs, and sequences from 99,842 organisms. The release is provided in several directories as a complete dataset and also as divided by logical groupings.
Have you ever been confused by multiple taxonomic names for a single organism? You’re not alone! It’s one of the challenges in maintaining any biological database. Recently we updated the NCBI TaxBrowser to assist with this.
Let’s start with a brief word about how investigators name species in the first place. For any new species, the reporting author declares a “type.” They then deposit a specimen, or “type material,” in a publicly available biorepository. This type material is tied to the new species name and serves as a reference for future comparisons. Researchers can then use DNA sequences obtained from type material to identify other samples from the same species. NCBI currently uses such an approach to verify the taxonomic assignment of prokaryotic genomes.
Our Taxonomy group has been curating type material records in the Taxonomy database since 2013 using a common vocabulary accepted by our international partners (the INSDC). For example, the Entrez query “type material[prop]” in the Taxonomy database will return all type material at NCBI.
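The same query can also be sent programmatically through E-utilities. The following is a minimal sketch, not an official NCBI client: it assumes the public ESearch endpoint with its standard `db`, `term`, `retmax`, and `retmode` parameters, and the helper names `build_esearch_url` and `fetch_esearch` are hypothetical, introduced only for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public ESearch endpoint of NCBI E-utilities.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="taxonomy", retmax=20):
    """Build an ESearch URL for an Entrez query (hypothetical helper)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    # urlencode percent-encodes the brackets and turns spaces into '+'.
    return EUTILS_ESEARCH + "?" + urlencode(params)


def fetch_esearch(term, db="taxonomy", retmax=20):
    """Run the query over the network and return the 'esearchresult' object."""
    with urlopen(build_esearch_url(term, db, retmax)) as resp:
        return json.load(resp)["esearchresult"]


if __name__ == "__main__":
    # Only builds and prints the URL; no network request is made here.
    print(build_esearch_url("type material[prop]"))
```

Calling `fetch_esearch("type material[prop]")` would perform the actual request; in the JSON response, the `count` field holds the total number of matching taxonomy records and `idlist` holds the first `retmax` taxonomy IDs.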
Get rapid access to Wuhan coronavirus (2019-nCoV) sequence data from the current outbreak as it becomes available. We will continue to update the page with newly released data.
The complete annotated genome sequence of the novel coronavirus associated with the outbreak of pneumonia in Wuhan, China is now available from GenBank for free and easy access by the global biomedical community. Figure 1 shows the relationship of the Wuhan virus to selected coronaviruses.
Figure 1. Phylogenetic tree showing the relationship of Wuhan-Hu-1 (circled in red) to selected coronaviruses. Nucleotide alignment was done with MUSCLE 3.8. The phylogenetic tree was estimated with MrBayes 3.2.6 under the GTR+G+I model. The scale bar indicates estimated substitutions per site, and all branch support values are 99.3% or higher.
Genome Workbench version 3 is a major upgrade, including the addition of the Genome Submission Wizard. This video guides you through the wizard, from uploading your genome data file to completion of the submitter report, which is ready to submit to GenBank using tools such as Submission Portal or BankIt. Note: An online tutorial is under “Manuals” on the Genome Workbench home page.
You can now download images in both PDF and Scalable Vector Graphics (SVG) formats from our Sequence Viewer and genome browsers such as the Genome Data Viewer! SVG files are ideal for editing in image editors and provide high-quality graphics for publications, posters, and presentations. Both the PDF and SVG files that you download contain vector graphics for high-fidelity images.
You can download image files by choosing the “Printer-Friendly PDF/SVG” option under the Tools menu from any Graphical Sequence Viewer application (Figure 1).
Figure 1. Printer-friendly download options from the graphical view in the Genome Data Viewer. You can download either PDF or SVG formats, which are easily edited in standard graphics applications.
The latest improvement in the NCBI search experience is designed to help you quickly find microbial proteins. Now when you search for a prokaryotic protein name such as recombinase RecA in NCBI’s sequence databases or in the All databases search, a high-quality representative protein sequence is highlighted in a panel at the top of the results page (Figure 1).
The result panel also allows you to quickly link to related resources such as NCBI’s new pages for protein family models, Identical Protein Groups, and SPARCLE, NCBI’s protein domain architecture resource. We also provide as-you-type suggestions so you don’t have to type out some of the long names.
Figure 1. The result for a search with recombinase RecA. The panel provides access to analysis tools, downloads, and relevant links to the protein family, the RefSeq protein, the identical protein group, and citations in PubMed.
Try these protein name searches, or your own, and use the as-you-type suggestions to assist your searches.
As part of our ongoing effort to improve your search experience, we’ve made it easier for you to find the sequence of your favorite organelle genome plus all the information and data associated with it. To find organelle genomes, search for an organism name combined with an organelle description, for example human mitochondrion, tomato chloroplast or Toxoplasma gondii RH apicoplast.
A new results panel will appear with links to the organelle genome sequence, annotated genes, and related phylogenetic and population studies. The panel appears with these searches in an All Databases search or within any of NCBI’s sequence databases, including Gene, Nucleotide, Protein, Genome, and Assembly. For the human mitochondrial genome, a graphical schematic of the genome allows you to navigate to individual mitochondrial encoded genes (Figure 1).
Figure 1. The organelle genome results for a search with human mitochondrion. The panel provides access to analysis tools, downloads, and other relevant results. Clicking any of the gene objects on the genome graphic leads to the relevant Gene record, for example Gene ID: 4512 in the case of COX1.
Try it out using the following example searches and let us know what you think!
On Wednesday, September 11, 2019 at 12 PM, NCBI staff will present a webinar for people with limited experience working with gene and sequence information. You will learn about the kinds of data available for genes and sequences, how to select the most informative records, and how to find related genes and sequences using pre-computed information and the BLAST sequence search service.
Date and time: Wed, Sep 11, 2019 12:00 PM – 12:30 PM EDT
After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
As you may know, we have been offering new BLAST results (Figure 1) as a test page since April. In response to your positive reception and after incorporating many improvements that you suggested, we made the new results the default today, August 1, 2019.
You will still be able to access the traditional results for several months. This will give you additional time, if you need it, to adjust your workflows or teaching materials to the new display.
We thank all past and present submitters of EST and GSS data for the invaluable benefit these data have provided to numerous genomic sequencing projects over the years. Please let us know if you have any questions or concerns about these changes!