RefSeq release 91 is accessible online, via FTP and through NCBI’s Entrez programming utilities, E-utilities.
This full release incorporates genomic, transcript, and protein data available as of November 5, 2018. It contains 179,672,083 records, including 125,530,811 proteins, 24,447,570 RNAs, and sequences from 85,308 organisms.
The release is provided in several directories, both as a complete dataset and divided into logical groupings.
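For readers who want to reach RefSeq records programmatically, here is a minimal sketch of building an E-utilities ESearch request URL. It only constructs the URL and makes no network call. The helper name `build_esearch_url`, the example query, and the `retmax` value are illustrative assumptions, not part of the release announcement.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, retmax=5):
    """Return an ESearch URL for an Entrez database and query term.

    Hypothetical helper for illustration; it only assembles the query string.
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example query (an assumption): human RefSeq records in the Protein database.
url = build_esearch_url("protein", "srcdb_refseq[PROP] AND Homo sapiens[ORGN]")
print(url)
```

Fetching the URL with any HTTP client should return a JSON document listing matching record identifiers. NCBI asks heavy users to respect its request rate limits.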
In the NLM Strategic Plan released earlier this year, we noted that “[c]reating efficient ways to link the literature with associated datasets enables knowledge generation and discovery.” To that end, PMC is now aggregating data citations, data availability statements and supplementary materials, as available, in an Associated Data box. This box appears only on articles that have one or more of these features.
Figure 1. The Associated Data box is outlined in red.
To limit your search to records with an Associated Data box, you can use the new “Associated Data” facet on the search results page.
Figure 2. You can click on “Associated Data” (outlined in red) under Article attributes to limit your search to records with an Associated Data box.
We hope that by exposing this content in a consistent format that is easy to find and easy to access, we will help you more readily find the datasets you need to further accelerate discovery and advance health. As part of our ongoing commitment to making data findable, accessible, interoperable, and re-usable (FAIR), we encourage you to contact us with your feedback on these updates and with any other suggestions you may have for improving discovery of related data in PMC.
If you’ve been searching in Gene, Nucleotide, Protein, Genome or Assembly databases, you’ve probably noticed the new search experience we introduced in September to interpret several common language searches and offer improved results. We’re excited to announce we’ve added as-you-type suggestions to the search bar in these databases.
Here’s a peek at the new menu in the NCBI Gene database.
Figure 1. Typing into the search box brings up automatic suggestions of the most popular queries.
MedGen is a free, comprehensive resource for one-stop access to essential information on phenotypic health topics related to medical genetics as collected from established high-quality sources. It integrates terminology from multiple primary ontologies (or nomenclatures) to facilitate standardization and more accurate results from search queries.
We’re specifically looking for folks who have experience in computational virus hunting or adjacent fields to identify known, taxonomically-definable and novel viruses from a few hundred thousand metagenomic datasets that we’ll put on cloud infrastructure. This event is for researchers, including students and postdocs, who are already engaged in the use of bioinformatics data or in the development of pipelines for virological analyses from high-throughput experiments. If this describes you, please apply! The event is open to anyone selected for the hackathon and willing to travel to SDSU (see below).
Next week, NCBI staff will be at the NSGC 2018 conference in Atlanta, GA. While there, you can chat in person with us at booth #700 to learn about our medical genetics resources and pick up helpful material. We’d also love to hear any other questions or feedback to help support you.
Next Wednesday, November 14, 2018, NCBI staff will show you how to use NCBI’s genome browsers and other resources to interpret variants. The graphical displays of Genome Data Viewer (GDV) and Variation Viewer offer an interactive experience that allows you to explore NCBI’s rich collection of annotations, datasets and literature for deciphering your variant-associated data. In this presentation, we’ll step through case studies and show you how to quickly display relevant NCBI track sets (including the new RefSeq Functional Elements track), upload a file or remotely hosted dataset and display it as a track, and use browser tracks to identify known variants and then assess their functional and clinical significance and allele frequency. You will also learn how to navigate from the browsers to NCBI resources such as ClinVar, dbSNP and PubMed for additional variant information.
Date and time: Wed, Nov 14, 2018 12:00 PM – 12:45 PM EST
After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
Would you like to compare and analyze your data with known structural variants (SV) in NCBI’s database of genomic structural variation (dbVar)? Now there are easy-to-use files containing non-redundant (NR) deletions, duplications, and insertions aggregated from across studies in dbVar. The files are available for human assembly versions GRCh37 and GRCh38. Descriptions of the NR data are available on GitHub.
The NR files are available for FTP download in BED, BEDPE, and custom tab-separated formats, designed to be compatible with many popular tools and browsers. To help users get started, we have developed tutorials for the UCSC Genome Browser, the Galaxy web-based analysis platform, NCBI Sequence Viewer, and command-line BEDTools.
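If you would rather script a quick comparison yourself, the sketch below shows one way to read BED-style lines and find the records that overlap a region of interest. It assumes only the standard first three BED columns (chromosome, 0-based start, exclusive end). The sample records, names, and coordinates are made up for illustration and are not taken from the dbVar NR files, whose exact columns and chromosome naming you should check in the GitHub descriptions.

```python
from typing import Iterable, List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive (BED convention)


def parse_bed(lines: Iterable[str]) -> List[Interval]:
    """Parse the first three columns of tab-separated BED lines."""
    intervals = []
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and browser/track header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        intervals.append(Interval(chrom, int(start), int(end)))
    return intervals


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


# Hypothetical sample records, not real dbVar data.
nr_lines = [
    "# made-up example records",
    "chr1\t1000\t5000\texample_deletion_1",
    "chr1\t8000\t9000\texample_deletion_2",
    "chr2\t1000\t5000\texample_deletion_3",
]

query = Interval("chr1", 4500, 8000)
hits = [iv for iv in parse_bed(nr_lines) if overlaps(iv, query)]
print(hits)
```

Because BED intervals are half-open, the query ending at position 8000 does not overlap the record starting at 8000. For whole-genome comparisons, a dedicated tool such as BEDTools will be much faster than this linear scan.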
An upcoming release will add annotations such as genes, regulatory regions, and more. Have a favorite annotation you’d like to see? Send us your suggestions by contacting dbVar directly or by opening a GitHub issue. We also welcome comments and other suggestions for improvement.