Still waiting for an analysis pipeline that can uniformly process raw sequence data produced by a variety of sequencing platforms? Your wait is over! Announcing the SARS-CoV-2 Variant Calling Pipeline, which is now operational and optimized to support multiple sequencing platforms, including Illumina, Oxford Nanopore, and PacBio.
Every so often, we gather our most recent videos in one post on the blog, for your convenience. Scroll down – and don’t forget to subscribe to our channel!
Introducing GaPTools for dbGaP Submitters
This video introduces new standalone software called GaPTools, which you can use to check your data before submitting to dbGaP. GaPTools uses the same preliminary validation checks as the dbGaP submission portal.
We are excited to announce new track display options for gene annotation tracks in the NCBI Genome Data Viewer genome browser and other instances of the NCBI Sequence Viewer!
Now, you can simplify gene annotation tracks to show only the genes and transcripts that you care about most. For instance, you can choose to hide non-coding transcripts, including pseudogenes, so that only protein-coding transcript variants are shown in your view. You can also hide any transcript models predicted using NCBI’s Gnomon algorithm. Learn more:
Join us on June 2, 2021, at 12:00 PM Eastern Time to learn how to upload and display your own genomic data in the context of annotated genome assemblies. You will use the Genome Data Viewer and the Sequence Viewer to visualize your own uploaded data (indexed BAM, VCF, BED, wig, GFF formats), data from public track hubs, and your BLAST and Primer-BLAST results. You will also learn to take advantage of viewer features, including optimizing display settings, sharing a view with collaborators, exporting images, and downloading genes or other features in the view.
Date and time: Wed, June 2, 2021 12:00 PM – 12:45 PM EDT
After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI webinars playlist on the NLM YouTube channel. You can learn about future webinars on the Webinars and Courses page.
Are you a researcher who works on gene biology and is interested in alternative splice patterns in your gene or genes of interest? If so, be sure to explore the intron feature evidence available in graphics views of genome assemblies annotated by NCBI. You can view the NCBI evidence used for calling splice variants for genes, add other intron feature evidence tracks, and use new display and filter options that make it easier to interpret the data.
Figure 1. Graphical view of the monoamine oxidase gene (MAOA, MAOB) region on the human X chromosome showing intron feature tracks (‘RNA-seq intron features, aggregate’ and ‘Intropolis RNA-Seq intron features’). Mousing over an intron feature activates a tooltip that shows details such as the number of reads with the splice site, the location on the chromosome, the length of the intron, and the donor and acceptor bases at the splice site. The Intropolis track was added through the search feature of the Configure Tracks menu and configured (bottom menu) so that the features were sorted by strand and filtered so that only features with greater than 500 reads appear.
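The filter and sort settings described in the caption can be mimicked offline on exported intron features. The following is a minimal sketch, not the viewer's own logic: the `IntronFeature` record, its field names, and the coordinates and read counts are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntronFeature:
    """Hypothetical record for one intron feature (fields are assumptions)."""
    start: int
    end: int
    strand: str  # "+" or "-"
    reads: int   # number of reads supporting the splice junction


def filter_and_sort(features, min_reads=500):
    """Keep features with MORE than min_reads reads, sorted by strand, then start."""
    kept = [f for f in features if f.reads > min_reads]
    return sorted(kept, key=lambda f: (f.strand, f.start))


# Invented example values, not real MAOA/MAOB data.
features = [
    IntronFeature(1000, 1500, "-", 1200),
    IntronFeature(200, 900, "+", 40),
    IntronFeature(300, 800, "+", 750),
    IntronFeature(2000, 2600, "+", 500),  # exactly 500 reads: filtered out
]
selected = filter_and_sort(features)
for f in selected:
    print(f.strand, f.start, f.end, f.reads)
```

The threshold is strict ("greater than 500"), so a feature with exactly 500 reads is dropped, matching the wording of the caption.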
Primer-BLAST now has a “Primers common for a group of sequences” submission tab that allows you to design primers for a group of highly similar sequences. For example, you may want to test for expression of any transcript of a gene rather than a specific splice variant, so you want to design primers that cover all transcript variants. Or you may want to design primers that will amplify the same gene in closely related bacterial strains. To find primers for a group of related sequences, Primer-BLAST aligns the longest sequence to the rest to find common regions. It uses these to limit the locations of primers. The longest sequence is also used as the representative template sequence in the results. Figure 1 shows an example search for primers that will amplify all 15 splice variants of the human TP53 gene.
Figure 1. Primer-BLAST submission page and results for primers designed for the human TP53 transcripts. Top panel: The submission form with the “Primers common for a group of sequences” tab selected and the 15 RefSeq transcript accessions for TP53. Middle panel: The graphical results showing the longest sequence (NM_001126114.3) as the representative template, the locations of the primer pairs, and the alignment of the other template sequences. Bottom panel: An individual primer pair showing the locations on each of the template sequences.
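To make the idea of "common regions on the longest sequence" concrete, here is a toy sketch. It is not Primer-BLAST's implementation: it replaces the alignment step with exact k-mer matching, and the function names, the sequence identifiers `tx_long` and `tx_short`, and the sequences themselves are invented for illustration.

```python
def pick_representative(seqs):
    """Return the ID of the longest sequence (first one wins on ties)."""
    return max(seqs, key=lambda seq_id: len(seqs[seq_id]))


def common_regions(seqs, k=6):
    """Find half-open (start, end) intervals on the representative sequence
    in which every k-mer also occurs exactly in all other sequences.

    Exact k-mer matching is a simplification standing in for alignment.
    """
    rep = pick_representative(seqs)
    template = seqs[rep]
    others = [s for seq_id, s in seqs.items() if seq_id != rep]
    regions = []
    for i in range(len(template) - k + 1):
        kmer = template[i:i + k]
        if all(kmer in other for other in others):
            if regions and i <= regions[-1][1]:
                # Overlaps or touches the previous interval: extend it.
                regions[-1] = (regions[-1][0], i + k)
            else:
                regions.append((i, i + k))
    return rep, regions


# Invented toy "transcripts": the short one lacks the middle 10 bases.
seqs = {
    "tx_long": "ATGGCCTTAG" + "CAAACCCGGC" + "TTTACGTACC",
    "tx_short": "ATGGCCTTAG" + "TTTACGTACC",
}
rep, regions = common_regions(seqs)
print(rep, regions)
```

In this toy case the representative is `tx_long`, and the shared intervals are the first and last ten bases. A primer search restricted to those intervals would target sequence present in both inputs.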
Please try out this new feature and let us know what you think!
Are you interested in searching for a chromosomal region in a genome, but don’t know how to write the correct query? The good news is that the NCBI Genome Data Viewer (GDV) now supports a much wider array of search options. Some examples are listed below:
chr1:1,500,000-2,000,000
chr2: 1.5M – 2M
chr2: 1.5M-2,540.2K
2:1,500,000-2,000,000
3: 21.33M – 22.01M
3: 21.335M..21.337M
chr1:1,500,000 / 200
chr1:101,500,200
1:101,500,200
1:1,500K/0.5K
chr5
10
You can use any of these queries or the ones described below for assembly aliases either on the GDV landing page or in the GDV search box (Figure 1).
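If you build such queries in a script, it can help to normalize them to plain base coordinates first. The sketch below handles only the range-style queries from the list above. It assumes that K means 1,000 and M means 1,000,000, that "chr1" and "1" name the same chromosome, and that a hyphen, an en dash, or ".." separates start and end. The `parse_range` function is hypothetical and is not part of GDV. Slash forms, single positions, and bare chromosome names are deliberately rejected rather than guessed at.

```python
import re
from decimal import Decimal

_SUFFIX = {"": 1, "K": 1_000, "M": 1_000_000}
_NUMBER = r"([\d,]*\.?\d+)\s*([KkMm]?)"
_RANGE = re.compile(
    r"^\s*(?:chr)?(\w+)\s*:\s*"               # chromosome, optional "chr" prefix
    + _NUMBER
    + r"\s*(?:[-\u2013]|\.\.)\s*"             # hyphen, en dash, or ".."
    + _NUMBER
    + r"\s*$"
)


def _to_bases(number, suffix):
    """Convert text such as '2,540.2' with suffix 'K' to an integer position."""
    value = Decimal(number.replace(",", "")) * _SUFFIX[suffix.upper()]
    return int(value)  # any fractional base is truncated


def parse_range(query):
    """Return (chromosome, start, end) for a range-style query string."""
    match = _RANGE.match(query)
    if match is None:
        raise ValueError(f"not a range-style query: {query!r}")
    chrom, n1, s1, n2, s2 = match.groups()
    return chrom, _to_bases(n1, s1), _to_bases(n2, s2)


for q in ["chr1:1,500,000-2,000,000", "chr2: 1.5M-2,540.2K", "3: 21.335M..21.337M"]:
    print(q, "->", parse_range(q))
```

`Decimal` is used instead of `float` so that values such as 2,540.2K convert to exact base positions.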
The COVID-19 pandemic has drawn attention to the human host genes associated with SARS-CoV-2 entry and to the elements that regulate expression of these genes. At NCBI, we have prioritized curation of experimentally validated regulatory elements for these genes in the RefSeq Functional Elements project. Our annotations include several enhancers, promoters, cis-regulatory elements and protein binding sites, among other feature types. We have annotated 236 regulatory features for 27 distinct biological regions in the latest human Annotation Release (109.20200522), including regulatory elements for the ABO, ACE2, ANPEP, CD209, CLEC4G, CLEC4M, CTSL, DPP4, and TMPRSS2 genes.
You can view our regulatory element to target gene linkages in the regulatory interactions track using our new track hub that we recently announced. You can also see the biological regions and features tracks. These have functional and descriptive metadata, including biological region summaries, experimental evidence types, publication support and more.
The example in Figure 1 shows RefSeq Functional Element feature annotation in NCBI’s Genome Data Viewer (GDV) for the ABO gene region (GRCh38, NW_009646201.1: 73,864-103,789), the determiner of the human ABO blood group. A genome-wide association study recently identified non-coding ABO variants associated with COVID-19 disease severity (PMID:32558485), which map to some of the RefSeq Functional Elements in this region.
Figure 1. The human ABO gene region in the NCBI GDV displaying the RefSeq Functional Element features. The biological regions aggregate track shows underlying feature annotation for an ABO upstream enhancer (LOC112637023), promoter region (LOC112679202), +5.8 intron 1 enhancer (LOC112679198), a 3′ regulatory region (LOC112639999), and a +36.0 downstream enhancer (LOC112637025). Functional Element features include numerous enhancers, promoters, cis-regulatory elements and protein / transcription factor binding sites.
We have more information about RefSeq Functional Elements on our website, including data download and extraction options. Stay tuned to NCBI Insights and other NCBI social media for future announcements about RefSeq Functional Elements!
Version 3.4.1 of Genome Workbench, NCBI’s sequence annotation and analysis platform, includes new features for the Multiple Sequence Alignment View, the Graphical Sequence View and the Sequence Editing and Submission Package as well as a number of other improvements and bug fixes.
In the Multiple Sequence Alignment View, you can now export publication-quality graphics (Save As PDF/SVG … , Figure 1). In the Graphical Sequence View, you can now search by locus tag, use improved search capabilities for genes by locus, and better display the selected location in the feature editing dialog when annotating a sequence.
Figure 1. A multiple alignment view in Genome Workbench highlighting the new ability to save presentation quality image files (Save As PDF and SVG formats).
In the Sequence Editing and Submission Package, we rearranged the controls in the Table Reader dialog to fit onto smaller screens and improved the import of feature tables that contain mat-peptide (mature peptide) features.
Bug Fixes and Improvements
We have made a number of other fixes and improvements. For macOS users, we fixed blurry text in some dialogs, fixed the copy-to-clipboard problem, and improved support for the latest Catalina version. We also fixed a crashing problem in the Active Object Inspector interface. You should also see improvements in loading SNP data and better recovery in cases of power outages or other events causing local file corruption.
In the Sequence Editing and Submission Package, we fixed a bug that occurred when applying miscellaneous descriptors and structured comment fields using the Table Reader and an issue with using a PubMed ID to look up a publication.
Please see the extensive help documentation, including FAQs, videos, and tutorials, linked from the Genome Workbench homepage for more information and examples of how to use Genome Workbench in your research.
You can now view SNP variation data for many commonly studied animals and plants – including mouse, cow, Drosophila, Arabidopsis, maize, cabbage, and many more – in the Genome Data Viewer (GDV) and other graphical sequence viewers. This data is streamed from the European Variation Archive (EVA) at the European Bioinformatics Institute (EBI).
On any NCBI graphical sequence view you can use the Configure Tracks menu and the Track Configuration Panel to add the track for the EVA RefSNP data. This track is available through the left-hand tab for Remote Variation Data (Figure 1). The EVA RefSNP track displayed on the pig (Sus scrofa) chromosome 12 graphical view is shown in Figure 2.
Figure 1. The Track Configuration panel showing the Remote Variation Data tab and the EVA RefSNP Release 1 track. Select the track checkbox and click Configure to load the track.
Figure 2. The graphical sequence viewer showing the region of the growth hormone gene on pig chromosome 12 (NC_010454.4) with the EVA RefSNP Release 1 track at the bottom. The track header has an (R) and a green highlight to indicate that it is remote data streamed from an external website. NCBI is not responsible for the content or availability of these data.
The EVA SNP FTP site has more information about the EVA SNP data release.
Please contact us using the Feedback link on the graphical view to let us know what you think and how we can further improve your experience with the NCBI genome browsers and graphical sequence viewers.