The National Library of Medicine (NLM) is pleased to announce that all controlled-access and publicly available data in SRA is now available through Google Cloud Platform (GCP) and Amazon Web Services (AWS). To access the data, please visit our SRA in the Cloud webpage, where you will find links to our new SRA Toolkit and other access methods.
The SRA data available in the two clouds currently totals more than 14 petabytes and consists of all data in the SRA format as well as some data in its original submission format. Since May 2019, NCBI has been putting all submitted SRA data on the GCP and AWS clouds in both the submitted format and our converted SRA format. We have also been moving previously submitted original format data to the clouds and expect to complete that process in 2021.
Continue reading “The entire corpus of the Sequence Read Archive (SRA) now live on two cloud platforms!”
Check out the latest videos on YouTube to learn how to best use NCBI graphical viewers, SRA, PGAP, and other resources.
Genome Data Viewer: Analyzing Remote BAM Alignment Files and Other Tips
This video shows you how to upload remote BAM files, and succinctly demonstrates handy viewer settings, such as Pileup display options, and highlights the very helpful tooltips in the Genome Data Viewer (GDV). There’s also a brief blog post on the same topic.
Continue reading “NCBI on YouTube: Get the most out of NCBI resources with these videos”
NCBI is pleased to announce a single-cell focused codeathon at the New York Genome Center, January 15-17. To apply, please complete the application form by December 30, 2019. Read on if you need more information about the event.
Continue reading “Single Cell in the Cloud Codeathon, Jan 15-17 at NYGC”
If you download data from the SRA (Sequence Read Archive) FTP site, we would encourage you to try the SRA Toolkit. This is particularly true if you use the SRA Fuse/FTP site at ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant, which the SRA team will decommission on December 1, 2019.
The SRA Toolkit offers several advantages for downloading SRA data, including greater flexibility in specifying the data you need as well as access to public SRA data in the cloud. If you’re new to the Toolkit, you may want to start with these instructions.
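As a rough illustration of what a Toolkit-based download might look like, here is a minimal Python sketch that only assembles the command lines and does not run them. It assumes the Toolkit's `prefetch` and `fasterq-dump` programs are installed and on your PATH. The accession `SRR000001`, the output directory name, and the helper `build_download_commands` are placeholders chosen for this example, not part of the Toolkit.

```python
import re
import shlex

# SRA run accessions start with SRR, ERR, or DRR followed by digits.
RUN_ACCESSION = re.compile(r"[SED]RR\d+")


def build_download_commands(accession: str,
                            outdir: str = "sra_data",
                            threads: int = 4) -> list[list[str]]:
    """Return argument lists for fetching a run and converting it to FASTQ.

    Hypothetical helper: it builds the commands but never executes them.
    """
    if not RUN_ACCESSION.fullmatch(accession):
        raise ValueError(f"not a run accession: {accession!r}")
    return [
        # Step 1: download the run in SRA format.
        ["prefetch", accession],
        # Step 2: convert the downloaded run to FASTQ files.
        ["fasterq-dump", accession, "--outdir", outdir, "--threads", str(threads)],
    ]


if __name__ == "__main__":
    # Placeholder accession; substitute the run you actually need.
    for command in build_download_commands("SRR000001"):
        print(shlex.join(command))
```

Each returned list can be handed to `subprocess.run(command, check=True)` once you are happy with the arguments. Keeping the commands as lists rather than one shell string avoids quoting problems with unusual paths.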
If you have any questions or concerns about downloading SRA data, please contact firstname.lastname@example.org. We’d love to hear from you!
NCBI is pleased to announce a Structural Variant Hackathon at the Baylor College of Medicine, Houston, Texas, immediately before ASHG on October 11-13, 2019.
We’re specifically looking for folks who have experience in working with structural variants, complex disease, precision medicine, and similar genomic analysis. If this describes you, please apply! This event is for researchers, including students and postdocs, who are already engaged in the use of bioinformatics data or in the development of pipelines for large scale genomic analyses from high-throughput experiments (please note that the event itself will focus on open access public human data).
Potential topics include:
- Mapping structural variants to public databases
- Calculating the heritability of different types of structural variants
- CNV effect on isoform expression
- Assembly accuracy for metagenomics
- Quality assessment in large cohorts
The hackathon runs from 9 am – 6 pm each day, with the potential to extend into the evening hours. There will also be optional social events at the end of each day. Participants with various backgrounds and expertise will be organized into five to eight teams of five to six individuals, each with an experienced leader. These teams will build pipelines and tools to analyze large datasets within a cloud infrastructure. Each day, we will come together to discuss progress on each of the topics, bioinformatics best practices, coding styles, etc.
There will be no registration fee associated with attending this event.
Note: Participants will need to bring their own laptop to this program. No financial support for travel, lodging, or meals is available for this event.
Continue reading “Structural Variant Hackathon”
Have you ever needed to correct or improve SRA metadata after submitting, change the release date for your data, or share your data with reviewers? You can now perform these tasks yourself using the SRA data management features that are LIVE in Submission Portal!
If you have an SRA submission and associated BioProject and BioSample, you can log into the Submission Portal, go to the Manage data tab, click into that BioProject and easily perform the following common tasks (Figure 1).
Continue reading “Try our new SRA data management tools!”
From January 10-12, 2018, the NCBI will help with a bioinformatics hackathon in Southern California hosted by San Diego State University. The hackathon will focus on advanced bioinformatics analysis of next generation sequencing data, proteomics, and metadata. This event is for researchers, including students and postdocs, who have already engaged in the use of bioinformatics data or in the development of pipelines for bioinformatics analyses from high-throughput experiments. Some projects are available to other non-scientific developers, mathematicians, or librarians.
The event is open to anyone selected for the hackathon and willing to travel to SDSU (see below). Applications are due Monday, December 11th, 2017 by 3 pm PT (6 pm EST).
Continue reading “NCBI to assist in Southern California genomics hackathon in January”
A new version of IgBLAST is now available on FTP, along with a new manual on GitHub. This release has the following improvements:
- The igblastn executable can now multi-thread much more efficiently for large sets of queries. The default number of threads is now four, but can be changed with the -num_threads option.
- The igblastn executable can now take an SRA accession as the query input. The search runs on the local machine, but the queries are retrieved from the SRA repository at the NCBI. Use the -sra rather than the -query option to enable.
- The default nucleotide mismatch penalty values for finding D and J genes are now lower (changed from -4 to -2 and from -3 to -2, respectively). This improves accuracy in finding the best D and J gene hits for moderately mutated sequences.
Our web IgBLAST page also uses the new default nucleotide mismatch penalty values (i.e., -2 for finding both D and J genes).
IgBLAST facilitates the analysis of immunoglobulin and T cell receptor variable domain sequences.
On Wednesday, November 1, 2017, we will present a webinar on GDV, NCBI’s full-featured genome browser. In this webinar, you’ll learn how to explore and analyze sequences and annotations for eukaryotic RefSeq genome assemblies. We’ll show you how to:
- Search across the entire assembly for genes, products and other markers or jump to a specific position or range
- Display any of seven preselected track sets highlighting various aspects of the assembly or create and load your own custom track sets from your NCBI account
- Load and display submitted alignment data from NCBI’s GEO or SRA
- Upload your own annotation and variant data
- Display BLAST or Primer-BLAST results on the assembly in the browser
Date and time: Wednesday, November 1, 2017, 12:00-12:30 PM EDT
After registering, you will receive a confirmation email with information about attending the webinar. After the live presentation, the webinar will be uploaded to the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
The newest version of Magic-BLAST (v. 1.3.0) offers improved sensitivity and faster run times as well as a number of other new features and improvements. These include the ability to set the alignment cut-off score as a function of read length, a maximum edit distance option, and optional local caching for SRA files. For more information on these and other improvements, see the release notes. You can download the new executables from the NCBI FTP site.
Magic-BLAST is a tool for mapping large next-generation RNA or DNA sequencing runs against a whole genome or transcriptome. Read more here.
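For readers unfamiliar with the two filtering ideas mentioned above, the following Python sketch shows in general terms what a read-length-dependent score cutoff and a maximum edit distance filter mean. It is not Magic-BLAST's implementation. The linear cutoff (a fixed fraction of read length), the default fraction of 0.6, and the function names are all assumptions made for illustration. The edit distance is the standard Levenshtein distance.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the classic two-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def score_cutoff(read_length: int, fraction: float = 0.6) -> float:
    """Hypothetical cutoff that scales linearly with read length."""
    return fraction * read_length


def passes_filters(read: str,
                   reference_segment: str,
                   score: float,
                   max_edit_distance: int,
                   fraction: float = 0.6) -> bool:
    """Keep an alignment only if it clears both illustrative filters."""
    return (score >= score_cutoff(len(read), fraction)
            and edit_distance(read, reference_segment) <= max_edit_distance)


if __name__ == "__main__":
    read = "ACGTACGTAC"
    segment = "ACGTACCTAC"  # differs from the read at one position
    print(edit_distance(read, segment))
    print(passes_filters(read, segment, score=8, max_edit_distance=2))
```

Tying the cutoff to read length means longer reads must score proportionally higher to be kept, while the edit distance limit caps the number of mismatches and gaps regardless of score.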