From February 25-27, 2019, NCBI will help with a data science hackathon at the University of South Florida (USF) in Tampa, Florida!
The hackathon will focus on the genomics of iron-linked rare diseases as well as large-scale RNA-Seq indexing and analysis. This event is for researchers, including students and postdocs, who have already worked with large datasets or developed pipelines to analyze data from high-throughput experiments. Some projects are also suitable for developers without a scientific background, mathematicians, and librarians.
The event is open to anyone selected for the hackathon and willing to travel to Tampa.
Participants will be organized into five to eight teams of five to six people each. These teams will build or expand on pipelines and tools to analyze large datasets within a cloud infrastructure. Example subjects for such hackathons include:
Integrative pipelines to analyze large scale RNA-Seq experiments
Visualization tools for mapping phenotypes to genotypes
Rapid clinical diagnostics tools
Structural variant mining with single molecule sequencing data
Please see the application form for more details and additional projects. The project list will continue to evolve and will be updated on the application form.
If you’ve been searching in ClinVar, you might have noticed search improvements, introduced in December, that reliably connect you with information on your variant of interest. ClinVar has broadened its search capability to accept many different ways of expressing the same variation, including variation described on RefSeq transcripts and proteins. If your variant expression is not reported in ClinVar, we alert you to other variants at the same genomic location or link you to related information in other NCBI resources such as dbSNP, LitVar, and PubMed. ClinVar now also interprets expressions that contain minor errors, or warns you about improper syntax that it cannot interpret.
Figure 1. Improved search results in ClinVar showing mapping of an HGVS expression to the equivalent variant in ClinVar.
Here are some example queries that show the improved search results.
NM_001318787.1:c.2258G>A – an HGVS expression that is not in ClinVar, although ClinVar has an alternate expression for the same variant (Figure 1).
NM_004958.3:c.7365C>A – a variant that is not in ClinVar, although another variant at the same genomic location is in ClinVar.
NM_002113.2:c.19delG – a variant that is not in ClinVar, but for which other databases have additional information.
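To make "many different ways of expressing the same variation" more concrete, here is a minimal Python sketch that splits the three example queries into accession, version, position, and change. It covers only a tiny subset of HGVS (coding substitutions and simple deletions), and its whitespace tolerance is a hypothetical stand-in for error handling, not a description of how ClinVar actually interprets expressions. The names `SimpleHgvs` and `parse_simple_hgvs` are illustrative.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical, deliberately small pattern: coding-DNA substitutions (c.2258G>A)
# and single-base or unspecified deletions (c.19delG, c.19del). Real HGVS is far richer.
_SIMPLE_HGVS = re.compile(
    r"^(?P<acc>[A-Z]{2}_\d+)\.(?P<ver>\d+)"         # RefSeq accession and version
    r":c\.(?P<pos>\d+)"                             # coding DNA position
    r"(?:(?P<ref>[ACGTacgt])>(?P<alt>[ACGTacgt])"   # substitution
    r"|del(?P<deleted>[ACGTacgt]?))$"               # deletion
)


@dataclass(frozen=True)
class SimpleHgvs:
    accession: str
    version: int
    position: int
    kind: str             # "substitution" or "deletion"
    ref: Optional[str]    # reference base, if stated
    alt: Optional[str]    # alternate base (substitutions only)


def parse_simple_hgvs(text: str) -> SimpleHgvs:
    """Parse a small subset of HGVS; raise ValueError for anything else."""
    cleaned = re.sub(r"\s+", "", text)  # tolerate stray whitespace (illustrative leniency)
    m = _SIMPLE_HGVS.match(cleaned)
    if m is None:
        raise ValueError(f"unsupported or malformed expression: {text!r}")
    if m.group("ref") is not None:
        kind, ref, alt = "substitution", m.group("ref").upper(), m.group("alt").upper()
    else:
        # An empty deleted-base group means the deleted base was not stated.
        kind, ref, alt = "deletion", (m.group("deleted").upper() or None), None
    return SimpleHgvs(m.group("acc"), int(m.group("ver")), int(m.group("pos")), kind, ref, alt)


if __name__ == "__main__":
    for query in ("NM_001318787.1:c.2258G>A", "NM_004958.3:c.7365C>A", "NM_002113.2:c.19delG"):
        print(parse_simple_hgvs(query))
```

Parsing alone cannot tell whether two expressions describe the same variant; that requires mapping each one to genomic coordinates, which is the step that lets a search report "another variant at the same genomic location".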
We welcome your feedback on your search experience and any additional ideas on how to improve searching in ClinVar.
Join us on Wednesday, February 6, 2019, when NCBI staff will show you how to use a new set of NCBI variation services that rely on a variant data model called SPDI (Sequence Position Deletion Insertion). These services and the data model allow you to interconvert, map, and disambiguate variants in standard formats (RefSNP accessions, HGVS, and VCF). Unlike many current variant notation systems, SPDI provides unambiguous, machine-readable definitions of variants. SPDI powers not only the SNP build and mapping procedures at NCBI but also the variant sensors active in NCBI’s global search and in ClinVar. Together, these services and the notation system provide valuable new tools for anyone who works with sequence variants.
Date and time: Wed, Feb 6, 2019 12:00 PM – 12:30 PM EST
After registering, you will receive a confirmation email with information about attending the webinar. A few days after the live presentation, you can view the recording on the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.
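The SPDI idea can be illustrated with a short Python sketch, assuming the commonly described layout `sequence:position:deletion:insertion`, where the position is 0-based and counts the reference bases that precede the change. The sketch applies a variant to a toy reference string (the identifier `TOY_1.1` and its sequence are made up), converts a VCF-style record (1-based POS) to SPDI without any normalization, and treats two variants as equivalent when they produce the same resulting sequence. It does not reproduce NCBI’s own normalization or the variation services’ interface; all names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spdi:
    seq_id: str     # accession.version of the reference sequence
    position: int   # 0-based: number of reference bases before the change
    deletion: str   # reference bases removed (may be empty)
    insertion: str  # bases inserted in their place (may be empty)

    @classmethod
    def parse(cls, text: str) -> "Spdi":
        seq_id, position, deletion, insertion = text.split(":")
        return cls(seq_id, int(position), deletion, insertion)

    def __str__(self) -> str:
        return f"{self.seq_id}:{self.position}:{self.deletion}:{self.insertion}"


def apply_spdi(reference: str, v: Spdi) -> str:
    """Return the sequence produced by replacing v.deletion with v.insertion."""
    end = v.position + len(v.deletion)
    if reference[v.position:end] != v.deletion:
        raise ValueError(f"deleted bases {v.deletion!r} do not match the reference at {v.position}")
    return reference[:v.position] + v.insertion + reference[end:]


def vcf_to_spdi(seq_id: str, pos: int, ref: str, alt: str) -> Spdi:
    """Naive VCF (1-based POS) to SPDI conversion; no normalization is applied."""
    return Spdi(seq_id, pos - 1, ref, alt)


def equivalent(reference: str, a: Spdi, b: Spdi) -> bool:
    """Variants on the same sequence are equivalent if they yield the same result."""
    return a.seq_id == b.seq_id and apply_spdi(reference, a) == apply_spdi(reference, b)


if __name__ == "__main__":
    reference = "GATTACA"  # toy sequence for the made-up ID TOY_1.1
    variants = [
        Spdi.parse("TOY_1.1:2:T:"),
        Spdi.parse("TOY_1.1:3:T:"),
        vcf_to_spdi("TOY_1.1", 2, "AT", "A"),  # VCF-style deletion with an anchor base
    ]
    for v in variants:
        print(v, "->", apply_spdi(reference, v))
```

All three strings delete one T from the "TT" run and yield the same sequence, which is exactly the kind of ambiguity a canonical, machine-readable representation has to resolve.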
RefSeq release 92 is available online, via FTP, and through NCBI’s Entrez Programming Utilities (E-utilities).
This full release incorporates genomic, transcript, and protein data available as of January 4, 2019, and contains 185,738,687 records, including 130,366,644 proteins, 25,088,890 RNAs, and sequences from 86,867 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.
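As a sketch of the E-utilities route, the Python function below only builds an EFetch URL for RefSeq records; it does not send the request. The base URL and the `db`, `id`, `rettype`, and `retmode` parameters are standard EFetch parameters, while the helper name `efetch_url` and the choice of example accessions (two transcripts from the ClinVar examples) are illustrative assumptions.

```python
from typing import Iterable
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_url(accessions: Iterable[str], db: str = "nuccore",
               rettype: str = "fasta", retmode: str = "text") -> str:
    """Build an EFetch URL that retrieves the given accession.version records."""
    params = {
        "db": db,                    # nucleotide records; use "protein" for NP_ accessions
        "id": ",".join(accessions),  # comma-separated list of identifiers
        "rettype": rettype,
        "retmode": retmode,
    }
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    print(efetch_url(["NM_004958.3", "NM_002113.2"]))
```

To actually download the records, the URL can be passed to `urllib.request.urlopen`; NCBI asks E-utilities clients to stay within its published request-rate limits.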
dbSNP build 152 is a small incremental update from build 151 provided for you to begin testing and integrating the new build products into your workflow. Build 152 uses the new system with SPDI variant notation and is now available on FTP and the new RefSNP webpage.
From February 4-6, 2019, NCBI will help with a data science hackathon at the Fred Hutchinson Cancer Research Center in Seattle. To apply, complete this form (it takes approximately 10 minutes). Initial applications are due Friday, January 11, by 11 pm ET.
The hackathon will focus on genomics as well as general data science. This event is for researchers, including students and postdocs, who have already worked with large datasets or developed pipelines to analyze data from high-throughput experiments. Some projects are also suitable for developers without a scientific background, mathematicians, and librarians.
BLAST+ 2.8.1 is now available for download from our FTP site. This is the first production release of standalone BLAST to support the new BLAST v5 databases (BLASTDBv5), which are also now available. The new databases have taxonomy information for the database sequences built in, which gives you the following important advantages over the v4 databases:
The ability to limit your search by taxonomic group, at the species level as well as for higher taxa.
Improved performance when limiting BLAST search with accessions.
Retrieval of sequences by taxonomic group from a BLAST database with blastdbcmd.
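A minimal Python sketch of a taxonomy-limited search: it only assembles argument lists for `blastn` and `blastdbcmd` using the `-taxids` option that accompanies the v5 databases, and prints them. The database name `my_v5_db`, the query file `query.fa`, and both helper functions are hypothetical; check `blastn -help` and `blastdbcmd -help` for the options your installed version supports, including how taxids above the species level are handled.

```python
import shlex
from typing import List, Sequence


def _taxid_arg(taxids: Sequence[int]) -> str:
    """Join several taxids into one comma-separated option value."""
    return ",".join(str(t) for t in taxids)


def blast_taxid_command(query_fasta: str, db: str, taxids: Sequence[int],
                        program: str = "blastn", max_target_seqs: int = 10) -> List[str]:
    """Argument list for a BLAST search restricted to the given taxids (v5 database)."""
    return [
        program,
        "-query", query_fasta,
        "-db", db,
        "-taxids", _taxid_arg(taxids),
        "-outfmt", "6",  # tabular output
        "-max_target_seqs", str(max_target_seqs),
    ]


def blastdbcmd_taxid_command(db: str, taxids: Sequence[int]) -> List[str]:
    """Argument list to dump, as FASTA, the database sequences for the given taxids."""
    return ["blastdbcmd", "-db", db, "-taxids", _taxid_arg(taxids), "-outfmt", "%f"]


if __name__ == "__main__":
    # 9606 is the NCBI Taxonomy ID for Homo sapiens.
    print(shlex.join(blast_taxid_command("query.fa", "my_v5_db", [9606])))
    print(shlex.join(blastdbcmd_taxid_command("my_v5_db", [9606])))
```

With BLAST+ installed, either list can be passed directly to `subprocess.run(cmd, check=True)`; building a list rather than a shell string avoids quoting problems with file names.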
The release also includes several enhancements to the search program options.
A new option (-subject_besthit) culls HSPs on a per-subject-sequence basis by removing HSPs that are completely enveloped by another HSP. This option is experimental and subject to change.
The -max_target_seqs option is now allowed with output formats 0-4; the number of alignments and descriptions is set to the -max_target_seqs value.
BLAST now issues a warning about the possibility of not seeing all equivalent matches if -max_target_seqs is set to less than five.
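The envelopment rule behind -subject_besthit can be sketched in Python as follows. This is an illustrative reading of the one-sentence description of the option, not BLAST’s implementation: it groups HSPs by subject, compares query coordinates only, and keeps the higher-scoring copy when two HSPs cover exactly the same range. The `Hsp` type and `cull_enveloped` function are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Hsp:
    subject_id: str
    q_start: int       # query start (1-based, inclusive); assumes q_start <= q_end
    q_end: int         # query end (inclusive)
    bit_score: float


def envelops(outer: Hsp, inner: Hsp) -> bool:
    """True if outer's query range fully contains inner's query range."""
    return outer.q_start <= inner.q_start and inner.q_end <= outer.q_end


def cull_enveloped(hsps: List[Hsp]) -> List[Hsp]:
    """Per subject, drop every HSP whose query range is enveloped by a kept HSP."""
    by_subject: Dict[str, List[Hsp]] = defaultdict(list)
    for hsp in hsps:
        by_subject[hsp.subject_id].append(hsp)

    kept: List[Hsp] = []
    for group in by_subject.values():
        # Longest ranges first, so any enveloping HSP is examined before the HSPs it
        # contains; among equal ranges the higher bit score comes first and survives.
        ordered = sorted(group, key=lambda h: (-(h.q_end - h.q_start), -h.bit_score))
        survivors: List[Hsp] = []
        for hsp in ordered:
            if not any(envelops(s, hsp) for s in survivors):
                survivors.append(hsp)
        kept.extend(survivors)
    return kept


if __name__ == "__main__":
    demo = [
        Hsp("s1", 1, 100, 200.0),
        Hsp("s1", 10, 50, 80.0),   # inside 1-100: culled
        Hsp("s1", 90, 150, 60.0),  # overlaps but is not enveloped: kept
        Hsp("s2", 10, 50, 70.0),   # different subject: kept
    ]
    for hsp in cull_enveloped(demo):
        print(hsp)
```

Sorting by range length is what makes a single pass sufficient: an HSP that envelops another with a different range is always strictly longer, so it is considered first.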