mik black bioinformatics symposium
How to make bioinformatics accessible to normal people!
Mik Black
Department of Biochemistry
University of Otago
Some musings…
• Accessibility - two aspects:
1. Methodology development & distribution (can you get it?)
2. Methodology uptake (can you use it?)
• “Normal people”:
1. Who are they?
2. What do they want? What do they need? Is there a happy medium?
My background
• Statistical design and analysis of microarray experiments:
– Methodology development
– Applying existing bioinformatics techniques
– Adapting “standard” statistical methods
– “Forensic” analysis
• Technologies:
– Microarrays (mRNA, SNPs, CNV/CGH)
– Second generation sequencing (SNP/CNV)
http://www.r-project.org/
The joys of the command line…
• Large amounts of statistical genomics methodology available via R
– Accessible?
– Uptake?
– Who are the end users?
• Can’t we just teach EVERYONE to use R?
• GenePattern provides a web-based method for analysing microarray (and other) data.
• Provides a simple interface to tools developed in Java, R, Matlab and other languages.
• Analysis performed on server:
– No compute resources required by users.
– Facilitates sharing of results.
http://www.broad.mit.edu/cancer/software/genepattern/
Reich et al. (2006). GenePattern 2.0. Nature Genetics, 38, 500-501.
Using GenePattern
• User friendly
– Third year bioinformatics course at Otago.
– Workshop for lab personnel at TGen.
• Guided analysis
– Facilitates use of standard analysis methods.
– Pipeline creation and “versioned” analysis.
• End users?
– At Otago: 3rd & 4th year Biochemistry students
– At TGen: lab techs, interns, bench scientists, PIs
Common analysis tasks
• Basic data analysis/exploration:
– Heatmap creation
– Hierarchical clustering
– Identifying differentially expressed genes
– Gene set analysis
– Survival analysis
• GenePattern provides these tools in a modular format.
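As a rough illustration of one task from the list above, identifying differentially expressed genes, the sketch below ranks genes by Welch's t-statistic. It is not GenePattern's implementation: the function names (`welch_t`, `rank_genes`) and the toy data are made up for illustration, and a real analysis would also correct for multiple testing.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for two independent samples (unequal variances)."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def rank_genes(expression, n_control):
    """Rank genes by |t|, largest first.

    `expression` maps a gene name to a list of log-expression values; the
    first `n_control` values are control samples, the rest are treated.
    """
    scores = {
        gene: welch_t(values[n_control:], values[:n_control])
        for gene, values in expression.items()
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: 3 control samples followed by 3 treated samples.
toy = {
    "geneA": [5.0, 5.2, 4.9, 7.1, 7.3, 6.9],
    "geneB": [6.0, 6.1, 5.9, 6.0, 6.2, 5.8],
    "geneC": [8.0, 7.8, 8.1, 6.0, 6.3, 5.9],
}
ranked = rank_genes(toy, n_control=3)
for gene, t in ranked:
    print(f"{gene}\tt = {t:.2f}")
```

In this toy data geneA (higher in treated) and geneC (lower in treated) rank above geneB, whose group means are essentially equal.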
GenePattern interface
Simple analysis pipeline
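A minimal sketch of the idea behind a simple, versioned analysis pipeline: named steps run in a fixed order, and each run records which pipeline version produced the result. The `Pipeline` class, the step functions and the version string are hypothetical and do not reflect GenePattern's own pipeline format.

```python
from dataclasses import dataclass, field
from math import log2
from typing import Callable, List, Tuple


@dataclass
class Pipeline:
    """An ordered list of named steps, tagged with a version string."""
    name: str
    version: str
    steps: List[Tuple[str, Callable]] = field(default_factory=list)

    def add(self, label, func):
        """Append a step and return the pipeline so calls can be chained."""
        self.steps.append((label, func))
        return self

    def run(self, data):
        """Apply each step in order; return the result and a run log."""
        log = []
        for label, func in self.steps:
            data = func(data)
            log.append(f"{self.name} v{self.version}: {label}")
        return data, log


def log_transform(values):
    """log2-transform raw intensities."""
    return [log2(v) for v in values]


def centre(values):
    """Subtract the mean so the values are centred on zero."""
    m = sum(values) / len(values)
    return [v - m for v in values]


pipeline = (
    Pipeline("toy-preprocess", "1.0")
    .add("log2 transform", log_transform)
    .add("mean centre", centre)
)
result, log = pipeline.run([2.0, 4.0, 8.0])
```

Keeping the version in the run log is what makes an analysis repeatable later: the same input and the same pipeline version should give the same result.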
Power calculations
Network analysis
BeSTGRID: Broadband-enabled Science and Technology GRID
http://www.bestgrid.org
GenePattern on BeSTGRID
• Services:
– GenePattern server
– Development environment (server and SVN)
• GenePattern training (coming soon):
– Basic usage
– Module development (uptake path for bioinformatics tool developers)
Next steps…
• Full GenePattern deployment:
– Transfer of development modules to public server
– Documentation and training
– Use of ROCKS cluster for job submission
• Modules for Second Gen Sequencing data:
– DNAseq, RNAseq, ChIPseq
– R/Bioconductor (e.g., ShortRead, Biostrings, Rsamtools, GenomeGraphs…)
– Analysis, visualization and quality assurance
Community effort
• Some current examples:
– VISG/MapNet: statisticians & geneticists
– BeSTGRID: middleware development & deployment through to end users
– CTCR: cancer researchers & clinicians
• Each group has the goal of placing powerful (and useful, and usable) tools into the hands of end users.
Bioinformatics community
• NZGL provides opportunity for community-based effort.
– National infrastructure for genomics research
– Includes strong bioinformatics component
• Key issue: engagement with end users
– Methodology development and distribution
– Uptake, interaction and training
LET’S GO FIND US SOME “NORMAL” PEOPLE!
Acknowledgements
University of Otago: Marcus Davy, Tim Molteno, Thomas Allen, Sarah Song, Chris Brown, Anthony Reeve, Tony Merriman
University of Canterbury: Vladimir Mencl
The University of Auckland: Nick Jones, Mark Gahegan, Yuriy Halytskyy, Cristin Print, Daniel Hurley, Christoff Knapp