BioVLAB-Microarray: Microarray Data Analysis in Virtual Environment
Youngik Yang, Jong Youl Choi, Kwangmin Choi, Marlon Pierce, Dennis Gannon, and Sun Kim
School of Informatics, Indiana University
CONTENTS
• Introduction
• Approach
• Related Works
• Microarray technology
• System Architecture
• Experiments
• Conclusion
• Demo
INTRODUCTION
• Analysis of high-throughput microarray experiments
• Performing microarray analysis is a demanding task for biologists and small research labs
• Computing infrastructure issue
  – Computationally intensive
  – Nontrivial to integrate various bioinformatics applications
• Exploratory data analysis issue
  – Multiple tasks in a single batch
  – Repetitive execution
APPROACH
• On-demand computing resources
• A suite of microarray analysis applications
• Reconfigurable GUI workflow composer can alleviate the technical burden
  – A well-defined workflow can be used repeatedly
• Web portal
• Reusable, reconfigurable, high-level workflow execution workbench powered by computing clouds for microarray gene expression analyses
RELATED WORKS
• Efficient and user-friendly workflow composers and execution engines
  – SIBIOS, BioWBI, KDE Bioscience
• Distributed and heterogeneous computing resources + workflow system
  – Taverna, Triana, Kepler, GNARE, RENCI-Bioportal
MICROARRAY TECHNOLOGY
• A subset of genes is expressed in response to environmental changes and the cell's changing needs
• Dynamics of cell activity
• Measure gene expression levels of thousands of genes within a cell
• Usage
  – Function prediction: guilt by association
  – Interaction: co-expression of genes in transcription networks reveals how they interact
  – Drug discovery: identify genes related to a certain disease and assess the effectiveness of new drugs
Source: www.liv.ac.uk/lmf/about_microarrays.htm
RESEARCH GOALS
• Gene expression analysis
  – Search for similar patterns of genes
    • Similar expression patterns may reveal the function of a gene whose function is unknown
  – Extraction of differentially expressed genes
    • Statistical evaluation
  – Clustering
    • Protein function prediction
    • Genes with similar expression may need to be studied as a group
  – Component analysis
    • Hidden structure of expression patterns may be revealed
• Expression network analysis
  – Expose hidden structures
  – Protein-protein interaction (PPI) network analysis
    • Central issue: key role in understanding how a cellular system works
    • Modular structure in a network may reflect higher-level functional organization of cellular components
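The "search for similar patterns" goal can be illustrated with a small sketch. It assumes Pearson correlation as the similarity measure (the slides name only "correlation"), and the gene names and expression values are made up for illustration; the system itself runs such analyses in R, so this is not the authors' code.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def most_similar(query, profiles):
    """Return gene names sorted by decreasing correlation with the query profile."""
    scores = {name: pearson(query, values) for name, values in profiles.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical expression profiles over four time points.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
others = {name: values for name, values in profiles.items() if name != "geneA"}
ranking = most_similar(profiles["geneA"], others)
```

Here `ranking` lists the genes whose profiles rise and fall with geneA first, which is the "guilt by association" idea: a gene of unknown function that tracks a known gene is a candidate for a related function.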
MICROARRAY ANALYSIS: COMMON TASKS
• The output of one task can be plugged into another task
• The same set of tasks is repeated with small changes to the parameters
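A minimal sketch of these two points, assuming a hypothetical two-step pipeline (variance-based subset extraction followed by a summary step) and made-up data. The function names and thresholds are illustrative only and are not part of BioVLAB.

```python
def extract_subset(data, min_variance):
    """Keep genes whose expression variance is at least min_variance."""
    kept = {}
    for gene, values in data.items():
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        if variance >= min_variance:
            kept[gene] = values
    return kept


def summarize(data):
    """Second task: report the surviving gene names in sorted order."""
    return sorted(data)


def run_pipeline(data, min_variance):
    # The output of the first task is plugged directly into the second.
    return summarize(extract_subset(data, min_variance))


# Hypothetical expression values (population variances: g1 = 0, g2 = 1, g3 = 4).
data = {
    "g1": [1, 1, 1, 1],
    "g2": [0, 2, 0, 2],
    "g3": [0, 4, 0, 4],
}

# Repeat the same set of tasks with small changes to one parameter.
results = {threshold: run_pipeline(data, threshold) for threshold in (0.5, 2.0, 5.0)}
```

Each rerun differs only in the threshold, which is the kind of repetition a saved workflow is meant to take off the analyst's hands.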
SYSTEM ARCHITECTURE
• Workflow composer and execution engine
• Application services
• Web portal
[Figure: architecture diagram connecting the Web Portal, Application Services, and Workflow Composer & Execution components; arrow labels: Execute, Manage Data, Create]
WORKFLOW COMPOSER & EXECUTION ENGINE
• Introduced in the scientific communities to execute a batch of multiple tasks
• Makes repetitive tasks easy to run
• Directed acyclic graph
  – Node: application to execute
    • Starting node: input
    • End node: output
  – Edge: a flow of data
[Figure: example workflow graph with nodes Input, Task A, Task B, Task C, and Output]
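The directed-acyclic-graph model can be sketched in a few lines. This is not XBaya's implementation: it is an illustration using Python's standard `graphlib` module (Python 3.9 or later), and the wiring of Task A, Task B, and Task C is an assumption, since the slide text does not say how the figure's nodes are connected.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies, workflow_input):
    """Run each node after all of its upstream nodes.

    tasks:        node name -> function applied to the upstream outputs
    dependencies: node name -> set of upstream node names (the graph's edges)
    Nodes with no upstream nodes receive the workflow input.
    """
    outputs = {}
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        parents = sorted(dependencies.get(name, ()))
        inputs = [outputs[p] for p in parents] if parents else [workflow_input]
        outputs[name] = tasks[name](*inputs)
    return outputs, order


# Hypothetical tasks; the real nodes would be registered applications.
tasks = {
    "task_a": lambda xs: [x * 2 for x in xs],
    "task_b": lambda xs: sum(xs),
    "task_c": lambda xs: max(xs),
    "output": lambda total, largest: {"total": total, "largest": largest},
}
# Assumed wiring: input -> A, A -> B, A -> C, and both B and C -> output.
dependencies = {
    "task_a": set(),
    "task_b": {"task_a"},
    "task_c": {"task_a"},
    "output": {"task_b", "task_c"},
}
outputs, order = run_workflow(tasks, dependencies, [1, 2, 3])
```

Because execution order is derived from the edges, swapping or re-parameterizing a node does not require rewriting the rest of the workflow, and a cyclic graph is rejected before any task runs.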
XBaya
• GUI workflow composer and execution engine
• Developed at IU
• Drag-and-drop composition from the workbench
• Monitor the status of workflow execution
[Figure: XBaya screenshot showing the Application Information Panel, Monitor Panel, Workbench Panel, and Workflow Composer Panel, with drag-and-drop composition]
APPLICATION SERVICES
• Interoperability among applications can be achieved by Application Services
• Generic Service Toolkit (Gfac)
  – Gfac converts a command-line bioinformatics application into a web service
• On-demand computing resources
  – Amazon Elastic Compute Cloud (EC2)
• Remote storage services
  – Amazon Simple Storage Service (S3)
  – Microsoft Application-Based Storage
BioVLAB APPLICATION DEVELOPMENT PROCEDURE
• Develop a command-line app.
• Install the app. in Amazon EC2
• Let the app. store any output to Amazon S3 / Microsoft Application-Based Storage
• Make a virtual machine image
• Register the app. by using Gfac
• Instantiate EC2 and run the app. by using XBaya
[Figure: Gfac registration form (Gfac user manual)]
WEB PORTAL
• Administrator
  – Management of registered applications by the Gfac registry portlet
  – User management and access control
• User
  – Access to stored data
• Built with Open Grid Computing Environments (OGCE)
ANALYSIS RESOURCES
• R: statistical learning
• Bioconductor: microarray analysis
• Data acquisition: NCBI GEO Microarray DB
• Similar expression pattern: correlation
• Differentially expressed gene: limma package
• Clustering: K-means, hierarchical clustering, QT clustering, biclustering, self-organizing map (SOM)
• Component analysis: principal component analysis (PCA) and independent component analysis (ICA)
• Network: Database of Interacting Proteins (DIP), Perl Graph package, and GraphViz
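To make the clustering entry concrete, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. The system itself relies on R and Bioconductor for this step, so the code below is only an illustration: the deterministic initialization from the first k profiles is an assumption chosen for reproducibility, and the data are made up.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_profile(points):
    """Component-wise mean of a non-empty list of profiles."""
    return [sum(column) / len(points) for column in zip(*points)]


def kmeans(points, k, max_iter=100):
    """Cluster profiles into k groups; returns (assignment, centroids)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: start from the first k profiles instead of a random draw.
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: attach each profile to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no profile changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = mean_profile(members)
    return assignment, centroids


# Hypothetical two-time-point profiles forming two obvious groups.
points = [[0, 0], [10, 10], [0, 1], [10, 11]]
assignment, centroids = kmeans(points, k=2)
```

Changing `k` is exactly the kind of small parameter change that motivates rerunning a saved workflow rather than repeating the analysis by hand.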
EXPERIMENT
• Data set: GDS38
  – Remotely retrieved from the NCBI GEO database
  – Time-series gene expression data for observing the cell cycle in the yeast Saccharomyces cerevisiae
  – 7680 spots in each of 16 samples
  – One sample was taken every 7 minutes as the cells went through the cell cycle
• Expression analysis
• PPI network analysis
EXPERIMENTS
CONCLUSION
• Microarray data analysis in virtual environment
• Coupling computing clouds and GUI workflow engine
• Effective system design for small research labs
FUTURE WORKS
• Integration of more packages and analyses
• A system of great flexibility
  – Integrate various high-throughput data
    • Microarray, mass spectrometry, massively parallel sequencing, etc.
  – Integrate various computing resources
    • Clouds, grid, and multi-core PCs
  – Integrate various public resources
    • NCBI, KEGG, PDB, etc.
SCREEN SHOTS
• S3 browser
• EC2 active instance
• Workflow for clustering
• Input parameters
• Workflow execution
• Data acquisition
• Subset extraction
• Clusterings
• Workflow termination
• Experiment result
• Download file
• Heatmap for K-means clustering
ACKNOWLEDGEMENT
• The work is partially supported by NSF MCB 0731950 and a MetaCyt Microbial Systems Biology grant from Lilly Foundations.
• Extreme Computing Group at IU
  – Suresh Marru, Srinath Perera, and Chathura Herath
Thank You