![Page 1: E-LICO An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences October 12 th, 2010 Delivering data mining](https://reader035.vdocument.in/reader035/viewer/2022062805/5697bfa31a28abf838c96746/html5/thumbnails/1.jpg)
e-LICO: An e-Laboratory for Interdisciplinary Collaborative
research in data mining and data intensive sciences
October 12th, 2010
Delivering data mining to the Life Science Community
Simon Jupp
School of Computer Science
University of Manchester, United Kingdom
![Page 2: E-LICO An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences October 12 th, 2010 Delivering data mining](https://reader035.vdocument.in/reader035/viewer/2022062805/5697bfa31a28abf838c96746/html5/thumbnails/2.jpg)
e-LICO project overview
Infrastructure to support collaborative, data mining enabled experimental research
Knowledge-driven planning of DM workflows
– Improve planning by meta-mining
Support research in data-intensive, knowledge-rich domains
– Systems biology use case
![Page 3: E-LICO An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences October 12 th, 2010 Delivering data mining](https://reader035.vdocument.in/reader035/viewer/2022062805/5697bfa31a28abf838c96746/html5/thumbnails/3.jpg)
European Project
European Project, 9 partners (Month 20/36)
– Specialists from Data Mining, Semantic Web, Grid computing and Systems Biology
• University of Manchester, UK
• University of Geneva, Switzerland
• Inserm, France
• Jožef Stefan Institute, Slovenia
• NHRF, Greece
• Poznan University, Poland
• Rapid-I GmbH, Germany
• Ruđer Bošković Institute, Croatia
• University of Zurich, Switzerland
An EU-FP7 Collaborative Project (2009-2012) Theme ICT-4.4: Intelligent Content and Semantics
![Page 4: E-LICO An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences October 12 th, 2010 Delivering data mining](https://reader035.vdocument.in/reader035/viewer/2022062805/5697bfa31a28abf838c96746/html5/thumbnails/4.jpg)
Problems…
Capturing the workflow
– Explanation
– Error detection / Repair
– Reproducibility
– Provenance
Steep learning curve
– Many operators to choose from
– Best combination of operators
– Hard for non Data Miners
![Page 5: E-LICO An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences October 12 th, 2010 Delivering data mining](https://reader035.vdocument.in/reader035/viewer/2022062805/5697bfa31a28abf838c96746/html5/thumbnails/5.jpg)
Problems… and solutions (e-LICO planned workflows)
Develop an “Intelligent Discovery Assistant” (IDA) for Data Analysis
– Automatically generate workflows by planning
– Assist the user in solving DM task
– Structure workflows in workflow templates
– Self improvement through Meta-Mining
Ontology based data model
– Adds semantics
– OWL/RDF based
– Data Mining Experiment Repository
Capturing the workflow
– Explanation
– Error Detection / Repair
– Reproducibility
– Provenance
Steep learning curve
– Many operators to choose from
– Best combination of operators
– Hard for non Data Miners
![Page 6: E-LICO An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences October 12 th, 2010 Delivering data mining](https://reader035.vdocument.in/reader035/viewer/2022062805/5697bfa31a28abf838c96746/html5/thumbnails/6.jpg)
The e-LICO workflow
[Diagram: Input Data, Ontology based AI planner, Workflow execution engine, Publish and share, Meta-mining; Output: Data, provenance and models]
![Page 7: E-LICO An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences October 12 th, 2010 Delivering data mining](https://reader035.vdocument.in/reader035/viewer/2022062805/5697bfa31a28abf838c96746/html5/thumbnails/7.jpg)
Ontology based AI planner
![Page 8: E-LICO An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences October 12 th, 2010 Delivering data mining](https://reader035.vdocument.in/reader035/viewer/2022062805/5697bfa31a28abf838c96746/html5/thumbnails/8.jpg)
Hierarchical Task Network (HTN) planning
Set of Tasks to achieve possible Data Mining Goals
Tasks have an I/O specification and set of associated Methods to
achieve that task
Methods composed of simpler Task/Methods
Some methods are Operators with Conditions and Effects
Example: My task is ‘Data Mining With Evaluation’, my Goal is to get a
workflow that does this Evaluation via Cross-Validation
Workflow planning
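The decomposition described on this slide can be illustrated with a minimal sketch. It is not the e-LICO planner: only the task name `DataMiningWithEvaluation` comes from the slide's example, while the operator names, the facts used as Conditions and Effects, and the `plan` function are hypothetical. A task is expanded by trying each of its methods in turn, and an operator is accepted only when its conditions hold in the current state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    conditions: frozenset  # facts that must hold before the operator runs
    effects: frozenset     # facts that hold after the operator has run


# Hypothetical operators, invented for this illustration.
OPERATORS = {
    "ReplaceMissingValues": Operator(
        "ReplaceMissingValues", frozenset({"data"}), frozenset({"no_missing_values"})),
    "TrainDecisionTree": Operator(
        "TrainDecisionTree", frozenset({"data", "no_missing_values"}), frozenset({"model"})),
    "CrossValidate": Operator(
        "CrossValidate", frozenset({"data", "model"}), frozenset({"evaluation"})),
}

# Each task maps to alternative methods; a method is a list of simpler
# tasks and/or operators. "Preprocess" may be skipped (empty method) or
# may clean missing values.
METHODS = {
    "DataMiningWithEvaluation": [["Preprocess", "Model", "CrossValidate"]],
    "Preprocess": [[], ["ReplaceMissingValues"]],
    "Model": [["TrainDecisionTree"]],
}


def plan(steps, state):
    """Depth-first HTN decomposition; returns a list of operator names or None."""
    if not steps:
        return []
    head, rest = steps[0], steps[1:]
    if head in OPERATORS:
        op = OPERATORS[head]
        if not op.conditions <= state:
            return None  # conditions not met, so the caller backtracks
        tail = plan(rest, state | op.effects)
        return None if tail is None else [head] + tail
    for method in METHODS.get(head, []):
        result = plan(method + rest, state)
        if result is not None:
            return result
    return None


workflow = plan(["DataMiningWithEvaluation"], frozenset({"data"}))
print(workflow)
```

In this toy domain the planner first tries to skip preprocessing, fails because the modelling operator requires data without missing values, and backtracks to the method that inserts the cleaning step.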
![Page 9: E-LICO An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences October 12 th, 2010 Delivering data mining](https://reader035.vdocument.in/reader035/viewer/2022062805/5697bfa31a28abf838c96746/html5/thumbnails/9.jpg)
The Data Mining Workflow Ontology (DMWF)

| Class | Description | Examples |
| --- | --- | --- |
| IO Object | Input and output used by operators | Data, Model, Report |
| MetaData | Characteristics of the IOObjects | Attribute, AttributeType, DataColumn, DataFormat |
| Operator | DM operators | DataTableProcessing, ModelProcessing, Modeling, MethodEvaluation |
| Goal | A DM goal that the user could solve | DescriptiveModelling, PatternDiscovery, PredictiveModelling, RetrievalByContent |
| Task | A task is used to achieve a goal | CleanMV, CategorialToScalar, DiscretizeAll, PredictTarget |
| Methods | A method is used to solve a task | CategorialToScalarRecursive, CleanMVRecursive, DiscretizeAllRecursive, DoPrediction |
![Page 10: E-LICO An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences October 12 th, 2010 Delivering data mining](https://reader035.vdocument.in/reader035/viewer/2022062805/5697bfa31a28abf838c96746/html5/thumbnails/10.jpg)
AI Planner
Brute force planning
Probabilistic Planning
– What will likely produce better results?
Case-based Planning
– How did we solve that previously?
DMOP (Workflow optimization ontology)
– Algorithm and Model selection given a particular task
– Meta-mining by abstraction and generalisation
Workflow Planning
![Page 11: E-LICO An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences October 12 th, 2010 Delivering data mining](https://reader035.vdocument.in/reader035/viewer/2022062805/5697bfa31a28abf838c96746/html5/thumbnails/11.jpg)
Meta-Mining
Initially, the AI planner recommends applicable DM workflows, not
necessarily good ones
Self-improves with experience through meta-mining
The meta-miner
– Applies DM techniques to meta-data from past DM experiments
– Extracts workflow patterns that are signatures of high predictive
performance
The planner uses these workflow patterns to design and recommend
promising workflows
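The loop described above can be sketched in a few lines. This is only an illustration under strong assumptions, not the e-LICO meta-miner: a "workflow pattern" is reduced here to a pair of adjacent operators, performance is a single number per past experiment, and all operator names and scores are invented. `mine_patterns` and `rank_workflows` are hypothetical helper names.

```python
from collections import defaultdict
from statistics import mean


def mine_patterns(experiments):
    """Map each adjacent operator pair to the mean performance of the
    past workflows that contained it."""
    scores = defaultdict(list)
    for operators, performance in experiments:
        for pair in set(zip(operators, operators[1:])):
            scores[pair].append(performance)
    return {pair: mean(values) for pair, values in scores.items()}


def rank_workflows(candidates, patterns, default=0.0):
    """Order candidate workflows by the mean score of the patterns they contain."""
    def score(workflow):
        pairs = list(zip(workflow, workflow[1:]))
        if not pairs:
            return default
        return mean(patterns.get(pair, default) for pair in pairs)
    return sorted(candidates, key=score, reverse=True)


# Invented meta-data from past experiments: (operator sequence, accuracy).
past_experiments = [
    (["Normalize", "SVM"], 0.9),
    (["Normalize", "SVM"], 0.8),
    (["Discretize", "NaiveBayes"], 0.6),
]

patterns = mine_patterns(past_experiments)
ranked = rank_workflows(
    [["Discretize", "NaiveBayes"], ["Normalize", "SVM"]], patterns)
print(ranked[0])
```

A planner could use such a ranking to recommend promising workflows first, while still being able to generate every applicable one.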
![Page 12: E-LICO An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences October 12 th, 2010 Delivering data mining](https://reader035.vdocument.in/reader035/viewer/2022062805/5697bfa31a28abf838c96746/html5/thumbnails/12.jpg)
Workflow Execution
04/21/23, e-LICO Kick-Off, Geneva
![Page 13: E-LICO An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences October 12 th, 2010 Delivering data mining](https://reader035.vdocument.in/reader035/viewer/2022062805/5697bfa31a28abf838c96746/html5/thumbnails/13.jpg)
Workflow Execution
All operators in the ontology (200+) are exposed as SOAP or REST based Web Services
Plans converted to a workflow execution language (SCUFL 2)
Provenance capture
– Execution times, intermediate model returned to planner
Taverna
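As a rough illustration of provenance capture, the sketch below runs a plan step by step and records the execution time and intermediate output of each operator. It makes no claim about how Taverna does this: in e-LICO the operators are remote Web Services, whereas here plain local functions stand in for them, and `run_with_provenance` and the two toy operators are hypothetical.

```python
import time


def run_with_provenance(steps, data):
    """Apply each (name, operator) step in order and record, per step,
    the elapsed time and the intermediate output."""
    provenance = []
    for name, operator in steps:
        start = time.perf_counter()
        data = operator(data)
        provenance.append({
            "operator": name,
            "seconds": time.perf_counter() - start,
            "output": data,
        })
    return data, provenance


# Toy stand-ins for remote DM operators (assumptions for this example).
steps = [
    ("DropMissing", lambda rows: [r for r in rows if r is not None]),
    ("Mean", lambda rows: sum(rows) / len(rows)),
]

result, provenance = run_with_provenance(steps, [1, None, 3])
for record in provenance:
    print(record["operator"], record["output"])
```

Records of this kind are what could be handed back to a planner or stored for later meta-mining.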
![Page 14: E-LICO An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences October 12 th, 2010 Delivering data mining](https://reader035.vdocument.in/reader035/viewer/2022062805/5697bfa31a28abf838c96746/html5/thumbnails/14.jpg)
Workflow Publishing and Sharing
![Page 15: E-LICO An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences October 12 th, 2010 Delivering data mining](https://reader035.vdocument.in/reader035/viewer/2022062805/5697bfa31a28abf838c96746/html5/thumbnails/15.jpg)
Workflow Publishing and Sharing
Workflows and data can be shared via myExperiment
Build a community of data miners
Set of re-usable workflows, data and workflow templates (packs)
![Page 16: E-LICO An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences October 12 th, 2010 Delivering data mining](https://reader035.vdocument.in/reader035/viewer/2022062805/5697bfa31a28abf838c96746/html5/thumbnails/16.jpg)
Use case – Obstructive nephropathy
Demonstrated with Systems Biology Use Case
– Biomarker discovery and pathway modelling in the study of chronic kidney disease
– KUP challenge initiated (August 2010)
Expression data
KUP KB (RDF store)
Text-mining / Image mining
New models and hypotheses
Further wet lab experiments
![Page 17: E-LICO An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences October 12 th, 2010 Delivering data mining](https://reader035.vdocument.in/reader035/viewer/2022062805/5697bfa31a28abf838c96746/html5/thumbnails/17.jpg)
Research Questions
How and when does a planner-based “Intelligent Discovery Assistant” help the end user?
Can we improve planning and suggest better workflows through meta-mining?
Can we plan complex workflows with Scientific Goals that answer biological questions?
– KUP goal is to construct diagnostic models that accurately connect the biological views to the severity of this pathology
![Page 18: E-LICO An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences October 12 th, 2010 Delivering data mining](https://reader035.vdocument.in/reader035/viewer/2022062805/5697bfa31a28abf838c96746/html5/thumbnails/18.jpg)
Where are we now
Availability
http://www.e-lico.eu
1st year demo: http://www.youtube.com/watch?v=JtmqZfzyEKs
eProPlan plugin for Protégé 4.0
Ontologies available
Taverna
http://www.taverna.org.uk
RapidMiner
http://rapid-i.com
![Page 19: E-LICO An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences October 12 th, 2010 Delivering data mining](https://reader035.vdocument.in/reader035/viewer/2022062805/5697bfa31a28abf838c96746/html5/thumbnails/19.jpg)
Summary
e-LICO: virtual laboratory for interdisciplinary collaborative research in data mining
Ontology based AI planning of KDD workflows
Generic E-Science platform for DM
Application layer for Systems Biology
![Page 20: E-LICO An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences October 12 th, 2010 Delivering data mining](https://reader035.vdocument.in/reader035/viewer/2022062805/5697bfa31a28abf838c96746/html5/thumbnails/20.jpg)
Acknowledgments
Robert Stevens (Manchester)
Alan Williams (Manchester)
Rishi Ramgolam (Manchester)
Jörg-Uwe Kietz (Zurich)
Melanie Hilario (Geneva)
E-LICO consortium