Semantic-Driven Design and Management of KDD Processes
Full paper: http://boole.diiga.univpm.it/paper/cts10.pdf
Emanuele Storti
Università Politecnica delle Marche
Dipartimento di Ingegneria Informatica, Gestionale e dell'Automazione
Ancona, Italy
CTS 2010, Chicago, May 19
Introduction
Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data
CTS 2010, Chicago, May 19 Emanuele Storti, UNIVPM, Italy
Organizations need methods and technologies to analyze huge amounts of data in support of decision-making processes
Introduction
Process: iteration, many steps
Knowledge: user interaction
Team work: virtual organizations
domain experts
DBA
DM expert
KDD expert
KDD in a Collaborative Distributed Scenario
Examples: KD for enterprises, e-Science workflows
Major issues
Many KDD tools are available for each phase/task (heterogeneity, integration, complexity):
How to set up and execute the tools? How to compose them? How to support novice users?
Distribution of users and tools (localization, coordination):
How to locate the needed tools? How to manage coordination?
Some general questions:
How to provide support for process design? How to manage execution and interactions?
Approach (i)
Service-oriented platform for sharing, discovering, accessing, and executing data analysis and knowledge discovery tools
KDD tools produced by different organizations are remotely accessible as basic services through standard protocols
Formalization of experts' knowledge in a conceptual semantic model, to support advanced services (process composition)
KDDONTO: an ontology for describing algorithms, interfaces, data structures, methods, tasks:
sharing of knowledge / agreement on definitions: each actor can refer to the same definition of an algorithm or data
human/machine understandable (conceptual/formal model)
automatic reasoning
support for non-expert users
Approach (ii)
[Figure: KDDONTO fragment (ID3 is-a DecisionTreeAlgorithm is-a ClassificationAlgorithm is-a Algorithm) and the service ID3_v.2.3 with its descriptor]
Separation of information into different layers (reusability):
Algorithm: described in the ontology
Service: implements a specific algorithm; its descriptor points to the corresponding ontological concept
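The two layers can be illustrated with a minimal Python sketch. It is not the KDDONTO encoding or the platform's descriptor format: the `IS_A` dictionary, the `ServiceDescriptor` fields, and the example endpoint are all hypothetical, and the is-a chain is read off the slide's figure labels.

```python
from dataclasses import dataclass

# Hypothetical ontology fragment: child concept -> parent concept (is-a).
IS_A = {
    "ID3": "DecisionTreeAlgorithm",
    "DecisionTreeAlgorithm": "ClassificationAlgorithm",
    "ClassificationAlgorithm": "Algorithm",
}


def ancestors(concept):
    """Return the concepts reachable from `concept` by following is-a links."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


@dataclass(frozen=True)
class ServiceDescriptor:
    name: str        # deployed service (hypothetical field names)
    endpoint: str    # placeholder URL, not a real service
    implements: str  # ontology concept the service realizes


def services_for(concept, descriptors):
    """Services implementing `concept` itself or one of its specializations."""
    return [
        d for d in descriptors
        if d.implements == concept or concept in ancestors(d.implements)
    ]


descriptors = [ServiceDescriptor("ID3_v.2.3", "http://example.org/id3", "ID3")]
found = services_for("ClassificationAlgorithm", descriptors)
```

Because the descriptor only points at a concept, a new version of the service can be registered without touching the ontology, which is the reusability argument made above.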
Process composition
[Figure: COMPOSITION: the composer takes a goal, a dataset, and requirements, queries KDDONTO, and produces an abstract process]
Process composition
Planner for semiautomatic composition of abstract KDD processes
1. algorithm match: given two algorithms, are they compatible? (based on ontology properties; exact vs. approximate match)
[Figure: match examples, "y is equal to y" and "X2 is part_of X"]
Process composition
[Figure: KDDComposer prototype]
2. goal-oriented composition procedure: iterative execution of algorithm match
Input: goal, dataset, some constraints
Execution: backwards, from goal to dataset
Output: a ranked list of valid abstract processes
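The two steps above can be sketched in Python under strong simplifying assumptions: each algorithm has one input and one output, an approximate match means the produced data is `part_of` the required data, the only constraint is a maximum process length, and processes are ranked by number of approximate matches and then by length. The algorithm names, data types, `PART_OF` facts, and ranking rule are hypothetical and are not taken from KDDONTO or from the KDDComposer prototype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Algorithm:
    name: str
    task: str    # task the algorithm performs
    input: str   # required input data type
    output: str  # produced data type


# Hypothetical part_of facts between data types: part -> whole.
PART_OF = {"BinnedDataset": "DiscreteDataset"}


def match_kind(produced, required):
    """Return 'exact', 'approximate', or None for a producer/consumer pair."""
    if produced == required:
        return "exact"
    if PART_OF.get(produced) == required:
        return "approximate"
    return None


def compose(goal_task, dataset_type, algorithms, max_steps):
    """Search backwards from the goal to the dataset; return ranked processes."""
    results = []

    def extend(process, needed, approx):
        # The process is complete when the dataset satisfies the open input.
        kind = match_kind(dataset_type, needed)
        if kind:
            results.append((approx + (kind == "approximate"), process))
        if len(process) >= max_steps:
            return
        # Otherwise prepend any algorithm whose output matches the open input.
        for alg in algorithms:
            if alg in process:
                continue
            kind = match_kind(alg.output, needed)
            if kind:
                extend([alg] + process, alg.input, approx + (kind == "approximate"))

    for alg in algorithms:
        if alg.task == goal_task:
            extend([alg], alg.input, 0)

    # Assumed ranking: fewer approximate matches first, then shorter processes.
    results.sort(key=lambda r: (r[0], len(r[1])))
    return [[a.name for a in process] for _, process in results]


algorithms = [
    Algorithm("RemoveMissingValues", "Preprocessing", "RawDataset", "CleanDataset"),
    Algorithm("Discretize", "Preprocessing", "CleanDataset", "DiscreteDataset"),
    Algorithm("EqualWidthBinning", "Preprocessing", "RawDataset", "BinnedDataset"),
    Algorithm("ID3", "Classification", "DiscreteDataset", "DecisionTree"),
]
ranked = compose("Classification", "RawDataset", algorithms, max_steps=4)
```

In this toy run the three-step process built only from exact matches ranks above the shorter process that relies on one approximate match; a different ranking rule would be equally easy to plug into the sort key.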
Translation to concrete process
[Figure: COMPOSITION (composer: goal, dataset, requirements; KDDONTO) yields an abstract process; TRANSLATION (broker, UDDI, syntactic verification) yields a concrete process]
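One way to picture the translation step is the Python sketch below. It is only an illustration under assumptions: a plain dictionary stands in for the UDDI registry queried by the broker, the syntactic verification is reduced to checking that adjacent services agree on a data format, and all service names and formats are hypothetical.

```python
from itertools import product


def translate(abstract_process, registry):
    """Map each abstract step to a registered service with compatible formats.

    Returns a list of service names, or None when no combination passes
    the (simplified) syntactic check.
    """
    candidates = [registry.get(step, []) for step in abstract_process]
    for combo in product(*candidates):
        # Syntactic check: each service must accept what the previous one returns.
        if all(a["returns"] == b["accepts"] for a, b in zip(combo, combo[1:])):
            return [service["name"] for service in combo]
    return None


# Hypothetical registry content: algorithm concept -> available services.
registry = {
    "RemoveMissingValues": [{"name": "Clean_A", "accepts": "csv", "returns": "csv"}],
    "Discretize": [
        {"name": "Disc_A", "accepts": "arff", "returns": "arff"},
        {"name": "Disc_B", "accepts": "csv", "returns": "arff"},
    ],
    "ID3": [{"name": "ID3_v.2.3", "accepts": "arff", "returns": "pmml"}],
}
concrete = translate(["RemoveMissingValues", "Discretize", "ID3"], registry)
```

Here `Disc_A` is rejected because it cannot read the output of `Clean_A`, so the sketch falls through to `Disc_B`.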
Verification and Execution
Collaborative/distributed scenario: complex interactions among actors and time-consuming transactions.
Guarantees about process correctness must be provided at design time
Reo, a “glue code” for explicitly modeling interaction among components (tools, GUI, ...)
Specification of the interaction protocol
Interaction design
Specs verification
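To make the idea of design-time verification concrete, the Python sketch below checks a toy interaction protocol for deadlocks by exhaustively exploring its states. This is a plain stand-in for model checking, not Reo: Reo models interaction with channels and connectors and relies on dedicated tools, whereas the state names and the transition table here are hypothetical.

```python
from collections import deque


def find_deadlocks(initial, transitions, final_states):
    """Breadth-first exploration of the protocol's state space.

    Returns the reachable states that have no outgoing transition
    and are not accepted final states.
    """
    seen = {initial}
    queue = deque([initial])
    deadlocks = []
    while queue:
        state = queue.popleft()
        successors = transitions.get(state, [])
        if not successors and state not in final_states:
            deadlocks.append(state)
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return deadlocks


# Hypothetical protocol between a GUI and two tools; "waiting_for_user"
# has no way out, so the design is flagged before anything is executed.
transitions = {
    "idle": ["data_sent"],
    "data_sent": ["model_built", "waiting_for_user"],
    "model_built": ["done"],
}
problems = find_deadlocks("idle", transitions, {"done"})

# Adding the missing transition removes the deadlock.
fixed = dict(transitions, waiting_for_user=["model_built"])
problems_after_fix = find_deadlocks("idle", fixed, {"done"})
```

Catching the stuck state at design time avoids discovering it only after a long-running, distributed execution has started.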
Verification and Execution
[Figure: full pipeline: COMPOSITION (composer: goal, dataset, requirements; KDDONTO) yields an abstract process; TRANSLATION (broker, UDDI, syntactic verification) yields a concrete process; VERIFICATION (Reo modeler, model checking); EXECUTION (exec)]