Process Cubes - Eindhoven University of Technology
TRANSCRIPT
-
PAGE 0
Process Cubes
Wil van der Aalst
AIS Meeting 13-2-2014
-
DSC/e's Motto
Turning Data into Value
-
What happened?
-
Why did it happen?
-
What will happen?
-
What is the best that can happen?
-
Data Science aims to answer such questions
Process Mining
-
Motivation
PAGE 7
compute average: 4.56 (or freq., sum, variance, mode, …)
create model (alpha, heuristic, fuzzy, social, dotted, compliance, etc.)
-
2 dimensions: model = number
PAGE 8
Dimensions: passed / failed and TU/e / HSE / QUT
Each of the six cells holds numbers:
• hours: 50.5, number: 32
• hours: 48.5, number: 55
• hours: 23.2, number: 49
• hours: 23.5, number: 23
• hours: 10.5, number: 4
• hours: 24.5, number: 8
-
2 dimensions: higher-level models
PAGE 9
passed
failed
TU/e HSE QUT
-
Process Cubes
PAGE 10
W.M.P. van der Aalst. Process Cubes: Slicing, Dicing, Rolling Up and Drilling Down Event Data for Process Mining. In M. Song, M. Wynn, and J. Liu, editors, Asia Pacific Conference on Business Process Management (AP-BPM 2013), volume 159 of Lecture Notes in Business Information Processing, pages 1-22. Springer-Verlag, Berlin, 2013.
DOI: 10.1007/978-3-319-02922-1_1
-
Process Cube: Dimensions and Cells
PAGE 11
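[Figure: a process cube whose dimensions are time window (tw), event class (ec), and case type (ct). Each process cell contains events and has a cell sublog and a cell model; one label reads "New behavior". An event carries properties such as case, activity, timestamp, resource, type, ….]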
-
Example Dimensions
PAGE 12
• time: concept drift analysis
• location: cross-organizational process mining
• group: clustering and classification
(each illustrated with the traces acbe, abce, ade)
Cases may be distributed over multiple cells; the same event may belong to multiple cells.
-
Example
PAGE 13
gold
silver
normal
-
Another Example
PAGE 14
>100k
50k
>50k & 100k
http://www.win.tue.nl/coselog/wiki/_detail/participants.png?id=start
-
Another Example
PAGE 15
-
Another Example
PAGE 16
{1,2,3,4}
{5,6,7}
{8,9,10}
-
Roll up and drill down
PAGE 17
[Figure: the same process cube (dimensions time window, event class, and case type) at two levels of granularity, with a process model over activities A-G per cell.]
Rolled up (one cell):
1: ACDEG, 2: BCFG, 3: BCFG, 4: ACEDG, 5: ACFG, 6: ACEDG, 7: BCEDG, 8: BCDEG
Drilled down (four cells):
• 1: AC, 4: AC, 5: AC, 6: AC
• 1: CDEG, 4: CEDG, 5: CFG, 6: CEDG
• 2: BC, 3: BC, 7: BC, 8: BC
• 2: CFG, 3: CFG, 7: CEDG, 8: CDEG
Dimension values shown: case type (gold customer, silver customer), time window (2012, 2013), event class (sales, delivery).
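The drill down and roll up on this slide can be sketched in a few lines of Python. This is only an illustration: the assignment of cases to "gold" and "silver" and of activities to "sales" and "delivery" is an assumption chosen so that the four sublogs of the slide come out, and the function names are hypothetical.

```python
from collections import defaultdict

# The eight traces of the slide.
LOG = {1: "ACDEG", 2: "BCFG", 3: "BCFG", 4: "ACEDG",
       5: "ACFG", 6: "ACEDG", 7: "BCEDG", 8: "BCDEG"}

# Assumed dimension values (illustrative only).
CASE_TYPE = {1: "gold", 4: "gold", 5: "gold", 6: "gold",
             2: "silver", 3: "silver", 7: "silver", 8: "silver"}
EVENT_CLASS = {"sales": set("ABC"), "delivery": set("CDEFG")}

def events(log):
    """Turn traces into unique events (case, position, activity)."""
    return {(case, pos, act)
            for case, trace in log.items()
            for pos, act in enumerate(trace)}

def drill_down(evts):
    """Distribute events over cells keyed by (case type, event class)."""
    cells = defaultdict(set)
    for case, pos, act in evts:
        for cls, acts in EVENT_CLASS.items():
            if act in acts:  # activity C belongs to both event classes
                cells[(CASE_TYPE[case], cls)].add((case, pos, act))
    return dict(cells)

def roll_up(cells):
    """Merge all cells again; shared events are not duplicated."""
    return set().union(*cells.values())

def traces(evts):
    """Rebuild one trace per case from a set of events."""
    by_case = defaultdict(list)
    for case, pos, act in sorted(evts):
        by_case[case].append(act)
    return {case: "".join(acts) for case, acts in by_case.items()}

cells = drill_down(events(LOG))
```

Because events keep their identity, the C events that end up in both the sales and the delivery cell appear only once after rolling up.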
-
Two foundational ways of splitting event
data: horizontal or vertical
PAGE 18
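[Figure: an event log and its process model (from start to complete, over activities A-G), split in two ways.]
Original log:
1: ACDEG, 2: BCFG, 3: BCFG, 4: ACEDG, 5: ACFG, 6: ACEDG, 7: BCEDG, 8: BCDEG
Split horizontally (each case is split over sublogs; merge horizontally is the reverse):
• 1: AC, 2: BC, 3: BC, 4: AC, 5: AC, 6: AC, 7: BC, 8: BC
• 1: CDEG, 2: CFG, 3: CFG, 4: CEDG, 5: CFG, 6: CEDG, 7: CEDG, 8: CDEG
Split vertically (each case stays intact in one sublog; merge vertically is the reverse):
• 1: ACDEG, 4: ACEDG, 6: ACEDG
• 7: BCEDG, 8: BCDEG
• 5: ACFG
• 2: BCFG, 3: BCFG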
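A minimal Python sketch of the two ways of splitting, using the eight traces of the slide. The function names and the chosen activity sets and case sets are assumptions made for illustration; they reproduce the sublogs shown in the figure.

```python
LOG = {1: "ACDEG", 2: "BCFG", 3: "BCFG", 4: "ACEDG",
       5: "ACFG", 6: "ACEDG", 7: "BCEDG", 8: "BCDEG"}

def split_horizontally(log, activity_sets):
    """Project every case onto each activity set.

    Cases are cut into pieces; an activity (here C) may occur in
    more than one sublog.
    """
    return [{case: "".join(a for a in trace if a in acts)
             for case, trace in log.items()}
            for acts in activity_sets]

def split_vertically(log, case_sets):
    """Distribute whole cases over sublogs; no case is cut."""
    return [{case: log[case] for case in sorted(cases)}
            for cases in case_sets]

def merge_vertically(sublogs):
    """Put vertically split sublogs back together."""
    merged = {}
    for sublog in sublogs:
        merged.update(sublog)
    return merged

horizontal = split_horizontally(LOG, [set("ABC"), set("CDEFG")])
vertical = split_vertically(LOG, [{1, 4, 6}, {7, 8}, {5}, {2, 3}])
```

Merging horizontally is not shown because it needs event identities to recognize that the C in both sublogs is the same event.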
-
Ingredients
PAGE 19
• Event Base (EB) with (raw) event data
• Process Cube Structure (PCS) defining dimensions independent of content
• Process Cube View (PCV) based on PCS that can be linked to real events
-
Event Base (E,P, π)
• Set of events E
• Set of properties P
• Partial mapping π
(π_p(e) = v means event e has property p with value v)
PAGE 20
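One possible way to represent an event base (E, P, π) in Python is sketched below. The class name, the method names, and the sample events are hypothetical; the partial mapping π is modelled as a dictionary that simply has no entry where π is undefined.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Hashable, Set, Tuple

@dataclass
class EventBase:
    events: Set[Hashable] = field(default_factory=set)    # E
    properties: Set[str] = field(default_factory=set)     # P
    # Partial mapping pi, keyed by (property, event).
    values: Dict[Tuple[str, Hashable], Any] = field(default_factory=dict)

    def add(self, event, **props):
        """Register an event together with the properties it has."""
        self.events.add(event)
        for p, v in props.items():
            self.properties.add(p)
            self.values[(p, event)] = v

    def pi(self, p, e):
        """pi_p(e): value of property p for event e, None if undefined."""
        return self.values.get((p, e))

eb = EventBase()
eb.add("e1", case=1, activity="A", resource="Ann")
eb.add("e2", case=1, activity="C")  # no resource recorded: pi is partial
```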
-
Slightly more formal
PAGE 21
-
Process Cube Structure (D,type,hier)
• D is the set of dimensions
• type defines the type of each dimension (string, int, etc.)
• hier defines the hierarchy per dimension (does not need to be a tree; elements at the same level may be partially overlapping)
PAGE 22
Process Cube Dimensions: case, activity, resource, type, total time
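A process cube structure (D, type, hier) could be sketched as follows. The sketch assumes that hier(d) is modelled as a set of value sets and that type(d) is a Python type; the class name, the check, and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set

@dataclass
class ProcessCubeStructure:
    dimensions: Set[str]                 # D
    types: Dict[str, type]               # type(d), here a Python type
    hier: Dict[str, Set[FrozenSet]]      # hier(d), here a set of value sets

    def is_well_formed(self) -> bool:
        """Every dimension has a type and a hierarchy with typed values."""
        if not (self.dimensions == set(self.types) == set(self.hier)):
            return False
        return all(isinstance(value, self.types[d])
                   for d in self.dimensions
                   for value_set in self.hier[d]
                   for value in value_set)

pcs = ProcessCubeStructure(
    dimensions={"resource", "type"},
    types={"resource": str, "type": str},
    hier={
        # Two groups at the same level that partially overlap (Bob).
        "resource": {frozenset({"Ann", "Bob"}), frozenset({"Bob", "Cid"}),
                     frozenset({"Ann"}), frozenset({"Bob"}), frozenset({"Cid"})},
        "type": {frozenset({"gold"}), frozenset({"silver"}),
                 frozenset({"gold", "silver"})},
    },
)
```

The resource hierarchy is deliberately not a tree, matching the remark that elements at the same level may overlap.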
-
Compatible: D ⊆ P and correct types
PAGE 23
[Figure: a cube with dimensions resource, activity, and case type links the Event Base (EB) with (raw) event data to the Process Cube Structure (PCS) defining dimensions D independent of content.]
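The compatibility condition can be checked mechanically. The sketch below assumes events are given as plain dictionaries from property to value and that dimension types are Python types; the function name and the sample data are hypothetical.

```python
def compatible(dimension_types, events):
    """Check that every dimension is a property and values have the right type.

    dimension_types: dict dimension -> Python type (D together with type)
    events: dict event id -> dict property -> value (E, P, and pi)
    """
    properties = set().union(*(props.keys() for props in events.values()))
    if not set(dimension_types) <= properties:   # dimensions must be properties
        return False
    # Only check values that are defined (pi is a partial mapping).
    return all(isinstance(props[d], t)
               for props in events.values()
               for d, t in dimension_types.items()
               if d in props)

EVENTS = {
    "e1": {"case": 1, "activity": "A", "resource": "Ann"},
    "e2": {"case": 1, "activity": "C"},
}
```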
-
Slightly more formal
PAGE 24
-
Process Cube View (Dsel, sel)
• View on a process cube structure (D, type, hier).
• Dsel ⊆ D are the selected dimensions.
• sel is a function determining the structure of each
dimension in D.
PAGE 25
[Figure: a process cube structure with dimensions resource, activity, and case type, and a view on it.]
-
Slightly more formal
PAGE 26
-
Process Cube View Example (1/3):
Dimension resource is not selected
PAGE 27
[Figure: PCS = (D, type, hier) with dimensions resource, activity, and case type; PCV = (Dsel, sel) with only activity and case type.]
-
Process Cube View Example (2/3):
Dimension activity is coarsened
PAGE 28
[Figure: PCS = (D, type, hier) and PCV = (Dsel, sel), both with dimensions resource, activity, and case type; in the view the activity dimension is coarser, e.g., per subprocess rather than per activity.]
-
Process Cube View Example (3/3):
Dimension case type is filtered
PAGE 29
[Figure: PCS = (D, type, hier) and PCV = (Dsel, sel), both with dimensions resource, activity, and case type; part of the case type dimension is not selected in the view.]
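The view definition and the three examples can be mimicked with a small validity check. This is a sketch under assumptions: hier(d) and sel(d) are modelled as sets of value sets, a view is taken to be valid when Dsel ⊆ D and sel(d) ⊆ hier(d) for every dimension, and all names and values are made up for illustration.

```python
def fs(*values):
    """Shorthand for a frozen set of values."""
    return frozenset(values)

# Hypothetical hierarchy of a process cube structure.
HIER = {
    "resource": {fs("Ann"), fs("Bob"), fs("Ann", "Bob")},
    "activity": {fs("A"), fs("B"), fs("C"), fs("A", "B")},
    "case type": {fs("gold"), fs("silver"), fs("normal")},
}

def is_valid_view(hier, d_sel, sel):
    """Dsel must be a subset of D and sel(d) a subset of hier(d)."""
    return (set(d_sel) <= set(hier)
            and set(sel) == set(hier)
            and all(sel[d] <= hier[d] for d in hier))

# Finest selection per dimension, used as a starting point.
BASE_SEL = {
    "resource": {fs("Ann"), fs("Bob")},
    "activity": {fs("A"), fs("B"), fs("C")},
    "case type": {fs("gold"), fs("silver"), fs("normal")},
}

# Example 1/3: dimension resource is not selected.
view1 = ({"activity", "case type"}, BASE_SEL)
# Example 2/3: dimension activity is coarsened (A and B grouped).
view2 = (set(HIER), {**BASE_SEL, "activity": {fs("A", "B"), fs("C")}})
# Example 3/3: dimension case type is filtered (normal left out).
view3 = (set(HIER), {**BASE_SEL, "case type": {fs("gold"), fs("silver")}})
```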
-
Materialized Process Cube View
PAGE 30
[Figure: three materialized views over the dimensions activity, case type, and resource.]
3 x 2 x 1 = 6 event logs
2 x 2 x 3 = 12 event logs
3 x 1 x 3 = 9 event logs
-
An event may be part of multiple cells!!
• departments may overlap
• subprocesses share interfaces
• contracts involving multiple parties
• ….
PAGE 31
[Figure: cube with dimensions activity and case type.]
• events are unique, but may be part of multiple cells
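Materializing a view means creating one event log per cell, so the number of logs is the product of the number of value sets per selected dimension (as in 3 x 2 x 1 = 6). The sketch below assumes events are dictionaries, cells are tuples of value sets, and an event belongs to a cell when each of its values lies in the corresponding set; the function name and the data are hypothetical.

```python
from itertools import product

def materialize(events, d_sel, sel):
    """Return a dict mapping each cell to the set of event ids in it.

    events: dict event id -> dict property -> value
    d_sel:  list of selected dimensions
    sel:    dict dimension -> list of value sets (all dimensions)
    """
    def visible(props):
        # An event is kept only if no dimension filters it out.
        return all(any(props.get(d) in vs for vs in sets)
                   for d, sets in sel.items())

    cells = {}
    for cell in product(*(sel[d] for d in d_sel)):
        cells[cell] = {e for e, props in events.items()
                       if visible(props)
                       and all(props.get(d) in vs
                               for d, vs in zip(d_sel, cell))}
    return cells

fs = frozenset
EVENTS = {
    "e1": {"activity": "A", "case type": "gold", "resource": "Ann"},
    "e2": {"activity": "B", "case type": "gold", "resource": "Bob"},
    "e3": {"activity": "C", "case type": "silver", "resource": "Bob"},
    "e4": {"activity": "A", "case type": "silver", "resource": "Cid"},
}

# 3 x 2 x 1 = 6 cells.
sel_a = {"activity": [fs("A"), fs("B"), fs("C")],
         "case type": [fs({"gold"}), fs({"silver"})],
         "resource": [fs({"Ann", "Bob", "Cid"})]}
cells_a = materialize(EVENTS, ["activity", "case type", "resource"], sel_a)

# Overlapping resource groups: Bob's events end up in two cells.
sel_b = {"activity": [fs("ABC")],
         "case type": [fs({"gold", "silver"})],
         "resource": [fs({"Ann", "Bob"}), fs({"Bob", "Cid"})]}
cells_b = materialize(EVENTS, ["resource"], sel_b)
```

In the second view the events e2 and e3 remain unique events, yet each of them is part of both resource cells.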
-
Slightly more formal
PAGE 32
-
OLAP (Online Analytical Processing)
PAGE 33
-
OLAP operations can be formalized (see W.M.P. van der Aalst. Process Cubes: Slicing, Dicing, Rolling Up and Drilling Down Event Data for Process Mining. In AP-BPM 2013, volume 159 of Lecture Notes in Business Information Processing, pages 1-22. Springer-Verlag, Berlin, 2013.)
PAGE 34
-
Initial Implementation
(work of Tatiana Mamaliga)
PAGE 35
http://www.processmining.org/_media/blogs/pub2013/masterthesistatianamamaliga.pdf
-
Architecture
PAGE 36
-
Results Generated Using ProCube
PAGE 37
-
PAGE 38
problem 1
performance
-
Sparsity is a problem when unloading
cells to ProM
PAGE 39
Process cubes tend to be sparse. Palo does not handle this well.
Ongoing work of Shengnan Guo.
-
PAGE 40
problem 2
interpretation
-
PAGE 41
-
Two variants of the same "model" …
PAGE 42
comparing models is closely related to configurable process models
-
PAGE 43
link to decomposed
process mining
-
Remember
PAGE 44
[Figure: the horizontal and vertical splitting example of PAGE 18, repeated.]
Both process cubes and decomposed process mining split event logs.
-
Decomposing Conformance Checking
PAGE 45
[Figure: decomposed conformance checking.]
• Input: process model SN and event log L.
• Decompose model: a decomposition technique (e.g., maximal decomposition, passage-based decomposition, or SESE/RPST-based decomposition) yields a (valid) activity partitioning and submodels M1, M2, …, Mn.
• Decompose event log: sublogs L1, L2, …, Ln.
• Conformance check per pair of submodel and sublog, using a conformance checking technique (e.g., A* based alignments, token-based replay, or simple replay until first deviation).
• Output: conformance diagnostics.
See "divide and conquer" framework by Eric Verbeek.
-
Example of a valid decomposition
PAGE 46
[Figure: a Petri net with activities a-h, transitions t1-t11, and places start, c1-c9, and end, decomposed into six subnets: SN1 (place start), SN2 (c1, c3), SN3 (c2), SN4 (c4), SN5 (c5, c6, c7), SN6 (c8, c9, end).]
Log can be split in the same way!
-
Example of an alignment for observed
trace a,b,c,d,e,c,d,g,f
PAGE 47
[Figure: the Petri net and subnets SN1-SN6 of PAGE 46, with the observed trace a,b,c,d,e,c,d,g,f aligned to them, etc.]
-
Conformance checking can be decomposed!!!
• General result for any valid decomposition: any event log or trace fits the overall model perfectly if and only if its projection onto each individual fragment fits that fragment perfectly
PAGE 48
[Figure: the Petri net and subnets SN1-SN6 of PAGE 46, repeated.]
W.M.P. van der Aalst. Decomposing Petri Nets for Process Mining: A Generic Approach. Distributed and Parallel Databases, 31(4):471-507, 2013.
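The general result can be tried out on a toy example. The sketch below is not the implementation from the cited paper: it uses a small hypothetical Petri net (all transitions visible and uniquely labelled, chosen to generate traces such as ACDEG and BCFG), splits its places into four fragments, and compares plain token replay on the whole net with replay of the projected trace on every fragment. All names are assumptions.

```python
from collections import Counter

# Hypothetical net: transition -> (input places, output places).
NET = {
    "A": (["start"], ["c1"]),
    "B": (["start"], ["c1"]),
    "C": (["c1"], ["c2", "c3"]),
    "D": (["c2"], ["c4"]),
    "E": (["c3"], ["c5"]),
    "F": (["c2", "c3"], ["c4", "c5"]),
    "G": (["c4", "c5"], ["end"]),
}
INITIAL, FINAL = {"start": 1}, {"end": 1}
# Every place is in exactly one fragment; transitions may be shared.
PARTITION = [{"start", "c1"}, {"c2", "c4"}, {"c3", "c5"}, {"end"}]

def fits(net, initial, final, trace):
    """True iff the trace replays without problems and ends in final."""
    marking = Counter(initial)
    for activity in trace:
        if activity not in net:
            return False
        inputs, outputs = net[activity]
        if any(marking[p] < 1 for p in inputs):
            return False              # transition not enabled
        marking.subtract(inputs)
        marking.update(outputs)
    return +marking == +Counter(final)  # unary + drops zero counts

def fragment(net, places):
    """Sub-net made of the given places and their connected transitions."""
    sub = {}
    for t, (ins, outs) in net.items():
        ins_f = [p for p in ins if p in places]
        outs_f = [p for p in outs if p in places]
        if ins_f or outs_f:
            sub[t] = (ins_f, outs_f)
    return sub

def fits_decomposed(net, initial, final, partition, trace):
    """Check every fragment against the projected trace."""
    if any(a not in net for a in trace):
        return False  # assumption: traces only use the model's activities
    for places in partition:
        sub = fragment(net, places)
        projected = [a for a in trace if a in sub]
        init = {p: n for p, n in initial.items() if p in places}
        fin = {p: n for p, n in final.items() if p in places}
        if not fits(sub, init, fin, projected):
            return False
    return True
```

For example, the trace ACDG is rejected by the whole net because G is not enabled, and it is also rejected by the fragment around c3 and c5, where the projected trace CG cannot be replayed.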
-
Decomposing Process Discovery
PAGE 49
[Figure: decomposed process discovery.]
• Input: event log L.
• Decompose event log: a decomposition technique (e.g., causal graph based on frequencies is decomposed using passages or SESE/RPST) yields a (valid) activity partitioning and sublogs L1, L2, …, Ln.
• Discovery per sublog, using a process discovery technique (e.g., language/state-based region discovery, variants of alpha algorithm, genetic process mining), gives submodels M1, M2, …, Mn.
• Compose model: the submodels are composed into the process model M.
See "divide and conquer" framework by Eric Verbeek.
-
Learn more about decomposing process
mining problems?
• W.M.P. van der Aalst. Decomposing Petri Nets for Process Mining: A Generic Approach. Distributed and Parallel Databases, 31(4):471-507, 2013.
• W.M.P. van der Aalst. A General Divide and Conquer Approach for Process Mining. In M. Ganzha, L. Maciaszek, and M. Paprzycki, editors, Federated Conference on Computer Science and Information Systems (FedCSIS 2013), pages 1-10. IEEE Computer Society, 2013.
• W.M.P. van der Aalst and E. Verbeek. Process Discovery and Conformance Checking Using Passages. Fundamenta Informaticae, 2014.
• W.M.P. van der Aalst. Decomposing Process Mining Problems Using Passages. In S. Haddad and L. Pomello, editors, Applications and Theory of Petri Nets 2012, volume 7347 of Lecture Notes in Computer Science, pages 72-91. Springer-Verlag, Berlin, 2012.
• J. Munoz-Gama, J. Carmona, and W.M.P. van der Aalst. Hierarchical Conformance Checking of Process Models Based on Event Logs. In J.M. Colom and J. Desel, editors, Applications and Theory of Petri Nets 2013, volume 7927 of Lecture Notes in Computer Science, pages 291-310. Springer-Verlag, Berlin, 2013.
• J. Munoz-Gama, J. Carmona, and W.M.P. van der Aalst. Conformance Checking in the Large: Partitioning and Topology. In F. Daniel, J. Wang, and B. Weber, editors, International Conference on Business Process Management (BPM 2013), volume 8094 of Lecture Notes in Computer Science, pages 130-145. Springer-Verlag, Berlin, 2013.
• J. Munoz-Gama, J. Carmona, and W.M.P. van der Aalst. Single-Entry Single-Exit Decomposed Conformance Checking. Information Systems, 2014.
• E. Verbeek and W.M.P. van der Aalst. Decomposing Replay Problems: A Case Study. In D. Moldt and H. Roelke, editors, Proceedings of the International Workshop on Petri Nets in Software Engineering (PNSE 2013), volume 989 of CEUR Workshop Proceedings, pages 213-232. CEUR-WS.org, 2013.
PAGE 50
-
PAGE 51
Challenges
-
PAGE 52
-
PAGE 53
Process discovery challenge (five requirements):
• formal (not just a picture)
• fast (should not take years)
• sound (result should at least be free of deadlocks, etc.)
• ability to balance all conformance dimensions (fitness, precision, generalization, and simplicity) incl. noise
• provide guarantees (not just a best effort)
-
Selected challenges in process mining
• Process mining in the Large (DeLiBiDa project)
− decomposition & distribution (SESE, passages, etc.)
− streaming data
− not just discovery (conformance, repair, ext., pred.!)
• Mining with/for stochastic models
− likelihood of models, alignments, delays
• Operational support
− cases!
• Data-aware and resource-aware process mining
− decision points, work distribution, etc.
− exploiting more informative event logs
− low-level logs versus higher-level processes
PAGE 54
-
Selected challenges in process mining
• Process of process mining
− workflow support, reporting
− parameter tuning, experimentation
− links to other types of mining (e.g., text mining)
− repositories (models, data, and analysis techniques)
− guidelines
• Log extraction
− correlation, n:m relations, etc.
− continuous to discrete ("mine your own body")
• Comparative process mining
− process cubes, concept drift, etc.
• Process improvement
− optimization (improve your past), simulation, cigar box
PAGE 55
-
Conclusion
• Challenges in process mining: concept drift, comparing processes, cases, and organizations, need for contextual analysis, limited scalability, performance problems.
• Unifying view: Process Cubes.
• Similar to OLAP but also many differences.
• Related to decomposing process mining problems.
• Initial implementations, but just the starting point.
PAGE 56