how people really (like to) work - uni-paderborn.de · 2014-09-24 · how people really (like to)...
TRANSCRIPT
How People Really (Like To) Work
prof.dr.ir. Wil van der Aalst
5th International Conference on Human-Centered Software
Engineering (HCSE 2014), Paderborn, 16-9-2014.
Comparative Process Mining
to Unravel Human Behavior
Process mining: The missing link
PAGE 3
process mining
data-oriented analysis(data mining, machine learning, business intelligence)
process model analysis(simulation, verification, optimization, gaming, etc.)
performance-oriented
questions, problems and
solutions
compliance-oriented
questions, problems and
solutions
Positioning Process Mining
PAGE 4
process
discoveryprocess
mining
conformance
checking predictive
analytics
data
mining /
machine
learning
BPM
BI?
BPI?
Starting point for process mining:
Event data
student name course name exam date mark
Peter Jones Business Information systems 16-1-2014 8
Sandy Scott Business Information systems 16-1-2014 5
Bridget White Business Information systems 16-1-2014 9
John Anderson Business Information systems 16-1-2014 8
Sandy Scott BPM Systems 17-1-2014 7
Bridget White BPM Systems 17-1-2014 8
Sandy Scott Process Mining 20-1-2014 5
Bridget White Process Mining 20-1-2014 9
John Anderson Process Mining 20-1-2014 8
… … … …
PAGE 7
case id activity
name timestamp other data
every row is an event
(here: an exam attempt)
Another event log: order handling
order
number
activity timestamp user product quantity
9901 register order [email protected] Sara Jones iPhone5S 1
9902 register order [email protected] Sara Jones iPhone5S 2
9903 register order [email protected] Sara Jones iPhone4S 1
9901 check stock [email protected] Pete Scott iPhone5S 1
9901 ship order [email protected] Sue Fox iPhone5S 1
9903 check stock [email protected] Pete Scott iPhone4S 1
9901 handle payment [email protected] Carol Hope iPhone5S 1
9902 check stock [email protected] Pete Scott iPhone5S 2
9902 cancel order [email protected] Carol Hope iPhone5S 2
… … … … … …
PAGE 8 case id
activity
name timestamp other data resource
Another event log: patient treatment
patient activity timestamp doctor age cost
5781 make X-ray [email protected] Dr. Jones 45 70.00
5541 blood test [email protected] Dr. Scott 61 40.00
5833 blood test [email protected] Dr. Scott 24 40.00
5781 blood test [email protected] Dr. Scott 45 40.00
5781 CT scan [email protected] Dr. Fox 45 1200.00
5833 surgery [email protected] Dr. Scott 24 2300.00
5781 handle payment [email protected] Carol Hope 45 0.00
5541 radiation therapy [email protected] Dr. Jones 61 140.00
5541 radiation therapy [email protected] Dr. Jones 61 140.00
… … … … … …
PAGE 9
case id activity
name timestamp other data
resource
Quiz Question:
How to remove behavior?
PAGE 40
A
B
C
DE
p2
end
p4
p3p1
start
ABCD AED
ACBD
ABCD ABCD
ABCD AED ACBD
Answer:
Add places or remove transitions
PAGE 41
A
B
C
DE
p2
end
p4
p3p1
start
ABCD AED
ACBD
ABCD ABCD
ABCD AED ACBD
Answer:
Add transitions or remove places!
PAGE 43
A
B
C
DE
p2
end
p4
p3p1
start
FABCD AED
AFBD
ABCD ABCD
BACD ABFD ACBD
BAFD
Process discovery algorithms (small selection)
PAGE 44
α algorithm
α++ algorithm
α# algorithm
language-based regions
state-based regions genetic mining
heuristic mining
hidden Markov models
neural networks
automata-based learning
stochastic task graphs
conformal process graph
mining block structures
multi-phase mining partial-order based mining
fuzzy mining
LTL mining
ILP mining
distributed genetic mining
ETM genetic algorithm Inductive Miner (infrequent)
Language based regions
PAGE 45
a1
a2
b1
b2
d
pR
e
c1
c
f
YX
Region R = (X,Y,c) corresponding to place pR: X = {a1,a2,c1} =
transitions producing a token for pR, Y = {b1,b2,c1} = transitions
consuming a token from pR, and c is the initial marking of pR.
Basic idea: enough tokens should be
present when consuming
PAGE 46
a1
a2
b1
b2
d
pR
e
c1
c
f
YX
A place is feasible if it
can be added without
disabling any of the
traces in the event log.
Process mining is about connecting things
• Data – Process
• Business – IT
• Business Intelligence – Business Process Management
• Performance – Compliance
• Runtime – Design time
• …
PAGE 47
Processes are not just about control-
flow!
PAGE 48
control-flow
perspective
resource
perspective
data-flow
perspective time
perspective cost
perspective
energy
perspective
Process mining spectrum
• Online and offline (near
realtime).
• All perspectives.
• De facto models are
descriptive/predictive.
• De jure models are
normative/prescriptive.
• Process discovery is just
one element: Aligning
model and reality is the
key thing.
PAGE 49
information system(s)
current
data
“world”people
machines
organizationsbusiness
processes documents
historic
data
resources/
organization
data/rules
control-flow
de jure models
resources/
organization
data/rules
control-flow
de facto models
provenance
event logs
models
“pre
mortem”
“post
mortem”
check
conformancediscover
enhance/
repair
performance
analysis
monitor
compliance
predict perf.,
risks, costs, ...
recommend
based on goal
Process discovery is like applying
a function
PAGE 51
4.24
3.95
4.89
…
4.56 compute average
e1
e2
e3
…
create model
(freq., sum, variance, mode, …)
(alpha, heuristic, fuzzy, social,
dotted, compliance, etc., etc.)
2 dimensions: model = number
PAGE 52
hours: 50.5
number: 32 passed
failed
TU/e HSE QUT
hours: 48.5
number: 55
hours: 23.2
number: 49
hours: 23.5
number: 23
hours: 10.5
number: 4
hours: 24.5
number: 8
Example: Hertz has 8,650 rental locations
and different types of customers
PAGE 55
gold
silver
normal
Example: All Dutch municipalities need
to handle building permits
PAGE 56
>100k
50k
>50k & 100k
10 Dutch
municipalities
Example: students watching recorded
video lectures and making exams
PAGE 58
{1,2,3,4}
{5,6,7}
{8,9,10}
mark
Process Cubes (OLAP for processes)
PAGE 59
Cour
se_I
nsta
nce
Student.nationality
Stu
den
t.ge
nd
er
process cell
event
New behavior
process cube
cell sublog
case
s
time
male
German
Jan-Aug-2011
dimension
mal
efe
mal
e
Dutch Chinese German
process mining results per cell
Overview data for a particular course
PAGE 60
every horizontal line corresponds to a
student (287 students)
a red dots refers to an exam attempt
course instance running from January
2011 until August 2011
a non-red dots refers to a student viewing a particular lecture (each
lecture has a unique color)
time runs from left to right (period of
approximately 3 years)each dot refers to an event (6744 events in total)
Drilling-down into a course instance
PAGE 61
Dutch studentsDutch students International studentsInternational students
Stu
den
ts t
hat
pas
sed
Stu
den
ts t
hat
pas
sed
Stu
den
ts t
hat
did
no
t p
ass
Stu
den
ts t
hat
did
no
t p
ass
Comparing processes
PAGE 62
Fitness of event
log wrt idealized
model is 0.37
Fitness of event
log wrt idealized
model is 0.28
Overview of approach
PAGE 63
event data
process cube
event logs (per cell)
1
2
3
discovered models (per cell)
normative models
5
4
6
7
1. Store events in the
process cube.
2. Materialize the events
in a cell as an event log
that can be analyzed.
3. Automatically discover
models per cell (e.g., a
BPMN or UML model).
4. Check conformance by
replaying event data on
normative (process)
models.
5. Compare discovered
models and normative
models.
6. Compare discovered
models corresponding
to different cells.
7. Compare different
behaviors by replaying
event data of one cell
on another cell's model.
What if?
PAGE 66
book car
c
add extra
insurance
dchange
booking
e
confirm initiate
check-in
j
check driver’s
license
k
charge credit
card
i
select car
g
supply
car
in
a
b
skip extra
insurance
f
h
add extra
insurance
skip extra
insurance
l
out
c1 c2
c3
c4
c5
c6
c7
c8
c9
c10
c11
acefgijkl
acddefhkjil
abdefjkgil
acdddefkhijl
acefgijkl
abefgjikl
...
processdiscovery
conformance checking
there are more than
1000 different
activities?
there are more
than 1.000.000
cases?
there are more
than 100.000.000
events?
Vertical distribution I:
Split cases arbitrarily
PAGE 68
abcdegabdcefbcdeg
abdcegabcdefbcdegabdcefbdcegabcdefbdceg
abcdegabdceg
abdcefbdcefbdcegabcdeg
abcdefbcdefbdcegabcdefbdceg
abcdegabdceg
abdcefbcdegabcdeg
abcdegabdcefbcdeg
abdcegabcdefbcdegabdcefbdcegabcdefbdceg
abcdegabdceg
abdcefbdcefbdcegabcdeg
abcdefbcdefbdcegabcdefbdceg
abcdegabdceg
abdcefbcdegabcdeg
sets of
cases
Horizontal distribution
PAGE 70
abcdegabdcefbcdeg
abdcegabcdefbcdegabdcefbdcegabcdefbdceg
abcdegabdceg
abdcefbdcefbdcegabcdeg
abcdefbcdefbdcegabcdefbdceg
abcdegabdceg
abdcefbcdegabcdeg
abegabefbeg
abegabefbegabefbegabefbeg
abegabeg
abefbefbegabeg
abefbefbegabefbeg
abegabeg
abefbegabeg
bcdebdcebcde
bdcebcdebcdebdcebdcebcdebdce
bcdebdce
bdcebdcebdcebcde
bcdebcdebdcebcdebdce
bcdebdce
bdcebcdebcde
sets of
activities
Horizontal distribution: The key idea
PAGE 71
abegabefbeg
abegabefbegabefbegabefbeg
abegabeg
abefbefbegabeg
abefbefbegabefbeg
abegabeg
abefbegabeg
bcdebdcebcde
bdcebcdebcdebdcebdcebcdebdce
bcdebdce
bdcebdcebdcebcde
bcdebcdebdcebcdebdce
bcdebdce
bdcebcdebcde
a b e
c1in
g
outc6
f
b
c
d
ec2
c3
c4
c5
projected on
a,b,e,f,g
projected on
b,c,d,e
Two foundational ways of spitting event
data: horizontal or vertical
PAGE 72
A
start complete
G
B
C
D
E
F
1: ACDEG2: BCFG3: BCFG
4: ACEDG5: ACFG
6: ACEDG7: BCEDG8: BCDEG
1: AC2: BC3: BC4: AC5: AC6: AC7: BC8: BC
A
G
B
C
D
E
FC
1: CDEG2: CFG3: CFG
4: CEDG5: CFG
6: CEDG7: CEDG8: CDEG
split
horizontally
A C F G
B C F G
B C
D
G
E
A C
D
G
E
1: ACDEG4: ACEDG6: ACEDG
7: BCEDG8: BCDEG
5: ACFG
2: BCFG3: BCFG
merge
horizontally
split
vertically
merge
vertically
Decomposing Conformance Checking
PAGE 74
SNprocessmodel
Levent log
decomposition technique
conformancechecking
technique
M1
submodel
L1sublog
conformance check
M2
submodel
L2sublog
conformance check
Mn
submodel
Lnsublog
conformance check
conformance diagnostics
decompose model
decompose event log
e.g., maximal decomposition, passage-based decomposition, or SESE/RPST-based decomposition
e.g., A* based alignments, token-based replay, or simple replay until first deviation
yields a (valid) activity partitioning
See "divide and conquer"
framework by Eric Verbeek.
Example of a valid decomposition
PAGE 75
a
start t1
SN1
a
c
e
c2
t1
t4
t6SN3
ab
d
e
c1 c3
t1
t2
t3
t5
t6
SN2
d
g
h
e
c5
f
t5
t6
t7 t8
t9
t10
c6
c7
SN5
c
d
c4
t4
t5
SN4
g
h
end
f
t8
t9
t10
t11c8
c9
SN6
a
start
b
c
d
g
h
e
end
c1
c2
c3
c4
c5t1
f
t2
t3
t4
t5
t6
t7 t8
t9
t10
t11
c6
c7 c8
c9
Log can be split in the same way!
Example of an alignment for observed
trace a,b,c,d,e,c,d,g,f
PAGE 76
a
start
b
c
d
g
h
e
end
c1
c2
c3
c4
c5t1
f
t2
t3
t4
t5
t6
t7 t8
t9
t10
t11
c6
c7 c8
c9
a
start t1
SN1
a
c
e
c2
t1
t4
t6SN3
ab
d
e
c1 c3
t1
t2
t3
t5
t6
SN2
d
g
h
e
c5
f
t5
t6
t7 t8
t9
t10
c6
c7
SN5
c
d
c4
t4
t5
SN4
g
h
end
f
t8
t9
t10
t11c8
c9
SN6
Etc.
a,b,c,d,e,c,d,g,f
Conformance checking can be
decomposed !!!
• General result for any valid decomposition: Any
event log or trace is perfectly fitting the overall
model if and only if it is also fitting all the individual
fragments
PAGE 77
a
start
b
c
d
g
h
e
end
c1
c2
c3
c4
c5t1
f
t2
t3
t4
t5
t6
t7 t8
t9
t10
t11
c6
c7 c8
c9
a
start t1
SN1
a
c
e
c2
t1
t4
t6SN3
ab
d
e
c1 c3
t1
t2
t3
t5
t6
SN2
d
g
h
e
c5
f
t5
t6
t7 t8
t9
t10
c6
c7
SN5
c
d
c4
t4
t5
SN4
g
h
end
f
t8
t9
t10
t11c8
c9
SN6
Wil van der Aalst, Decomposing Petri nets for process mining:
A generic approach. Distributed and Parallel Databases,
Volume 31, Issue 4, pp 471-507, 2013
Decomposing Process Discovery
PAGE 79
Mprocessmodel
Levent log
decomposition technique
process discoverytechnique
M1
submodel
L1sublog
discovery
M2
submodel
L2sublog
Mn
submodel
Lnsublog
compose model
decompose event log
discovery discovery
e.g., causal graph based on frequencies is decomposed using passages or SESE/RPST
e.g., language/state-based region discovery, variants of alpha algorithm, genetic process mining
yields a (valid) activity partitioning
See "divide and conquer"
framework by Eric Verbeek.
PAGE 81
Event data is everywhere!
Process are everywhere!
Why not connect them?
Process mining!
Challenges:
• comparative process mining
• Big event data, Big processes
Process Mining: Data Science in Action https://www.coursera.org/course/procmin
data mining
processmining
visualization
data science
behavioral/social
sciences
domainknowledge
machine learning
large scale distributedcomputing
statistics industrialengineering
databases
business process
intelligence
stochastics
privacy
algorithms
visual analytics
First Massive Open
Online Course (MOOC)
on Process Mining