33 mine your own business - ibm your own business ... c a t i o n b p m s y s t e m 1960 1975 1985...
TRANSCRIPT
33
PAGE 0
Mine Your Own Business
Evidence-Based BPM using Process Mining
prof.dr.ir. Wil van der Aalst
IBM BPM Symposium
Kurhaus Wiesbaden 25-9-2013
PAGE 4
How to get
started?
BPM until
now
Process
Discovery
Process Mining:
The missing link
Big (Event)
Data
Aligning
reality and
model
Challenges
PAGE 5
How to get
started?
BPM until
now
Process
Discovery
Process Mining:
The missing link
Big (Event)
Data
Aligning
reality and
model
Challenges
History and Origins of the Domain
PAGE 6
database
system
user
interface
database
system
user
interface
database
system
ap
plic
atio
n
BP
M s
yste
m
1960 1975 1985 2000
ap
plic
atio
n
ap
plic
atio
n
ap
plic
atio
n
BPM
WFM
office
automation
data
modeling
operations
management
scientific
management
data/
process
mining
software
engineering
formal
methods
business
process
reengineering
Skip Ellis,
Office Talk,
1979
Michael Zisman,
SCOOP, 1977
Anatol Holt,
Information
Systems Theory
Project, 1968
Carl Adam Petri,
Petri nets, 1962
Yet another variant of the BPM lifecycle
PAGE 7
diagnosis/
requirements
configuration/
implementation
enactment/
monitoring
adustment
(re)designmodelsdata
insight
discussion
verification
performance
analysisanimation
specificationdocumentation
configuration
Overview BPM Use Cases (see appendix)
PAGE 8
diagnosis/
requirements
configuration/
implementation
enactment/
monitoring
adustment
(re)designmodelsdata
insight
discussion
verification
performance
analysisanimation
specificationdocumentation
configuration
1
2
3
4
5
6
7
8
9
10
11 12
13
14
15
16 17
1819
20
• Use cases to obtain a model [1-5]
• Use cases to obtain a configurable model [6-8]
• Use cases related to enactment [9-13]
• Use cases for model-only-based analysis [14-15]
• Use cases for log&model-based analysis [16-17]
• Use cases to repair, extend or improve process models [18-20] Wil M. P. van der Aalst, “Business Process
Management: A Comprehensive Survey,”
ISRN Software Engineering, vol. 2013, 37
pages, 2013. doi:10.1155/2013/507984
Four main activities related to BPM
PAGE 10
model
enact
analyze
manage
creating a
process model
to be used for
discussion,
training,
analysis or
enactment
using a
process model
to control and
support
concrete
cases
analyzing a process
using a process
model and/or event
logs (verification,
simulation, process
mining, etc.)
all other activities,
e.g., adjusting the
process, reallocating
resources, or
managing large
collections of related
process models
trend
register
request
examine casually
examine thoroughly
check ticket
decide
pay compensation
reject request
reinitiate request
start end
start
register
request
examinethoroughly
examine casually
checkticket
decide
pay compensation
reject request
end
e1
AND
OR
XOR
OR
AND
XOR end
e2
e3
e4 e5
e6
start
register
request
examine thoroughly
examine casually
checkticket
decide
pay compensation
reject request
new information
end
c1
c2
OR-split OR-join
c3
a
start register
request
b
examine thoroughly
c
examine casually
d
check ticket
decide
pay compensation
reject request
reinitiate request
e
g
h
f
end
c1
c2
c3
c4
c5
[start]
register
request
examine thoroughly
examine casually
checkticket
decide
pay compensation
reject request
reinitiate request
[end][c1,c2]
[c1,c4]
[c2,c3]
[c3,c4]
checkticket
examine casually
examine thoroughly
[c5]
PAGE 12
• enormous investments in process models
• large collections of "dead" process models
• not taken seriously, unrelated to reality
Models should be:
- descriptive,
- predictive, and/or
- prescriptive
PAGE 15
How to get
started?
BPM until
now
Process
Discovery
Process Mining:
The missing link
Big (Event)
Data
Aligning
reality and
model
Challenges
Motivation: Increasing awareness of the
value of (Big) Data
• "In God we trust. All others must bring data"
(William Edwards Deming, statistician),
• "Data is a precious thing and will last longer
than the systems themselves" (Tim Berners-
Lee),
• "Statistics are like bikinis. What they reveal is
suggestive, but what they conceal is vital"
(Aaron Levenstein, statistician),
• "Every 2 days we create as much information as
we did up to 2003" (Eric Schmidt, Google CEO,
August 4, 2010).
PAGE 16
PAGE 26
Process-awareness is an
essential but often forgotten
ingredient when converting
big data into real value
PAGE 27
How to get
started?
BPM until
now
Process
Discovery
Process Mining:
The missing link
Big (Event)
Data
Aligning
reality and
model
Challenges
PAGE 28
process mining
data-oriented analysis (data mining, machine learning, business intelligence)
process model analysis (simulation, verification, optimization, gaming, etc.)
performance-oriented
questions, problems and
solutions
compliance-oriented
questions, problems and
solutions
A
B
C
DE
p2
end
p4
p3p1
start
Play-Out (Classical use of models)
PAGE 34
A B C D
A C B D A B C D
A E D
A C B D
A C B D
A E D
A E D
Let’s not worry about syntax (there is
difference between analysis and presentation)
PAGE 35
A
B
C
DE
p2
end
p4
p3p1
start
A
B
C
DE
p2
end
p4
p3p1
start
Play-In
PAGE 37
A C B D A B C D
A E D
A C B D
A C B D
A E D
A E D A B C D
Replay
PAGE 41
event log process model
· extended model
showing times,
frequencies, etc.
· diagnostics
· predictions
· recommendations
A
B
C
DE
p2
end
p4
p3p1
start
Replay can detect problems
PAGE 44
A C D
Problem!
missing token
Problem!
token left behind
Conformance Checking (WOZ objections Dutch municipality, 745 objections, 9583 event, f= 0.988)
PAGE 45
A
B
C
DE
p2
end
p4
p3p1
start
Replay can extract timing information
PAGE 46
A5 B8 C9 D13
5
8
9
13
3
4
5
4 3
2 6 5
8
7 6 4
7
7 4
3
PAGE 47
Performance Analysis Using Replay (WOZ objections Dutch municipality, 745 objections, 9583 event, f= 0.988)
PAGE 49
How to get
started?
BPM until
now
Process
Discovery
Process Mining:
The missing link
Big (Event)
Data
Aligning
reality and
model
Challenges
Language identification in the limit
(Mark Gold 1967)
PAGE 50 Language identification in the limit by E Mark Gold, Information and Control, 10(5):447–474, 1967.
abc
abd
abc ?
ab(c|d) ?
ad
abbc
ac
… (ad)|(ab(c|d)) ?
ab*(c|d) ?
A language is learnable in
the limit if there exists a
perfect child that
generates only finitely
many hypotheses.
Learning is not easy …
• Even simple languages like
regular languages are not
learnable in the limit.
• Many settings: evil or well-
behaving mothers, with or
without negative examples,
frequencies, etc.
PAGE 51
sentence trace in event log
language process model
Process discovery algorithms (small selection)
PAGE 52
α algorithm
α++ algorithm
α# algorithm
language-based regions
state-based regions genetic mining
heuristic mining
hidden Markov models
neural networks
automata-based learning
stochastic task graphs
conformal process graph
mining block structures
multi-phase mining partial-order based mining
fuzzy mining
LTL mining
ILP mining
distributed genetic mining
ETM genetic algorithm Inductive Miner (infrequent)
Places limit behavior
• abcd
• ad
• abed
• abccd
• acbd
• aebcd
• aed
• aad
• caed
• aded
PAGE 57
A
B
C
DE
Places limit behavior
• abcd
• ad
• abed
• abccd
• acbd
• aeccd
• aed
• aad
• caed
• aded
PAGE 58
A
B
C
DE
start
Places limit behavior
PAGE 59
A
B
C
DE
endstart
• abcd
• ad
• abed
• abccd
• acbd
• aebcd
• aed
• aad
• caed
• aded
Places limit behavior
PAGE 60
A
B
C
DE
end
p1
start
• abcd
• ad
• abed
• abccd
• acbd
• aebcd
• aed
• aad
• caed
• aded
Places limit behavior
PAGE 61
A
B
C
DE
p2
end
p4
p3p1
start
• abcd
• ad
• abed
• abccd
• acbd
• aebcd
• aed
• aad
• caed
• aded
Example: Process Discovery Using
Language-Based Regions
PAGE 62
a1
a2
b1
b2
d
pR
e
c1
c
f
YX
A place is feasible if it
can be added without
disabling any of the
traces in the event log.
Genetic process mining: Overview
PAGE 63
next generationcompute
fitnesselitism
parents
crossover
children
mutation
create initial
population
“dead” individuals
tournament
select best
individual
event log
termination
Example: crossover
PAGE 64
a
start register
request
b
examine thoroughly
c
examine casually
d
check ticket
decide
pay compensation
reject request
reinitiate request
e
g
h
f
end
a
start register
request
b
examine thoroughly
c
examine casually
d
check ticket
decide
pay compensation
reject request
reinitiate request
e
g
h
f
end
a
start register
request
b
examine thoroughly
c
examine casually
d
check ticket
decide
reinitiate request
e
f
a
start register
request
b
examine thoroughly
c
examine casually
d
check ticket
decide
pay compensation
reject request
reinitiate request
e
g
h
f
end
pay compensation
reject request
g
h
end
Example: mutation
PAGE 65
a
start register
request
b
examine thoroughly
c
examine casually
d
check ticket
decide
pay compensation
reject request
reinitiate request
e
g
h
f
end
a
start register
request
b
examine thoroughly
c
examine casually
d
check ticket
decide
pay compensation
reject request
reinitiate request
e
g
h
f
end
remove place
added arc
PAGE 66
models are like maps, their usefulness
is determined by the intended use,
i.e., there is not a single "perfect map"
PAGE 67
How to get
started?
BPM until
now
Process
Discovery
Process Mining:
The missing link
Big (Event)
Data
Aligning
reality and
model
Challenges
Conformance checking
PAGE 68
an activity that should
not happen happened
an activity that should
happen did not happen
an activity was executed
by the wrong person
an activity was
executed too late
two activities were
swapped
PAGE 69
• conformance checking to diagnose deviations
• squeezing reality into the model to do model-based
analysis
Alignments are essential!
Example: BPI Challenge 2012 (Dutch financial institute, doi:10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f)
PAGE 71
“O_DECLINED” and “W_Wijzigen contractgegevens” are often skipped
Many moves on log of “O_CANCELLED”,
”O_CREATED”,”O_SELECTED”,
“O_SENT” occurred with the same
frequency value (i.e. 60) before parallel
branch
Many moves on log of “W_Afhandelen leads” ( > 2200 times) occurred in the end of traces
Loops of “W_Completeren aanvraag” and “W_Nabellen offertes” are often performed
Work of Arya Adriansyah (Replay project)
PAGE 72
“O_DECLINED” and “W_Wijzigen contractgegevens” are often skipped
Many moves on log of “O_CANCELLED”,
”O_CREATED”,”O_SELECTED”,
“O_SENT” occurred with the same
frequency value (i.e. 60) before parallel
branch
Many moves on log of “W_Afhandelen leads” ( > 2200 times) occurred in the end of traces
Loops of “W_Completeren aanvraag” and “W_Nabellen offertes” are often performed
Synchronous moves of “Completeren aanvraag”
Move on log of “Completeren aanvraag”
Moves on model towards end of traces
Move on log of “O_CANCELLED” and “A_CANCELLED”
“O_ACCEPTED” has average sojourn time of 27.07 minutes, while “A_REGISTERED”, ”A_ACTIVATED”, and
“A_APPROVED” have average sojourn time of 29.56 minutes
Activity “W_Wijzigen contractgegevens” is the bottleneck, but it occured rarely (only 4 times)
The average waiting time for the input place of “W_Nabellen offertes+START” is very long (2.83 days) compares to the average waiting time of other places
PAGE 74
How to get
started?
BPM until
now
Process
Discovery
Process Mining:
The missing link
Big (Event)
Data
Aligning
reality and
model
Challenges
PAGE 77
formal
(not just a
picture)
fast
(should not
take years) sound
(result should
at least be free
of deadlocks,
etc.)
ability to balance
all conformance
dimensions
(fitness, precision,
generalization, and
simplicity) incl.
noise
provide
guarantees
(not just a best
effort)
1
2
3 4
5
process discovery: finding
sheep with five or more legs
PAGE 83
Demand TomTom! Do not settle for restrictive
information systems and
static process models
predict: when
will I be home
recommend:
turn right
adapt: use real-
time traffic
information
PAGE 84
How to get
started?
BPM until
now
Process
Discovery
Process Mining:
The missing link
Big (Event)
Data
Aligning
reality and
model
Challenges
600+ plug-ins available covering the whole process
mining spectrum
86
Download from: www.processmining.org
open-source (L-GPL)
Commercial Alternatives
• Disco (Fluxicon)
• Perceptive Process Mining (before Futura Reflect and BPM|one)
• ARIS Process Performance
Manager
• QPR ProcessAnalyzer
• Interstage Process Discovery
(Fujitsu)
• Discovery Analyst (StereoLOGIC)
• XMAnalyzer (XMPro)
• …
87
Business Process Insight (BPI) platform
(IBM Research)
BPI provides support
for a comprehensive
process analysis
based on event data,
including process
mining techniques
PAGE 88
http://researcher.ibm.com/researcher
/view_project_subpage.php?id=3010
How to Get Started?
Collect event data
• Minimal requirement:
events referring to an
activity name and a
process instance.
• Good to have:
timestamps, resource
information, additional
data elements.
• Challenges: scoping and
sometimes correlation.
Collect questions
• What kind problems would
you like to address (cost,
time, risk, compliance,
service, etc.)?
• Related to discovery,
conformance,
enhancement?
• Iterative process: can be
“curiosity driven” initially.
89
Data Science Center Eindhoven (DSC/e)
• DSC/e strives to become the internationally leading expertise center for data science research and education.
• Launch Symposium: December 2nd 2013, Auditorium TU/e.
• Exciting program with well-known data science experts from both industry and academia.
• Put the date in your agenda!
• Register via www.tue.nl/dsce/sympos
PAGE 91
Join our expedition: Mine your processes!
PAGE 92
process
mining
data-oriented analysis
(data mining, machine learning, business intelligence)
process model analysis
(simulation, verification, etc.)pe
rform
an
ce
-orie
nte
d q
ue
stio
ns
,
pro
ble
ms
an
d s
olu
tion
s
co
mp
lian
ce
-orie
nte
d q
ue
stio
ns
,
pro
ble
ms
an
d s
olu
tion
s
20 BPM Use Cases
• Use cases to obtain a model [1-5]
• Use cases to obtain a configurable model [6-8]
• Use cases related to enactment [9-13]
• Use cases for model-only-based analysis [14-15]
• Use cases for log&model-based analysis [16-17]
• Use cases to repair, extend or improve process
models [18-20]
Notation:
PAGE 95
MD|N|E
CMD|N|E
S E D
human model configurable model
information system
eventdata
diagnosticsD=descriptiveN=normativeE=executable
Use Case 2:
Discover model from event data (DiscM)
PAGE 97
E MD|E
discover model from event data
(DiscM)
Use Case 3:
Select model from collection (SelM)
PAGE 98
MD|N|E
select model from collection
MMMD|N|E
(SelM)
Use Case 7: Merge models into
configurable model (MerCM)
PAGE 102
CMD|N|E
merge models into configurable model
MMMD|N|E
(MerCM)
variant 1
variant 2
aa bb
dd
gg hh
ff
aa
dd
ee
gg hh
cc
ff
aa bb
dd
ee
gg hh
cc
ff
Use Case 8:
Configure configurable model (ConCM)
PAGE 103
MD|N|E
configure configurable model
CMD|N|E
(ConCM)
aa bb
dd
ee
gg hh
cc
ff
aa
dd
gg hh
ff
Use Case 14: Analyze performance
based on model (PerfM)
PAGE 109
analyze performance based on model
ME
PD(PerfM)
Use Case 16: Check conformance using
event data (ConfED)
PAGE 111
check conformance using event data
ME
CDE (ConfED)
Use Case 17: Analyze performance using
event data (PerfED)
PAGE 112
analyze performance using event data
ME
E PD(PerfED)
Use Case 19:
Extend model (ExtM)
PAGE 114
extend model
ME
E ME
(ExtM)
astart
b
c
d
e
g
h
f
end
timestamps in the event log
can be used to analyze waiting
times in-between activities
1391
1537
566
15371537
566
1391
1391
146146
971 971
1537
1537
146
930 930
461 461
Sue Mike
Pete
Mary
Norman
check="OK" and
report="Approved"
resource information in the event log can
be used for social network analysis, role
discovery, and performance analysis
attributes in the event log can be
used for decision point analysis
Overview Use Cases
PAGE 116
diagnosis/
requirements
configuration/
implementation
enactment/
monitoring
adustment
(re)designmodelsdata
insight
discussion
verification
performance
analysisanimation
specificationdocumentation
configuration
1
2
3
4
5
6
7
8
9
10
11 12
13
14
15
16 17
1819
20
• Use cases to obtain a model [1-5]
• Use cases to obtain a configurable model [6-8]
• Use cases related to enactment [9-13]
• Use cases for model-only-based analysis [14-15]
• Use cases for log&model-based analysis [16-17]
• Use cases to repair, extend or improve process models [18-20]