Auditing 2.0: Using Process Mining to Support Tomorrow's Auditor
prof.dr.ir. Wil van der Aalst
www.processmining.org
Auditing
“The term auditing refers to the evaluation of organizations
and their processes. Audits are performed to ascertain the
validity and reliability of information about these
organizations and associated processes. This is done to check whether business
processes are executed within certain boundaries set by
managers, governments, and other stakeholders.”
Process Mining = Process Analysis + Data Mining
[Figure: a decision tree (data mining) with attributes Smoker, Drinker, and Weight (split at 81.5) and leaves labeled Short or Long, shown next to a Petri net (process analysis) with activities such as register, check_A, check_B, check_C, assess risk, decline, make offer, handle response, handle payment, send insurance documents, and withdraw offer.]
Process Mining
• Process discovery: "What is really happening?"
• Conformance checking: "Do we do what was agreed upon?"
• Performance analysis: "Where are the bottlenecks?"
• Process prediction: "Will this case be late?"
• Process improvement: "How to redesign this process?"
• Etc.
Process mining: Linking events to models
[Figure: the “world” (people, machines, organizations, components, business processes) is supported/controlled by a software system, which records events (e.g., messages, transactions) in event logs. (Process) models specify, configure, implement, and analyze the system and the world. Event logs and models are linked by discovery, conformance, and extension.]
>,→,||,# relations
• Direct succession: x>y iff for some case x is directly followed by y.
• Causality: x→y iff x>y and not y>x.
• Parallel: x||y iff x>y and y>x.
• Choice: x#y iff not x>y and not y>x.
Event log (in recorded order):
case 1: task A, case 2: task A, case 3: task A, case 3: task B, case 1: task B, case 1: task C, case 2: task C, case 4: task A, case 2: task B, case 2: task D, case 5: task A, case 5: task E, case 4: task C, case 1: task D, case 3: task C, case 3: task D, case 4: task B, case 5: task D, case 4: task D
Traces: ABCD, ACBD, AED
Direct succession: A>B, A>C, A>E, B>C, B>D, C>B, C>D, E>D
Causality: A→B, A→C, A→E, B→D, C→D, E→D
Parallel: B||C, C||B
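The relations defined on this slide can be derived mechanically from the event log. A minimal Python sketch, assuming the log is available as (case, task) pairs in recorded order; the function names `traces_per_case` and `footprint` are illustrative and not taken from any tool mentioned in these slides:

```python
from collections import defaultdict

# Event log from the slide: (case id, task) pairs in recorded order.
EVENTS = [
    (1, "A"), (2, "A"), (3, "A"), (3, "B"), (1, "B"), (1, "C"), (2, "C"),
    (4, "A"), (2, "B"), (2, "D"), (5, "A"), (5, "E"), (4, "C"), (1, "D"),
    (3, "C"), (3, "D"), (4, "B"), (5, "D"), (4, "D"),
]


def traces_per_case(events):
    """Group the interleaved events into one task sequence per case."""
    traces = defaultdict(list)
    for case, task in events:
        traces[case].append(task)
    return dict(traces)


def footprint(traces):
    """Compute the >, →, || and # relations as sets of task pairs."""
    traces = list(traces)
    tasks = sorted({task for trace in traces for task in trace})
    # x > y: x is directly followed by y in at least one trace.
    succ = {(x, y) for trace in traces for x, y in zip(trace, trace[1:])}
    causal = {(x, y) for (x, y) in succ if (y, x) not in succ}
    parallel = {(x, y) for (x, y) in succ if (y, x) in succ}
    choice = {(x, y) for x in tasks for y in tasks
              if (x, y) not in succ and (y, x) not in succ}
    return succ, causal, parallel, choice


traces = traces_per_case(EVENTS)
succ, causal, parallel, choice = footprint(traces.values())
print(sorted(causal))
print(sorted(parallel))
```

Note that, following the definition literally, the choice relation also contains pairs such as (A, D) and (A, A), because neither task ever directly follows the other.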
Basic Idea Used by α Algorithm (2)
(b) XOR-split pattern: a→b, a→c, and b#c
(c) XOR-join pattern: a→c, b→c, and a#b
Basic Idea Used by α Algorithm (3)
(d) AND-split pattern: a→b, a→c, and b||c
(e) AND-join pattern: a→c, b→c, and a||b
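The split and join patterns can be looked up directly in the →, || and # relations. The sketch below is a simplified pairwise check, not the full α algorithm (which goes on to combine such patterns into places). The relation sets are copied from the running example with traces ABCD, ACBD, AED, and the function name `find_patterns` is hypothetical:

```python
from itertools import combinations

# Relations of the running example (traces ABCD, ACBD, AED).
CAUSAL = {("A", "B"), ("A", "C"), ("A", "E"), ("B", "D"), ("C", "D"), ("E", "D")}
PARALLEL = {("B", "C"), ("C", "B")}
CHOICE = {("B", "E"), ("E", "B"), ("C", "E"), ("E", "C")}  # only the pairs needed here


def find_patterns(causal, parallel, choice):
    """Return pairwise patterns as (kind, x, y, z) tuples.

    Splits are reported as (kind, source, branch1, branch2),
    joins as (kind, branch1, branch2, target).
    """
    tasks = sorted({task for pair in causal for task in pair})
    found = set()
    for a in tasks:
        for b, c in combinations(tasks, 2):
            if (a, b) in causal and (a, c) in causal:  # a splits into b and c
                if (b, c) in choice:
                    found.add(("XOR-split", a, b, c))
                elif (b, c) in parallel:
                    found.add(("AND-split", a, b, c))
            if (b, a) in causal and (c, a) in causal:  # b and c join into a
                if (b, c) in choice:
                    found.add(("XOR-join", b, c, a))
                elif (b, c) in parallel:
                    found.add(("AND-join", b, c, a))
    return found


patterns = find_patterns(CAUSAL, PARALLEL, CHOICE)
for pattern in sorted(patterns):
    print(pattern)
```

For the running example this reports an AND-split from A into B and C, XOR-splits from A into B or E and into C or E, and the mirrored joins into D.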
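Example Revisited
Direct succession: A>B, A>C, A>E, B>C, B>D, C>B, C>D, E>D
Causality: A→B, A→C, A→E, B→D, C→D, E→D
Parallel: B||C, C||B
Choice: B#E, C#E, …
Result produced by α algorithm:
[Figure: Petri net with transitions A, B, C, D, E and places start, p1, p2, p3, p4, end.]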
Where did we apply process mining?
• Municipalities (e.g., Alkmaar, Heusden, Harderwijk, etc.)
• Government agencies (e.g., Rijkswaterstaat, Centraal Justitieel Incasso Bureau, Justice department)
• Insurance related agencies (e.g., UWV)
• Banks (e.g., ING Bank)
• Hospitals (e.g., AMC hospital, Catharina hospital)
• Multinationals (e.g., DSM, Deloitte)
• High-tech system manufacturers and their customers (e.g., Philips Healthcare, ASML, Ricoh, Thales)
• Media companies (e.g., Winkwaves)
• ...
Example: WMO Harderwijk
• Process related to the execution of the “Wet Maatschappelijke Ondersteuning” (WMO, the Dutch Social Support Act) in Harderwijk
• Handling WMO applications
• WMO: supporting citizens of municipalities (illness, handicaps, elderly, etc.)
• Examples:
  • wheelchair, mobility scooter, ...
  • adaptation of house (elevator), ...
  • household help, ...
Conformance checking using Replay
[Figure: replay result on a process model. Legend: one marker = should not have happened but did; another marker = should have happened but did not.]
How can process mining help?
• Detect bottlenecks
• Detect deviations
• Performance measurement
• Suggest improvements
• Decision support (recommendation and prediction)
• Provide mirror
• Highlight important problems
• Avoid ICT failures
• Avoid management by PowerPoint
• From “politics” to “analytics”
Business Intelligence Tools?
• Business Objects (SAP)
• Cognos Business Intelligence (IBM)
• Oracle Business Intelligence
• Hyperion (Oracle)
• SAS Business Intelligence
• Microsoft Business Intelligence
• SAP Business Intelligence (SAP BI)
• Jaspersoft (Open Source Business Intelligence)
• Pentaho BI Suite (Open Source)
• ...

• Dashboards, reports, scorecards, ...
• Slicing and dicing, data mining, ...
Process Mining Software
• ARIS Process Performance Manager
• Interstage Automated Business Process Discovery & Visualization
• Process Discovery Focus
• Futura Reflect
• Enterprise Visualization Suite
• Comprehend
• BPM|one
• Nitro (Fluxicon)
• ProcessGold
Starting point: event logs
• Sources: event logs, audit trails, databases, message logs, etc.
• Format: XES (www.xes-standard.org)
[Figure: annotated XES log fragment.]
Header annotations:
• extensions loaded
• every trace has a name
• every event has a name and a transition
• classifier = name + transition
Trace annotations:
• start of trace (i.e., process instance)
• name of trace
• name of event (activity name)
• resource
• transition
• data associated to event
• timestamp
• end of trace (i.e., process instance)
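To make the annotations concrete, here is a minimal Python sketch that reads trace names, event names, resources, transitions, and timestamps from an XES-style fragment. The attribute keys (`concept:name`, `org:resource`, `lifecycle:transition`, `time:timestamp`) are the ones used by the standard XES extensions; the sample log itself, including the resource names, is made up for illustration, and the sketch assumes a document without an XML namespace declaration or header elements:

```python
import xml.etree.ElementTree as ET

# Hypothetical two-event log; real XES files also declare extensions,
# globals, and classifiers in the header, which are omitted here.
XES_SAMPLE = """<log>
  <trace>
    <string key="concept:name" value="case 1"/>
    <event>
      <string key="concept:name" value="A"/>
      <string key="org:resource" value="Pete"/>
      <string key="lifecycle:transition" value="complete"/>
      <date key="time:timestamp" value="2011-01-01T10:00:00+01:00"/>
    </event>
    <event>
      <string key="concept:name" value="B"/>
      <string key="org:resource" value="Sue"/>
      <string key="lifecycle:transition" value="complete"/>
      <date key="time:timestamp" value="2011-01-01T11:30:00+01:00"/>
    </event>
  </trace>
</log>"""


def attributes(element):
    """Collect the key/value attributes that are direct children of element."""
    return {child.get("key"): child.get("value")
            for child in element if child.get("key") is not None}


def read_log(xml_text):
    """Map each trace name to its list of event attribute dictionaries."""
    root = ET.fromstring(xml_text)
    log = {}
    for trace in root.iter("trace"):
        name = attributes(trace).get("concept:name")
        log[name] = [attributes(event) for event in trace.findall("event")]
    return log


def classify(event):
    """Classifier = name + transition, as annotated on the slide."""
    return f"{event['concept:name']}+{event['lifecycle:transition']}"


log = read_log(XES_SAMPLE)
labels = {name: [classify(e) for e in events] for name, events in log.items()}
print(labels)
```

Files written by real tools usually carry more header information, so a dedicated library such as OpenXES is the safer choice for production logs.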
Framework
[Figure: process mining framework. The “world” (people, machines, organizations, business processes, documents) is supported by information system(s), whose provenance data forms the event logs, split into current data (“pre mortem”) and historic data (“post mortem”). Models are either de jure or de facto, each with control-flow, data/rules, and resources/organization perspectives. Ten activities link event logs and models (explore, predict, recommend, detect, check, compare, promote, discover, enhance, diagnose), grouped under cartography, auditing, and navigation.]
[Figure: the process mining framework restricted to the control-flow perspective of the de jure and de facto models.]
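Play Out (Classical use of models)
[Figure: Petri net with transitions A, B, C, D, E and places start, p1, p2, p3, p4, end, from which the following traces are generated.]
A B C D
A C B D
A B C D
A E D
A C B D
A C B D
A E D
A E D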
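Play-out can be sketched as an exhaustive exploration of the firing sequences of the model. The wiring of the places below (A produces p1 and p2, B moves p1 to p3, C moves p2 to p4, E moves both at once, D consumes p3 and p4) is an assumption that is consistent with the traces on the slide; the arcs themselves are not recoverable from the extracted text. The sketch also assumes a safe net without loops, otherwise the enumeration would not terminate:

```python
# Assumed wiring: transition -> (input places, output places).
NET = {
    "A": ({"start"}, {"p1", "p2"}),
    "B": ({"p1"}, {"p3"}),
    "C": ({"p2"}, {"p4"}),
    "E": ({"p1", "p2"}, {"p3", "p4"}),
    "D": ({"p3", "p4"}, {"end"}),
}


def play_out(net, initial, final):
    """Enumerate all firing sequences leading from initial to final marking."""
    final = frozenset(final)
    traces = []

    def visit(marking, prefix):
        if marking == final:
            traces.append(prefix)
            return
        for transition in sorted(net):
            inputs, outputs = net[transition]
            if inputs <= marking:  # enabled: every input place holds a token
                visit((marking - inputs) | outputs, prefix + [transition])

    visit(frozenset(initial), [])
    return traces


generated = play_out(NET, {"start"}, {"end"})
for trace in generated:
    print(" ".join(trace))
```

Under these assumptions the model generates exactly the three distinct traces that appear on the slide: A B C D, A C B D, and A E D.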
Play In (Process Discovery)
Event log: ABCD, ACBD, AED, ACBD, AED, ABCD, …
→ a process discovery algorithm like the α algorithm →
[Figure: discovered Petri net with transitions A, B, C, D, E and places start, p1, p2, p3, p4, end.]
Replay can detect problems
Trace: A C D
[Figure: the trace replayed on the Petri net with transitions A, B, C, D, E and places start, p1, p2, p3, p4, end.]
• Problem! Missing token
• Problem! Token left behind
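The two problems for trace A C D can be reproduced with a small token replay. This is a sketch under stated assumptions: the place wiring is assumed (the arcs are not recoverable from the extracted slide), and the fitness value uses the commonly cited token-replay formula ½(1 − missing/consumed) + ½(1 − remaining/produced), which is an addition here and not shown on the slide:

```python
from collections import Counter

# Assumed wiring: transition -> (input places, output places).
NET = {
    "A": (("start",), ("p1", "p2")),
    "B": (("p1",), ("p3",)),
    "C": (("p2",), ("p4",)),
    "E": (("p1", "p2"), ("p3", "p4")),
    "D": (("p3", "p4"), ("end",)),
}


def replay(trace, net, initial="start", final="end"):
    """Replay a trace, forcing each transition to fire and counting tokens."""
    marking = Counter({initial: 1})
    produced, consumed = 1, 0  # the initial token counts as produced
    missing = []
    # After the last event, the token in the final place is consumed as well.
    steps = [net[activity] for activity in trace] + [((final,), ())]
    for inputs, outputs in steps:
        for place in inputs:
            if marking[place] > 0:
                marking[place] -= 1
            else:
                missing.append(place)  # token had to be created artificially
            consumed += 1
        for place in outputs:
            marking[place] += 1
            produced += 1
    remaining = sorted(p for p, n in marking.items() for _ in range(n))
    fitness = (0.5 * (1 - len(missing) / consumed)
               + 0.5 * (1 - len(remaining) / produced))
    return {"missing": missing, "remaining": remaining, "fitness": fitness}


deviating = replay(["A", "C", "D"], NET)
fitting = replay(["A", "B", "C", "D"], NET)
print(deviating)
print(fitting)
```

With the assumed wiring, D finds no token in p3 (missing token) because B was skipped, and the token in p1 is never consumed (token left behind).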
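Replay can extract timing information
Trace with timestamps: A at 5, B at 8, C at 9, D at 13
[Figure: Petri net with transitions A, B, C, D, E and places start, p1, p2, p3, p4, end, annotated with the timestamps and waiting times collected by replaying timed traces.]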
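Timing information follows from the same replay: record when each token is produced and measure how long it waits until it is consumed. A minimal sketch for the timed trace A at 5, B at 8, C at 9, D at 13, assuming the same hypothetical place wiring as before, a fitting trace, and at most one token per place; the function name `waiting_times` is illustrative:

```python
from collections import defaultdict

# Assumed wiring: transition -> (input places, output places).
NET = {
    "A": (("start",), ("p1", "p2")),
    "B": (("p1",), ("p3",)),
    "C": (("p2",), ("p4",)),
    "E": (("p1", "p2"), ("p3", "p4")),
    "D": (("p3", "p4"), ("end",)),
}


def waiting_times(timed_trace, net):
    """Return, per place, how long tokens waited before being consumed."""
    arrived = {}  # place -> time at which its current token was produced
    waits = defaultdict(list)
    for activity, time in timed_trace:
        inputs, outputs = net[activity]
        for place in inputs:
            # The start place has no recorded production time and is skipped.
            if place in arrived:
                waits[place].append(time - arrived.pop(place))
        for place in outputs:
            arrived[place] = time
    return dict(waits)


times = waiting_times([("A", 5), ("B", 8), ("C", 9), ("D", 13)], NET)
print(times)
```

Aggregating these per-place durations over all traces in a log gives the kind of average waiting times that can be projected onto the model to locate bottlenecks.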
Using Replay
[Figure: the process mining framework (event logs, de jure and de facto models, and the cartography, auditing, and navigation activities), repeated.]
Other Metrics
• Fitness is not sufficient; other metrics are needed, such as behavioral and structural appropriateness.
• These metrics cover aspects such as:
  • Punishing for "too much" behavior.
  • Punishing for "overly complex" models.
DECLARE
An alternative approach based on constraints ...
[Figure: an imperative model (with deviations from the prescribed model) contrasted with a set of constraints that delimit forbidden behavior.]
Example: "existence response" (constraint between A and B)
• OK:
  • [ ]
  • [A,B,C,D,E]
  • [A,A,A,C,D,E,B,B,B]
  • [B,B,A,A,C,D,E]
  • [B,C,D,E]
• NOK:
  • [A]
  • [A,A,C,D,E]
Example: "response" (constraint between A and B)
• OK:
  • [ ]
  • [A,B,C,D,E]
  • [A,A,A,B,C,D,E]
  • [B,B,A,A,B,C,D,E]
  • [B,C,D,E]
• NOK:
  • [A]
  • [B,B,B,B,A,A]
Example: "precedence" (constraint between A and B)
• OK:
  • [ ]
  • [A,B,C,D,E]
  • [A,A,A,C,D,E,B,B,B]
  • [A,A,C,D,E]
• NOK:
  • [B]
  • [B,A,C,D,E]
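The three constraint types can be written as simple checks on a completed trace. The semantics below are inferred from the OK/NOK examples on the slides (existence response: if A occurs, B occurs somewhere; response: every A is eventually followed by a B; precedence: B may occur only after some A). The function names are illustrative and not the Declare tool's API:

```python
def existence_response(trace, a="A", b="B"):
    """If a occurs anywhere in the trace, b must occur too (before or after)."""
    return a not in trace or b in trace


def response(trace, a="A", b="B"):
    """Every occurrence of a must eventually be followed by b."""
    pending = False
    for activity in trace:
        if activity == a:
            pending = True
        elif activity == b:
            pending = False
    return not pending


def precedence(trace, a="A", b="B"):
    """b may only occur after a has occurred at least once."""
    seen_a = False
    for activity in trace:
        if activity == a:
            seen_a = True
        elif activity == b and not seen_a:
            return False
    return True


# A few of the slide's examples, written as lists of activity names.
print(existence_response(list("BBAACDE")))  # an OK example
print(response(list("BBBBAA")))             # a NOK example
print(precedence(list("BACDE")))            # a NOK example
```

The difference between the first two shows in [B,B,A,A,C,D,E]: it satisfies existence response because B occurs somewhere, but it would violate response because no B follows the last A.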
Model with constraints
• (C.1) Always start with activity register client data.
• (C.2) Activity bill must be executed at least once.
• (C.3) Every room service must be billed.
• (C.4) Every laundry service must be billed.
• (C.5) If the client checks out, she/he must be charged.
• (C.6) Sometimes it is recommended that additional cleaning is also billed. (optional)
[Figure: Declare model annotated with constraints C.1 to C.6.]
Constraints
• Constraints can be:
  • mandatory
    − imposed by DECLARE
    − can be fulfilled or temporarily violated
  • optional
    − used as warnings for users
    − can be fulfilled, temporarily violated, or permanently violated
• At the end of the execution, all mandatory constraints have to be fulfilled.
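The three constraint states can be illustrated by evaluating constraints on a running (incomplete) trace. The sketch below is an assumption-laden illustration, not the Declare implementation: it uses hotel-style activity names such as "register client data", "room service", and "bill", and hypothetical function names for an "always start with", an "at least once", and an "every X must be followed by Y" constraint:

```python
FULFILLED = "fulfilled"
TEMPORARILY_VIOLATED = "temporarily violated"
PERMANENTLY_VIOLATED = "permanently violated"


def init_state(prefix, a):
    """'Always start with a': decided for good by the first event."""
    if not prefix:
        return TEMPORARILY_VIOLATED
    return FULFILLED if prefix[0] == a else PERMANENTLY_VIOLATED


def existence_state(prefix, a):
    """'a at least once': can still be repaired by a later a."""
    return FULFILLED if a in prefix else TEMPORARILY_VIOLATED


def response_state(prefix, a, b):
    """'Every a must be followed by b': pending until a later b occurs."""
    pending = False
    for activity in prefix:
        if activity == a:
            pending = True
        elif activity == b:
            pending = False
    return TEMPORARILY_VIOLATED if pending else FULFILLED


running = ["register client data", "room service"]
states_before = (
    init_state(running, "register client data"),
    existence_state(running, "bill"),
    response_state(running, "room service", "bill"),
)
states_after = (
    existence_state(running + ["bill"], "bill"),
    response_state(running + ["bill"], "room service", "bill"),
)
wrong_start = init_state(["room service"], "register client data")
print(states_before, states_after, wrong_start)
```

A temporarily violated constraint can still become fulfilled by future events, which is why only the state at the end of the execution decides whether a mandatory constraint holds.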
Possible ways of using Declare
• Discover Declare models
• Check Declare models offline
• Check Declare models online
• Quantify “health”
• Drill-down
Framework
[Figure: the process mining framework (event logs, de jure and de facto models, and the cartography, auditing, and navigation activities), repeated.]
More Information
• ProM Software: prom.sourceforge.net
• Process mining: www.processmining.org
• ProM 5 series nightly builds: prom.win.tue.nl/tools/prom/nightly5/
• ProM 6 series nightly builds: prom.win.tue.nl/tools/prom/nightly/
• Converting logs (MXML-based): promimport.sourceforge.net
• XES: www.xes-standard.org and www.openxes.org
• Papers et al.: vdaalst.com
• IEEE Task Force on Process Mining: www.win.tue.nl/ieeetfpm/