process mining: auditing based on facts rather than … · process mining: auditing based on facts...

65
Process Mining: Auditing Based on Facts Rather than Fiction prof.dr.ir. Wil van der Aalst www.processmining.org

Upload: trinhnguyet

Post on 26-Jul-2018

217 views

Category:

Documents


0 download

TRANSCRIPT

Process Mining: Auditing Based on Facts Rather than Fiction prof.dr.ir. Wil van der Aalstwww.processmining.org

Big Data

PAGE 1

Source: “Big Data: The Next Frontier for Innovation, Competition, and Productivity” McKinsey Global Institute, 2011.

“Enterprises globally stored more than 7 exabytesof new data on disk drives in 2010, while consumers stored morethan 6 exabytes of new data on devices such as PCs and notebooks.”

“All of the world's music can be stored

on a $600 disk drive.”

“Indeed, we are generating so much data today that it is

physically impossible to store it all. Health

care providers, for instance, discard 90

percent of the data that they generate.”

PAGE 2

Hilbert and Lopez. The World's Technological Capacity to Store, Communicate, and Compute Information. Science, 332(6025):60-65, 2011.

PAGE 3

Event Data + Processes

Process Mining =

Data Mining + Process Analysis

PAGE 4

Process Mining: Overview

information system(s)

current data

people

machines

organizationsbusiness

processes documents

historic data

resources/organization

data/rules

control-flow

de jure models

resources/organization

data/rules

control-flow

de facto models

provenance

expl

ore

pred

ict

reco

mm

end

dete

ct

chec

k

com

pare

prom

ote

disc

over

enha

nce

diag

nose

cartographynavigation auditing

event logs

models

“pre mortem”

“post mortem”

We applied ProM in >100 organizations

PAGE 5

• Municipalities (e.g., Alkmaar, Heusden, Harderwijk, etc.)• Government agencies (e.g., Rijkswaterstaat, Centraal

Justitieel Incasso Bureau, Justice department)• Insurance related agencies (e.g., UWV)• Banks (e.g., ING Bank)• Hospitals (e.g., AMC hospital, Catharina hospital)• Multinationals (e.g., DSM, Deloitte)• High-tech system manufacturers and their customers

(e.g., Philips Healthcare, ASML, Ricoh, Thales)• Media companies (e.g. Winkwaves)• ...

Hundreds to plug-ins available covering the whole spectrum

PAGE 6

• Open-source (L-GPL), cf. www.processmining.org• Plug-in architecture• Plug-ins cover the whole process mining spectrum and

also support classical forms of process analysis

Process Mining Manifesto

• On 7 October 2011, the IEEE Task Force on Process Mining released the Process Mining Manifesto

• 53 organizations support the manifesto

• 77 process mining experts contributed to it

• Translated into Chinese, German, French, Spanish, Greek, Italian, Korean, Dutch, Portuguese, Turkish, and Japanese.

PAGE 7

Evidence-Based BPM

PAGE 8

Doing BPM without process mining is like:• physics without experiments• surgery without diagnostic testing• …The proof of the pudding is in the eating!

PAGE 9

Desire Lines or Cow Paths?

PAGE 10

www.olifantenpaadjes.nl

PAGE 11

PAGE 12

PAGE 13

“Cow Paths” by David Larson Evans

PAGE 14

BPR mantra: Don’t pave the cow paths

PAGE 15

“It is time to stop paving the cow paths. Instead of embedding outdated processes in silicon and software, we should obliterate them and start over.” Michael Hammer in “Reengineering Work: Don’t Automate, Obliterate” (1990)

PAGE 16

Don’t pave new roads if you do not know the cow paths!

Don’t close your

eyes for event data!

PAGE 17

The Four Dimensions of Conformance Checking

Challenge: four competing quality criteria

PAGE 18

“able to replay event log” “Occam’s razor”

“not overfitting the log” “not underfitting the log”

Example: one log four models

PAGE 19

astart register

request

bexamine thoroughly

cexamine casually

d checkticket

decide

pay compensation

reject request

reinitiate requeste

g

hfend

astart register

request

cexamine casually

dcheckticket

decide reject request

e hend

N3 : fitness = +, precision = -, generalization = +, simplicity = +

N2 : fitness = -, precision = +, generalization = -, simplicity = +

astart register

request

bexamine

thoroughly

cexamine casually

dcheck ticket

decide

pay compensation

reject request

reinitiate request

e

g

h

f

end

N1 : fitness = +, precision = +, generalization = +, simplicity = +

astart register

request

cexamine casually

dcheckticket

decide reject request

e hend

N4 : fitness = +, precision = +, generalization = -, simplicity = -

aregister request

dexamine casually

ccheckticket

decide reject request

e h

a cexamine casually

dcheckticket

decide

e g

a dexamine casually

ccheckticket

decide

e g

register request

register request

pay compensation

pay compensation

aregister request

b dcheckticket

decide reject request

e h

aregister request

d bcheckticket

decide reject request

e h

a b dcheckticket

decide

e gregister request

pay compensation

examine thoroughly

examine thoroughly

examine thoroughly

(all 21 variants seen in the log)

acdeh

abdeg

adceh

abdeh

acdeg

adceg

adbeh

acdefdbeh

adbeg

acdefbdeh

acdefbdeg

acdefdbeg

adcefcdeh

adcefdbeh

adcefbdeg

acdefbdefdbeg

adcefdbeg

adcefbdefbdeg

adcefdbefbdeh

adbefbdefdbeg

adcefdbefcdefdbeg

455

191

177

144

111

82

56

47

38

33

14

11

9

8

5

3

2

2

1

1

1

# trace

1391

process discovery

fitness

precisiongeneralization

simplicity

“able to replay event log” “Occam’s razor”

“not overfitting the log” “not underfitting the log”

Model N1

PAGE 20

acdeh

abdeg

adceh

abdeh

acdeg

adceg

adbeh

acdefdbeh

adbeg

acdefbdeh

acdefbdeg

acdefdbeg

adcefcdeh

adcefdbeh

adcefbdeg

acdefbdefdbeg

adcefdbeg

adcefbdefbdeg

adcefdbefbdeh

adbefbdefdbeg

adcefdbefcdefdbeg

455

191

177

144

111

82

56

47

38

33

14

11

9

8

5

3

2

2

1

1

1

# trace

1391

Model N2

PAGE 21

acdeh

abdeg

adceh

abdeh

acdeg

adceg

adbeh

acdefdbeh

adbeg

acdefbdeh

acdefbdeg

acdefdbeg

adcefcdeh

adcefdbeh

adcefbdeg

acdefbdefdbeg

adcefdbeg

adcefbdefbdeg

adcefdbefbdeh

adbefbdefdbeg

adcefdbefcdefdbeg

455

191

177

144

111

82

56

47

38

33

14

11

9

8

5

3

2

2

1

1

1

# trace

1391

Model N3

PAGE 22

acdeh

abdeg

adceh

abdeh

acdeg

adceg

adbeh

acdefdbeh

adbeg

acdefbdeh

acdefbdeg

acdefdbeg

adcefcdeh

adcefdbeh

adcefbdeg

acdefbdefdbeg

adcefdbeg

adcefbdefbdeg

adcefdbefbdeh

adbefbdefdbeg

adcefdbefcdefdbeg

455

191

177

144

111

82

56

47

38

33

14

11

9

8

5

3

2

2

1

1

1

# trace

1391

Model N4

PAGE 23

acdeh

abdeg

adceh

abdeh

acdeg

adceg

adbeh

acdefdbeh

adbeg

acdefbdeh

acdefbdeg

acdefdbeg

adcefcdeh

adcefdbeh

adcefbdeg

acdefbdefdbeg

adcefdbeg

adcefbdefbdeg

adcefdbefbdeh

adbefbdefdbeg

adcefdbefcdefdbeg

455

191

177

144

111

82

56

47

38

33

14

11

9

8

5

3

2

2

1

1

1

# trace

1391

astart register

request

cexamine casually

dcheckticket

decide reject request

e hend

N4 : fitness = +, precision = +, generalization = -, simplicity = -

aregister request

dexamine casually

ccheckticket

decide reject request

e h

a cexamine casually

dcheckticket

decide

e g

a dexamine casually

ccheckticket

decide

e g

register request

register request

pay compensation

pay compensation

aregister request

b dcheckticket

decide reject request

e h

aregister request

d bcheckticket

decide reject request

e h

a b dcheckticket

decide

e gregister request

pay compensation

examine thoroughly

examine thoroughly

examine thoroughly

(all 21 variants seen in the log)

PAGE 24

Conformance Checking by Playing the Token Game

Joint work with Anne Rozinat

Replaying (1/3)σ1 on N1

PAGE 25

astart

b

c

d

e

g

h

f

end

p3p1

p2 p4

p5

p=0c=0m=0r=0

p=1c=0m=0r=0

astart

b

c

d

e

g

h

f

end

p3p1

p2 p4

p5

p=3c=1m=0r=0

Replaying (2/3)

PAGE 26

Replaying (3/3)

PAGE 27

No problems found!

Replaying (1/3)σ3 on N2

PAGE 28

start endp1 p2 p3p4

p=0c=0m=0r=0

p=1c=0m=0r=0

start p1 p2 p3p4

p=2c=1m=0r=0

end

Replaying (2/3)

PAGE 29

start p1 p2 p3p4

p=3c=2m=1r=0

start endp1 p2 p3p4

p=4c=3m=1r=0

m

m

end

Replaying (3/3)

PAGE 30

Problems encountered when replaying σ3 on N2

• One missing token (of 6 consumed tokens)• One remaining token (of 6 produced tokens)

PAGE 31

Computing fitness at trace level

PAGE 32

Computing fitness at the log level

PAGE 33

Example values

PAGE 34

astart register

request

bexamine

thoroughly

cexamine casually

dcheck ticket

decide

pay compensation

reject request

reinitiate request

e

g

h

f

end

p3p1

p2 p4

N1

p5

Diagnostics

PAGE 35

start register request

examine thoroughly

examine casually

checkticket

decide

pay compensation

reject request

reinitiate request

endp1 p2 p3p4

1391 1391

566 566

1537

1537

1537

1537 461461

930

930146

146

971 971

problem443 tokens remain in place p2,

because d did not occur although the model expected d to happen

problem443 tokens were missing in place p2 during replay, because d happened even though

this was not possible according to the model

+443-443

Diagnostics

PAGE 36

astart register

request

cexamine casually

dcheckticket

decide reject request

e hend

p1

p2

p3

p4

p5

1391

problem430 tokens remain in place p1,because c did not happen while the model expected c to happen

1391

971 971

1537

1537 930 930

1537

153715371391

problem146 tokens were missing in

place p2 during replay, because d happened while this was not possible according to the model

problem10 tokens were missing in place p1 during

replay, because c happened while this was not possible according to the model

problem566 tokens were missing in

place p3 during replay, because e happened

while this was not possible according to the model

problem607 tokens remain in place p5,because h did not happen while the model expected h to happen

problem461 of the 1391cases did not

reach place end

+430-10 -566

-461+607

-146

Drilling down

PAGE 37

global conformance

measures

local diagnostics

new event log: starting point for process and data mining techniques

replay

drill down

PAGE 38

Conformance Checking Based on Alignments

Joint work with Arya Adriansyah and Boudewijn van Dongen (also see poster)

From “playing the token game” to optimal alignments …

a b d e ga b d e g

PAGE 39

191 times “abdeg”

Example alignments

a b d e ga b d e g

PAGE 40

abdeg

a b » d e ga » c d e g

a b d e g » » » » »» » » » » a c d e g

Moves in an alignment

PAGE 41

a b » d e ga » c d e g

trace in event log

possible run of model

move in log

move in model move in both

Moves have costs

• Standard cost function:

−c(x,») = 1−c(»,y) = 1−c(x,y) = 0, if x=y−c(x,y) = ∞, if x≠y

PAGE 42

… a …… » …

… » …… a …

… a …… a …

… a …… b …

Optimal alignment (smallest costs)

a b d e ga b d e g

PAGE 43

abdeg

a b » d e ga » c d e g

a b d e g » » » » »» » » » » a c d e g

0

2

10

optimal

Non-fitting trace: abefdeg

PAGE 44

abefdeg

a b » e f d » e ga b d e f d b e g 2

2a b e f d e ga b » » d e g

Any cost structure is possible

PAGE 45

… send-letter(John,2 weeks, $400)

… send-email(Sue,3weeks,$500)

• Similar activities (more similarity implies lower costs).• Resource conformance (done by someone that does

not have the specified role).• Data conformance (path is not possible for this

customer). • Time conformance (missed the legal deadline).• cf. cost/risk-aware BPM (costs = risk).

Fitness

PAGE 46

astart register

request

bexamine thoroughly

cexamine casually

d checkticket

decide

pay compensation

reject request

reinitiate requeste

g

hfend

astart register

request

cexamine casually

dcheckticket

decide reject request

e hend

N3 : fitness = +, precision = -, generalization = +, simplicity = +

N2 : fitness = -, precision = +, generalization = -, simplicity = +

astart register

request

bexamine

thoroughly

cexamine casually

dcheck ticket

decide

pay compensation

reject request

reinitiate request

e

g

h

f

end

N1 : fitness = +, precision = +, generalization = +, simplicity = +

astart register

request

cexamine casually

dcheckticket

decide reject request

e hend

N4 : fitness = +, precision = +, generalization = -, simplicity = -

aregister request

dexamine casually

ccheckticket

decide reject request

e h

a cexamine casually

dcheckticket

decide

e g

a dexamine casually

ccheckticket

decide

e g

register request

register request

pay compensation

pay compensation

aregister request

b dcheckticket

decide reject request

e h

aregister request

d bcheckticket

decide reject request

e h

a b dcheckticket

decide

e gregister request

pay compensation

examine thoroughly

examine thoroughly

examine thoroughly

(all 21 variants seen in the log)

acdeh

abdeg

adceh

abdeh

acdeg

adceg

adbeh

acdefdbeh

adbeg

acdefbdeh

acdefbdeg

acdefdbeg

adcefcdeh

adcefdbeh

adcefbdeg

acdefbdefdbeg

adcefdbeg

adcefbdefbdeg

adcefdbefbdeh

adbefbdefdbeg

adcefdbefcdefdbeg

455

191

177

144

111

82

56

47

38

33

14

11

9

8

5

3

2

2

1

1

1

# trace

1391

1.0

0.8

1.0

1.0

Precision

PAGE 47

astart register

request

bexamine thoroughly

cexamine casually

d checkticket

decide

pay compensation

reject request

reinitiate requeste

g

hfend

astart register

request

cexamine casually

dcheckticket

decide reject request

e hend

N3 : fitness = +, precision = -, generalization = +, simplicity = +

N2 : fitness = -, precision = +, generalization = -, simplicity = +

astart register

request

bexamine

thoroughly

cexamine casually

dcheck ticket

decide

pay compensation

reject request

reinitiate request

e

g

h

f

end

N1 : fitness = +, precision = +, generalization = +, simplicity = +

astart register

request

cexamine casually

dcheckticket

decide reject request

e hend

N4 : fitness = +, precision = +, generalization = -, simplicity = -

aregister request

dexamine casually

ccheckticket

decide reject request

e h

a cexamine casually

dcheckticket

decide

e g

a dexamine casually

ccheckticket

decide

e g

register request

register request

pay compensation

pay compensation

aregister request

b dcheckticket

decide reject request

e h

aregister request

d bcheckticket

decide reject request

e h

a b dcheckticket

decide

e gregister request

pay compensation

examine thoroughly

examine thoroughly

examine thoroughly

(all 21 variants seen in the log)

acdeh

abdeg

adceh

abdeh

acdeg

adceg

adbeh

acdefdbeh

adbeg

acdefbdeh

acdefbdeg

acdefdbeg

adcefcdeh

adcefdbeh

adcefbdeg

acdefbdefdbeg

adcefdbeg

adcefbdefbdeg

adcefdbefbdeh

adbefbdefdbeg

adcefdbefcdefdbeg

455

191

177

144

111

82

56

47

38

33

14

11

9

8

5

3

2

2

1

1

1

# trace

1391

0.97

1

0.41

1

Advantages of Aligning Log and Model

• Observed behavior is directly related to modeled behavior.

• Highly flexible (any cost structure).• Detailed diagnostics.• After aligning log and model, other quality

dimensions can be investigated (separation of concerns).

• Efficiently implemented in ProM (see work of Arya Adriansyah).

PAGE 48

PAGE 49

Using Declarative Languages

Joint work with Fabrizio Maggi, Michael Westergaard, Maja Pesic, et al.

The Declare language

PAGE 50

non co-existence: activities b and c cannot happen both

response: every occurrence of cshould be eventually followed by h

precedence: every occurrence of c needs to be preceded by a

response

non co-existence

precedencebook car

add extra insurance early

confirm

skip extraInsurance early

add extra insurance late

skip extra insurance late

Basic idea

LTL semantics

Example: "existence response"

• OK:• [ ]• [A,B,C,D,E]• [A,A,A,C,D,E,B,B,B]• [B,B,A,A,C,D,E]• [B,C,D,E]

• NOK• [A]• [A,A,C,D,E]

Example: "response"

• OK:• [ ]• [A,B,C,D,E]• [A,A,A,B,C,D,E]• [B,B,A,A,B,C,D,E]• [B,C,D,E]

• NOK• [A]• [B,B,B,B,A,A]

Example: "precedence"

• OK:• [ ]• [A,B,C,D,E]• [A,A,A,C,D,E,B,B,B]• [A,A,C,D,E]

• NOK• [B]• [B,A,C,D,E]

Discovering Declare models

PAGE 55

Conformance checking of Declare models

PAGE 56

0..1

curse

pray

become holy

non co-existence constraint is violated by the first two cases

(cpcpccpph and pppphcp)

precedence constraint is violated by the third case (hpp)

Tool Support

http://declare.sf.net

PAGE 58

Conclusion

Overview

PAGE 59

Learn More?

PAGE 60

PAGE 61

www.processmining.org

www.win.tue.nl/ieeetfpm/

PAGE 62

IEEE Computer, vol. 43, no. 3, pp. 90-93, Mar. 2010

PAGE 63

More Information

PAGE 64PAGE 64

IEEE Task Force on Process Mining

• ProM Software: prom.sourceforge.net• Process mining: www.processmining.org• ProM 5 series nightly builds: prom.win.tue.nl/tools/prom/nightly5/• ProM 6 series nightly builds: prom.win.tue.nl/tools/prom/nightly/• Converting logs (MXML-based) promimport.sourceforge.net • XES: www.xes-standard.org and www.openxes.org• Papers et al.: vdaalst.com• IEEE Task Force on Process Mining: www.win.tue.nl/ieeetfpm/