Decomposing Data-aware Conformance Checking
Massimiliano de Leoni, Jorge Munoz-Gama, Josep Carmona, Wil van der Aalst
Example: A Credit Institute
(a; {A = 3000; R = Michael; E = Pete}); (b; {V = OK; E = Sue}); (c; {I = 530; D = OK; E = Sue}); (f; {E = Pete})
• For such a credit amount, the interest should be < 450
• «Sue» is not authorized to perform b: she is not an Assistant
• Activity h has not been executed: D cannot be OK
(a; {A = 3000; R = Michael; E = Pete}); (b; {V = OK; E = Pete}); (c; {I = 530; D = OK; E = Sue}); (d; {I = 599; D = NOK; E = Sue}); (f; {E = Pete})
(a; {A = 5001; R = Michael; E = Pete}); (b; {V = OK; E = Pete}); (c; {I = 530; D = NOK; E = Sue}); (f; {E = Pete})
Activity d should have occurred, since amount < 5000
Petri Net with Data: Variables and Read/Write Operations
[Figure: Petri net with data for the credit process. Places: n1 to n6. Transitions: Credit Request (a), Verify (b), Assessment (c), Register Negative Verification (d), Inform Customers (e), Renegotiate (f), Register Loan Rejection (g), Open Credit Loan (h). Variables: Interests, Amount, Verification, Decision. Legend: variables, write operations, read operations.]
Binding
• A binding is a triplet (t, r, w) where
  • t is the transition that fires
  • r: V → U (a partial function) gives the variables that are read, along with their values
    − dom(r) is the set of read variables
    − r(v) is the value read for variable v
  • w: V → U (a partial function) gives the variables that are written, along with their values
    − dom(w) is the set of written variables
    − w(v) is the value written for variable v
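A binding can be pictured as a small record. The following is a minimal sketch, not taken from the slides: the class name `Binding`, its field names, and the example values are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Binding:
    """A binding (t, r, w): one transition firing with its read and write operations."""
    transition: str                                        # t: the transition that fires
    reads: Dict[str, Any] = field(default_factory=dict)    # r: variable -> value read
    writes: Dict[str, Any] = field(default_factory=dict)   # w: variable -> value written

    def dom_r(self) -> frozenset:
        """Set of read variables, dom(r)."""
        return frozenset(self.reads)

    def dom_w(self) -> frozenset:
        """Set of written variables, dom(w)."""
        return frozenset(self.writes)


# Illustrative values only: which variables each transition reads or writes
# is defined by the model's figure, not by this sketch.
b = Binding("Assessment", reads={"Verification": True}, writes={"Decision": True})
print(sorted(b.dom_r()), sorted(b.dom_w()))
```

Representing r and w as dictionaries keeps them partial: a variable that is not read or written is simply absent from the mapping.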
A Sequence of bindings
Necessary condition for a binding (t, r, w): dom(r) and dom(w) coincide with the expected read and write operations of t.
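The necessary condition can be checked by comparing sets. This is a sketch under assumptions: the table `EXPECTED` and the function `matches_expected_operations` are hypothetical, and the read/write sets listed are placeholders, since the real ones are given by the arcs in the model's figure.

```python
from typing import Any, Dict, Mapping, Set, Tuple

# Hypothetical expected (read set, write set) per transition.
EXPECTED: Dict[str, Tuple[Set[str], Set[str]]] = {
    "Credit Request": (set(), {"Amount"}),
    "Assessment": ({"Verification"}, {"Decision"}),
}


def matches_expected_operations(
    transition: str,
    reads: Mapping[str, Any],
    writes: Mapping[str, Any],
    expected: Dict[str, Tuple[Set[str], Set[str]]] = EXPECTED,
) -> bool:
    """True if dom(r) and dom(w) equal the expected read and write sets of the transition."""
    exp_reads, exp_writes = expected[transition]
    return set(reads) == exp_reads and set(writes) == exp_writes


ok = matches_expected_operations("Assessment", {"Verification": True}, {"Decision": True})
missing_write = matches_expected_operations("Assessment", {"Verification": True}, {})
print(ok, missing_write)
```

The condition is only necessary: a binding that passes this check must still satisfy the guard of its transition.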
Each transition is associated with all valid bindings
Transition                        Guard
Credit Request                    --
Verify                            0.1 * r(A) < w(I) < 0.2 * r(A)
Assessment                        r(V) = true
Register Negative Verification    r(V) = false AND w(D) = false
Inform Requester                  --
Register Loan Rejection           r(D) = false
Open Credit                       r(D) = true
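The guards in the table can be written as predicates over the read mapping r and the write mapping w. This is a minimal sketch: the names `GUARDS` and `satisfies_guard` are assumptions, "--" is taken to mean that there is no guard, and the variable keys A, I, V, D follow the abbreviations used in the table.

```python
from typing import Any, Callable, Dict, Mapping

Guard = Callable[[Mapping[str, Any], Mapping[str, Any]], bool]

# One predicate per row of the guard table; "--" is modelled as always true.
GUARDS: Dict[str, Guard] = {
    "Credit Request": lambda r, w: True,
    "Verify": lambda r, w: 0.1 * r["A"] < w["I"] < 0.2 * r["A"],
    "Assessment": lambda r, w: r["V"] is True,
    "Register Negative Verification": lambda r, w: r["V"] is False and w["D"] is False,
    "Inform Requester": lambda r, w: True,
    "Register Loan Rejection": lambda r, w: r["D"] is False,
    "Open Credit": lambda r, w: r["D"] is True,
}


def satisfies_guard(transition: str, r: Mapping[str, Any], w: Mapping[str, Any]) -> bool:
    """Evaluate the guard of a transition on the values read (r) and written (w)."""
    return bool(GUARDS[transition](r, w))


print(satisfies_guard("Verify", {"A": 3000}, {"I": 530}))
print(satisfies_guard("Open Credit", {"D": False}, {}))
```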
Alignments
• Move in both without incorrect write operations
• Move in both with incorrect write operations
• Move in log
• Move in process
Cost of alignments
• Each move is associated with a cost
• Cost of an alignment is the sum of the costs of its moves
Cost components:
• Cost of reading/writing a wrong value
• Cost of not writing or reading a variable
• Cost of “move on log”
• Cost of “move on model”
[Figure: example cost values for the variables <w>, <x>, <y>, <z>]
Cost of alignments: some examples
[Figure: two example alignments, annotated 8 and 10]
An optimal alignment: an alignment with the lowest cost
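Summing move costs and picking the cheapest alignment can be sketched as follows. The cost values in `MOVE_COST` are invented for illustration (the values on the slides are not recoverable), and the move-kind names and function names are assumptions.

```python
from typing import Dict, List

# Hypothetical cost per move kind; a real cost function may also depend on
# the activity and on the variables involved.
MOVE_COST: Dict[str, int] = {
    "both_correct": 0,    # move in both without incorrect write operations
    "both_incorrect": 1,  # move in both with incorrect write operations
    "log": 2,             # move in log
    "process": 3,         # move in process
}


def alignment_cost(moves: List[str]) -> int:
    """Cost of an alignment: the sum of the costs of its moves."""
    return sum(MOVE_COST[kind] for kind in moves)


def optimal_alignment(candidates: List[List[str]]) -> List[str]:
    """Among the given candidate alignments, return one with the lowest cost."""
    return min(candidates, key=alignment_cost)


a1 = ["both_correct", "both_incorrect", "log"]
a2 = ["both_correct", "process", "process"]
print(alignment_cost(a1), alignment_cost(a2))
print(optimal_alignment([a1, a2]))
```

This only ranks alignments that are already given; enumerating the candidates is the expensive part that the approaches on the following slides address.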
Finding optimal alignments: Approach 1
Log: S {z=10, y=0} – A {x=1} – C {y=11} – E – A {x=3} – B {y=13} -
1. Compute the control-flow alignment using existing techniques (the «Arya» technique)
   Process: S – A – C – E – A – B – F
2. Enrich the alignment with the data operations
   Process: S {z=1, y=0} – A {x=10} – C {y=11} – E – A {x=3} – B {y=13} – F
   • The data operations are chosen so as to minimize the cost of the alignment
   • Naturally formulated as a Mixed Integer Linear Program
M. de Leoni, W.M.P. van der Aalst: Aligning event logs and process models for multi-perspective conformance checking: An approach based on integer linear programming. Proceedings of BPM 2013
Finding optimal alignments: Approach 2
Process: a b
Log: (a; {A = 3000; R = Michael; E = Pete}) – (b; {V = NOK; E = Sue})
Process: (a; {A = 3000; R = Michael; E = Pete}) – (b; {V = NOK; E = Sue})
F. Mannhardt, M. de Leoni, H. Reijers, W.M.P. van der Aalst: Balanced Multi-Perspective Checking of Process Conformance. Computing Journal, Springer (under review)
Finding an optimal alignment: complexity
• Finding an optimal alignment is exponential in the size of the model, i.e. the number of activities and data variables.
• IDEA: Divide-and-conquer approach
  • The Petri Net with Data is decomposed into smaller fragments that are checked separately.
  • If the decomposition is valid:
    − Any trace fits the entire model if and only if it fits all smaller fragments.
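The divide-and-conquer idea can be illustrated on the control flow alone: each fragment sees only the events of its own activities. This is a sketch under assumptions: the functions `project` and `fits_all_fragments` are hypothetical, fragments are reduced to a set of activities plus a fitting check supplied by the caller, and data variables are left out.

```python
from typing import Callable, List, Set, Tuple

Fragment = Tuple[Set[str], Callable[[List[str]], bool]]


def project(trace: List[str], activities: Set[str]) -> List[str]:
    """Keep only the events whose activity belongs to the fragment."""
    return [a for a in trace if a in activities]


def fits_all_fragments(trace: List[str], fragments: List[Fragment]) -> bool:
    """Check the projected trace against every fragment separately."""
    return all(fits(project(trace, acts)) for acts, fits in fragments)


# Toy fragments: each one accepts exactly one ordering of its two activities.
fragments: List[Fragment] = [
    ({"a", "b"}, lambda p: p == ["a", "b"]),
    ({"b", "c"}, lambda p: p == ["b", "c"]),
]
print(fits_all_fragments(["a", "b", "c"], fragments))
print(fits_all_fragments(["a", "c", "b"], fragments))
```

For a valid decomposition, the result of this per-fragment check coincides with checking the trace against the entire model; each fragment check works on a smaller model and a shorter projected trace.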
[Figure: example decomposition of a net with transitions t1 to t6 into fragments]
Valid decomposition without data
• The following can only appear in precisely one fragment:
  1. Places
  2. Invisible transitions
  3. Visible transitions with the same label (name)
  4. Arcs
• Visible transitions with unique label may appear in multiple fragments
• The union of the fragments is the entire model
W.M.P. van der Aalst: Decomposing Petri nets for process mining: A generic approach. Distributed and Parallel Databases 31(4) (2013)
Valid decomposition with data
• The following can only appear in precisely one fragment:
  1. Places
  2. Invisible transitions
  3. Visible transitions with the same label (name)
  4. Arcs
• Visible transitions with unique label may appear in multiple fragments
• Each variable appears in precisely one fragment
• Each transition shared among fragments may read/write different variables
• The union of the fragments is the entire model
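The conditions above can be checked mechanically. The following is a simplified sketch, not the definition from the cited work: the `Net` record, its field names, and `is_valid_decomposition` are assumptions, and the read/write arcs between transitions and variables are not modelled.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class Net:
    places: Set[str] = field(default_factory=set)
    invisible: Set[str] = field(default_factory=set)        # invisible transitions
    visible: Dict[str, str] = field(default_factory=dict)   # visible transition -> label
    arcs: Set[Tuple[str, str]] = field(default_factory=set)
    variables: Set[str] = field(default_factory=set)


def _exactly_once(items: Iterable, per_fragment: List[Set]) -> bool:
    """True if every item occurs in precisely one of the fragment sets."""
    counts = Counter(x for s in per_fragment for x in s)
    return all(counts[x] == 1 for x in items)


def is_valid_decomposition(model: Net, fragments: List[Net]) -> bool:
    labels = Counter(model.visible.values())
    # Visible transitions whose label is shared with another transition.
    duplicated = {t for t, lab in model.visible.items() if labels[lab] > 1}
    exclusive = (
        _exactly_once(model.places, [f.places for f in fragments])
        and _exactly_once(model.invisible, [f.invisible for f in fragments])
        and _exactly_once(model.arcs, [f.arcs for f in fragments])
        and _exactly_once(model.variables, [f.variables for f in fragments])
        and _exactly_once(duplicated, [set(f.visible) for f in fragments])
    )

    def union(attr: str) -> set:
        return set().union(*(set(getattr(f, attr)) for f in fragments))

    # The union of the fragments must be the entire model.
    covers = all(
        union(attr) == set(getattr(model, attr))
        for attr in ("places", "invisible", "visible", "arcs", "variables")
    )
    return exclusive and covers


model = Net(places={"p1", "p2"}, visible={"t1": "a", "t2": "b", "t3": "c"},
            arcs={("t1", "p1"), ("p1", "t2"), ("t2", "p2"), ("p2", "t3")},
            variables={"x"})
f1 = Net(places={"p1"}, visible={"t1": "a", "t2": "b"},
         arcs={("t1", "p1"), ("p1", "t2")}, variables={"x"})
f2 = Net(places={"p2"}, visible={"t2": "b", "t3": "c"},
         arcs={("t2", "p2"), ("p2", "t3")})
print(is_valid_decomposition(model, [f1, f2]))
```

In the example, the uniquely labelled transition t2 is shared by both fragments, which is allowed, while every place, arc, and variable belongs to exactly one fragment.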
Instantiation of Valid Decompositions
• Different strategies are possible.
• We propose two strategies extending what exists for the data-unaware case:
  • Maximal Decomposition
  • SESE-based decomposition
Maximal Decomposition
• Construct the smallest components that satisfy the Valid Decomposition definition
• Variables and Places are mutually exclusive
[Figure: maximal decomposition of the credit process, with one fragment per place (n1 to n6) and one per variable (Interests, Amount, Verification, Decision), each shown with the transitions connected to it]
SESE-based Algorithm
[Figure: a) Petri net; b) Workflow graph and SESEs (arcs a to p, SESEs S1 to S10); c) RPST]
Example of the SESE-based Algorithm (k = 2)
[Figure: fragments of the credit process obtained with the SESE-based decomposition for k = 2]
Implementation
Available in the package DataConformanceChecker
Experiments
• Different event logs were generated, each with 5000 traces and a different average trace length
  • This is ensured by enforcing a larger number of credit renegotiations
• 20% of the transition firings are generated so that they do not satisfy the guards
Results: an exponential reduction of the computation time
[Plot: Computation Time (in seconds), logarithmic axis with ticks from 10000 to 10000000, versus Average number of events per event-log trace (5 to 30). Series: No Decomposition; SESE-based decomposition (k=2)]
Projection on the model
• For each transition t:
  • n = number of fragments in which t occurs
  • DPN(t)_i is the i-th fragment in which t occurs.
\[ \mathit{fitTransition}(t) = 1 - \frac{1}{n} \sum_{i=1}^{n} \frac{\#\mathit{correct}(t, DPN(t)_i)}{\#\mathit{total}(t, DPN(t)_i)} \]
#correct(t, DPN) = number of moves in both without incorrect write operations for t in the alignments between each log trace and DPN
#total(t, DPN) = number of moves for t in the alignments between each log trace and DPN
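As a worked example, the formula can be evaluated from per-fragment counts. This sketch reads the slide's formula as 1 minus the average, over the n fragments containing t, of #correct divided by #total; the function name `fit_transition` and the counts used are assumptions.

```python
from fractions import Fraction
from typing import List, Tuple


def fit_transition(counts: List[Tuple[int, int]]) -> Fraction:
    """counts holds one (#correct, #total) pair per fragment in which t occurs."""
    n = len(counts)
    average_ratio = sum(Fraction(correct, total) for correct, total in counts) / n
    return 1 - average_ratio


# Hypothetical counts: t occurs in two fragments, with 8 of 10 and 6 of 10
# of its moves being moves in both without incorrect write operations.
value = fit_transition([(8, 10), (6, 10)])
print(value)
```

Exact fractions are used so that the result is not affected by floating-point rounding.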
Projection on the model based on decomposition is an approximation!
[Figure: a net with transitions t1 to t6 and its fragments]
No decomposition: Move in both without incorrect write operations for t
Decomposition: Move in both without incorrect write operations for t in all fragments containing t

No decomposition: one of
• Move in both with incorrect write operations for t
• Move in log
• Move in process
Decomposition: The same move for t in at least one of the fragments containing t
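The correspondence between the two settings can be sketched as a rule for combining the moves observed for t in its fragments. The function `combine_moves` and the move-kind names are assumptions, and reporting every deviating kind when fragments disagree is a choice of this sketch, not something the slides specify.

```python
from typing import Iterable, Set


def combine_moves(moves_per_fragment: Iterable[str]) -> Set[str]:
    """Combine the moves seen for t in each fragment that contains t.

    The result is {"both_correct"} only if every fragment reports a move in
    both without incorrect write operations; otherwise it is the set of
    deviating move kinds reported by at least one fragment.
    """
    deviations = {m for m in moves_per_fragment if m != "both_correct"}
    return deviations if deviations else {"both_correct"}


print(combine_moves(["both_correct", "both_correct"]))
print(combine_moves(["both_correct", "log"]))
```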
Projection on the model (without decomposition)
[Figure: projection on the model, two panels: With decomposition; Without decomposition]
Conclusion
• Finding an alignment is exponential in the model size
• To speed up the computation:
  1. Decompose the model into submodels
  2. Align each trace with each submodel
• The decomposition needs to be valid:
  Any trace fits the entire model if and only if it fits all smaller fragments.
• A more extensive evaluation is needed
  • Using real processes
  • Synthetic data referring to models with dozens of transitions