declare mining for dummies
TRANSCRIPT
Online version of slides: westergaard.eu/short12157
Declarative ModelsIf bored, we have to go to a Britney concert
We can only be truly happy after going to a Britney concert
We cannot go to both a Britney and an Xtina concert
No explicit flow of control
Good for flexible processes
Declarative Mining
0
1Xtina
3
Britney 2
Britney
Xtina
0
1Britney
2
Happy0 1Bored
Britney
Declarative Mining
0
1Britney
2
Happy
Bored, Britney, Happy
Bored, Beer, Happy
If we accept enough
(say ≥ 50%) traces, we
accept the constraint
Declarative Mining0
1Britney
2
HappyBored, Britney, HappyBored, Beer, Happy
2 parameters (p)
4 event classes (n)
4 • 3 = 12 instantiations
In general n • (n-1) • … • (n - p + 1)
Realistic: n ≥ 30 Sometimes: p = 5
Then:30 • 29 • 28 • 27 • 26 =
17.100.720 instantiations
Classic Solution
If often (say ≥ 50%)
!
!
Then and often
Contraposition
Classic Solution
If not and often
!
!
Then not often
Only try mining constraints for often occurring
parameters
Problem
• XHigh Risk,High Cost
Low Risk,Low Cost
X
Happens Rarely
Happens Often
OutlineSymmetry reduction
Prefix sharing
Parallelism
Superscalarity
Support
Use cases
.
PerformanceOld MINERful Naïve Sym Par Super
One 129m
Easy 550m 26 s 19 s 16 s 12 s 2 s
Almost All
117 s 35 s 27 s 7 s
All 215 m 154 s 108 s 56 s
Real data: 13.087 traces
n = 24 24 - 5.100.480
instances
p
2
≤ 2
≤ 3
≤ 5
7.224.024 checks
Foundation
Convert log to efficient structure
Instead of instantiating constraints, use a single parametrized instance
Bored, Britney, HappyBored, Beer, Happy
1 2 3
1 4 3
0
1a
2
b
1 a
2 b
3,4
1 a
3 b
2,4
2 a
3 b
1,4
PerformanceOld MINERful Naïve Sym Par Super
One 129m
Easy 550m 26 s 19 s 16 s 12 s 2 s
Almost All
117 s 35 s 27 s 7 s
All 215 m 154 s 108 s 56 s
7.224.024 checks
158.928.528 checks
p
2
≤ 2
≤ 3
≤ 567.749.981.760
checks
Symmetry
not co-existence(A, B) = not co-existence(B, A)
Symmetry group: set S = { A1, A2, …, An } so that any Ai ∈ S can be swapped for any Aj ∈ S
SymmetryNormally, a constraint hasn • (n - 1) • … • (n - p + 1) instantiations
A symmetry group S means that |S|! of those are the same
If |S| = p, the constraint “only” hasn • (n - 1) • … • (n - p + 1) / p! instantiations
This number is also known as ( )np
SymmetryNormally, a constraint hasn • (n - 1) • … • (n - p + 1) instantiations
A symmetry group S means that |S|! of those are the same
If |S| = p, the constraint “only” hasn • (n - 1) • … • (n - p + 1) / p! instantiations
This number is also known as ( )np
Choice 1 of 3: 12.144 instances
158.928.528 checks
Choice 1 of 3: 3! = 6
Choice 1 of 3: 2.024 instances
26.488.088 checks
Choice 1 of 5: 5.100.480 instances
66.749.981.760 checks
Choice 1 of 5: 5! = 120
Choice 1 of 5: 42.504 instances
556.249.848 checks
Symmetry?
=0
1Xtina
3
Britney 2
Britney
Xtina 0
1Britney
3
Xtina 2
Xtina
Britney
0
1Xtina
3
Britney 2
Britney
Xtina 0
1Britney
3
Xtina 2
Xtina
Britney
Symmetry
=
Here we mean whether L(A1) = L(A2)
We check this by checking that L(A1) \ L(A2) = L(A1) ∩ L(A2)c =
L(A1) • L(A2)c = Ø and L(A2) • L(A1)c = Ø
PerformanceOld MINERful Naïve Sym Par Super
One 129m
Easy 550m 26 s 19 s 16 s 12 s 2 s
Almost All
117 s 35 s 27 s 7 s
All 215 m 154 s 108 s 56 s
p
2
≤ 2
≤ 3
≤ 5
7.224.024 3.612.012
checks
66.749.981.760 556.249.848
checks 158.928.528 26.488.088 checks
Prefix Sharing
0
1Britney
2
Happy
Bored, Britney, Happy
Bored, Beer, HappyWe replay the
“Bored” transition twice…
Prefix Sharing
0
1Britney
2
Happy
Bored, Britney, Happy
Beer, Happy
0 1 1
0 2
PerformanceOld MINERful Naïve Sym Prefix Super
One 129m
Easy 550m 26 s 19 s 16 s 43 s 2 s
Almost All
117 s 35 s 112 s 7 s
All 215 m 154 s 437 s 56 s
While we do get sharing (factor 4), checking becomes
more expensive
ParallelizationMine each constraint
on each core
Split log and mine each fragment on each core
Parallelization
||
Parallelization
||
Parallelization
Mine constraints in parallel Mine traces in parallel
Good for many constraints Good for many traces
Assumes constraints are similar Assumes traces are similar
Requires models are composable Requires results are composable
|| ||
True for our approach as we just count matching traces Better
PerformanceOld MINERful Naïve Sym Par Super
One 129m
Easy 550m 26 s 19 s 16 s 12 s 2 s
Almost All
117 s 35 s 27 s 7 s
All 215 m 154 s 108 s 56 s
Just 2 computation
threads
Superscalar Mining
Superscalar CPUs (90s): improve speed by executing multiple instructions at once
Superscalar mining: improve speed by mining multiple constraints at once
Superscalar Mining
0
1Britney
2
Happy0 1BoredBritney
B H
Superscalar Mining
0
1Britney
2
Happy0 1BoredBritney
BH
H•
0BH
1bH
Bored
2BH
Britney
4B
Happy
Britney
5bHappy
3bHBored
Britney
Bored
Britney
Superscalar Mining
0BH
1bH
Bored
2BH
Britney
4B
Happy
Britney
5bHappy
3bHBored
Britney
Bored
Britney
Bored, Britney, Happy
Superscalar Mining
We do not want to split up symmetry groups (speed)
We do not want to make the automaton too large (memory)
Superscalar Mining
Given two sets of constraints, A and B
A can be merged into B if
We do not break any symmetry groups doing so
All constraints in A are special cases of all constraints in B
Superscalar Mining
Given two sets of constraints, A and B
A can be merged into B if
We do not break any symmetry groups doing so
All constraints in A are special cases of all constraints in B
for all SA of A there is a SB of B such that SA ⊆ SB, and
for all S of A withS ∩ SB ≠ ø, we have S = SA
for all CA of A there is a CB of B so that CA ⇒ CB
Note: This is not
symmetrical! It captures that
it is ok to add something
easy to something hard, but
not the other way around
Superscalar Mining[[A, B, C, D, E]]
exclusive choice[[A, B]]
not co-existence[[A, B]]
choice 1 of 2[[A, B]]
exclusive choice 2 of 3[[A, B, C]]
choice 2 of 3[[A, B, C]]
absence[[A]]
absence2[[A]]
exclusive choice 1 of 3[[A, B, C]]
choice 1 of 3[[A, B, C]]
init[[A]]
strong init[[A]]
existence[[A]]
existence3[[A]]
existence2[[A]]
absence3[[A]]
exactly1[[A]]
choice 1 of 5[[A, B, C, D, E]]
exactly2[[A]]
choice 1 of 4[[A, B, C, D]]
[[A, B, C, D, E]]
exclusive choice[[A, B]]
not co-existence[[A, B]]
choice 1 of 2[[A, B]]
exclusive choice 2 of 3[[A, B, C]]
choice 2 of 3[[A, B, C]]
absence[[A]]
absence2[[A]]
exclusive choice 1 of 3[[A, B, C]]
choice 1 of 3[[A, B, C]]
init[[A]]
strong init[[A]]
existence[[A]]
existence3[[A]]
existence2[[A]]
absence3[[A]]
exactly1[[A]]
exactly2[[A]]
choice 1 of 4[[A, B, C, D]]
Superscalar Mining
for all SA of A there is a SB of B such that SA ⊆ SB, and
for all S of A withS ∩ SB ≠ ø, we have S = SA
for all CA of A there is a CB of B so that CA ⇒ CB
Superscalar Mining[[A], [B]]
alternate[[A], [B]]
succession[[A], [B]]
precedence[[A], [B]]
response[[A], [B]]
alternate response[[A], [B]]
chain response[[A], [B]]
not chain succession[[A], [B]]
absence[[A]]
not succession[[A], [B]]
absence2[[A]]
init[[A]]
strong init[[A]]
existence[[A]]
existence3[[A]]
existence2[[A]]
responded existence[[A], [B]]
alternate succession[[A], [B]]
chain succession[[A], [B]]
absence3[[A]]
exactly1[[A]]
exactly2[[A]]
alternate precedence[[A], [B]]
[[A], [B]]
alternate[[A], [B]]
succession[[A], [B]]
precedence[[A], [B]]
response[[A], [B]]
alternate response[[A], [B]]
chain response[[A], [B]]
not chain succession[[A], [B]]
absence[[A]]
not succession[[A], [B]]
absence2[[A]]
init[[A]]
strong init[[A]]
existence[[A]]
existence3[[A]]
existence2[[A]]
responded existence[[A], [B]]
alternate succession[[A], [B]]
chain succession[[A], [B]]
absence3[[A]]
exactly1[[A]]
exactly2[[A]]
alternate precedence[[A], [B]]
Superscalar Mining
for all SA of A there is a SB of B such that SA ⊆ SB, and
for all S of A with S ∩ SB ≠ ø, we have S = SA
for all CA of A there is a CB of B so that CA ⇒ CB
Superscalar Mining
We can check 34 Declare constraints using just 3 checks:
20 “choices”: 53 states, SG {{A, B, C, D, E}}
22 “orders”: 112 states, SG {{A}, {B}}
2 “outliers”: 7 states, SG {{A}, {B}}
PerformanceOld MINERful Naïve Sym Par Super
One 129m
Easy 550m 26 s 19 s 16 s 12 s 2 s
Almost All
117 s 35 s 27 s 7 s
All 215 m 154 s 108 s 56 s
Using 8 (slower) CPUs, we get below
30 seconds
PerformanceOld MINERful Naïve Sym Par Super
One 129m
Easy 550m 26 s 19 s 16 s 12 s 2 s
Almost All
117 s 35 s 27 s 7 s
All 215 m 154 s 108 s 56 s
If the old ProM miner has as good progression as our naïve miner, it would
take just under 259 days to mine all constraints; if it’s as good as our best, it
would take more than 10 days
Support
Classically: Detect
this using complex
formula operations
SupportClassically: Detect this using complex formula operations The idea of the formula
operations is to prevent certain sub-behavior and
see if the constraint becomes trivially true
Declare doesn’t have non-trivial sub-
behavior, so this just corresponds to
preventing a task
Support
&
&
&
=
=
=
true?
true?
true?
))
Support
&
&
&
=
=
=
Ø?
Ø?
Ø?
c
c
c
(((
)
))
Support
&
&
&
=
=
=
Ø
Ø
Ø
c
c
c
(((
)
)
Support
&
| = true( |
)
Support
&
| = ?( |
c
This expression is our measurement of support for the
not co-existence constraint
)
Support
&
| = ?(c
This expression is our measurement of support for the
precedence constraint
)
Support
&
| = ?(c
This expression is our measurement of support for the
response constraint
The reason for having exclusions of more than one
constraint is disjunctive splits
?
Support
)&
| = true( |
Conjunction with the support expression is
always true ⇒ positive support
&
)( = false
Conjunction with the support expression is always false ⇒
negative support
Support
)&
| = true( |
&
)( = false
“Holds against all odds”
“Never had a chance to hold”
This is by the way very related to
apriori reduction
SupportOur notion of support coincides with previous
Previous notion and ours induce a notion of confidence that is identical when previous is defined
Our is computed automatically and works for all constraints
ComparisonProM Miner MINERful Unconstrained
Miner
Speed Slow!! Fast Faster
Extensibility Good None Better
Confidence choices
Good Good Medium
Confidence generality
None None Full
ComparisonProM Miner MINERful Unconstrained
Miner
Speed Slow!! Fast Faster
Extensibility Good None Better
Confidence choices
Good Good Medium
Confidence generality
None None Full
The reason is that some choices of confidence do
not make sense in general
Use CaseX X
A
B+ +
D
E
C
Assuming this block structure…
Events of block A and block B can never happen together
Events of block B can never happen after events of block C
not co-existence(A, B) not succession(C, B)
Use Cases
X X
A
B+ +
D
E
C
Output: log with 8986 traces and 20 event classes
Use Case
Use Case
Use Case 10.132 constraint instances
Use Case Sort by name and parameters (to group similar constraints)
Use CaseAggregate results fornot co-existence and
not succession
Use Case We clearly recognize the
blocks
…except for parallelism
Use Case
A xor B, C, D, E
X X
A
B+ +
D
E
C
B → C, D, E
C → D, E
Use Case
A xor B, C, D, E
X X
A
B+ +
D
E
C
B → C, D, E
C → D, E
What about loops?
Use Case
Support and location in the hierarchy
is computed automatically
Use Caseweak relations
a or b
bottoma | b
a -> b a <- b
a xor b
!a or b
top!a | b
!a -> b !a <- b
!a xor b
Use Caseweak relations
a or b
choice 1 of 2 choice 1 of 3
bottoma | b
a -> b
!loop
a <- b
not successionnot chain succession
a xor b
not co-existence
!a or b
!choice 1 of 2 !choice 2 of 3!exactly1
!exactly2
chain succession !strong init
top!a | b
last
loop
!precedence
!not chain succession
!responded existence!alternate
exactly1
!choice 1 of 5
exactly2
strong init
exclusive choice 1 of 3
exclusive choice 2 of 3
!a -> b
!a xor b
!a <- b
!not succession
!not co-existence choice 2 of 3
responded existence
!chain succession
!last
precedence!exclusive choice 1 of 3
!exclusive choice 2 of 3
alternate
choice 1 of 5
!choice 1 of 3
Use Case
weak relations
a or b
choice 1 of 2 choice 1 of 3
bottoma | b
a -> b
!loop
a <- b
not successionnot chain succession
a xor b
not co-existence
!a or b
!choice 1 of 2 !choice 2 of 3!exactly1
!exactly2
chain succession !strong init
top!a | b
last
loop
!precedence
!not chain succession
!responded existence!alternate
exactly1
!choice 1 of 5
exactly2
strong init
exclusive choice 1 of 3
exclusive choice 2 of 3
!a -> b
!a xor b
!a <- b
!not succession
!not co-existence choice 2 of 3
responded existence
!chain succession
!last
precedence!exclusive choice 1 of 3
!exclusive choice 2 of 3
alternate
choice 1 of 5
!choice 1 of 3
Use Caseweak relations
a or b
choice 1 of 2 choice 1 of 3
bottoma | b
a -> b
!loop
a <- b
not successionnot chain succession
a xor b
not co-existence
!a or b
!choice 1 of 2 !choice 2 of 3!exactly1
!exactly2
chain succession !strong init
top!a | b
last
loop
!precedence
!not chain succession
!responded existence!alternate
exactly1
!choice 1 of 5
exactly2
strong init
exclusive choice 1 of 3
exclusive choice 2 of 3
!a -> b
!a xor b
!a <- b
!not succession
!not co-existence choice 2 of 3
responded existence
!chain succession
!last
precedence!exclusive choice 1 of 3
!exclusive choice 2 of 3
alternate
choice 1 of 5
!choice 1 of 3
Use Caseall relations
A
a or b
!a xor B
existence
B
!A xor b
choice 1 of 3 choice 1 of 2
bottoma | b
a -> b
!A <- B
!loop
a -> B
response
A -> b
precedence
A -> B
!a <- b
succession
a <- b
!A -> B
not chain succession
not succession
a <- B
!precedence
A <- b
!response
A <- B
!a -> b
a xor b
not co-existence
a xor B
!A
A xor b
!B
!responded existence
!A -> b!exactly2 chain response!strong init !exactly1!existence !a -> Bchain precedence
!a or b
chain succession!choice 2 of 3 !choice 1 of 2
top!a | b
exactly2
loop
exclusive choice 2 of 3 !not chain succession
!alternate
last
strong init
!choice 1 of 5
exactly1
exclusive choice 1 of 3
!a xor b
!a <- B!A <- b
!not succession
choice 2 of 3 !not co-existence
responded existence
!chain succession
!chain precedence
alternate
!chain response!succession
!last!exclusive choice 2 of 3
!exclusive choice 1 of 3
choice 1 of 5
!choice 1 of 3
Use Case alpha relations
bottoma | b
a -> b a <- b
topa # b
Use Cases
Use CasesA B C D E
A [-‐>] [-‐>][?(<-‐), ?(#),?(-‐>), ?(||)] [-‐>]
B [<-‐] [||] [-‐>][?(<-‐), ?(#),?(-‐>), ?(||)]
C [<-‐] [||] [-‐>][?(<-‐), ?(#),?(-‐>), ?(||)]
D [?(<-‐), ?(#),?(-‐>), ?(||)] [<-‐] [<-‐] [<-‐]
E [<-‐][?(<-‐), ?(#),?(-‐>), ?(||)]
[?(<-‐), ?(#),?(-‐>), ?(||)] [-‐>]
We just implemented the
Alpha algorithm!
Use Cases
Use Cases
ConclusionEfficient implementation
Tricks: symmetry, parallelism, superscalarity
General concept of supportAnything other
miners can produce,
we can produce…and we are fully
extensible!
Future Work
!
Better recognition of modules
Use of model-checking techniques
Directed model-checking for LTL
ConclusionEfficient implementation
Tricks: symmetry, parallelism, superscalarity
General concept of supportAnything other
miners can produce,
we can produce…and we are fully
extensible!