Business Process Drift: Detection and Characterization
Alireza Ostovar
BSc. (Software engineering), MSc. (Software engineering)
A dissertation submitted for the degree of
IF49 Doctor of Philosophy
Principal Supervisor:
Prof. Marcello La Rosa (The University of Melbourne)
Associate Supervisors:
Prof. Arthur ter Hofstede (Queensland University of Technology),
Dr. Abderrahmane Maaradji (Algiers University 1)
Business Process Management Discipline
Information Systems School
Science and Engineering Faculty
Queensland University of Technology (QUT)
GPO Box 2434, Brisbane QLD 4001, Australia
2019
Keywords
Business process management, process mining, event log, event stream, process drift,
concept drift, data mining.
Abstract
Business processes tend to evolve in response to changes in the business environment
in which they operate. For example, these can be changes in regulations, competi-
tion, supply, demand and technological capabilities as well as internal changes in re-
source capacity or workload, or simply changes due to seasonal factors. Some process
changes are planned ahead and documented, while others may occur unexpectedly and
remain unnoticed. For example, this may be the case of changes induced by the ini-
tiative of individual process workers in order to adjust to variations in workload or in
resource capacity, changes engendered by replacement of human resources, changes in
the frequency of certain types of (problematic) cases, or exceptions that in some cases
lead to new workarounds that over time solidify into norms. Over time, undocumented
process changes like those described above may affect process performance, and more
generally hamper process improvement initiatives.
The objective of this research is to develop a set of methods for the early detection
and characterization of process drifts, i.e. statistically significant changes in the be-
havior of business processes, as recorded in event streams. The main contributions of
this research are: i) an automated method for detecting process drifts in real time from
event streams; ii) an automated method for characterizing process drifts at the level of
individual activities from event streams; and iii) an automated method for characteriz-
ing process drifts at the level of fragments from event streams.
Early detection and subsequent characterization of process drifts allow organiza-
tions to take prompt remedial actions and avoid potential repercussions resulting from
unplanned changes in the behavior of their processes. The methods devised in this
research have been implemented as a plug-in for the state-of-the-art, open-source pro-
cess analytics platform Apromore. Using this implementation, the proposed methods
have been extensively evaluated by conducting experiments with artificial and real-life
data sets.
Contents
Keywords i
Abstract iii
List of Figures ix
List of Tables xiii
List of Abbreviations xv
Statement of Original Authorship xv
Acknowledgments xvii
1 Introduction 1
1.1 Problem Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.1 Research Questions . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.2 Solution Criteria . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.3 Research Benefits . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Research Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Research Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5 Research Publications . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.6 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2 Background 15
2.1 Business Process Management . . . . . . . . . . . . . . . . . . . . . 15
2.1.1 Business Process Model and Notation (BPMN) . . . . . . . . 16
2.1.2 Petri nets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.1 Concept Drift Detection . . . . . . . . . . . . . . . . . . . . 19
2.2.2 Taxonomy of Concept Drift Detection Mechanisms . . . . . . 20
2.2.3 Concept Drift Characterization . . . . . . . . . . . . . . . . . 22
2.3 Process Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3.1 Event Log and Event Stream . . . . . . . . . . . . . . . . . . 24
2.3.2 Business Process Drift . . . . . . . . . . . . . . . . . . . . . 25
3 Process Drift Detection 29
3.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2 Drift Detection Method . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2.1 Intra-trace vs Inter-trace . . . . . . . . . . . . . . . . . . . . 33
3.2.2 α+ Relations . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.2.3 Statistical Testing over Event Streams . . . . . . . . . . . . . 34
3.2.4 Adaptive Window . . . . . . . . . . . . . . . . . . . . . . . 35
3.2.5 Noise handling . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.3 Tool Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.4 Evaluation on Artificial Logs . . . . . . . . . . . . . . . . . . . . . . 38
3.4.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.4.2 Execution Times . . . . . . . . . . . . . . . . . . . . . . . . 43
3.4.3 Impact of Oscillation Filter . . . . . . . . . . . . . . . . . . . 43
3.4.4 Inter-drift Distance . . . . . . . . . . . . . . . . . . . . . . . 44
3.4.5 Comparison with Baseline per Process Change Pattern . . . . 44
3.4.6 Comparison with Baseline over Different Log Variability Rates 45
3.5 Evaluation on Real-life Log . . . . . . . . . . . . . . . . . . . . . . . 46
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4 Process Drift Characterization at Activity Level 49
4.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.2 Drift Characterization Method . . . . . . . . . . . . . . . . . . . . . 51
4.2.1 Preprocessing: Data Points Extraction . . . . . . . . . . . . . 52
4.2.2 Stage 1: Relevant Binary Relations Retrieval and Ordering . . 54
4.2.3 Stage 2: Change Templates Identification . . . . . . . . . . . 55
4.3 Tool Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.4 Evaluation on Artificial Logs . . . . . . . . . . . . . . . . . . . . . . 62
4.4.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
PHD THESIS - © 2019 ALIREZA OSTOVAR - PAGE VI
4.4.2 Impact of Characterization Delay on Relations Ordering . . . 65
4.4.3 Impact of Relation Filtering on Characterization Accuracy . . 66
4.4.4 Comparison with Baseline . . . . . . . . . . . . . . . . . . . 67
4.5 Evaluation on Real-life Log . . . . . . . . . . . . . . . . . . . . . . . 68
4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5 Process Drift Characterization at Fragment Level 71
5.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.3 Partial Traces and Process Tree Discovery . . . . . . . . . . . . . . . 79
5.3.1 Detecting Partial Traces . . . . . . . . . . . . . . . . . . . . 80
5.3.2 Discovering Process Models from Partial Traces . . . . . . . 81
5.4 Process Tree Transformation . . . . . . . . . . . . . . . . . . . . . . 83
5.4.1 Process Tree Edit Operations . . . . . . . . . . . . . . . . . . 84
5.4.2 Finding Process Tree Mappings & Lower Bounding Function 101
5.5 Construct Drift Characterization Statements . . . . . . . . . . . . . . 111
5.5.1 Simple Change Patterns . . . . . . . . . . . . . . . . . . . . 112
5.5.2 Compound Change Patterns . . . . . . . . . . . . . . . . . . 113
5.5.3 Nested Changes . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.5.4 Unsupported patterns . . . . . . . . . . . . . . . . . . . . . . 119
5.6 Tool Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
5.7 Evaluation on Artificial Logs . . . . . . . . . . . . . . . . . . . . . . 122
5.7.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
5.7.2 Accuracy of Drift Characterization: Fragment-based vs Activity-based . . . . . 123
5.7.3 Verbalization Conciseness: Fragment-based vs Activity-based 128
5.7.4 Verbalization Conciseness: Exhaustive vs Greedy . . . . . . . 130
5.7.5 Time Performance . . . . . . . . . . . . . . . . . . . . . . . 132
5.8 Evaluation on Real-life Logs . . . . . . . . . . . . . . . . . . . . . . 134
5.9 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
5.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
6 Conclusion 143
A. Notation 149
Bibliography 151
List of Figures
2.1 BPM lifecycle [32]. . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 Subset of core BPMN elements. . . . . . . . . . . . . . . . . . . . . 17
2.3 Core Petri net elements. . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4 Mapping of activities, events and gateways to Petri nets. . . . . . . . . 18
2.5 Quality metrics for process discovery algorithms [15]. . . . . . . . . . 24
2.6 Visual example of a small portion of an event stream. Each square box
represents an event. Case ids are color-coded (i.e. each case id has a
unique background color) and labels in boxes indicate activity labels.
The top row of events represents the entire event stream portion, the
remaining rows show the individual cases constituting the stream. . . 25
2.7 Example of a directly follows graph. . . . . . . . . . . . . . . . . . . 26
2.8 Different classes of drifts. Y-axes indicate process variants and blue
rectangles represent process instances. . . . . . . . . . . . . . . . . . 28
3.1 Drift detection using ProDrift plug-in within Apromore. . . . . . . . . 39
3.2 Artificial process model created in CPN tools, used as a base model to
simulate the artificial event logs. . . . . . . . . . . . . . . . . . . . . 41
3.3 F-score and mean delay using different oscillation filter values. . . . . 43
3.4 F-score and mean delay using different inter-drift distances. . . . . . . 43
3.5 F-score and mean delay per change pattern, obtained with our method
vs. [77]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.6 F-score and mean delay per log variability, obtained with our method
vs. [77]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.7 P-value in our method (left) and in the baseline (right) for the BPIC
2011 log. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.8 Number of events (left) and active cases per month (right) in the BPIC
2011 log. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.1 Overview of our method for process drift characterization. . . . . . . 52
4.2 From drift detection to drift characterization. . . . . . . . . . . . . . . 53
4.3 Parallelize activities template (T_pl) . . . . . . . . . . . . . . . . . . 57
4.4 Remove activity template (T_sre) . . . . . . . . . . . . . . . . . . . 57
4.5 Drift characterization at activity level using the ProDrift plug-in in
Apromore. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.6 Impact of characterization delay on relevant relations retrieval and or-
dering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.7 Impact of relation filtering on characterization accuracy . . . . . . . . 67
4.8 F-score per change template, obtained with our method vs. [115]. . . 68
4.9 Identified template for Drift 1 in BPIC 2011 log. . . . . . . . . . . . . 69
5.1 Overview of our method for process drift characterization. . . . . . . 79
5.2 Example of partial traces. In the window, some traces are observed
partially, as they start and/or end outside of the window. In our exam-
ple, the first and the last trace are only partially observed. . . . . . . . 80
5.3 Two directly follows graphs for L1. . . . . . . . . . . . . . . . . . . . 82
5.4 Examples of process tree edit operations. . . . . . . . . . . . . . . . . 89
5.5 Sample mapping between process trees P and P′. . . . . . . . . . . . 92
5.6 Examples of deleted and inserted fragments in a mapping between
process trees P and P′. Fragment 1 is a maximal deleted fragment,
whereas Fragment 2 is a maximal inserted fragment. . . . . . . . . . 95
5.7 Sample auxiliary operator node, i.e. the ∧-node 2 in P′, and sample
auxiliary τ-node, i.e. the τ-node 5 in P, in a mapping between process
trees P and P′. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.8 Examples of trivial operator nodes in mappings. . . . . . . . . . . . . 97
5.9 Sample mapping that satisfies condition 1. . . . . . . . . . . . . . . . 99
5.10 Sample invalid mapping that does not satisfy condition 2. . . . . . . . 99
5.11 Sample mapping that satisfies condition 3. . . . . . . . . . . . . . . . 100
5.12 Process trees P and P′ (a) and their mapping search tree (b) in Exam-
ple 22. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.13 Process trees P and P′ in Example 23. . . . . . . . . . . . . . . . . . 106
5.14 The running example of the A* algorithm for the process trees P and
P′ in Figure 5.13. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.15 Examples of transforming a process tree P into a process tree P′ by the
application of simple changes. . . . . . . . . . . . . . . . . . . . . . 113
5.16 Examples of transforming a process tree P into a process tree P′ by the
application of compound changes. . . . . . . . . . . . . . . . . . . . 116
5.17 Example of transforming a process tree P into a process tree P′ by the
application of overlapping changes. . . . . . . . . . . . . . . . . . . 117
5.18 Example of transforming a process tree P into a process tree P′ by the
application of nested changes. . . . . . . . . . . . . . . . . . . . . . 117
5.19 Example of synchronizing two fragments change pattern. Activity ‘b’
and activity ‘c’ are synchronized. . . . . . . . . . . . . . . . . . . . 120
5.20 Drift characterization at fragment level using the ProDrift plug-in in
Apromore. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.21 Average F-score over all logs with different noise ratios per fragment
size, obtained with the fragment-based characterization method vs. the
activity-based characterization method . . . . . . . . . . . . . . . . . 126
5.22 Average F-score over all fragment sizes per single, composite, and
nested change pattern, obtained with the fragment-based characteri-
zation method vs. the activity-based characterization method. . . . . . 127
5.23 Average F-score for singleton fragments per single, composite and
nested change pattern, obtained with the fragment-based characteri-
zation method vs the activity-based characterization method. . . . . . 129
5.24 Average number of statements over all fragment sizes required by our
fragment-based characterization method vs. our activity-based charac-
terization method for characterizing each change pattern. . . . . . . . 130
5.25 Two sample transformations of process tree P into process tree P′. . . 131
5.26 Average number of statements over all fragment sizes per change pat-
tern reported by our fragment-based characterization method using the
A* algorithm vs the greedy algorithm. . . . . . . . . . . . . . . . . . 133
5.27 Transformation of pre-drift process tree to post-drift process tree over
the first drift in the ticketing management process. . . . . . . . . . . . 135
5.28 Directly follows graphs of the ticketing management process before
and after the first drift. . . . . . . . . . . . . . . . . . . . . . . . . . 135
5.29 Transformation of pre-drift process tree to post-drift process tree over
the second drift in the ticketing management process. . . . . . . . . . 136
5.30 Directly follows graphs of ticketing management process before and
after the second drift. . . . . . . . . . . . . . . . . . . . . . . . . . . 137
5.31 Transformation of pre-drift process tree to post-drift process tree over
the drift in the claim handling process. . . . . . . . . . . . . . . . . . 138
5.32 Example of drift characterization in a partially unstructured process. . 140
List of Tables
2.1 Common control-flow change patterns in business processes from [129]. 27
3.1 Change patterns from [129] . . . . . . . . . . . . . . . . . . . . . . . 42
4.1 Change templates defined based on change patterns in [129]. . . . . . 55
4.2 Change templates and their drift characterization statement formats. . 61
5.1 Change patterns from [129] and their relation to our process tree edit
operations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.2 Costs associated with the process tree edit operations. . . . . . . . . . 90
5.3 Process tree edit operations (cf. Section 16) and their representations
in a mapping. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.4 Change patterns from [129] and their drift characterization statement
formats. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
List of Abbreviations
BPM – Business Process Management
WfM – Workflow Management
IM – Inductive Miner
SESE – Single-entry Single-exit
BPMN – Business Process Model and Notation
EPC – Event-driven Process Chain
UML – Unified Modeling Language
CPN – Colored Petri Nets
SPRT – Sequential Probability Ratio Test
CUSUM – Cumulative Sum
PH – Page-Hinkley Test
EWMA – Exponentially Weighted Moving Average
SPC – Statistical Process Control
KSPT – K-sample Permutation Test
RFC – Relative Frequency Change
TRFC – Total Relative Frequency Change
CRFC – Cumulative Relative Frequency Change
DCG – Discounted Cumulative Gain
nDCG – Normalized Discounted Cumulative Gain
nC – Normalized Confidence
CA – Common Ancestor
LCA – Lowest Common Ancestor
LMAs – Lowest Mapped Ancestors
Acknowledgments
This research is a part of and funded by the “Improving Business Decision-Making
via Liquid Process Model Collections” project (ARC DP150103356). We used an
anonymised data set from an Australian insurance company in our experiments, and
validated the obtained results by conducting an interview with a process analyst from
the same company, for which ethical clearance was obtained (approval number: 1800000366).
Chapter 1
Introduction
1.1 Problem Area
Business processes are generally supported by information systems, such as Enterprise
Resource Planning systems, Content Management Systems, Customer Relationship
Management systems, Database Management Systems or e-mail servers, that record
data about each individual execution of a process in the form of event logs [116]. Pro-
cess mining [119] aims at turning such event data into valuable, actionable knowledge,
so that process performance or compliance issues can be identified and rectified. Sev-
eral process mining techniques are available. For example, techniques for discovering
a process model from a historical event log, or techniques for predicting future prop-
erties, e.g. remaining time or outcome, of ongoing process cases from a live stream of
events.
Business processes are prone to evolution in response to various types of changes
in the business environment in which they operate. For example, these can be changes
in regulations, competition, supply, demand and technological capabilities, as well as
internal changes in resource capacity or workload, or simply changes in seasonal factors.
It is clear that the success of an organization relies heavily on its ability to promptly
and effectively respond to its changing business environment. Therefore, flexibility
and change have been widely studied in the context of Business Process Management
(BPM) [99, 131, 92, 42, 44, 105] and Workflow Management (WfM) [49, 49, 121,
3, 4, 100]. State-of-the-art BPM and WfM systems provide flexibility. Furthermore,
there is even more flexibility in processes controlled by people than in those driven by
BPM/WfM systems.
Some process changes are intentional and planned ahead, while others may occur
without being noticed or documented. For example, this may be the case of changes
resulting from ad-hoc workarounds initiated by individuals in emergency situations,
changes that are due to the replacement of human resources, or exceptions that in
some cases give rise to new workarounds that over time turn into norms. Over time,
these hidden changes may negatively impact process performance, and more generally
hinder process improvement initiatives.
In this setting, process analysts and managers require methods and tools that allow
them to detect process changes and pinpoint the time periods at which they occurred
as early as possible. Business process drift detection [18, 1, 12, 82, 77] is a family of
process mining techniques which aim at detecting changes based on observations of
business process executions recorded in event logs. Event logs consist of traces, each
representing one execution of the business process. The term “drift” originates from
the concept drift phenomenon in data mining, where it refers to changes in the relation
between the input and the target variables induced by contextual shifts over time [39].
Accordingly, a business process drift is defined as a (statistically) significant change in
the process behavior (concept) [11, 77].
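To make this definition concrete, the following is a minimal sketch of how a statistically significant change between two adjacent windows of traces could be tested. It assumes that each window is summarized by its directly-follows counts and that significance is assessed with a permutation test. The function names, the chi-square statistic and the example threshold of 0.05 are illustrative assumptions and do not reproduce the statistical test used by the methods in this thesis.

```python
import random
from collections import Counter


def directly_follows(traces):
    """Count directly-follows pairs (a, b) over a list of traces."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts


def chi_square(ref, det):
    """Pearson chi-square statistic over the 2 x k table of pair counts."""
    n_ref, n_det = sum(ref.values()), sum(det.values())
    if n_ref == 0 or n_det == 0:
        return 0.0
    total = n_ref + n_det
    stat = 0.0
    for pair in set(ref) | set(det):
        column = ref[pair] + det[pair]
        for observed, row in ((ref[pair], n_ref), (det[pair], n_det)):
            expected = row * column / total
            stat += (observed - expected) ** 2 / expected
    return stat


def drift_p_value(reference, detection, permutations=200, seed=0):
    """Permutation p-value for 'both windows show the same behavior'."""
    rng = random.Random(seed)
    observed = chi_square(directly_follows(reference), directly_follows(detection))
    pooled = list(reference) + list(detection)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(pooled)
        left, right = pooled[:len(reference)], pooled[len(reference):]
        if chi_square(directly_follows(left), directly_follows(right)) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (permutations + 1)


before = [("a", "b", "c")] * 30   # hypothetical pre-drift window
after = [("a", "c", "b")] * 30    # hypothetical post-drift window
print(drift_p_value(before, after) < 0.05)
```

A small p-value suggests that the difference between the two windows is unlikely to be due to chance alone. A permutation test is used here only because it needs no distributional assumptions and no external library.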
To successfully implement a process improvement initiative, following the detec-
tion of a drift, it is equally important to understand what has changed in the process
behavior, a.k.a. drift characterization. The latter aims to pinpoint the location of the
change in the process as well as provide explanations on the manner in which the
change has occurred. Examples of drift characterization in the context of a loan
application process are “activity ‘verify repayment agreement’ that was always executed
before the drift is sometimes skipped after the drift”, and “process fragment ‘check
credit history’ followed by ‘assess loan risk’ and activity ‘appraise property’ that were
executed in parallel before the drift are executed sequentially after the drift”.
Drift detection and characterization provide the basis for further qualitative and
quantitative process analysis, e.g. root cause analysis or flow analysis, and more gen-
erally may contribute to the success of a process improvement initiative. For example,
early awareness of unexpected process changes enables an organization to take timely
corrective measures and avoid potential repercussions resulting from such changes.
Drift detection and characterization also work as an enabler for other process mining
techniques, as they enable them to select the last “stable” process behavior since the
last drift. An example of the latter is the case of predictive process monitoring tech-
niques whose accuracy is highly dependent on the currency of their underlying predic-
tive models which may be impaired by the occurrence of a drift. This can be avoided
by detecting process drifts and subsequently updating predictive models underlying
these techniques with process changes [81] underpinning the drift.
1.2 Problem Statement
State-of-the-art methods in the area of process drift detection suffer from two major
limitations. First, they do not work in online settings with streams of events that in-
crementally record the executions of a business process. As such, they are designed to
detect inter-trace drifts only, i.e. drifts that occur between complete process executions
(traces), as recorded in event logs. For example, new legislation may require an insurance
company to perform a more stringent verification on new claims, while old claims are
exempted. Even if some approaches work in online settings (e.g. [77]), they still deal
with streams of complete traces or abstractions thereof. However, process drift may
also occur during the execution of a process, and may impact ongoing executions. For
example, an insurance check may need to be removed altogether due to a contingency
plan triggered by severe weather conditions (e.g. a flood). As such, detecting process
changes as they are happening would enable organizations to take quick remedial ac-
tions and prevent or mitigate repercussions. Existing methods either do not detect such
intra-trace drifts, or detect them with a long delay, as they need to wait for the trace to
complete. A related problem is that they do not perform well with highly-variable pro-
cesses, i.e. processes whose logs exhibit a high number of distinct traces over the total
number of traces – a typical characteristic of healthcare logs. This is because these
methods rely on statistical tests over trace distributions, which may not have sufficient
data samples when the proportion of distinct traces over the total number of traces is
very high, in other words, where there is high variability in the log.
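As a small illustration of the variability notion used above, the sketch below computes the ratio of distinct traces to the total number of traces in a log. The function name and the toy logs are hypothetical and serve only to show the arithmetic.

```python
def variability(traces):
    """Ratio of distinct traces to total traces (0.0 for an empty log)."""
    traces = [tuple(trace) for trace in traces]
    return len(set(traces)) / len(traces) if traces else 0.0


# Hypothetical logs: a structured process and a highly variable one.
structured = [["a", "b", "c"]] * 3 + [["a", "c", "b"]]
variable = [["a", "b"], ["a", "c"], ["b", "a"], ["c", "a"]]

print(variability(structured))  # 0.5
print(variability(variable))    # 1.0
```

When the ratio approaches 1, almost every trace is its own variant, so a test over the distribution of whole traces sees close to one observation per variant.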
Moreover, the existing methods for drift detection only focus on detecting and
pinpointing the location of a drift. However, detection and localization of a process
drift do not provide, per se, enough insights to undertake a process improvement
initiative, unless the drift is characterized, i.e. unless one can understand what has
changed in the process behavior. To the best of our knowledge, there has not been any
attempt to provide a systematic solution for characterizing process drifts. However,
there are a few ways to approach a drift characterization problem. A possible approach
to characterizing drifts at the level of individual activities is to compare sub-logs from
before and after a drift using an existing log-to-log comparison technique, such as the
one in [115]. However, these techniques report all differences between the pre-drift
and post-drift process behaviors regardless of the significance of their association with
the occurrence of the drift. Furthermore, these techniques are designed to compare
logs of complete traces. As such, they do not work with event streams, where a sub-
log from before or after a drift contains partial traces, i.e. traces whose start events
are deleted from the stream and/or whose end events are yet to arrive on the stream.
Given the start and end activities of a process, one possible workaround is to only use
complete traces within the pre-drift and post-drift sub-logs. However, this may lead
to an incomplete or even inaccurate comparison as the sub-logs may miss fractions of
process behavior that are only captured by the discarded partial traces. This problem
is exacerbated in the logs of highly-variable processes, as in those logs almost every
trace exhibits a unique execution of the process.
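The workaround described above can be sketched as follows. The sketch assumes that an event is a (case id, activity) pair, that the sets of start and end activities are known, and that a trace counts as complete within a window when it begins with a start activity and ends with an end activity. The function name and the sample window are hypothetical.

```python
from collections import defaultdict


def split_traces(window, start_activities, end_activities):
    """Group events by case id and split traces into complete and partial."""
    traces = defaultdict(list)
    for case_id, activity in window:
        traces[case_id].append(activity)
    complete, partial = {}, {}
    for case_id, trace in traces.items():
        is_complete = trace[0] in start_activities and trace[-1] in end_activities
        (complete if is_complete else partial)[case_id] = trace
    return complete, partial


# Hypothetical window over an event stream: case 1 started before the window,
# case 3 has not finished yet, and only case 2 is fully observed.
window = [(1, "b"), (1, "c"), (2, "a"), (2, "b"), (3, "a"), (2, "c"), (3, "b")]
complete, partial = split_traces(window, {"a"}, {"c"})
print(complete)  # {2: ['a', 'b', 'c']}
print(partial)   # {1: ['b', 'c'], 3: ['a', 'b']}
```

In this toy window, two of the three cases would be discarded if only complete traces were kept, which illustrates how much behavior such a filter can lose.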
Another challenge for drift characterization is to identify changes that impact larger
fragments of activities, e.g. deletion of a fragment of concurrent activities from the
process, or addition of a loop structure over a fragment of mutually exclusive activi-
ties. A possible approach here would be to abstract from low-level relations between
activities and discover process models from before and after a drift and identify their
differences by comparing them using a process model comparison technique, such as
the one in [8]. However, such techniques are designed to identify differences between
two process models at the level of individual activities. Consequently, when used to
characterize process changes at the level of fragments they tend to report a large num-
ber of activity-level differences, which are often difficult to track or understand. For
example, for a simple fragment-level change, where we parallelize two sequential frag-
ments, each consisting of 4 activities, the method by [8] reports 16 differences, each
describing the parallelization of two activities.
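The count of 16 in this example follows from pairing every activity of one fragment with every activity of the other (4 × 4). The sketch below reproduces that arithmetic and contrasts it with a single fragment-level statement. The statement wording and the function names are illustrative assumptions and do not reproduce the output format of [8] or of the methods in this thesis.

```python
from itertools import product


def activity_level_statements(first, second):
    """One statement per pair of activities drawn from the two fragments."""
    return [
        f"activity '{a}' and activity '{b}' are executed in parallel after the drift"
        for a, b in product(first, second)
    ]


def fragment_level_statement(first, second):
    """A single statement that treats each fragment as a whole."""
    return (
        f"fragment [{', '.join(first)}] and fragment [{', '.join(second)}] "
        "are executed in parallel after the drift"
    )


first = ["a", "b", "c", "d"]
second = ["e", "f", "g", "h"]
print(len(activity_level_statements(first, second)))  # 16
print(fragment_level_statement(first, second))
```

The number of activity-level statements grows with the product of the fragment sizes, whereas the fragment-level description stays at one statement.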
1.2.1 Research Questions
Based on the identified research problems in the previous section, we define three
research questions:
RQ1) How to detect a drift from an event stream or event log of a business process?
RQ2) How to characterize a process drift at activity level from an event stream or
event log of a business process?
RQ3) How to characterize a process drift at fragment level from an event stream or
event log of a business process?
1.2.2 Solution Criteria
To address the research questions defined in the previous section, we develop one drift
detection and two drift characterization methods. The drift detection method resulting
from this research will be evaluated using the following criteria:
• Accuracy
– Recall: The method should be able to find a high percentage of existing
drifts in the event stream [39]. Furthermore, the method should be capable
of detecting drifts caused by the application of all typical business process
change patterns [129], e.g. insertion, deletion, move and swap. This measure
can be computed on artificial event streams where drifts are known.
– Precision: The method should not identify any false drifts [39]. In other
words, it should be able to distinguish momentary changes, i.e. deviants,
in the behavior of processes from those of a more permanent nature, i.e.
process drifts. This measure describes the resilience of the method and can
be computed either on artificial event streams where the drifts are known
or on real-life event streams that have no drifts, in which case all detected
drifts are counted as false drifts.
• Detection delay: The method should be able to find drifts as early as possible in
the event stream [39], i.e. using only a small number of events from the post-drift
process behavior. This measure describes how much time would elapse before a
drift is detected and is usually expressed as an average value. It can be computed
on artificial event streams where the actual locations of the drifts are known.
• Real-time: The method should be able to find drifts in real time [139]. Our goal
is to create an online method that constantly monitors an event stream recording
the behavior of a process within an organization and detects process drifts.
• Automated: The method should be able to identify drifts with no manual inter-
vention [139]. As the method needs to be online it should not need any configu-
ration at any time.
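The accuracy and delay criteria above can be computed on an artificial event stream as sketched below. The sketch assumes that drift locations are event indices, and that a detection counts as a true positive when it is the earliest unmatched detection falling within a fixed number of events after an actual drift. This matching rule, the tolerance parameter and the function name are assumptions made for illustration.

```python
def evaluate_detections(actual, detected, tolerance):
    """Return (precision, recall, F-score, mean delay) for detected drifts."""
    detected = sorted(detected)
    used = set()
    delays = []
    for drift in sorted(actual):
        for index, position in enumerate(detected):
            if index not in used and drift <= position <= drift + tolerance:
                used.add(index)
                delays.append(position - drift)
                break
    true_positives = len(delays)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    mean_delay = sum(delays) / true_positives if true_positives else None
    return precision, recall, f_score, mean_delay


# Hypothetical stream with drifts at events 1000, 2000 and 3000.
precision, recall, f_score, mean_delay = evaluate_detections(
    actual=[1000, 2000, 3000], detected=[1040, 1500, 2060], tolerance=200
)
print(round(precision, 3), round(recall, 3), round(f_score, 3), mean_delay)
```

In this example, the detection at event 1500 is a false drift and the drift at event 3000 is missed, so precision and recall are both two thirds, and the two matched detections have a mean delay of 50 events.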
The drift characterization methods resulting from this research will be evaluated
using the following criteria:
• Accuracy: After a drift is detected it should be characterized, so that it provides
process stakeholders with a full picture of the change that has occurred. How-
ever, an incorrect identification of the change underpinning the drift does not
only confuse the users, but later on, it could also lead to incorrect process im-
provement decisions. An example of an inaccurate drift characterization could
be identifying the addition of a loop structure where in fact an activity is dupli-
cated.
• Completeness: Sometimes a process undergoes multiple different changes. A
complete drift characterization outlines all the changes that occurred in a concise and
orthogonal manner. In other words, each change should be described only once
and any two change descriptions should not overlap.
• Understandability: A drift can be characterized by outputting traces before and
after the drift point. However, this does not provide any useful information about
the underlying changes to the stakeholders of the process. A way of character-
izing a drift such that it can be understood by users such as a process analyst
would be to visualize the change on top of the process models before and after
the drift. Alternatively, a change may be described through explanatory
statements in natural language, e.g. statements like “activity a has been removed from
the process after the drift”.
• Automated: The method should be able to characterize drifts with no manual
intervention. As drifts often occur as a result of unplanned process changes,
drift characterization methods should not rely on the knowledge of the user to
characterize drifts.
1.2.3 Research Benefits
Detection and characterization of a drift helps an organization to identify and act upon
unplanned changes that may negatively impact the performance of their processes. As
an example, a rise in the frequency of certain (problematic) cases may lead to the cre-
ation of bottlenecks in the flow of cases through an insurance claim handling process,
eventuating in performance decline and customer dissatisfaction. Early detection of
process changes can thus be used to alert managers to process issues, and
subsequent characterization of the changes can help them adopt appropriate measures
and hence avoid such negative outcomes.
Drift detection and characterization can also help to keep process models
within an organization up to date. Process models help to gain a thorough
understanding of business processes, which is a prerequisite for conducting successful process
analysis, redesign or automation [24]. Process models are therefore the basis for
many critical decisions within an organization, and it is necessary that they
accurately reflect the “real-world” processes, i.e. the way processes are currently executed
in the organization.
Organizations involved in BPM programs typically collect hundreds of process
models over time [64]. These models tend to become out of sync with current
processes, as the frequency of real-world process changes is such that continuous process
model updates are not cost-effective. This misalignment severely limits the value that
organizations can obtain from process models. After a drift is detected and its under-
lying process changes are characterized, existing process models can be repaired [14,
34, 21, 96] by incorporating the identified changes, so that they reflect processes as
they are executed in the organization. These up-to-date process models can then be the
basis of subsequent process performance improvement initiatives.
Drift detection and characterization also act as an enabler for other process
mining techniques, as they allow these techniques to select the “stable” process
behavior observed since the last drift. Most process mining techniques assume processes
to be in steady state.
For example, process discovery techniques extract process models based on all traces
in an event log, assuming that all traces are produced from the current actual process.
Consequently, when the log spans one or more drifts, the resulting process models mix
pre-drift and post-drift behavior and are often so-called “spaghetti models”, i.e.
very complex and mostly useless. However, by detecting drifts inside an event log and
discovering process models based on the most-recent behavior of the process after the
last drift, we can obtain the actual process models as they are currently executed within
the organization [16]. Another example is the case of predictive business process mon-
itoring techniques whose performance suffers from the occurrence of a drift, as the
predictive models underlying these methods are trained based on the old pre-drift pro-
cess behavior. This problem can be avoided by detecting drifts and incorporating the
characterized changes underpinning the drifts in their predictive models [81].
1.2.4 Contributions
This thesis provides the following contributions to the state of the art.
• A method for detecting process drifts. We propose an automated statistically-
grounded method for real-time drift detection from event streams of business
processes. The proposed method is capable of detecting intra-trace as well as
inter-trace drifts. Furthermore, the selected features to capture the process be-
havior are such that it can also detect drifts from event streams of highly-variable
business processes. Finally, by replaying an event log as an event stream, the
method can also be used to detect drifts from an event log of complete traces.
• A method for characterizing process drifts at activity level. By building upon our
drift detection method, we propose an automated statistically-grounded method
for real-time characterization of process drifts at the level of individual activities
from event streams of business processes. In a similar way as for the drift de-
tection method, the proposed drift characterization method can also be used to
characterize drifts detected from an event log of complete traces. The method
reports the identified process changes over a drift as natural language statements
constructed based on typical business process change patterns [129]. To the best
of our knowledge, this is the first method proposed in the context of process drift
characterization.
• A method for characterizing process drifts at fragment level. We propose an
automated method for characterizing process drifts at the level of fragments from
event streams of business processes. In doing so, we adapt a state-of-the-art
process discovery technique, Inductive Miner (IM), to discover process trees,
i.e. block-structured process models, from the event streams before and after a
drift. We also propose the first process tree transformation technique that finds
a minimum cost sequence of edit operations to transform a pre-drift process
tree into a post-drift process tree. The method reports the identified process
changes over a drift as natural language statements constructed based on typical
business process change patterns [129]. Furthermore, the proposed method can
characterize process drifts detected from an event log of complete traces, and can
also be used on top of any process drift detection technique. To the best of our
knowledge, this is the first method proposed for characterizing drifts at fragment
level.
• A process drift detection and characterization tool. The proposed drift
detection and characterization methods have been implemented as a standalone
open-source tool called ProDrift,1 as well as a plug-in for the open-source process
analytics platform Apromore [65].2
1 Available at http://apromore.org/platform/tools
1.3 Research Approach
This research project follows a design science research method [125]. Design science
provides seven guidelines describing characteristics of well-conducted research. De-
sign science research must address an important and relevant business problem and
must lead to an innovative artifact that proposes a more effective approach to a prob-
lem or provides a solution to an unsolved problem. The artifact must be rigorously
evaluated in order to ensure its utility for the addressed problem. The research must
present verifiable contributions and rigor must be applied in both the construction and
the evaluation of the artifact. The proposed solutions must be the result of exploring
existing knowledge and utilizing available means, and must be effectively communi-
cated to appropriate audiences.
The purpose of this research project is to devise a set of methods for detecting
and characterizing business process drifts. As explained in the previous sections, early
detection and characterization of a drift may contribute to the success of a process
improvement initiative in several ways: by identifying undocumented
process changes which may over time negatively impact process performance,
by helping to keep existing process models, which are expensive assets of any
organization, up to date, and by improving the performance of other process mining
solutions, e.g. predictive process monitoring techniques. Drift detection and
characterization thus contribute to both industry and the academic community.
Accordingly, they have been the subject of several scientific papers [18, 1, 12, 82, 77].
The rigor of this research is ensured
by conducting an extensive literature review using well-defined criteria for comparing
the various works available, by the use of formal methods, by the implementation of
the envisaged methods as open-source tools, by the use of well-established languages
such as Petri nets and process trees, and by a thorough empirical evaluation of each method
using artificial and real-life datasets in various settings.
A number of techniques have been used for detecting business process drifts in the
literature, including statistical hypothesis testing [11, 77], trace clustering [1],
conformance checking using log abstraction [18] and process discovery [79, 17]. Performing
statistical tests over trace abstractions, proposed by Maaradji et al. [77], has proved
to be the most reliable solution in the process drift community. However, as outlined
in Section 1.2, the existing techniques suffer from two major limitations: they do not
work in an online setting with streams of events, and they do not perform well with event
logs of unpredictable business processes. To address these limitations, we present a
statistically grounded method for detecting drifts from event streams of business
processes. The method does not require any manual intervention and scales well
enough to work in real time. It is able to detect all typical change patterns
(described in [129]) with a minimum delay. We perform a non-parametric statistical test
on the distributions of certain feature vectors over two adjacent windows, namely the
reference and detection windows, moving along a stream of events. After exploring
several features, including Directly Follows relations (direct succession), Follows
relations (succession), and Block Structures (extracted from process trees produced by the
Inductive Miner [71]), we observed that α+ relations [25] are a suitable level of
abstraction for capturing the behavior of unpredictable processes represented in an event
stream.
2 Available at http://apromore.org/
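To make the notion of feature distributions over two adjacent windows more concrete, the following sketch counts Directly Follows relations in a reference window and a detection window. It is a simplification under stated assumptions: it operates on complete traces rather than on individual events, it uses Directly Follows relations instead of α+ relations, and the function names and the fixed window size are hypothetical.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity a is directly followed by activity b."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts


def window_distributions(traces, window_size):
    """Split the most recent 2 * window_size traces into a reference window
    (older half) and a detection window (newer half) and return the
    relation counts observed in each."""
    if len(traces) < 2 * window_size:
        raise ValueError("not enough traces to fill both windows")
    recent = traces[-2 * window_size:]
    reference, detection = recent[:window_size], recent[window_size:]
    return directly_follows(reference), directly_follows(detection)


# Activity b and c swap order in the two most recent traces.
log = [list("abc"), list("abc"), list("acb"), list("acb")]
ref_counts, det_counts = window_distributions(log, window_size=2)
```

The two counters can then be compared with a statistical test; a large discrepancy between them suggests that the process behavior has changed between the two windows.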
As explained in Section 1.2, to the best of our knowledge the drift characterization
problem has not been addressed in the literature. In this thesis, by building upon our
drift detection method, we present a statistically-grounded method for characterizing
drifts from event streams of business processes. The method does not need any manual
intervention and, in our extensive experiments, proves to be fast and to accurately
identify typical change patterns (described in [129]) applied to individual activities.
The inputs to the method are the distributions of α+ relations before and after a drift.
We use a statistical test to identify the relations that have a significant association with
the occurrence of the drift. These relations are then mapped to a set of predefined
change templates, and the best-matching templates are translated into natural language
statements before being reported to the user.
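The idea of testing each relation for association with the drift can be illustrated as follows. This is only a sketch: the specific statistical test is not named in this section, so a chi-square test of independence on a 2x2 contingency table is assumed here, and the function names are hypothetical. The constant 3.841 is the chi-square critical value for one degree of freedom at a significance level of 0.05.

```python
CRITICAL_1DF_05 = 3.841  # chi-square critical value, 1 d.o.f., alpha = 0.05


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic of the table [[a, b], [c, d]] (no continuity
    correction). Returns 0.0 when a row or column total is zero."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def significant_relations(before, after):
    """Return (relation, statistic) pairs whose frequency differs
    significantly between the pre-drift and post-drift counts."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    result = []
    for rel in sorted(set(before) | set(after)):
        a = before.get(rel, 0)   # occurrences of rel before the drift
        c = after.get(rel, 0)    # occurrences of rel after the drift
        stat = chi_square_2x2(a, total_before - a, c, total_after - c)
        if stat > CRITICAL_1DF_05:
            result.append((rel, stat))
    return result


before = {("a", "b"): 50, ("b", "c"): 50}
after = {("a", "c"): 50, ("b", "c"): 50}
changed = significant_relations(before, after)
```

In this example the relations (a, b) and (a, c) are flagged while (b, c) is not, since its frequency is identical before and after the drift. When many relations are tested at once, a correction for multiple comparisons would normally be advisable.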
We also address the drift characterization problem in settings where process changes are
applied to larger fragments of activities. We present a fast, accurate, and noise-tolerant
method for characterizing typical change patterns [129] applied to single-entry
single-exit (SESE) fragments from event streams of business processes. We adapt a
state-of-the-art process discovery technique, namely Inductive Miner, to work with event
streams, and use it to discover two process trees, one from before and one from after a drift.
A process tree represents a sound block-structured process model, where each process
(sub)tree is a SESE process fragment. We define a set of fragment-based process
tree edit operations and their application costs based on the typical change patterns.
We then introduce a notion of process tree mapping through which we search for a
minimum-cost sequence of edit operations that transforms the pre-drift process tree into
the post-drift process tree. We present two search algorithms, an efficient A* and a fast
greedy algorithm, which, respectively, find and approximate the optimal solution. The
identified edit operations are then aggregated as much as possible and reported to the user as
natural language statements.
Finally, the methods illustrated in this thesis are implemented as a standalone tool,
and also as a plug-in for the Apromore platform. Apromore is an online open-source
business process analytics platform combining state-of-the-art process mining capa-
bilities with advanced functionality for managing process model collections. Apro-
more features a service-oriented architecture and provides a flexible plug-in frame-
work, which facilitates the seamless addition of new plug-ins. Apromore is the result
of a joint effort between several universities and since its inception has been adopted
in practice by several organizations.
1.4 Research Scope
Process drifts may be divided into four classes based on the form in which they
manifest themselves over time: sudden, gradual, recurring, and incremental
drifts [39]. A sudden drift refers to a scenario where at a certain point in time a
process is substituted with a new process. A gradual drift, on the other hand, refers to a
scenario where changes are introduced in the process but the old process behavior is
also still allowed for some time and gradually fades out. A recurring drift refers to a
scenario where a set of processes are substituted back and forth with each other. Fi-
nally, an incremental drift refers to a scenario where changes are applied to a process
in smaller increments over a period of time. This thesis focuses on the detection and
characterization of sudden drifts, which can also be used as a starting point to detect
other classes of drifts, e.g. gradual drifts [82, 78].
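The four classes of drift can be contrasted with a small sketch that decides which version of a process generates the case at position t of a stream. The function, its parameters (change_point, span) and the linear fade-out used for gradual drifts are illustrative assumptions and are not taken from this thesis.

```python
import random


def active_version(t, kind, change_point=100, span=50, rng=None):
    """Return the process version (0 = original) that produces case t."""
    if kind == "sudden":
        # The old process is replaced at a single point in time.
        return 0 if t < change_point else 1
    if kind == "gradual":
        # Old and new behavior coexist; the old one fades out linearly.
        if t < change_point:
            return 0
        if t >= change_point + span:
            return 1
        p_new = (t - change_point) / span
        return 1 if (rng or random).random() < p_new else 0
    if kind == "recurring":
        # Two versions alternate every change_point cases.
        return (t // change_point) % 2
    if kind == "incremental":
        # A further small change is applied every span cases (unbounded here).
        return 0 if t < change_point else 1 + (t - change_point) // span
    raise ValueError(f"unknown drift kind: {kind}")


# Versions generating cases 90..109 under a sudden drift at case 100.
sudden_profile = [active_version(t, "sudden") for t in range(90, 110)]
```

For a gradual drift, passing a seeded `random.Random` instance as `rng` makes the sampled sequence reproducible.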
A drift may occur in different perspectives of a business process, e.g. control-flow,
resource, data, etc. The changes enforced by operations such as insertion, deletion or
reordering of fragments are classified as control-flow changes. Resource perspective
changes include changes in resources, their roles and organizational structure, whereas
data perspective changes refer to the changes in the production and consumption of
data in business processes. In this thesis, we focus on changes related to the control-
flow behavior of a business process.
A process drift may involve changes to different types of process fragments, such as
multiple-entry multiple-exit (MEME) fragments and single-entry single-exit (SESE)
fragments. MEME fragments are more general and include other types of fragments,
e.g. SESE fragments, as a special case. In this thesis, where we characterize a drift
at fragment level, we capture pre-drift and post-drift process behaviors by discovering
process trees from before and after a drift. Each subtree within a process tree represents
a SESE process fragment. As such, the proposed method expresses any fragment-level
process changes in terms of SESE fragments. Note that by introducing gateways, it is
always possible to transform a MEME fragment into multiple SESE fragments [95].
The methods created in this project are extensively evaluated using both artificial
and real-life process models and event logs. However, assessing the utility of the
final framework through surveys and interviews with human participants is out of the
scope of this project.
1.5 Research Publications
In the course of this research project the following publications were produced:
• A. Maaradji, M. Dumas, M. La Rosa, and A. Ostovar. Fast and accurate busi-
ness process drift detection. In Proceedings of the 13th International Conference
on Business Process Management (BPM’15), volume 9253 of Lecture Notes in
Computer Science, pages 406-422. Springer, Cham, 2015.
• R. Conforti, M. Dumas, M. La Rosa, A. Maaradji, H.H. Nguyen, A. Ostovar,
and S. Raboczi. Analysis of business process variants in Apromore. In
Proceedings of the Demo Track of the 13th International Conference on Business
Process Management (BPM’15), volume 9253 of Lecture Notes in Computer
Science, pages 406-422. Springer, Cham, 2015.
• A. Ostovar, A. Maaradji, M. La Rosa, A.H.M. ter Hofstede, and B.F.V. van Don-
gen. Detecting Drift from Event Streams of Unpredictable Business Processes.
In Proceedings of the 35th International Conference on Conceptual Modeling
(ER’16), volume 9974 of Lecture Notes in Computer Science, pages 330-346.
Springer, Cham, 2016. (Chapter 3)
• A. Ostovar, A. Maaradji, M. La Rosa, and A.H.M. ter Hofstede. Character-
izing Drift from Event Streams of Business Processes. In Proceedings of the
29th International Conference on Advanced Information Systems Engineering
(CAiSE’17), volume 10253 of Lecture Notes in Computer Science, pages 210-
228. Springer, Cham, 2017. (Chapter 4)
• A. Maaradji, M. Dumas, M. La Rosa, and A. Ostovar. Detecting Sudden and
Gradual Drifts in Business Processes from Execution Traces. In Journal of IEEE
Transactions on Knowledge and Data Engineering (TKDE’17), volume 29, issue
10, pages 2140-2154. IEEE, 2017.
• S.J. van Zelst, M.F. Sani, A. Ostovar, R. Conforti, and M. La Rosa. Filtering
Spurious Events from Event Streams of Business Processes. In Proceedings of
the 30th International Conference on Advanced Information Systems Engineer-
ing (CAiSE’18), Lecture Notes in Computer Science. Springer, Cham, 2018.
(Received Distinguished Paper Award)
• A. Ostovar, S.J.J. Leemans, and M. La Rosa. Robust Drift Characterization
from Event Streams of Business Processes. Submitted to ACM Transactions on
Knowledge Discovery from Data (TKDD), 2018. Technical report in QUT ePrint
121158. (Chapter 5)
1.6 Outline
This thesis is organized as follows. Chapter 2 provides a background on BPM, data
mining, and process mining as well as a description of business process drift and its
different classes and perspectives. Chapters 3 to 5 address, respectively, process drift
detection, process drift characterization at the level of activities, and process drift
characterization at the level of fragments. Each of these chapters first reviews the
existing literature, then introduces a new method, and then describes its tool support
and extensive evaluation. Finally, Chapter 6 concludes this thesis and discusses
possible avenues for future work.
Chapter 2
Background
This chapter provides a background on business process management, data mining, and
process mining as well as a description of business process drift, its different classes
and perspectives.
2.1 Business Process Management
Business Process Management (BPM) is an established discipline dedicated to the
ways in which organizations identify, capture, analyze, improve, implement and mon-
itor their business processes [32]. BPM influences the effectiveness and efficiency of
a company and is a significant contributor to its overall performance and competitive-
ness. As business processes determine how an organization operates, what activities
need to be performed and what data and resources are required for their successful
execution, they are crucial to successfully achieve targets for a plethora of key perfor-
mance indicators.
BPM can be regarded as a continuous cycle comprising multiple phases [32].
The BPM lifecycle is illustrated in Figure 2.1. The first phase in the BPM lifecycle is
the identification of the processes inside the organization. The outcome of this phase is
a collection of interrelated processes currently running in the organization. The second
phase is to use the information about the identified processes to build one or multiple
as-is process models. The as-is process model is then analyzed, in the third phase, for
exploring any issues that might affect the performance objectives of the organization.
These issues are then addressed in the fourth phase by modifying the as-is process
model leading to the to-be process model that meets the organization’s desired way
of functioning. The to-be process model is then implemented through organizational
change management and/or process automation. The last phase consists of monitoring
the running to-be processes with respect to the main performance objectives according
to a set of performance measures. Any new issues found in this phase must be ad-
dressed by means of applying new corrections and modifications to the process. This
may require another iteration of the BPM lifecycle, and this iteration will take place
whenever process performance deviates from the intended objectives.
[Figure not reproduced. Phases shown: process identification, process discovery,
process analysis, process redesign, process implementation, and process monitoring and
controlling. Artifacts passed between phases: process architecture, as-is process model,
insights on weaknesses and their impact, to-be process model, executable process model,
and conformance and performance insights.]
Figure 2.1: BPM lifecycle [32].
Process drift detection and characterization lie in the process monitoring and
controlling phase of the BPM lifecycle. Detection and subsequent characterization of a
drift may give rise to a new iteration through the BPM lifecycle, which includes
updating existing process models based on the characterized changes, performing qualitative
and quantitative process analysis, e.g. root cause analysis to identify the root cause of
the changes, redesigning the models to support or discard the changes, and
integrating the new improvements into the BPM system.
2.1.1 Business Process Model and Notation (BPMN)
There are several languages for modeling a business process, e.g. Business Process
Model and Notation (BPMN), Event-driven Process Chain (EPC) and Unified Model-
ing Language (UML). Nowadays, BPMN is a widely used standard for process mod-
eling [32]. Figure 2.2 shows a subset of the core elements of BPMN. The main events
in a BPMN diagram are the start and end events which represent the initiation and
termination of a process instance, respectively. An activity denotes a unit of work to be
performed. A sequence flow represents the order among events, activities and gateways.
Finally, gateways are control-flow elements and control the splitting and merging of
paths. A splitting exclusive gateway only allows the activation of one of its outgoing
paths (based on a defined condition), and a merging exclusive gateway only enables the
execution of its outgoing path once one of its incoming paths is executed. Conversely,
a splitting parallel gateway activates the concurrent execution of all of its outgoing
paths, and a merging parallel gateway waits for all of its incoming paths to be fully
executed before enabling its outgoing path.
[Figure not reproduced. Elements shown: start event, end event, activity, sequence flow,
exclusive gateway, and parallel gateway.]
Figure 2.2: Subset of core BPMN elements.
2.1.2 Petri nets
Petri nets [93] are a well-known mathematical language for modeling concurrent
processes [120] and have been widely used in the context of business process analysis.
Petri nets are backed by a precise mathematical definition and offer an intuitive graphical
representation. A Petri net is a directed bipartite graph composed of two types of
nodes, namely transitions and places. Intuitively, transitions represent the activities
of a process, and places represent the states of the process. Directed arcs in a Petri net
represent the order among nodes, i.e. which places are pre- and/or postconditions for
which transitions, and cannot connect two nodes of the same type. Figure 2.3 shows
the typical notations used to represent transitions, places and arcs in a Petri net.
[Figure not reproduced. Elements shown: place, transition, and arc.]
Figure 2.3: Core Petri net elements.
Places in a Petri net may hold a discrete number of tokens. Each distribution of
tokens over the places is called a marking and represents a state of the net. A transition
of a Petri net is enabled if there are sufficient tokens in all of its input places. An
enabled transition may fire at any time. Once a transition fires it consumes the required
input tokens and produces tokens in its output places.
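The enabling and firing rules can be sketched in a few lines. The class below is a minimal illustration with hypothetical names: a marking is represented as a dictionary from places to token counts, and arc weights default to the number of tokens stated for each place.

```python
class PetriNet:
    """Minimal place/transition net with weighted arcs."""

    def __init__(self):
        self.inputs = {}   # transition -> {place: tokens consumed}
        self.outputs = {}  # transition -> {place: tokens produced}

    def add_transition(self, name, inputs, outputs):
        self.inputs[name] = dict(inputs)
        self.outputs[name] = dict(outputs)

    def enabled(self, marking, transition):
        """A transition is enabled if every input place holds enough tokens."""
        return all(marking.get(p, 0) >= w
                   for p, w in self.inputs[transition].items())

    def fire(self, marking, transition):
        """Return the marking reached by firing an enabled transition."""
        if not self.enabled(marking, transition):
            raise ValueError(f"transition {transition!r} is not enabled")
        new = dict(marking)
        for p, w in self.inputs[transition].items():
            new[p] = new.get(p, 0) - w
        for p, w in self.outputs[transition].items():
            new[p] = new.get(p, 0) + w
        return new


# Activity a starts two concurrent branches (b and c) that are joined by d.
net = PetriNet()
net.add_transition("a", {"p0": 1}, {"p1": 1, "p2": 1})
net.add_transition("b", {"p1": 1}, {"p3": 1})
net.add_transition("c", {"p2": 1}, {"p4": 1})
net.add_transition("d", {"p3": 1, "p4": 1}, {"p5": 1})

m = net.fire({"p0": 1}, "a")
m = net.fire(m, "b")
d_enabled_early = net.enabled(m, "d")  # c has not fired yet
m = net.fire(m, "c")
final_marking = net.fire(m, "d")
```

The join transition d only becomes enabled once both of its input places hold a token, which mirrors the behavior of a merging parallel gateway.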
Coloured Petri nets (CPN) [56] are an extension of Petri nets that, while preserving
their useful properties, allow tokens to be distinguished by attaching colors, i.e. data
values, to them. It is possible to define complex data types in CPNs; however, each
place of a CPN typically holds tokens of the same type. This type is called the color set of
the place.
Figure 2.4 shows a mapping between BPMN elements and their representations in
Petri nets. In this thesis, we use the term process fragment as a generalized concept
covering activities (or transitions) and sub process graphs with single entry and single
exit nodes (known as hammocks in graph literature [134]).
[Figure not reproduced. It maps the BPMN start event, end event, activity, sequence flow,
exclusive gateway and parallel gateway (with variants labeled split, join, fork and merge)
to Petri net places, transitions and arcs.]
Figure 2.4: Mapping of activities, events and gateways to Petri nets.
2.2 Data Mining
Data mining is the process of extracting useful patterns and knowledge from large
datasets. Data mining algorithms can be applied to various types of data, including typical
database data, transactional data, data streams, ordered data and graph data. The
typical data mining process comprises several steps. First, data is cleaned of noise and
inconsistent instances (data cleaning). Then the cleaned data from different sources is
integrated into a data warehouse (data integration). Next, the relevant data with respect
to the analysis objective is selected (data selection). In the fourth step, the selected data
is transformed and combined into the form required for mining (data transformation).
Afterwards, suitable mining methods are applied to the data and the patterns are
discovered (data mining). The penultimate step is identifying interesting patterns among the
discovered knowledge using interestingness measures (pattern evaluation), and the last
step is representing the result by means of tables, charts, graphs and diagrams
(knowledge presentation) [80, 46].
Clustering is a branch of data mining which tries to divide data into groups of
similar objects. Each group is called a cluster and each member of a cluster is similar
to its cluster-mates and dissimilar to the members of other clusters. Clustering is an
unsupervised learning problem, the outcome of which is highly dependent on the
selected similarity measures. Summarizing data in the form of clusters simplifies data
interpretation, at the price of losing certain details about the data. Clustering
is used in many fields. Statistics [7] and, more generally, science [83]
have long leveraged clustering techniques, as has pattern recognition,
where character and speech recognition are typical applications of
clustering [29, 30]. Clustering can also be applied to density estimation problems
such as multivariate statistical estimations [106, 9].
2.2.1 Concept Drift Detection
Concept drift detection has been extensively studied in the field of data mining [59,
67, 114, 35, 88, 28, 33, 104, 39], where a drift mainly refers to a change in the relation
between the input and the target variables in an online supervised learning scenario.
This includes, for instance, changes in the distributions of numerical or categorical
variables. As such, a widely studied challenge is that of devising learning algorithms that can
detect a concept drift as quickly as possible and adapt to the new concept (a.k.a.
adaptive learning) [50, 132, 60, 38, 58, 40, 6].
2.2.2 Taxonomy of Concept Drift Detection Mechanisms
In this section, we present a review of the existing concept drift detection techniques
in the field of data mining by dividing them based on their underlying drift detection
mechanism into three categories: techniques based on sequential analysis, techniques
based on control charts, and techniques based on monitoring distributions over two
time windows.
2.2.2.1 Drift Detection Based on Sequential Analysis
The Sequential Probability Ratio Test (SPRT) [127] is a specific type of sequential
hypothesis testing that is used as the basis for several drift detection algorithms. Given
a sequence of observations whose underlying distribution changes from D0 to D1 at
a certain point w, the test evaluates whether the ratio of the probability of observing
certain subsequences under D1 to that under D0 is significant, i.e. above a user-defined
threshold. If the test evaluates to true, the null hypothesis, i.e. that the two
distributions are similar, is rejected and a drift is detected at point w.
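As an illustration of the likelihood-ratio idea, the sketch below implements Wald's classical SPRT for a stream of binary observations, where D0 and D1 are taken to be Bernoulli distributions with success probabilities p0 and p1. The choice of Bernoulli data, the parameter names and the default error rates are assumptions made for the example; the decision thresholds are derived from the desired type I and type II error rates (alpha and beta).

```python
import math


def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.05):
    """Sequentially test H0: p = p0 against H1: p = p1.

    Returns (decision, index), where decision is "H1" (change detected),
    "H0" (no change) or "undecided", and index is the position of the
    last observation that was examined."""
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    llr = 0.0                              # cumulative log-likelihood ratio
    for i, x in enumerate(observations):
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", i
        if llr <= lower:
            return "H0", i
    return "undecided", len(observations) - 1


# A run of successes quickly favors the post-change distribution (p = 0.5).
decision, position = sprt_bernoulli([1, 1, 1], p0=0.1, p1=0.5)
```

Each observation shifts the cumulative log-likelihood ratio towards one of the two hypotheses, so the test stops as soon as the accumulated evidence is strong enough in either direction.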
The Cumulative Sum (CUSUM) is a sequential analysis technique, developed by E.
S. Page [90] based on the principles of SPRT, and is often used for change detection
in stream mining [87]. CUSUM raises an alarm when a parameter, e.g. the mean, of
the probability distribution of the incoming data significantly deviates from zero. The
user is required to set a parameter that specifies the magnitude of the changes that are
considered significant. The value of this parameter controls the trade-off between early
detection of true drifts and detection of false drifts. Initializing this parameter with a
low value allows earlier detection of drifts, but also increases the chance of raising
false drift alarms. The Page-Hinkley test (PH) [90] is a variant of CUSUM that is often
used for change detection in signal processing. PH enables efficient detection of drifts
in the normal behavior of a process as represented by a model.
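The following sketch shows one common formulation of a one-sided CUSUM detector and of the Page-Hinkley test for detecting an increase in the mean of a stream. The parameter names (target, slack, delta, threshold) and their default values are illustrative assumptions; slack and delta play the role of the magnitude-of-change parameter discussed above, and threshold controls how much accumulated evidence is needed before an alarm is raised.

```python
def cusum(values, target=0.0, slack=0.5, threshold=5.0):
    """One-sided CUSUM: accumulate deviations above target + slack and
    return the index of the first alarm, or None."""
    g = 0.0
    for i, x in enumerate(values):
        g = max(0.0, g + (x - target - slack))
        if g > threshold:
            return i
    return None


def page_hinkley(values, delta=0.005, threshold=5.0):
    """Page-Hinkley test: compare the cumulative deviation from the
    running mean with its minimum so far; alarm when the gap exceeds
    the threshold. Returns the alarm index, or None."""
    mean = 0.0
    cumulative = 0.0
    minimum = 0.0
    for t, x in enumerate(values, start=1):
        mean += (x - mean) / t           # incremental running mean
        cumulative += x - mean - delta
        minimum = min(minimum, cumulative)
        if cumulative - minimum > threshold:
            return t - 1
    return None


# The mean jumps from 0 to 1 at index 50.
stream = [0.0] * 50 + [1.0] * 50
cusum_alarm = cusum(stream)
ph_alarm = page_hinkley(stream)
```

Lowering threshold (or slack/delta) makes both detectors react sooner after the change at index 50, at the cost of more false alarms on noisy data, which is the trade-off described in the text.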
Motivated by time series analysis, Roberts [103] proposed the Exponentially Weighted
Moving Average (EWMA) method for detecting changes in the moving average of variables
or attributes-type data with normal distributions. Shiryaev [111, 110] introduced a
procedure for detecting changes in the drift of a Brownian motion with the aim of
minimizing the expected delay between the time the change occurs and the time it is
detected, which is now usually referred to as the Shiryaev-Roberts procedure. The same
author several years later presented a Bayesian approach [112] for detecting changes
in online settings which, similar to the CUSUM and PH tests, reports a change once
its computed test statistic goes over a user-defined threshold. The accuracy of such
drift detection methods often relies on the trade-off between the false alarm rate and
the missed detection rate.
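A minimal sketch of an EWMA-based detector is given below, assuming that the in-control mean (mu0) and standard deviation (sigma) of the observations are known. The smoothing factor lam and the limit width are assumed defaults chosen for illustration; the time-dependent limit is the standard expression for the standard deviation of the EWMA statistic.

```python
import math


def ewma_alarm(values, mu0, sigma, lam=0.2, width=3.0):
    """Return the index at which the EWMA statistic first leaves its
    control limits, or None if it never does."""
    z = mu0
    for t, x in enumerate(values, start=1):
        # Exponentially weighted moving average of the observations.
        z = lam * x + (1 - lam) * z
        # Control limit around mu0; it widens towards its asymptote as t grows.
        limit = width * sigma * math.sqrt(
            lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        if abs(z - mu0) > limit:
            return t - 1
    return None


# The mean shifts from 0 to 2 (two standard deviations) at index 20.
ewma_index = ewma_alarm([0.0] * 20 + [2.0] * 20, mu0=0.0, sigma=1.0)
```

A smaller lam gives more weight to history, which smooths out noise but delays the reaction to a genuine shift.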
2.2.2.2 Drift Detection Based on Control Charts
Statistical Process Control (SPC) [109] is a statistically-grounded method for monitor-
ing and controlling the quality of a process. SPC can be applied to any process where
the conformance of the product to its specifications can be measured, e.g. in
manufacturing lines. Control charts [109] or process-behavior charts are an SPC tool used to
check whether a manufacturing or business process is in a state of control. There are
several drift detection techniques based on control charts, for example [66, 38, 43, 13].
A control chart is constructed by first drawing points which represent statistics,
e.g. means, of measurements of a quality characteristic of a process in samples taken
at different times. Then, a center line is drawn at the value of the mean of these
statistics (e.g. the mean of the means of samples). Next, the standard deviation of
these statistics over all samples is calculated, and the upper and lower control limits
are drawn at 3 standard deviations from the center line. For a process that is in
control, over 99% of the plotted statistics fall within these limits (about 99.7% if
they are normally distributed).
Any observation that falls outside these limits indicates a likely unexpected source of
variation, and should be investigated. Optionally, two warning thresholds may also be
added to the chart at 2 standard deviations below and above the center line to provide
early notifications of a likely change to the quality engineers of the process. This
allows them to, for example, increase the rate at which the samples are taken until it
is ensured that the process is truly in control. The impact of the sample size on the
overall performance of a control chart has been studied in several articles [102, 76, 20,
133, 19], and a sample size of 2 has been shown to work best for many test cases [47].
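The construction of a control chart described above can be sketched as follows. The sketch assumes that the plotted statistic is the sample mean and that the limits are estimated from a set of reference samples taken while the process is in control; the function names are illustrative.

```python
import statistics


def control_limits(reference_samples):
    """Return (lower limit, center line, upper limit) from in-control samples."""
    means = [statistics.fmean(sample) for sample in reference_samples]
    center = statistics.fmean(means)   # mean of the sample means
    spread = statistics.stdev(means)   # standard deviation of the sample means
    return center - 3 * spread, center, center + 3 * spread


def out_of_control(samples, lower, upper):
    """Return the indices of the samples whose mean falls outside the limits."""
    return [i for i, sample in enumerate(samples)
            if not lower <= statistics.fmean(sample) <= upper]


reference = [[9, 11], [10, 10], [11, 9], [10, 12], [8, 10], [10, 10]]
lower, center, upper = control_limits(reference)
flagged = out_of_control([[10, 10], [14, 15]], lower, upper)
```

Warning thresholds at 2 standard deviations could be added in the same way by replacing the factor 3 with 2.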
2.2.2.3 Drift Detection Based on Monitoring Distributions over Two Time Windows
These techniques perform a statistical test over data distributions in a reference and a
detection window, containing past and most recent data samples, respectively. If the
null hypothesis, i.e. that the distributions are equal, is rejected, a drift is declared at the
start or end of the detection window. The monitored data may be univariate, bivariate,
or multivariate. The size of the two windows may be fixed or adaptive, and different
window positioning strategies may be used [2].
For example, Kifer et al. proposed to compare distributions of successive data
points in two adjacent windows sliding over a data stream using statistical tests and
Chernoff bounds [45] to determine whether the two distributions are statistically dif-
ferent. An entropy-based metric is introduced by Vorburger and Bernstein [126] for
measuring the distribution inequality between two sliding windows containing older
and more recent data instances, respectively. An entropy value of 1 indicates equal
distributions whereas an entropy value of 0 suggests completely different distributions.
By continuously monitoring the entropy metric over time, a concept drift is detected
when the value of the entropy metric drops below a user-defined threshold. Similarly,
Dasu et al. [23] and Sebastiao and Gama [107] use Kullback-Leibler divergence [63]
to measure the distance between the probability distributions of two time windows,
containing old and recent data samples, to identify concept drifts in streams of multi-
dimensional data.
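The two-window scheme can be illustrated with a minimal sketch that slides two adjacent fixed-size windows over a stream of categorical values and compares them with a smoothed Kullback-Leibler divergence. The additive smoothing, the fixed window size and the fixed threshold are simplifying assumptions made for this example; they are not the bootstrapped thresholds used in [23] or [107].

```python
import math
from collections import Counter


def kl_divergence(reference, detection, smoothing=0.5):
    """Smoothed KL divergence of the detection window from the reference window."""
    categories = set(reference) | set(detection)
    ref_counts, det_counts = Counter(reference), Counter(detection)
    ref_total = len(reference) + smoothing * len(categories)
    det_total = len(detection) + smoothing * len(categories)
    divergence = 0.0
    for category in categories:
        p = (det_counts[category] + smoothing) / det_total
        q = (ref_counts[category] + smoothing) / ref_total
        divergence += p * math.log(p / q)
    return divergence


def first_drift(stream, window, threshold):
    """Return the index of the first element at which a drift is declared."""
    for end in range(2 * window, len(stream) + 1):
        reference = stream[end - 2 * window:end - window]
        detection = stream[end - window:end]
        if kl_divergence(reference, detection) > threshold:
            return end - 1
    return None


drifting = list("ab" * 50) + list("cd" * 50)
drift_index = first_drift(drifting, window=20, threshold=0.5)
```

Both windows must be kept in memory, which reflects the larger memory footprint of this family of techniques compared to sequential analysis.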
The main advantage of techniques based on monitoring two distributions, as compared
to sequential analysis techniques, is a more precise localization of drifts. On the
other hand, they have larger memory footprints as they need to store the data within
two windows, whereas the sequential analysis techniques do not need to store the in-
coming data.
2.2.3 Concept Drift Characterization
In this context, the term drift characterization is used for describing different prop-
erties of a drift as well as explaining concept changes. For example, some studies
focus on analyzing a specific metric of a drift, e.g. severity, predictability, and fre-
quency [85, 62]. In this respect, Webb et al. [128] propose a comprehensive framework
for quantitative analysis of a drift, e.g. measuring drift magnitude or drift duration.
They also qualitatively categorize drifts into different types based on their occurrence
with respect to time, e.g. sudden or gradual. On the other hand, some studies have
explored techniques for the identification of features that explain the drift. For instance,
the authors of [97] use brushed parallel histograms to visualize concept drifts in
multidimensional problem spaces.
However, the methods developed for detecting and characterizing a drift in data
mining deal with simple data structures (e.g. numerical or categorical variables and
vectors thereof), while in business process drift detection and characterization we seek
to detect and characterize changes in more complex structures, specifically behavioral
relations between process activities or fragments (e.g. concurrency, conflicts, loops).
Thus, methods from the field of concept drift detection and characterization in data
mining cannot be readily transposed to business process drift detection and character-
ization.
2.3 Process Mining
Process mining starts by collecting information about ongoing processes. It assumes
that details about each activity performed in the organization, as well as
their order, are stored in event logs. Transactional information systems such as
Enterprise Resource Planning, Customer Relationship Management, Business-to-Business,
Supply Chain Management and Workflow Management systems produce such logs.
Process mining techniques fall into three broad categories:
• Automatic discovery: Techniques for automatic discovery of process models
based on event logs.
• Conformance checking: Techniques for verifying the conformance of event logs
to process models.
• Process enhancement: Techniques for modifying or extending process models
based on the actual process behavior in the organization, recorded in the form of event logs.
A process discovery algorithm is evaluated based on four metrics: fitness, simplicity,
generalization and precision. Most of the time, the quality of a process discovery
algorithm is measured by the percentage of the log that can be reproduced by the
discovered model (fitness). Process discovery algorithms often output spaghetti-like
process models which are difficult to read. Therefore, simplicity is another criterion to consider.
In addition, the ability of the discovered model to generalize the observed behavior
in the log (generalization), and the extent to which the model avoids allowing
behavior not observed in the log (precision), are other metrics for evaluating a process
discovery algorithm [15]. The described metrics are illustrated in Figure 2.5.
Concept drift detection and characterization methods lie in the family of both pro-
cess discovery and conformance checking techniques.
Below, we introduce basic notions such as traces, event logs, event streams and
directly follows relations used as the basis for defining notions related to each method
in the next chapters. The notation used in this thesis is summarized in Appendix A.
Figure 2.5: Quality metrics for process discovery algorithms [15]: replay fitness ("able to replay event log"), simplicity ("Occam's razor"), generalization ("not overfitting the log") and precision ("not underfitting the log").
2.3.1 Event Log and Event Stream
Event logs are at the core of all process mining techniques. An event log is a set of
traces, each capturing the sequence of events originated from a given process instance
(case). Each event represents an occurrence of an activity.
Let $\mathcal{L}$ be the set of all activity labels (labels, for short), $\mathcal{C}$ be the set of all case
identifiers and $\mathcal{T}$ be the set of timestamps. We define event and event universe as
follows:
Definition 1 (Event, Event universe). An event $e$ is a triple $e = (c, l, t) \in \mathcal{C} \times \mathcal{L} \times \mathcal{T}$
which describes the occurrence of activity $l$ in case $c$ at time $t$. The set of all possible
events is called the event universe and is denoted by $\mathcal{E}$.
To identify each component of an event $e = (c, l, t)$ we define the functions $\#_{case}(e) = c$,
$\#_{label}(e) = l$ and $\#_{timestamp}(e) = t$.
Definition 2 (Event log, Trace). Let $L$ be an event log over the set of labels $\mathcal{L}$, i.e. $L \in \mathcal{P}(\mathcal{L}^*)$.
A trace $\sigma \in L$ is a sequence of events $E_\sigma \subseteq \mathcal{E}$, ordered by their timestamps,
with $|E_\sigma| = n$ such that $\sigma = \langle \#_{label}(e_0), \#_{label}(e_1), \ldots, \#_{label}(e_{n-1}) \rangle$. Any sub-sequence
of a trace represents a sub-trace.
For example, the following represents an event log with a total of six traces, with
two distinct traces: $L = \{\langle a, b, d \rangle^2, \langle a, b, c, d \rangle^4\}$. For a trace $\langle a, b, c, d \rangle$,
$\langle a, b \rangle$ and $\langle b, c, d \rangle$ are two sample sub-traces.
The configuration where events are read individually from an online source is
known as event streaming. An event stream is a potentially infinite sequence of events,
where events are ordered by their timestamps. Events of the same trace do not need to
be consecutive in the event stream, i.e. traces can be “overlapping”. Formally:
Definition 3 (Event stream). An event stream is a partial bijective function $ES : \mathbb{N} \to \mathcal{E}$
that maps every element from the index set $\mathbb{N}$ to $\mathcal{E}$.
Figure 2.6 shows a small portion of an event stream. Note that two subsequent
events may belong to different cases.
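The relation between an event stream and an event log can be sketched as follows. The sketch assumes that events are represented as plain (case, label, timestamp) tuples, mirroring Definition 1, and that an event log is a multiset of traces; the function names are illustrative.

```python
from collections import Counter, defaultdict


def traces_by_case(events):
    """Group the events of a stream by case, ordered by timestamp."""
    labels = defaultdict(list)
    for case, label, _timestamp in sorted(events, key=lambda e: e[2]):
        labels[case].append(label)
    return {case: tuple(seq) for case, seq in labels.items()}


def event_log(traces):
    """Build an event log as a multiset of traces (distinct trace -> frequency)."""
    return Counter(traces.values())


# Events of different cases are interleaved in the stream.
stream = [("c1", "a", 1), ("c2", "a", 2), ("c1", "b", 3),
          ("c3", "a", 4), ("c2", "b", 5), ("c3", "c", 6)]
traces = traces_by_case(stream)
log = event_log(traces)
```

Here cases c1 and c2 yield the same distinct trace, so that trace has frequency 2 in the resulting log.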
Figure 2.6: Visual example of a small portion of an event stream ES. Each square box represents an event. Case ids are color-coded (i.e. each case id has a unique background color) and labels in boxes indicate activity labels. The top row of events represents the entire event stream portion over time, the remaining rows show the individual cases (c1, c2 and c3) constituting the stream.
Definition 4 (Directly follows relation). Let $L$ be an event log over $\mathcal{L}$ and $a, b \in \mathcal{L}$.
There is a directly follows relation from $a$ to $b$, denoted by $a >_L b$, if and only if there
is a trace $\sigma = l_1 l_2 l_3 \ldots l_n$ and $i \in \{1, \ldots, n-1\}$ such that $\sigma \in L$ and $l_i = a$ and $l_{i+1} = b$.
Directly follows relations can be extracted from event logs and process models. A
directly follows graph is a directed graph whose nodes represent activities and whose
edges represent directly follows relations between activities. Each edge in the directly
follows graph that is derived from an event log may be annotated by a weight, denoting
how often its corresponding directly follows relation is observed in the log. Figure 2.7
shows the directly follows graph derived from the event log
$L = \{\langle a, b, f \rangle^4, \langle f, a, b \rangle^3, \langle e, d, a, b \rangle, \langle d, e, a, b \rangle^2, \langle f, a, b, c, a, b \rangle^3\}$.
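As a minimal sketch, the weighted directly follows graph of the event log above can be computed as follows. The log is assumed to be encoded as a mapping from distinct traces to their frequencies; the function name is illustrative.

```python
from collections import Counter


def directly_follows_graph(log):
    """Return a mapping (a, b) -> number of times a is directly followed by b."""
    edges = Counter()
    for trace, frequency in log.items():
        for a, b in zip(trace, trace[1:]):
            edges[(a, b)] += frequency
    return edges


example_log = {
    ("a", "b", "f"): 4,
    ("f", "a", "b"): 3,
    ("e", "d", "a", "b"): 1,
    ("d", "e", "a", "b"): 2,
    ("f", "a", "b", "c", "a", "b"): 3,
}
dfg = directly_follows_graph(example_log)
```

The nodes of the graph are the activities occurring in the keys of `dfg`, and each value is the weight of the corresponding edge.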
2.3.2 Business Process Drift
A business process drift is defined as a (statistically) significant change in the process
behavior [11, 77]. Three primary perspectives in the context of business processes are
the control-flow, data and resource perspectives. A drift may occur in one or more of
these perspectives.
Figure 2.7: Example of a directly follows graph.
• Control-flow/behavioral perspective Refers to behavioral and structural changes
in a business process model. A list of common control-flow change patterns is
outlined in [129]. In [77], these control-flow changes are classified into three
categories: Insertion (I), e.g. inserting or deleting a fragment, Resequentializa-
tion (R), e.g. parallelizing two sequential fragments, and Optionalization (O),
e.g. embedding an existing fragment in a loop. We also use this classification,
specifically when experimenting with artificial logs, in this thesis. Table 2.1
shows the common control-flow change patterns obtained from [129] and their
categories. For example, an insurance company which used to perform a certain
check on cases after they are processed by case officers now performs the check
at the beginning of the process, before cases are processed any further. Here, a
move change pattern has been applied to the case check fragment, which falls
into the insertion category.
Sometimes, the change is not in the control-flow structure of a process, but in the
behavioral aspects of it. For example, in a loan application process, applications
above $5000 were considered as “high” last year, while this year those above
$10000 are labeled as high, due to the bank’s decision to increase the loan
application limit. In this case, the structure of the process remains unchanged
but the routing of cases changes. We refer to such a change as a branching
frequency change, and still consider it as a control-flow change in this thesis.
• Resource Perspective Refers to changes in resource behavior, e.g. their skills,
utilization, preferences, productivity, collaboration, etc., as well as in the
organizational structure of a process. Examples of changes in the resource perspective
are replacing a resource who performs a particular activity, a change in the
performance of a resource in performing a certain activity, a change in the workload
of a resource, or a change in the collaboration pattern of a resource with another
resource. Pika et al. [94] present a method for detecting drifts in resource behavior
based on a set of predefined resource behavior indicators (RBI). To detect a
drift they perform statistical tests on a time series that records the evolution of
each RBI over time.
• Data Perspective Refers to changes in the requirement and generation of data in
activities of a process. For example, in a loan application process, reducing the
number of co-signatures required to enable the execution of a particular activity.
Change pattern                                           Cat.
Insert/delete a fragment between two fragments           I
Insert/delete a fragment in/from parallel branch         I
Insert/delete a fragment in/from conditional branch      I
Duplicate a fragment                                     I
Substitute a fragment                                    I
Swap two fragments                                       I
Move a fragment to between two fragments                 I
Move a fragment into/out of conditional branch           I
Move a fragment into/out of parallel branch              I
Make fragments mutually exclusive/sequential             R
Make fragments parallel/sequential                       R
Synchronize two fragments                                R
Make a fragment loopable/non-loopable                    O
Make a fragment skippable/non-skippable                  O
Change branching frequency                               O
Table 2.1: Common control-flow change patterns in business processes from [129].
Process drifts may be divided into four classes based on the form in which they
manifest themselves over time, as shown in Figure 2.8.
• Sudden drift Refers to a scenario where a current process P1 is substituted with
a new process P2, and from the moment of substitution all process instances are
processed based on the new process, as shown in Figure 2.8a. For example,
requiring a new health check in a citizenship application process due to a new
regulation.
• Gradual drift Refers to a scenario where a current process P1 is substituted
with a new process P2, but both processes coexist for some time while the
old process is gradually discontinued, as shown in Figure 2.8b. For example, a
new policy in an insurance company requires claim handlers to perform a new
check on each insurance claim. The insurance company decides to start by
performing the check on long-term and high-value claims and over time extend
it to short-term and low-value claims.
• Recurring drift Refers to a scenario where a set of processes, e.g. P1 and P2
in Figure 2.8c, are substituted back and forth with each other. Such drifts can
be divided into periodic and non-periodic, and are often induced by changes in
the external environment in which a business process operates. An example of
a periodic recurrence is in the tourism industry, where a travel agency may de-
ploy different processes during different seasons. An example of a non-periodic
recurrence is a deployment of a different process based on the market condi-
tions. The time of the deployment and its duration are dependent on the market
conditions.
• Incremental drift Refers to a scenario where an existing process P1 is substituted
with a new process Pn by applying smaller incremental changes over a
period of time, resulting in process variants P2, ..., Pn, as shown in Figure 2.8d.
This class of drift is more common in organizations that follow an agile business
process management methodology.
Figure 2.8: Different classes of drifts: (a) sudden, (b) gradual, (c) recurring, (d) incremental. Y-axes indicate process variants (P1, P2, ..., Pn), x-axes indicate time, and blue rectangles represent process instances.
Chapter 3
Process Drift Detection
In the introduction, we highlighted some ways in which drift detection can contribute
to the success of a process improvement initiative within an organization. Specifically,
early detection of drifts enables organizations to take timely corrective measures and
avoid any negative consequences that would otherwise result from unplanned changes
in the behavior of their business processes. We also specified that state-of-the-art drift
detection techniques cannot detect drifts in real time from streams of events that
incrementally record the executions of a business process. As such, they may fail to
detect, or may detect only with a long delay, intra-trace drifts, i.e. drifts that occur during
the execution of a process and may also impact ongoing process executions. Furthermore, as
they rely on statistical tests over trace distributions to detect drifts, they do not perform
well with unpredictable processes, e.g. a healthcare process, whose logs exhibit high
trace variability, i.e. a high number of distinct traces over the total number of traces.
To address the identified limitations, in this chapter we propose a fully automated,
online method for detecting process drifts from event streams. We perform statisti-
cal tests over distributions of behavioral relations between activities such as conflict,
causality and concurrency, as observed from two adjacent windows of adjustable size,
which we slide over the stream. Given that behavioral relations between activities are
a type of sub-trace features, the method does not suffer from low accuracy when the
log is highly variable (i.e. for unpredictable processes). We extensively evaluate the
accuracy and scalability of our method by simulating event streams from artificial and
real-life logs. The results show that the approach is fast and highly accurate in detect-
ing common change patterns, and significantly better than the state of the art in process
drift detection.
This chapter is structured as follows. Section 3.1 discusses related work on process
drift detection. Section 3.2 introduces the proposed method while Sections 3.4 and
3.5 present its evaluations on artificial and real-life logs, respectively. Section 3.6
concludes the chapter.
3.1 Related Work
Various methods have been proposed to detect process drifts from event logs [18, 1, 12,
82, 77]. These methods are based on the idea of extracting features (e.g. patterns) from
the traces of an event log. For example, Carmona et al. [18] propose to represent a
log as a polyhedron. This representation is computed for prefixes in a random sample
of the initial traces in the log. The method checks the fitness of subsequent trace
prefixes against the constructed polyhedron. If a significant number of these prefixes
does not lie in the polyhedron, a drift is declared. The method guarantees that drifts
of certain types will always be detected. However, to find a second drift after the
first one, the entire detection process must be restarted, which adversely affects the
scalability of the method. In previous experiments we conducted [77], the
implementation of this method took hours to complete. Another drawback is its inability
to pinpoint the exact moment of the drift.
Accorsi et al. [1] propose to cluster the traces in a moving window of the log,
based on the average distance between each pair of events in the traces. This method
heavily depends on the choice of the window size: a low window size may lead to false
positives while a high window size may lead to false negatives (undetected drifts),
as drifts happening inside the window go undetected. In addition the method is not
designed to deal with loops, and may fail to detect types of changes that do not cause
significant variations to the distances between activity pairs, e.g. changes involving an
activity being skipped.
Bose et al. [12] propose a method to detect process drifts based on statistical testing
over feature vectors. The method is not fully automated, as the user is asked to identify
the features to be used for drift detection, implying that they have some a-priori knowl-
edge of the possible nature of the drift. Further, this method is unable to identify certain
types of drifts such as inserting a conditional branch or a conditional move, even if the
relevant process activities are selected as features. Finally, similar to Accorsi et al. [1],
the user is required to set a window size for drift detection. Depending on how this
parameter is set, some drifts may be missed. This latter limitation is partially lifted in a
subsequent extension [82], which introduces a notion of adaptive window. The idea is
to increase the window size until it reaches a maximum size or until a drift is detected.
However, this technique requires the user to set a minimum and a maximum window
size. If the minimum window size is too small, minor variations (e.g. noise) may be
misinterpreted as drifts (false positives). Conversely, if the maximum window size is
too large, the execution time is affected and some drifts may go undetected.
Li et al. [74] identify a drift as a difference between the binary activity relations
of causality and concurrency, as well as length-two loop, extracted by the Heuristic
Miner [130], in two overlapping windows of the same size sliding over a stream of
traces. The proposed method suffers from a few problems. First, it does not factor in
the frequency of binary relations when detecting a drift, and as such cannot detect drifts
caused by branching frequency changes. Furthermore, there is no statistical support
for determining whether the identified changes are actually significant. Finally, the
accuracy of drift detection highly depends on the size of the drift detection windows,
which needs to be manually set by the user.
All these methods may miss certain types of changes that are not covered by the
types of features used. Moreover, their scalability is constrained by the need to extract
and analyze a feature space that is potentially very large. Hence, they are not suit-
able for online settings. This motivated us to propose a new method [77] for detecting
process drifts determined by a wide range of typical process change patterns [129].
The method is based on statistical tests over the distribution of runs (an abstraction
of complete traces), as observed in two consecutive time windows. The size of these
windows is adjusted automatically based on changes in log variability. In the exper-
iments with artificial as well as real-life event logs this method outperformed all the
above methods in terms of detection accuracy and scalability. As such, we selected
it as a baseline for the experiments in Section 3.4. As shown in our experiments in
this chapter, this method also does not cater for highly variable event logs. In such
logs each distinct run occurs only a few times, leading to a less reliable statistical test,
and hence too many false negatives. Further, as the method works based on complete
traces, it cannot detect (intra-trace) drifts from event streams.
To the best of our knowledge, the only method that deals with event streams has
been proposed by Burattin et al. [16]. However, this work mainly focuses on the on-
line discovery of process models captured as a set of business constraints (formulated
in Linear Temporal Logic) between events. Any change in the extracted constraints
over time may be considered as a drift. Nonetheless, there is no statistical support
for detecting whether changes are in fact significant, and the exact positions of the
identified drifts are not reported. As such, drift detection accuracy is not evaluated.
In summary, none of the existing process drift detection techniques fully satisfies
the process drift detection criteria outlined in Section 1.2.2.
3.2 Drift Detection Method
From a statistical viewpoint, the problem of business process drift detection can be for-
mulated as follows: identify a time point before and after which there is a statistically
significant difference between the observed process behaviors. Therefore, to detect a
drift we need features that properly capture the behavior of a process. By monitor-
ing and analyzing the feature vectors over time, we can identify the time points where
the feature vectors exhibit statistically significant changes. We explored a few different
features including Directly Follows relations (direct succession), Follows relations
(succession), Block Structures (extracted from process trees produced by the Inductive
Miner [71]) and α+ Relations [25]. We found that while the directly follows and
follows relations were over-fitting features, block structures were under-fitting features.
In contrast, α+ relations proved to be a suitable level of abstraction for capturing the
behavior of unpredictable processes represented in an event stream.
To detect a process drift we perform a statistical test, namely the G-test of
independence,¹ over distributions of α+ relations observed in two adjacent time windows of
adaptive size, sliding along a stream of events. Specifically, the most recent events
are equally divided into a reference window (less recent events) and a detection window
(more recent events). Each time a new event enters the event stream, the two windows
shift forward so that the new event is in the detection window. The set of events
within each window is used to build a corresponding sub-log. This sub-log represents
the process behavior observed within the respective window. The sliding window is a
well-established technique in the concept drift community [39].
Then the α+ relations and their frequencies are extracted from each sub-log, and
used to populate a 2×n matrix, the so-called contingency matrix, where n is the num-
ber of distinct relations. Each column in the contingency matrix corresponds to a
category of a statistical variable, here an α+ relation. The first row in the contingency
matrix contains the frequencies of the relations in the detection window, i.e. the
observed frequencies, while the second row contains the frequencies of the relations in
the reference window, i.e. the expected frequencies.
¹ The G-test is a non-parametric statistical hypothesis test which assumes no a-priori knowledge of the statistical distributions. The G-test is a better approximation to the theoretical chi-squared distribution than the chi-squared test [48].
The result of applying the G-test of independence on the contingency matrix is the
significance probability (P–value), i.e. the probability of observing differences at least
as large as those between the two windows if the populations of α+ relations in the two
windows came from the same distribution. If the P–value is above a predefined threshold,²
the null hypothesis, i.e. that the frequency distributions of the α+ relations in the
two windows are similar, cannot be rejected. However, a P–value below the threshold rejects the null
hypothesis, meaning that the α+ relations in the two windows come from different
distributions. In other words, they reflect different process behaviors (process drift).
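The test on the 2×n contingency matrix can be sketched as follows. This is an illustrative, standard-library-only version and not the implementation used in this thesis: it assumes that columns with a zero total are dropped, that cells with a zero observed frequency contribute nothing to the statistic, and it computes the chi-squared tail probability with a hand-written recurrence for integer degrees of freedom. The names `chi2_sf` and `g_test` are hypothetical.

```python
import math


def chi2_sf(x, dof):
    """Upper tail probability of a chi-squared variable with integer dof."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-h)              # Q(1, h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))   # Q(1/2, h)
    while a < dof / 2.0:
        # Q(a + 1, h) = Q(a, h) + h^a * exp(-h) / Gamma(a + 1)
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def g_test(detection, reference):
    """Return (G statistic, P-value) for a 2 x n contingency matrix."""
    columns = [(d, r) for d, r in zip(detection, reference) if d + r > 0]
    rows = (sum(d for d, _ in columns), sum(r for _, r in columns))
    total = rows[0] + rows[1]
    dof = len(columns) - 1
    if dof < 1 or min(rows) == 0:
        return 0.0, 1.0  # degenerate matrix: nothing to compare
    g = 0.0
    for column in columns:
        for i, observed in enumerate(column):
            if observed > 0:
                expected = rows[i] * sum(column) / total
                g += 2.0 * observed * math.log(observed / expected)
    return g, chi2_sf(g, dof)


_, p_same = g_test([30, 20, 10], [30, 20, 10])
_, p_diff = g_test([50, 5, 5], [5, 50, 5])
```

In this sketch, comparing `p_diff` against the threshold would reject the null hypothesis, whereas `p_same` would not.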
3.2.1 Intra-trace vs Inter-trace
A drift may occur between complete executions of a process. We call this an inter-
trace drift. For example, a new legislation requires an insurance company to perform
a more stringent verification on new claims, while old claims are exempted. This,
however, is not the only type of drift. In reality, a drift may also occur during the
execution of a process and may impact ongoing process instances [129]. We call these
intra-trace drifts. For example, an insurance check may need to be removed altogether
due to a contingency plan triggered by severe weather conditions (e.g. a flood). Such
a change may impact new process instances as well as the instances that have already
started, but that have not yet gone through the check to be removed.
In addition, in order to detect a drift using a stream of traces, we have to wait until
each trace completes before we can use it. This delays the detection of the drift. On
the other hand, working on a stream of events allows us to instantly use each observed
event, thereby detecting a drift as soon as possible during the execution of the process.
3.2.2 α+ Relations
In this chapter, we use the α+ relations [25], as an extension of the α relations [118],
to capture the behavior of a process. The α-algorithm defines three exclusive rela-
tions: conflict, concurrency and causality. The α+-algorithm adds two more relations:
length-two loop and length-one loop. The α+ relations are formally defined as follows:
Definition 5 (α+ relations from [25]). Let $L$ be an event log over $\mathcal{L}$. Let $a, b \in \mathcal{L}$:
• $a \triangle_L b$ if and only if there is a trace $\sigma = l_1 l_2 l_3 \ldots l_n$ and $i \in \{1, \ldots, n-2\}$ such that $\sigma \in L$ and $l_i = l_{i+2} = a$ and $l_{i+1} = b$,
• $a \diamond_L b$ if and only if $a \triangle_L b$ and $b \triangle_L a$,
• $a >_L b$ if and only if there is a trace $\sigma = l_1 l_2 l_3 \ldots l_n$ and $i \in \{1, \ldots, n-1\}$ such that $\sigma \in L$ and $l_i = a$ and $l_{i+1} = b$,
• $a \rightarrow_L b$ if and only if $a >_L b$ and ($b \not>_L a$ or $a \diamond_L b$),
• $a \#_L b$ if and only if $a \not>_L b$ and $b \not>_L a$, and
• $a \parallel_L b$ if and only if $a >_L b$ and $b >_L a$, and $a \not\diamond_L b$.
² The typical value of the threshold, i.e. significance level, for the G-test is 0.05 [89].
A length-two loop relation, including a and b, is denoted with $a \triangle_L b$. The
frequency of this relation in a log is the number of occurrences of the substring aba. A
causality relation from a to b is denoted with $a \rightarrow_L b$. The frequency of this relation
in a log is the number of occurrences of the substring ab. A parallel relation between
a and b is denoted with $a \parallel_L b$. The frequency of this relation in a log is the minimum
of the frequencies of the two substrings, ab and ba. A conflict relation between a and
b is denoted with $a \#_L b$, and indicates that there is no trace with the substring ab or ba.
The frequency of this relation in a log is the number of occurrences of a and b. The
α+-algorithm also discovers length-one loop relations as a pre-processing operation.
For example, there is a length-one loop including the activity a in a log if there is a
trace with the substring aa. The frequency of this relation in a log is the number of
occurrences of the substring aa.
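The extraction of the α+ relations and their frequencies from a sub-log can be sketched as follows. This is a simplified illustration under stated assumptions: the sub-log is a mapping from distinct traces to frequencies, the frequency of a conflict relation is taken as the sum of the occurrences of the two activities, conflict and parallel relations are reported once per unordered pair of distinct activities, and length-one loops are counted without removing them from the traces first. The relation tags and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations


def alpha_plus_relations(log):
    """Return a mapping from a relation, e.g. ("causal", a, b), to its frequency."""
    follows, loop2, occurrences = Counter(), Counter(), Counter()
    for trace, freq in log.items():
        for x in trace:
            occurrences[x] += freq
        for x, y in zip(trace, trace[1:]):
            follows[(x, y)] += freq            # substring xy
        for x, y, z in zip(trace, trace[1:], trace[2:]):
            if x == z and x != y:
                loop2[(x, y)] += freq          # substring xyx

    relations = {}
    for a in occurrences:
        if follows[(a, a)] > 0:
            relations[("loop1", a)] = follows[(a, a)]
    for (a, b), freq in loop2.items():
        relations[("loop2", a, b)] = freq
    for a, b in combinations(sorted(occurrences), 2):
        ab, ba = follows[(a, b)], follows[(b, a)]
        two_way_loop = loop2[(a, b)] > 0 and loop2[(b, a)] > 0
        if ab == 0 and ba == 0:
            relations[("conflict", a, b)] = occurrences[a] + occurrences[b]
        elif ab > 0 and ba > 0 and not two_way_loop:
            relations[("parallel", a, b)] = min(ab, ba)
        else:
            if ab > 0:
                relations[("causal", a, b)] = ab
            if ba > 0:
                relations[("causal", b, a)] = ba
    return relations


sub_log = {("a", "b", "c", "d"): 3, ("a", "c", "b", "d"): 2, ("a", "e", "d"): 1}
relations = alpha_plus_relations(sub_log)
loop_relations = alpha_plus_relations({("x", "y", "x", "y", "z"): 2})
```

The frequencies of the relations extracted from two sub-logs could then be aligned column by column to form the two rows of a contingency matrix.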
3.2.3 Statistical Testing over Event Streams
This section describes our online drift detection algorithm as presented in Algorithm 1.
The drift detection algorithm has three parameters: 1. eventStream: a stream of events.
2. initWinSize: initial size of the detection and reference windows. 3. maxBufSize:
maximum available memory for the event buffer storing the incoming events, namely
eventBuf . Since the algorithm works online the size of this buffer must not exceed
maxBufSize. Therefore, each time a new event e arrives we first check if the buffer has
reached its maximum size, and if so we shift the events in the buffer and discard the
least recent event (lines 11-13). We then insert the new event into the buffer (line 14).
The first statistical test is performed once the number of events in the buffer
reaches 2 × initWinSize (line 15). Before each statistical test we adapt the size of
the two windows to improve the accuracy of the approach (line 16). The notion of
adaptive window is explained in Section 3.2.4. The method updateSublogs updates
the sub-logs related to the detection and reference windows, namely detSubLog and
refSubLog, respectively, using the events within their corresponding windows (line
17). The first time this method is called the sub-logs are built from scratch. The α+
relations and their frequencies are extracted from the two sub-logs and populated in
a contingency matrix (line 19). We then perform the G-test of independence on this
contingency matrix and obtain the P–value (line 20). The threshold of the G-test,
GtestThreshold, is set to the commonly used significance level of 0.05.
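To make the statistical test concrete, the sketch below builds a 2 × k contingency matrix from two relation-frequency dictionaries and computes the G statistic, G = 2 Σ O ln(O/E), together with its P-value from the chi-square distribution with (rows − 1)(columns − 1) degrees of freedom. The function names are assumptions made here, and the chi-square tail is computed with a standard series and continued-fraction evaluation of the regularized incomplete gamma function so that the example needs only the standard library; it is not the code used in ProDrift.

```python
import math


def build_contingency_matrix(ref_freq, det_freq):
    """Rows: reference and detection window; columns: observed relations."""
    relations = sorted(set(ref_freq) | set(det_freq))
    return [[ref_freq.get(r, 0) for r in relations],
            [det_freq.get(r, 0) for r in relations]]


def upper_gamma_q(a, x, eps=1e-12, max_iter=10_000):
    """Regularized upper incomplete gamma function Q(a, x)."""
    if x <= 0.0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series expansion of P(a, x); Q = 1 - P.
        term = total = 1.0 / a
        for n in range(1, max_iter):
            term *= x / (a + n)
            total += term
            if abs(term) < abs(total) * eps:
                break
        return max(0.0, 1.0 - total * math.exp(log_prefactor))
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h * math.exp(log_prefactor)


def g_test(matrix):
    """Return (G statistic, P-value) of the G-test of independence."""
    row_tot = [sum(row) for row in matrix]
    col_tot = [sum(col) for col in zip(*matrix)]
    n = sum(row_tot)
    g = 0.0
    for i, row in enumerate(matrix):
        for j, observed in enumerate(row):
            if observed > 0:  # a cell with 0 observations contributes 0
                expected = row_tot[i] * col_tot[j] / n
                g += 2.0 * observed * math.log(observed / expected)
    df = (len(matrix) - 1) * (len(col_tot) - 1)
    p_value = upper_gamma_q(df / 2.0, g / 2.0) if df > 0 else 1.0
    return g, p_value
```

A library routine, for instance SciPy's chi2_contingency called with lambda_="log-likelihood", could replace the hand-written tail computation where such a dependency is available.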
The first time the P–value drops below the threshold GtestThreshold, we store the
current event and the current window size in pbtEvent and pbtWinSize, respectively
(lines 23-25). Since any statistical test is subject to sporadic stochastic oscillations, we
introduced an additional filter, namely the oscillation filter: the P–value has to remain
below the threshold over many consecutive statistical tests, in order to avoid reporting
incidental drops in the P–value (oscillations). The size of the oscillation filter is
calculated by the function Φ, which takes the window size w as input. The number of
consecutive tests in which the P–value is below the threshold GtestThreshold is stored in
pbtLen (line 22). We report a drift once pbtLen reaches Φ(pbtWinSize) (line 27). Our
experiments showed that a value of Φ(w) = w/2 provides a good tradeoff between accuracy
and detection delay (cf. Section 3.4.3). The drift is localized at the event where the
P–value first dropped below the threshold in this sequence of tests, stored in pbtEvent
(line 28). Whenever the P–value is no longer below the threshold we reset pbtLen,
pbtEvent and pbtWinSize (lines 31-33).
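The oscillation filter of lines 21-34 of Algorithm 1 can be sketched as a small state machine over the sequence of test results. The function name detect_drifts and the representation of the input as (event, P-value, window size) triples are assumptions made for this illustration; Φ(w) = w/2 is implemented with integer division.

```python
def detect_drifts(tests, threshold=0.05, phi=lambda w: w // 2):
    """Return the events at which drifts are localized.

    tests: iterable of (event, p_value, window_size), one triple per
    statistical test, in stream order.
    """
    drifts = []
    pbt_event, pbt_win_size, pbt_len = None, -1, 0
    for event, p_value, w in tests:
        if p_value < threshold:
            pbt_len += 1
            if pbt_event is None:
                # First test of the current run below the threshold.
                pbt_event, pbt_win_size = event, w
            if pbt_len == phi(pbt_win_size):
                # Equality ensures each run is reported exactly once.
                drifts.append(pbt_event)
        else:
            # The run is interrupted: reset the filter state.
            pbt_event, pbt_win_size, pbt_len = None, -1, 0
    return drifts
```

With a window size of 4 (hence a filter of 2 tests), a single isolated drop of the P-value is ignored, while two or more consecutive drops are reported once, at the event where the run started.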
3.2.4 Adaptive Window
Best practices for the G-test recommend that no more than 20 percent of the
expected frequencies in the contingency matrix be lower than 5, for the statistical
test to be reliable [48]. Thus, each time before performing the statistical test
we ensure that the size of the two windows is large enough to fulfill this requirement.
Even though the larger the window size is the higher the chances that the requirement
of the statistical test is met, a very large window size may increase the number of
new events needed to detect a drift, so-called mean delay. Furthermore, it may also
cause the detection and reference windows to span over multiple drifts, thereby letting
some of the drifts go undetected. Therefore, we need to balance between improving
the reliability of the statistical test, by increasing the window size, and reducing the
detection delay of the method, by decreasing the window size.
Algorithm 1 Drift Detection Algorithm
 1: procedure DETECTDRIFT(eventStream, initWinSize, maxBufSize)
 2:   eventBuf   /* Event buffer */
 3:   w ← initWinSize   /* Current window size */
 4:   detSubLog, refSubLog   /* Lists of sub-traces within detection and reference windows, respectively */
 5:   GtestThreshold ← 0.05   /* Typical threshold value of G-test */
 6:   pbtEvent ← NIL   /* Current event when P–value drops below GtestThreshold */
 7:   pbtWinSize ← −1   /* Value of w when P–value drops below GtestThreshold */
 8:   pbtLen ← 0   /* # of consecutive tests that P–value remains below GtestThreshold */
 9:   while true do
10:     e ← fetch(eventStream)   /* Fetch a new event e */
11:     if size(eventBuf) = maxBufSize then
12:       shift(eventBuf)
13:     end if
14:     insert(eventBuf, e)
        ebLength ← length of eventBuf
15:     if ebLength ≥ 2 · initWinSize then
16:       newWinSize ← adWin(eventBuf, w)
17:       updateSublogs(eventBuf, detSubLog, refSubLog, w, newWinSize)
18:       w ← newWinSize
19:       conMat ← buildContingencyMatrix(detSubLog, refSubLog)
20:       pValue ← Gtest(conMat)
21:       if pValue < GtestThreshold then
22:         pbtLen ← pbtLen + 1
23:         if pbtEvent = NIL then
24:           pbtEvent ← e
25:           pbtWinSize ← w
26:         end if
27:         if pbtLen = Φ(pbtWinSize) then
28:           reportDrift(pbtEvent)   /* Drift detected and reported */
29:         end if
30:       else
31:         pbtLen ← 0
32:         pbtEvent ← NIL
33:         pbtWinSize ← −1
34:       end if
35:     end if
36:   end while
37: end procedure
The idea behind our adaptive window originates from the requirement of the
statistical test mentioned above, meaning that on average we aim to have a frequency
of no less than 5 for each of the α+ relations in the contingency matrix. Given that
the maximum number of possible relations over the set of labels (activity names) L
is |L|², we calculate |L| over both the detection and reference windows, denoted by
|Ldet| and |Lref|, respectively. By multiplying max(|Ldet|, |Lref|)² by 5 it is likely to
have enough events in both windows to fulfill the requirement of the statistical test.
Hence the window size w is defined as w = max(|Ldet|, |Lref|)² · 5.
The expansion and the shrinkage of the windows are performed recursively. This
is because each time the windows are, for example, expanded, there may be a need to
expand them again due to changes in |Ldet| and/or |Lref|. It is worth mentioning
that our adaptive window does not depend on the initial window size, since
starting from any initial value the window sizes converge to the length needed to fulfill
the requirement of the statistical test. The maximum size each window can grow to
is the length of the event buffer divided by two.
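The window adaptation can be sketched as a fixed-point computation of w = 5 · max(|Ldet|, |Lref|)², capped at half of the buffer length. The sketch below is an assumption-laden illustration rather than the adWin routine itself: it takes the buffer as a plain list of activity labels (oldest first), places the detection window over the most recent w events and the reference window over the w events before it, and replaces the recursion by a loop with an iteration bound as a safeguard.

```python
def adapt_window(labels, w, max_rounds=100):
    """Return the adapted size of the detection and reference windows.

    labels: activity labels of the buffered events, oldest first.
    w: current window size.
    """
    cap = len(labels) // 2  # each window may use at most half the buffer
    if cap < 1:
        return w
    for _ in range(max_rounds):
        w = max(1, min(w, cap))
        end = len(labels)
        det_labels = set(labels[end - w:end])          # detection window
        ref_labels = set(labels[end - 2 * w:end - w])  # reference window
        new_w = min(5 * max(len(det_labels), len(ref_labels)) ** 2, cap)
        if new_w == w:
            break  # window size has converged
        w = new_w
    return w
```

For a buffer of 60 events that alternates between two labels, the sketch returns 20 (that is, 2² · 5) regardless of whether it starts from a window of 5 or of 30 events, which mirrors the independence from the initial window size noted above.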
It is worth mentioning that in the unlikely extreme scenario where the overlapping
between traces is to the extent that each event within a window comes from a distinct
trace, data streaming techniques with a gradual forgetting strategy [39] should be used.
3.2.5 Noise handling
Real-life event streams often contain noisy events. These events may negatively impact
the accuracy of α+ relations discovered from an event stream, leading to lower drift
detection accuracy. To handle drift detection on noisy event streams, we first filter out
infrequent directly-follows relations from the reference and detection windows. We
consider a directly-follows relation as infrequent if its frequency lies below a certain
threshold, defined as a percentage of the sum of the frequencies of all directly-follows
relations in each of the reference and detection windows. In the experiments with
noisy event streams in this thesis, we set this threshold to 10%. The remaining
noise-free directly-follows relations are then used to construct α+ relations from the
reference and detection windows.
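The frequency-based filter described above amounts to a single pass over the directly-follows counts of a window. In the following sketch the function name and the dictionary representation of the counts are assumptions; the threshold is expressed as a fraction of the total frequency of all directly-follows relations in that window.

```python
def filter_infrequent_follows(follows, threshold=0.10):
    """Keep the directly-follows relations that are not infrequent.

    follows: {(a, b): frequency} for one window.
    threshold: fraction of the total frequency below which a relation
    is treated as noise.
    """
    total = sum(follows.values())
    return {pair: freq for pair, freq in follows.items()
            if freq >= threshold * total}
```

The filter would be applied separately to the reference and to the detection window before the α+ relations are derived from the surviving directly-follows relations.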
Alternatively, more advanced noise filtering techniques such as the one we pro-
posed in [122] can also be used to filter out spurious events from an event stream.
In offline settings, the technique proposed in [22] provides a systematic solution for
removing infrequent activities from an event log.
Time complexity. Each time a new event is received from the stream, we first extract
the α+ relations in each sliding window and count their frequencies, and then perform
the G-test of independence. The worst-case complexity of computing the α+ relations
is quadratic in the cardinality of the label set, i.e. O(|L|²). Given a contingency matrix
of maximum size 2 × |L|², the complexity of the G-test is O(|L|²). Since the two
operations have the same complexity and are executed in sequence, the
complexity of our method is O(|L|²) for every new event read from the stream.
3.3 Tool Support
We implemented the proposed method as a plug-in for the Apromore platform³ as well
as a standalone open-source tool called ProDrift.⁴ Figure 3.1 shows a screenshot of the
plug-in in Apromore. The plug-in can be launched by selecting an event log from the
repository within Apromore and pressing “Detect process drifts” from the “Analyze”
menu, as shown in Figure 3.1a. Alternatively, it is possible to click on the menu item
without selecting a log first. In this case, the tool will ask the user to import a log from
their local computer. This second option is particularly useful when the user does not
wish to store their log in the repository that comes with Apromore.
As shown in Figure 3.1b, the plug-in comes with two drift detection configura-
tion options: “event-based” and “trace-based”. The former selects the drift detection
method presented in this chapter, while the latter selects the run-based drift detection
method proposed by Maaradji et al. [77], that is used as the baseline in our experiments
in Section 3.4. When using our method for drift detection we replay the input event
log as an event stream. By default, we use the adaptive window mechanism (cf. Sec-
tion 3.2.4) to automatically set the size of the drift detection windows. Alternatively, it
is possible for the user to select fixed windows of a certain size.
Once the processing of the log is complete, the tool shows a plot of the P–value
of the statistical test, where the position of each detected drift is marked as a circle on
the P–value curve, as shown in Figure 3.1c. Furthermore, the event index at which
each drift is detected and its corresponding date are reported in the list below the
P–value plot. Also, by pressing the “Save Sublogs” button, one can download the
sub-logs, each containing the sub-traces (event-based setting) or traces (trace-based
setting) between two consecutive drifts.
(a) Launch ProDrift.
(b) Set drift detection parameters (optional).
(c) Drift detection results. Drifts are marked as circles on the P-value curve of the statistical test and
their locations and dates are reported in the list below the plot.
Figure 3.1: Drift detection using ProDrift plug-in within Apromore.
³ Available at http://apromore.org/
⁴ Available at http://apromore.org/platform/tools
3.4 Evaluation on Artificial Logs
We used ProDrift to assess the accuracy and scalability of our method in a variety
of settings. In the rest of this section we discuss the design of the experiments, the
datasets used, and the impact that the oscillation filter and the inter-drift distance
have on our method, and conclude by comparing our method with the method in [77].
3.4.1 Setup
To evaluate the effectiveness of our method, we created a variety of artificial logs
with different configurations, and then replayed these logs as event streams. We first
modeled a base business process using CPN Tools, as illustrated in Figure 3.2, and
then used this model to generate the logs.⁵ The model features 42 different activities,
combined with different intertwined structural patterns: five XOR structures, six AND
structures, and three loop structures. We built this model in such a way that the
resulting log is highly variable. To produce logs that include drifts, we then injected
different types of control-flow changes into the base CPN model.
We applied in turn one out of fifteen simple change patterns [129] to the base
model. These patterns, summarized in Table 3.1, describe different change operations
commonly occurring in business process models, such as inserting/deleting a model
fragment, putting a model fragment in a loop, swapping two fragments, or paralleliz-
ing two sequential fragments. We organized the simple changes into three categories:
Insertion (“I”), Resequentialization (“R”) and Optionalization (“O”) (cf. Table 3.1).
These categories make six possible composite change patterns (“IOR”, “IRO”, “OIR”,
“ORI”, “RIO”, and “ROI”) by nesting the simple patterns within each other. For exam-
ple, the composite pattern “ROI” can be obtained by first adding a new activity (“I”),
then making this activity parallel to an existing activity (“O”) and finally by putting the
whole parallel block into a loop structure (“R”).
Each of these change patterns was applied locally on the base model in such a
way that it is possible during log replay to choose between the base model execution
path and the altered one. For instance, if the applied change pattern was to replace
a process fragment (rp), the CPN model would have a branching point, called drift
toggle, right before this fragment, that allows the execution to follow either the initial
model fragment or the new process fragment. A drift is injected by switching the
toggle on or off. In this way, we can generate intra-trace drifts. For instance, if the
toggle is switched on when trace #500 starts, the traces that started before that trace
and have not yet reached the branching point will follow the new process behavior,
thus exhibiting the change. These traces will therefore have an intra-trace drift. In the
remainder, whenever we say that a drift has been injected at a given trace number (after
a given number of traces) it means that the drift toggle has been switched on at the first
event of that given trace number (resp. after that given number of traces have started).
Figure 3.2: Artificial process model created in CPN Tools, used as a base model to simulate the
artificial event logs.
⁵ http://cpntools.org
Code  Simple change pattern                                   Category
sre   Insert/delete a fragment between two fragments          I
pre   Insert/delete a fragment in/from parallel branch        I
cre   Insert/delete a fragment in/from conditional branch     I
cp    Duplicate fragment                                      I
rp    Substitute fragment                                     I
sw    Swap two fragments                                      I
sm    Move fragment to between two fragments                  I
cm    Move fragment into/out of conditional branch            I
pm    Move fragment into/out of parallel branch               I
cf    Make two fragments mutually exclusive/sequential        R
pl    Make two fragments parallel/sequential                  R
cd    Synchronize two fragments                               R
lp    Make fragment loopable/non-loopable                     O
cb    Make fragment skippable/non-skippable                   O
fr    Change branching frequency                              O
Table 3.1: Change patterns from [129]
Finally, in order to vary the distance between drifts, for each change pattern we
generated three logs of 2,500, 5,000 and 10,000 traces, and injected drifts by switching
the drift toggle on and off every 10% of the log. This led to an inter-drift distance of
250, 500 and 1,000 traces, respectively, with 9 drifts per log. The position of
an injected drift is given by the index of the first event in the event stream after the
drift toggle has been switched on. These indexes are used as the gold standard (the
actual drifts) of our evaluation. Further, for each of the 6 composite change patterns,
we created 3 possible combinations, by changing the type of pattern used. This led to
15 (simple patterns) + 18 (composite patterns) = 33 different variants of the CPN model
times three inter-drift distances, resulting in a total of 99 logs.⁶ All these logs exhibit
a very high trace variability (80% ± 2%), measured as the ratio between the number of
distinct traces and the total number of traces in the log. According to our analysis of
real-life logs, this value is indicative of logs of unpredictable processes, such as
the one used in the second part of this evaluation.
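As a small illustration of the trace variability measure, the following sketch computes the ratio between distinct traces and all traces of a log. The function name and the list-of-lists log representation are assumptions made for this example.

```python
def trace_variability(log):
    """Ratio between the number of distinct traces and all traces."""
    traces = [tuple(trace) for trace in log]  # tuples are hashable
    return len(set(traces)) / len(traces)
```

For instance, a log with the four traces abc, abc, acb and ab contains three distinct traces and thus has a trace variability of 75%.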
⁶ All the CPN models used for this simulation, the resulting artificial logs, and the detailed evaluation
results are available with the software distribution.
To assess the scalability of our method for online drift detection, we measured the
execution time for each new event read from the stream. To evaluate accuracy, we used
F-score and mean delay. The F-score is computed as the harmonic mean of recall and
precision, where recall measures the proportion of actual drifts that have been detected
and precision measures the proportion of detected drifts that are correct. The mean
delay [52] assesses the ability of the method to find drifts as early as possible in an
event stream, and is measured as the number of events between the actual position of
the drift and the end of the detection window.
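The two accuracy measures can be written down directly from their definitions. In the sketch below the function names are assumptions, and so is the input format of mean_delay: a list of (actual drift index, index of the end of the detection window at detection time) pairs for the correctly detected drifts. How a detected drift is matched to an actual one is not covered by this sketch.

```python
def f_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall."""
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    if true_positives == 0 or detected == 0 or actual == 0:
        return 0.0
    precision = true_positives / detected  # detected drifts that are correct
    recall = true_positives / actual       # actual drifts that were detected
    return 2 * precision * recall / (precision + recall)


def mean_delay(matches):
    """Average number of events between actual drift and detection.

    matches: list of (actual_index, detection_window_end_index) pairs.
    """
    delays = [detected - actual for actual, detected in matches]
    return sum(delays) / len(delays)
```

For example, 8 correct detections with 2 false positives and 1 missed drift give a precision of 0.8, a recall of 8/9 and an F-score of 16/19.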
3.4.2 Execution Times
We conducted all tests on an Intel i7 2.20GHz with 16GB RAM (64 bit), running
Windows 7 and JVM 7 with a standard heap space of 2GB, and a stream buffer (maxBufSize)
of 1GB. The time required to update the α+ relations and perform the G-test ranges
from a minimum of 10ms to a maximum of 50ms, with an average of 14ms. These
results show that the method is suited for online drift detection, including scenarios
where the inter-arrival time between events is in the order of milliseconds.
3.4.3 Impact of Oscillation Filter
In the first experiment, we measured the impact of the oscillation filter Φ(w) on F-
score and mean delay, by varying its value from w/4 to w, where w is the window
size. Figure 3.3 shows the obtained F-score and mean delay averaged over all change
patterns. As expected, we observe that the F-score increases as the filter value grows
and eventually plateaus when it reaches the sliding window size, by filtering out false
positives. However, a larger filter value causes a much higher delay. On the other
hand, while a smaller filter value leads to a smaller delay, it may induce our method
to consider incidental changes as actual drifts, causing the F-score to drop, though this
still remains above 0.9. As a tradeoff, for the remainder of this evaluation, we used
Φ(w) = w/2. With this parameter being set empirically, our method is completely
automated, and no parameter setting is required from the user.
Figure 3.3: F-score and mean delay using different oscillation filter values.
Figure 3.4: F-score and mean delay using different inter-drift distances.
3.4.4 Inter-drift Distance
In the second experiment, we compared the F-score and mean delay obtained on logs
of different inter-drift distances (250, 500 and 1,000), in order to assess the minimum
distance that our method can handle. The results, averaged over all change patterns,
indicate that the method performs similarly for the logs with 500 and 1,000 traces of
inter-drift distance, achieving an F-score of about 0.95 and mean delay of about 2,500
(cf. Fig. 3.4). There is a slight decrease in the F-score and a notable increase in the
mean delay when using a distance of 250 traces. In this case, the two sliding windows
may contain two drifts as these are very close. In such cases, the method may miss one
of the two drifts, leading to a lower recall. These cases however are not very common,
as evidenced by the value of the F-score, which does not go below 0.92.
3.4.5 Comparison with Baseline per Process Change Pattern
In the third experiment, we evaluated the accuracy of our method in detecting each of
the 21 change patterns. Figure 3.5 shows the F-score and mean delay for each change
pattern, averaged over the three log sizes, in comparison with those obtained with the
run-based method [77] (the baseline).
Our method could find all the change patterns with a high F-score (above 0.9 in all
but four cases), and a delay in the range of 2,500 events (approximately 100 traces),
peaking at 4,000 events. When compared to the baseline method, our method out-
performs the baseline in terms of F-score in the majority of change patterns (cf. Fig.
3.5 (top)), while the baseline fails to detect almost half of the simple change patterns
(cre, sw, pl, cd, lp, cb). Since in highly variable logs each distinct run is observed
only a few times, the result of the statistical test is less reliable. Thus, in such logs,
the run-based method can only find drift types whose occurrences replace the current
set of runs with a considerably new set of runs, e.g. when deleting a process fragment
from between two other fragments (pattern sre). On the other hand, our current method
considers events (as opposed to traces) and extracts fine-grained, yet abstract features
that capture the process behavior into a few basic relations. Each drift type would be
represented in a handful of α+ relations, and any change in its frequency would be
“echoed” through its corresponding basic relations, making it easier for the statistical
test to detect such a change. Moreover, our method could always detect the drift faster
than the baseline (cf. Fig. 3.5 (bottom)) as it does not need to wait until a trace is
completed to consider it as an input for the statistical test.
Figure 3.5: F-score and mean delay per change pattern, obtained with our method vs. [77].
3.4.6 Comparison with Baseline over Different Log Variability Rates
In this last experiment with artificial logs, we evaluated our method in comparison with
the baseline, when changing the variability rate of the log. As said before, the trace
variability of a log is the ratio between distinct traces and the total number of traces.
It varies from close to 0%, where all traces are the same, to 100%, where every trace
is distinct. Similarly, we define the run variability as the ratio between distinct runs
and the total number of runs. Depending on the concurrency oracle used, a high trace
variability does not necessarily imply a high run variability. On the other hand, a high
run variability always implies an equal or higher trace variability. For instance, a log
with 50% trace variability results in a run variability of 10% (i.e. on average each run
is repeated 10 times). This is due to the aggregation of traces into runs based on the
concurrency oracle. The baseline method performs relatively well with a log with 10%
run variability. Thus, we studied how F-score and mean delay vary as we increase the
run variability of a log.
For this purpose, we generated a new set of artificial logs as described in Section
3.4.1 with different run variability rates, achieved by varying the loopback branching
probability in the CPN model. For each run variability rate and change pattern, we
generated logs of 10,000 traces. The results of this evaluation are reported in Fig. 3.6.
Figure 3.6: F-score and mean delay per log variability, obtained with our method vs. [77].
As the variability of the log increases, the baseline method’s accuracy drops
significantly. This is because the statistical test adopted by this method is inadequate when
the number of distinct runs is large, as their frequency will be low. In contrast, captur-
ing the process behavior at a lower level of abstraction, as done by the α+ relations, as
opposed to runs, leads to much higher frequencies in the contingency table of the sta-
tistical test, ensuring its reliability. This property is valid regardless of the variability
of the log which explains the steady performance of our method.
3.5 Evaluation on Real-life Log
In addition to the experiments with artificial logs, we evaluated our method on the
BPI Challenge (BPIC) 2011 log, and compared the results with those obtained by the
baseline.⁷ This log records patient treatments in the Gynaecology department of a
Dutch academic hospital. It contains 150,291 events in 1,143 traces, of which
981 are distinct, and 623 labels. We first filtered the noise from this event log using an
offline noise filter [22], which removes infrequent activities. This operation
reduced the number of traces to 1,121, of which 798 are distinct, and the number of
labels to 42, resulting in the same trace and run variability of 71%.
We applied our method on the stream of events obtained by replaying the filtered
log. The average execution time for each new event in the stream was 44ms. As shown
in Fig. 3.7 (left), two drifts were detected at the event indexes of 71,321 and 78,541,
corresponding to the dates 6/9/2007 and 29/11/2007 respectively. The baseline could
not detect any drift as the P–value quickly dropped and remained under the threshold,
as shown in Fig. 3.7 (right).
In order to validate the results, we profiled the number of events per month, shown
in Fig. 3.8 (left). The plot exhibits a sharp and consistent increase in the number of
events between July and Sept. 2007 followed by a sharp and consistent decrease
between Sept. and Dec. 2007. We investigated the log and found that the frequencies of
five activities do increase and then decrease notably over the period in question.
Moreover, the number of active cases per month (cf. Fig. 3.8 (right)) decreases gradually
after August 2006. Thus, this variation in the number of events cannot be explained
by new cases. Rather, this phenomenon could be the result of some rework
in the business process. A rework may manifest itself with looping behavior and/or
duplicate activities, which are change patterns our method is able to detect.
In conclusion, while these observations support the hypothesis of the presence of
two drifts in the log, the results should be validated with domain experts.
⁷ http://dx.doi.org/10.4121/uuid:d9769f3d-0ab0-4fb8-803b-0d1120ffcf54
Figure 3.7: P-value in our method (left) and in the baseline (right) for the BPIC 2011 log.
Figure 3.8: Number of events (left) and active cases per month (right) in the BPIC 2011 log.
3.6 Summary
In this chapter, we presented a fully automated method for online detection of business
process drifts from event streams. The method relies on a statistical test over
distributions of behavioral relations observed in two adjacent windows sliding along the
event stream. We proposed an adaptive window technique in order to automatically adjust
the size of the sliding windows, striking a good tradeoff between accuracy and detection
delay. By replaying an event log as an event stream, the proposed method can also be
deployed for drift detection in event logs.
We evaluated our method against different degrees of log variability and varying
inter-drift distance, by injecting various change patterns into artificial logs. The results
showed that the method is able to scale up to online settings and detect drifts very
accurately, while outperforming a state-of-the-art baseline in terms of F-score for the
majority of the change patterns and in terms of detection delay for all of them.
A second evaluation on a healthcare log with very high variability showed that our
method could detect two drifts that were supported by observations from the log.
Chapter 4
Process Drift Characterization at Activity Level
In the previous chapter, we presented an online automated method for detecting drifts
from event streams of business processes. However, as highlighted in the introduction,
detecting a drift without explaining its characteristics does not provide analysts with
a full picture of the changes that occurred in a process. Explaining these characteristics
is known as drift characterization and aims to shed light on what has changed in the
behavior of a process. While early detection of drifts alerts organizations to process
changes as they occur, drift characterization enables analysts to identify how the
process has changed.
To the best of our knowledge, there has not been any attempt to provide a sys-
tematic solution for characterizing process drifts. To fill this gap, in this chapter we
propose a fully automated online method for characterizing process drifts at the level
of individual activities from event streams. For each detected drift, we perform a statis-
tical test to measure the statistical association between the drift and the distributions of
the behavioral relations between activities such as causality, conflict and concurrency,
extracted from the portions of an event stream before and after the drift. We then rank
the relations based on their relative frequency change, and try to match them with a set
of predefined change templates. The best-matching templates are then reported to the
user as the changes underpinning the drift. We extensively evaluated the accuracy of
our method by simulating event streams from artificial and real-life logs. The results
show that the approach is fast and highly accurate in characterizing common change
patterns, and performs significantly better than a state-of-the-art technique for log delta
analysis.
This chapter is structured as follows. Section 4.1 discusses related work on drift
characterization at activity level. Section 4.2 introduces the proposed method while
Sections 4.4 and 4.5 present its evaluation on artificial and real-life logs, respectively.
Section 4.6 concludes the chapter.
4.1 Related Work
As already remarked, existing process drift detection methods only report the existence
of a drift, and while some can also localize it with high accuracy in the log, none
can actually characterize the detected drift. As described in Section 3.1, the authors
in [18] propose to detect a drift by evaluating the fitness of causal constraints extracted
from the post-drift process behavior in a polyhedron built from the pre-drift event
log. To characterize a drift, they then suggest using the same causal constraints to
discover a process model from the post-drift process behavior. However, they do not
provide an actual method, nor do they evaluate the practicality of such an approach.
Furthermore, discovering a process model from the post-drift process behavior without
pinpointing its changes over the drift may not be enough to facilitate the understanding
of changes underlying the drift. Nonetheless, it can be used as the basis for developing
drift characterization solutions.
A possible approach to characterize process drifts is to compare two sub-logs ex-
tracted from event streams before and after a drift and identify their differences. In
this context, Bolt et al. [10] propose a technique for comparing the behavior of dif-
ferent variants of the same process based on observed executions of such variants in
event logs. Given two event logs each corresponding to a process variant, they fol-
low a three-step approach. In the first step, they build a transition system from the
event logs and annotate each of its states and transitions with the measurements of the
variants with respect to a certain process metric. In the second step, they perform a
statistical test between every two sets of measurements of each metric on each state or
transition to identify the differences that are statistically significant. Finally, the identi-
fied differences are highlighted by changing the appearance of the states or transitions.
For example, if the difference between the frequency of executing an activity in two
compared variants is statistically significant, the arc corresponding to that activity in
the transition system is thickened. By using the sub-logs extracted from before and
after a drift as input to this technique, we can identify some of the significant differ-
ences between the pre-drift and post-drift process variants. However, this technique
has several limitations. With respect to the control-flow differences, it is only able to
identify that a certain activity (transition) occurs after a sequence of activities (state)
in one process but not in the other, while missing the structural differences of the pro-
cesses, e.g. the occurrence of two activities in an XOR construct in one process but
not in the other. Furthermore, this technique is not meant to work with event streams
where each sub-log before or after a drift contains partial traces, i.e. traces whose start
events are removed from the stream and/or whose end events are yet to arrive on the
stream. Assuming that we know the start and end activities of the process, one pos-
sible workaround is to build a transition system by only using complete traces within
the pre-drift and post-drift sub-logs. However, this may lead to an incomplete or even
inaccurate transition system as fractions of process behavior that are only captured by
the discarded partial traces are missed by the transition system. This problem is wors-
ened in the event streams of highly variable processes, as almost every trace of such
processes exhibits a unique execution of the process. A sub-log extracted from such
an event stream is likely to only contain partial traces. Van Beest et al. [115] propose
a technique for diagnosing behavioral differences between two event logs. The idea is
to use two prime event structures, i.e. a formalism composed of events and behavioral
relations, such as causality and conflict, for modeling concurrent processes, to loss-
lessly encode the event logs, and by comparing them report their differences as natural
language statements. A problem of this technique when used for drift characterization
is that it reports all differences between the pre-drift and post-drift sub-logs regardless
of the significance of their association with the occurrence of the drift. Furthermore,
similar to the technique proposed by Bolt et al. [10], this technique also does not work
with partial traces. Consequently, it may miss the fractions of process behavior that
are only captured by those traces.
As the technique proposed by Van Beest et al. [115] is able to report a complete
set of control-flow differences between two event logs we use it as a baseline for the
experiments in Sections 4.4 and 4.5.
4.2 Drift Characterization Method
The purpose of process drift characterization is to identify the differences in the pro-
cess behavior before and after the drift point that best explain the drift. In the previous
chapter, the α+ binary relations (cf. Definition 5) were shown to be suitable for captur-
ing process behavior, in particular in the context of highly variable business processes.
These behavioral relations and their frequencies are extracted from the time window
containing the most recent events of the stream. As a preprocessing operation, each
time this window slides, a snapshot of the process behavior is captured and stored as
a data point. Each binary relation actually represents a dimension of the stored data
point, while the frequency of this relation is the scalar in this dimension. Sliding the
window along the event stream provides us with a set of data points representing snap-
shots of the pre-drift and post-drift process behaviors. These data points are used as
input to our two-stage characterization method.
In Stage 1 we measure the statistical association of each of the α+ relations with the
drift using an information gain metric. Those relations that are significantly associated
with the drift are then ordered based on their explanatory power with respect to the
drift. In Stage 2, the resulting ordered list of relations is fed to a template matching
algorithm, where we find the best-matching templates that characterize the drift. The
identified templates are then reported to the user in natural language. An overview of
our method is shown in Fig. 4.1. The rest of this section describes the method in detail.
[Figure 4.1 (diagram not reproduced). Boxes: Drift detection, Data points extraction,
Relations retrieval and ordering, Change templates identification. Phase labels:
Preprocessing, Stage 1, Stage 2.]
Figure 4.1: Overview of our method for process drift characterization.
4.2.1 Preprocessing: Data Points Extraction
For drift detection, we use our drift detection method (cf. Chapter 3), which works
in online settings with event streams of highly-variable business processes. However,
the drift characterization method proposed here can in principle be used on top of any
process drift detection method.
Our detection technique captures process behavior by extracting α+ binary rela-
tions in two juxtaposed windows of the same size, namely reference and detection
windows, sliding along the event stream. The most recent events are equally divided
into these two windows, where the reference window contains the less recent events,
and the detection window contains the more recent ones. The size of these windows
is adjusted using a formula based on the maximum number of distinct activity labels
within the two windows. This adaptive window sizing ensures that there are enough
events in each window for accurately capturing the process behavior.
We use the detection window as a snapshot of the most recent process behavior.
Each time this window slides with the stream on arrival of a new event, we extract
α+ relations and their frequencies and store them as a multidimensional data point in
a buffer, called the characterization buffer. Each α+ relation represents a dimension of
this data point. As the detection window slides, new data points are added to the
head of the buffer. A drift is detected when the P-value of the statistical test, i.e. the G-test
(cf. Section 3.2), drops below the detection threshold; we call this point the drift point.
At this point we stop inserting new data points into the characterization buffer. We then
remove the last w data points from the head of the characterization buffer, where w is the
window size at the drift point, as these data points may include post-drift process behavior.
This results in a set of recent data points that only encode the process behavior from the
pre-drift area. We retain these data points for characterizing the detected drift.
The P-value remains below the threshold until the process behaviors within the
reference and detection windows become statistically similar, in other words, until the
process behavior reflected in the event stream starts to stabilize. Therefore, we call
the point where the P-value returns above the detection threshold the stabilization point.
This is where we resume inserting new data points into the characterization buffer, as the
detection window now only includes behavior from the post-drift process. We continue
extracting data points from the event stream with the next n incoming events. We define
n as the characterization delay, as it indicates the delay that is needed after the
stabilization point to characterize the drift. Similarly, we consider only the n most
recent pre-drift data points for drift characterization. In Section 4.4.2, we perform an
experiment to determine a suitable characterization delay that leads to a high accuracy
of retrieving and ordering the relevant binary relations. The extraction of behavioral
relations explained above is illustrated in Fig. 4.2.
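The buffering logic described above can be sketched as follows. This is a minimal
illustration under stated assumptions, not the implementation used in this thesis: the
class name CharacterizationBuffer, its method push, the state names, and the
representation of a data point as a dictionary are all hypothetical, and the P-value and
the window size w are assumed to be supplied by an external drift detector.

```python
from collections import deque


class CharacterizationBuffer:
    """Hypothetical sketch of the characterization buffer (all names are assumptions)."""

    def __init__(self, n, threshold):
        self.n = n                  # characterization delay
        self.threshold = threshold  # detection threshold on the P-value
        self.pre = deque()          # data points collected before the drift
        self.post = []              # data points collected after stabilization
        self.state = "monitoring"   # monitoring -> drifting -> collecting -> ready

    def push(self, data_point, p_value, w):
        """Feed one data point per window slide; w is the current window size."""
        if self.state == "monitoring":
            if p_value < self.threshold:
                # Drift point: drop the w most recent points, which may already
                # contain post-drift behavior, then keep the n most recent ones.
                for _ in range(min(w, len(self.pre))):
                    self.pre.pop()
                while len(self.pre) > self.n:
                    self.pre.popleft()
                self.state = "drifting"
            else:
                self.pre.append(data_point)
        elif self.state == "drifting":
            if p_value >= self.threshold:
                # Stabilization point: resume collecting data points.
                self.state = "collecting"
                self.post.append(data_point)
        elif self.state == "collecting":
            self.post.append(data_point)
        if self.state == "collecting" and len(self.post) >= self.n:
            self.state = "ready"    # characterization point reached
        return self.state
```

Once the state "ready" is returned, the n pre-drift and n post-drift data points held in
pre and post are the input to Stage 1. The sketch does not bound the size of pre while
monitoring; a streaming deployment would need to cap it.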
[Figure 4.2 (plot not reproduced). P-value along the event stream, compared against the
detection threshold. Marked points: drift point, stabilization point, characterization
point. Marked spans: pre-drift area, post-drift area, w (removed data points), and a
characterization delay (n) on each side of the drift.]
Figure 4.2: From drift detection to drift characterization.
4.2.2 Stage 1: Relevant Binary Relations Retrieval and Ordering
The purpose of the first stage of our approach is to identify and order the α+ binary
relations that are statistically associated with the detected drift. In other words, we
would like to measure the explanatory power of each relation with respect to the detected
drift. We approach this issue as a classification problem with the α+ binary
relations, extracted from the event stream, as the explanatory variables, and a binary
target variable with the labels pre-drift and post-drift. One might first opt
for a logistic regression model because it is additive and interpretable.
However, logistic regression requires little or no correlation among the independent
variables (multicollinearity problem [84]). Such a requirement cannot be guaranteed,
particularly in our case where the binary relations come from the same process
(model). We therefore opted for a less restrictive classification approach, namely a
decision tree, where we use the K-sample permutation test (KSPT) to measure the statistical
association between each individual explanatory variable (here a binary relation) and the
target variable (the drift classification variable). Similarly to information gain, the
permutation test allows us to measure the mutual information between two variables.
We opted for the permutation test since it is more suitable for small sample sizes [36].
We perform a pairwise permutation test to measure the significance of the statistical
association of each binary relation with the target variable (drift). The latter is
encoded with the value 0 (resp. 1) for the pre-drift (resp. post-drift) behavior. If the null
hypothesis of no association cannot be rejected, we discard the relation, as it is not
significantly associated with the drift.
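To make the filtering step concrete, the following sketch keeps the relations whose
frequency is significantly associated with the pre-drift/post-drift label. It is an
illustration only: it uses a two-sample Monte Carlo permutation test on the absolute
difference of means as a stand-in for the KSPT of [36], and the function names, the
number of permutations, the seed and the significance level alpha are assumptions.

```python
import random


def permutation_p_value(pre, post, n_perm=2000, seed=0):
    """Monte Carlo permutation test on the absolute difference of group means.

    pre/post hold the frequencies of one relation in the pre-drift and
    post-drift data points. The test statistic is an assumption made for
    illustration; it is not the KSPT statistic of [36].
    """
    rng = random.Random(seed)
    observed = abs(sum(pre) / len(pre) - sum(post) / len(post))
    pooled = list(pre) + list(post)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a, b = pooled[:len(pre)], pooled[len(pre):]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated P-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def relevant_relations(pre_points, post_points, alpha=0.05):
    """Keep the relations that are significantly associated with the drift."""
    relations = set().union(*pre_points, *post_points)
    kept = []
    for rel in sorted(relations):
        # A relation absent from a data point has frequency 0 in that point.
        pre = [p.get(rel, 0) for p in pre_points]
        post = [p.get(rel, 0) for p in post_points]
        if permutation_p_value(pre, post) < alpha:
            kept.append(rel)
    return kept
```

Each data point is assumed to be a dictionary from relation to frequency. A relation
whose frequency is the same in every data point yields a P-value of 1 and is discarded.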
As suggested in [36], the KSPT can be applied to identify the relevant features, then
an appropriate distance measure is used to order the selected features. Indeed, despite
identifying the relations that are found to be statistically associated with our binary
drift target variable, some relations may contribute more than others to the change that
occurred. We use a measure that is similar to the chi-squared statistic to measure the
contribution of each relation to the overall change. This metric measures the relative
frequency change (RFC) of each relation, and is defined as $RFC = (O - E)^2 / \max(O, E)$,
where O and E are the average frequencies of a relation before and after the drift point,
respectively. In addition, the total relative frequency change (TRFC) is defined as the sum
of the RFCs of all relations. With relations ordered by their RFCs in descending
order, we can filter out the relations with insignificant RFCs by retaining only the top
relations, summing up to x% of the TRFC, where x% · TRFC is defined as the cumulative
relative frequency change (CRFC). In Section 4.4.3, we perform an experiment to
investigate the impact of varying the CRFC on the characterization accuracy.
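The ordering and filtering by RFC can be sketched as follows. The function names, the
dictionary-based inputs, the alphabetical tie-breaking and the reading of "summing up to
x% of the TRFC" as "until the running sum reaches x · TRFC" are assumptions made for
illustration.

```python
def rfc(o, e):
    """Relative frequency change of one relation; o and e are its average
    frequencies before and after the drift point."""
    if o == e:
        return 0.0  # also covers o == e == 0, avoiding a division by zero
    return (o - e) ** 2 / max(o, e)


def top_relations(avg_pre, avg_post, x=0.9):
    """Order relations by descending RFC and keep the top ones whose RFCs
    sum up to at least x * TRFC (the CRFC)."""
    relations = set(avg_pre) | set(avg_post)
    scores = {r: rfc(avg_pre.get(r, 0.0), avg_post.get(r, 0.0)) for r in relations}
    trfc = sum(scores.values())
    # Ties are broken alphabetically to make the ordering deterministic.
    ordered = sorted(relations, key=lambda r: (-scores[r], r))
    kept, cumulative = [], 0.0
    for r in ordered:
        if cumulative >= x * trfc:
            break
        kept.append(r)
        cumulative += scores[r]
    return kept, scores
```

For instance, a relation whose average frequency drops from 8 to 6 has an RFC of
(8 − 6)^2 / 8 = 0.5, whereas a relation that disappears from an average frequency of 10
has an RFC of 10, so the latter is ranked far ahead of the former.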
4.2.3 Stage 2: Change Templates Identification
The output of Stage 1 is a list of relations ordered by their explanatory power
(RFC) with respect to the drift, where the first and the last relation have the highest
and the lowest explanatory power, respectively. In Stage 2, we aim to match the relations
with the typical change patterns that best characterize the drift. To this end, we define
a set of templates based on the change patterns defined in [129] (listed in Table 2.1) at
the level of individual activities. Table 4.1 shows the defined templates. Each template
is represented in terms of α+ binary relations. We try to match the process relations,
obtained from Stage 1, with the binary relations of the predefined templates. Using a
matching confidence metric, we find the best matching between templates and the process
relations. In the rest of this section, we explain our template matching algorithm in detail.
Code  Simple change template                                  Cat.
sre   Insert/delete an activity between two activities        I
pre   Insert/delete an activity in/from parallel branch       I
cre   Insert/delete an activity in/from conditional branch    I
cp    Duplicate an activity                                   I
rp    Substitute an activity                                  I
sw    Swap two activities                                     I
sm    Move an activity to between two activities              I
cm    Move an activity into/out of conditional branch         I
pm    Move an activity into/out of parallel branch            I
cf    Make activity mutually exclusive/sequential             R
pl    Make activities parallel/sequential                     R
cd    Synchronize two activities                              R
lp    Make activities loopable/non-loopable                   O
cb    Make an activity skippable/non-skippable                O
fr    Change branching frequency                              O
Table 4.1: Change templates defined based on change patterns in [129].
Example 1. As a running example, let us assume the output of Stage 1 of our
method is the ordered relation list 〈e → f : −, e ‖ f : +, e → g : +, d → f : +,
a → b : −, f → g : ↘, d → e : ↘, b → c : −, a → c : +〉, where + (resp. −) indicates that
the relation appeared (resp. disappeared) after the drift, and ↗ (resp. ↘) indicates
that the frequency of the relation increased (resp. decreased) after the drift.
In the remainder of this chapter, unless otherwise indicated, we use both “feature”
and “relation” to refer to an α+ binary relation between two activity labels. A feature
set is used to represent the α+ relations before or after the drift, and is defined as
follows.
Definition 6 (Feature Set). Let $\mathcal{L}$ be a set of activity labels, and
$\mathcal{T} := \{\rightarrow, \parallel, \#, \circlearrowleft, \triangle\}$ a set of binary α+
relation symbols, denoting causality, concurrency, conflict, and length-one and length-two
loops, respectively. A feature set $F : \mathcal{L} \times \mathcal{L} \rightharpoonup \mathcal{T}$
is a partial function that yields the type of α+ relation between two labels.
Two feature sets are used to represent the sets of features discovered before
and after a given drift point, along with a classification of the change in frequency of
each feature between the two. The classification only considers the relations that
existed both before and after the occurrence of the drift, in our example {f → g, d → e}.
A relation is classified as increasing (↗), decreasing (↘) or not applicable (⊥),
depending on whether its frequency increased, decreased, or remained unchanged. A
relation that disappeared (resp. appeared) after the drift does not need to be classified
as it only belongs to the pre-drift (resp. post-drift) feature set. All the features existing
before or after the drift are ordered in terms of their explanatory power. The two
feature sets from before and after the drift, the classification and the ordering functions
form a drift feature set which constitutes the output of the first stage of our method.
Formally, a drift feature set is defined as follows:
Definition 7 (Drift Feature Set). Let $\mathcal{O} := \{\nearrow, \searrow, \bot\}$ be a set of
feature frequency change types. A drift feature set is a tuple
$D := \langle F_{pre}, F_{post}, Diff_D, \sqsubseteq, \mathcal{L} \rangle$, where $F_{pre}$
(resp. $F_{post}$) is the feature set before (resp. after) a drift, $Diff_D$ is a classification
function defined as $Diff_D : F_{pre} \cap F_{post} \to \mathcal{O}$, and $\sqsubseteq$ is a
total order on $F_{pre} \cup F_{post}$.
The following function returns the index of a feature in a given drift feature set.
Definition 8 (Rank). Let $\preceq$ be a total order on a finite set $B$. For all $b \in B$,
$Rank(b, \preceq, B) = |\{b' \in B \mid b' \preceq b\}|$.
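Example 2. With Definition 7, Example 1 is represented as a drift feature set
$D_1 = \langle F^{D_1}_{pre}, F^{D_1}_{post}, Diff_D, \sqsubseteq, \mathcal{L} \rangle$, where
$\mathcal{L} = \{a, b, c, d, e, f, g\}$,
$F^{D_1}_{pre} = \{e \to f, a \to b, f \to g, d \to e, b \to c\}$,
$F^{D_1}_{post} = \{e \parallel f, e \to g, d \to f, f \to g, d \to e, a \to c\}$,
$\sqsubseteq = \langle e \to f, e \parallel f, e \to g, d \to f, a \to b, f \to g, d \to e, b \to c, a \to c \rangle$,
and $Diff_D = \{(f \to g, \searrow), (d \to e, \searrow)\}$.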
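As an illustration of Definition 7, the sketch below assembles the components of a drift
feature set from the average relation frequencies before and after a drift. The function
name, the encoding of a relation as a triple (x, y, type), the string labels used for the
change types, and the treatment of a frequency of zero as "relation absent" are assumptions.

```python
def drift_feature_set(avg_pre, avg_post, order):
    """Assemble the components of a drift feature set (Definition 7).

    avg_pre / avg_post map a relation (x, y, type) to its average frequency
    before / after the drift; order is the Stage 1 ranking of the relations.
    """
    f_pre = {r for r, freq in avg_pre.items() if freq > 0}
    f_post = {r for r, freq in avg_post.items() if freq > 0}
    diff = {}
    # Only relations present on both sides of the drift are classified.
    for r in f_pre & f_post:
        if avg_post[r] > avg_pre[r]:
            diff[r] = "increase"
        elif avg_post[r] < avg_pre[r]:
            diff[r] = "decrease"
        else:
            diff[r] = "n/a"
    labels = {a for (x, y, _) in f_pre | f_post for a in (x, y)}
    ranking = [r for r in order if r in f_pre | f_post]
    return f_pre, f_post, diff, ranking, labels
```

Applied to frequencies consistent with Example 1, the function classifies only f → g and
d → e, since these are the only relations present both before and after the drift.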
Our drift characterization method aims at explaining a detected drift using predefined
change templates. In this regard, we define a set of change templates representing
the typical change patterns [129]. These templates are presented in Table 4.1. A change
template is represented by a process model fragment before the change compared to
another process model fragment after the change.
Consequently, a template is a generic way to describe a typical change pattern.
It enumerates the expected sets of relations before and after the change based on a
change pattern representation. The relations that are present in both process model
fragments, before and after the change, need to be classified based on their expected
frequency evolution in the change pattern. Besides, the importance of every relation in
the change pattern is appended to the template. A template handles variables that can
be instantiated with actual activity labels in a matching operation.
Definition 9 (Template). Let $\mathcal{V}$ be a set of variables, $\mathcal{T}$ a set of α+
binary relation symbols, and $\mathcal{O}$ a set of relation frequency change types. A
template is a tuple $T := \langle T_{pre}, T_{post}, Diff_T, \mathcal{S}, \mathcal{V} \rangle$
where $T_{pre} : \mathcal{V} \times \mathcal{V} \rightharpoonup \mathcal{T}$ represents the
relations before the change, $T_{post} : \mathcal{V} \times \mathcal{V} \rightharpoonup \mathcal{T}$
represents the relations after the change, $Diff_T$ is a classification function defined as
$Diff_T : T_{pre} \cap T_{post} \to \mathcal{O}$, and $\mathcal{S}$ is a function specifying
the importance of each relation to the template T, defined as
$\mathcal{S} : T_{pre} \cup T_{post} \to (0, 1]$.
Example 3. Let us assume the two change templates, parallelize activities ($T^{pl}$) and
remove activity ($T^{sre}$), for our example, illustrated in Fig. 4.3 and Fig. 4.4,
respectively. With Definition 9,
$T^{pl} = \langle \{X \to Y, W \to X, Y \to Z\}, \{X \parallel Y, W \to Y, X \to Z, W \to X, Y \to Z\}, \{(W \to X, \searrow), (Y \to Z, \searrow)\}, \{(X \to Y, 1), (W \to X, 1), (Y \to Z, 1), (X \parallel Y, 1), (W \to Y, 1), (X \to Z, 1)\}, \{W, X, Y, Z\} \rangle$,
and
$T^{sre} = \langle \{X \to Y, Y \to Z\}, \{X \to Z\}, \emptyset, \{(X \to Y, 1), (Y \to Z, 1), (X \to Z, 1)\}, \{X, Y, Z\} \rangle$.
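[Figure 4.3 (diagram not reproduced): process model fragments over activities W, X, Y, Z
before and after the change.]
Figure 4.3: Parallelize activities template ($T^{pl}$)
[Figure 4.4 (diagram not reproduced): process model fragments over activities X, Y, Z
before and after the change.]
Figure 4.4: Remove activity template ($T^{sre}$)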
In order to explain a drift, the discovered features represented with a drift feature
set are matched to a predefined template. Every variable in the template needs to be
mapped to a label from the drift feature set. This operation is called a valid
instantiation, and is defined as follows:
Definition 10 (Valid Instantiation). Given a drift feature set
$D := \langle F_{pre}, F_{post}, Diff_D, \sqsubseteq, \mathcal{L} \rangle$, and a template
$T := \langle T_{pre}, T_{post}, Diff_T, \mathcal{S}, \mathcal{V} \rangle$, a valid instantiation
of T through D is a function $I_{D,T} : \mathcal{V} \to \mathcal{L}$ such that
• $T_{pre}(v_1, v_2) = t_1$ iff $F_{pre}(I_{D,T}(v_1), I_{D,T}(v_2)) = t_1$,
• $T_{post}(v_3, v_4) = t_2$ iff $F_{post}(I_{D,T}(v_3), I_{D,T}(v_4)) = t_2$, and
• $Diff_T(v_5, v_6) = \vartheta$ iff $Diff_D(I_{D,T}(v_5), I_{D,T}(v_6)) = \vartheta$.
Example 4. In our example, we can have two valid instantiations, one per template.
The first instantiation is $I_{D_1,T^{pl}} = \{W : d, X : e, Y : f, Z : g\}$, whereas the second
instantiation is $I_{D_1,T^{sre}} = \{X : a, Y : b, Z : c\}$.
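The search for valid instantiations can be illustrated with a brute-force sketch. It is
not the matching algorithm of this thesis: the function name, the encoding of relations
as triples (x, y, type), the dictionary layout of a template, and the restriction to
injective mappings are assumptions. The "iff" conditions of Definition 10 are read here as
requiring that the instantiated template relations coincide with the relations of the
drift feature set among the mapped labels.

```python
from itertools import permutations


def valid_instantiations(f_pre, f_post, diff_d, labels, template):
    """Enumerate injective variable-to-label mappings satisfying Definition 10.

    Relations are triples (x, y, type); template is a dict with the keys
    "pre", "post", "diff" and "vars". Brute force, for illustration only.
    """
    variables = sorted(template["vars"])
    for combo in permutations(sorted(labels), len(variables)):
        inst = dict(zip(variables, combo))
        image = set(combo)

        def restrict(features):
            # Relations of the drift feature set among the mapped labels only.
            return {(x, y, t) for (x, y, t) in features if x in image and y in image}

        def apply(relations):
            # Template relations with variables replaced by their labels.
            return {(inst[x], inst[y], t) for (x, y, t) in relations}

        mapped_diff = {(inst[x], inst[y], t): c
                       for (x, y, t), c in template["diff"].items()}
        local_diff = {r: c for r, c in diff_d.items()
                      if r[0] in image and r[1] in image}
        if (apply(template["pre"]) == restrict(f_pre)
                and apply(template["post"]) == restrict(f_post)
                and mapped_diff == local_diff):
            yield inst
```

On the running example, this enumeration yields exactly the two instantiations of
Example 4. The enumeration is exponential in the number of template variables, which is
acceptable here only because templates involve a handful of variables.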
A confidence is calculated for each matching (valid instantiation) in order to assess
the likelihood of such a matching. The confidence of an instantiation is based on
the Discounted Cumulative Gain (DCG) measure [55], which indicates the quality of
ranking relations in a drift feature set with regards to their predefined importance in a
template. In our method, we consider the same importance of 1 for all the relations of
a template. The confidence of an instantiation is defined as follows.
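Definition 11 (Confidence in an Instantiation). Given a drift feature set
$D := \langle F_{pre}, F_{post}, Diff_D, \sqsubseteq, \mathcal{L} \rangle$, a template
$T := \langle T_{pre}, T_{post}, Diff_T, \mathcal{S}, \mathcal{V} \rangle$, and a valid
instantiation $I_{D,T} : \mathcal{V} \to \mathcal{L}$, the confidence $C(I_{D,T})$ of D
matching T through $I_{D,T}$ is:
$$C(I_{D,T}) = \sum_{(x,y,t) \in T_{pre} \cup T_{post}} \frac{\mathcal{S}(x,y,t)}{\log_2(Rank((I_{D,T}(x), I_{D,T}(y), t), \sqsubseteq, F_{pre} \cup F_{post}) + 1)}$$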
Example 5. In our example, the confidence of $I_{D_1,T^{pl}}$ is calculated as follows:
$$C(I_{D_1,T^{pl}}) = \frac{1}{\log_2(1+1)} + \frac{1}{\log_2(2+1)} + \frac{1}{\log_2(3+1)} + \frac{1}{\log_2(4+1)} + \frac{1}{\log_2(6+1)} + \frac{1}{\log_2(7+1)} \approx 3.25.$$
The confidence of $I_{D_1,T^{sre}}$ is calculated in the same way from the ranks 5, 8 and 9
of its relations, and approximates to 1.00.
As we want to find the best-matching template among all matching templates we
need to rank them based on their confidences. However, as the number of relations
in different templates may not be the same, we need to normalize the confidence of
an instantiation with respect to the maximal confidence of its template. Similarly to
the normalized DCG (nDCG) [55], we first define the notion of ideal confidence of a
template T as the DCG obtained after ordering relations of T based on their importance
defined by $\mathcal{S}$. The normalized confidence (nC) of an instantiation is calculated by
dividing the confidence of the instantiation by the ideal confidence of its template.
Definition 11 (continued). The ideal confidence $iC(T)$ of T is computed as
$$iC(T) = \sum_{(x,y,t) \in T_{pre} \cup T_{post}} \frac{\mathcal{S}(x,y,t)}{\log_2(Rank((x,y,t), \geq, range(\mathcal{S})) + 1)},$$
and the normalized confidence $nC(I_{D,T})$ of D matching T through $I_{D,T}$ is computed as
$$nC(I_{D,T}) = \frac{C(I_{D,T})}{iC(T)}.$$
Example 6. In our example, the six relations of $T^{pl}$ ideally occupy ranks 1 to 6, so
$iC(T^{pl}) = \sum_{i=1}^{6} 1/\log_2(i+1) \approx 3.30$ and $nC(I_{D_1,T^{pl}}) \approx 0.98$,
whereas $iC(T^{sre}) \approx 2.13$ and $nC(I_{D_1,T^{sre}}) \approx 0.47$. As
$nC(I_{D_1,T^{pl}}) \geq nC(I_{D_1,T^{sre}})$, $T^{pl}$ is identified as the
best-matching template with the drift feature set.
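The confidence computation can be sketched in a few lines. The function names and the
list-based representation of the Stage 1 ordering are assumptions; following the choice
stated above, every relation has importance 1 by default, so that the ideal confidence is
obtained when the k relations of a template occupy ranks 1 to k.

```python
from math import log2


def rank(feature, order):
    """1-based position of a feature in the Stage 1 ordering (Definition 8)."""
    return order.index(feature) + 1


def confidence(template_relations, inst, order, importance=1.0):
    """DCG-style confidence of an instantiation (Definition 11)."""
    total = 0.0
    for (x, y, t) in template_relations:
        total += importance / log2(rank((inst[x], inst[y], t), order) + 1)
    return total


def ideal_confidence(n_relations, importance=1.0):
    """Confidence obtained when the template relations occupy ranks 1..n."""
    return sum(importance / log2(i + 1) for i in range(1, n_relations + 1))


def normalized_confidence(template_relations, inst, order):
    """Confidence divided by the ideal confidence of the template."""
    return (confidence(template_relations, inst, order)
            / ideal_confidence(len(template_relations)))
```

Here template_relations stands for the union of the pre-change and post-change relations
of a template. Applied to the running example, the normalized confidence of the
parallelize template exceeds that of the remove template, so the former is ranked first.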
Simultaneous changes. Identifying one template is not enough as a process drift may
involve more than one change. In order to characterize all the simultaneous changes,
each time that a best-matching template with the drift feature set is identified, we re-
move the features that were used for this template instantiation from the drift feature
set. The new resulting drift feature set is then reused for the identification of a new
best-matching template. We repeat this cycle until we cannot find any more templates
that match the remaining features within the drift feature set. It is worth mentioning
that if there are two overlapping changes in the process, i.e. changes that share a non-
empty set of features, only the one with higher nC can be matched with a template.
This is because each time we find a best-matching template we remove the matched
features from the drift feature set. This limits the ability of the proposed method to the
identification of non-overlapping simultaneous changes.
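The iterative identification of simultaneous changes can be sketched as a greedy
selection. This is a simplification made for illustration: it assumes that all candidate
matches, with their normalized confidences and the features they use, have been computed
once up front, whereas the method described above re-runs the matching on the reduced
drift feature set in each cycle. The function name and the tuple layout are hypothetical.

```python
def identify_changes(candidates):
    """Greedy selection of non-overlapping template matches.

    candidates: list of (template_code, normalized_confidence, features), where
    features is the set of drift features used by that instantiation.
    """
    remaining = list(candidates)
    selected = []
    while remaining:
        # Pick the match with the highest normalized confidence.
        best = max(remaining, key=lambda c: c[1])
        selected.append(best)
        used = best[2]
        # Removing the matched features invalidates every overlapping match.
        remaining = [c for c in remaining if c is not best and not (c[2] & used)]
    return selected
```

The sketch makes the stated limitation visible: of two candidates that share a feature,
only the one with the higher normalized confidence survives.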
Example 7. In our example, as there is no feature shared between ID1,T pl and ID1,T sre ,
both change templates can be identified. The identified templates, T pl and T sre, are
then reported to the user using the two following statements, respectively:
• Before the drift, activities “e” and “f” were sequential, while after the drift, they
are parallel.
• After the drift, activity “b” is deleted from between activities “a” and “c”.
Finally, we also report the remaining features that are not used in any template
instantiation to the user via statements such as “Before the drift, activity X was fol-
lowed by activity Y, while after the drift it is not” or “Before the drift, activity X was
more frequently followed by activity Y”. This provides the user with useful insight for
further investigation of process changes.
Table 4.2 shows the format of drift characterization statements produced by our
method for each change template.
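Producing a statement from an identified template amounts to filling the variables of a
statement format with the labels of the instantiation. The sketch below reproduces the
two statements of Example 7; the dictionary STATEMENTS, the function render, and the
restriction to one direction of each of the two templates are assumptions made for
illustration.

```python
STATEMENTS = {
    # Formats for one direction of the pl and sre templates only; this
    # dictionary is a hypothetical helper, not part of the method definition.
    "pl": "Before the drift, activities {X} and {Y} were sequential, "
          "while after the drift, they are parallel.",
    "sre": "After the drift, activity {Y} is deleted from between "
           "activities {X} and {Z}.",
}


def render(code, inst):
    """Fill a statement format with the quoted labels of an instantiation."""
    quoted = {var: f"\u201c{label}\u201d" for var, label in inst.items()}
    # Variables that do not occur in the format are simply ignored.
    return STATEMENTS[code].format(**quoted)
```

For example, render("sre", {"X": "a", "Y": "b", "Z": "c"}) yields the second statement of
Example 7.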
Time complexity. Given the number of data points 2n, where n is the characterization
delay, and the maximum possible number of α+ relations $|\mathcal{L}|^2$, where
$\mathcal{L}$ is the label set, the complexity of our drift characterization method is the
maximum of the worst-case complexities of the following sequential operations: (i) performing
KSPT between the α+ relations and a binary target variable ($O(2n \cdot |\mathcal{L}|^2)$), (ii) computing
sre Insert/delete an activity between two activitiesTemplate
X
X
X
X YY
X X
X X
X
Y
X
Y
X
X
X X
X
X YY
X70%
Y30%
X40%
Y60%
X Z X Y Z
W X Z W X
Y
Z
W X
Y
ZW X Z
W X Z W Y Z
U X V W Y Z U Y V W X Z
W Y Z U V W Z U Y V
W Y Z W ZU X V U X
Y
V
W Y Z W ZU X V U X
Y
V
W
Y
X
Z W X ZY
W
Y
X
Z W X ZY
YY
YYX X
Y1 Yn Y1 Yn
W Y ZW Y Z
40%
60%
W
X
Y
70%
30%
W
X
Y
Characterizationstatement
After the drift, activity Y is inserted (resp., deleted from) between activities Xand Z.
pre Insert/delete an activity in/from parallel branchTemplate
X
X
X
X YY
X X
X X
X
Y
X
Y
X
X
X X
X
X YY
X70%
Y30%
X40%
Y60%
X Z X Y Z
W X Z W X
Y
Z
W X
Y
ZW X Z
W X Z W Y Z
U X V W Y Z U Y V W X Z
W Y Z U V W Z U Y V
W Y Z W ZU X V U X
Y
V
W Y Z W ZU X V U X
Y
V
W
Y
X
Z W X ZY
W
Y
X
Z W X ZY
YY
YYX X
Y1 Yn Y1 Yn
W Y ZW Y Z
40%
60%
W
X
Y
70%
30%
W
X
Y
Characterizationstatement
Insert: After the drift, activity Y is inserted between activities W and Z in aparallel branch (with activity X). Delete: After the drift, activity Y which wasin a parallel branch (with activity X) between activities W and Z is deleted.
cre Insert/delete an activity in/from conditional branchTemplate
X
X
X
X YY
X X
X X
X
Y
X
Y
X
X
X X
X
X YY
X70%
Y30%
X40%
Y60%
X Z X Y Z
W X Z W X
Y
Z
W X
Y
ZW X Z
W X Z W Y Z
U X V W Y Z U Y V W X Z
W Y Z U V W Z U Y V
W Y Z W ZU X V U X
Y
V
W Y Z W ZU X V U X
Y
V
W
Y
X
Z W X ZY
W
Y
X
Z W X ZY
YY
YYX X
Y1 Yn Y1 Yn
W Y ZW Y Z
40%
60%
W
X
Y
70%
30%
W
X
Y
Characterizationstatement
Insert: After the drift, activity Y is inserted between activities W and Z in aconditional branch (with activity X). Delete: After the drift, activity Y whichwas in a conditional branch (with activity X) between activities W and Z isdeleted.
cp Duplicate an activityTemplate Duplication is the insertion of an existing activity and is discovered in a post-
processing step. As such, it has a similar template as sre/pre/cre.Characterizationstatement
After the drift, activity Y , i.e. a duplicate of activity X , is inserted ... (continueswith sre, pre, or cre).
rp Substitute an activityTemplate
X
X
X
X YY
X X
X X
X
Y
X
Y
X
X
X X
X
X YY
X70%
Y30%
X40%
Y60%
X Z X Y Z
W X Z W X
Y
Z
W X
Y
ZW X Z
W X Z W Y Z
U X V W Y Z U Y V W X Z
W Y Z U V W Z U Y V
W Y Z W ZU X V U X
Y
V
W Y Z W ZU X V U X
Y
V
W
Y
X
Z W X ZY
W
Y
X
Z W X ZY
YY
YYX X
Y1 Yn Y1 Yn
W Y ZW Y Z
40%
60%
W
X
Y
70%
30%
W
X
Y
Characterizationstatement
After the drift, activity X , which was between activities W and Z, is substitutedby activity Y .
sw Swap two activitiesTemplate
X
X
X
X YY
X X
X X
X
Y
X
Y
X
X
X X
X
X YY
X70%
Y30%
X40%
Y60%
X Z X Y Z
W X Z W X
Y
Z
W X
Y
ZW X Z
W X Z W Y Z
U X V W Y Z U Y V W X Z
W Y Z U V W Z U Y V
W Y Z W ZU X V U X
Y
V
W Y Z W ZU X V U X
Y
V
W
Y
X
Z W X ZY
W
Y
X
Z W X ZY
YY
YYX X
Y1 Yn Y1 Yn
W Y ZW Y Z
40%
60%
W
X
Y
70%
30%
W
X
Y
Characterizationstatement
After the drift, activity X , which was between activities U and V , is swappedwith activity Y , which was between activities W and Z.
sm Move an activity to between two activitiesTemplate
X
X
X
X YY
X X
X X
X
Y
X
Y
X
X
X X
X
X YY
X70%
Y30%
X40%
Y60%
X Z X Y Z
W X Z W X
Y
Z
W X
Y
ZW X Z
W X Z W Y Z
U X V W Y Z U Y V W X Z
W Y Z U V W Z U Y V
W Y Z W ZU X V U X
Y
V
W Y Z W ZU X V U X
Y
V
W
Y
X
Z W X ZY
W
Y
X
Z W X ZY
YY
YYX X
Y1 Yn Y1 Yn
W Y ZW Y Z
40%
60%
W
X
Y
70%
30%
W
X
Y
Characterizationstatement
After the drift, activity Y , which was between activities W and Z, has movedto between activities U and V .
cm Move an activity into/out of conditional branchTemplate
X
X
X
X YY
X X
X X
X
Y
X
Y
X
X
X X
X
X YY
X70%
Y30%
X40%
Y60%
X Z X Y Z
W X Z W X
Y
Z
W X
Y
ZW X Z
W X Z W Y Z
U X V W Y Z U Y V W X Z
W Y Z U V W Z U Y V
W Y Z W ZU X V U X
Y
V
W Y Z W ZU X V U X
Y
V
W
Y
X
Z W X ZY
W
Y
X
Z W X ZY
YY
YYX X
Y1 Yn Y1 Yn
W Y ZW Y Z
40%
60%
W
X
Y
70%
30%
W
X
Y
Characterizationstatement
After the drift, activity Y , which was between activities W and Z, has movedto between activities U and V and in a conditional branch (with activity X).
pm Move an activity into/out of parallel branchTemplate
X
X
X
X YY
X X
X X
X
Y
X
Y
X
X
X X
X
X YY
X70%
Y30%
X40%
Y60%
X Z X Y Z
W X Z W X
Y
Z
W X
Y
ZW X Z
W X Z W Y Z
U X V W Y Z U Y V W X Z
W Y Z U V W Z U Y V
W Y Z W ZU X V U X
Y
V
W Y Z W ZU X V U X
Y
V
W
Y
X
Z W X ZY
W
Y
X
Z W X ZY
YY
YYX X
Y1 Yn Y1 Yn
W Y ZW Y Z
40%
60%
W
X
Y
70%
30%
W
X
Y
Characterizationstatement
After the drift, activity Y , which was between activities W and Z, has movedto between activities U and V and in a parallel branch with activity X .
PHD THESIS - c© 2019 ALIREZA OSTOVAR - PAGE 60
4.2. DRIFT CHARACTERIZATION METHOD
cf Make activity mutually exclusive/sequentialTemplate
X
X
X
X YY
X X
X X
X
Y
X
Y
X
X
X X
X
X YY
X70%
Y30%
X40%
Y60%
X Z X Y Z
W X Z W X
Y
Z
W X
Y
ZW X Z
W X Z W Y Z
U X V W Y Z U Y V W X Z
W Y Z U V W Z U Y V
W Y Z W ZU X V U X
Y
V
W Y Z W ZU X V U X
Y
V
W
Yn
Y1
Z W Y1 ZYn
YY
YYX X
Y1 Yn Y1 Yn
W Y ZW Y Z
40%
60%
W
X
Y
70%
30%
W
X
Y
W
Yn
Y1
Z W Y1 ZYn
Characterizationstatement
Before the drift, activities Y1, . . .Yn were mutually exclusive (resp., sequential),while after the drift, they are sequential (resp., mutually exclusive).
pl Make activities parallel/sequentialTemplate
X
X
X
X YY
X X
X X
X
Y
X
Y
X
X
X X
X
X YY
X70%
Y30%
X40%
Y60%
X Z X Y Z
W X Z W X
Y
Z
W X
Y
ZW X Z
W X Z W Y Z
U X V W Y Z U Y V W X Z
W Y Z U V W Z U Y V
W Y Z W ZU X V U X
Y
V
W Y Z W ZU X V U X
Y
V
W
Yn
Y1
Z W Y1 ZYn
YY
YYX X
Y1 Yn Y1 Yn
W Y ZW Y Z
40%
60%
W
X
Y
70%
30%
W
X
Y
W
Yn
Y1
Z W Y1 ZYn
Characterizationstatement
Before the drift, activities Y1, . . .Yn were parallel (resp., sequential), while afterthe drift, they are sequential (resp., parallel).
cd: Synchronize two activities
Template: [before/after diagrams not recoverable from the extracted text]
Characterization statement: Before the drift, activities X and Y were parallel (resp., synchronized), while after the drift they are synchronized (resp., parallel).
lp: Make activities loopable/non-loopable
Template: [before/after diagrams not recoverable from the extracted text]
Characterization statement: After the drift, activities Y1, ..., Yn have become loopable/non-loopable.
cb: Make an activity skippable/non-skippable
Template: [before/after diagrams not recoverable from the extracted text]
Characterization statement: After the drift, activity Y has become skippable/non-skippable.
fr: Change branching frequency
Template: [before/after diagrams not recoverable from the extracted text]
Characterization statement: After the drift, following activity W, the branch of activity X is more frequently executed, while the branch of activity Y is less frequently executed.
Table 4.2: Change templates and their drift characterization statement formats.
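The statement formats in Table 4.2 can be read as string templates whose placeholders are filled with the activities matched for a drift. The following is a minimal sketch of that idea; the dictionary layout and the `verbalize` helper are hypothetical names introduced here for illustration and are not taken from ProDrift.

```python
# Statement formats transcribed from Table 4.2 for four of the templates.
# The dictionary layout and verbalize() are illustrative assumptions.
STATEMENTS = {
    "pm": ("After the drift, activity {Y}, which was between activities {W} "
           "and {Z}, has moved to between activities {U} and {V} and in a "
           "parallel branch with activity {X}."),
    "lp": "After the drift, activities {Ys} have become {mode}.",
    "cb": "After the drift, activity {Y} has become {mode}.",
    "fr": ("After the drift, following activity {W}, branch of activity {X} "
           "is more frequently executed, while branch of activity {Y} is "
           "less frequently executed."),
}


def verbalize(code, **roles):
    """Fill the statement format of a template code with concrete activities."""
    return STATEMENTS[code].format(**roles)


print(verbalize("cb", Y="Check invoice", mode="skippable"))
```

The activity name "Check invoice" is made up; in practice the roles would be bound to the activities involved in the matched relations.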
the average frequencies and RFCs of the relations ($O(2n \cdot |\mathcal{L}|^2)$), (iii) ordering the
relations ($O(|\mathcal{L}|^2 \cdot \log(|\mathcal{L}|^2))$), and (iv) template identification ($O(|\mathcal{L}|^2 \cdot m \cdot |\mathcal{L}|^2!)$)¹.
Hence, the time complexity of our method is $O(|\mathcal{L}|^2 \cdot m \cdot |\mathcal{L}|^2!)$. This time complexity
is a theoretical upper bound; in practice, the number of relations rarely
approaches $|\mathcal{L}|^2$, and not all permutations are verified during the template identification
operations (relations are first filtered based on their types, e.g. causality).
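To get a feel for how quickly the permutation term in this bound grows, the sketch below evaluates nPk = n!/(n-k)! with n set to the square of the number of activity labels. The helper name `template_matchings` and the sizes used (four activity labels, a template of eight relations) are assumptions chosen only for illustration.

```python
import math


def template_matchings(num_labels, k):
    """Ordered ways to map k template relations onto |L|^2 candidate relations."""
    n = num_labels ** 2          # upper bound on the number of binary relations
    return math.perm(n, k)       # n! / (n - k)!


# Hypothetical sizes: 4 activity labels, a template made of 8 relations.
print(template_matchings(4, 8))
```

Even for this tiny alphabet the count is 518,918,400, which is why filtering relations by type before matching matters in practice.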
4.3 Tool Support
We implemented the proposed method as an extension of ProDrift, which is available
as a plug-in of Apromore² as well as a standalone open-source tool³. To enable drift
characterization for a detected drift, the user needs to tick the “Drift characterization”
checkbox in the configuration panel of the plug-in, as shown in Figure 4.5a. It is
then required to choose between the two drift characterization configuration options:
“activity level” and “fragment level”. To characterize drifts using the method presented
in this chapter, the “activity level” option needs to be selected.
By default, the CRFC threshold (cf. Section 4.2.2) is set to 95%, because
in the experiments presented later in this chapter this value empirically resulted in
the highest characterization accuracy. Alternatively, the user can set a
different CRFC threshold in the “Cumulative change” field.
After a drift is detected, it is characterized by the drift characterization method.
Once the parsing of the log is complete, the user can click on each detected drift in the
list below the P-value plot to inspect its natural language characterization
statements, as shown in Figure 4.5b.
4.4 Evaluation on Artificial Logs
We used ProDrift to evaluate the effectiveness of our method with different parameter
settings. The tool is fed with an event stream replayed from an event log, and
reports, for each detected drift, its characterization as a verbalization in natural
language, based on the applicable templates. In the rest of this section we discuss the
setup of the experiments and a two-pronged evaluation to assess the effectiveness of
the relevant relations retrieval and ranking with respect to each individual template,
and the accuracy of template identification. Finally, we compare our method with
log-to-log comparison.
Figure 4.5: Drift characterization at activity level using the ProDrift plug-in in Apromore. (a) Enable drift characterization and choose the “activity level” configuration for using the drift characterization method presented in this chapter. (b) Inspect natural language characterization statements for each detected drift.
¹ Matching a template of $k$ relations to a drift feature set of $|\mathcal{L}|^2$ relations requires iterating over all possible permutations ($^{n}P_{k} = |\mathcal{L}|^2! / (|\mathcal{L}|^2 - k)!$). The upper-bound complexity of this operation is $O(|\mathcal{L}|^2!)$. Next, to identify the best-matching template, we iterate over the number of predefined templates $m$. Finally, we need to match simultaneous changes, which in the worst case are $|\mathcal{L}|^2$ (where each template has only one relation). The upper-bound time complexity of identifying multiple non-overlapping templates is $O(|\mathcal{L}|^2 \cdot m \cdot |\mathcal{L}|^2!)$.
² Available at http://apromore.org/
³ Available at http://apromore.org/platform/tools
4.4.1 Setup
We generated an artificial dataset using the same approach and the same CPN base model,
which represents a highly variable process, as in the previous chapter (cf. Section 3.4). For
each simple change template in Table 4.1, we generated a log featuring nine drifts, each
injected by alternately activating and deactivating the template within the base model.
For instance, for the template “sre” we alternately inserted an activity into, or deleted it
from, the process model. For the change template “lp”, three logs were
generated with length-one, length-two and length-three loops, and the reported results
for this template were averaged over these three logs. This resulted in 17 logs, each
containing 10,000 traces with nine equidistant drifts of the same change template. To
evaluate the characterization of drifts in the context of simultaneous changes, we
organized our change templates into three categories: Insertion (“I”), Resequentialization
(“R”) and Optionalization (“O”) (cf. Table 4.1). Limited to two and three simultaneous
cross-category changes, these categories yield four possible scenarios of simultaneous
changes (“IR”, “IO”, “RO”, “RIO”). For each such scenario, two logs were generated
by randomly selecting single templates from different categories. For instance, a drift
from the “IR” scenario could simultaneously add a new activity
(“I”) and a loop back (“R”) in two different locations of the process. This resulted
in eight logs for the simultaneous changes setting. All in all, the dataset contained 25
logs for both single and simultaneous changes.⁴
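The log counts above follow from simple arithmetic, which the sketch below retraces. It assumes that the simple change templates are the 15 codes that appear on the x-axis of Figure 4.8; the variable names are introduced here for illustration only.

```python
# Simple change templates, taken from the x-axis labels of Figure 4.8.
SIMPLE_TEMPLATES = ["sre", "pre", "cre", "cp", "rp", "sw", "sm", "cm",
                    "pm", "cf", "pl", "cd", "lp", "cb", "fr"]

# One log per template, except "lp", which has one log per loop length (1, 2, 3).
LOGS_PER_TEMPLATE = {code: 1 for code in SIMPLE_TEMPLATES}
LOGS_PER_TEMPLATE["lp"] = 3

# Simultaneous-change scenarios, two logs each.
SCENARIOS = ["IR", "IO", "RO", "RIO"]
LOGS_PER_SCENARIO = 2

single_logs = sum(LOGS_PER_TEMPLATE.values())
simultaneous_logs = len(SCENARIOS) * LOGS_PER_SCENARIO
total_logs = single_logs + simultaneous_logs
print(single_logs, simultaneous_logs, total_logs)
```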
In these experiments, we used our drift detection method (cf. Chapter 3) to
detect drifts, because this method works in online settings with event streams of highly
variable business processes.
⁴ All the CPN models used for this simulation, the resulting artificial logs, and the detailed evaluation results are available with the software distribution.
4.4.2 Impact of Characterization Delay on Relations Ordering
In Stage 1 of our method, the KSPT is used to retrieve the relations that are significantly
associated with the drift, and discard the irrelevant ones. Then, the retrieved binary
relations are ordered based on their RFCs with respect to the TRFC that occurred in
the drift. For each detected drift, the ground truth (ideal case) is that the relations
related to the injected drift template are correctly identified and placed at the top of
the returned ordered list. However, some spurious relations may affect the relations
ordering. We use the normalized discounted cumulative gain (nDCG) to evaluate the
accuracy of the relations ordering. The nDCG is a relative measure where a value of
1.0 indicates that the ordered list corresponds to the ground truth, while 0.0 indicates
that none of the relations related to the injected drift template have been retrieved.
This measure is also used for computing the confidence of a template matching, as
explained in Section 4.2.2.
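As a rough illustration of how such a score behaves, the sketch below computes nDCG with binary relevance (a relation either belongs to the injected template or it does not) and the usual logarithmic rank discount. This is the textbook formulation and is an assumption here; the exact gain and discount used in Section 4.2.2 may differ. The function names and the example relations are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: the gain at rank i is divided by log2(i + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranked_relations, relevant_relations):
    """nDCG of an ordered relation list against the set of ground-truth relations."""
    gains = [1.0 if r in relevant_relations else 0.0 for r in ranked_relations]
    ideal = dcg([1.0] * len(relevant_relations))  # all relevant relations on top
    return dcg(gains) / ideal if ideal else 0.0


truth = {("W", "->", "Y"), ("Y", "->", "Z")}
perfect = [("W", "->", "Y"), ("Y", "->", "Z"), ("U", "->", "X")]
spurious_first = [("U", "->", "X"), ("W", "->", "Y"), ("Y", "->", "Z")]
print(ndcg(perfect, truth), ndcg(spurious_first, truth))
```

A spurious relation ranked above the relevant ones lowers the score, and a list containing none of the relevant relations scores 0.0, in line with the interpretation given above.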
In the first experiment, we study how the accuracy of the ordered binary relations
list is impacted by changing the characterization delay. We vary the characterization
delay from 200 to 1,000 events, and report the mean and the standard deviation of
the nDCG over all the simple change templates, where each template was evaluated
separately over nine injected drifts (cf. Fig. 4.6). In this experiment, we do not apply
any filtering on the ordered binary relations list (CRFC = 100% · TRFC).
Not surprisingly, for a characterization delay of 200 events, the KSPT does not
have enough data to identify the relevant binary relations causing the drift, which leads
to a relatively low average nDCG of around 0.84 and a standard deviation of 0.19
over all templates. Consequently, spurious relations, most often resulting from a slight
change in a branching probability, appear in the ordered relations list. However, we
observe that the accuracy of the relations ordering increases when the characterization
delay grows and eventually plateaus at an average of 0.98 with a standard deviation
of 0.02. As expected, the more data points are fed to the KSPT, the more accurate is
the statistical association between the explanatory variable (here an individual binary
relation) and the target variable (the drift classification variable), and the better the
estimation of the RFC for ordering the relations is. However, the characterization
delay cannot grow indefinitely, hence, we select 500 events as a trade-off between
a short characterization delay and a high characterization accuracy (fewer spurious
relations). This value is used as the default delay in the remaining experiments.
We note that the characterization delay not only indicates how many events our
method needs to fetch from the event stream to obtain an accurate characterization, but
also allows us to infer the minimum inter-drift distance that our method can handle.
In other words, to be accurately characterized, the next potential drift must occur at least
a number of events equal to this characterization delay (plus one detection window) after
the stabilization point (cf. Fig. 4.2).
4.4.3 Impact of Relation Filtering on Characterization Accuracy
As introduced in Section 4.2.2, the ordered relations list resulting from Stage 1 can
be filtered based on the CRFC to discard the relations with insignificant RFCs. Thus,
only the top relations whose RFCs cumulatively add up to a certain proportion of the TRFC
are retained. The filtered list is then fed to the template identification stage to find
the templates that best match the relations. In this experiment, we study how the
filter affects the accuracy of template identification. We vary the CRFC threshold (x%)
from 70% to 100% (no filtering), and report the F-score of the template identification
averaged over the 25 artificial logs. The F-score is measured as the harmonic mean
of recall and precision, where recall measures the ratio of correctly identified change
templates of a specific type over the total number of injected templates of the same
type, and precision measures the ratio of correctly identified change templates of a
specific type over the total number of identified templates of that same type. Figure
4.7 shows the average accuracy over all templates and per single change, double and
triple simultaneous changes.
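The recall, precision and F-score definitions above translate directly into a few lines of code. The sketch below is illustrative only; the function name and the example counts are hypothetical and are not results from the experiments.

```python
def f_score(correct, injected, identified):
    """Harmonic mean of recall and precision for one change template type.

    correct    -- correctly identified templates of this type
    injected   -- templates of this type injected into the log
    identified -- templates of this type reported by the method
    """
    recall = correct / injected if injected else 0.0
    precision = correct / identified if identified else 0.0
    if recall + precision == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 9 injected drifts, 8 reported, 6 of them correct.
print(f_score(correct=6, injected=9, identified=8))
```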
As expected, we observe that the F-score increases as the CRFC threshold in-
creases. When the threshold is low, many relations are filtered out, and if only one
relation corresponding to an injected template is discarded then its corresponding tem-
plate will not be matched. On the other hand, when the threshold increases, more
relations remain in the filtered list, thereby increasing the likelihood of matching the
relevant template, leading to a higher recall. However, when no relations are filtered
out (threshold = 100%), spurious relations will be matched with the frequency tem-
plate “fr”. This will impact the precision, explaining the drop in the average F-score
at the threshold value of 100%. As an example, for the change template parallel move
“pm” (with 8 relations), the output of the first stage of our method was an ordered list
of 50 relations. A filter threshold of 70% retains only the top five relations out of 50,
leading to a recall of 0 for this template. On the other hand, a threshold of 90% retains
the top nine relations, leading to a recall of 1. In the remaining experiments we use a
CRFC threshold of 95% that is suitable for both single and simultaneous changes.
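The filtering step discussed in this section can be sketched as a cumulative cut-off over the ordered relation list. The sketch assumes that the TRFC is the sum of the RFCs of all relations in the list and that relations are kept until their cumulative RFC reaches the chosen proportion of the TRFC; the function name and the RFC values are hypothetical.

```python
def filter_by_crfc(ordered, threshold=0.95):
    """Keep the top relations whose cumulative RFC reaches threshold * TRFC.

    ordered -- list of (relation, rfc) pairs sorted by rfc in descending order
    """
    total = sum(rfc for _, rfc in ordered)      # assumed definition of the TRFC
    kept, cumulative = [], 0.0
    for relation, rfc in ordered:
        if cumulative >= threshold * total:     # target proportion reached
            break
        kept.append((relation, rfc))
        cumulative += rfc
    return kept


relations = [("r1", 5.0), ("r2", 3.0), ("r3", 1.5), ("r4", 0.5)]
print(filter_by_crfc(relations, threshold=0.7))   # keeps r1 and r2
print(filter_by_crfc(relations, threshold=1.0))   # no filtering
```

A low threshold cuts the list early, which is how a relation belonging to the injected template can be lost, while a threshold of 100% keeps every relation, including spurious ones.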
Figure 4.6: Impact of characterization delay on relevant relations retrieval and ordering. [Plot not recoverable; x-axis: characterization delay (events), 200 to 1,000; y-axis: ordering accuracy (nDCG), 0.6 to 1.]
Figure 4.7: Impact of relation filtering on characterization accuracy. [Plot not recoverable; x-axis: cumulative change [x%], 70 to 100; y-axis: F-score, 0 to 1; series: single template, double templates, triple templates, average (all).]
4.4.4 Comparison with Baseline
As discussed in Section 4.1, a possible approach to process drift characterization is
to apply the log-to-log comparison technique in [115] on two sub-logs extracted from
before and after a drift. This technique is designed to compare logs with complete
traces, while in our setting the pre-drift and post-drift sub-logs are extracted from an
event stream, and hence contain many incomplete traces. As a first attempt, we fed
the log comparison technique with the two sub-logs before and after the drift as is, but
as expected, the comparison led to a large number of misleading differences. We then
decided to only use complete traces within the two sub-logs. This was possible as we
knew the start and end activities of the process. For each change template, we evaluated
the accuracy of the differences returned by the technique manually. We calculated
recall by considering the missing differences for a given template as false negatives,
so that a recall of 1 is obtained if a template is fully described by the differences.
Similarly, precision was calculated by considering the statements that were not related
to the template as false positives.
Figure 4.8 reports the F-score obtained for each change template for our method
and for the baseline. Our method had an almost perfect F-score for every template, as
it could retain the (great majority of the) relations that were involved in the injected
change template, without returning relations that did not fit the templates. On the other
hand, the baseline produced a low F-score for all the change templates. Admittedly,
this technique had a high average recall of around 0.85 over all logs. However, its
precision was very low due to a high number of false positives (wrong differences
returned). Indeed, the two sub-logs capture partial process behavior, which, even if
similar at the event level, is quite variable at the trace level. This was exacerbated by
the high variability of the process. These results are in line with the findings in the
previous chapter on drift detection (cf. Section 3.4.5): techniques based
on (abstractions of) complete traces, such as [77], do not perform well when detecting
drifts in highly variable logs, whereas finer-grained features such as the α+ relations
are more suitable for capturing process behavior in high-variability settings.
Figure 4.8: F-score per change template, obtained with our method vs. [115]. [Plot not recoverable; x-axis: change templates (sre, pre, cre, cp, rp, sw, sm, cm, pm, cf, pl, cd, lp, cb, fr, RI, RO, IO, RIO); y-axis: F-score, 0 to 1; series: our method, log delta.]
We conducted all the experiments on an Intel i7 2.20GHz with 16GB RAM (64 bit),
running Windows 7 and JVM 7 with standard heap space of 4GB. The time required
to extract, order, and then match the α+ relations to the predefined templates for each
drift ranged from a minimum of 410ms to a maximum of 660ms with an average of
530ms. The baseline method took on average 15 seconds to report the differences
between the pre-drift and post-drift sub-logs.
4.5 Evaluation on Real-life Log
We further evaluated our method on the BPI Challenge (BPIC) 2011 log.⁵ We chose this
log, which records patient treatments in a Dutch hospital, because of its high trace
variability (∼ 70%). We prepared the log by filtering out infrequent behavior using
the noise filter in [22] with its default settings. This operation resulted in a log with
1,121 traces, of which 798 are distinct, and 42 activity labels. In the previous chapter
(cf. Section 3.5), we detected two drifts from this filtered log, using our drift detection
method. The two drifts were supported by the observation of a sudden increase, and
a subsequent decrease in the number of events while the number of active cases was
decreasing.
We applied our method for drift characterization in order to identify the change
templates that explain these two drifts. Two frequency change templates were
identified to characterize the first drift, while the second drift was explained by one frequency
change template. This template was symmetric to the first frequency change template
identified for the first drift. After investigation, we found that the
branch whose probability was identified by the change template as increasing (resp. decreasing)
after the first (resp. second) drift point included five activities in a loopback. The increase
from 34% to 46% (resp. decrease from 46% to 34%) in the upper branch probability
of the identified frequency change template is, in fact, the cause of the increased (resp.
decreased) number of events after the first (resp. second) drift. Figure 4.9 depicts the
identified template, with the activity labels in their original language.
⁵ http://dx.doi.org/10.4121/uuid:d9769f3d-0ab0-4fb8-803b-0d1120ffcf54
As discussed in Section 4.4.4, the baseline technique for log-to-log comparison
[115] is designed to compare logs with complete traces. However, since there was no
complete trace within the pre-drift and post-drift sub-logs, we ran the baseline tech-
nique using the sub-logs containing only partial traces. Nevertheless, we had to abort
the experiment as it did not complete within six hours.
Figure 4.9: Identified template for Drift 1 in BPIC 2011 log. [Diagram not recoverable; activity labels shown: aanname laboratoriumonderzoek; ordertarief; 190021 klinische opname a002; 190205 klasse 3b a205; 190101 bovenreg.toesl. a101; ligdagen - alle spec.beh.kinderg.-reval.]
4.6 Summary
In this chapter, we proposed a systematic online method for characterizing process
drifts at the level of individual activities from event streams. The method can charac-
terize multiple simultaneous changes so long as they do not overlap in terms of process
behavior. The strength of our method resides in the features used to encode the process
behavior and in its well-grounded statistical approach, which together allow us to deal with highly
variable processes. The collection of change templates that we use to describe a drift is
based on a well-established categorization of typical change patterns. We do not claim
this collection to be complete, but it can easily be extended. Furthermore, the change
templates that best characterize the drift are reported to the user as natural language
statements. The method may also be used on top of any process drift detection technique
so long as it is provided with the point (or period) in which a drift occurs. Finally,
by replaying an event log as an event stream, the method can also be deployed for
characterizing drifts in event logs.
We extensively evaluated our method using both highly variable artificial logs and
a real-life log. The results on the artificial logs show the high accuracy of our
method in characterizing drifts induced by the application of typical process changes to
individual activities, as well as its low characterization delay and low execution time.
Despite the lack of a ground truth to validate our findings on the real-life log, the
results were supported by various observations from the log. In addition, the method
outperforms a state-of-the-art technique for log-to-log comparison.
Chapter 5
Process Drift Characterization at Fragment Level
Drift detection and characterization play equally important roles in identifying and
explaining undocumented process changes that may over time negatively impact the
performance of a business process. As such, in Chapters 3 and 4 we proposed two au-
tomated methods for detecting and characterizing drifts from event streams of business
processes. The characterization method starts by extracting α+ binary behavioral
relations, such as causality, concurrency and conflict, from an event stream before and
after a drift, and then performs a statistical test to filter out the relations unrelated to
the drift. The remaining relations are mapped to a predefined set of change patterns to
produce statements that explain the drift. However, this method is limited to charac-
terizing changes applied to individual activities, e.g. removing an activity or swapping
two activities. This limitation is in fact due to the low-level abstraction of process be-
havior as captured by the α+ relations. Consequently, changes to process fragments
are either completely missed by this method, e.g. skipping a fragment of two concur-
rent activities, or only partially explained. An example of the latter case is when we
remove a fragment of two mutually exclusive activities from the process, for which this
method only identifies the removal of one of the activities. Another limitation is the
inability to characterize complex changes such as overlapping changes, i.e. changes
that share some behavioral relations, as well as nested changes, i.e. a set of overlapping
changes, each applied to the resulting subprocess from the application of the previous
one.
In the light of the above, this chapter proposes a fully automated method for
characterizing process drifts at the level of fragments from event streams. The core idea is to
discover two process trees, i.e. block-structured process models, from the portions of
the event stream before and after the drift, and use a process tree transformation tech-
nique to find a minimum-cost sequence of edit operations that transforms the pre-drift
process tree to the post-drift process tree. The underpinning assumption is that edit
operations within such a sequence manifest control-flow changes of the process un-
derlying the drift. Each process (sub)tree represents a single-entry single-exit (SESE)
process fragment. As such, we define a set of fragment-based edit operations, each
representing a change to one or more process fragments. The definition of the edit
operations and the cost of applying them is such that a minimum-cost sequence of edit
operations provides a detailed yet concise explanation of the process changes. That is,
if a change involves an individual activity within the process then it is explained by
one change in the sequence referring to that activity. On the other hand, if a change
involves a fragment of multiple activities, then it is explained by one change in the
sequence referring to that fragment as a whole. Moreover, the hierarchical structure
of a process tree allows the characterization of more complex changes such as over-
lapping changes as well as nested changes. The identified fragment-level changes are
translated into natural language statements based on typical change patterns of busi-
ness processes that we have already used for detection and characterization of drifts in
the previous two chapters.
We extensively evaluated the accuracy and the conciseness of the statements re-
ported by our method by characterizing drifts on event streams simulated from artifi-
cial and real-life event logs in various settings. The results indicate that the proposed
method is fast and highly accurate in characterizing typical change patterns via concise
statements, and performs better than the method proposed in the previous chapter at
characterizing changes applied to fragments of multiple activities, overlapping changes
as well as nested changes.
This chapter is structured as follows. Sections 5.1 and 5.2 discuss related work and
preliminaries, respectively. Sections 5.3, 5.4 and 5.5 illustrate the various ingredients
of the proposed method, divided into process tree discovery, process tree transforma-
tion, and computing characterization statements, respectively. Sections 5.7 and 5.8
present the evaluation on artificial and real-life logs, respectively. Finally, Section 5.9
discusses some factors that influence the accuracy of the proposed method, while Sec-
tion 5.10 concludes the chapter.
5.1 Related Work
In the previous chapter, we discussed that a possible approach to characterize a drift
is to extract two sub-logs from before and after a drift and compare them using a log-
to-log comparison technique, called log-delta analysis [115, 10]. We also pointed out
some limitations of these techniques, e.g. inability to work with partial traces and
hence event streams. Another approach to drift characterization is to first discover two
process models, one from the pre-drift sub-log and the other from the post-drift sub-
log and compare them using a model-to-model comparison technique. In this context,
Armas-Cervantes et al. [8] propose a method for diagnosing behavioral differences
between two process models based on canonically reduced event structures. The idea
is similar to that of Van Beest et al. [115], that is to build two event structures, here
from two process models, and by comparing them report their differences as natural
language statements. However, the differences identified by this method are at the level
of individual activities. Consequently, this method reports a large
number of differences, especially when the changes are applied to process fragments
or when they occur in a nested way. For example, for a simple fragment-level change,
where we parallelize two sequential fragments, each consisting of four activities, this
method would report 16 differences, each capturing the parallelization of two activities.
Such a large number of differences is not easy to understand and analyze.
Another limitation of this method is its high execution time, especially when it needs to
compare two large event structures with several differences.
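The count of 16 in the example above is the number of cross-fragment activity pairs (four times four). The following sketch enumerates them and contrasts them with a single fragment-level statement; the activity names and the statement wording are made up for illustration and do not reproduce the output of any of the cited techniques.

```python
from itertools import product


def pairwise_differences(fragment_a, fragment_b):
    """One activity-level statement per cross-fragment pair of activities."""
    return [f"{a} and {b} were sequential and are now parallel"
            for a, b in product(fragment_a, fragment_b)]


fragment_a = ["A1", "A2", "A3", "A4"]
fragment_b = ["B1", "B2", "B3", "B4"]

activity_level = pairwise_differences(fragment_a, fragment_b)
fragment_level = ["Fragments A and B were sequential and are now parallel"]
print(len(activity_level), len(fragment_level))
```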
All the approaches described above, as well as the one presented in Chapter 4,
work at the level of individual activities and thus are not suitable for characterizing
process fragment changes. A possible approach to discover process fragments is to
abstract from the low-level behavioral relations in an event stream by discovering a
block-structured process model, where each block represents a single-entry single-exit
(SESE) process fragment. The latter requires a process discovery technique that works
on event streams. A few techniques have been proposed for process discovery on
event streams [17, 98, 16], among which, only the technique developed by Redlich
et al. [98] guarantees the discovery of block-structured process models. However, the
latter technique can only work on completed traces within an event stream, and hence
it misses the behavior represented by partial traces. In this chapter, we adapt Inductive
Miner (IM) [68] to work on event streams. Inductive Miner is an automated process
discovery technique that guarantees the discovery of block-structured process models
in the form of process trees. We chose IM because this algorithm recursively constructs
a process tree from an event log, so it naturally lends itself to be adapted to work on
partial traces. Alternatively, it is possible to incrementally compute a process map
from an event stream using a technique such as the one in [73] and use it as input
to any other process discovery technique that builds a block-structured process model
from a process map, provided this technique can work on partial traces.
A block-structured process model can be represented as a process tree. Therefore,
given two process trees from before and after a drift we can characterize the drift by
finding a sequence of edit operations that transform the pre-drift process tree to the
post-drift process tree. In the algorithms and data structures community, this problem
is known as tree edit distance, where a widely studied challenge is to find the minimum
number of edit operations needed to turn an ordered (resp., unordered) tree into another
ordered (resp., unordered) tree. There are several techniques for finding the minimum
tree edit distance between two ordered trees [113, 137, 57, 27, 31, 91] and between
two unordered trees [138, 136, 108, 135, 54, 5, 51, 37, 61]. A process tree, as opposed
to conventional trees, may contain both ordered and unordered nodes at the same time.
The existing tree comparison techniques, however, are designed to transform an
ordered (resp. unordered) tree into another ordered (resp. unordered) tree. Moreover, due
to the specific syntactic rules of process trees, the basic node deletion/insertion/substitution
edit operations defined by these techniques are not suitable for capturing the
differences between the process behaviors expressed by the two process trees. For example,
non-leaf nodes in a process tree should have at least two children, and some leaves may
only be parented by certain nodes. Such rules often give rise to situations where the
deletion of a node from a process tree triggers a sequence of trivial node deletions that
in their current form, cannot be used for the process tree comparison problem. In this
chapter we introduce a process tree comparison technique that finds a minimum-cost
sequence of process tree edit operations needed to transform a process tree to another
process tree. We implement the proposed technique using two alternative search strate-
gies, exhaustive and greedy, and assess the relative merits.
5.2 Preliminaries
This section introduces basic notions such as process trees and fragments. The notation
used in this chapter is summarized in Appendix A.
A tree is an acyclic, connected graph. For a tree T , the sets containing nodes and
edges are denoted by V (T ) and E(T ), respectively. The size of T is |V (T )| and is
denoted by |T |. We sometimes denote v ∈V (T ) as v ∈ T . The root node of a tree T is
denoted by root(T ). We denote the subtree of T rooted at v ∈ T by T 〈v〉.
For each non-root node v in T , let DownT (v)⊂V (T ) be the sequence of nodes on
the shortest path from root(T ) to v. The parent of v is its adjacent node in DownT (v).
The parent of the root is undefined. We say v is a child of u if u is the parent of v. A node
in DownT (v) preceding v is called an ancestor of v in T . We say v is a descendant of
u if u is an ancestor of v. The nodes with the same parent are called siblings. A node
with no children is called a leaf. A non-leaf node is called an internal node. The set of
leaves under an internal node v ∈ T is denoted by leaves(v). We denote the label of a
node v by l(v).
The depth of v in T is denoted by dep(v) and equals |DownT (v)|−1. The depth
of T , denoted by dep(T ), equals the maximum depth of its nodes, i.e. dep(T ) =
max_{v∈V (T )} dep(v). For the nodes v1, . . . ,vn in T we define a common ancestor (CA)
as a node in DownT (v1)∩ . . .∩DownT (vn), and denote it by CA(v1, . . . ,vn). Also,
we define the lowest common ancestor (LCA) as the deepest CA, and denote it by
LCA(v1, . . . ,vn). Accordingly, for the subtrees T 〈v1〉, . . . ,T 〈vn〉 the lowest common an-
cestor (LCA) is denoted by LCA(T 〈v1〉, . . . ,T 〈vn〉), and is the same as LCA(v1, . . . ,vn).
A process tree is a rooted labeled tree that provides an abstract hierarchical repre-
sentation of a block-structured workflow net [68]. We define its syntax as follows:
Definition 12 (Process tree). Let L be a set of activity labels, and O = {×,→,∧,⟲}
be a set of operator labels. Then, an activity node t with l(t) ∈ L is a process tree, a
τ-node with l(τ) ∈ {τ} is a process tree, and ⊕(P1, . . . ,Pn) is a process tree, in which
⊕ is a process tree operator node with l(⊕) ∈ O, and P1, . . . ,Pn are process trees.
A process tree expresses a language: an activity node t represents the singleton lan-
guage l(t), a τ-node represents the language with the empty trace, and an operator node
represents a certain combination of the languages of its subtrees P1 . . .Pn, depending on
its label. In this chapter, we have the following four operator labels: 1) × expresses the
exclusive choice between its subtrees, 2) → expresses the sequential composition of
its subtrees, 3) ∧ expresses the concurrent composition of its subtrees, 4) ⟲ expresses
the structured loop of its first subtree (loop body), followed by the alternative loopback
path of its second subtree.
CHAPTER 5. PROCESS DRIFT CHARACTERIZATION AT FRAGMENT LEVEL
For instance, the process tree ×(∧(a,b), ⟲(c,d)) expresses the language {〈a,b〉, 〈b,a〉, 〈c〉,
〈c,d,c〉, 〈c,d,c,d,c〉, . . .}.
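The language semantics described above can be sketched in Python as follows. This is only an illustrative enumeration under stated assumptions: process trees are encoded as nested tuples with the hypothetical operator names "xor", "seq", "and" and "loop", leaves are activity labels or the string "tau", and loops are unrolled at most max_loops times so that the enumerated language stays finite.

```python
from itertools import product


def interleavings(s, t):
    """All interleavings of two traces (tuples of activity labels)."""
    if not s:
        return {t}
    if not t:
        return {s}
    return ({(s[0],) + r for r in interleavings(s[1:], t)}
            | {(t[0],) + r for r in interleavings(s, t[1:])})


def language(tree, max_loops=2):
    """Traces of a process tree, with loops unrolled at most max_loops times."""
    if isinstance(tree, str):
        return {()} if tree == "tau" else {(tree,)}
    op, *children = tree
    langs = [language(c, max_loops) for c in children]
    if op == "xor":                      # exclusive choice
        return set().union(*langs)
    if op == "seq":                      # sequential composition
        return {sum(parts, ()) for parts in product(*langs)}
    if op == "and":                      # concurrent composition
        result = {()}
        for lang in langs:
            result = {w for s in result for t in lang
                      for w in interleavings(s, t)}
        return result
    if op == "loop":                     # body, then (loopback, body)*
        body, redo = langs
        result = set(body)
        frontier = set(body)
        for _ in range(max_loops):
            frontier = {s + r + b for s in frontier for r in redo for b in body}
            result |= frontier
        return result
    raise ValueError(f"unknown operator: {op}")


tree = ("xor", ("and", "a", "b"), ("loop", "c", "d"))
traces = language(tree, max_loops=2)
```

With two loop unrollings, the enumeration for the example tree yields exactly the five traces listed above.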
In a process tree P, a leaf node is either an activity node or a τ node, whereas
an internal node is always an operator node and must at least have two children. We
define Γ =L ∪{τ}∪O as a fixed finite alphabet which assigns a label to each node in
a process tree. In a process tree, if an activity node has a unique label, we sometimes
refer to that activity node by its label. The set of activity nodes under an operator node
v∈P is denoted by C(v), and contains the activity nodes in P〈v〉. By replaying an event
log on top of a process tree we can annotate each node of the tree with its execution
frequency. We call the ratio of the frequency of a node v to the frequency of its parent
the relative frequency of v.
The relation between the nodes v1, . . . ,vn in P is defined by the operator of their
LCA, i.e. mutually-exclusive (×), concurrent (∧), sequential (→), or loop (⟲). Accordingly,
the relation between the process trees P〈v1〉, . . . ,P〈vn〉 in P is the same as
the relation between the nodes v1, . . . ,vn.
A process tree can contain both ordered and unordered operator nodes. An operator
node ⊕ is unordered if it is commutative, i.e. ⊕(P1, . . . ,Pn) = ⊕(Pn, . . . ,P1), it is
ordered otherwise. The operator nodes × and ∧ are unordered, whereas → and ⟲ are
ordered. For example, ×(a,b) = ×(b,a), whereas →(a,b) ≠ →(b,a).
The pre-order index of v in a process tree P is denoted by preP(v), and is the
same as one in an ordered tree when arbitrarily fixing the order of siblings parented by
unordered operator nodes in P. We refer to the node with the pre-order index of i in P
by P[i]. Also, in the process tree examples in this chapter, the number on the left side
of a node indicates its pre-order index.
For a node v and an ordered operator node ⊕ in a process tree P such that v is a
descendant of ⊕, we define a function Rankk returning the rank of v in ⊕.
Definition 13 (Rankk). Let ⪯ be the order on the children of an ordered operator node
⊕ in a process tree P. Also, let v ∈ P be a descendant of ⊕ and c ∈ P be a child of ⊕
such that c ∈ DownP(v), then Rankk(v,⊕) = |{c′ ∈ children of ⊕ | c′ ⪯ c}|.
Example 8. As an example, let us assume the process tree P = →(×(a,b), ∧(c,d)), whose
nodes have the pre-order indices 0 (→), 1 (×), 2 (a), 3 (b), 4 (∧), 5 (c) and 6 (d). The rank of
each non-root node P[i] in the →-node P[0] is as follows.
Rankk(P[1], P[0]) = 1, Rankk(P[2], P[0]) = 1, Rankk(P[3], P[0]) = 1,
Rankk(P[4], P[0]) = 2, Rankk(P[5], P[0]) = 2, Rankk(P[6], P[0]) = 2.
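The ranks of Example 8 can be reproduced with the short Python sketch below. The nested-tuple encoding of process trees and the function names are assumptions of this example; the sketch also does not check that P[j] is an ordered operator node.

```python
def preorder(tree):
    """Subtrees of `tree` in pre-order; position i corresponds to P[i]."""
    nodes = [tree]
    if not isinstance(tree, str):
        for child in tree[1:]:
            nodes.extend(preorder(child))
    return nodes


def rank(tree, i, j):
    """Rank of node P[i] in the operator node P[j], for P[i] a descendant of P[j]."""
    op_node = preorder(tree)[j]
    start = j + 1                       # pre-order index of the first child
    for position, child in enumerate(op_node[1:], start=1):
        size = len(preorder(child))     # number of nodes in this child subtree
        if start <= i < start + size:
            return position
        start += size
    raise ValueError("P[i] is not a descendant of P[j]")


# P = seq(xor(a, b), and(c, d)), i.e. the tree of Example 8.
P = ("seq", ("xor", "a", "b"), ("and", "c", "d"))
ranks = [rank(P, i, 0) for i in range(1, 7)]
```

Here ranks holds the rank of P[1], . . . , P[6] in the root P[0], in that order.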
There might be multiple process trees with the same language. For example, the
tree ×(a,×(b, c)) expresses the same language as ×(×(a, b), c). As in this chapter
we compare the structures of two trees to characterize a drift, we need to have one
structurally unique tree for each language. A set of structural reduction rules is intro-
duced in [72], which guarantees to preserve the language of a process tree. Repeated
application of these rules to a process tree leads to a syntactic unique normal form, i.e.
for each language, there is at most one process tree in normal form. In our example, the
normal form would be ×(a, b, c). In this chapter, we use a subset of these reduction
rules, defined below.
Definition 14 (Reduction rules). Let M, Q, and R be process trees, and let . . . be any
number of process trees (possibly 0). Then, the reduction rules are as follows:
singularity rule
(S) ⊕(M)⇒M with ⊕ ∈ {×,→,∧}
associativity reduction rules
(A×) ×(. . .1 ,×(. . .2))⇒ ×(. . .1 , . . .2)
(A→) →(. . .1 ,→(. . .2), . . .3)⇒ →(. . .1 , . . .2 , . . .3)
(A∧) ∧(. . .1 ,∧(. . .2))⇒ ∧(. . .1 , . . .2)
τ-reduction rules
(T→) →(. . . ,M,τ)⇒ →(. . . ,M)
(T∧) ∧(. . . ,M,τ)⇒ ∧(. . . ,M)
A process tree to which no rule can be applied is in normal form and is called a
canonical process tree.
A singularity rule applies to all operators except ⟲, as a ⟲-node always has two
children. This rule is based on the definition of the process tree operators (provided
above) that a →-node, a ∧-node, or an ×-node with one child has the same behavior
as the child itself. The associativity rule applies to ×,→, and ∧ operators and reduces
a tree such as ×(a,×(b, c)) to ×(a, b, c). The τ reduction rules target τ constructs
and are defined for→ and ∧ operators. A τ-node as a child of a→-node, or a ∧-node
does not change the language (T→, T∧).
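A possible way to apply the reduction rules of Definition 14 until a normal form is reached is sketched below in Python. The nested-tuple encoding and the name reduce_tree are assumptions of this example. In addition, the sketch removes a τ-child of a →-node at any position (the rule T→ above is written for a trailing τ), which relies on the observation that a τ-child of a →-node does not change the language.

```python
def reduce_tree(tree):
    """Bottom-up application of the singularity, associativity and tau rules."""
    if isinstance(tree, str):
        return tree
    op, *children = tree
    children = [reduce_tree(c) for c in children]
    if op in ("xor", "seq", "and"):
        # Associativity: lift the children of a child with the same operator.
        flat = []
        for c in children:
            if not isinstance(c, str) and c[0] == op:
                flat.extend(c[1:])
            else:
                flat.append(c)
        children = flat
    if op in ("seq", "and"):
        # Tau reduction: drop tau children, keeping one if nothing else is left.
        non_tau = [c for c in children if c != "tau"]
        children = non_tau or ["tau"]
    if op != "loop" and len(children) == 1:
        return children[0]              # Singularity rule (S)
    return (op, *children)
```

For instance, reduce_tree(("xor", "a", ("xor", "b", "c"))) returns ("xor", "a", "b", "c"), which corresponds to the normal form ×(a, b, c) mentioned above.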
We define a fragment as a process tree representing a single-entry single-exit pro-
cess fragment. Formally:
Definition 15 (Fragment). Let P be a process tree rooted at w.
• P is a fragment.
• Let S = {P〈v1〉, . . . ,P〈vn〉} be the set of process trees under w, where v1, . . . ,vn
are children of w, . . . is any number of process trees (possibly 0), and l(w) ∈ {×,∧}.
A process tree ⊕(P1, . . . ,Pm) parented by w, where l(⊕) = l(w) and
s = {P1, . . . ,Pm} is any non-empty proper subset of S, is a fragment. We call such
a fragment a sub-fragment of w.
• Let S = {P〈v1〉, . . . ,P〈vn〉} be the sequence of process trees under w, where v1, . . . ,vn
are children of w, . . . is any number of process trees (possibly 0), and l(w) = →.
A process tree ⊕(P1, . . . ,Pm) parented by w, where l(⊕) = l(w) and s = {P1, . . . ,Pm}
is any non-empty proper subsequence of S, is a fragment provided that any two
consecutive elements P〈vi〉,P〈vi+1〉 in s are consecutive in S. We call such a
fragment a sub-fragment of w.
Any fragment formed by the nodes within a process tree P is a sub-fragment of P.
We sometimes refer to P〈v〉 by P〈v〉-fragment. Also, a fragment P〈v〉, where v is the
child of a node w, is called a child fragment of w. Furthermore, a fragment f1 = τ is
called a τ-fragment, and a fragment f2 ≠ τ is called a non-τ-fragment.
Example 9. As an example, in the process tree ×(a,b,τ) the set of all sub-fragments of ×
is {×(a,b), ×(b,τ), ×(a,τ), a, b, τ}.
Example 10. As an example, for the process tree →(×(a,b,τ), d, e, f) the set of all sub-fragments
is {→(×(a,b,τ), d, e, f), ×(a,b,τ), ×(a,b), ×(b,τ), ×(a,τ), →(d,e,f), →(d,e), →(e,f),
a, b, τ, d, e, f}.
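The enumeration of the sub-fragments of a single operator node, following Definition 15, can be sketched in Python as below. The nested-tuple encoding and the function name are assumptions of this example; the sketch only considers the root operator of the given tree (not nested operators), and returning only the child fragments for a loop node is a further assumption, since Definition 15 covers ×, ∧ and → only.

```python
from itertools import combinations


def sub_fragments(tree):
    """Sub-fragments of the root operator node of `tree`."""
    op, *children = tree
    n = len(children)
    if op in ("xor", "and"):
        # Unordered operator: every non-empty proper subset of the children.
        groups = [list(combo) for k in range(1, n)
                  for combo in combinations(children, k)]
    elif op == "seq":
        # Ordered operator: every non-empty proper run of consecutive children.
        groups = [children[i:j] for i in range(n)
                  for j in range(i + 1, n + 1) if j - i < n]
    else:
        # Assumption: a loop node only offers its child fragments.
        groups = [[c] for c in children]
    # A group with a single element is the child fragment itself.
    return [g[0] if len(g) == 1 else (op, *g) for g in groups]


xor_fragments = sub_fragments(("xor", "a", "b", "tau"))
seq_fragments = sub_fragments(("seq", "d", "e", "f"))
```

For the tree ×(a,b,τ) of Example 9, the sketch returns the six sub-fragments listed there.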
5.3 Partial Traces and Process Tree Discovery
Given two sub-logs of partial traces, one extracted from before and the other extracted
from after a drift, our method characterizes the drift in three steps. In the first step, two
process trees P and P′ are discovered from the pre-drift and post-drift sub-logs, respec-
tively. In the second step, a minimum-cost sequence of edit operations that transforms
P into P′ is computed. In the third step, our method constructs characterization state-
ments based on the identified edit operations. An overview of our method is shown in
Figure 5.1. In the rest of this section we illustrate Step 1, while in the next two sections
we cover the other two steps.
[Figure 5.1 shows the following pipeline: the pre-drift and post-drift sub-logs are the input of process tree discovery, which yields P and P′; process tree transformation yields a sequence of edit operations; from these, drift characterization statements are constructed.]
Figure 5.1: Overview of our method for process drift characterization.
Due to the traces being derived from streams of events, and our application of
window-based extraction, we might observe some traces only partially. That is, the
start and/or the end of the traces might fall outside of the considered window, as illus-
trated in Figure 5.2. Partial traces can be found outside the area of streams as well:
if an event log is extracted from a running process, one in fact applies a window to
the running process, and every case that is still in progress falls partially outside of the
window. Furthermore, cases that were already started before the event log was being
captured also fall outside of the window.
Partially observed traces might influence discovery, which we illustrate using Figure 5.2.
If all traces had been observed completely, in the log
L = [〈a,b,c,d,e,f,g〉³], IM would discover the model →(a,b,c,d,e,f,g). However, the
[Figure 5.2 panels: (a) Traces in a window, each of the form 〈a,b,c,d,e,f,g〉. (b) A corresponding log with partial traces (their partiality is denoted by |): Lp = [ |b,c,d,e,f,g〉, 〈a,b,c,d,e| ].]
Figure 5.2: Example of partial traces. In the window, some traces are observed partially, as they start and/or end outside of the window. In our example, the first and the last trace are only partially observed.
event log observed is Lp = [|b,c,d,e,f,g〉, 〈a,b,c,d,e|]. Without knowledge of partial
traces, IM discovers the model →(×(τ,a), b, c, d, ×(τ,→(e,f,g))). This process tree does not capture
the meaning of the partial traces well, as it allows a, e, f and g to be skipped, even
though there has not been evidence of this skipping in the event log. One could sim-
ply remove the partial traces. However, as seen in our example, these traces add vital
information as without them the event log would be empty.
In this section, we first describe how partial traces can be detected. Second, we
describe an existing process tree discovery algorithm (Inductive Miner (IM)). Third,
we introduce a new process tree discovery algorithm that extends IM by adapting its
steps to handle partial traces better.
5.3.1 Detecting Partial Traces
Two pieces of information constitute knowledge of partial traces: one needs to decide
whether one has seen the first event, and whether the last event has been seen. We refer
to a trace of which we have seen the first event as having a reliable start, and to a trace
of which we have seen the last event as having a reliable end. Traces might have both
an unreliable start and an unreliable end, or both might be reliable. In Figure 5.2, the
first trace has a reliable end and the second trace has a reliable start.
To detect whether a trace has a reliable start or end, one could incorporate domain
knowledge. For instance, it could be known that a trace always starts with a “regis-
tration” step and always ends with an “archive” step. Then, each trace that starts with
“registration” has a reliable start and each trace that ends with “archive” has a reliable
end. Other ways to determine reliability include the use of attribute data. For instance,
attribute data attached to the trace could indicate whether a trace has been completed.
In the absence of any domain knowledge, one could identify the most frequently occurring start and
end activities, given some threshold, and mark traces as reliable accordingly.
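The frequency-based heuristic mentioned above could, for instance, be realized as in the following Python sketch. The function names, the representation of traces as tuples of activity labels, and the interpretation of the threshold as a minimum share of traces are assumptions of this example; they are not prescribed by the method.

```python
from collections import Counter


def frequent_boundary_activities(traces, threshold=0.5):
    """Activities that start (resp. end) at least `threshold` of the non-empty traces."""
    traces = [t for t in traces if t]
    starts = Counter(t[0] for t in traces)
    ends = Counter(t[-1] for t in traces)
    n = len(traces)
    return ({a for a, c in starts.items() if c / n >= threshold},
            {a for a, c in ends.items() if c / n >= threshold})


def mark_reliability(traces, threshold=0.5):
    """Attach (reliable_start, reliable_end) flags to every trace."""
    starts, ends = frequent_boundary_activities(traces, threshold)
    return [(t, bool(t) and t[0] in starts, bool(t) and t[-1] in ends)
            for t in traces]


observed = [("a", "b", "c"), ("a", "b", "c"), ("b", "c"), ("a", "b")]
marked = mark_reliability(observed, threshold=0.5)
```

In this small log, a starts three of the four traces and c ends three of them, so the trace 〈b,c〉 is marked as having an unreliable start and the trace 〈a,b〉 as having an unreliable end.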
Many more ways of detecting partial traces might be proposed. The extension of
IM that is described in this section does not depend on the way reliability is decided.
5.3.2 Discovering Process Models from Partial Traces
We introduce an extension of Inductive Miner (IM) to handle partial traces, namely
Inductive Miner - partial traces (IMpt). We chose Inductive Miner (IM) [68] because
this algorithm recursively constructs a process tree from an event log, so it naturally
lends itself to be adapted to work on partial traces, given that we need to produce
process trees as output. Specifically, in the recursion, IM tries to find a cut of the
event log, consisting of a partition of the activities in the event log and a process tree
operator. This cut describes the most important behavior in the event log. For instance,
the cut (→,{a,b},{c}) denotes that the most important behavior in the event log is
‘some behavior with a and b’ sequentially followed by ‘some behavior with c’. If
such a cut can be found, the event log is split accordingly into sub-logs, and on these
sub-logs IM recurses, thereby constructing a process tree in a top-down manner. The
recursion ends in a base case, for instance if only a single activity remains in the event
log. Alternatively, if no base case applies and a cut cannot be found, a fall through is
selected. Several fall throughs have been defined (see [72]), decreasing in precision, in
the worst case leading to a flower model that allows any behavior with the activities in
the event log (e.g. ⟲(×(a1, . . . ,an), τ)).
As seen in the example of Figure 5.2, partial traces might introduce skips in the
resulting process model. For each of the steps of IM, we describe the effects of partial
traces, and briefly how IMpt addresses them.
IM detects cuts by considering the directly follows graph of the event log. The
nodes of this graph are the activities in the log, and the edges denote the activities
that were directly followed by other activities in the event log. Furthermore, directly
follows graphs contain information about which activities were observed in the log as
the start or end of a trace.
As an example, we consider an event log Ll:
Ll = [〈a,b,c,a,b,c,a,b〉,〈a,b,c,a,b,c|, |b,c,a,b〉, |c,a|]
The directly follows graph of Ll , without considering partial traces, is shown in Fig-
ure 5.3a.
In the directly follows graph, IM identifies characteristic footprints of the process
tree operators ×, →, ∧ and ⟲. For instance, in Figure 5.3a, the cut (∧,{a,b},{c})
can be identified, as a and b are fully connected to c. For more details on cut de-
tection, please refer to [71]. As a final result, IM would discover the process tree
∧(c,⟲(a,τ),⟲(τ,b)).
However, with the available knowledge of partial traces, this tree does not do Ll
justice. To take partial traces into account, IMpt considers only reliable start and end
activities. For Ll , the directly follows graph then becomes as shown in Figure 5.3b.
In this graph, the cut (⟲,{a,b},{c}) can be identified. As a final result, IMpt would
discover the process tree ⟲(→(a,b),c), which matches the intuitive idea of the log
better than the model discovered by IM.
[Figure 5.3 panels, each a directly follows graph over the activities a, b and c: (a) Without considering partial traces. (b) Considering partial traces.]
Figure 5.3: Two directly follows graphs for Ll.
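The difference between the two directly follows graphs can be reproduced with the Python sketch below. It is a simplified illustration, not the implementation of IM or IMpt: each trace is represented as a hypothetical triple (events, reliable_start, reliable_end), and the function name and the use_reliability flag are assumptions of this example.

```python
def directly_follows(log, use_reliability=True):
    """Directly follows edges plus start and end activities of a log.

    With use_reliability=True, only reliable starts and ends are recorded.
    """
    edges, starts, ends = set(), set(), set()
    for events, reliable_start, reliable_end in log:
        edges.update(zip(events, events[1:]))
        if events and (reliable_start or not use_reliability):
            starts.add(events[0])
        if events and (reliable_end or not use_reliability):
            ends.add(events[-1])
    return edges, starts, ends


# The four traces of the example log, with their reliability flags.
log = [
    (("a", "b", "c", "a", "b", "c", "a", "b"), True, True),
    (("a", "b", "c", "a", "b", "c"), True, False),
    (("b", "c", "a", "b"), False, True),
    (("c", "a"), False, False),
]
plain = directly_follows(log, use_reliability=False)
aware = directly_follows(log, use_reliability=True)
```

Both variants contain the same edges; they differ only in the start and end activities, which is what allows a different cut to be identified once partial traces are taken into account.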
After a cut has been detected, the event log is split into several sub-logs, based on
the cut that was found. During log splitting, information about the reliability of traces
has to be copied to the sub-logs and adjusted.
For × and ∧, no adjustments are necessary. For instance, for ∧, if the trace had
an unreliable end, then both sub-traces have unreliable ends. That is: 〈a,b,c| split on
(∧,{a,b},{c}) becomes 〈a,b| and 〈c|.
For → and ⟲, if the to-be-split trace has an unreliable end, the last sub-trace will
have an unreliable end, but all other sub-traces will have reliable ends. For instance,
〈a,b| split on (→,{a},{b},{c}) becomes 〈a〉 for {a}; and 〈b| for {b}; and no trace for
{c}.1 For unreliable starts of traces and for ⟲-cuts, a similar strategy is applied.
1 If the trace had a reliable end, then an empty trace would be introduced in the sub-log for {c}.
Most base cases of IM are unaffected by partial traces. However, if the log contains
empty traces, then the base case EMPTYTRACES [72, p195] might remove the empty
traces from the log, recurse and return a ×(τ, .) construct. Like other traces, empty
traces might have unreliable starts or ends. If a trace had an unreliable start, then the
actual trace might have events that fell before the window of observation (similar for
unreliable ends). Therefore, IMpt considers empty traces only if these have a reliable
start and a reliable end.
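The splitting rule for →-cuts described earlier in this section can be illustrated with the following Python sketch. It covers unreliable ends only (unreliable starts would be handled symmetrically), it assumes that the trace fits the cut, and the function name and the encoding of the result as a list of (sub_trace, reliable_end) pairs, with None meaning "no trace", are choices made for this example.

```python
def split_sequence(trace, reliable_end, parts):
    """Split one trace on a sequence cut with the given activity partition."""
    # Cut the trace into consecutive sub-traces, one per part of the cut.
    subs, i = [], 0
    for part in parts:
        j = i
        while j < len(trace) and trace[j] in part:
            j += 1
        subs.append(tuple(trace[i:j]))
        i = j
    if reliable_end:
        # Every part receives a sub-trace, possibly an empty one.
        return [(sub, True) for sub in subs]
    # Index of the part holding the last observed event (0 for an empty trace).
    last = max((k for k, sub in enumerate(subs) if sub), default=0)
    # Earlier parts end reliably, the last observed part does not,
    # and later parts receive no trace at all.
    return [(sub, True) if k < last else (sub, False) if k == last else None
            for k, sub in enumerate(subs)]


cut = [{"a"}, {"b"}, {"c"}]
unreliable = split_sequence(("a", "b"), False, cut)
reliable = split_sequence(("a", "b"), True, cut)
```

For the trace 〈a,b| this gives 〈a〉 for {a}, 〈b| for {b} and no trace for {c}, whereas a reliable end introduces an empty trace in the sub-log for {c}.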
The concepts of IMpt can be straightforwardly extended to handle infrequent be-
havior (analogous to IM - infrequent [69], which filters noise from the directly follows
graph before cut detection and from the log during log splitting), yielding Inductive
Miner - infrequent - partial traces (IMfpt), and to handle incomplete behavior (anal-
ogous to Inductive Miner - incompleteness [70], which optimises to find the best cut
rather than a perfect cut), yielding Inductive Miner - incompleteness - partial traces
(IMcpt).
Introducing a technique to handle event logs with partial traces yields the need for
conformance checking concepts and techniques that are aware of partial traces as well,
such as fitness and precision in the presence of partial traces. However, defining these
concepts is outside the scope of this thesis.
5.4 Process Tree Transformation
We use IM to discover two process trees from the sub-logs before and after a drift. In
this section, we present a method for finding a sequence of edit operations with the
minimum cost, that transforms the pre-drift process tree P to the post-drift process tree
P′. We first define a set of process tree edit operations and the cost of applying them
in Section 5.4.1. A direct approach to solve the process tree transformation problem
is then to try all possible sequences of edit operations that transform P into P′ and
find the cheapest one. However, there is an infinite number of such sequences and it
may be impossible to enumerate all of them. To prune the search space, we define
a notion of mapping between two process trees, where a valid mapping is one that
represents a sequence of edit operations that transforms P into P′. By defining the cost
of a valid mapping based on the cost of edit operations, we reformulate our goal as
to find a minimum-cost valid mapping between P and P′. By means of a mapping we
substantially prune the search space as we only need to try all possible valid mappings
between P and P′ to find a minimum-cost sequence of edit operations that transforms
P into P′. In Section 5.4.2.1 we present an A* algorithm to compute a minimum-cost
valid mapping between two process trees. As a faster alternative, a greedy algorithm
is presented in Section 5.4.2.2 to approximate such a mapping.
5.4.1 Process Tree Edit Operations
A process tree edit operation is an edit operation applied to a process tree at any step
during its transformation to another process tree. In a process tree transformation prob-
lem the goal is to find a minimum-cost sequence of edit operations to transform one
process tree into another process tree (optimal solution). Hence, the granularity of pro-
cess tree changes expressed in the optimal solution depends on the size of process tree
constructs based on which the edit operations are defined as well as the cost of each
edit operation. For example, consider the transformation of process tree P : →(×(a,b), ∧(c,d)),
whose root is P[0] and whose ∧-node is P[4], into process tree P′ : →(e, ×(a,b,f)), whose root
is P′[0], and assume two edit operations, delete/insert a fragment
(of any size), where each edit operation has a unit cost. These two edit operations
yield the optimal solution consisting of two changes: delete P〈P[0]〉-fragment and in-
sert P′〈P′[0]〉-fragment, i.e. delete the original process tree and insert the new process
tree. However, such an abstract explanation does not provide any detail on the actual
changes that occurred in the process. On the other hand, assume two edit operations with
unit costs which only allow the insertion/deletion of individual nodes in a process tree.
For P and P′ in the above example, the optimal solution would become: delete activities
c and d, and insert activities e and f . This sequence of changes provides a detailed
characterization of the changes in the process trees. However, explaining the changes at
the level of activities can become verbose and confusing, especially when changes involve
large fragments of a process. As such, we need to define edit operations and
their costs such that the optimal solution characterizes process tree changes in enough
detail while avoiding verbosity. For example, instead of reporting on the deletion of
activities c and d individually, we could report on the deletion of the P〈P[4]〉-fragment
containing those activities without loss of information.
A process tree edit operation represents a change in its underlying process. There-
fore, we define process tree edit operations based on the typical change patterns in
business processes [129], introduced in Chapter 2. We classify each change pattern,
except “synchronize two fragments”, as simple (S) or compound (C), where a compound
change pattern is one that can be expressed using multiple simple change patterns.
Table 5.1 shows the class of each change pattern. Note that the synchronization
of two fragments introduces unstructuredness into a process model and hence cannot
be used as a basis for defining process tree edit operations. This change pattern is
illustrated with an example in Section 5.5. We set our goal as to find a sequence of
simple changes that fully explains the transformation of the pre-drift process tree P to
the post-drift process tree P′, while satisfying three requirements.
1. To improve the understandability of the changes, a change in the relation between fragments, e.g. from sequential to parallel, should only involve fragments that exist in both P and P′.
2. The changes within the sequence should not overlap, i.e. any two changes should cover distinct differences between the trees.
3. The sequence of changes needs to be detailed yet concise. That is, if a change involves an individual activity within the process tree then it should be explained by one change in the sequence referring to that activity. On the other hand, if a change involves a fragment of multiple activities then it should be explained by one change in the sequence referring to that fragment as a whole.
To satisfy these requirements, we first define a set of process tree edit operations based on
the simple change patterns in Definitions 16, 17, 18 and 19. The defined edit operations
can be applied to fragments of any size, from individual activities to larger fragments.
We then search for a minimum-cost sequence of edit operations which transforms the
pre-drift process-tree P into the post-drift process tree P′. In this search, we only con-
sider sequences of edit operations in which edit operations that delete (resp., insert)
fragments occur before (resp., after) edit operations that change the relation between
fragments. Furthermore, by limiting each node within P or P′ to be subject to one edit
operation we ensure that the edit operations within a sequence do not overlap. We also
define the cost of edit operations such that a minimum-cost sequence of edit operations
which transforms P into P′ provides a detailed description of changes within P. In a
post-processing step, we then aggregate the edit operations within a sequence to make
it as concise as possible.
Therefore, based on a defined set of edit operations our goal is to find a minimum-cost
sequence of edit operations to transform P into P′ and to subsequently make it as
concise as possible. In this chapter, we use six process tree edit operations: substitution
of operators (SUB⊕), substitution of activities (SUBac), deletion of fragments (D f ), deletion
of ⟲-operator nodes (D⟲), insertion of fragments (I f ), and insertion of ⟲-operator
nodes (I⟲). The relation with the change patterns is shown in Table 5.1.
Code  Change pattern                                        Cat.  Class  Process tree edit operations
sre   Insert/delete a fragment between two fragments        I     S      I f , D f
pre   Insert/delete a fragment in/from parallel branch      I     S      I f , D f
cre   Insert/delete a fragment in/from conditional branch   I     S      I f , D f
cp    Duplicate a fragment                                  I     C
rp    Substitute a fragment                                 I     C      SUBac (covers activity substitution)
sw    Swap two fragments                                    I     C
sm    Move a fragment to between two fragments              I     C
cm    Move a fragment into/out of conditional branch        I     C
pm    Move a fragment into/out of parallel branch           I     C
cf    Make fragments mutually exclusive/sequential          R     S      SUB⊕
pl    Make fragments parallel/sequential                    R     S      SUB⊕
cd    Synchronize two fragments                             R     -      -
lp    Make a fragment loopable/non-loopable                 O     S      I⟲, D⟲
cb    Make a fragment skippable/non-skippable               O     S      I f , D f
fr    Change branching frequency                            O     -
Table 5.1: Change patterns from [129] and their relation to our process tree edit operations.
Definition 16 (Process tree edit operations). A process tree edit operation γ transforms
a canonical process tree P into another canonical process tree P′, denoted by P −[γ]→ P′.
Definition 17 (Substitution operations). We use the following process tree edit opera-
tions for substitution:
SUB⊕ Operator substitution Let ⊕(M1, . . .M2) be a fragment, where . . . is any num-
ber of process trees (possibly 0), l(⊕) ∈ {→,×,∧}, and M1 and M2 are process
trees. Operator substitution replaces the operator of ⊕ with a different operator
in {→,×,∧}.
This edit operation cannot be applied to an ×(. . . ,τ)-node, where . . . is any
number of process trees, as a→- or ∧-node may not have a τ-child.
SUBac Activity substitution Applies to activity nodes, where it replaces the activity
with a different activity.
Example 11. Figure 5.4a shows two examples of substitution operations, where the
operator of the→(b,×(c,d))-node is substituted with ∧, and activity ‘a’ is substituted
with activity ‘e’.
After the application of each edit operation to a process tree, we reduce the result-
ing tree to normal form by repeatedly applying the reduction rules (cf. Definition 14).
We do not report on the changes in a process tree as a result of the application of
reduction rules, as they do not change the language of the tree.
Example 12. For instance, in →(∧(a,b), c) −[SUB⊕: ∧ ⇒ →]→ →(→(a,b), c) −[A→]→ →(a,b,c),
after the substitution of the operator of the ∧-node 1 with →, we can reduce the resulting
process tree by applying the associativity reduction rule A→.
Definition 18 (Deletion operations). We use the following process tree edit operations
for deletion:
D f Fragment deletion Deletes a fragment f .
If f is a sub-fragment of an operator node ⊕ and as a result of deleting it ⊕ is
left with one child, ⊕ will be removed by the singularity reduction rule (S). For
example, in Figure 5.4b (left to right) the ∧-node P[1] is subsequently removed by
the singularity reduction rule.
If a ⟲-node with fewer than two children remains after applying a fragment dele-
tion, the deleted construct is replaced with a τ-child to keep the number of chil-
dren of the ⟲-node at 2. Such τ-nodes are called auxiliary τ-nodes.
D⟲ ⟲-operator deletion Let P = ⊕(. . . ,w(M1,τ), . . .), where . . . is any number of
process trees (possibly 0), w is a ⟲-node, and M1 is a process tree. Deletion of
the ⟲-node w makes ⊕ the parent of M1 and deletes the τ-node.
Example 13. Figure 5.4b (left to right) shows an example of D f , where Fragment 1 is
deleted. In ⟲(→(a,b), c) −[D f : →(a,b)]→ ⟲(τ, c), the deleted →-fragment is replaced by the τ-node 1 to
keep the number of children of the ⟲-node at 2. Figure 5.4c (left to right) shows an
example of D⟲, where the ⟲-node P[2] is deleted.
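The two side effects of fragment deletion, namely the removal of a parent by the singularity rule and the insertion of an auxiliary τ-node under a loop, can be illustrated with the Python sketch below. It is restricted to deleting a child fragment of the root operator, and the nested-tuple encoding and the function name delete_child are assumptions of this example.

```python
def delete_child(tree, index):
    """Delete the child fragment at position `index` under the root operator."""
    op, *children = tree
    if op == "loop":
        # An auxiliary tau keeps the number of children of the loop node at 2.
        children[index] = "tau"
        return (op, *children)
    del children[index]
    if len(children) == 1:
        # Singularity rule (S): an operator with one child is replaced by it.
        return children[0]
    return (op, *children)


loop_result = delete_child(("loop", ("seq", "a", "b"), "c"), 0)
and_result = delete_child(("and", "a", ("xor", "b", "c")), 1)
```

The first call mirrors the loop case of Example 13; in the second call the ∧-node disappears because only activity a remains under it.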
Definition 19 (Insertion operations). We use the following process tree edit operations
for insertion:
I f Fragment insertion Inserts a fragment (as a child of an operator node or an
auxiliary operator node).
As discussed above, the deletion of a fragment may cause its parent to be deleted
as well by the singularity reduction rule. Thus, the fragment insertion opera-
tion needs to insert auxiliary operator nodes again, to ensure that a fragment
insertion can offset a fragment deletion. An auxiliary operator node is an extra
non-⟲-operator node, inserted (as a child of an operator node ⊕) in a process
tree P (and) as the parent of an inserted fragment and a sub-fragment (of ⊕) in
P. An auxiliary operator node defines the relation between the inserted fragment
and the sub-fragment.
As explained before, the deletion operations insert τ-leaves if a ⟲-node would,
as a result of the deletion, not have 2 children. Similarly, when inserting a fragment
as the first child of a ⟲(τ,M1)-node, the τ-node is replaced (and similar
for the symmetric second-child case). Such τ-nodes that are inserted (resp.,
deleted) as a result of deleting (resp., inserting) child fragments of ⟲-nodes are
called auxiliary τ-nodes.
I⟲ ⟲-operator insertion Inserts a ⟲-node n in a process tree P. As a result of this
edit operation, one of the non-τ-sub-fragments of P is inserted as the first child
(loop body) of n, while the second child (loopback) of n is a τ-node.
Example 14. Figure 5.4b (right to left) shows an example of I f , where Fragment 1 is
inserted as a child of the auxiliary ∧-node P[1], in a concurrent relation with activity
‘a’. In ⟲(τ, c) −[I f : →(a,b)]→ ⟲(→(a,b), c), the τ-node 1 is replaced by the inserted →-fragment to keep
the number of children of the ⟲-node at 2. Figure 5.4c (right to left) shows an example
of I⟲, where the ⟲-node P[2] is inserted as a child of the →-node P[0] (P′[0]), and as
the parent of activity ‘b’.
We defined a set of six edit operations based on the simple change patterns in Table 5.1,
which allow a detailed characterization of the changes in a process tree.
Included in this set are the two edit operations, insert/delete a fragment, which alone
suffice for explaining any type of change in the structure of a process tree. Therefore,
the set of six edit operations defined above is complete, i.e. it is possible to characterize
any change in the structure of a process tree using the edit operations in this set.
[Figure 5.4, panels: (a) SUBac and SUB⊕; (b) D_f (left to right) and I_f (right to left); (c) D⟲ (left to right) and I⟲ (right to left).]
Figure 5.4: Examples of process tree edit operations.
Each operation has an associated cost θ; these costs are shown in Table 5.2. For our cost function θ, it can be shown that the triangle inequality holds, that is, for all process trees w, u and v and all edit operations x, y and z it holds that
$\theta(w \xrightarrow{x} u) \le \theta(w \xrightarrow{y} v) + \theta(v \xrightarrow{z} u)$.
Edit operation | Cost θ
SUB⊕ | 1.
SUBac | 1.
D_f | If the deleted fragment is a τ-node, then 1. Otherwise, the number of non-τ leaves in the fragment. Auxiliary nodes have no cost.
D⟲ | 1.
I_f | If the inserted fragment is a τ-node, then 1. Otherwise, the number of non-τ leaves in the fragment. Auxiliary nodes have no cost.
I⟲ | 1.
Table 5.2: Costs associated with the process tree edit operations.
5.4.1.1 Edit Operation Sequences
Let S = e_1, …, e_n be a sequence of edit operations that transforms a process tree P into a process tree P′. That is, there is a sequence of process trees P_0, …, P_n such that P = P_0, P′ = P_n, and $P_{i-1} \xrightarrow{e_i} P_i$ for 1 ≤ i ≤ n. By extending θ, the cost of the sequence S is given by $\theta(S) = \sum_{i=1}^{n} \theta(e_i)$.
The edit distance d(P, P′) from process tree P to process tree P′ is defined to be the minimum cost of all sequences of edit operations which transform P into P′, i.e.
$d(P, P') = \min\{\theta(S) \mid S \text{ is a sequence of edit operations which transforms } P \text{ into } P'\}$
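The cost table and the sequence cost can be illustrated with a small Python sketch. The class `EditOp`, the operation names (`"SUB_op"`, `"SUB_ac"`, `"D_f"`, `"I_f"`, `"D_loop"`, `"I_loop"`) and the encoding of a fragment by the tuple of its leaf labels are assumptions made for illustration only; they are not part of the thesis. Auxiliary nodes are simply not represented, since they carry no cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditOp:
    # Hypothetical encoding: kind is one of
    # "SUB_op", "SUB_ac", "D_f", "I_f", "D_loop", "I_loop".
    kind: str
    # Leaf labels of the deleted/inserted fragment ("tau" marks a silent leaf).
    fragment_leaves: tuple = ()


def op_cost(op):
    """Cost of a single edit operation, following Table 5.2."""
    if op.kind in ("D_f", "I_f"):
        if op.fragment_leaves == ("tau",):
            return 1  # the fragment is a single tau-node
        # otherwise: number of non-tau leaves in the fragment
        return sum(1 for leaf in op.fragment_leaves if leaf != "tau")
    return 1  # substitutions and loop-operator deletion/insertion cost 1


def sequence_cost(ops):
    """theta(S): sum of the costs of the operations in the sequence."""
    return sum(op_cost(op) for op in ops)


# An operator substitution followed by the deletion of activity 'c'.
example = [EditOp("SUB_op"), EditOp("D_f", ("c",))]
print(sequence_cost(example))
```

The edit distance d(P, P′) is the minimum of `sequence_cost` over all sequences that transform P into P′; the sketch only evaluates one given sequence and does not search for that minimum.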
As stated in Section 5.4.1, to improve the understandability of changes within a
sequence of edit operations, it should only be allowed to change the relation between
fragments that exist in both P and P′. This is illustrated in the following example.
Example 15. As an example, consider the following sequence of edit operations that transforms process tree P into process tree P′:
×(∧(a, b, c), d) −SUB⊕(∧,×)→ ×(a, b, c, d) −D_f(c)→ ×(a, b, d),
where the ∧-node is node 1 of P. First the operator of the ∧-node 1 is substituted with × by a SUB⊕ edit operation, followed by the application of the reduction rule A× to this node. Then, activity ‘c’ is deleted by a D_f edit operation.
The first edit operation describes that the relation between activities ‘a’, ‘b’ and ‘c’
has changed from concurrent in P to mutually exclusive in P′. However, as activity ‘c’
is deleted by the subsequent edit operation, and hence does not exist in P′, describing
a change in the relation between this activity and other activities may be misleading.
This problem can be avoided by applying fragment deletion operations before and
symmetrically fragment insertion operations after other operations in a sequence of
edit operations. In the above example, this could be achieved by reversing the order of
the two edit operations.
As such, we define the following condition for a sequence of edit operations.
Definition 20 (Fragment deletion/insertion order). Let S be a sequence of edit oper-
ations that transforms a process tree P into a process tree P′. It should hold that
fragment deletion operations precede and fragment insertion operations follow other
operations in S.
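The ordering condition of Definition 20 can be checked mechanically. The following Python sketch assumes that a sequence is given as a list of operation names, with `"D_f"` and `"I_f"` denoting fragment deletion and insertion and every other name (for example `"SUB_op"`, `"SUB_ac"`, `"D_loop"`, `"I_loop"`) counted as an "other" operation; these names are illustrative assumptions.

```python
# Phase 0: fragment deletions, phase 1: all other operations,
# phase 2: fragment insertions.
PHASE = {"D_f": 0, "I_f": 2}


def satisfies_fragment_order(kinds):
    """True iff all D_f come first and all I_f come last in the sequence."""
    phases = [PHASE.get(kind, 1) for kind in kinds]
    # The condition holds exactly when the phases never decrease.
    return all(a <= b for a, b in zip(phases, phases[1:]))


# The sequence of Example 15 violates the condition; its reversal satisfies it.
print(satisfies_fragment_order(["SUB_op", "D_f"]))
print(satisfies_fragment_order(["D_f", "SUB_op"]))
```

The sketch only tests the order. It does not reorder a sequence, because the arguments of an operation may have to change when it is moved.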
Furthermore, we consider fewer higher-level edit operations to be more understandable than a larger number of lower-level edit operations. Therefore, given a sequence of edit operations that transforms P into P′, we aggregate the edit operations of the sequence as much as possible to obtain a concise sequence of edit operations. For example, in the transformation of P = →(a, ×(b, c), d), where the ×-node is P[2], into P′ = →(a, d), the minimum-cost sequence of edit operations {D_f(b), D_f(c)} can be reduced to the concise sequence {D_f(P〈P[2]〉)} by aggregating the two activity deletion operations into the deletion of the fragment containing those deleted activities.
In the remainder of this chapter, unless otherwise indicated, a sequence of edit
operations that transforms P into P′ always refers to a concise sequence of edit opera-
tions.
5.4.1.2 Process Tree Mappings
There are infinitely many different sequences of edit operations that transform P into P′. Therefore, it may be impossible to enumerate all sequences and find the shortest one. In the next section, we define structures called process tree mappings to prune the search space further and solve this problem more efficiently. We adapt the mapping between ordered trees by Tai [113], a.k.a. Tai mapping, to work on process trees featuring both ordered and unordered nodes. Figure 5.5 illustrates a sample mapping between two process trees P and P′.
Figure 5.5: Sample mapping between process trees P and P′.
A dotted line connecting a node n ∈ P to a node m ∈ P′ indicates that n is to be substituted with m if l(n) ≠ l(m), or to remain unchanged otherwise. Each node in P that is not connected by a dotted line is to be deleted from P, whereas each node in P′ not connected by a dotted line is to be inserted in P. To maintain the hierarchical structure of the trees we add two virtual nodes with the same label as the roots of the trees and always map them to each other. Formally, a process tree mapping is defined as follows.
Definition 21 (Process tree mapping). A process tree mapping between two process trees P and P′ is defined by a triple (M, P, P′), where M is any set of pairs of integers (i, j) satisfying the following conditions:
1) A pair (i, j) ∈ M, where i ≠ −1 and j ≠ −1, indicates that P[i] needs to be substituted with P′[j] if l(P[i]) ≠ l(P′[j]); otherwise it remains unchanged. A pair (i, −1) ∈ M indicates that the node P[i] is to be deleted from P, whereas a pair (−1, j) ∈ M indicates that the node P′[j] is to be inserted in P:
$-1 \le i \le |P|-1 \;\wedge\; -1 \le j \le |P'|-1 \;\wedge\; (i \ne -1 \vee j \ne -1)$
2) Every node of P or P′ is in the mapping:
$\forall_{0 \le i_1 \le |P|-1}\, \exists_{-1 \le j_1 \le |P'|-1}\, (i_1, j_1) \in M \;\wedge\; \forall_{0 \le j_1 \le |P'|-1}\, \exists_{-1 \le i_1 \le |P|-1}\, (i_1, j_1) \in M$
3) Each node of P or P′ is mapped at most once:
$\forall_{(i_1, j_1), (i_2, j_2) \in M \,\wedge\, (i_1 \ne -1 \vee i_2 \ne -1) \,\wedge\, (j_1 \ne -1 \vee j_2 \ne -1)}\;\; i_1 = i_2 \Leftrightarrow j_1 = j_2$
4) For every pair (i, j) ∈ M with i ≠ −1 and j ≠ −1 the following conditions should hold:
a) P[i] is a non-⟲ operator node iff P′[j] is a non-⟲ operator node.
b) P[i] is a ⟲-node iff P′[j] is a ⟲-node.
c) P[i] is an activity node iff P′[j] is an activity node.
d) P[i] is a τ-node iff P′[j] is a τ-node.
e) For any two mapped ⟲-nodes w in P and u in P′, the nodes on the loopbody (resp. loopback) path of w can only be mapped to the nodes on the loopbody (resp. loopback) path of u:
Let w = P[r] and u = P′[s] be ancestors of P[i] and P′[j] in P and P′, such that l(w) = l(u) = ⟲ and (r, s) ∈ M, then P[i] is on the loopbody path of w iff P′[j] is on the loopbody path of u.
5) For every two pairs (i1, j1), (i2, j2) ∈ M with i1, j1, i2, j2 ≠ −1 the following conditions should hold:
a) P[i1] is an ancestor (resp., descendant) of P[i2] iff P′[j1] is an ancestor (resp., descendant) of P′[j2].
b) Let w be a common ordered ancestor of P[i1] and P[i2] in P, and u be a common ordered ancestor of P′[j1] and P′[j2] in P′,
if Rank(P[i1], w) < Rank(P[i2], w) then Rank(P′[j1], u) ≤ Rank(P′[j2], u)
if Rank(P[i1], w) > Rank(P[i2], w) then Rank(P′[j1], u) ≥ Rank(P′[j2], u)
if Rank(P′[j1], u) < Rank(P′[j2], u) then Rank(P[i1], w) ≤ Rank(P[i2], w)
if Rank(P′[j1], u) > Rank(P′[j2], u) then Rank(P[i1], w) ≥ Rank(P[i2], w)
Condition 1 ensures that a node in P or P′ is either mapped to a node in the other tree or to −1. Conditions 2 and 3 ensure that every node in P or P′ is mapped exactly once. Conditions 4a-4d ensure that M complies with the constraints of the substitution edit operations (cf. 17). Condition 4e ensures that for any two mapped ⟲-nodes w in P and u in P′, respectively, the nodes on the loopbody (resp. loopback) path of w can only be mapped to the nodes on the loopbody (resp. loopback) path of u. For the sample mapping in Figure 5.5, M = {(0, 0), (1, 1), (2, 3), (3, −1), (4, −1), (5, 4), (6, 6), (7, 5), (−1, 2), (−1, 7)}.
Condition 5a in conjunction with the previous conditions is sufficient to ensure that, after each touched node P[i] is changed to its paired node P′[j] (if l(P[i]) ≠ l(P′[j])), untouched nodes of P are deleted and untouched nodes of P′ are inserted in P, P and P′ are equivalent, provided that the two process trees only contain unordered operator nodes. However, as mentioned before, a process tree may contain both ordered
and unordered operator nodes. Hence, we add condition 5b to preserve the order among siblings in both P and P′. For instance, for the two process trees P and P′ in the sample mapping in Figure 5.5, the two nodes P[1] and P′[1] are the only ordered nodes. Among the descendants of P[1], i.e. {P[2], P[3]}, and the descendants of P′[1], i.e. {P′[2], P′[3]}, P[2] is mapped to P′[3] in the mapping M, i.e. (2, 3) ∈ M. Consequently, P[3] cannot be mapped to P′[2] in M, i.e. (3, 2) ∉ M, as otherwise condition 5b would be violated: Rank(P[2], P[1]) (= 1) < Rank(P[3], P[1]) (= 2), but Rank(P′[3], P′[1]) (= 2) ≰ Rank(P′[2], P′[1]) (= 1).
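The index-level conditions of Definition 21 lend themselves to a direct check. The Python sketch below tests conditions 1 to 3 on a set of pairs and, separately, the four implications of condition 5b on precomputed ranks. It is a simplified illustration: the function names are hypothetical, the tree sizes |P| = |P′| = 8 are inferred from the indices occurring in the sample mapping of Figure 5.5, and conditions 4 and 5a (which need node labels and the tree structure) are not covered.

```python
def satisfies_conditions_1_to_3(M, size_p, size_p_prime):
    """M is a set of integer pairs (i, j); -1 stands for 'no counterpart'."""
    # Condition 1: indices are in range and (-1, -1) is not allowed.
    for i, j in M:
        if not (-1 <= i <= size_p - 1 and -1 <= j <= size_p_prime - 1):
            return False
        if i == -1 and j == -1:
            return False
    # Conditions 2 and 3 together: every node index of P and of P'
    # occurs in exactly one pair of M.
    left = sorted(i for i, _ in M if i != -1)
    right = sorted(j for _, j in M if j != -1)
    return left == list(range(size_p)) and right == list(range(size_p_prime))


def order_preserved(rank_i1, rank_i2, rank_j1, rank_j2):
    """Condition 5b for two mapped pairs, given the ranks under w and u."""
    if rank_i1 < rank_i2 and not rank_j1 <= rank_j2:
        return False
    if rank_i1 > rank_i2 and not rank_j1 >= rank_j2:
        return False
    if rank_j1 < rank_j2 and not rank_i1 <= rank_i2:
        return False
    if rank_j1 > rank_j2 and not rank_i1 >= rank_i2:
        return False
    return True


# Sample mapping of Figure 5.5.
M = {(0, 0), (1, 1), (2, 3), (3, -1), (4, -1), (5, 4),
     (6, 6), (7, 5), (-1, 2), (-1, 7)}
print(satisfies_conditions_1_to_3(M, 8, 8))
# Pairs (2, 3) and a hypothetical (3, 2): ranks 1, 2 in P and 2, 1 in P'.
print(order_preserved(1, 2, 2, 1))
```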
To fully comply with the process tree edit operations and sequences thereof defined in Section 5.4.1, a mapping needs to satisfy further conditions. We call a mapping that satisfies those conditions a valid process tree mapping (valid mapping). Before presenting the formal definition of a valid mapping we define the following notions. In the remainder of this chapter, unless otherwise indicated, a mapping always refers to a valid mapping.
Definition 22 (Deleted fragments in a mapping). Let M be a mapping between two process trees P and P′, and let f be a fragment in P. The fragment f is deleted through M if ∀ P[k] ∈ f : (k, −1) ∈ M.
Let S = {f_1, …, f_n} be the set of all deleted fragments in M. A fragment f_i ∈ S is a maximal deleted fragment if there is no f_j (≠ f_i) ∈ S such that f_i is a sub-fragment of f_j.
Definition 23 (Inserted fragments in a mapping). Let M be a mapping between two process trees P and P′, and let f be a fragment in P′. The fragment f is inserted through M if ∀ P′[k] ∈ f : (−1, k) ∈ M.
Let S = {f_1, …, f_n} be the set of all inserted fragments in M. A fragment f_i ∈ S is a maximal inserted fragment if there is no f_j (≠ f_i) ∈ S such that f_i is a sub-fragment of f_j.
Example 16. Figure 5.6 shows examples of deleted and inserted fragments in a map-
ping between process trees P and P′. The set of all deleted fragments in this mapping
is S = {b, c, ∧(b,c)}, among which Fragment 1 = ∧(b,c) is a maximal deleted frag-
ment. The set of all inserted fragments in this mapping is S′ = {e, f , →(e, f )}, among
which Fragment 2 =→(e, f ) is a maximal inserted fragment.
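Maximal deleted fragments can be found with a single bottom-up pass over P. The Python sketch below assumes that a fragment is the complete subtree rooted at a node and that the tree is given as a dictionary from a node index to the list of its children; the concrete tree →(a, ∧(b, c), d) with pre-order indices 0 to 5 is a hypothetical stand-in, not the tree of Figure 5.6. The same function applies to inserted fragments when it is run on P′ with the set of inserted nodes.

```python
def maximal_deleted_fragments(children, root, deleted):
    """Return the roots of all maximal, fully deleted subtrees."""
    roots = []

    def fully_deleted(node):
        kids = children.get(node, [])
        kid_flags = [fully_deleted(kid) for kid in kids]
        whole = node in deleted and all(kid_flags)
        if not whole:
            # A fully deleted child subtree cannot be extended upwards,
            # so it is maximal.
            roots.extend(kid for kid, flag in zip(kids, kid_flags) if flag)
        return whole

    if fully_deleted(root):
        roots.append(root)
    return sorted(roots)


# Hypothetical tree: 0 = ->, 1 = a, 2 = AND, 3 = b, 4 = c, 5 = d.
children = {0: [1, 2, 5], 2: [3, 4]}
# Nodes 2, 3 and 4 are mapped to -1, i.e. b, c and AND(b, c) are deleted.
print(maximal_deleted_fragments(children, 0, {2, 3, 4}))
```

If only nodes 2 and 3 were deleted, the ∧-node would not root a deleted fragment (its child c survives), and activity b alone would be the maximal deleted fragment.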
Figure 5.6: Examples of deleted and inserted fragments in a mapping between process trees P and P′. Fragment 1 is a maximal deleted fragment, whereas Fragment 2 is a maximal inserted fragment.
Definition 24 (Auxiliary operator nodes in a mapping). Let M be a mapping between two process trees P and P′. A non-⟲-operator node ⊕ = P[i] (resp., ⊕ = P′[j]) is an auxiliary operator in M if:
i) (i, −1) ∈ M (resp., (−1, j) ∈ M)
ii) Exactly one child fragment of ⊕ is not a deleted (resp., inserted) fragment in M.
An auxiliary operator node v = P[i] in a mapping corresponds to a node deleted
by the singularity reduction rule after a fragment deletion edit operation (cf. Defini-
tion 18), whereas an auxiliary operator node v = P′[i] in a mapping corresponds to an
auxiliary operator node inserted along with a fragment insertion (cf. Definition 19).
Definition 25 (Auxiliary τ-nodes in a mapping). Let M be a mapping between two process trees P and P′. Also, let v ∈ P (resp., v ∈ P′) be a τ-node parented by a ⟲-node u ∈ P (resp., u ∈ P′). v is an auxiliary τ-node if v is deleted (resp., inserted) in M while u is not deleted (resp., inserted).
An auxiliary τ-node in a mapping corresponds to an auxiliary τ-node inserted (resp., deleted) as a result of deleting (resp., inserting) a child fragment of a ⟲-node by an edit operation D_f (resp., I_f) to keep the number of children of the ⟲-node at 2 (cf. Definitions 18 and 19).
Example 17. As an example, in the mapping between process trees P and P′ in Fig-
ure 5.7, the ∧-node 2 in P′ is an auxiliary operator node, inserted along with the
insertion of activity ‘e’, and the τ-node 5 in P is an auxiliary τ-node, deleted as a
result of inserting activity ‘d’.
Figure 5.7: Sample auxiliary operator node, i.e. the ∧-node 2 in P′, and sample auxiliary τ-node, i.e. the τ-node 5 in P, in a mapping between process trees P and P′.
Definition 26 (Trivial operator nodes). Let M be a mapping between two process trees P and P′. A non-⟲-operator node v = P[i] (resp., v = P′[j]) is a trivial operator node in M if v is deleted (resp., inserted), at least two child fragments of v contain some nodes that are not deleted (resp., inserted), and at least one of the following conditions holds for v:
i) There exists an inserted (resp., deleted) operator node v′ in P′ (resp., P) such that l(v′) = l(v), all undeleted (resp., uninserted) leaves under v are mapped to leaves under v′, and at least one uninserted (resp., undeleted) leaf under v′ is not mapped to a node under v. Then, we refer to v as an indirectly-trivial operator node. Let v′ be the deepest node that satisfies this condition, then we refer to v′ as the indirect parent of v.
ii) Let u be the deepest ancestor of v that satisfies one of the following conditions:
• u is mapped to a node u′ in P′ (resp., P).
• u is an indirectly-trivial operator node and a node u′ in P′ (resp., P) is its indirect parent.
• u is an indirect parent for an indirectly-trivial operator node u′ in P′ (resp., P).
such that all undeleted (resp., uninserted) leaves under v are mapped to leaves under u′. Then, one of the following should hold for v and u:
a) l(v) = l(u).
b) l(v) = l(u′).
A trivial deleted operator node corresponds to an operator node deleted by the associativity reduction rules after the application of an edit operation. Inversely, a trivial inserted operator node corresponds to an operator node inserted as the root of a sub-fragment of a non-⟲-operator node as a result of applying an edit operation.
Example 18. Figure 5.8 shows examples of trivial operator nodes in mappings. In
Figure 5.8a, the ×-node 1 in P is an indirectly-trivial operator node and the ×-node 1
in P′ is its indirect parent (condition i in Definition 26). After substituting the operator of the →-node in the fragment →(×(a,b),c) with ×, resulting in the insertion of the ×-node 1 ∈ P′, the ×-node 1 ∈ P, i.e. the ×(a,b)-node, is deleted by the associativity reduction rule A×. In Figure 5.8b, the →-node 2 is a trivial operator node, as after deleting activity ‘c’, and subsequently the ×-node 1 by a singularity reduction rule, the →-node 2 is deleted by the associativity reduction rule A→ (condition iia in Definition 26). In Figure 5.8c, the ∧-node 1 is a trivial operator node, as after changing the operator of the →-node 0 to ∧, the ∧-node 1 is deleted by the associativity reduction rule A∧ (condition iib in Definition 26).
[Figure 5.8, panels: (a) ×-node 1 in P; (b) →-node 2; (c) ∧-node 1.]
Figure 5.8: Examples of trivial operator nodes in mappings.
Definition 27 (Lowest mapped ancestors). Let M be a mapping between two process trees P and P′. The lowest mapped ancestors (LMAs) of two nodes v ∈ P and v′ ∈ P′ in M, denoted by LMAs_M(v, v′), is a pair (u, u′) of nodes, where u = P[r] and u′ = P′[s] are ancestors of v and v′, respectively, such that (r, s) ∈ M and there is no pair (m, n) in M, where P[m] is an ancestor of v and P′[n] is an ancestor of v′, such that dep(P[m]) > dep(P[r]) ∧ dep(P′[n]) > dep(P′[s]).
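The lowest mapped ancestors can be computed by walking upwards from v and stopping at the first ancestor that is mapped to an ancestor of v′. Because condition 5a of Definition 21 preserves the ancestor relation, every other mapped pair of ancestors lies above that pair. The Python sketch below assumes parent dictionaries (the root maps to `None`) and proper ancestors; the two small trees and the mapping are hypothetical and only serve as an illustration.

```python
def ancestors(parent, node):
    """Proper ancestors of node, nearest (deepest) first."""
    chain = []
    while parent[node] is not None:
        node = parent[node]
        chain.append(node)
    return chain


def lowest_mapped_ancestors(parent_p, parent_p_prime, M, v, v_prime):
    """Return (r, s) such that (P[r], P'[s]) = LMAs_M(v, v'), or None."""
    image = {i: j for i, j in M if i != -1}
    candidates = set(ancestors(parent_p_prime, v_prime))
    for r in ancestors(parent_p, v):  # deepest ancestor of v first
        s = image.get(r, -1)
        if s != -1 and s in candidates:
            return (r, s)
    return None


# Hypothetical trees: in P, node 5 lies below nodes 2, 1 and 0;
# in P', node 1 is a child of the root 0.
parent_p = {0: None, 1: 0, 2: 1, 5: 2}
parent_p_prime = {0: None, 1: 0}
# Nodes 1 and 2 of P are deleted, node 5 is mapped to node 1 of P'.
M = {(0, 0), (1, -1), (2, -1), (5, 1)}
print(lowest_mapped_ancestors(parent_p, parent_p_prime, M, 5, 1))
```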
Definition 28 (Valid process tree mapping). Given two process trees P and P′, a valid process tree mapping from P to P′ is a mapping M satisfying the following conditions:
1) For every subtree R = P[i](Q1, Q2) in P (resp., R = P′[j](Q1, Q2) in P′), where P[i] (resp., P′[j]) is a ⟲-node, and Q1 and Q2 are process trees, if (i, −1) ∈ M (resp., (−1, j) ∈ M) then Q2 is a deleted fragment (resp., inserted fragment) in M.
2) Let ⊕ be an operator node in P (resp., P′) that is mapped to an operator node in the other tree. If l(⊕) ∈ {→, ∧} (resp., l(⊕) = ⟲) then at least two (resp., one) child fragments of ⊕ should contain some activity nodes that are not deleted (resp., inserted) in M. If l(⊕) = × then at least two child fragments of ⊕ should not be
deleted (resp., inserted) fragments, and one of which should contain some activity nodes that are not deleted (resp., inserted).
3) For every pair (i, j) in M such that t = P[i] and t′ = P′[j] are two τ-nodes, let q = P[r] and q′ = P′[s] be the parents of t and t′, respectively. One of the following conditions should hold:
a) There exists a pair (r, s) in M.
b) Let v ∈ P and v′ ∈ P′ be the deepest ancestors of t and t′, respectively, that satisfy one of the following conditions:
• (v, v′) = LMAs_M(t, t′) (cf. Definition 27).
• v is an indirectly-trivial operator node and v′ is its indirect parent (cf. Definition 26).
• v′ is an indirectly-trivial operator node and v is its indirect parent.
Let u be an ancestor of t (resp., t′) such that u is on the shortest path from v (resp., v′) to q (resp., q′) and l(u) ∈ {→, ∧}. One of the following conditions should hold for u:
i) u is an auxiliary operator node in M (cf. Definition 24).
ii) The child fragment of u containing t (resp., t′) should contain at least one activity node that is not deleted (resp., inserted) in M.
The above conditions are defined to ensure that a mapping satisfies all the conditions of process tree edit operations and sequences thereof, defined in Section 5.4.1.
As defined for the edit operations I⟲ and D⟲, the second child fragment of a ⟲-node which is to be deleted (resp., inserted) is a τ-node (cf. Definitions 18 and 19). That is, to delete a ⟲-node we need to first delete its second child fragment (if ≠ τ) by a D_f operation. And to insert a ⟲-node with a non-τ second child fragment f we first need to insert the ⟲-node by an I⟲ operation and subsequently insert f as its second child. This is ensured in M by condition 1, which requires the deletion (resp., insertion) of the second child fragment of a deleted (resp., an inserted) ⟲-node in M.
Example 19. As an example, in the mapping between the two process trees P and P′ in Figure 5.9, since the ⟲-node P[1] is deleted, its second child fragment, i.e. activity b, is also deleted.
Figure 5.9: Sample mapping that satisfies condition 1.
As defined in Definition 20, fragment deletions precede and fragment insertions follow all other operations in a sequence of edit operations. Also, as we explained in Section 16, after the application of each edit operation to a process tree we reduce the tree to normal form by applying reduction rules. As a result of the latter some operator nodes may be deleted. Thus, to ensure that an operator node ⊕ in P (resp., in P′) that is mapped to an operator node in P′ (resp., P) cannot be deleted by a reduction rule after (resp., before) the application of all fragment deletions (resp., insertions) we require ⊕ to satisfy condition 2.
Example 20. As an example, the invalid mapping between the two process trees P and P′ in Figure 5.10 does not satisfy condition 2. This is because the ∧-operator node P[1] is mapped to a node in P′, while it does not have at least two child fragments that contain some undeleted activity nodes. As a result, after deleting activity ‘a’, the ∧-node P[1] will also be deleted by the singularity reduction rule S and hence cannot be mapped to a node in P′.
Figure 5.10: Sample invalid mapping that does not satisfy condition 2.
As defined in Definition 14, a τ-node may be deleted by one of the τ-reduction rules, T→ or T∧. As such, we defined condition 3 to ensure that a τ-node to which one of the τ-reduction rules can be applied is always deleted in a mapping. That is, we only allow a τ-node t ∈ P to be mapped to a τ-node t′ ∈ P′ in M if one of the two conditions, 3a or 3b, holds for them. Condition 3a requires the parents q and q′ of t and t′, respectively, to be mapped in M. For condition 3b we first define two nodes v and v′ as the deepest ancestors of t and t′, respectively, that are either mapped in M or one of them is an indirectly-trivial operator node and the other one is its indirect parent. To avoid the deletion of t or t′ by one of T→ or T∧, we then require each →- or ∧-node u on the shortest path from q (resp., q′) to v (resp., v′) to satisfy one of the two conditions, 3bi or 3bii.
Edit operation | Representation in a mapping (M, P, P′)
SUB⊕ | A non-⟲-operator node n ∈ P mapped to a non-⟲-operator node m ∈ P′ such that l(n) ≠ l(m), or a non-auxiliary nontrivial deleted or inserted non-⟲-operator node.
SUBac | An activity node n ∈ P mapped to an activity node m ∈ P′ such that l(n) ≠ l(m).
D_f | A maximal deleted fragment (≠ trivial τ-node).
D⟲ | A deleted ⟲(M1, M2)-node n, where M1 is not a deleted fragment (i.e. n ∉ a maximal deleted fragment).
I_f | A maximal inserted fragment (≠ trivial τ-node).
I⟲ | An inserted ⟲(M1, M2)-node n, where M1 is not an inserted fragment (i.e. n ∉ a maximal inserted fragment).
Table 5.3: Process tree edit operations (cf. Section 16) and their representations in a mapping.
Example 21. As an example, consider the mapping between the two process trees P and P′ in Figure 5.11, where the τ-node P[5] is mapped to the τ-node P′[1], and LMAs_M(P[5], P′[1]) = (P[0], P′[0]). In this mapping, the ∧-node P[2] satisfies condition 3bi, and the →-node P[1] satisfies condition 3bii.
Figure 5.11: Sample mapping that satisfies condition 3.
Table 5.3 illustrates how each edit operation is represented in a mapping.
Definition 29 (Process tree mapping cost). Let M be a mapping between two process trees P and P′. We define the cost of M as follows:
cost(M) = total cost of all node substitutions +
total cost of all maximal deleted and inserted fragments +
total cost of all deleted and inserted ⟲-nodes ∉ maximal deleted or inserted fragments +
total cost of all deleted and inserted non-⟲ operator nodes ∉ maximal deleted or inserted fragments.
We compute each of the first three costs in the same way as we did for the edit operations, while the cost of deleting or inserting a non-⟲-operator node is 1. Auxiliary or trivial nodes have no cost.
Thus, the cost of M is just the cost of the sequence of edit operations consisting of: a SUB⊕ for each operator node substitution or non-auxiliary nontrivial deleted or inserted non-⟲-operator node in M that is not in a maximal deleted or inserted fragment, a SUBac for each activity node substitution in M, a D_f (resp., I_f) for each maximal deleted (resp., inserted) fragment (excluding trivial τ-nodes) in M, and a D⟲ (resp., I⟲) for each deleted (resp., inserted) ⟲-operator in M that is not in a maximal deleted (resp., inserted) fragment. It can be shown that d(P, P′) can be determined by a minimum-cost mapping from P to P′. This proof is similar to the proof of Theorem 3.1 in [113] and is omitted. Since d(P, P′) = min{θ(S) | S is a sequence of edit operations which transforms P into P′}, we obtain:
d(P, P′) = min{cost(M) | M is a mapping from P to P′}
Hence, the search for a minimum-cost sequence of edit operations has been reduced to a search for a minimum-cost mapping.
5.4.2 Finding Process Tree Mappings & Lower Bounding Function
In the next two sub-sections we present two algorithms for finding a minimum-cost mapping between two process trees, based on two different search strategies: exhaustive (A*) and greedy. Here we first define a mapping search tree, which is a data structure to capture the search space of the mapping.
Definition 30 (Mapping search tree). A mapping search tree between two process trees P and P′, denoted by MST(P, P′), is a tree such that the label of the root is 0, the depth is |P|−1, and every internal node has a maximum of |P′| children, each labeled by one of −1, 1, 2, …, |P′|−1.
We say that a node v in MST(P, P′) is valid if the following set M_v of pairs of integers forms a mapping between P and P′:
$M_v = \{(dep(w), l(w)) \mid w \in Down_{MST(P,P')}(v)\} \;\cup\; \{(r, -1) \mid r \in \{dep(v)+1, \ldots, |P|-1\}\} \;\cup\; \{(-1, s) \mid s \in \{1, \ldots, |P'|-1\} - \{l(w) \mid w \in Down_{MST(P,P')}(v)\}\}$
In this chapter, we refer to a mapping search tree as one consisting of just valid nodes. Hence, M_v is a mapping in which each node w on Down_{MST(P,P′)}(v) denotes the pair (i, j) such that i = dep(w) and j = l(w). Every node m in P, with r = pre_P(m), for which r > dep(v) is deleted in M_v ((r, −1) ∈ M_v), and every node n in P′, with s = pre_{P′}(n), for which s ∉ {l(w) | w ∈ Down_{MST(P,P′)}(v)} is inserted in M_v ((−1, s) ∈ M_v).
Example 22. Consider process trees P and P′ in Figure 5.12a. Figure 5.12b illustrates
the mapping search tree MST (P, P′). For example, the path 〈0, 2,−1〉 in MST (P, P′)
represents the mapping {(0, 0), (1, 2), (2,−1), (−1, 1)} between P and P′. In this
path, the node labeled with “2” does not have a child with the label “1”, because the
set of pairs {(0, 0), (1, 2)}, in compliance with the condition 5b of mapping, cannot
form a mapping with the pair (2, 1).
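The construction of M_v from a path in the mapping search tree can be reproduced in a few lines of Python. The sketch assumes that the path from the root to v is given as the list of node labels along it, so that the position in the list is the depth; the function name `mapping_from_path` is hypothetical. Validity of the resulting set (Definition 21) is not checked here.

```python
def mapping_from_path(path_labels, size_p, size_p_prime):
    """Build M_v from the labels on the path root -> v of MST(P, P')."""
    # Pairs (dep(w), l(w)) for the nodes w on the path.
    pairs = {(depth, label) for depth, label in enumerate(path_labels)}
    dep_v = len(path_labels) - 1
    # Nodes of P below the current depth are assumed deleted.
    pairs |= {(r, -1) for r in range(dep_v + 1, size_p)}
    # Nodes of P' not used as a label on the path are assumed inserted.
    used = set(path_labels)
    pairs |= {(-1, s) for s in range(1, size_p_prime) if s not in used}
    return pairs


# Path <0, 2, -1> of Example 22, with |P| = |P'| = 3.
print(sorted(mapping_from_path([0, 2, -1], 3, 3)))
```

For the path 〈0, 2, −1〉 this yields the mapping {(0, 0), (1, 2), (2, −1), (−1, 1)} stated in Example 22.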
Figure 5.12: Process trees P and P′ (a) and their mapping search tree (b) in Example 22.
In a mapping M between P and P′, each activity node in P can only be mapped to −1 or to an activity node in P′, and vice versa (conditions 3 and 4c in Definition 21). Moreover, the cost of substituting, inserting or deleting an activity node in M is always 1, and the cost of mapping two activity nodes with the same label is 0. Therefore, the cost of M is at least equal to the minimum cost of mapping the two sets of activity nodes under P and P′. For example, assume C1 = {a, b, c, d} and C2 = {a, c, e, f, g} are the two sets of activity nodes under P and P′, respectively. The cost of M is at least equal to the minimum cost of mapping C1 and C2, i.e. 3, obtained from the activity mapping set S = {(a, a), (b, e), (c, c), (d, f), (−1, g)} between C1 and C2. Given two sets C1 and C2 of activity nodes, Algorithm 2 computes the minimum cost of mapping C1 and C2.
Algorithm 2 Compute the minimum cost of mapping two sets of activity nodes
1: procedure MINMAPPINGCOST(C1, C2)
2:   cost ← 0
3:   for each c ∈ C1 do
4:     for each d ∈ C2 do
5:       if l(c) = l(d) then
6:         C1 ← C1 − {c}
7:         C2 ← C2 − {d}
8:         break
9:       end if
10:     end for
11:   end for
12:   cost ← min(|C1|, |C2|) + ||C1| − |C2||
13:   return cost
14: end procedure
For every node in C1, Algorithm 2 iterates over all nodes in C2 to find a node with the same label. If such a node is found it removes the two nodes from their respective sets (lines 3-8). After processing every node in C1, there is no pair of nodes from C1 and C2 with the same label. The remaining nodes in C1 and C2 are then mapped injectively to each other, constituting min(|C1|, |C2|) mappings. Finally, the remaining ||C1| − |C2|| nodes in C1 or C2 are mapped to −1.
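Algorithm 2 can be written compactly in Python. The sketch below is one possible rendering, not the thesis implementation: it assumes that the two inputs are iterables of activity labels and replaces the nested loops of lines 3-11 with a multiset intersection from `collections.Counter`, which removes the same label-equal pairs.

```python
from collections import Counter


def min_mapping_cost(c1, c2):
    """Minimum cost of mapping two collections of activity labels."""
    left, right = Counter(c1), Counter(c2)
    # Label-equal pairs are mapped to each other at cost 0 (lines 3-11).
    common = left & right
    left -= common
    right -= common
    n1, n2 = sum(left.values()), sum(right.values())
    # min(n1, n2) substitutions plus |n1 - n2| deletions or insertions
    # (line 12); this sum equals max(n1, n2).
    return min(n1, n2) + abs(n1 - n2)


# C1 = {a, b, c, d} and C2 = {a, c, e, f, g} from the running example.
print(min_mapping_cost("abcd", "acefg"))
```

For the running example the labels a and c are matched at no cost, b and d are substituted, and g is inserted, which gives the cost of 3 stated in the text.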
5.4.2.1 Exhaustive search
In this section we introduce an A* algorithm that finds the minimal cost mapping be-
tween process trees P and P′, by finding the cheapest path from the root to a leaf
in the mapping search tree MST(P, P′). However, instead of constructing the whole
MST(P, P′) our A* algorithm traverses P in a pre-order manner and only constructs
nodes in MST(P, P′) that are potentially a part of the cheapest path to the leaves.
It is necessary to define two functions g∗(v) and h∗(v) for any instantiation of the
A* algorithm. For a node v ∈MST(P, P′), g∗(v) determines the mapping cost up to v,
whereas h∗(v) estimates the cost of mapping the nodes that have not yet been mapped
up to v. Let v be a node in MST(P, P′) such that dep(v) = preP(w) and l(v) = preP′(u)
or l(v) =−1. Let Y1 and Y2 be the sets of activity nodes in P and P′, respectively. Also,
let C1 = C(w) and C2 = C(u) be the sets of activity nodes under w and u (if l(v) ≠ −1),
respectively. Furthermore, let Pm and P′m be the sets of nodes in P and P′, respectively,
that are already mapped (either to a node in the other process tree or to −1). Then,
h∗(v) is defined as follows.
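\[
h^{*}(v) =
\begin{cases}
\mathit{minMappingCost}(C_1, C_2) + \mathit{minMappingCost}(Y_1 \setminus C_1 \setminus P_m,\; Y_2 \setminus C_2 \setminus P'_m) & \text{if } l(v) \neq -1 \\
\mathit{minMappingCost}(Y_1 \setminus P_m,\; Y_2 \setminus P'_m) & \text{if } l(v) = -1
\end{cases}
\]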
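To make the two cases of h∗(v) concrete, the following Python sketch evaluates the heuristic on a toy input. It rests on several assumptions that are not part of the thesis: activity nodes are given as dictionaries from a node identifier to a label, the sets C1, C2, Pm and P′m are sets of such identifiers, and the names h_star and min_mapping_cost are hypothetical.

```python
from collections import Counter


def min_mapping_cost(labels1, labels2):
    """Cost of Algorithm 2 on two multisets of labels."""
    left, right = Counter(labels1), Counter(labels2)
    common = left & right
    rem1 = sum((left - common).values())
    rem2 = sum((right - common).values())
    return min(rem1, rem2) + abs(rem1 - rem2)


def h_star(y1, y2, c1, c2, mapped1, mapped2, deleted):
    """Estimate the cost of mapping the nodes not yet mapped at v.

    y1, y2:           {node id: label} for all activity nodes of P and P'
    c1, c2:           ids of the activity nodes under w and u
    mapped1, mapped2: ids of nodes already mapped (Pm and P'm)
    deleted:          True if l(v) = -1, i.e. w is mapped to -1
    """
    def labels(tree, ids):
        return [tree[i] for i in ids]

    if deleted:
        # Case l(v) = -1: all unmapped activities are compared at once.
        return min_mapping_cost(labels(y1, set(y1) - mapped1),
                                labels(y2, set(y2) - mapped2))
    # Case l(v) != -1: activities under w and u are compared with each
    # other, and the remaining unmapped activities separately.
    inside = min_mapping_cost(labels(y1, c1), labels(y2, c2))
    outside = min_mapping_cost(labels(y1, set(y1) - c1 - mapped1),
                               labels(y2, set(y2) - c2 - mapped2))
    return inside + outside


# Toy data (hypothetical ids and labels).
y1 = {2: "a", 3: "b", 4: "c"}
y2 = {2: "a", 3: "b"}
print(h_star(y1, y2, set(), set(), set(), set(), deleted=True))
print(h_star(y1, y2, {2, 3}, {2}, set(), set(), deleted=False))
```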
To compute g∗(v) we need to compute and sum the cost of every node substitution,
node deletion, and node insertion induced by the mappings on the path from the root
to v in MST(P, P′). However, as specified in Definition 29 for computing the cost of
a mapping M between two process trees P and P′, the cost of deletion or insertion of
an auxiliary or trivial operator node in M is 0. Whether a deleted or inserted operator
node is considered auxiliary or trivial in a mapping depends on how its descendants
are mapped (cf. Section 5.4.1.2). For example, a deleted operator node o is auxiliary
in M if at most one of its child fragments is not deleted in M. Therefore, to determine
if o is an auxiliary operator node we need to know how its descendants are mapped.
However, as we construct a mapping search tree by traversing P in a pre-order man-
ner, the descendants of o are still unmapped when computing the mapping cost at v.
Thus, to enable computing the cost of a deleted or inserted operator node we assume a
mapping of −1 for each of its descendant nodes that is not already mapped. However,
this assumption is only to assist computing the cost of mapped operator nodes up to
each node on a mapping search tree, and does not imply the deletion or the insertion of
those unmapped descendant nodes. Furthermore, this assumption does not result in the
overestimation of the mapping cost as it actually leads to the temporary consideration
of a deleted or inserted operator node as an auxiliary or trivial operator node, with a
cost of 0.
The A* algorithm computes the value of f ∗(v) = g∗(v) + h∗(v) for each node v in
MST(P, P′), and at each step searches for the node with the lowest f ∗. The A* algo-
rithm for finding the lowest cost mapping between P and P′ is given as Algorithm 3.
Algorithm 3 A*
 1: procedure ASTAR(P, P′)
 2:   /* MST(P, P′) is a mapping search tree between P and P′ */
 3:   /* L is a list of triples */
 4:   add the node v labeled by 0 as the root to MST(P, P′)
 5:   while dep(v) ≠ |P| − 1 do
 6:     i ← dep(v) + 1
 7:     add the node u such that l(u) = −1 to MST(P, P′) as the child of v
 8:     L ← L ∪ {(u, g∗(u), h∗(u))}
 9:     for each w ∈ P′ do
10:       if (Mv − {(i, −1)}) ∪ {(i, preP′(w))} forms a mapping between P and P′ then
11:         add the node u such that l(u) = preP′(w) to MST(P, P′) as the child of v
12:         L ← L ∪ {(u, g∗(u), h∗(u))}
13:       end if
14:     end for
15:     select (v, g∗(v), h∗(v)) ∈ L such that f∗(v) is minimum
16:     L ← L − {(v, g∗(v), h∗(v))}
17:   end while
18:   return Mv
19: end procedure
The A* algorithm starts by constructing the root node v of MST(P, P′) from a
mapping between two fake root nodes added to P and P′ (line 4). These fake nodes have
the same label, randomly selected from {→, ×, ∧} \ ({l(root(P))} ∪ {l(root(P′))}). A
list L holds the child-free nodes in MST(P, P′). The algorithm proceeds by adding nodes
to MST(P, P′) and selecting the node in L with the lowest cost at each step (lines 5-17).
Each time a node is selected, its subsequent node in the pre-order traversal of P is first
deleted (mapped to −1) (lines 7-8), and then mapped to any node of P′ that does not
lead to the violation of the mapping conditions (lines 9-14). At each iteration of the while
loop the node v in L with the lowest f∗ is selected (line 15). The while loop halts if
v is a leaf of MST(P, P′). At the end, the algorithm outputs the mapping Mv, i.e. the
minimum-cost mapping between P and P′ (line 18).
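The search strategy of Algorithm 3 can be illustrated on a deliberately simplified problem. The sketch below is not the thesis algorithm: it assumes a flat setting with activity labels only (no operator nodes, no auxiliary or trivial nodes, no structural mapping conditions), where a substitution, a deletion and an insertion each cost 1. The function name astar_flat_mapping and the list-based node representation are hypothetical. It keeps the same ingredients: an open list L ordered by f∗ = g∗ + h∗, expansion of the selected node by mapping the next node to −1 or to any still unused node, and termination when the selected node is a leaf.

```python
import heapq
from collections import Counter
from itertools import count


def min_mapping_cost(labels1, labels2):
    """Cost of Algorithm 2 on two multisets of labels (used as h*)."""
    left, right = Counter(labels1), Counter(labels2)
    common = left & right
    rem1 = sum((left - common).values())
    rem2 = sum((right - common).values())
    return min(rem1, rem2) + abs(rem1 - rem2)


def astar_flat_mapping(a, b):
    """Map every a[i] to an index of b or to -1 with minimum total cost."""
    tie = count()  # tie-breaker so the heap never compares mappings
    open_list = [(min_mapping_cost(a, b), next(tie), 0, ())]
    while open_list:
        f, _, g, mapping = heapq.heappop(open_list)  # lowest f* first
        i = len(mapping)
        if i == len(a):
            # Leaf of the search tree: f already includes the insertion
            # cost of the nodes of b that were never used.
            return f, list(mapping)
        used = {j for j in mapping if j != -1}
        free = [k for k in range(len(b)) if k not in used]
        for j in [-1] + free:
            step = 1 if j == -1 or a[i] != b[j] else 0
            rest_b = [b[k] for k in free if k != j]
            h = min_mapping_cost(a[i + 1:], rest_b)
            heapq.heappush(
                open_list, (g + step + h, next(tie), g + step, mapping + (j,))
            )
    return None


cost, mapping = astar_flat_mapping(list("abcd"), list("aecfg"))
print(cost, mapping)
```

In this flat setting the heuristic equals the exact remaining cost, so the search proceeds almost directly to an optimal leaf. With operator nodes, as in the thesis, h∗ is only an estimate and several branches may need to be expanded.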
Example 23. Consider process trees P and P′ in Figure 5.13. Two fake ×-nodes are
added as roots to P and P′, to be mapped at the first step of the A* algorithm. We
illustrate the run of the A* algorithm to construct the MST(P, P′) in Figure 5.14. Here,
the index of a node in MST(P, P′) represents the value g∗(v) + h∗(v) for that node. At
each step, the node with the minimum cost in L, highlighted with gray, is selected and
deleted from L. The children of this node are then added in the following step both to
the MST(P, P′) and to L. Also, the mapping corresponding to the path from the root
to this node is illustrated by dotted lines on the two trees on the left. Each number on
the left side of the MST(P, P′) indicates the pre-order index of the node in P that is
mapped to nodes in P′ via the nodes in that depth of MST(P, P′).
Figure 5.13: Process trees P and P′ in Example 23.
At step (a) of the running example, the two added fake roots are mapped to each
other, constructing the root of MST(P, P′). At step (b), the two children of the root
of MST(P, P′) are added by mapping the →-node in P, with the pre-order index of 1,
to −1 and to the →-node in P′, with the pre-order index of 1. As a result of deleting
the →-node in P (mapping to −1), the →-node in P′ is also deleted, since there is no
other operator node in P that can potentially be mapped to it. Here, the value of g∗
for mapping the →-node in P to −1 is 0 because, as explained before, the algorithm
initially considers a deleted/inserted operator node as an auxiliary or trivial operator
node if it does not know how its descendants are mapped. As these two added children
nodes have the same cost, one of them is randomly selected, here the mapping to −1.
At step (c), the node ‘a’ in P is mapped to −1, to ‘b’ and to ‘a’ in P′, respectively.
Among the existing nodes in L, the mapping of the →-node in P to the →-node in P′
is selected as it has the lowest cost of 1. At step (d), the children of this node are
added, i.e. again mapping the node ‘a’ to −1, to ‘b’ and to ‘a’ in P′, respectively, and
the node with the lowest cost is selected. At step (e), the algorithm maps the node
‘b’ in P to −1, to ‘b’ and to ‘a’ in P′, respectively, and selects the mapping to ‘b’ as the
minimum-cost node. At step (f), the children of this node are added, i.e. mapping
the node ‘c’ to −1, and to ‘a’ in P′, respectively. Note that mapping the node ‘c’ to
the node ‘a’ makes the deleted (resp., inserted) →-node in P (resp., P′) on the path
〈0, 1, −1, 3〉 non-auxiliary and non-trivial as it does not satisfy the conditions of an
auxiliary or a trivial operator node anymore (cf. Definition 28). The minimum-cost
node at this step is the mapping of ‘a’ to −1 on the path 〈0, 1, −1〉. At step (g),
the node ‘b’ in P is again mapped to −1, and to ‘b’ and to ‘a’ in P′, respectively, with
the mapping to ‘b’ being the one with the minimum cost among all nodes. Finally, at
step (h), the node ‘c’ in P is mapped to −1 and to the node ‘a’ in P′, with the latter
mapping forming the minimum-cost node. At this step the A* algorithm terminates as
the node v with the lowest mapping cost is a leaf in MST(P, P′). The minimum-cost
path from the root to v is 〈0, 1, −1, 2, 3〉, and the minimum-cost mapping between P
and P′ is Mv = {(0, 0), (1, 1), (2, −1), (3, 2), (4, 3)}, with the cost of 2.
Time Complexity
It is known that the problem of computing the tree edit distance between two unordered
trees is NP-hard [138]. A process tree may contain unordered operator nodes, such as
×-nodes and ∧-nodes, as well as ordered operator nodes, such as →-nodes and
⟲-nodes. Therefore, the problem of computing the minimum-cost mapping between two
process trees is also NP-hard.
Figure 5.14: The running example of the A* algorithm for the process trees P and P′
in Figure 5.13 (panels (a) to (h)).
5.4.2.2 Greedy Search
As mentioned in the previous section, the problem of computing the minimum-cost
mapping between two process trees is NP-hard. Consequently, depending on how
different the two process trees are, the A* algorithm may not be able to compute the
optimal solution within a reasonable time. Therefore, in this section we introduce a fast
greedy algorithm, illustrated in Algorithm 4, that approximates the optimal solution.
Algorithm 4 Greedy
 1: procedure GREEDY(P, P′, threshold)
 2:   /* MST(P, P′) is a mapping search tree between P and P′ */
 3:   /* L is a list of triples */
 4:   add the node v labeled by 0 as the root to MST(P, P′)
 5:   while dep(v) ≠ |P| − 1 do
 6:     i ← dep(v) + 1
 7:     add the node z such that l(z) = −1 to MST(P, P′) as the child of v
 8:     if P[i] is not an operator node then
 9:       L ← L ∪ {(z, g∗(z), h∗(z))}
10:     end if
11:     for each w ∈ P′ do
12:       if (Mv − {(i, −1)}) ∪ {(i, preP′(w))} forms a mapping between P and P′ then
13:         add the node u such that l(u) = preP′(w) to MST(P, P′) as the child of v
14:         L ← L ∪ {(u, g∗(u), h∗(u))}
15:       end if
16:     end for
17:     select (v, g∗(v), h∗(v)) ∈ L such that f∗(v) is minimum
18:     y ← P[i]
19:     if y is an operator node then
20:       if v ≠ null then
21:         y′ ← P′[l(v)]
22:         ubc ← max(|C(y)|, |C(y′)|) /* Upper bound for the cost of mapping two activity sets C(y) and C(y′) */
23:         mmc ← minMappingCost(C(y), C(y′))
24:         matchingScore ← (ubc − mmc)/ubc
25:         if matchingScore < threshold then
26:           v ← z
27:         end if
28:       else
29:         v ← z
30:       end if
31:     end if
32:     L ← ∅
33:   end while
34:   return Mv
35: end procedure
The Greedy algorithm is similar to the A* algorithm. The latter finds a mapping
between P and P′ that has the lowest global cost. As such, it may process each node in
P multiple times. In contrast, the greedy algorithm selects and fixes a locally optimal
mapping for each node in P at each step of constructing MST(P, P′). This is performed
by clearing the list L containing the child-free nodes in MST(P, P′) at the end of each
step (line 32). In addition, the greedy algorithm has a different strategy for mapping
operator nodes. For every operator node y in P, this algorithm first finds an operator
node y′ in P′ with the lowest mapping cost (lines 11-17). Next, it computes a matching
score between y and y′. The matching score measures the similarity of the activity nodes
under y and y′, and lies in the range [0, 1], where 0 indicates that there is no pair (a, b) of
activities in C(y) × C(y′) such that l(a) = l(b), whereas 1 indicates
{l(a) | a ∈ C(y)} = {l(b) | b ∈ C(y′)}. The nodes y and y′ are mapped if the matching
score between them is above the threshold. However, in case there is no node in P′ that
can be mapped to y or if the matching score is below the threshold, y is mapped to −1,
by selecting z as the optimal node in MST(P, P′) (lines 19-31).
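The matching score of lines 22-24 of Algorithm 4 can be sketched as follows. This is an illustrative fragment only: activity sets are assumed to be lists of labels, the names matching_score and keep_operator_mapping are hypothetical, and the value returned for two empty activity sets is an assumption, since the thesis does not discuss that case.

```python
from collections import Counter


def min_mapping_cost(labels1, labels2):
    """Cost of Algorithm 2 on two multisets of labels."""
    left, right = Counter(labels1), Counter(labels2)
    common = left & right
    rem1 = sum((left - common).values())
    rem2 = sum((right - common).values())
    return min(rem1, rem2) + abs(rem1 - rem2)


def matching_score(labels_y, labels_y_prime):
    """(ubc - mmc) / ubc for the activity sets under y and y'."""
    ubc = max(len(labels_y), len(labels_y_prime))  # upper bound on the cost
    if ubc == 0:
        return 1.0  # assumption: two empty activity sets match perfectly
    mmc = min_mapping_cost(labels_y, labels_y_prime)
    return (ubc - mmc) / ubc


def keep_operator_mapping(labels_y, labels_y_prime, threshold):
    """False means y is mapped to -1 (line 26 of Algorithm 4)."""
    return matching_score(labels_y, labels_y_prime) >= threshold


print(matching_score(["a", "b"], ["b", "a"]))
print(matching_score(["a", "b"], ["c", "d"]))
print(keep_operator_mapping(["a", "b", "c"], ["a", "b"], threshold=0.5))
```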
Time Complexity
Given two process trees P and P′, the time complexity of the greedy algorithm is
dominated by the complexity of the while loop (line 5). The complexity of this loop
is the maximum of the worst-case complexities of three sequential steps. This loop
iterates |P| times. At each iteration, we first map a node y ∈ P to every node in P′
that satisfies the mapping conditions (lines 11-16), thus O(|P| · |P′|). We then select
the mapping between y and a node in P′ with the lowest cost (line 17). The latter step
requires, in the worst case, iterating over |P′| mappings, thus O(|P| · |P′|). If y is an
operator node for which we are able to find a mapping node y′ ∈ P′, then we compute
the minimum cost of mapping the two sets of activity nodes under y and y′ (line 23), thus
O(|P|^2 · |P′|). Therefore, the worst-case complexity of the while loop, and so that of the
greedy algorithm, is O(|P|^2 · |P′|).
From Process Tree Mapping to Sequence of Edit Operations
Finally, from a given mapping between two process trees P and P′, we extract a concise
sequence of edit operations that transforms P into P′. In doing so, we need to satisfy the
fragment deletion/insertion order condition of a sequence of edit operations (cf.
Definition 20), which requires fragment deletions (resp., insertions) to precede (resp., to
follow) other operations. Accordingly, we extract fragment deletions first and fragment
insertions last from the mapping. Furthermore, edit operations of the same type in each
step are ordered based on the lexicographical order of the fragments to which they are
applied. We extract a concise sequence of edit operations from a mapping by performing
the following steps in this order.
1) Fragment deletion
for i = 1 to dep(P) do
  Add a D_f operation for every maximal deleted fragment (≠ trivial τ-node)
  rooted at depth i of P.
end for
2) Activity substitution
Add a SUB_ac operation for every activity node substitution.
3) Operator substitution, ⟲-operator deletion, and ⟲-operator insertion
for i = 1 to dep(P) do
  Add a SUB_⊕ operation for every non-auxiliary nontrivial deleted non-⟲
  operator node at depth i of P.
  Add a D_⟲ operation for every deleted ⟲-node at depth i of P.
end for
for i = dep(P′) to 1 do
  Add a SUB_⊕ operation for every non-auxiliary nontrivial inserted non-⟲
  operator node at depth i of P′.
  Add a SUB_⊕ operation for every operator node substitution at depth i of P′.
  Add an I_⟲ operation for every inserted ⟲-node at depth i of P′.
end for
4) Fragment insertion
for i = dep(P′) to 1 do
  Add an I_f operation for every maximal inserted fragment (≠ trivial τ-node)
  rooted at depth i of P′.
end for
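The coarse ordering imposed by the steps above can be sketched as a sort. This is a simplified illustration under stated assumptions: each edit operation is a hypothetical triple (kind, depth, fragment string), the kind names are plain-text stand-ins for the operation symbols, and the two depth loops of step 3 are not modeled, so operations of that step simply keep their input order.

```python
# Step number of each operation kind (hypothetical plain-text names).
STEP = {"D_f": 1, "SUB_ac": 2, "SUB_op": 3, "D_loop": 3, "I_loop": 3, "I_f": 4}


def order_edit_operations(ops):
    """Order (kind, depth, fragment) triples by extraction step.

    Fragment deletions come first, by increasing depth in P; fragment
    insertions come last, by decreasing depth in P'. Ties are broken by
    the lexicographical order of the fragment strings.
    """
    def key(op):
        kind, depth, fragment = op
        step = STEP[kind]
        if step == 1:
            return (step, depth, fragment)    # i = 1 to dep(P)
        if step == 4:
            return (step, -depth, fragment)   # i = dep(P') to 1
        if step == 2:
            return (step, 0, fragment)
        return (step, 0, "")                  # step 3: keep input order

    return sorted(ops, key=key)               # sorted() is stable


ops = [
    ("I_f", 1, "x"),
    ("SUB_ac", 2, "a"),
    ("D_f", 2, "c"),
    ("D_f", 1, "b"),
    ("I_f", 2, "y"),
]
for op in order_edit_operations(ops):
    print(op)
```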
5.5 Construct Drift Characterization Statements
The output of the previous section is a sequence of edit operations that transforms a
pre-drift process tree P into a post-drift process tree P′. In this section, we construct
a sequence of characterization statements based on a given sequence of edit opera-
tions. As explained in Section 5.4.1, each edit operation describes a simple change in
Table 5.1. By aggregating the simple changes obtained from a sequence of edit op-
erations in a post-processing step we create compound changes (cf. Table 5.1). This
further reduces the number of changes reported to the user and creates higher-level
changes that are easier to interpret. Each remaining simple change and each created
compound change is then reported to the user as a natural language statement.
5.5.1 Simple Change Patterns
Here we describe how each simple change pattern is captured by an edit operation.
• Insert/delete a fragment (sre, pre, cre) The application of a D_f edit operation
on a non-τ-fragment represents a fragment deletion, whereas the application of
an I_f edit operation on a non-τ-fragment represents a fragment insertion. The
fragment insertion/deletion is serial (sre), parallel (pre), or conditional (cre) if
the parent node of the inserted/deleted fragment is a →-node, a ∧-node, or an
×-node, respectively. Also, the insertion/deletion of a fragment in/from the
loop-body (resp., loopback) of a ⟲-node is considered as a serial (resp., conditional)
fragment insertion/deletion. For example, in the transformation of P into P′ in
Figure 5.15a, Fragment 1 is deleted from between activity ‘a’ and activity ‘c’
(serial deletion), whereas Fragment 2 is inserted in a conditional branch with
activity ‘d’.
• Make fragments mutually exclusive/parallel/sequential (cf, pl) The application
of a SUB_⊕ operation on an operator node v changes the relation between the
child fragments of v. For example, in the transformation of P into P′ in
Figure 5.15b, Fragment 1 precedes activity ‘c’ in P, but, after the substitution of
the operator of the →(∧(a, b), c)-node with ×, they are mutually exclusive in P′.
In addition to the change patterns cf and pl, the relation between two fragments
can also be changed from mutually exclusive to parallel, and vice versa. However,
this change pattern is not defined as one of the common change patterns in [129].
For example, in Figure 5.15b, activities ‘d’ and ‘e’ were mutually exclusive in
P, but after the substitution of the operator of the ×-node P[5] with ∧, they
are parallel in P′.
• Make a fragment loopable/non-loopable (lp) The insertion (resp., deletion) of
a ⟲-node by an I_⟲ (resp., D_⟲) edit operation as the parent of a fragment makes
that fragment loopable (resp., non-loopable). For example, in the transformation
of P into P′ in Figure 5.15c, Fragment 1 in P has become loopable in P′ with the
insertion of the ⟲-node P′[2].
• Make a fragment skippable/non-skippable (cb) The insertion (resp., deletion)
of a τ-node by an I_f (resp., D_f) edit operation under an ×-node makes the other child
fragments of the ×-node skippable (resp., non-skippable). For example, in the
transformation of P into P′ in Figure 5.15d, with the insertion of the τ-node
P′[6], Fragment 1 has become skippable in P′.
Figure 5.15: Examples of transforming a process tree P into a process tree P′ by the
application of simple changes: (a) insert/delete a fragment; (b) make fragments mutually
exclusive/parallel/sequential; (c) make a fragment loopable/non-loopable; (d) make a
fragment skippable/non-skippable.
5.5.2 Compound Change Patterns
By aggregating simple change patterns we can construct compound change patterns.
We describe these compound patterns below.
• Duplicate a fragment An inserted fragment f2 in a process tree is a duplicate
fragment if there is another fragment f1 in the tree such that stringify(f2) =
stringify(f1), where stringify is a recursive function that converts a fragment to
a unique and stable textual representation and is defined as follows.
– For an activity node or a τ-node v, stringify(v) = l(v).
– For a fragment F = ⊕(F1, ..., Fn) such that l(⊕) ∈ {→, ⟲},
stringify(F) = l(⊕)(stringify(F1) + ... + stringify(Fn)).
– For a fragment F = ⊕(F1, ..., Fn) such that l(⊕) ∈ {×, ∧},
stringify(F) = l(⊕)(stringify(Fi) + ... + stringify(Fm)), where 1 ≤ i, ..., m ≤ n
and (stringify(Fi), ..., stringify(Fm)) is the ordered sequence obtained by
arranging stringify(F1), ..., stringify(Fn) in ascending alphabetical order.
For example, in the transformation of P into P′ in Figure 5.16a, Fragment 2 in P′
is a duplicate of Fragment 1, since Fragment 2 is inserted, and stringify(Fragment 2) =
stringify(Fragment 1).
• Substitute a fragment (rp) The application of a SUB_ac edit operation represents
an activity substitution, and the application of a SUB_⊕ edit operation represents
an operator substitution. To discover a fragment substitution we need to abstract
from the operator and the activity substitutions within the fragment. A fragment
f in P is substituted by a fragment f′ in P′ if at least one node within f is
substituted by a node within f′, and every other node within f (resp., f′) is either
substituted by (resp., either substitutes) a node within f′ (resp., f) or is deleted
(resp., inserted). For example, in the transformation of P into P′ in Figure 5.16b,
Fragment 1 in P is substituted with Fragment 2 in P′.
• Swap two fragments (sw) In the transformation of P into P′, two fragments
f1 and f2 in P are swapped if they are substituted by two fragments f′1 and f′2
in P′, respectively, such that stringify(f1) = stringify(f′2) and stringify(f2) =
stringify(f′1). For example, in the transformation of P into P′ in Figure 5.16c,
Fragment 1 and Fragment 2 in P are swapped as they are substituted by Fragment 2
and Fragment 1 in P′, respectively, and stringify(Fragment 1 in P) =
stringify(Fragment 2 in P′) = “ab” and stringify(Fragment 2 in P) =
stringify(Fragment 1 in P′) = “bc”.
• Move a fragment (sm, pm, cm) The combination of deleting a fragment f from
P and inserting a fragment f′ in P′ such that stringify(f) = stringify(f′) represents
a move of the fragment f within the process tree. The fragment move is serial (sm),
parallel (pm), or conditional (cm) if the parent node of the inserted fragment f′
is a →-node, a ∧-node, or an ×-node, respectively. For example, in the
transformation of P into P′ in Figure 5.16d, Fragment 1 has moved to a conditional
branch with activity ‘e’.
• Change branching frequency (fr) In the transformation of P into P′, let v ∈ P
be an ×-node with no deleted or inserted children. Also let c be a child fragment
of v that is not substituted by another fragment. We define the relative
frequency of c as the ratio between the frequency of c and the frequency of v,
and express it as a percentage by multiplying it by 100. A significant change
in the relative frequency of c over the transformation of P into P′ represents a
change of branching frequency. Let freqB and freqA be the relative frequencies
of c in P and P′, respectively. We compute the relative frequency change of c
by |freqB − freqA| · 100 / avg(freqB, freqA), where the function avg(freqB, freqA)
computes the average of freqA and freqB. The significance of the relative frequency
change can be defined by the user. To focus on more significant branching
frequency changes, in the evaluation sections of this chapter we consider a relative
frequency change of at least 50% as a significant change, where the relative
frequency of the fragment is at least 25% in P or P′. To compute the frequency of
nodes in a process tree we replay its underlying event log on top of the process
tree. For example, in the transformation of P into P′ in Figure 5.16e, the relative
frequency of Fragment 1 has changed from 40% in P to 70% in P′, while the
relative frequency of activity ‘c’ has changed from 30% in P to 10% in P′.
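Two of the building blocks used by the compound patterns above, the stringify function and the relative frequency change, can be sketched in Python as follows. This is a minimal illustration under assumptions that are not part of the thesis: a fragment is either a label string or a hypothetical pair (operator, list of child fragments), the textual form operator(children) is one possible rendering of l(⊕)(...), and the default thresholds are the 50% and 25% values used in the evaluation.

```python
UNORDERED = {"×", "∧"}  # children may appear in any order


def stringify(fragment):
    """Stable textual form of a fragment.

    Note: plain concatenation, as in the definition, can be ambiguous
    for multi-character labels; single-character labels are assumed.
    """
    if isinstance(fragment, str):      # activity node or τ-node
        return fragment
    op, children = fragment
    parts = [stringify(child) for child in children]
    if op in UNORDERED:
        parts.sort()                   # ascending alphabetical order
    return f"{op}({''.join(parts)})"


def relative_frequency_change(freq_b, freq_a):
    """|freqB - freqA| * 100 / avg(freqB, freqA), frequencies in percent."""
    mean = (freq_b + freq_a) / 2
    if mean == 0:
        return 0.0                     # assumption: no change when both are 0
    return abs(freq_b - freq_a) * 100 / mean


def is_significant(freq_b, freq_a, min_change=50.0, min_freq=25.0):
    """Change of at least min_change, with frequency >= min_freq in P or P'."""
    return (max(freq_b, freq_a) >= min_freq
            and relative_frequency_change(freq_b, freq_a) >= min_change)


print(stringify(("∧", ["b", "a"])) == stringify(("∧", ["a", "b"])))
print(stringify(("→", ["b", "a"])) == stringify(("→", ["a", "b"])))
print(round(relative_frequency_change(40, 70), 1), is_significant(40, 70))
print(relative_frequency_change(30, 10), is_significant(30, 10))
```

With the frequencies quoted for Figure 5.16e, both the change of Fragment 1 (40% to 70%) and the change of activity ‘c’ (30% to 10%) meet the two thresholds.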
5.5.3 Nested Changes
As defined in the previous chapter (cf. Section 4.2), multiple changes that share some
behavioral relations between activities, e.g. causality or concurrency, are called
overlapping changes. For example, in the transformation of the process tree P to the process
tree P′ in Figure 5.17, there are two overlapping changes: (1) activity ‘b’ is deleted, and
(2) the relation between activity ‘c’ and activity ‘d’ changes from sequential in the left
process tree to mutually exclusive in the right process tree. When applied in isolation,
these two changes share the causal relation b → c. The application of the first change
deletes this behavioral relation, while the application of the second change decreases
the frequency of its execution. By abstracting from the low-level behavioral relations
between activities, IM discovers relations between fragments within a process tree.
Consequently, overlapping changes are isolated from each other, and can be
characterized in the same way as the non-overlapping ones. For example in Figure 5.17, in P,
[Figure panels: (a) Duplicate a fragment. (b) Substitute a fragment. (c) Swap two fragments. (d) Move a fragment.]
XInsert 'd'
Fragment 1
Insert Fragment 2
Delete Fragment 1
X
X ?
?
(e) Change branching frequency.
Figure 5.16: Examples of transforming a process tree P into a process tree P′ by theapplication of compound changes.
activity ‘b’ precedes the fragment →(c,d), whereas in P′, activity ‘b’ is deleted and
the operator of the root of the fragment →(c,d) is substituted with ×, resulting in the
fragment ×(c,d).
Figure 5.17: Example of transforming a process tree P into a process tree P′ by the
application of overlapping changes.
Nested changes are a set of overlapping changes, each applied to the process subtree
resulting from the application of the previous change. The hierarchical structure
of a process tree allows us to characterize the changes applied to the inner structure of
a fragment and those applied to the fragment as a whole independently of each other.
In Section 5.4.2.2, we explained in what order we traverse process trees to extract a
sequence of edit operations from a mapping. We apply the changes in the same order in
which we extracted them to transform a process tree P into a process tree P′. For example,
to transform the process tree P into the process tree P′ in Figure 5.18, we first make
activity ‘b’ and activity ‘c’ sequential, resulting in the fragment →(b,c). Next, a loop
structure is placed over this fragment by inserting a ⟲-node, resulting in the fragment
⟲(→(b,c),τ). Finally, activity ‘d’ is inserted in a parallel branch with this loop
fragment, resulting in the fragment ∧(⟲(→(b,c),τ),d).
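The three nested edits in this example can be replayed on a small data structure. The
following is only an illustrative sketch, not the ProDrift implementation: process trees
are encoded as nested tuples, the helper names are hypothetical, and the pre-drift
fragment ×(b,c) is an assumption made for the sake of the example.

```python
# Process trees as nested tuples: (operator, child, ...); leaves are activity labels.
# Operator symbols follow the prefix notation used in the text.
SEQ, XOR, AND, LOOP, TAU = "→", "×", "∧", "⟲", "τ"


def make_sequential(fragment):
    """Substitute the root operator of a fragment with the sequence operator."""
    _, *children = fragment
    return (SEQ, *children)


def make_loopable(fragment):
    """Place a loop node over a fragment, with a silent (τ) redo branch."""
    return (LOOP, fragment, TAU)


def insert_in_parallel(fragment, new_fragment):
    """Insert a new fragment in a parallel branch with an existing fragment."""
    return (AND, fragment, new_fragment)


def show(tree):
    """Render a tree in prefix notation, e.g. →(b,c)."""
    if isinstance(tree, str):
        return tree
    op, *children = tree
    return f"{op}({','.join(show(child) for child in children)})"


start = (XOR, "b", "c")                 # assumed pre-drift fragment (hypothetical)
step1 = make_sequential(start)          # make 'b' and 'c' sequential
step2 = make_loopable(step1)            # place a loop over the fragment
step3 = insert_in_parallel(step2, "d")  # insert 'd' in a parallel branch

for step in (start, step1, step2, step3):
    print(show(step))
```

Each edit takes the subtree produced by the previous one as its input, which is what
distinguishes nested changes from independent, non-overlapping ones.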
Figure 5.18: Example of transforming a process tree P into a process tree P′ by the
application of nested changes.
Table 5.4 shows the format of drift characterization statements produced by our
method for each change pattern.
Code | Change pattern | Drift characterization statement format
sre | Insert/delete a fragment between two fragments | After the drift, fragment f1 = ... is inserted (resp., deleted from) between fragments f2 = ... and f3 = ....
pre | Insert/delete a fragment in/from parallel branch | After the drift, fragment f1 = ... is inserted in (resp., deleted from) a parallel branch with fragment f2 = ....
cre | Insert/delete a fragment in/from conditional branch | After the drift, fragment f1 = ... is inserted in (resp., deleted from) a conditional branch with fragment f2 = ....
cp | Duplicate a fragment | After the drift, fragment f1 = ..., i.e. a duplicate of fragment f2 = ..., is inserted ... (continues with sre, pre, or cre).
rp | Substitute a fragment | After the drift, fragment f1 = ... is substituted by fragment f2 = ....
sw | Swap two fragments | After the drift, fragments f1 = ... and f2 = ... are swapped.
sm | Move a fragment to between two fragments | After the drift, fragment f1 = ... has moved to between fragments f2 = ... and f3 = ....
cm | Move a fragment into/out of conditional branch | After the drift, fragment f1 = ... has moved to a conditional branch with fragment f2 = ....
pm | Move a fragment into/out of parallel branch | After the drift, fragment f1 = ... has moved to a parallel branch with fragment f2 = ....
cf | Make fragments mutually exclusive/sequential | Before the drift, fragments f1 = ..., ... and fn = ... were mutually exclusive (resp., sequential), while after the drift they are sequential (resp., mutually exclusive).
pl | Make fragments parallel/sequential | Before the drift, fragments f1 = ..., ... and fn = ... were parallel (resp., sequential), while after the drift they are sequential (resp., parallel).
lp | Make a fragment loopable/non-loopable | After the drift, fragment f1 = ... has become loopable/non-loopable.
cb | Make a fragment skippable/non-skippable | After the drift, fragment f1 = ... has become skippable/non-skippable.
fr | Change branching frequency | Before the drift, after the ×-node ⊕ the branch of fragment f1 = ... was executed x% of the time, while after the drift it is executed y% of the time.
Table 5.4: Change patterns from [129] and their drift characterization statement formats.
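To illustrate how statement formats of this kind can be instantiated, the sketch below
fills a few of the formats from Table 5.4 with concrete fragments. It is a hypothetical
illustration only: the dictionary, the slot names and the `render` helper are
assumptions and do not reflect how ProDrift generates its statements.

```python
# A few statement formats from Table 5.4, written as Python format strings.
# Slot names (f1, f2, f3, action, state, node, x, y) are illustrative assumptions.
TEMPLATES = {
    "sre": "After the drift, fragment {f1} is {action} between fragments {f2} and {f3}.",
    "rp": "After the drift, fragment {f1} is substituted by fragment {f2}.",
    "sw": "After the drift, fragments {f1} and {f2} are swapped.",
    "lp": "After the drift, fragment {f1} has become {state}.",
    "fr": ("Before the drift, after the ×-node {node} the branch of fragment {f1} "
           "was executed {x}% of the time, while after the drift it is executed "
           "{y}% of the time."),
}


def render(code, **slots):
    """Fill the statement format of a change pattern code with concrete values."""
    return TEMPLATES[code].format(**slots)


print(render("sre", f1="→(c,d)", action="inserted", f2="a", f3="e"))
print(render("lp", f1="→(b,c)", state="loopable"))
print(render("fr", node="X1", f1="→(b,c)", x=30, y=70))
```

Keeping one format per change pattern code means that a sequence of edit operations can
be verbalized statement by statement, in the order in which the operations were extracted.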
5.5.4 Unsupported patterns
The only change pattern from Table 5.1 that our method is unable to support is the
Synchronize two fragments change pattern. This pattern refers to changes where two
parallel fragments are synchronized, or vice versa. As discussed in Section 5.4.1, this
pattern introduces unstructuredness into a process model and hence cannot be used as
a basis for defining process tree edit operations. Figure 5.19a shows an example of
this change pattern. In this example, before the change, activity ‘b’ was performed
in parallel with activity ‘c’, while after the change, activity ‘b’ precedes activity ‘c’.
Observe that the synchronization change pattern is different from the change pattern
where we sequentialize two parallel fragments (“pl” in Table 5.1) by transforming the
parallel block containing the fragments to a sequential block without impairing the
structuredness of the process. However, in the synchronization change pattern two
parallel fragments are synchronized by directly connecting one fragment to the other.
This results in the loss of structuredness between the two branches in the parallel block
that contains the two synchronized fragments. Consequently, to discover a structured
process model, IM needs to generalize from the behavior of this unstructured block. As
such, the resulting process tree does not precisely represent the synchronization change
applied to the process, i.e. it also represents false process changes. Figure 5.19b shows
the process trees, discovered by IM, that correspond to the process models in Figure 5.19a.
Activity ‘c’, which was in parallel with activity ‘b’ and mutually exclusive with
activity ‘d’ in process tree P, is performed after the parallel block in process tree P′.
Furthermore, activities ‘c’ and ‘d’ can be skipped in P′. Although the occurrence of ‘c’
after ‘b’ is accurately represented in P′, there are also several false changes in P′, such
as the occurrence of ‘c’ after ‘d’ or the occurrence of ‘e’ after ‘b’ by skipping ‘c’.
5.6 Tool Support
As for the methods presented in Chapters 3 and 4, we implemented the method
for characterizing drifts at the level of process model fragments as an extension of
ProDrift, available both as a standalone tool and as a plug-in of Apromore. To enable
the characterization of a detected drift, the user needs to tick the “Drift characterization”
checkbox in the configuration panel of the plug-in, as shown in Figure 5.20a. The user
is then required to choose between two drift characterization configuration options:
“activity level” and “fragment level”. To characterize drifts using the method presented
in this chapter, the “fragment level” option needs to be selected.
Figure 5.19: Example of the synchronizing two fragments change pattern. Activity ‘b’ and
activity ‘c’ are synchronized. Panels: (a) Petri net models before and after the
synchronization; (b) Process trees P and P′ discovered by IM before and after the
synchronization.
By default, the tool uses the A* algorithm (cf. Section 5.4.2.1) to search for a
minimum-cost sequence of edit operations that transforms the pre-drift process tree into
the post-drift process tree, resulting in a complete and concise set of drift
characterization statements. As a more efficient alternative, the greedy algorithm (cf.
Section 5.4.2.2) may be selected to speed up the search at the price of a less concise
characterization. Furthermore, to discover process trees from noisy logs we use IMfpt,
i.e. a variant of IM with noise filtering capabilities that works with partial traces. By
default, we set the noise filtering threshold of IMfpt to 10%, as this value proved
effective at handling noise in our experiments with artificial and real-life logs, as
reported later in this chapter. Alternatively, the user can set a different threshold by
changing the value of the “Drift characterization noise filter” field.
After a drift is detected, it is characterized by the drift characterization method.
Once the parsing of the log is complete, the user can click on each drift in the list of
detected drifts to inspect its natural language characterization statements, as shown
in Figure 5.20b.
The new variants of Inductive Miner, including IMpt, IMfpt and IMcpt, have been
implemented as plug-ins of the ProM framework² [124] and their source code is
publicly available.
² http://promtools.org
Figure 5.20: Drift characterization at fragment level using the ProDrift plug-in in
Apromore. Panels: (a) Enable drift characterization and choose the “fragment level”
configuration to use the drift characterization method presented in this chapter;
(b) Inspect natural language characterization statements for each detected drift.
5.7 Evaluation on Artificial Logs
To evaluate the effectiveness of our method, we used ProDrift to conduct experiments
on artificial and real-life event logs with different parameter settings. The tool is fed
with an event stream replayed from an event log, and outputs, for each detected drift, its
characterization statements in natural language. In the rest of this section, we present
the results of our evaluation on artificial logs. Specifically, we measured the accuracy
of drift characterization, the conciseness of the statements produced to characterize
such drifts and the time performance, and compared the results against our activity-
based characterization method, presented in the previous chapter. In the next section,
we present the results of our evaluation on real-life logs.
5.7.1 Setup
We generated an artificial dataset using the same CPN³ base model as in Chapter 3 (cf.
Figure 3.2). This model represents a block-structured process consisting of 42
activities, five XOR, six AND, and three loop structures, modeled in an intertwined way,
producing highly variable event logs with trace variability of around 80%. For each
change pattern in Table 5.1, except “Duplicate a Fragment”, we generated five logs,
each featuring two drifts applied to fragments of a different size, from one to five.
Note that, as IM does not discover process trees with duplicate activities, we do not
experiment with logs containing drifts caused by a fragment duplication. Nonetheless,
the process tree transformation algorithms presented in this chapter can be applied to
process trees with duplicate fragments and are able to identify the insertion (resp.,
deletion) of a duplicate fragment in (resp., from) a process tree. Also, label duplication
techniques such as the ones introduced in [26, 75] can be used to pre-process a log
before applying IM. For each generated log we simulated 3,000 traces, with drifts
injected during the simulation at 1,000-trace intervals. The first drift is injected by
applying a change pattern to the base model, and the second drift is injected by reversing
the applied change, thus reverting to the base model.
We also evaluated our methods in more complex settings by simulating logs featuring
drifts caused by multiple non-overlapping simultaneous changes (i.e. composite
changes) as well as nested changes. To create such logs, we divided our change
patterns into three categories, as described in Chapter 2: Insertion (“I”),
Resequentialization (“R”) and Optionalization (“O”) (cf. Table 5.1). Limited to three
cross-category changes, these categories make six possible scenarios for each of the
composite changes and nested changes (“IOR”, “IRO”, “OIR”, “ORI”, “RIO”, “ROI”). For
each such scenario, five logs were generated by randomly selecting one template from
each category and applying them to fragments of a certain size, from one to five. For
example, a drift from the composite change scenario “IOR” could simultaneously
delete a fragment of size two (“I”), add a loop over a fragment of size two (“O”), and
parallelize two sequential fragments of size two (“R”) in three different locations of
the process. As another example, a drift from the nested change scenario “IOR” could
first parallelize two sequential fragments of size three (“R”), then
add a loop over the two parallelized fragments (“O”), and finally insert a fragment of
size three in a conditional branch with the resulting loop fragment (“I”). This
resulted in 30 logs for each of the composite and nested change settings. Altogether,
we obtained a collection of 65 logs with single changes, 30 logs with composite changes,
and 30 logs with nested changes, each containing 3,000 traces with two equidistant
drifts involving one or multiple fragments of a certain size.
³ http://cpntools.org
For each such log, we also generated two variants with 2.5% and 5% noise by
inserting random events into the traces of the log. Altogether, the artificial dataset
contained 375 logs.⁴
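The size of the dataset follows directly from the setup described above. As a quick
sanity check, the arithmetic can be reproduced as follows (the variable names are
illustrative; the pattern codes are those of Table 5.1, excluding fragment duplication):

```python
from itertools import permutations

# The 13 single change patterns used in the experiments ("cp" is excluded).
SINGLE_PATTERNS = ["sre", "pre", "cre", "rp", "sw", "sm", "cm",
                   "pm", "cf", "pl", "lp", "cb", "fr"]
FRAGMENT_SIZES = [1, 2, 3, 4, 5]
# Six orderings of the categories Insertion, Optionalization, Resequentialization.
SCENARIOS = ["".join(p) for p in permutations("IOR")]
NOISE_LEVELS = [0.0, 2.5, 5.0]  # noise-free logs plus two noisy variants

single_logs = len(SINGLE_PATTERNS) * len(FRAGMENT_SIZES)
composite_logs = len(SCENARIOS) * len(FRAGMENT_SIZES)
nested_logs = len(SCENARIOS) * len(FRAGMENT_SIZES)
base_logs = single_logs + composite_logs + nested_logs
total_logs = base_logs * len(NOISE_LEVELS)

print(single_logs, composite_logs, nested_logs, base_logs, total_logs)
```

This yields 65 single-change logs, 30 composite-change logs and 30 nested-change logs,
i.e. 125 noise-free logs and 375 logs once the two noisy variants are included.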
5.7.2 Accuracy of Drift Characterization: Fragment-based vs Activity-based
In the first experiment, we evaluate and compare the accuracy of our fragment-based
characterization method in characterizing drifts detected in the artificial logs versus
that of our activity-based characterization method, presented in the previous chapter.
To ensure that we use the same sub-logs as the activity-based characterization
method, we used the same drift detection method in our experiments with the
artificial and real-life event streams in this chapter. We also used the same
strategy as the activity-based characterization method to extract the pre-drift and
post-drift sub-logs after the detection of a drift. Specifically, to discover the
pre-drift and post-drift process trees, we use the two sub-logs of partial traces built,
respectively, from the events in the reference window as the P-value drops
below the threshold, and from the events in the detection window as the P-value rises
above the threshold. By doing so, we try to obtain pre-drift and post-drift process
trees that only represent the actual process behaviors before and after a drift. It is
worth noting that our fragment-based drift characterization method can be applied on top
of any drift detection technique that works on event streams or trace streams. The only
required input to our method is a pair of sub-logs containing partial or complete traces
from before and after a drift.
⁴ All the CPN models used for this simulation, the resulting artificial logs, and the
detailed evaluation results are available with the software distribution.
The output of both the fragment-based characterization method and the activity-based
characterization method is a list of statements explaining the changes underpinning
a drift. To compare the accuracy of the statements reported by the two methods,
we use the F-score, i.e. the harmonic mean of recall and precision, where recall measures
the ratio of reported statements relevant to the drift over the total number of statements
required to explain the drift, and precision measures the ratio of reported statements
relevant to the drift over the total number of reported statements. The relevance of a
statement to a drift is assessed manually: a statement is considered to be relevant
to the drift if it describes at least a fraction of the changes applied to the process
in order to inject that drift. We count the number of statements required by a method
to explain a drift based on the changes applied to the process to inject that drift and the
abstraction level of the characterization statements produced by that method. For
example, to explain the deletion of a fragment of three activities, the activity-based
characterization method needs three statements, one per activity, since it is designed to
characterize changes at the level of individual activities. On the other hand, the
fragment-based method needs only one statement to explain the same fragment deletion.
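The accuracy measure can be made concrete with a small computation. The sketch below
implements precision, recall and F-score as defined above; the function name and the
statement counts in the two calls are hypothetical and are not taken from the
experimental results.

```python
def f_score(relevant_reported, total_reported, required):
    """Harmonic mean of precision and recall for one characterized drift.

    relevant_reported: reported statements that are relevant to the drift
    total_reported:    all statements reported by the method
    required:          statements the method needs to fully explain the drift
    """
    precision = relevant_reported / total_reported if total_reported else 0.0
    recall = relevant_reported / required if required else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical drift: deletion of a fragment of three activities.
# A fragment-level method needs one statement and reports exactly that one.
fragment_based = f_score(relevant_reported=1, total_reported=1, required=1)
# An activity-level method needs three statements but reports only one of them.
activity_based = f_score(relevant_reported=1, total_reported=1, required=3)

print(fragment_based, round(activity_based, 2))
```

In the second call precision is 1 while recall is 1/3, so the F-score is 0.5. This
shows how a method can report only correct statements and still score poorly when the
number of required statements grows with the fragment size.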
For the experiments in this section we use the A* algorithm (cf. Section 5.4.2.1) to
compute the edit operations that transform the pre-drift process tree into the
post-drift process tree. We also use the activity-based characterization method with its
default parameter settings.
5.7.2.1 Fragments of Different Size
Figure 5.21 shows the average F-score over all logs of a certain fragment size, with
and without noise, for the fragment-based as well as the activity-based
characterization methods. Figure 5.21a shows that the accuracy of our fragment-based
characterization method is not influenced by the size of the fragments involved in a
drift, as the average F-score remains around 0.99 for all fragment sizes, over all
noise-free logs. On the other hand, the average F-score of the activity-based
characterization method drops as the fragment size increases, being on average around
0.85, 0.56, 0.38, 0.28 and 0.21 for fragments of size one, two, three, four and five,
respectively, over all noise-free logs. This is explained by the fact that this method
is limited to characterizing changes to fragments of size one, i.e. individual
activities. For a change involving larger fragments, this method either fails to
characterize the change or can only partially characterize it, resulting in a
significant drop in recall, from 0.82 for fragments of size one to 0.13 for those of
size five. However, the precision of the activity-based characterization method is not
influenced as much by the increase in the size of fragments, dropping from 0.98 for
fragments of size one to 0.82 for those of size five.
For the experiments with logs that contain noise we used IMfpt, i.e. a variant
of Inductive Miner that filters out infrequent behavior in the logs of partial traces,
discovering noise-free pre-drift and post-drift process trees. To avoid introducing false
differences between the pre-drift and post-drift process trees as a result of filtering, a
process behavior is treated as noise if it does not meet the filtering requirements on
both sides of the drift. This significantly improved the accuracy of our fragment-based
characterization method in experiments with noisy logs. We set the noise filtering
threshold parameter of IMfpt to 10% for the experiments with these logs. The results
for the logs with 2.5% and 5% noise, in Figures 5.21b and 5.21c, suggest that both
characterization methods can to a great extent handle different levels of noise injected
in the logs. The accuracy of the fragment-based method incurs a slight decrease of
around 15% for both 2.5% and 5% noise, with the F-score being above 0.82 averaged over
all logs of the same fragment size. This is mostly caused by a decrease in the precision
of this method from 0.99, averaged over all fragment sizes, for noise-free logs, to
0.76 and 0.73 for logs with 2.5% and 5% noise, respectively. The average F-score
of the activity-based characterization method also drops by around 10% per fragment
size for logs with 2.5% and 5% noise. The precision of this method also drops from
0.91, averaged over all fragment sizes, for noise-free logs to 0.7 and 0.67 for logs with
2.5% and 5% noise, respectively. The activity-based characterization method uses
a statistical technique to filter out spurious relations from the extracted α+ relations
before matching them with change templates. With regard to the impact of fragment
size on the characterization accuracy of the two methods, we observe similar trends
as for the noise-free logs. The accuracy of the fragment-based characterization method is
not affected by the fragment size, whereas that of the activity-based characterization
method drops significantly as fragments become larger.
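The idea of filtering consistently on both sides of a drift can be illustrated with a
deliberately simplified sketch. It is not the filtering performed by IMfpt: behavior is
reduced here to directly-follows pairs, the filtering requirement is assumed to be a
minimum frequency relative to the most frequent pair in a sub-log, and the rule is read
as discarding a behavior only when it is infrequent in both the pre-drift and the
post-drift sub-log. All function names are hypothetical.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often one activity is directly followed by another."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts


def is_infrequent(counts, pair, threshold):
    """Assumed requirement: frequency relative to the most frequent pair."""
    cutoff = threshold * max(counts.values()) if counts else 0
    return counts.get(pair, 0) < cutoff


def noise_pairs(pre_traces, post_traces, threshold=0.10):
    """Pairs treated as noise: infrequent in both the pre- and post-drift sub-log."""
    pre = directly_follows(pre_traces)
    post = directly_follows(post_traces)
    return {
        pair
        for pair in set(pre) | set(post)
        if is_infrequent(pre, pair, threshold)
        and is_infrequent(post, pair, threshold)
    }


# Toy sub-logs: 'b' and 'c' are swapped by the drift; 'x' is a rare spurious event.
pre_log = [["a", "b", "c"]] * 20 + [["a", "x", "c"]]
post_log = [["a", "c", "b"]] * 20 + [["a", "x", "c"]]
print(sorted(noise_pairs(pre_log, post_log)))
```

Under these assumptions only the pairs involving the spurious event ‘x’ are discarded,
while pairs that are frequent on one side and absent on the other, which reflect the
actual change, are preserved on both sides.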
5.7.2.2 Process Change Patterns
Figure 5.22 reports the average F-score for each single, composite and nested change
pattern over all fragment sizes, with and without noise in the logs, for the
fragment-based as well as the activity-based characterization methods. In this figure,
we distinguish the composite change patterns from the nested ones by appending “_c” and
“_n” to their names, respectively. The results in Figure 5.22a show that, in the absence
of noise in the logs, the fragment-based characterization method
has a perfect F-score of 1 for all the single change patterns and for all but four of the
composite and nested change patterns, namely IOR_c, IRO_c, IRO_n, and RIO_n. For
these four patterns, the process trees discovered by IM had minor imprecisions, leading
to some false statements. On the other hand, the activity-based characterization method
has an F-score in the range of 0.5–0.6 for most of the single and composite change
patterns, with “cb” having the lowest F-score of 0.31, and “rp” and “sw” having the
highest F-score of 0.7.
Figure 5.21: Average F-score over all logs with different noise ratios per fragment
size, obtained with the fragment-based characterization method vs. the activity-based
characterization method. Panels: (a) Noise ratio = 0%; (b) Noise ratio = 2.5%;
(c) Noise ratio = 5%. Axes: fragment size (1 to 5) vs. F-score.
For the nested change patterns the activity-based characterization method, as
expected, performs poorly, with a maximum F-score of 0.34 for “ROI_n”. This is due to
the inherent inability of this method to characterize nested changes. For the logs with
noise, as shown in Figure 5.22b and Figure 5.22c, despite a small drop in its accuracy,
the fragment-based characterization method filters out most of the injected
noise in the logs and achieves a higher F-score than the activity-based method for all
single, composite and nested change patterns. The F-score falls to around 0.8 for most
single change patterns for both 2.5% and 5% noise, and to around 0.9 and 0.85 for
most composite and nested change patterns for 2.5% and 5% noise, respectively. The
activity-based characterization method also handles the injected noise well and only
incurs slight drops in its F-score. As explained before, this method can inherently filter
out infrequent relations formed by spurious events.
Figure 5.22: Average F-score over all fragment sizes per single, composite, and nested
change pattern, obtained with the fragment-based characterization method vs. the
activity-based characterization method. Panels: (a) Noise ratio = 0%; (b) Noise ratio =
2.5%; (c) Noise ratio = 5%. Axes: change pattern vs. F-score.
5.7.2.3 Singleton Fragments
The results of the previous experiments show that the fragment-based characterization
method on average outperforms the activity-based characterization method in all
change patterns over different fragment sizes. However, the latter is engineered to
characterize non-overlapping activity-level changes. Therefore, in the last experiment
in this section we study how the two methods compare in characterizing changes to
singleton fragments, i.e. activities. Figure 5.23 shows the F-score for singleton
fragments per single, composite and nested change pattern for the fragment-based and the
activity-based characterization methods. For noise-free logs, as shown in Figure 5.23a,
the fragment-based characterization method achieves a perfect F-score of 1 for all the
change patterns except “IRO_n”, for which the process trees discovered by IM were not
precise, leading to some false statements. The activity-based characterization method
also has an F-score of 1 for all but two of the single and composite change patterns,
namely “lp” and “OIR_c”. However, as expected, it still fails to fully characterize
the nested changes, with a minimum F-score of 0.18 for “OIR_n”, and an F-score of
around 0.5 for the rest. For the logs with 2.5% and 5% noise, the activity-based
characterization method has better F-scores than the fragment-based characterization
method for almost half of the single and composite change patterns, e.g. “sw”, “cf”,
“cb”, “ORI_c” and “ROI_c”, while for the rest they perform equally well. On the other
hand, the fragment-based method outperforms the activity-based method for all the nested
change patterns.
Overall, the experimental results in this section show that while both methods
are noise-tolerant, the fragment-based characterization method is able to accurately
characterize single, composite, and nested changes involving fragments of any size.
On the other hand, the activity-based characterization method is well-suited for non-
overlapping activity-level changes, though it fails to accurately characterize changes
involving larger fragments, overlapping changes, as well as nested changes. As such,
the two methods are complementary.
5.7.3 Verbalization Conciseness: Fragment-based vs Activity-based
In this section, we study how our fragment-based characterization method compares
to our activity-based characterization method with regard to the number of statements
required to fully characterize the various change patterns, where each statement
reports the occurrence of one change. As observed in the previous experiments, the
activity-based characterization method often fails to characterize changes that
involve non-singleton fragments and hence does not report any statement, or it may
only partially identify them, resulting in a small number of statements being reported.
Thus, the actual number of statements reported by this method is not a good indicator of
its verbalization conciseness. To obviate this problem, in Figure 5.24 we count the
number of statements each method would require to report all process model changes
behind each change pattern, if it could fully identify them. Further, as the
activity-based characterization method does not support nested changes, we exclude them
from the comparison.
Figure 5.23: Average F-score for singleton fragments per single, composite and
nested change pattern, obtained with the fragment-based characterization method vs.
the activity-based characterization method. Panels: (a) Noise ratio = 0%; (b) Noise
ratio = 2.5%; (c) Noise ratio = 5%. Axes: change pattern vs. F-score.
We can see that the activity-based method requires a substantially larger number of
statements than the fragment-based method (5.5 compared to 1 on average over all
simple change patterns), especially when drifts involve multiple large process
fragments, as in the case of composite patterns (17.5 compared to 3 on average).
Reporting many activity-level differences is a common limitation of methods that, like
the activity-based characterization method, rely on low-level representations of the
process behavior.
Figure 5.24: Average number of statements over all fragment sizes required by our fragment-based characterization method vs. our activity-based characterization method for characterizing each change pattern. x-axis: change pattern (single and composite patterns, sre to ROI_c); y-axis: number of statements (0 to 25); series: fragment-based, activity-based.
5.7.4 Verbalization Conciseness: Exhaustive vs Greedy
The characterization accuracy of our fragment-based characterization method is de-
pendent on that of IM in discovering pre-drift and post-drift process trees. If a process
tree discovered with IM misrepresents the process behavior recorded in the event log,
e.g. due to the imprecision of IM, then that behavior will produce a false characteri-
zation statement. Consequently, the choice of the search algorithm for computing the
sequence of edit operations that transforms a pre-drift process tree to a post-drift pro-
cess tree only impacts the number of reported statements to the user. For example,
consider two transformations of process trees P and P′ in Figure 5.25. In Figure 5.25a,
P is transformed to P′ by moving activity ‘a’ to a conditional branch with activity ‘c’,
whereas in Figure 5.25b, P is transformed to P′ by first swapping activities ‘a’ and ‘b’,
and then making activities ‘a’ and ‘c’ parallel. Although both of these are correct, the
first way is preferred as it is more concise.
Figure 5.25: Two sample transformations of process tree P into process tree P′, shown in panels (a) and (b).
In this section, we evaluate the verbalization conciseness of our fragment-based
characterization method by counting the number of characterization statements re-
ported by our method using the A* algorithm versus the greedy algorithm. As ex-
plained in Section 5.5, drift characterization statements are produced based on simple
as well as compound changes, where each compound change is an aggregation of mul-
tiple simple changes. The threshold parameter of the greedy algorithm, which indicates
the minimum matching score between two mapped operator nodes, can be manually
set by the user. As the greedy algorithm has a low execution time the user may try
different threshold values and select one that results in the lowest number of reported
statements. For the experiments in this section, we set the threshold parameter of the
greedy algorithm to 0.6, i.e. two operator nodes are mapped if their matching score
is at least 0.6. Intuitively, a matching score of 0.6 means that the two nodes are more
similar than dissimilar, and therefore they should be matched. This is also consistent
with previous experiments on model matching in the context of process model merg-
ing, where a value of 0.6 was used [64]. Figure 5.26 reports the average number of
statements produced by our method using the A* algorithm vs the greedy algorithm,
over all fragment sizes, per change pattern, with and without noise. For noise-free
logs, the statements reported by our method for all but four of the single, composite
and nested change patterns (“IOR_c”, “IRO_c”, “IRO_n”, “RIO_n”) were all accurate,
as the F-score of our method for these changes was 1 (cf. Figure 5.22a). As shown in
Figure 5.26a, using the A* algorithm our method is able to characterize each single change
pattern with one statement, averaged over fragments of size one to five. As the F-score
of our method was 1 for the same change patterns in noise-free logs (cf. Figure 5.22a),
these results show that the number of statements reported by our method for a change
pattern is independent of the size of the fragments to which the change pattern is applied.
With regard to the complex change patterns, our method with the A* algorithm produces
on average, across all fragment sizes, around 3 statements, one per applied change, for all
but three of the composite and nested change patterns. For those three change patterns,
namely “IOR_c”, “ORI_c”, and “RIO_n”, the pre-drift and post-drift sub-logs for larger
fragments did not contain sufficient process behavior for IM to precisely discover the
fragments to which the changes were applied. As such, IM split those fragments into
smaller fragments, leading to our method producing more statements. For the noise-free
logs, our method produces a similar number of statements when using the greedy
algorithm for most of the single, composite and nested change patterns. However, for
some change patterns, e.g. “sm” and “cm”, the greedy algorithm leads to more statements,
with the largest difference being for “OIR_n”, with 6.6 statements against the 3.6
statements reported by our method when using the A* algorithm.
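The role of the threshold parameter described above can be illustrated with a minimal sketch of a thresholded greedy one-to-one matching. This is not the thesis implementation: the matching scores, the node names and the function `greedy_mapping` are hypothetical, and the actual matching score and greedy algorithm are those defined earlier in this chapter. The sketch only shows how a threshold of 0.6 excludes pairs of operator nodes whose score is lower.

```python
from typing import Dict, Tuple


def greedy_mapping(scores: Dict[Tuple[str, str], float],
                   threshold: float = 0.6) -> Dict[str, str]:
    """Map pre-drift nodes to post-drift nodes, best-scoring pair first.

    A pair is accepted only if its score reaches the threshold and
    neither node has been mapped already (one-to-one mapping).
    """
    mapping: Dict[str, str] = {}
    used_post = set()
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for (pre_node, post_node), score in ranked:
        if score < threshold:
            break  # every remaining pair scores below the threshold
        if pre_node in mapping or post_node in used_post:
            continue  # one of the two nodes is already mapped
        mapping[pre_node] = post_node
        used_post.add(post_node)
    return mapping


# Hypothetical matching scores between operator nodes of two process trees.
scores = {
    ("pre_seq", "post_seq"): 0.90,
    ("pre_xor", "post_xor"): 0.70,
    ("pre_xor", "post_and"): 0.65,
    ("pre_and", "post_and"): 0.50,
}

strict = greedy_mapping(scores, threshold=0.6)
lenient = greedy_mapping(scores, threshold=0.5)
print(strict)
print(lenient)
```

With the hypothetical scores above, lowering the threshold from 0.6 to 0.5 adds one more mapped pair. Since the number of mapped nodes influences which edit operations are needed, this is one reason why trying several threshold values can change the number of reported statements.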
The injection of noise in the logs, as shown in Figures 5.26b and 5.26c, slightly
increases the average number of statements reported by our method for all change
patterns over fragments of size one to five. For these logs, our method produced some
false statements, each explaining a change that was not applied to the process as part
of the drift injection. Furthermore, in some cases the injected noise caused IM to
split a large fragment involved in a change into multiple smaller fragments, causing
our method to produce more statements to explain the change. Similar to the noise-
free logs, the A* and the greedy algorithms perform similarly for most of the change
patterns with 2.5% and 5% noise. The largest difference was for the simple change
pattern “cm” with 5% noise, where our method produces 1.8 statements on average
using the A* algorithm versus 5.1 statements on average using the greedy algorithm.
5.7.5 Time Performance
We conducted all the experiments on an Intel i7 2.20GHz with 16GB RAM (64 bit),
running Windows 7 and JVM 8 with a heap space of 10GB. The time required to dis-
cover two process trees from the pre-drift and post-drift sub-logs, compute a sequence
of edit operations to transform the pre-drift process tree to the post-drift process tree
using the A* algorithm and construct characterization statements for each drift ranged
from a minimum of 2ms to a maximum of 68sec with an average of 870ms. To perform
Figure 5.26: Average number of statements over all fragment sizes per change pattern reported by our fragment-based characterization method using the A* algorithm vs the greedy algorithm. Panels: (a) noise ratio = 0%; (b) noise ratio = 2.5%; (c) noise ratio = 5%. x-axis: change pattern (single, composite and nested patterns, sre to ROI_n); y-axis: number of statements (0 to 12); series: A*, Greedy.
the same operation, the greedy algorithm took between a minimum of 2ms and a maximum
of 100ms with an average of 30ms. This means that the greedy algorithm is almost 30
times faster than its A* counterpart. The bulk of the time spent by our method went
to computing a sequence of edit operations to transform the pre-drift process tree to
the post-drift process tree. Although in most cases the A* algorithm finds the optimal
solution within a reasonable time, for two process trees with several changes it may be
more efficient to use the greedy algorithm. Finally, the activity-based characterization
method took on average 510ms to characterize each drift.
5.8 Evaluation on Real-life Logs
We further evaluated our method on two real-life event logs, one from a ticketing
management process and the other from an insurance claim handling process. For the
experiments in this section, we used IMfpt for discovering noise-free pre-drift and
post-drift process trees, by setting its noise filtering threshold parameter to 10%. We also
used the A* algorithm to compute the shortest sequence of edit operations to transform
the pre-drift process tree to the post-drift process tree. Furthermore, as explained
in Section 5.5.2, we considered a relative frequency change of at least 50% as a
significant change, where the relative frequency of the fragment is at least 25% in the
pre-drift or post-drift process tree.
The first real-life log,⁵ obtained from 4TU Data Centrum,⁶ contains events from a
ticketing management process of the help desk of an Italian software company. This log
contains 21348 events, from 14 activities, and 4580 traces, out of which 226 are distinct.
We used the drift detection technique presented in Chapter 3, initializing its adaptive
windows with 1000 events, and detected 2 drifts in this log. The first drift occurs at
event index 8757, corresponding to July 25th 2011, and the second one occurs at event
index 17307, corresponding to September 11th 2012. We characterized these two drifts
by applying our fragment-based method to the sub-logs extracted from before and after
each drift. The transformation of the pre-drift process tree to the post-drift process tree
over the first drift is illustrated in Figure 5.27. For the first drift, our method produced a
single statement, reporting on the possibility of skipping the sub-tree marked as
“Fragment 1” in Figure 5.27 after the occurrence of the drift. We did not have access to
a ground truth to validate the obtained results. Therefore, as an alternative we analyzed
the directly follows graphs of the sub-logs from before and after the drift, shown in Figures 5.28a
⁵ https://doi.org/10.4121/uuid:0c60edf1-6f83-4e75-9367-4c63b3e9d5bb
⁶ https://data.4tu.nl/repository/
and 5.28b, respectively, to verify the accuracy of the results. We observed the ap-
pearance of a directly follows relation from activity “Assign seriousness” to activity
“Resolve ticket” after the drift. This finding aligns with the output of our method for
this drift.
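The manual check described above, comparing directly follows relations before and after a drift, can be sketched as follows. This is an illustrative sketch only: the traces are toy examples rather than data from the log, and the function name `directly_follows` is our own choice for this illustration.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often one activity is directly followed by another."""
    dfg = Counter()
    for trace in traces:
        for current, following in zip(trace, trace[1:]):
            dfg[(current, following)] += 1
    return dfg


# Toy traces (assumed for illustration, not taken from the real log).
pre_traces = [
    ["Assign seriousness", "Take in charge ticket", "Resolve ticket", "Closed"],
    ["Assign seriousness", "Take in charge ticket", "Wait", "Resolve ticket", "Closed"],
]
post_traces = pre_traces + [
    ["Assign seriousness", "Resolve ticket", "Closed"],
]

pre_dfg = directly_follows(pre_traces)
post_dfg = directly_follows(post_traces)

# Relations that appear only after the drift, and relations that disappear.
new_relations = sorted(set(post_dfg) - set(pre_dfg))
removed_relations = sorted(set(pre_dfg) - set(post_dfg))
print(new_relations)
print(removed_relations)
```

In this toy setting, the only new relation is the one from “Assign seriousness” to “Resolve ticket”, which is the kind of evidence used above to corroborate the statement that Fragment 1 became skippable.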
Figure 5.27: Transformation of pre-drift process tree to post-drift process tree over the first drift in the ticketing management process. Annotated change: “Make Fragment 1 skippable”.
Figure 5.28: Directly follows graphs of the ticketing management process before and after the first drift. Panels: (a) directly follows graph before the first drift; (b) directly follows graph after the first drift.
For the second drift, our method discovered two changes. The transformation of the
pre-drift process tree to the post-drift process tree over the second drift is illustrated in
Figure 5.29. The first discovered change indicates a significant decrease in the relative
frequency of the τ-node 8 from 80% to 40%, while the second change indicates a
significant increase in the relative frequency of activity “Wait” from 18% to 51%.
These two changes are related as activity “Wait” and the τ-node 8 are parented by the
same ×-node 5. To evaluate the accuracy of these changes we have drawn the directly
follows graph of the sub-logs from before and after the drift in Figures 5.30a and 5.30b.
The pre-drift and post-drift graphs show that the frequencies of the outgoing arcs from
activity “Take in charge ticket” to activities “Resolve ticket”, “Wait”, and “Require
upgrade” have changed from 151 (80% of total), 35 (18% of total), and 4 (2% of
total) to 66 (40% of total), 82 (51% of total), and 14 (9% of total), respectively, out of
which the first two changes are considered as significant and reported by our method.
Moreover, in the corresponding process trees, the change in the relative frequency of
activity “Resolve ticket” manifests itself by a change in the relative frequency of the
τ-node 8. These findings conform to the characterization of this drift by our method.
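The significance criterion used in this section (a relative frequency change of at least 50%, for fragments whose relative frequency is at least 25% before or after the drift) can be checked on the reported percentages with a short sketch. The exact definition of relative frequency change is the one given in Section 5.5.2; here we assume, for illustration only, that it is the absolute change divided by the pre-drift frequency. The function name `is_significant` and the handling of a zero pre-drift frequency are likewise assumptions.

```python
def is_significant(pre_pct, post_pct, min_rel_change=0.5, min_freq_pct=25):
    """Return True if a branching-frequency change counts as significant.

    Assumed definition: |post - pre| / pre >= min_rel_change, and the
    frequency reaches min_freq_pct in the pre-drift or post-drift tree.
    """
    if max(pre_pct, post_pct) < min_freq_pct:
        return False  # too infrequent on both sides of the drift
    if pre_pct == 0:
        return True  # assumption: a newly appearing frequent branch is significant
    return abs(post_pct - pre_pct) / pre_pct >= min_rel_change


# Rounded relative frequencies (in %) as reported for the second drift.
changes = {
    "tau-node 8": (80, 40),
    "Wait": (18, 51),
    "Require upgrade": (2, 9),
}

results = {name: is_significant(pre, post) for name, (pre, post) in changes.items()}
for name, significant in results.items():
    print(name, significant)
```

Under these assumptions, the changes for the τ-node 8 and for activity “Wait” pass the test, while the change for “Require upgrade” does not, because its relative frequency stays below 25% both before and after the drift. This agrees with the two changes reported by our method.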
Our method completed the characterization of the first and the second drifts in
330ms and 350ms, respectively. It is worth mentioning that since the drift detection
technique is designed to detect sudden drifts, gradual changes that occurred in the process
over the period from the first drift to the second drift, e.g. the insertion of activity
“Require upgrade”, did not trigger the detection of another drift over this period.
Figure 5.29: Transformation of pre-drift process tree to post-drift process tree over the second drift in the ticketing management process. Annotated change: “Change branching frequency” under ×-node 5 (τ-node 8: 80% to 40%; “Wait”: 18% to 51%; “Require upgrade”: 2% to 9%).
We also applied the activity-based characterization method to the discovered drifts
in this log, but this method failed to characterize the drifts as it did not report any
changes.
In the second experiment, we employed our fragment-based method to character-
ize drifts in an event log originating from the claims management system of a large
Australian insurance company. The log consists of 61413 events, referring to twelve
distinct activities, and 16365 traces, out of which 172 are distinct. It records cases
of a windscreen claims handling process over a period of 13 months between 2011
Figure 5.30: Directly follows graphs of ticketing management process before and after the second drift. Panels: (a) directly follows graph before the second drift; (b) directly follows graph after the second drift.
and 2012. Using our drift detection technique (see Chapter 3) with an adaptive win-
dow size initialized with 7000 events, we detected one drift in this log, at the event
index 13821, corresponding to the date September 19th, 2011. Next, we used our
fragment-based method to characterize this drift. The transformation of the pre-drift
process tree to the post-drift process tree over this drift is illustrated in Figure 5.31. Our
method discovered that Fragment 1 consisting of three sequential activities “Identify
Nil Recovery or Settlement Potential”, “Review Invoice - Motor Glass”, and “Conduct
File Review” in the pre-drift process tree is substituted by Fragment 2 consisting of
two concurrent activities “Confirm Nil Recovery or Settlement Potential” and “Invoice
Paid” in the post-drift process tree. Our method completed the characterization of this
drift in 280ms. We then validated these results with a business analyst from the insur-
ance company, who confirmed our findings and explained the reasons underlying the
identified changes.
Before the drift, by performing activity “Identify Nil Recovery or Settlement Po-
tential” the company tried to claim a fraction of the money paid for every accident
case from other insurance companies involved in the accident. However, as perform-
ing this task for all cases proved to be costly, they decided to perform it only for
cases with certain characteristics, e.g. cases whose cost is below a certain threshold.
Therefore they substituted this activity by a new activity, named “Confirm Nil Recov-
ery or Settlement Potential”. Moreover, during the same time period they automated
Figure 5.31: Transformation of pre-drift process tree to post-drift process tree over the drift in the claim handling process. Annotated change: “Substitute Fragment 1 with Fragment 2”.
invoice payments, by removing two activities “Review Invoice - Motor Glass” and
“Conduct File Review” and introducing a new activity, named “Invoice Paid”. These
two changes resulted in the substitution of Fragment 1 consisting of three sequential
activities “Identify Nil Recovery or Settlement Potential”, “Review Invoice - Motor
Glass” and “Conduct File Review” in the pre-drift process tree by Fragment 2 consist-
ing of two concurrent activities “Confirm Nil Recovery or Settlement Potential” and
“Invoice Paid” in the post-drift process tree.
We also applied the activity-based characterization method to the discovered drift
in this log. However, this method could only explain the removal of activity “Identify
Nil Recovery or Settlement Potential” from the process, and failed to discover the other
changes.
5.9 Discussion
The accuracy of our fragment-level drift characterization method is dependent on the
accuracy of the process trees discovered by IM from the sub-logs before and after a
drift. The accuracy of a process tree is measured by means of fitness and precision.
Fitness indicates how much of the process behavior in the log is reproducible by the
discovered process tree, while precision indicates how much of the behavior allowed
by the process tree is actually seen in the log, i.e. it decreases as the tree allows more
behavior that is not seen in the log.
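To make the two notions concrete, the following toy sketch computes trace-level versions of fitness and precision. It is a strong simplification and not the measures used in the evaluation: it assumes that the behavior of a model can be listed as a finite set of traces, whereas fitness and precision are normally computed with dedicated conformance checking techniques. Here precision is taken as the share of model traces that are observed in the log, so behavior that is allowed but never observed lowers it. All names and the example traces are hypothetical.

```python
from itertools import permutations


def fitness(log, model_language):
    """Share of log traces (counted with repetitions) that the model can reproduce."""
    reproducible = sum(1 for trace in log if trace in model_language)
    return reproducible / len(log)


def precision(log, model_language):
    """Share of the traces allowed by the model that are observed in the log."""
    observed = set(log)
    return len(model_language & observed) / len(model_language)


# Toy log over activities a, b and c.
log = [("a", "b", "c"), ("a", "b", "c"), ("a", "c", "b")]

# A model that allows exactly the observed behavior.
precise_model = {("a", "b", "c"), ("a", "c", "b")}

# An over-generalizing model that allows a, b and c in any order.
permissive_model = set(permutations("abc"))

print(fitness(log, precise_model), precision(log, precise_model))
print(fitness(log, permissive_model), precision(log, permissive_model))
```

Both toy models reproduce every trace in the log, but the permissive model allows four orderings that never occur, which is the effect of over-generalization discussed in the remainder of this section.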
IM discovers a process tree by recursively structuring process behavior in a log into
smaller process trees in a top-down manner. At a recursion where IM fails to discover
a process tree that precisely expresses observed process behavior, it generalizes from
the behavior by selecting a fall through, resulting in the discovery of a process tree
with lower precision. Several fall throughs have been defined (see [72]), decreasing in
precision, in the worst case leading to a flower model that allows any behavior with the
activities in the event log. This latter issue is known as over-generalization of process
behavior. Here, we discuss a few factors that may contribute to this issue.
The pre-drift and post-drift sub-logs used for drift characterization should be large
enough to contain a behaviorally representative sample of process executions. In our
experiments in this chapter, we used sub-logs extracted from the events within drift
detection windows (cf. Chapter 3) to discover process trees from before and after
a drift. The size of these windows adapts to the behavioral variability of the log,
ensuring that there are sufficient events within each window to fully capture the process
behavior.
A related problem that often leads to over-generalization is when there are devia-
tions from the normal process behavior in the log, a.k.a. noise. To fit these deviations
into the discovered process tree IM may have to over-generalize from the main process
behavior. One way to tackle this problem is to filter out the noise from the log before
discovering a process tree. There are several noise filtering techniques for event logs,
e.g. [22], and for event streams, e.g. [122]. Alternatively, we can use IMfpt, i.e. a
variant of IM that filters out infrequent behavior (noise) from a log before discovering
a process tree. To avoid introducing false differences between pre-drift and post-drift
process trees as a result of noise filtering, a process behavior should be treated as noise
only if it fails to meet the filtering requirements in both the pre-drift and post-drift sub-logs.
Finally, if the process behavior in a log is generated by an unstructured process, IM
may overgeneralize from the behavior to discover a block-structured process tree.
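The rule that a behavior is removed as noise only when it is infrequent on both sides of a drift can be sketched as below. This is a simplified illustration, not the filtering performed by IMfpt: we assume that a directly follows relation meets the filtering requirement when its count is at least a given fraction of the strongest outgoing relation of the same source activity. The function names are hypothetical, the first three counts per sub-log are those reported for the second drift of the ticketing log in Section 5.8, and the relation to “Closed” is made up to play the role of noise.

```python
def frequent_edges(dfg, threshold):
    """Relations whose count is at least threshold times the strongest
    outgoing relation of the same source activity (assumed filter)."""
    strongest = {}
    for (source, _target), count in dfg.items():
        strongest[source] = max(strongest.get(source, 0), count)
    return {edge for edge, count in dfg.items()
            if count >= threshold * strongest[edge[0]]}


def filter_both(pre_dfg, post_dfg, threshold=0.10):
    """Drop a relation only if it is infrequent in both sub-logs."""
    keep = frequent_edges(pre_dfg, threshold) | frequent_edges(post_dfg, threshold)
    pre = {edge: count for edge, count in pre_dfg.items() if edge in keep}
    post = {edge: count for edge, count in post_dfg.items() if edge in keep}
    return pre, post


T = "Take in charge ticket"
pre_dfg = {(T, "Resolve ticket"): 151, (T, "Wait"): 35,
           (T, "Require upgrade"): 4, (T, "Closed"): 1}   # last entry is made up
post_dfg = {(T, "Resolve ticket"): 66, (T, "Wait"): 82,
            (T, "Require upgrade"): 14, (T, "Closed"): 2}  # last entry is made up

pre_filtered, post_filtered = filter_both(pre_dfg, post_dfg)
print(sorted(target for (_source, target) in pre_filtered))
```

In this sketch, the relation to “Require upgrade” would be filtered out of the pre-drift sub-log if each sub-log were filtered on its own, which would make the activity look newly inserted. Because the relation is frequent enough after the drift, it is kept on both sides, while the made-up relation to “Closed” is removed from both.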
However, even in the scenarios where IM discovers a flower model our method
may still be able to characterize some process changes. To investigate the usability
of our method for logs for which IM discovers a flower model, we simulated a log
from a partially unstructured Petri net model shown in Figure 5.32a. We injected a
drift in the log by applying three changes to the model: (i) we made Fragment 1, i.e.
an internally unstructured SESE fragment, and activity ‘h’ mutually-exclusive; (ii) we
deleted activity ‘d’ from Fragment 1; and (iii) we swapped the two activities ‘f’ and ‘g’.
We used the drift detection method presented in Chapter 3 to detect the drift. The
transformation of the pre-drift process tree to the post-drift process tree over this drift
is shown in Figure 5.32b. Our method was able to identify the first two changes, while
it missed the third change. The first change is identified as it is applied to Fragment
1 as a whole and activity ‘h’, while the deletion of activity ‘d’ is identifiable without
any knowledge of the internal structure of Fragment 1. On the other hand, IM over-
generalized the internal behavior of Fragment 1 and discovered a flower model as the
internal model of this fragment. As such, our method missed the third change, i.e. the
swap of the two activities ‘f’ and ‘g’. This experiment showed that our method can still
be used for characterizing external changes as well as certain types of internal changes,
e.g. insertion/deletion of activities, applied to fragments containing flower models.
Figure 5.32: Example of drift characterization in a partially unstructured process. Panels: (a) Petri net models before and after the drift; (b) transformation of the pre-drift process tree to the post-drift process tree over the drift. Annotated changes: “Make Fragment 1 and activity 'h' mutually-exclusive”; “Delete activity 'd'”.
To conclude, the process discovery component of our drift characterization method
is isolated from other components, and as such IM can be replaced by any other process
discovery technique that is able to discover process trees from event streams. There-
fore, future advancements in process discovery from event streams can also enhance
the accuracy of our method.
5.10 Summary
In this chapter, we presented a robust, automated method for characterizing process
drifts at the level of fragments, from event streams. We first adapted a state-of-the-
art process discovery technique, Inductive Miner (IM), to discover process trees, i.e.
block-structured process models, from event streams. Next, we used this technique to
discover two process trees, one from the portion of an event stream just before a given
drift, and the other from the portion of the stream just after the drift. We then presented
a process tree transformation technique that finds a minimum-cost sequence of edit
operations to transform a pre-drift process tree to a post-drift process tree. The search
for such a sequence is guided by means of process tree mappings, and is supported
by two search algorithms, an exhaustive A*-based algorithm and a fast greedy algo-
rithm, which find the optimal solution or a close approximation of it. The definition of
edit operations and their costs is such that the method is able to characterize changes
applied to fragments of any size, from individual activities to larger fragments. More-
over, the hierarchical structure of process trees allows the characterization of complex
changes such as overlapping changes as well as nested changes. Furthermore, as the
edit operations are defined based on a well-established set of typical business process
change patterns, the identified fragment-level changes can easily be translated into con-
cise natural language statements based on those patterns. Finally, the proposed method
can also characterize process drifts detected from event logs of complete traces, and
can also be used on top of any process drift detection technique so long as it is fed with
a pre-drift and a post-drift sub-log.
We extensively evaluated our method for fragment-level drift characterization us-
ing both highly variable artificial logs, with and without noise, as well as two real-life
logs. The results on the artificial logs show that our method is fast, noise-tolerant,
highly accurate and concise in characterizing drifts induced by the application of typi-
cal process changes to fragments of different size. Furthermore, when using the greedy
algorithm for process tree transformation, the method can scale up to the extent that it
can work in real-time. In the experiments with real-life logs, our method could fully
characterize the identified drifts. Despite the lack of a ground truth to validate the re-
sults in the experiment with the log of the ticketing management process, the results
were supported by various observations from the log. For the experiment with the
log of the insurance claims management process, a business analyst who works with
the process in question confirmed our findings. While our fragment-level drift char-
acterization method outperforms the activity-level characterization method presented
in Chapter 4, as far as non-overlapping and overlapping fragments are concerned, the
latter method provides more accurate results when drifts involve individual activities.
Chapter 6
Conclusion
Today’s business processes are designed to support flexibility and change to remain
effective and efficient in the dynamic environments in which they operate. This allows
process stakeholders to change the way in which they execute processes in response
to various factors such as changes in regulations, supply, demand as well as internal
changes in resource capacity or workload, or simply changes in seasonal factors. Some
process changes are planned ahead, while others occur unexpectedly and are often
undocumented. Examples of the latter are process changes undertaken by individuals
as workarounds in emergency situations or changes due to the replacement of human
resources. Such changes over time may reduce process performance and in general
undermine process improvement initiatives.
In this regard, several techniques have been proposed to detect process drifts, i.e.
statistically significant changes in the process behavior. However, existing techniques
have some limitations. First, they do not work with event streams, and as such are not
able to detect intra-trace drifts, or they detect them with a long delay. Although de-
tecting drifts in an offline setting from historical event logs is helpful for post-mortem
analysis, organizations can fully exploit the benefits of drift detection only when it is
deployed in an online setting over streams of events, as it enables process stakeholders
to take timely corrective measures and avoid or reduce the impact of unintended con-
sequences. Furthermore, existing techniques do not perform well with highly-variable
business processes, e.g. hospital processes, whose logs feature high trace variability.
Finally, they only focus on the detection of drifts in event logs without providing any
solution for characterizing process changes underpinning the drifts.
In this thesis we tackled three research questions:
i) How to detect a drift from an event stream or event log of a business process?
ii) How to characterize a process drift at activity level from an event stream or event
log of a business process?
iii) How to characterize a process drift at fragment level from an event stream or event
log of a business process?
To tackle the first research question, we proposed a fully-automated method for de-
tecting drifts from event streams of business processes. We performed statistical tests
over distributions of behavioral relations between activities such as causality, conflict
and concurrency, as observed from two juxtaposed windows of adjustable size, sliding
along with the stream. Given that behavioral relations between activities are a type
of sub-trace features, the method does not suffer from low accuracy when the log is
highly variable. Furthermore, the method is capable of detecting inter-trace as well as
intra-trace drifts. By replaying an event log as an event stream the proposed method
can also be used for detecting drifts in event logs.
To tackle the second research question, we proposed a fully-automated method for
characterizing process drifts at the level of individual activities from event streams.
For each detected drift, we first extract behavioral relations between activities such as
causality, concurrency and conflict, from event streams before and after a drift. By per-
forming a statistical test we then assess the significance of associations between each
behavioral relation and the drift. This allows us to identify relations with the highest
explanatory power with respect to the drift. Those relations are then mapped to a pre-
defined set of change templates. Finally, the best-matching templates are reported to
the user as natural language statements. The collection of change templates that we
use to describe a drift is based on a well-established categorization of common busi-
ness process change patterns and can also easily be extended. Moreover, the method
may also be used on top of any process drift detection technique so long as it is provided
with the point at which a drift occurs. By replaying an event log as an event stream, the
proposed method can as well be used to characterize drifts in event logs. Furthermore,
the method can scale up to the extent that it can work in real-time. To the best of
our knowledge, this is the first method that provides a systematic solution to the drift
characterization problem.
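The selection of the most explanatory relations can be illustrated with the relative frequency change (RFC) quantities listed in Appendix A. The following Python sketch is an illustration under stated assumptions, not the procedure of Chapter 4: it takes the RFC of a relation to be the absolute difference between its relative frequencies before and after the drift (the precise definition is not restated in this chapter), it omits the statistical test of association, and the names select_relations and x are hypothetical. It keeps the highest-ranked relations until their cumulative RFC reaches x · TRFC.

```python
from collections import Counter


def select_relations(pre: Counter, post: Counter, x: float = 0.8):
    """Pick the relations that account for a share x of the total change.

    pre, post: relation counts observed before and after the drift.
    """
    n_pre, n_post = sum(pre.values()), sum(post.values())
    # Assumed RFC: absolute change in relative frequency across the drift.
    rfc = {
        rel: abs(post[rel] / n_post - pre[rel] / n_pre)
        for rel in set(pre) | set(post)
    }
    trfc = sum(rfc.values())            # total relative frequency change
    crfc = x * trfc                     # cumulative threshold
    selected, cumulative = [], 0.0
    for rel, change in sorted(rfc.items(), key=lambda kv: (-kv[1], kv[0])):
        if cumulative >= crfc or change == 0:
            break
        selected.append(rel)
        cumulative += change
    return selected
```

In the thesis the selected relations are subsequently matched against change templates. That step is not sketched here.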
To tackle the third question, we presented a fully-automated method for charac-
terizing process drifts at the level of fragments, from event streams. We first adapted
Inductive Miner to discover process trees from event streams. Next, we used this tech-
nique to discover two process trees, one from the portion of an event stream just before
a given drift, and the other from the portion of the stream just after the drift. We then
presented a process tree transformation technique that finds a minimum-cost sequence
of edit operations to transform a pre-drift process tree to a post-drift process tree. The
search for such a sequence is guided by process tree mappings and is supported
by two search algorithms: an exhaustive A*-based algorithm and a fast greedy
algorithm, which respectively find and closely approximate the optimal solution. The definition
of edit operations and their costs is such that the method is able to characterize changes
applied to fragments of any size, from individual activities to larger fragments. More-
over, the hierarchical structure of process trees allows the characterization of complex
changes such as overlapping changes as well as nested changes. As the edit operations
are defined based on common change patterns, the identified fragment-level changes
can easily be translated into concise natural language statements based on those patterns.
Furthermore, the proposed method can characterize process drifts detected from
event logs of complete traces, and can also be used on top of any process drift detection
technique so long as it is fed with a pre-drift and a post-drift sub-log. Finally,
when using the greedy algorithm for process tree transformation, the method scales
well enough to work in real time. To the best of our knowledge, this is the
first method that can characterize a process drift at the level of fragments.
We implemented the proposed methods as plug-ins for the open-source process
analytics platform Apromore as well as a standalone command-line tool, namely Pro-
Drift. Using the latter, we extensively evaluated the accuracy and scalability of the
three proposed methods by simulating event streams from highly variable artificial
and real-life logs. The results of the experiments show that the three methods sat-
isfy all the evaluation criteria defined in this thesis. Specifically, the proposed drift
detection method is able to detect process drifts induced by the application of com-
mon change patterns with high accuracy and minimum delay. In doing so, it does
not need any manual intervention and can scale up to the extent that it can work in
real-time. The proposed drift characterization methods are able to work without any
manual intervention and can accurately characterize common change patterns via ex-
planatory natural language statements. By comparing the two methods for character-
izing changes involving fragments of different sizes, in event streams with and without
noise, we observed that while both methods can handle noise well, the method pro-
posed in Chapter 4 is well-suited for non-overlapping activity-level changes, as it uses
features with lower levels of abstraction to capture the process behavior, and benefits
from a statistically-grounded mechanism for identifying change patterns that best ex-
plain a drift. On the other hand, the method proposed in Chapter 5 performs better
at characterizing changes applied to fragments of multiple activities as well as nested
changes. The accuracy of the latter method depends on the accuracy of the process
trees that Inductive Miner (IM) discovers from the stream before and after a drift. As such, this method is capable
of identifying a complete set of process changes underpinning a drift so long as they
manifest themselves in the discovered process trees.
There are several ways the work presented in this thesis can be extended. As de-
scribed in Chapter 2, there are four classes of drifts: sudden, gradual, recurring and
incremental. The drift detection method in Chapter 3 focuses on detecting sudden
drifts. As such, when applied to event streams containing other classes of drifts it
either fails to detect them or it detects them as sudden drifts. Therefore, an avenue
for future work is to devise methods for detecting other classes of drifts from event
streams. For gradual drifts, our method may easily be extended with the same strategy
used in [78] for detecting gradual drifts in trace streams, namely applying a statistical
test to the behavioral relations between two consecutive sudden drifts to determine whether
those sudden drifts represent separate changes or define the start and end of a
single gradual drift. A recurring drift may also be detected by first detecting two con-
secutive sudden drifts and then using a statistical test to determine if the distributions of
behavioral relations before the first and after the second drift are the same. Similarly,
an incremental drift may be identified by detecting a sequence of minor sudden drifts
using statistical tests on the distributions of behavioral relations in smaller windows
sliding over the event stream between two major sudden drifts.
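The recurring-drift check outlined above can be sketched as follows. This is a hypothetical illustration: no specific test is prescribed here, so the sketch swaps in a permutation test on the total variation distance between relation frequencies, and the names is_recurring, same_distribution and tv_distance are assumptions. It treats two consecutive sudden drifts as one recurring drift when the relations observed before the first drift and after the second one are statistically indistinguishable.

```python
import random
from collections import Counter


def tv_distance(xs, ys):
    """Total variation distance between the relation frequencies of two samples."""
    cx, cy = Counter(xs), Counter(ys)
    return 0.5 * sum(
        abs(cx[k] / len(xs) - cy[k] / len(ys)) for k in set(cx) | set(cy)
    )


def same_distribution(xs, ys, alpha=0.05, rounds=999, seed=0):
    """Permutation test: True if equality of the two distributions is not rejected."""
    rng = random.Random(seed)
    observed = tv_distance(xs, ys)
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        # Count reshuffled splits that differ at least as much as the real split.
        if tv_distance(pooled[:len(xs)], pooled[len(xs):]) >= observed:
            hits += 1
    p_value = (hits + 1) / (rounds + 1)
    return p_value >= alpha


def is_recurring(before_first, after_second):
    """before_first / after_second: relations seen before drift 1 and after drift 2."""
    return same_distribution(before_first, after_second)
```

The same helper could serve the gradual-drift check by comparing other pairs of segments, although the choice of segments and test would need to follow [78].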
Another avenue for future work is to characterize other classes of drifts. In this
respect, it would be particularly interesting to characterize gradual drifts, to understand how
process behavior transitions over time, and incremental drifts, to identify the process
changes in each increment.
The drift detection and characterization methods presented in this thesis assume
no a-priori knowledge of process models, i.e. they only rely on event data to detect
and characterize a process drift. However, some of the identified drifts may correspond to
behavior that is already captured in a process model. As such, by looking at a normative model
of the process, we can differentiate between actual drifts in the process behavior and
drifts caused by the observation of process behavior that is modeled but has not been
executed before (false positives). Therefore, another direction for future work is to en-
hance the proposed methods to benefit from valuable information provided by existing
process models when detecting and characterizing process drifts. A starting point here
would be to use a conformance checking technique [117, 123, 86, 41, 101] to verify
how much of the process behavior observed in the event data after a detected drift can
be replayed by a corresponding process model.
A change may impact more than one process within an organization, e.g. a new
regulation requires a new check in multiple processes of an organization. Therefore, a
further direction for future work is to study the relation between changes in different
processes within an organization and develop methods to identify organizational level
changes. For example, discovering drifts over the same time period in the behavior
of multiple processes of an organization may indicate the existence of an underlying
change on the organizational level during that time period.
Another opportunity for future work is to study the interplay between changes
in the process control flow and changes in other process perspectives. For example,
changes in the control flow may be induced by changes in the data or resource per-
spectives of the process. A starting point is to look at the work in [94], which analyses
the dynamics of human resource behavior as observed from event logs, as well as the
time series analysis approach in [53], which detects cause-effect relations between a
set of business process characteristics and process performance indicators.
In Chapter 4, we pre-defined a set of change templates and proposed a method to
identify valid instantiations of those templates to characterize a drift. However, drifts
may also occur due to changes that follow a different pattern than those already defined.
Consequently, such drifts do not engender the instantiation of any of those pre-defined
change templates. Therefore, another avenue for future work is to develop a technique
to automatically learn new change templates from detected drifts.
Another avenue for future work is to provide a visual description of the change
patterns identified by a drift characterization method as a simple and effective way to
communicate the characteristics of the drift, as in [8, 21].
Appendix A
Notation
The notation used in this thesis is summarized below.
Notation                    Meaning
L                           Set of activity labels
α+                          α+ algorithm [25]
a >_L b                     Directly follows relation from label a to label b in log L
a △_L b                     Length-two loop relation from label a to label b in log L
a ◇_L b                     Length-two loop relation between label a and label b in log L
a →_L b                     Causality relation from label a to label b in log L
a ‖_L b                     Parallel relation between label a and label b in log L
a #_L b                     Conflict relation between label a and label b in log L
Φ(w)                        Returns the size of the oscillation filter, i.e. the number of consecutive
                            statistical tests whose P-value should remain below a certain threshold to
                            declare a drift
RFC                         Relative frequency change of an α+ relation over a drift
TRFC                        Total relative frequency change, i.e. the sum of RFCs of all α+ relations
                            over a drift
CRFC                        Cumulative relative frequency change, computed by x% · TRFC, where
                            x% indicates the proportion of TRFC over a drift that is used for drift
                            characterization
Rank(b, ≺, B)               Returns the index of b in finite set B with total order ≺
I_{D,T}                     Valid instantiation of template T through drift feature set D, mapping
                            every variable in T to a label from D
C(I_{D,T})                  Confidence of drift feature set D matching template T through I_{D,T},
                            indicating the quality of ranking relations in D with regard to their
                            predefined importance in T
iC(T)                       Ideal confidence of template T, indicating the highest possible confidence
                            for T
nC(I_{D,T})                 Normalized confidence of drift feature set D matching template T
                            through I_{D,T}, obtained by dividing C(I_{D,T}) by iC(T)
V(T)                        Set of nodes in tree T
E(T)                        Set of edges in tree T
|T|                         Size of tree T, equaling |V(T)|
root(T)                     Root node of tree T
T〈v〉                       Subtree of tree T rooted at node v ∈ T
Down_T(v)                   Sequence of nodes on the shortest path from root(T) to node v ∈ T
leaves(v)                   Set of leaves under internal node v
l(v)                        Label of node v
dep(v)                      Depth of node v ∈ T, equaling |Down_T(v)| − 1
dep(T)                      Depth of tree T, equaling the maximum depth of its nodes
CA(v1, …, vn)               Set of common ancestors of nodes v1, …, vn in tree T, i.e. nodes in
                            Down_T(v1) ∩ … ∩ Down_T(vn)
LCA(v1, …, vn)              Lowest common ancestor of nodes v1, …, vn in tree T, i.e. the deepest
                            node in CA(v1, …, vn)
LCA(T〈v1〉, …, T〈vn〉)      Lowest common ancestor of subtrees T〈v1〉, …, T〈vn〉, i.e.
                            LCA(v1, …, vn)
×                           Exclusive choice operator
∧                           Concurrency operator
→                           Sequence operator
⟲                           Loop operator
P = ⊕(P1, …, Pn)            Process tree P rooted at operator node ⊕ with subtrees P1, …, Pn
τ-node                      Leaf node t in process tree, representing the language with the empty
                            trace, l(t) ∈ {τ}
C(v)                        Set of activity nodes under operator node v ∈ P, containing the activity
                            nodes in P〈v〉
pre_P(v)                    Pre-order index of node v in process tree P
P[i]                        Node with the pre-order index of i in P
Rankk(v, ⊕)                 Returns the rank of node v in ordered operator node ⊕
S                           Singularity reduction rule
A_×                         Associativity reduction rule for × operator
A_∧                         Associativity reduction rule for ∧ operator
A_→                         Associativity reduction rule for → operator
T_→                         τ reduction rule for → operator
T_∧                         τ reduction rule for ∧ operator
SUB_⊕                       Operator substitution edit operation
SUB_ac                      Activity substitution edit operation
D_f                         Fragment deletion edit operation
D_⟲                         ⟲-operator deletion edit operation
I_f                         Fragment insertion edit operation
I_⟲                         ⟲-operator insertion edit operation
LMAs_M(v, v′)               Lowest mapped ancestors (LMAs) of nodes v and v′ in mapping M
MST(P, P′)                  Mapping search tree between process trees P and P′
g*(v)                       Returns the mapping cost up to node v in a mapping search tree
h*(v)                       Returns an estimation of the cost of mapping nodes in P and P′ that have
                            not yet been mapped up to node v ∈ MST(P, P′)
stringify(F)                Returns a unique and stable textual representation of fragment F
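For readers who prefer code, a few of the tree notations above can be mirrored in Python. The sketch below is an illustration under assumptions: the Node class and the function names down, dep, lca and preorder are hypothetical stand-ins for Down_T(v), dep(v), LCA(v1, …, vn) and the pre-order traversal behind pre_P(v), and positions in the traversal are 0-based here, which may differ from the convention used in the thesis.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)  # identity-based equality: two nodes may share a label
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

    def add(self, child: "Node") -> "Node":
        """Attach child as the last child of this node and return it."""
        child.parent = self
        self.children.append(child)
        return child


def down(v: Node) -> List[Node]:
    """Nodes on the path from the root to v (inclusive)."""
    path = []
    node: Optional[Node] = v
    while node is not None:
        path.append(node)
        node = node.parent
    return path[::-1]


def dep(v: Node) -> int:
    """Depth of v, i.e. the number of edges between the root and v."""
    return len(down(v)) - 1


def lca(*nodes: Node) -> Optional[Node]:
    """Deepest node shared by the root paths of all given nodes."""
    deepest = None
    for level in zip(*(down(v) for v in nodes)):
        if all(n is level[0] for n in level):
            deepest = level[0]
        else:
            break
    return deepest


def preorder(root: Node) -> Iterator[Node]:
    """Yield the nodes of the tree in pre-order."""
    yield root
    for child in root.children:
        yield from preorder(child)
```

For the process tree →(a, ×(b, c), d), the node b has depth 2 under these definitions, and the lowest common ancestor of b and c is the × node.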
Bibliography
[1] R. Accorsi and T. Stocker. Discovering workflow changes with time-based trace
clustering. In Data-Driven Process Discovery and Analysis. Springer, 2012.
[2] I. Ada and M. R. Berthold. Eve: a framework for event detection. Evolving
systems, 4(1):61–70, 2013.
[3] M. Adams, A. H. ter Hofstede, D. Edmond, and W. M. van der Aalst. Worklets:
A service-oriented implementation of dynamic flexibility in workflows. In OTM
Confederated International Conferences "On the Move to Meaningful Internet
Systems", pages 291–308. Springer, 2006.
[4] M. J. Adams. Facilitating dynamic flexibility and exception handling for work-
flows. PhD thesis, Queensland University of Technology, 2007.
[5] T. Akutsu, D. Fukagawa, A. Takasu, and T. Tamura. Exact algorithms for com-
puting the tree edit distance between unordered trees. Theoretical Computer
Science, 412(4-5):352–364, 2011.
[6] H. H. Ang, V. Gopalkrishnan, I. Žliobaitė, M. Pechenizkiy, and S. C. Hoi. Predictive
handling of asynchronous concept drifts in distributed environments.
IEEE Transactions on Knowledge and Data Engineering, 25(10):2343–2355,
2013.
[7] P. Arabie and L. J. Hubert. An overview of combinatorial data. Clustering and
classification, page 5, 1996.
[8] A. Armas-Cervantes, P. Baldan, M. Dumas, and L. García-Bañuelos. Behavioral
comparison of process models based on canonically reduced event structures. In
BPM. Springer, 2014.
[9] P. Berkhin. A survey of clustering data mining techniques. In Grouping multi-
dimensional data, pages 25–71. Springer, 2006.
[10] A. Bolt, M. de Leoni, and W. M. van der Aalst. Process variant comparison: Us-
ing event logs to detect differences in behavior and business rules. Information
Systems, 2017.
[11] R. P. J. C. Bose, W. M. P. van der Aalst, I. Žliobaitė, and M. Pechenizkiy. Handling
concept drift in process mining. In CAiSE. Springer, 2011.
[12] R. P. J. C. Bose, W. M. P. van der Aalst, I. Žliobaitė, and M. Pechenizkiy. Dealing
with concept drifts in process mining. IEEE Transactions on NNLS, 2014.
[13] A. Bouchachia. Fuzzy classification in dynamic environments. Soft Computing,
15(5):1009–1022, 2011.
[14] J. C. Buijs, M. La Rosa, H. A. Reijers, B. F. van Dongen, and W. M. van der
Aalst. Improving business process models using observed behavior. In Inter-
national Symposium on Data-Driven Process Discovery and Analysis, pages
44–59. Springer, 2012.
[15] J. C. Buijs, B. F. van Dongen, and W. M. van der Aalst. On the role of fitness,
precision, generalization and simplicity in process discovery. In OTM Confederated
International Conferences "On the Move to Meaningful Internet Systems",
pages 305–322. Springer, 2012.
[16] A. Burattin, M. Cimitile, F. M. Maggi, and A. Sperduti. Online Discovery of
Declarative Process Models from Event Streams. IEEE Trans. on Services Com-
puting, 8:833–846, 2015.
[17] A. Burattin, A. Sperduti, and W. M. van der Aalst. Control-flow discovery from
event streams. In Evolutionary Computation (CEC), 2014 IEEE Congress on,
pages 2420–2427. IEEE, 2014.
[18] J. Carmona and R. Gavaldà. Online techniques for dealing with concept drift
in process mining. In International Symposium on Intelligent Data Analysis.
Springer, 2012.
[19] P. Castagliola, G. Celano, S. Fichera, and G. Nenes. The variable sample size
t control chart for monitoring short production runs. The International Journal
of Advanced Manufacturing Technology, 66(9-12):1353–1366, 2013.
[20] G. Celano, A. Costa, and S. Fichera. Statistical design of variable sample size
and sampling interval x control charts with run rules. The International Journal
of Advanced Manufacturing Technology, 28(9-10):966–977, 2006.
[21] A. A. Cervantes, N. R. van Beest, M. La Rosa, M. Dumas, and L. García-Bañuelos.
Interactive and incremental business process model repair. In OTM
Confederated International Conferences "On the Move to Meaningful Internet
Systems", pages 53–74. Springer, 2017.
[22] R. Conforti, M. La Rosa, and A. H. ter Hofstede. Filtering out infrequent be-
havior from business process event logs. IEEE Transactions on Knowledge and
Data Engineering, 29(2):300–314, 2017.
[23] T. Dasu, S. Krishnan, S. Venkatasubramanian, and K. Yi. An information-theoretic
approach to detecting changes in multi-dimensional data streams. In
Proc. Symp. on the Interface of Statistics, Computing Science, and Applications.
Citeseer, 2006.
[24] I. Davies, P. Green, M. Rosemann, M. Indulska, and S. Gallo. How do practi-
tioners use conceptual modeling in practice? Data & Knowledge Engineering,
58(3):358–380, 2006.
[25] A. A. de Medeiros, B. F. van Dongen, W. M. P. Van der Aalst, and A. Weijters.
Process mining: Extending the α-algorithm to mine short loops. Technical
report, 2004.
[26] J. de San Pedro and J. Cortadella. Discovering duplicate tasks in transition
systems for the simplification of process models. In International Conference
on Business Process Management, pages 108–124. Springer, 2016.
[27] E. D. Demaine, S. Mozes, B. Rossman, and O. Weimann. An optimal de-
composition algorithm for tree edit distance. ACM Transactions on Algorithms
(TALG), 6(1):2, 2009.
[28] A. Dries and U. Rückert. Adaptive concept drift detection. Statistical Analysis
and Data Mining: The ASA Data Science Journal, 2(5-6):311–327, 2009.
[29] R. O. Duda, P. E. Hart, et al. Pattern classification and scene analysis, volume 3.
Wiley New York, 1973.
[30] C. Dugast, P. Beyerlein, and R. Haeb-Umbach. Application of clustering tech-
niques to mixture density modelling for continuous-speech recognition. In
Acoustics, Speech, and Signal Processing, 1995. ICASSP-95., 1995 Interna-
tional Conference on, volume 1, pages 524–527. IEEE, 1995.
[31] S. Dulucq and H. Touzet. Decomposition algorithms for the tree edit distance
problem. Journal of Discrete Algorithms, 3(2):448–471, 2005.
[32] M. Dumas, M. La Rosa, J. Mendling, H. A. Reijers, et al. Fundamentals of
business process management, volume 1. Springer, 2013.
[33] R. Elwell and R. Polikar. Incremental learning of concept drift in nonstation-
ary environments. IEEE Transactions on Neural Networks, 22(10):1517–1531,
2011.
[34] D. Fahland and W. M. van der Aalst. Model repair—aligning process models to
reality. Information Systems, 47:220–243, 2015.
[35] G. Forman. Tackling concept drift by temporal inductive transfer. In Proceed-
ings of the 29th annual international ACM SIGIR conference on Research and
development in information retrieval, pages 252–259. ACM, 2006.
[36] E. Frank and I. H. Witten. Using a permutation test for attribute selection in de-
cision trees. In International Conference on Machine Learning. Morgan Kauf-
mann, 1998.
[37] D. Fukagawa, T. Tamura, A. Takasu, E. Tomita, and T. Akutsu. A clique-based
method for the edit distance between unordered trees and its application to anal-
ysis of glycan structures. BMC bioinformatics, 12(1):S13, 2011.
[38] J. Gama, P. Medas, G. Castillo, and P. Rodrigues. Learning with drift detec-
tion. In Brazilian symposium on artificial intelligence, pages 286–295. Springer,
2004.
[39] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia. A survey
on concept drift adaptation. ACM Computing Surveys (CSUR), 2014.
[40] J. Gao, W. Fan, J. Han, and P. S. Yu. A general framework for mining concept-
drifting data streams with skewed distributions. In Proceedings of the 2007
SIAM International Conference on Data Mining, pages 3–14. SIAM, 2007.
[41] L. García-Bañuelos, N. R. van Beest, M. Dumas, M. La Rosa, and W. Mertens.
Complete and interpretable conformance checking of business processes. IEEE
Transactions on Software Engineering, 44(3):262–290, 2018.
[42] J. Gebauer and F. Schober. Information system flexibility and the cost effi-
ciency of business processes. Journal of the Association for Information Sys-
tems, 7(3):8, 2006.
[43] J. B. Gomes, E. Menasalvas, and P. A. Sousa. Learning recurring concepts from
data streams with a context-aware ensemble. In Proceedings of the 2011 ACM
symposium on applied computing, pages 994–999. ACM, 2011.
[44] C. W. Günther, S. Rinderle-Ma, M. Reichert, W. M. van der Aalst, and J.
Recker. Using process mining to learn from process changes in evolutionary
systems. International Journal of Business Process Integration and Manage-
ment, 3(1):61–78, 2008.
[45] T. Hagerup and C. Rüb. A guided tour of Chernoff bounds. Information Processing
Letters, 33(6):305–308, 1990.
[46] J. Han, M. Kamber, and J. Pei. Data mining, southeast asia edition: Concepts
and techniques. Morgan kaufmann, 2006.
[47] S. Haridy, A. Maged, S. Kaytbay, and S. Araby. Effect of sample size on the
performance of shewhart control charts. The International Journal of Advanced
Manufacturing Technology, 90(1-4):1177–1185, 2017.
[48] P. Harremoës and G. Tusnády. Information divergence is more χ²-distributed
than the χ²-statistics. IEEE ISIT, pages 533–537, 2012.
[49] P. Heinl, S. Horn, S. Jablonski, J. Neeb, K. Stein, and M. Teschke. A com-
prehensive approach to flexibility in workflow management systems. In ACM
SIGSOFT Software Engineering Notes, volume 24, pages 79–88. ACM, 1999.
[50] D. P. Helmbold and P. M. Long. Tracking drifting concepts by minimizing
disagreements. Machine learning, 14(1):27–45, 1994.
[51] S. Higuchi, T. Kan, Y. Yamamoto, and K. Hirata. An a* algorithm for computing
edit distance between rooted labeled unordered trees. In JSAI-isAI Workshops,
pages 186–196. Springer, 2011.
[52] S.-S. Ho. A martingale framework for concept change detection in time-varying
data streams. In Proc. of ICML, pages 321–327. ACM, 2005.
[53] B. F. Hompes, A. Maaradji, M. La Rosa, M. Dumas, J. C. Buijs, and W. M.
van der Aalst. Discovering causal factors explaining business process perfor-
mance variation. In International Conference on Advanced Information Systems
Engineering, pages 177–192. Springer, 2017.
[54] Y. Horesh, R. Mehr, and R. Unger. Designing an a* algorithm for calculating
edit distance between rooted-unordered trees. Journal of Computational Biol-
ogy, 13(6):1165–1176, 2006.
[55] K. Järvelin and J. Kekäläinen. Cumulated gain-based evaluation of IR techniques.
ACM Transactions on Information Systems (TOIS), 2002.
[56] K. Jensen. Coloured petri nets. In Petri nets: central models and their proper-
ties, pages 248–299. Springer, 1987.
[57] P. N. Klein. Computing the edit-distance between unrooted ordered trees. In
ESA, volume 98, pages 91–102. Springer, 1998.
[58] R. Klinkenberg. Learning drifting concepts: Example selection vs. example
weighting. Intelligent data analysis, 8(3):281–300, 2004.
[59] R. Klinkenberg and T. Joachims. Detecting concept drift with support vector
machines. In ICML, pages 487–494, 2000.
[60] J. Z. Kolter and M. A. Maloof. Dynamic weighted majority: A new ensemble
method for tracking concept drift. In Data Mining, 2003. ICDM 2003. Third
IEEE International Conference on, pages 123–130. IEEE, 2003.
[61] S. Kondo, K. Otaki, M. Ikeda, and A. Yamamoto. Fast computation of the
tree edit distance between unordered trees using ip solvers. In International
Conference on Discovery Science, pages 156–167. Springer, 2014.
[62] P. Kosina, J. Gama, and R. Sebastião. Drift severity metric. In ECAI, pages
1119–1120, 2010.
[63] S. Kullback and R. A. Leibler. On information and sufficiency. The annals of
mathematical statistics, 22(1):79–86, 1951.
[64] M. La Rosa, M. Dumas, R. Uba, and R. Dijkman. Business process model
merging: An approach to business process consolidation. ACM Transactions on
Software Engineering and Methodology (TOSEM), 22(2):11, 2013.
[65] M. La Rosa, H. A. Reijers, W. M. van der Aalst, R. M. Dijkman, J. Mendling,
M. Dumas, and L. García-Bañuelos. Apromore: An advanced process model
repository. Expert Systems with Applications, 38(6):7029–7040, 2011.
[66] C. Lanquillon. Enhancing text classification to improve information filtering.
PhD thesis, Otto-von-Guericke-Universität Magdeburg, Universitätsbibliothek,
2001.
[67] M. M. Lazarescu, S. Venkatesh, and H. H. Bui. Using multiple windows to track
concept drift. Intelligent data analysis, 8(1):29–59, 2004.
[68] S. J. J. Leemans, D. Fahland, and W. M. P. van der Aalst. Discovering block-
structured process models from event logs - A constructive approach. In J. M.
Colom and J. Desel, editors, Application and Theory of Petri Nets and Concur-
rency - 34th International Conference, PETRI NETS 2013, Milan, Italy, June
24-28, 2013. Proceedings, volume 7927 of Lecture Notes in Computer Science,
pages 311–329. Springer, 2013.
[69] S. J. J. Leemans, D. Fahland, and W. M. P. van der Aalst. Discovering block-
structured process models from event logs containing infrequent behaviour. In
N. Lohmann, M. Song, and P. Wohed, editors, Business Process Management
Workshops - BPM 2013 International Workshops, Beijing, China, August 26,
2013, Revised Papers, volume 171 of Lecture Notes in Business Information
Processing, pages 66–78. Springer, 2013.
[70] S. J. J. Leemans, D. Fahland, and W. M. P. van der Aalst. Discovering block-
structured process models from incomplete event logs. In G. Ciardo and E.
Kindler, editors, Application and Theory of Petri Nets and Concurrency - 35th
International Conference, PETRI NETS 2014, Tunis, Tunisia, June 23-27, 2014.
Proceedings, volume 8489 of Lecture Notes in Computer Science, pages 91–
110. Springer, 2014.
[71] S. J. Leemans, D. Fahland, and W. M. P. van der Aalst. Discovering block-
structured process models from event logs-a constructive approach. In Interna-
tional Conference on Applications and Theory of Petri Nets and Concurrency.
Springer, 2013.
[72] S. Leemans. Robust process mining with guarantees. PhD thesis,
Eindhoven University of Technology, 2017.
[73] V. Leno, A. Armas-Cervantes, M. Dumas, M. La Rosa, and F. M. Maggi. Dis-
covering process maps from event streams. In Proceedings of the 2018 Interna-
tional Conference on Software and System Process, pages 86–95. ACM, 2018.
[74] T. Li, T. He, Z. Wang, Y. Zhang, and D. Chu. Unraveling process evolution by
handling concept drifts in process mining. In Services Computing (SCC), 2017
IEEE International Conference on, pages 442–449. IEEE, 2017.
[75] X. Lu, D. Fahland, F. J. van den Biggelaar, and W. M. van der Aalst. Handling
duplicated tasks in process discovery by refining event labels. In International
Conference on Business Process Management, pages 90–107. Springer, 2016.
[76] H. Luo and Z. Wu. Optimal np control charts with variable sample sizes or
variable sampling intervals. Economic Quality Control, 17(1):39–61, 2002.
[77] A. Maaradji, M. Dumas, M. La Rosa, and A. Ostovar. Fast and Accurate Busi-
ness Process Drift Detection. In Proc. of BPM, 2015.
[78] A. Maaradji, M. Dumas, M. La Rosa, and A. Ostovar. Detecting sudden and
gradual drifts in business processes from execution traces. IEEE Transactions
on Knowledge and Data Engineering, 29(10):2140–2154, 2017.
[79] F. M. Maggi, A. Burattin, M. Cimitile, and A. Sperduti. Online process discov-
ery to detect concept drifts in ltl-based declarative process models. In On the
Move to Meaningful Internet Systems: OTM 2013 Conferences, pages 94–111.
Springer, 2013.
[80] O. Maimon and L. Rokach. Data mining and knowledge discovery handbook,
volume 2. Springer, 2005.
[81] M. Maisenbacher and M. Weidlich. Handling concept drift in predictive process
monitoring. In Services Computing (SCC), 2017 IEEE International Conference
on, pages 1–8. IEEE, 2017.
[82] J. Martjushev, R. J. C. Bose, and W. M. van der Aalst. Change point detec-
tion and dealing with gradual and multi-order dynamics in process mining. In
International Conference on Business Informatics Research, pages 161–178.
Springer, 2015.
[83] D. Massart and L. Kaufman. The interpretation of analytical chemical data by
the use of cluster analysis. Chemical analysis. Wiley, 1983.
[84] S. Menard. Applied logistic regression analysis. Sage, 2002.
[85] L. L. Minku, A. P. White, and X. Yao. The impact of diversity on online ensem-
ble learning in the presence of concept drift. IEEE Transactions on knowledge
and Data Engineering, 22(5):730–742, 2010.
[86] J. Munoz-Gama, J. Carmona, and W. M. Van Der Aalst. Single-entry single-exit
decomposed conformance checking. Information Systems, 46:102–122, 2014.
[87] S. Muthukrishnan, E. van den Berg, and Y. Wu. Sequential change detection
on data streams. In Data Mining Workshops, 2007. ICDM Workshops 2007.
Seventh IEEE International Conference on, pages 551–550. IEEE, 2007.
[88] K. Nishida and K. Yamauchi. Detecting concept drift using statistical testing. In
International conference on discovery science, pages 264–269. Springer, 2007.
[89] R. Nuzzo. Statistical errors. Nature, 506(13):150–152, 2014.
[90] E. S. Page. Continuous inspection schemes. Biometrika, 41(1/2):100–115,
1954.
[91] M. Pawlik and N. Augsten. Rted: a robust algorithm for the tree edit distance.
Proceedings of the VLDB Endowment, 5(4):334–345, 2011.
[92] M. Pesic and W. M. Van der Aalst. A declarative approach for flexible busi-
ness processes management. In International conference on business process
management, pages 169–180. Springer, 2006.
[93] C. A. Petri. Kommunikation mit automaten. 1962.
[94] A. Pika, M. T. Wynn, C. J. Fidge, A. H. ter Hofstede, M. Leyer, and W. M. P.
van der Aalst. An extensible framework for analysing resource behaviour us-
ing event logs. In International Conference on Advanced Information Systems
Engineering, pages 564–579. Springer, 2014.
[95] A. Polyvyanyy, S. Smirnov, and M. Weske. On application of structural de-
composition for process model abstraction. In BPSC, pages 110–122. Citeseer,
2009.
[96] A. Polyvyanyy, W. M. Van Der Aalst, A. H. Ter Hofstede, and M. T. Wynn.
Impact-driven process model repair. ACM Transactions on Software Engineer-
ing and Methodology (TOSEM), 25(4):28, 2017.
[97] K. B. Pratt and G. Tschapek. Visualizing concept drift. In Proc. of the ninth
ACM SIGKDD international conference on knowledge discovery and data min-
ing. ACM, 2003.
[98] D. Redlich, T. Molka, W. Gilani, G. S. Blair, and A. Rashid. Scalable dynamic
business process discovery with the constructs competition miner. In SIMPDA,
pages 91–107, 2014.
[99] M. Reichert and B. Weber. Enabling flexibility in process-aware information
systems: challenges, methods, technologies. Springer Science & Business Me-
dia, 2012.
[100] H. A. Reijers. Design and control of workflow processes: business process
management for the service industry. Springer-Verlag, 2003.
[101] D. Reißner, R. Conforti, M. Dumas, M. La Rosa, and A. Armas-Cervantes. Scalable
conformance checking of business processes. In OTM Confederated International
Conferences "On the Move to Meaningful Internet Systems", pages
607–627. Springer, 2017.
[102] M. R. Reynolds Jr and J. C. Arnold. Ewma control charts with variable sample
sizes and variable sampling intervals. IIE transactions, 33(6):511–530, 2001.
[103] S. Roberts. Control chart tests based on geometric moving averages.
Technometrics, 1(3):239–250, 1959.
[104] G. J. Ross, N. M. Adams, D. K. Tasoulis, and D. J. Hand. Exponentially
weighted moving average charts for detecting concept drift. Pattern recognition
letters, 33(2):191–198, 2012.
[105] H. Schonenberg, R. Mans, N. Russell, N. Mulyar, and W. van der Aalst. Process
flexibility: A survey of contemporary approaches. In Advances in enterprise
engineering I, pages 16–30. Springer, 2008.
[106] D. W. Scott. Multivariate density estimation: theory, practice, and visualization,
volume 383. John Wiley & Sons, 2009.
[107] R. Sebastiao and J. Gama. Change detection in learning histograms from data
streams. In Portuguese Conference on Artificial Intelligence, pages 112–123.
Springer, 2007.
[108] D. Shasha, J.-L. Wang, K. Zhang, and F. Y. Shih. Exact and approximate
algorithms for unordered tree matching. IEEE Transactions on Systems, Man, and
Cybernetics, 24(4):668–678, 1994.
[109] W. A. Shewhart. Economic control of quality of manufactured product. ASQ
Quality Press, 1931.
[110] A. N. Shiryaev. On optimum methods in quickest detection problems. Theory
of Probability & Its Applications, 8(1):22–46, 1963.
[111] A. Shiryaev. The problem of the most rapid detection of a disturbance in a
stationary process. In Soviet Math. Dokl, volume 2, 1961.
[112] A. Shiryaev. On stochastic models and optimal methods in the quickest
detection problems. Theory of Probability & Its Applications, 53(3):385–401, 2009.
[113] K.-C. Tai. The tree-to-tree correction problem. Journal of the ACM (JACM),
26(3):422–433, 1979.
[114] A. Tsymbal. The problem of concept drift: definitions and related work.
Computer Science Department, Trinity College Dublin, 106(2), 2004.
[115] N. R. van Beest, M. Dumas, L. García-Bañuelos, and M. La Rosa. Log delta
analysis: Interpretable differencing of business process event logs. In Proc. of
BPM. Springer, 2015.
[116] W. Van Der Aalst. Process mining: discovery, conformance and enhancement
of business processes. Springer Science & Business Media, 2011.
[117] W. Van der Aalst, A. Adriansyah, and B. van Dongen. Replaying history on
process models for conformance checking and performance analysis. Wiley
Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(2):182–192,
2012.
[118] W. Van der Aalst, T. Weijters, and L. Maruster. Workflow mining: Discovering
process models from event logs. IEEE Transactions on Knowledge & Data
Engineering, (9):1128–1142, 2004.
[119] W. M. P. van der Aalst. Process Mining: Discovery, Conformance and
Enhancement of Business Processes. Springer, 2011.
[120] W. M. Van der Aalst. The application of Petri nets to workflow management.
Journal of circuits, systems, and computers, 8(01):21–66, 1998.
[121] W. M. van Der Aalst, M. Pesic, and H. Schonenberg. Declarative workflows:
Balancing between flexibility and support. Computer Science-Research and
Development, 23(2):99–113, 2009.
[122] S. J. van Zelst, M. F. Sani, A. Ostovar, R. Conforti, and M. La Rosa. Filtering
spurious events from event streams of business processes. 2018.
[123] S. K. vanden Broucke, J. Munoz-Gama, J. Carmona, B. Baesens, and
J. Vanthienen. Event-based real-time decomposed conformance analysis. In OTM
Confederated International Conferences "On the Move to Meaningful Internet
Systems", pages 345–363. Springer, 2014.
[124] E. Verbeek, J. C. A. M. Buijs, B. F. van Dongen, and W. M. P. van der Aalst.
ProM 6: The process mining toolkit. In M. La Rosa, editor, Proceedings of the
Business Process Management 2010 Demonstration Track, Hoboken, NJ, USA,
September 14-16, 2010, volume 615 of CEUR Workshop Proceedings.
CEUR-WS.org, 2010.
[125] A. R. Hevner, S. T. March, J. Park, and S. Ram. Design science in information
systems research. MIS quarterly, 28(1):75–105, 2004.
[126] P. Vorburger and A. Bernstein. Entropy-based concept shift detection. In Data
Mining, 2006. ICDM’06. Sixth International Conference on, pages 1113–1118.
IEEE, 2006.
[127] A. Wald. Sequential analysis. Courier Corporation, 1973.
[128] G. I. Webb, R. Hyde, H. Cao, H. L. Nguyen, and F. Petitjean. Characterizing
concept drift. Data Mining and Knowledge Discovery, 2016.
[129] B. Weber, M. Reichert, and S. Rinderle-Ma. Change patterns and change
support features – enhancing flexibility in process-aware information systems. DKE,
2008.
[130] A. Weijters, W. M. van Der Aalst, and A. A. De Medeiros. Process mining with
the heuristics miner-algorithm. Technische Universiteit Eindhoven, Tech. Rep.
WP, 166:1–34, 2006.
[131] M. Weske. Business process management architectures. In Business Process
Management, pages 333–371. Springer, 2012.
[132] G. Widmer and M. Kubat. Learning in the presence of concept drift and hidden
contexts. Machine learning, 23(1):69–101, 1996.
[133] Z. Wu, M. Yang, M. B. Khoo, and P. Castagliola. What are the best sample sizes
for the Xbar and CUSUM charts? International Journal of Production Economics,
131(2):650–662, 2011.
[134] F. Zhang and E. D’Hollander. Using hammock graphs to structure programs.
IEEE Transactions on Software Engineering, 30(4):231–245, 2004.
[135] K. Zhang. A constrained edit distance between unordered labeled trees.
Algorithmica, 15(3):205–222, 1996.
[136] K. Zhang and T. Jiang. Some MAX SNP-hard results concerning unordered
labeled trees. Information Processing Letters, 49(5):249–254, 1994.
[137] K. Zhang and D. Shasha. Simple fast algorithms for the editing distance
between trees and related problems. SIAM journal on computing,
18(6):1245–1262, 1989.
[138] K. Zhang, R. Statman, and D. Shasha. On the editing distance between
unordered labeled trees. Information processing letters, 42(3):133–139, 1992.
[139] I. Zliobaite, A. Bifet, M. Gaber, B. Gabrys, J. Gama, L. Minku, and K. Musial.
Next challenges for adaptive learning systems. ACM SIGKDD Explorations
Newsletter, 14(1):48–55, 2012.