
Business Process Drift: Detection and Characterization

Alireza Ostovar

BSc. (Software engineering), MSc. (Software engineering)

A dissertation submitted for the degree of

IF49 Doctor of Philosophy

Principal Supervisor:

Prof. Marcello La Rosa (The University of Melbourne)

Associate Supervisors:

Prof. Arthur ter Hofstede (Queensland University of Technology),

Dr. Abderrahmane Maaradji (Algiers University 1)

Business Process Management Discipline

Information Systems School

Science and Engineering Faculty

Queensland University of Technology (QUT)

GPO Box 2434, Brisbane QLD 4001, Australia

2019

Keywords

Business process management, process mining, event log, event stream, process drift, concept drift, data mining.


Abstract

Business processes tend to evolve in response to changes in the business environment in which they operate. For example, these can be changes in regulations, competition, supply, demand and technological capabilities as well as internal changes in resource capacity or workload, or simply changes due to seasonal factors. Some process changes are planned ahead and documented, while others may occur unexpectedly and remain unnoticed. For example, this may be the case of changes induced by the initiative of individual process workers in order to adjust to variations in workload or in resource capacity, changes engendered by replacement of human resources, changes in the frequency of certain types of (problematic) cases, or exceptions that in some cases lead to new workarounds that over time solidify into norms. Over time, undocumented process changes like those described above may affect process performance, and more generally hamper process improvement initiatives.

The objective of this research is to develop a set of methods for the early detection and characterization of process drifts, i.e. statistically significant changes in the behavior of business processes, as recorded in event streams. The main contributions of this research are: i) an automated method for detecting process drifts in real time from event streams; ii) an automated method for characterizing process drifts at the level of individual activities from event streams; and iii) an automated method for characterizing process drifts at the level of fragments from event streams.

Early detection and subsequent characterization of process drifts allow organizations to take prompt remedial actions and avoid potential repercussions resulting from unplanned changes in the behavior of their processes. The methods devised in this research have been implemented as a plug-in for the state-of-the-art, open-source process analytics platform Apromore. Using this implementation, the proposed methods have been extensively evaluated by conducting experiments with artificial and real-life data sets.


Contents

Keywords i

Abstract iii

List of Figures ix

List of Tables xiii

List of Abbreviations xv

Statement of Original Authorship xv

Acknowledgments xvii

1 Introduction 1

1.1 Problem Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2.1 Research Questions . . . . . . . . . . . . . . . . . . . . . . . 4

1.2.2 Solution Criteria . . . . . . . . . . . . . . . . . . . . . . . . 5

1.2.3 Research Benefits . . . . . . . . . . . . . . . . . . . . . . . . 6

1.2.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.3 Research Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.4 Research Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.5 Research Publications . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1.6 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2 Background 15

2.1 Business Process Management . . . . . . . . . . . . . . . . . . . . . 15

2.1.1 Business Process Model and Notation (BPMN) . . . . . . . . 16

2.1.2 Petri nets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17


2.2 Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.2.1 Concept Drift Detection . . . . . . . . . . . . . . . . . . . . 19

2.2.2 Taxonomy of Concept Drift Detection Mechanisms . . . . . . 20

2.2.3 Concept Drift Characterization . . . . . . . . . . . . . . . . . 22

2.3 Process Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2.3.1 Event Log and Event Stream . . . . . . . . . . . . . . . . . . 24

2.3.2 Business Process Drift . . . . . . . . . . . . . . . . . . . . . 25

3 Process Drift Detection 29

3.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

3.2 Drift Detection Method . . . . . . . . . . . . . . . . . . . . . . . . . 32

3.2.1 Intra-trace vs Inter-trace . . . . . . . . . . . . . . . . . . . . 33

3.2.2 α+ Relations . . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.2.3 Statistical Testing over Event Streams . . . . . . . . . . . . . 34

3.2.4 Adaptive Window . . . . . . . . . . . . . . . . . . . . . . . 35

3.2.5 Noise handling . . . . . . . . . . . . . . . . . . . . . . . . . 37

3.3 Tool Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

3.4 Evaluation on Artificial Logs . . . . . . . . . . . . . . . . . . . . . . 38

3.4.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

3.4.2 Execution Times . . . . . . . . . . . . . . . . . . . . . . . . 43

3.4.3 Impact of Oscillation Filter . . . . . . . . . . . . . . . . . . . 43

3.4.4 Inter-drift Distance . . . . . . . . . . . . . . . . . . . . . . . 44

3.4.5 Comparison with Baseline per Process Change Pattern . . . . 44

3.4.6 Comparison with Baseline over Different Log Variability Rates 45

3.5 Evaluation on Real-life Log . . . . . . . . . . . . . . . . . . . . . . . 46

3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4 Process Drift Characterization at Activity Level 49

4.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

4.2 Drift Characterization Method . . . . . . . . . . . . . . . . . . . . . 51

4.2.1 Preprocessing: Data Points Extraction . . . . . . . . . . . . . 52

4.2.2 Stage 1: Relevant Binary Relations Retrieval and Ordering . . 54

4.2.3 Stage 2: Change Templates Identification . . . . . . . . . . . 55

4.3 Tool Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62


4.4.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

PHD THESIS - c© 2019 ALIREZA OSTOVAR - PAGE VI


4.4.2 Impact of Characterization Delay on Relations Ordering . . . 65

4.4.3 Impact of Relation Filtering on Characterization Accuracy . . 66

4.4.4 Comparison with Baseline . . . . . . . . . . . . . . . . . . . 67

4.5 Evaluation on Real-life Log . . . . . . . . . . . . . . . . . . . . . . . 68

4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

5 Process Drift Characterization at Fragment Level 71

5.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

5.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

5.3 Partial Traces and Process Tree Discovery . . . . . . . . . . . . . . . 79

5.3.1 Detecting Partial Traces . . . . . . . . . . . . . . . . . . . . 80

5.3.2 Discovering Process Models from Partial Traces . . . . . . . 81

5.4 Process Tree Transformation . . . . . . . . . . . . . . . . . . . . . . 83

5.4.1 Process Tree Edit Operations . . . . . . . . . . . . . . . . . . 84

5.4.2 Finding Process Tree Mappings & Lower Bounding Function 101

5.5 Construct Drift Characterization Statements . . . . . . . . . . . . . . 111

5.5.1 Simple Change Patterns . . . . . . . . . . . . . . . . . . . . 112

5.5.2 Compound Change Patterns . . . . . . . . . . . . . . . . . . 113

5.5.3 Nested Changes . . . . . . . . . . . . . . . . . . . . . . . . . 115

5.5.4 Unsupported patterns . . . . . . . . . . . . . . . . . . . . . . 119

5.6 Tool Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119


5.7.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

5.7.2 Accuracy of Drift Characterization: Fragment-based vs Activity-based . . . . . . 123

5.7.3 Verbalization Conciseness: Fragment-based vs Activity-based 128

5.7.4 Verbalization Conciseness: Exhaustive vs Greedy . . . . . . . 130

5.7.5 Time Performance . . . . . . . . . . . . . . . . . . . . . . . 132

5.8 Evaluation on Real-life Logs . . . . . . . . . . . . . . . . . . . . . . 134

5.9 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

5.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

6 Conclusion 143

A. Notation 149

Bibliography 151


List of Figures

2.1 BPM lifecycle [32]. . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.2 Subset of core BPMN elements. . . . . . . . . . . . . . . . . . . . . 17

2.3 Core Petri net elements. . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.4 Mapping of activities, events and gateways to Petri nets. . . . . . . . . 18

2.5 Quality metrics for process discovery algorithms [15]. . . . . . . . . . 24

2.6 Visual example of a small portion of an event stream. Each square box

represents an event. Case ids are color-coded (i.e. each case id has a

unique background color) and labels in boxes indicate activity labels.

The top row of events represents the entire event stream portion, the

remaining rows show the individual cases constituting the stream. . . 25

2.7 Example of a directly follows graph. . . . . . . . . . . . . . . . . . . 26

2.8 Different classes of drifts. Y-axes indicate process variants and blue

rectangles represent process instances. . . . . . . . . . . . . . . . . . 28

3.1 Drift detection using ProDrift plug-in within Apromore. . . . . . . . . 39

3.2 Artificial process model created in CPN tools, used as a base model to

simulate the artificial event logs. . . . . . . . . . . . . . . . . . . . . 41

3.3 F-score and mean delay using different oscillation filter values. . . . . 43

3.4 F-score and mean delay using different inter-drift distances. . . . . . . 43

3.5 F-score and mean delay per change pattern, obtained with our method

vs. [77]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.6 F-score and mean delay per log variability, obtained with our method

vs. [77]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

3.7 P-value in our method (left) and in the baseline (right) for the BPIC

2011 log. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

3.8 Number of events (left) and active cases per month (right) in the BPIC

2011 log. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47


4.1 Overview of our method for process drift characterization. . . . . . . 52

4.2 From drift detection to drift characterization. . . . . . . . . . . . . . . 53

4.3 Parallelize activities template (T pl) . . . . . . . . . . . . . . . . . . . 57

4.4 Remove activity template (T sre) . . . . . . . . . . . . . . . . . . . . 57

4.5 Drift characterization at activity level using the ProDrift plug-in in

Apromore. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

4.6 Impact of characterization delay on relevant relations retrieval and ordering . . . . . . 67

4.7 Impact of relation filtering on characterization accuracy . . . . . . . . 67

4.8 F-score per change template, obtained with our method vs. [115]. . . 68

4.9 Identified template for Drift 1 in BPIC 2011 log. . . . . . . . . . . . . 69

5.1 Overview of our method for process drift characterization. . . . . . . 79

5.2 Example of partial traces. In the window, some traces are observed partially, as they start and/or end outside of the window. In our example, the first and the last trace are only partially observed. . . . . . . . 80

5.3 Two directly follows graphs for L1. . . . . . . . . . . . . . . . . . . . 82

5.4 Examples of process tree edit operations. . . . . . . . . . . . . . . . . 89

5.5 Sample mapping between process trees P and P′. . . . . . . . . . . . 92

5.6 Examples of deleted and inserted fragments in a mapping between

process trees P and P′. Fragment 1 is a maximal deleted fragment,

whereas Fragment 2 is a maximal inserted fragment. . . . . . . . . . 95

5.7 Sample auxiliary operator node, i.e. the ∧-node 2 in P′, and sample

auxiliary τ-node, i.e. the τ-node 5 in P, in a mapping between process

trees P and P′. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

5.8 Examples of trivial operator nodes in mappings. . . . . . . . . . . . . 97

5.9 Sample mapping that satisfies condition 1. . . . . . . . . . . . . . . . 99

5.10 Sample invalid mapping that does not satisfy condition 2. . . . . . . . 99

5.11 Sample mapping that satisfies condition 3. . . . . . . . . . . . . . . . 100

5.12 Process trees P and P′ (a) and their mapping search tree (b) in Example 22. . . . . . . 102

5.13 Process trees P and P′ in Example 23. . . . . . . . . . . . . . . . . . 106

5.14 The running example of the A* algorithm for the process trees P and

P′ in Figure 5.13. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

5.15 Examples of transforming a process tree P into a process tree P′ by the

application of simple changes. . . . . . . . . . . . . . . . . . . . . . 113


5.16 Examples of transforming a process tree P into a process tree P′ by the

application of compound changes. . . . . . . . . . . . . . . . . . . . 116

5.17 Example of transforming a process tree P into a process tree P′ by the

application of overlapping changes. . . . . . . . . . . . . . . . . . . 117

5.18 Example of transforming a process tree P into a process tree P′ by the

application of nested changes. . . . . . . . . . . . . . . . . . . . . . 117

5.19 Example of synchronizing two fragments change pattern. Activity ‘b’

and activity ‘c’ are synchronized. . . . . . . . . . . . . . . . . . . . 120

5.20 Drift characterization at fragment level using the ProDrift plug-in in

Apromore. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

5.21 Average F-score over all logs with different noise ratios per fragment

size, obtained with the fragment-based characterization method vs. the

activity-based characterization method . . . . . . . . . . . . . . . . . 126

5.22 Average F-score over all fragment sizes per single, composite, and nested change pattern, obtained with the fragment-based characterization method vs. the activity-based characterization method. . . . . . 127

5.23 Average F-score for singleton fragments per single, composite and nested change pattern, obtained with the fragment-based characterization method vs. the activity-based characterization method. . . . . . 129

5.24 Average number of statements over all fragment sizes required by our fragment-based characterization method vs. our activity-based characterization method for characterizing each change pattern. . . . . . . . 130

5.25 Two sample transformations of process tree P into process tree P′. . . 131

5.26 Average number of statements over all fragment sizes per change pattern reported by our fragment-based characterization method using the A* algorithm vs. the greedy algorithm. . . . . . . 133

5.27 Transformation of pre-drift process tree to post-drift process tree over

the first drift in the ticketing management process. . . . . . . . . . . . 135

5.28 Directly follows graphs of the ticketing management process before

and after the first drift. . . . . . . . . . . . . . . . . . . . . . . . . . 135


5.29 Transformation of pre-drift process tree to post-drift process tree over

the second drift in the ticketing management process. . . . . . . . . . 136

5.30 Directly follows graphs of ticketing management process before and

after the second drift. . . . . . . . . . . . . . . . . . . . . . . . . . . 137



5.31 Transformation of pre-drift process tree to post-drift process tree over

the drift in the claim handling process. . . . . . . . . . . . . . . . . . 138

5.32 Example of drift characterization in a partially unstructured process. . 140


List of Tables

2.1 Common control-flow change patterns in business processes from [129]. 27

3.1 Change patterns from [129] . . . . . . . . . . . . . . . . . . . . . . . 42

4.1 Change templates defined based on change patterns in [129]. . . . . . 55

4.2 Change templates and their drift characterization statement formats. . 61

5.1 Change patterns from [129] and their relation to our process tree edit

operations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

5.2 Costs associated with the process tree edit operations. . . . . . . . . . 90

5.3 Process tree edit operations (cf. Section 16) and their representations

in a mapping. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

5.4 Change patterns from [129] and their drift characterization statement

formats. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118


List of Abbreviations

BPM – Business Process Management

WfM – Workflow Management

IM – Inductive Miner

SESE – Single-entry Single-exit

BPMN – Business Process Model and Notation

EPC – Event-driven Process Chain

UML – Unified Modeling Language

CPN – Colored Petri Nets

SPRT – Sequential Probability Ratio Test

CUSUM – Cumulative Sum

PH – Page-Hinkley Test

EWMA – Exponentially Weighted Moving Average

SPC – Statistical Process Control

KSPT – K-sample Permutation Test

RFC – Relative Frequency Change

TRFC – Total Relative Frequency Change

CRFC – Cumulative Relative Frequency Change

DCG – Discounted Cumulative Gain

nDCG – Normalized Discounted Cumulative Gain

nC – Normalized Confidence

CA – Common Ancestor

LCA – Lowest Common Ancestor

LMAs – Lowest Mapped Ancestors


Acknowledgments

This research is part of, and funded by, the “Improving Business Decision-Making via Liquid Process Model Collections” project (ARC DP150103356). We used an anonymised data set from an Australian insurance company in our experiments, and validated the obtained results by conducting an interview with a process analyst from the same company, for which ethical clearance was obtained (approval number: 1800000366).


Chapter 1

Introduction

1.1 Problem Area

Business processes are generally supported by information systems, such as Enterprise Resource Planning systems, Content Management Systems, Customer Relationship Management systems, Database Management Systems or e-mail servers, that record data about each individual execution of a process in the form of event logs [116]. Process mining [119] aims at turning such event data into valuable, actionable knowledge, so that process performance or compliance issues can be identified and rectified. Several process mining techniques are available, for example techniques for discovering a process model from a historical event log, or techniques for predicting future properties, e.g. remaining time or outcome, of ongoing process cases from a live stream of events.

Business processes are prone to evolution in response to various types of changes in the business environment in which they operate. For example, these can be changes in regulations, competition, supply, demand and technological capabilities, as well as internal changes in resource capacity or workload, or simply changes due to seasonal factors. The success of an organization relies heavily on its ability to promptly and effectively respond to its changing business environment. Therefore, flexibility and change have been widely studied in the context of Business Process Management (BPM) [99, 131, 92, 42, 44, 105] and Workflow Management (WfM) [49, 49, 121, 3, 4, 100]. State-of-the-art BPM and WfM systems provide flexibility. Furthermore, there is even more flexibility in processes controlled by people than in those driven by BPM/WfM systems.

Some process changes are intentional and planned ahead, while others may occur without being noticed or documented. For example, this may be the case of changes resulting from ad-hoc workarounds initiated by individuals in emergency situations, changes that are due to the replacement of human resources, or exceptions that in some cases give rise to new workarounds that over time turn into norms. Over time, these hidden changes may negatively impact process performance, and more generally hinder process improvement initiatives.

In this setting, process analysts and managers require methods and tools that allow them to detect process changes as early as possible and to pinpoint the time periods at which they occurred. Business process drift detection [18, 1, 12, 82, 77] is a family of process mining techniques that aim at detecting changes based on observations of business process executions recorded in event logs. Event logs consist of traces, each representing one execution of the business process. The term “drift” originates from the concept drift phenomenon in data mining, where it refers to changes in the relation between the input and the target variables induced by contextual shifts over time [39]. Accordingly, a business process drift is defined as a (statistically) significant change in the process behavior (concept) [11, 77].

To successfully implement a process improvement initiative, following the detection of a drift, it is equally important to understand what has changed in the process behavior, a task known as drift characterization. The latter aims to pinpoint the location of the change in the process as well as provide explanations on the manner in which the change has occurred. Examples of drift characterization in the context of a loan application process are “activity ‘verify repayment agreement’ that was always executed before the drift is sometimes skipped after the drift”, or “process fragment ‘check credit history’ followed by ‘assess loan risk’ and activity ‘appraise property’ that were executed in parallel before the drift are executed sequentially after the drift”.

Drift detection and characterization provide the basis for further qualitative and quantitative process analysis, e.g. root cause analysis or flow analysis, and more generally may contribute to the success of a process improvement initiative. For example, early awareness of unexpected process changes enables an organization to take timely corrective measures and avoid potential repercussions resulting from such changes. Drift detection and characterization also act as an enabler for other process mining techniques, as they allow these techniques to select the most recent “stable” process behavior, i.e. the behavior observed since the last drift. An example of the latter is the case of predictive process monitoring techniques, whose accuracy is highly dependent on the currency of their underlying predictive models, which may be impaired by the occurrence of a drift. This can be avoided by detecting process drifts and subsequently updating the predictive models underlying these techniques with the process changes [81] underpinning the drift.

1.2 Problem Statement

State-of-the-art methods in the area of process drift detection suffer from two major limitations. First, they do not work in online settings with streams of events that incrementally record the executions of a business process. As such, they are designed to detect inter-trace drifts only, i.e. drifts that occur between complete process executions (traces), as recorded in event logs. For example, new legislation may require an insurance company to perform a more stringent verification on new claims, while old claims are exempted. Although some approaches work in online settings (e.g. [77]), they still deal with streams of complete traces or abstractions thereof. However, process drift may also occur during the execution of a process, and may impact ongoing executions. For example, an insurance check may need to be removed altogether due to a contingency plan triggered by severe weather conditions (e.g. a flood). As such, detecting process changes as they are happening would enable organizations to take quick remedial actions and prevent or mitigate repercussions. Existing methods either do not detect such intra-trace drifts, or detect them with a long delay, as they need to wait for the trace to complete. A related problem is that they do not perform well with highly-variable processes, i.e. processes whose logs exhibit a high ratio of distinct traces to the total number of traces, a typical characteristic of healthcare logs. This is because these methods rely on statistical tests over trace distributions, which may not have sufficient data samples when this ratio is very high, in other words, when there is high variability in the log.
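As an illustration of the notion of log variability used above, the following Python sketch computes the ratio of distinct traces to the total number of traces. It is only a minimal example: the function name `log_variability` and the two toy logs are assumptions made for this illustration and are not taken from the methods or data sets of this thesis.

```python
from collections import Counter


def log_variability(traces):
    """Return the ratio of distinct traces to total traces (0.0 for an empty log)."""
    if not traces:
        return 0.0
    # A trace is a sequence of activity labels; tuples make it hashable.
    variants = Counter(tuple(trace) for trace in traces)
    return len(variants) / len(traces)


# Toy log with 2 distinct traces among 10 traces (low variability).
low = [["a", "b", "c"]] * 8 + [["a", "c", "b"]] * 2
# Toy log in which every trace is unique (high variability).
high = [["a", "b"], ["a", "c"], ["b", "a"], ["a", "b", "c"], ["c", "a"]]

print(log_variability(low))
print(log_variability(high))
```

In the second toy log every trace occurs exactly once, so a statistical test over the distribution of traces has a single observation per distinct trace, which is the situation described above.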

Moreover, the existing methods for drift detection only focus on detecting and pinpointing the location of a drift. However, detection and localization of a process drift do not provide, per se, enough insights to undertake a process improvement initiative, unless the drift is characterized, i.e. unless one can understand what has changed in the process behavior. To the best of our knowledge, there has not been any attempt to provide a systematic solution for characterizing process drifts, although there are a few ways to approach the drift characterization problem. A possible approach to characterizing drifts at the level of individual activities is to compare sub-logs from before and after a drift using an existing log-to-log comparison technique, such as the one in [115]. However, these techniques report all differences between the pre-drift and post-drift process behaviors regardless of the significance of their association with the occurrence of the drift. Furthermore, these techniques are designed to compare logs of complete traces. As such, they do not work with event streams, where a sub-log from before or after a drift contains partial traces, i.e. traces whose start events are deleted from the stream and/or whose end events are yet to arrive on the stream. Given the start and end activities of a process, one possible workaround is to only use complete traces within the pre-drift and post-drift sub-logs. However, this may lead to an incomplete or even inaccurate comparison as the sub-logs may miss fractions of process behavior that are only captured by the discarded partial traces. This problem is exacerbated in the logs of highly-variable processes, as in those logs almost every trace exhibits a unique execution of the process.
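The workaround mentioned above, i.e. keeping only complete traces given the start and end activities of the process, can be sketched as follows. This is a simplified illustration and not the partial-trace handling proposed later in this thesis: the function name `split_traces`, the event representation as (case id, activity) pairs and the toy window are assumptions, and the sketch further assumes that start and end activities occur only at the boundaries of a trace.

```python
from collections import defaultdict


def split_traces(window, start_activities, end_activities):
    """Group the events of a window by case and separate complete from partial traces.

    `window` is a list of (case_id, activity) pairs in order of arrival.
    A trace counts as complete if it begins with a start activity and
    finishes with an end activity; otherwise it is treated as partial.
    """
    traces = defaultdict(list)
    for case_id, activity in window:
        traces[case_id].append(activity)

    complete, partial = {}, {}
    for case_id, trace in traces.items():
        if trace[0] in start_activities and trace[-1] in end_activities:
            complete[case_id] = trace
        else:
            partial[case_id] = trace
    return complete, partial


# Toy window: case c1 started before the window, case c3 has not finished yet.
window = [("c1", "b"), ("c1", "d"), ("c2", "a"), ("c2", "b"),
          ("c3", "a"), ("c2", "d"), ("c3", "c")]
complete, partial = split_traces(window, start_activities={"a"}, end_activities={"d"})
print(complete)
print(partial)
```

In this toy window only one of the three cases survives the filter, and the behavior seen in cases c1 and c3 (for instance, activity ‘c’) is lost, which illustrates why discarding partial traces may lead to an incomplete comparison.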

Another challenge for drift characterization is to identify changes that impact larger fragments of activities, e.g. deletion of a fragment of concurrent activities from the process, or addition of a loop structure over a fragment of mutually exclusive activities. A possible approach here would be to abstract from low-level relations between activities, discover process models from before and after a drift, and identify their differences using a process model comparison technique, such as the one in [8]. However, such techniques are designed to identify differences between two process models at the level of individual activities. Consequently, when used to characterize process changes at the level of fragments, they tend to report a large number of activity-level differences, which are often difficult to track or understand. For example, for a simple fragment-level change, where we parallelize two sequential fragments, each consisting of 4 activities, the method in [8] reports 16 differences, each describing the parallelization of two activities.
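The figure of 16 differences follows from a simple counting argument: one difference is reported for every pair made of an activity from the first fragment and an activity from the second, i.e. 4 × 4 pairs. The short Python sketch below only reproduces this count; it does not implement the comparison technique in [8], and the activity labels and the wording of the statements are hypothetical.

```python
from itertools import product


def activity_level_differences(fragment_1, fragment_2):
    """Return one statement per pair of activities drawn from the two fragments."""
    return [
        f"activities '{x}' and '{y}' are executed in parallel after the drift"
        for x, y in product(fragment_1, fragment_2)
    ]


fragment_1 = ["a1", "a2", "a3", "a4"]  # hypothetical activity labels
fragment_2 = ["b1", "b2", "b3", "b4"]  # hypothetical activity labels

differences = activity_level_differences(fragment_1, fragment_2)
print(len(differences))  # one statement per pair, i.e. 4 x 4
```

A fragment-level characterization would instead describe the same change with a single statement about the two fragments.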

1.2.1 Research Questions

Based on the research problems identified in the previous section, we define three research questions:

RQ1) How to detect a drift from an event stream or event log of a business process?

RQ2) How to characterize a process drift at activity level from an event stream or event log of a business process?

RQ3) How to characterize a process drift at fragment level from an event stream or event log of a business process?



1.2.2 Solution Criteria

To address the research questions defined in the previous section, we develop one drift detection and two drift characterization methods. The drift detection method resulting from this research will be evaluated using the following criteria:

• Accuracy

– Recall: The method should be able to find a high percentage of existing drifts in the event stream [39]. Furthermore, the method should be capable of detecting drifts caused by the application of all typical business process change patterns [129], e.g. insertion, deletion, move, swap, etc. This measure can be computed on artificial event streams where drifts are known.

– Precision: The method should not identify any false drifts [39]. In other words, it should be able to distinguish momentary changes, i.e. deviants, in the behavior of processes from those of a more permanent nature, i.e. process drifts. This measure describes the resilience of the method and can be computed either on artificial event streams where the drifts are known or on real-life event streams that have no drifts, in which case all detected drifts are counted as false drifts.

• Detection delay: The method should be able to find drifts as early as possible in the event stream [39], i.e. using only a small number of events from the post-drift process behavior. This measure describes how much time would elapse before a drift is detected and is usually expressed as an average value. It can be computed on artificial event streams where the actual locations of the drifts are known.

• Real-time: The method should be able to find drifts in real time [139]. Our goal is to create an online method that constantly monitors an event stream recording the behavior of a process within an organization and detects process drifts.

• Automated: The method should be able to identify drifts with no manual intervention [139]. As the method needs to be online, it should not need any configuration at any time.
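To make the accuracy and delay criteria above concrete, the following Python sketch computes precision, recall, F-score and mean detection delay on an artificial stream whose drift positions are known. It is a sketch under stated assumptions rather than the evaluation procedure used in this thesis: drift positions are expressed as event indices, a detection counts as correct only if it is the earliest unmatched detection within `max_delay` events after a true drift, and the names `evaluate_detection` and `max_delay` are introduced here for illustration.

```python
def evaluate_detection(true_drifts, detected, max_delay):
    """Return (precision, recall, f_score, mean_delay) for detected drift positions.

    Positions are event indices in the stream. Each true drift is matched to
    the earliest unmatched detection located at most `max_delay` events after it;
    unmatched detections count as false drifts.
    """
    remaining = sorted(detected)
    delays = []
    for t in sorted(true_drifts):
        match = next((d for d in remaining if t <= d <= t + max_delay), None)
        if match is not None:
            remaining.remove(match)
            delays.append(match - t)

    true_positives = len(delays)
    recall = true_positives / len(true_drifts) if true_drifts else 0.0
    precision = true_positives / len(detected) if detected else 0.0
    if precision + recall > 0:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    mean_delay = sum(delays) / true_positives if true_positives else None
    return precision, recall, f_score, mean_delay


# Three known drifts; two are detected (with delays of 40 and 60 events),
# one is missed, and the detection at event 1500 is a false drift.
precision, recall, f_score, mean_delay = evaluate_detection(
    true_drifts=[1000, 2000, 3000], detected=[1040, 1500, 2060], max_delay=200
)
print(precision, recall, f_score, mean_delay)
```

The choice of `max_delay` determines how late a detection may be and still count as correct; it is a parameter of this sketch only.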

The drift characterization methods resulting from this research will be evaluated using the following criteria:



• Accuracy: After a drift is detected it should be characterized, so that it provides

process stakeholders with a full picture of the change that has occurred. How-

ever, an incorrect identification of the change underpinning the drift does not

only confuse the users, but later on, it could also lead to incorrect process im-

provement decisions. An example of an inaccurate drift characterization could

be identifying the addition of a loop structure where in fact an activity is dupli-

cated.

• Completeness: Sometimes a process undergoes multiple different changes. A

complete drift characterization outlines all the occurred changes in a concise and

orthogonal manner. In other words, each change should be described only once

and any two change descriptions should not overlap.

• Understandability: A drift can be characterized by outputting traces before and

after the drift point. However, this does not provide any useful information about

the underlying changes to the stakeholders of the process. A way of character-

izing a drift such that it can be understood by users such as a process analyst

would be to visualize the change on top of the process models before and after

the drift. Alternatively, a change may be described through explanatory statements

in natural language, e.g. statements like “activity a has been removed from

the process after the drift”.

• Automated: The method should be able to characterize drifts with no manual

intervention. As drifts often occur as a result of unplanned process changes,

drift characterization methods should not rely on the knowledge of the user to

characterize drifts.

1.2.3 Research Benefits

Detection and characterization of a drift helps an organization to identify and act upon

unplanned changes that may negatively impact the performance of their processes. As

an example, a rise in the frequency of certain (problematic) cases may lead to the cre-

ation of bottlenecks in the flow of cases through an insurance claim handling process,

eventuating in performance decline and customer dissatisfaction. Early detection of

process changes can thus be used to alert managers of the process issues, and subse-

quent characterization of the changes can assist them to adopt appropriate measures,

and hence avoid such negative outcomes.



Drift detection and characterization can also help to maintain the currency of pro-

cess models within an organization. Process models help to gain a thorough under-

standing of business processes which is a prerequisite to conduct successful process

analysis, redesign or automation [24]. Therefore, process models are the basis for

many critical decisions within an organization. Hence it is necessary that they accu-

rately reflect the “real-world” processes, i.e. the way processes are currently executed

in the organization.

Organizations involved in BPM programs typically collect hundreds of process

models over time [64]. These models tend to fall out of sync with current processes,

as the frequency of real-world process changes is such that continuous process

model updates are not cost-effective. This misalignment severely limits the value that

organizations can obtain from process models. After a drift is detected and its under-

lying process changes are characterized, existing process models can be repaired [14,

34, 21, 96] by incorporating the identified changes, so that they reflect processes as

they are executed in the organization. These up-to-date process models can then be the

basis of subsequent process performance improvement initiatives.

Drift detection and characterization also act as enablers for other process mining

techniques, as they allow these techniques to select the most recent “stable” process

behavior, i.e. the behavior observed since the last drift. Most process mining techniques

assume processes to be in steady state. For example, process discovery techniques

extract process models based on all traces in an event log, assuming that all traces are

produced by the current actual process. Consequently, when the log spans one or more

drifts, the resulting process models are often so-called “spaghetti models”, i.e. very

complex and of little practical use. However, by detecting drifts inside an event log and

discovering process models based on the most-recent behavior of the process after the

last drift, we can obtain the actual process models as they are currently executed within

the organization [16]. Another example is the case of predictive business process mon-

itoring techniques whose performance suffers from the occurrence of a drift, as the

predictive models underlying these methods are trained based on the old pre-drift pro-

cess behavior. This problem can be avoided by detecting drifts and incorporating the

characterized changes underpinning the drifts in their predictive models [81].

1.2.4 Contributions

This thesis provides the following contributions to the state of the art.

• A method for detecting process drifts. We propose an automated statistically-

grounded method for real-time drift detection from event streams of business



processes. The proposed method is capable of detecting intra-trace as well as

inter-trace drifts. Furthermore, the selected features to capture the process be-

havior are such that it can also detect drifts from event streams of highly-variable

business processes. Finally, by replaying an event log as an event stream, the

method can also be used to detect drifts from an event log of complete traces.

• A method for characterizing process drifts at activity level. By building upon our

drift detection method, we propose an automated statistically-grounded method

for real-time characterization of process drifts at the level of individual activities

from event streams of business processes. In a similar way as for the drift de-

tection method, the proposed drift characterization method can also be used to

characterize drifts detected from an event log of complete traces. The method

reports the identified process changes over a drift as natural language statements

constructed based on typical business process change patterns [129]. To the best

of our knowledge, this is the first method proposed in the context of process drift

characterization.

• A method for characterizing process drifts at fragment level. We propose an

automated method for characterizing process drifts at the level of fragments from

event streams of business processes. In doing so, we adapt a state-of-the-art

process discovery technique, Inductive Miner (IM), to discover process trees,

i.e. block-structured process models, from the event streams before and after a

drift. We also propose the first process tree transformation technique that finds

a minimum cost sequence of edit operations to transform a pre-drift process

tree into a post-drift process tree. The method reports the identified process

changes over a drift as natural language statements constructed based on typical

business process change patterns [129]. Furthermore, the proposed method can

characterize process drifts detected from an event log of complete traces, and can

also be used on top of any process drift detection technique. To the best of our

knowledge, this is the first method proposed for characterizing drifts at fragment

level.

• A process drift detection and characterization tool. The proposed drift detection

and characterization methods have been implemented as a standalone open-source

tool called ProDrift,1 as well as a plug-in for the open-source process

1Available at http://apromore.org/platform/tools



analytics platform Apromore [65].2

1.3 Research Approach

This research project follows a design science research method [125]. Design science

provides seven guidelines describing characteristics of well-conducted research. De-

sign science research must address an important and relevant business problem and

must lead to an innovative artifact that proposes a more effective approach to a prob-

lem or provides a solution to an unsolved problem. The artifact must be rigorously

evaluated in order to ensure its utility for the addressed problem. The research must

present verifiable contributions and rigor must be applied in both the construction and

the evaluation of the artifact. The proposed solutions must be the result of exploring

existing knowledge and utilizing available means, and must be effectively communi-

cated to appropriate audiences.

The purpose of this research project is to devise a set of methods for detecting

and characterizing business process drifts. As explained in the previous sections, early

detection and characterization of a drift may contribute to the success of a process

improvement initiative in several ways. For example, by identifying undocumented

process changes which may over time negatively impact the process performance, or

by helping to maintain existing process models, i.e. expensive assets of any organiza-

tion, up-to-date, or by improving the performance of other process mining solutions,

e.g. predictive process monitoring techniques, drift detection and characterization

contribute to both industry and the academic community. Accordingly, they have been the

subject of several scientific papers [18, 1, 12, 82, 77]. The rigor of this research is ensured

by conducting an extensive literature review using well-defined criteria for comparing

the various works available, by the use of formal methods, by the implementation of

the envisaged methods as open-source tools, by the use of well-established languages

such as Petri nets and process trees, and by a thorough empirical evaluation of each method

using artificial and real-life datasets in various settings.

A number of techniques have been used for detecting business process drifts in the

literature, including statistical hypothesis testing [11, 77], trace clustering [1], confor-

mance checking using log abstraction [18] and process discovery [79, 17]. Performing

statistical tests over trace abstractions, proposed by Maaradji et al. [77], has proved

to be the most reliable solution in the process drift community. However, as outlined

2Available at http://apromore.org/



in Section 1.2, the existing techniques suffer from two major limitations: they do not

work in an online setting with streams of events, and they do not perform well with event

logs of unpredictable business processes. To address these limitations, we present a

statistically grounded method for detecting drifts from event streams of business pro-

cesses. The method does not require any manual intervention and can scale up to the

extent that it can work in real time. It is able to detect all typical change patterns

(described in [129]) with a minimum delay. We perform a non-parametric statistical test

on the distributions of certain feature vectors over two adjacent windows, namely the

reference and detection windows, sliding along a stream of events. After exploring

several features, including Directly Follows relations (direct succession), Follows

relations (succession), and Block Structures (extracted from process trees produced by

the Inductive Miner [71]), we observed that α+ relations [25] provide a suitable level of

abstraction for capturing the behavior of unpredictable processes represented in an event

stream.
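
The idea of comparing feature distributions over a reference and a detection window can be sketched as follows. This is a simplified illustration under stated assumptions, not the method developed in Chapter 3: it uses Directly Follows relations instead of α+ relations, Pearson's chi-square statistic as the test statistic, and fixed-size windows. The helper names `directly_follows`, `chi_square_statistic` and `window_statistic` are hypothetical.

```python
from collections import Counter


def directly_follows(events):
    """Count directly-follows pairs (a, b) per case in a list of (case_id, activity)."""
    last_activity = {}
    counts = Counter()
    for case_id, activity in events:
        if case_id in last_activity:
            counts[(last_activity[case_id], activity)] += 1
        last_activity[case_id] = activity
    return counts


def chi_square_statistic(ref_counts, det_counts):
    """Pearson chi-square statistic and degrees of freedom of the 2 x k table
    (window x relation). Assumes both windows contain at least one relation."""
    relations = sorted(set(ref_counts) | set(det_counts))
    n_ref = sum(ref_counts.values())
    n_det = sum(det_counts.values())
    total = n_ref + n_det
    stat = 0.0
    for rel in relations:
        col = ref_counts[rel] + det_counts[rel]
        for observed, row in ((ref_counts[rel], n_ref), (det_counts[rel], n_det)):
            expected = row * col / total
            stat += (observed - expected) ** 2 / expected
    return stat, len(relations) - 1


def window_statistic(stream, position, window_size):
    """Compare the two adjacent windows that meet at `position` in the stream."""
    reference = stream[position - window_size:position]
    detection = stream[position:position + window_size]
    return chi_square_statistic(directly_follows(reference), directly_follows(detection))


# Hypothetical stream: 20 cases following a-b-c, then 20 cases following a-c-b.
old = [(case, act) for case in range(20) for act in ("a", "b", "c")]
new = [(case, act) for case in range(20, 40) for act in ("a", "c", "b")]
stream = old + new
statistic, dof = window_statistic(stream, position=60, window_size=60)
print(statistic, dof)  # 80.0 with 3 degrees of freedom
```

The statistic of 80.0 is far above 7.815, the 5% critical value of the chi-square distribution with 3 degrees of freedom, so equality of the two windows would be rejected at the window boundary in this toy example.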

As explained in Section 1.2, to the best of our knowledge the drift characterization

problem has not been addressed in the literature. In this thesis, by building upon our

drift detection method we present a statistically-grounded method for characterizing

drifts from event streams of business processes. The method does not need any manual

intervention, and our extensive experiments show that it is fast and can accurately

identify typical change patterns (described in [129]) applied to individual activities.

The inputs to the method are the distributions of α+ relations from before and after a drift.

We use a statistical test to identify the relations that have significant association with

the occurrence of the drift. These relations are then mapped to a set of predefined

change templates, and the best-matching templates are translated into natural language

statements before being reported to the user.
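
The pipeline described above (test each relation for association with the drift, then map the significant relations to a change template) can be illustrated with a toy sketch. Everything in it is an assumption made for the example: the relations are plain activity pairs rather than α+ relations, the test is a chi-square test on a 2 x 2 table at the 5% level, and the single "activity removed" template is a stand-in for the predefined change templates of Chapter 4.

```python
CRITICAL_CHI2_DF1 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05


def relation_chi2(pre, post, n_pre, n_post):
    """Chi-square statistic of the 2 x 2 table (relation vs. other) x (pre vs. post)."""
    table = [[pre, n_pre - pre], [post, n_post - post]]
    total = n_pre + n_post
    stat = 0.0
    for r, row_total in enumerate((n_pre, n_post)):
        for c in range(2):
            col_total = table[0][c] + table[1][c]
            expected = row_total * col_total / total
            if expected > 0:  # an all-zero column contributes nothing
                stat += (table[r][c] - expected) ** 2 / expected
    return stat


def significant_relations(pre_counts, post_counts):
    """Relations whose frequency is significantly associated with the drift."""
    n_pre, n_post = sum(pre_counts.values()), sum(post_counts.values())
    result = []
    for rel in sorted(set(pre_counts) | set(post_counts)):
        stat = relation_chi2(pre_counts.get(rel, 0), post_counts.get(rel, 0), n_pre, n_post)
        if stat > CRITICAL_CHI2_DF1:
            result.append(rel)
    return result


def removed_activities(pre_counts, post_counts):
    """Toy template: report an activity as removed if it occurs in a significant
    relation and in no relation observed after the drift."""
    after = {activity for rel in post_counts for activity in rel}
    candidates = {a for rel in significant_relations(pre_counts, post_counts) for a in rel}
    return [f"activity {a} has been removed from the process after the drift"
            for a in sorted(candidates - after)]


# Hypothetical relation counts before and after a drift that removes activity b.
pre = {("a", "b"): 50, ("b", "c"): 50}
post = {("a", "c"): 50}
print(removed_activities(pre, post))
```

The dictionaries are assumed to contain only relations with positive counts, so that an activity listed in `post` was actually observed after the drift.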

We also address the drift characterization problem in settings where process changes are

applied to larger fragments of activities. We present a fast, accurate, and noise-tolerant

method for characterizing typical change patterns [129] applied to single-entry single-

exit (SESE) fragments from event streams of business processes. We adapt a state-

of-the-art process discovery technique, namely Inductive Miner, to work with event

streams, and use it to discover two process trees, one from before and one from after a drift.

A process tree represents a sound block-structured process model, where each process

(sub)tree is a SESE process fragment. We define a set of fragment-based process

tree edit operations and their application costs based on the typical change patterns.

We then introduce a notion of process tree mapping through which we search for a



minimum-cost sequence of edit operations that transforms the pre-drift process tree to

the post-drift process tree. We present two search algorithms, an efficient A* algorithm and

a fast greedy algorithm, which, respectively, find and approximate the optimal solution. The identified

edit operations are then aggregated as much as possible and reported to the user as

natural language statements.
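
To make the notion of a process tree and its SESE fragments concrete, the sketch below represents a process tree as a nested structure, enumerates its subtrees, and compares the fragment sets of a pre-drift and a post-drift tree. This is only a naive set comparison used for illustration; it does not compute a minimum-cost edit sequence and is not the A* or greedy search described above. The class `Node`, the operator labels and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """Process tree node: an operator ('seq', 'xor', 'and', 'loop') or an activity leaf."""
    label: str
    children: Tuple["Node", ...] = ()


def fragments(tree):
    """Yield every subtree of the process tree; each corresponds to a SESE fragment."""
    yield tree
    for child in tree.children:
        yield from fragments(child)


def fragment_diff(pre, post):
    """Return (fragments only in pre, fragments only in post) as sets of subtrees."""
    pre_set, post_set = set(fragments(pre)), set(fragments(post))
    return pre_set - post_set, post_set - pre_set


# Hypothetical trees: activity d is dropped from the end of a sequence.
pre = Node("seq", (Node("a"), Node("xor", (Node("b"), Node("c"))), Node("d")))
post = Node("seq", (Node("a"), Node("xor", (Node("b"), Node("c")))))
removed, added = fragment_diff(pre, post)
print(sorted(node.label for node in removed))  # ['d', 'seq']
```

The unchanged choice between b and c appears in neither set, whereas the root sequence appears in both (once in its old and once in its new form). A proper tree edit search would explain this difference with a single deletion of d.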

Finally, the methods illustrated in this thesis are implemented as a standalone tool,

and also as a plug-in for the Apromore platform. Apromore is an online open-source

business process analytics platform combining state-of-the-art process mining capa-

bilities with advanced functionality for managing process model collections. Apro-

more features a service-oriented architecture and provides a flexible plug-in frame-

work, which facilitates the seamless addition of new plug-ins. Apromore is the result

of a joint effort between several universities and since its inception has been adopted

in practice by several organizations.

1.4 Research Scope

Process drifts may be divided into four classes based on the form in which they manifest

themselves over time: sudden, gradual, recurring, and incremental

drifts [39]. A sudden drift refers to a scenario where at a certain point in time a pro-

cess is substituted with a new process. A gradual drift, on the other hand, refers to a

scenario where changes are introduced in the process but the old process behavior is

also still allowed for some time and gradually fades out. A recurring drift refers to a

scenario where a set of processes are substituted back and forth with each other. Fi-

nally, an incremental drift refers to a scenario where changes are applied to a process

in smaller increments over a period of time. This thesis focuses on the detection and

characterization of sudden drifts, which can also be used as a starting point to detect

other classes of drifts, e.g. gradual drifts [82, 78].

A drift may occur in different perspectives of a business process, e.g. control-flow,

resource, data, etc. The changes enforced by operations such as insertion, deletion or

reordering of fragments are classified as control-flow changes. Resource perspective

changes include changes in resources, their roles and organizational structure, whereas

data perspective changes refer to the changes in the production and consumption of

data in business processes. In this thesis, we focus on changes related to the control-

flow behavior of a business process.

A process drift may involve changes to different types of process fragments, such as



multiple-entry multiple-exit (MEME) fragments and single-entry single-exit (SESE)

fragments. MEME fragments are more general and include other types of fragments,

e.g. SESE fragments, as a special case. In this thesis, where we characterize a drift

at fragment level, we capture pre-drift and post-drift process behaviors by discovering

process trees from before and after a drift. Each subtree within a process tree represents

a SESE process fragment. As such, the proposed method expresses any fragment-level

process changes in terms of SESE fragments. Note that by introducing gateways, it is

always possible to transform a MEME fragment into multiple SESE fragments [95].

The methods created in this project are extensively evaluated using both artificial

and real-life process models and event logs. However, evaluating the utility of the

final framework through surveys and interviews with human participants is out of the

scope of this project.

1.5 Research Publications

In the course of this research project the following publications were produced:

• A. Maaradji, M. Dumas, M. La Rosa, and A. Ostovar. Fast and accurate busi-

ness process drift detection. In Proceedings of the 13th International Conference

on Business Process Management (BPM’15), volume 9253 of Lecture Notes in

Computer Science, pages 406-422. Springer, Cham, 2015.

• R. Conforti, M. Dumas, M. La Rosa, A. Maaradji, H.H. Nguyen, A. Ostovar,

and S. Raboczi. Analysis of business process variants in apromore. In Pro-

ceedings of the Demo Track of the 13th International Conference on Business

Process Management (BPM’15), volume 9253 of Lecture Notes in Computer

Science, pages 406-422. Springer, Cham, 2015.

• A. Ostovar, A. Maaradji, M. La Rosa, A.H.M. ter Hofstede, and B.F.V. van Don-

gen. Detecting Drift from Event Streams of Unpredictable Business Processes.

In Proceedings of the 35th International Conference on Conceptual Modeling

(ER’16), volume 9974 of Lecture Notes in Computer Science, pages 330-346.

Springer, Cham, 2016. (Chapter 3)

• A. Ostovar, A. Maaradji, M. La Rosa, and A.H.M. ter Hofstede. Character-

izing Drift from Event Streams of Business Processes. In Proceedings of the

29th International Conference on Advanced Information Systems Engineering



(CAiSE’17), volume 10253 of Lecture Notes in Computer Science, pages 210-

228. Springer, Cham, 2017. (Chapter 4)

• A. Maaradji, M. Dumas, M. La Rosa, and A. Ostovar. Detecting Sudden and

Gradual Drifts in Business Processes from Execution Traces. In Journal of IEEE

Transactions on Knowledge and Data Engineering (TKDE’17), volume 29, issue

10, pages 2140-2154. IEEE, 2017.

• SJ. van Zelst, MF. Sani, A. Ostovar, R. Conforti, and M. La Rosa. Filtering

Spurious Events from Event Streams of Business Processes. In Proceedings of

the 30th International Conference on Advanced Information Systems Engineer-

ing (CAiSE’18), Lecture Notes in Computer Science. Springer, Cham, 2018.

(Received Distinguished Paper Award)

• A. Ostovar, S.J.J. Leemans, and M. La Rosa. Robust Drift Characterization

from Event Streams of Business Processes. Submitted to ACM Transactions on

Knowledge Discovery from Data (TKDD), 2018. Technical report in QUT ePrint

121158. (Chapter 5)

1.6 Outline

This thesis is organized as follows. Chapter 2 provides a background on BPM, data

mining, and process mining, as well as a description of business process drift and its

different classes and perspectives. Chapters 3-5 each first review the existing literature on,

respectively, process drift detection, process drift characterization at the level of activities, and

process drift characterization at the level of fragments, and then introduce a new method

for that problem, followed by a description of its tool support and an extensive evaluation.

Finally, Chapter 6 concludes this thesis and discusses possible avenues for future work.


Chapter 2

Background

This chapter provides a background on business process management, data mining, and

process mining as well as a description of business process drift, its different classes

and perspectives.

2.1 Business Process Management

Business Process Management (BPM) is an established discipline dedicated to the

ways in which organizations identify, capture, analyze, improve, implement and mon-

itor their business processes [32]. BPM influences the effectiveness and efficiency of

a company and is a significant contributor to its overall performance and competitive-

ness. As business processes determine how an organization operates, what activities

need to be performed and what data and resources are required for their successful

execution, they are crucial to successfully achieve targets for a plethora of key perfor-

mance indicators.

BPM can be regarded as a continuous cycle comprising multiple phases [32].

The BPM lifecycle is illustrated in Figure 2.1. The first phase in the BPM lifecycle is

the identification of the processes inside the organization. The outcome of this phase is

a collection of interrelated processes currently running in the organization. The second

phase is to use the information about the identified processes to build one or multiple

as-is process models. The as-is process model is then analyzed, in the third phase, to

uncover any issues that might affect the performance objectives of the organization.

These issues are then addressed in the fourth phase by modifying the as-is process

model leading to the to-be process model that meets the organization’s desired way

of functioning. The to-be process model is then implemented through organizational


change management and/or process automation. The last phase consists of monitoring

the running to-be processes with respect to the main performance objectives according

to a set of performance measures. Any new issues found in this phase must be ad-

dressed by means of applying new corrections and modifications to the process. This

may require another iteration of the BPM lifecycle, and this iteration will take place

whenever process performance deviates from the intended objectives.

[Figure: cycle of the phases process identification, process discovery, process analysis, process redesign, process implementation, and process monitoring and controlling. Artifacts passed between phases: process architecture, as-is process model, insights on weaknesses and their impact, to-be process model, executable process model, conformance and performance insights.]

Figure 2.1: BPM lifecycle [32].

Process drift detection and characterization lie in the process monitoring and controlling

phase of the BPM lifecycle. Detection and subsequent characterization of a

drift may give rise to a new iteration through the BPM lifecycle, which includes updating

existing process models based on the characterized changes, performing qualitative

and quantitative process analysis, e.g. root cause analysis to identify the root cause of

the changes, redesigning the models to either support or discard the changes, and integrating

the new improvements into the BPM system.

2.1.1 Business Process Model and Notation (BPMN)

There are several languages for modeling a business process, e.g. Business Process

Model and Notation (BPMN), Event-driven Process Chain (EPC) and Unified Model-

ing Language (UML). Nowadays, BPMN is a widely used standard for process mod-



eling [32]. Figure 2.2 shows a subset of the core elements of BPMN. The main events

in a BPMN diagram are the start and end events which represent the initiation and

termination of a process instance, respectively. An activity denotes a unit of work to be

performed. A sequence flow represents the order among events, activities and gateways.

Finally, gateways are control-flow elements and control the splitting and merging of

paths. A splitting exclusive gateway only allows the activation of one of its outgoing

paths (based on a defined condition), and a merging exclusive gateway only enables the

execution of its outgoing path once one of its incoming paths is executed. Conversely,

a splitting parallel gateway activates the concurrent execution of all of its outgoing

paths, and a merging parallel gateway waits for all of its incoming paths to be fully

executed before enabling its outgoing path.

[Figure: BPMN notation for start event, end event, activity, sequence flow, exclusive gateway (split, join) and parallel gateway (fork, merge).]

Figure 2.2: Subset of core BPMN elements.

2.1.2 Petri nets

Petri nets [93] are a well-known mathematical language for modeling concurrent processes

[120] and have been widely used in the context of business process analysis.

Petri nets are backed by a precise mathematical definition and offer an intuitive graphical

representation. A Petri net is a directed bipartite graph composed of two types of

nodes, namely transitions and places. Intuitively, transitions represent the activities

of a process, and places represent the states of the process. Directed arcs in a Petri net

represent the order among nodes, i.e. which places are pre- and/or postconditions for

which transitions, and cannot connect two nodes of the same type. Figure 2.3 shows

the typical notations used to represent transitions, places and arcs in a Petri net.

[Figure: Petri net notation for place, transition and arc.]

Figure 2.3: Core Petri net elements.

Places in a Petri net may hold a discrete number of tokens. Each distribution of

tokens over the places is called a marking and represents a state of the net. A transition



of a Petri net is enabled if there are sufficient tokens in all of its input places. An

enabled transition may fire at any time. Once a transition fires it consumes the required

input tokens and produces tokens in its output places.
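
The enabling and firing rules can be illustrated with a minimal sketch. The representation is an assumption made for this example: a marking is a dictionary from place names to token counts, and a transition is given by two dictionaries stating how many tokens it consumes from each input place and produces in each output place. The place names and the functions `is_enabled` and `fire` are hypothetical.

```python
def is_enabled(marking, inputs):
    """A transition is enabled if every input place holds enough tokens."""
    return all(marking.get(place, 0) >= n for place, n in inputs.items())


def fire(marking, inputs, outputs):
    """Return the marking reached by firing an enabled transition."""
    if not is_enabled(marking, inputs):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place, n in inputs.items():
        new_marking[place] = new_marking.get(place, 0) - n   # consume input tokens
    for place, n in outputs.items():
        new_marking[place] = new_marking.get(place, 0) + n   # produce output tokens
    return new_marking


# Hypothetical net: a parallel split (p1 -> p2, p3) followed by a join (p2, p3 -> p4).
marking = {"p1": 1}
split = ({"p1": 1}, {"p2": 1, "p3": 1})
join = ({"p2": 1, "p3": 1}, {"p4": 1})

after_split = fire(marking, *split)
after_join = fire(after_split, *join)
print(after_split)  # {'p1': 0, 'p2': 1, 'p3': 1}
print(after_join)   # {'p1': 0, 'p2': 0, 'p3': 0, 'p4': 1}
```

In the initial marking only the split is enabled; the join becomes enabled once both p2 and p3 hold a token, which mirrors the behavior of a merging parallel gateway.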

Coloured Petri nets (CPN) [56] are an extension of Petri nets that, while preserving

their useful properties, allow tokens to be distinguished by attaching colors, i.e. data

values, to them. It is possible to define complex data types in CPNs; however, each

place of a CPN typically holds tokens of the same type. This type is called the color set of

the place.

Figure 2.4 shows a mapping between BPMN elements and their representations in

Petri nets. In this thesis, we use the term process fragment as a generalized concept

covering activities (or transitions) and sub process graphs with single entry and single

exit nodes (known as hammocks in graph literature [134]).

[Figure: BPMN start event, end event, activity, exclusive gateway (split, join) and parallel gateway (fork, merge), each shown next to its Petri net representation.]

Figure 2.4: Mapping of activities, events and gateways to Petri nets.



2.2 Data Mining

Data mining is the process of extracting useful patterns and knowledge from large

datasets. Data mining algorithms can be applied to various types of data, including typical

database data, transactional data, data streams, ordered data, and graph data. The typical

data mining process comprises several steps. First, data is cleaned from noise and

inconsistent instances (data cleaning). Then the cleaned data from different sources is

integrated into a data warehouse (data integration). Next, the relevant data with respect

to the analysis objective is selected (data selection). In the fourth step, the selected data

is transformed and combined into the form required for mining (data transformation).

Afterwards, suitable mining methods are applied to the data and the patterns are dis-

covered (data mining). The penultimate step is refining interesting patterns from the

discovered knowledge using interestingness measures (pattern evaluation), and the last

step is representing the result by means of tables, charts, graphs and diagrams (knowledge

presentation) [80, 46].

Clustering is a branch of data mining which tries to divide data into groups of sim-

ilar objects. Each group is called a cluster and each member of a cluster is similar

to its cluster-mates and dissimilar to the members of other clusters. Clustering is an

unsupervised learning problem, the outcome of which is extremely dependent on the

selected similarity measures. Summarizing data in the form of clusters simplifies data

interpretation, albeit at the price of losing certain details about the data. Clustering

is used in many fields. Statistics [7] and, more generally, science [83]

have long leveraged clustering techniques, as have fields such as pattern recognition,

where character and speech recognition are typical applications of clustering

[29, 30]. Clustering can also be applied to density estimation problems

such as multivariate statistical estimations [106, 9].

2.2.1 Concept Drift Detection

Concept drift detection has been extensively studied in the field of data mining [59,

67, 114, 35, 88, 28, 33, 104, 39], where a drift mainly refers to a change in the relation

between the input and the target variables in an online supervised learning scenario.

As such, a widely studied challenge is that of devising learning algorithms that can

detect a concept drift as quickly as possible and adapt to the new concept (a.k.a. adaptive

learning) [50, 132, 60, 38, 58, 40, 6]. Such drifts include, for instance, changes in the

distributions of numerical or categorical variables.



2.2.2 Taxonomy of Concept Drift Detection Mechanisms

In this section, we present a review of the existing concept drift detection techniques

in the field of data mining by dividing them based on their underlying drift detection

mechanism into three categories: techniques based on sequential analysis, techniques

based on control charts, and techniques based on monitoring distributions over two

time windows.

2.2.2.1 Drift Detection Based on Sequential Analysis

The Sequential Probability Ratio Test (SPRT) [127] is a specific type of sequential

hypothesis testing that is used as the basis for several drift detection algorithms. Given

a sequence of observations whose underlying distribution changes from D0 to D1 at

a certain point w, the test evaluates whether the ratio of the probability of observing

certain subsequences under D1 to that under D0 is significant, i.e. above a user-defined

threshold. If the test evaluates to true, the null hypothesis, i.e. the two distributions are

similar, is rejected and a drift is detected at point w.
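
As a small worked example of this test, the sketch below evaluates the likelihood ratio on the logarithmic scale. It assumes, purely for illustration, that D0 and D1 are normal distributions with known means `mu0` and `mu1` and a common standard deviation `sigma`, and that observations are independent; the function names are hypothetical.

```python
import math


def log_likelihood_ratio(xs, mu0, mu1, sigma):
    """Sum of log(p1(x) / p0(x)) for independent normal observations with common sigma."""
    return sum(((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2) for x in xs)


def sprt_drift(observations, w, mu0, mu1, sigma, log_threshold):
    """True if the observations from index w onward favour D1 over D0 beyond the threshold."""
    return log_likelihood_ratio(observations[w:], mu0, mu1, sigma) > log_threshold


# Hypothetical sequence whose mean changes from 0 to 1 at index 5.
observations = [0.0] * 5 + [1.0] * 5
log_threshold = math.log(10)  # i.e. a likelihood ratio of 10, chosen by the user

print(sprt_drift(observations, 5, 0.0, 1.0, 1.0, log_threshold))  # True
print(sprt_drift(observations, 0, 0.0, 1.0, 1.0, log_threshold))  # False
```

Each post-change observation contributes 0.5 to the log ratio, so the five observations after index 5 give 2.5, which exceeds log 10 (about 2.30). Starting at index 0, the pre-change observations cancel this evidence and the ratio stays at 0.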

The Cumulative Sum (CUSUM) is a sequential analysis technique, developed by E.

S. Page [90] based on the principles of SPRT, and is often used for change detection

in stream mining [87]. CUSUM raises an alarm when a parameter, e.g. the mean, of

the probability distribution of the incoming data significantly deviates from zero. The

user is required to set a parameter that specifies the magnitude of the changes that are

considered significant. The value of this parameter controls the trade-off between early

detection of true drifts and detection of false drifts. Initializing this parameter with a

low value allows earlier detection of drifts, but also increases the chance of raising

false drift alarms. The Page-Hinkley test (PH) [90] is a variant of CUSUM that is often

used for change detection in signal processing. PH enables efficient detection of drifts

in the normal behavior of a process as represented by a model.
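
The following is a minimal sketch of the Page-Hinkley test in a commonly used formulation for detecting an increase in the mean. It is given for illustration only; the parameter names `delta` (the magnitude of change that is tolerated) and `threshold` (the alarm threshold) are assumptions of this example, as is the restriction to upward changes.

```python
def page_hinkley(values, delta, threshold):
    """Return the index at which the test signals an increase in the mean, or None."""
    mean = 0.0        # running mean of the observations seen so far
    cumulative = 0.0  # cumulative deviation from the running mean, minus delta
    minimum = 0.0     # smallest cumulative deviation seen so far
    for t, x in enumerate(values, start=1):
        mean += (x - mean) / t
        cumulative += x - mean - delta
        minimum = min(minimum, cumulative)
        if cumulative - minimum > threshold:
            return t - 1
    return None


# Hypothetical signal whose mean jumps from 0 to 2 at index 20.
signal = [0.0] * 20 + [2.0] * 20
print(page_hinkley(signal, delta=0.1, threshold=5.0))  # 22
```

With these settings the alarm is raised two observations after the change. Lowering `threshold` would shorten this delay but, on noisy data, would also raise more false alarms, which is the trade-off described above.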

Motivated by time series, Roberts [103] proposed the Exponentially Weighted Mov-

ing Average (EWMA) method for detecting changes in the moving average of variables

or attributes-type data with normal distributions. Shiryaev [111, 110] introduced a

procedure for detecting changes in the drift of a Brownian motion with the aim of

minimizing the expected delay between the time at which the change occurs and the time at which it is

detected; this is now usually referred to as the Shiryaev-Roberts procedure. The same

author several years later presented a Bayesian approach [112] for detecting changes

in online settings which, similar to the CUSUM and PH tests, reports a change once

its computed test statistic goes over a user-defined threshold. The accuracy of such



drift detection methods often relies on the trade-off between the false alarm rate and

the missed detection rate.
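
As an illustration of the EWMA idea mentioned above, the sketch below uses the standard textbook form of the EWMA control chart: the statistic z_t = lam * x_t + (1 - lam) * z_(t-1) is compared against limits that widen over time towards width * sigma * sqrt(lam / (2 - lam)). The in-control mean `mu0` and standard deviation `sigma` are assumed to be known, and the parameter names and default values are choices made for this example.

```python
import math


def ewma_alarm(values, mu0, sigma, lam=0.2, width=3.0):
    """Return the first index whose EWMA statistic leaves the control limits, or None."""
    z = mu0
    for t, x in enumerate(values, start=1):
        z = lam * x + (1 - lam) * z
        half_width = width * sigma * math.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        if abs(z - mu0) > half_width:
            return t - 1
    return None


# Hypothetical signal whose mean shifts from 0 to 2 at index 10.
signal = [0.0] * 10 + [2.0] * 10
print(ewma_alarm(signal, mu0=0.0, sigma=1.0))  # 13
```

After the shift the statistic takes the values 0.4, 0.72, 0.976 and 1.1808, and only the last one exceeds the limit of roughly 1.0. A larger `lam` reacts faster to large shifts, while a smaller one is more sensitive to small, persistent shifts.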

2.2.2.2 Drift Detection Based on Control Charts

Statistical Process Control (SPC) [109] is a statistically-grounded method for monitor-

ing and controlling the quality of a process. SPC can be applied to any process where

the conformance of the product to its specifications can be measured, e.g. in manufacturing

lines. Control charts [109], or process-behavior charts, are an SPC tool used to

check whether a manufacturing or business process is in a state of control. There are

several drift detection techniques based on control charts, for example [66, 38, 43, 13].

A control chart is constructed by first drawing points which represent statistics,

e.g. means, of measurements of a quality characteristic of a process in samples taken

at different times. Then, a center line is drawn at the value of the mean of these

statistics (e.g. the mean of the means of samples). Next, the standard deviation of the

statistics over all samples is calculated, and the upper

and lower control limits are drawn at 3 standard deviations from the center line. For

a process that is in control, over 99% of all process outputs fall within these limits.

Any observation that falls outside these limits indicates a likely unexpected source of

variation, and should be investigated. Optionally, two warning thresholds may also be

added to the chart at 2 standard deviations below and above the center line to provide

early notifications of a likely change to the quality engineers of the process. This

allows them to, for example, increase the rate at which the samples are taken until it

is ensured that the process is truly in control. The impact of the sample size on the

overall performance of a control chart has been studied in several articles [102, 76, 20,

133, 19], and a sample size of 2 has been shown to work best for many test cases [47].
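The construction described above can be sketched as follows. This follows the textual description literally (limits derived from the standard deviation of the sample means); practical SPC tools often estimate the limits from within-sample variation instead. The function names `control_limits` and `classify` are hypothetical.

```python
from statistics import mean, stdev


def control_limits(samples):
    """Compute the center line, control limits and warning limits."""
    means = [mean(s) for s in samples]   # one statistic per sample
    center = mean(means)                 # center line
    sd = stdev(means)                    # spread of the statistics
    return {
        "center": center,
        "lcl": center - 3 * sd, "ucl": center + 3 * sd,   # control limits
        "lwl": center - 2 * sd, "uwl": center + 2 * sd,   # warning limits
    }


def classify(value, limits):
    """Label a new sample statistic against the chart."""
    if value < limits["lcl"] or value > limits["ucl"]:
        return "out_of_control"
    if value < limits["lwl"] or value > limits["uwl"]:
        return "warning"
    return "in_control"


limits = control_limits([[9, 11], [10, 10], [11, 9], [10, 12], [8, 10]])
```

In this sketch a "warning" result would be the cue to sample more frequently, as described above.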

2.2.2.3 Drift Detection Based on Monitoring Distributions over Two Time Windows

These techniques perform a statistical test over data distributions in a reference and a

detection window, containing past and most recent data samples, respectively. If the

null hypothesis, i.e. the distributions are equal, is rejected, a drift is declared at the

start or end of the detection window. The monitored data may be univariate, bivariate,

or multivariate. The size of the two windows may be fixed or adaptive, and different

window positioning strategies may be used [2].

For example, Kifer et al. proposed to compare distributions of successive data



points in two adjacent windows sliding over a data stream using statistical tests and

Chernoff bounds [45] to determine whether the two distributions are statistically dif-

ferent. An entropy-based metric is introduced by Vorburger and Bernstein [126] for

measuring the distribution inequality between two sliding windows containing older

and more recent data instances, respectively. An entropy value of 1 indicates equal

distributions whereas an entropy value of 0 suggests completely different distributions.

By continuously monitoring the entropy metric over time, a concept drift is detected

when the value of the entropy metric drops below a user-defined threshold. Similarly,

Dasu et al. [23] and Sebastiao and Gama [107] use Kullback-Leibler divergence [63]

to measure the distance between the probability distributions of two time windows,

containing old and recent data samples, to identify concept drifts in streams of multi-

dimensional data.
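A minimal sketch of the two-window idea, using the Kullback-Leibler divergence between the empirical distributions of a reference and a detection window over a categorical stream. It is not the method of [23] or [107]: the Laplace smoothing, the fixed window size and the function names are assumptions made for illustration.

```python
import math
from collections import Counter


def kl_divergence(ref, det, smoothing=1.0):
    """KL divergence of the detection window from the reference window."""
    cats = set(ref) | set(det)
    ref_c, det_c = Counter(ref), Counter(det)
    n_ref = len(ref) + smoothing * len(cats)
    n_det = len(det) + smoothing * len(cats)
    total = 0.0
    for c in cats:
        p = (det_c[c] + smoothing) / n_det   # smoothed detection probability
        q = (ref_c[c] + smoothing) / n_ref   # smoothed reference probability
        total += p * math.log(p / q)
    return total


def first_drift(stream, win, threshold):
    """Slide two adjacent windows; return the detection window start index."""
    for end in range(2 * win, len(stream) + 1):
        ref = stream[end - 2 * win:end - win]
        det = stream[end - win:end]
        if kl_divergence(ref, det) > threshold:
            return end - win
    return None


stream = list("ab" * 50) + list("cd" * 50)   # the distribution changes at 100
position = first_drift(stream, win=20, threshold=0.5)
```

Both windows must be kept in memory, which illustrates the larger footprint of this family of techniques compared to sequential analysis.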

The main advantage of techniques based on monitoring two distributions, as compared

to sequential analysis techniques, is more precise localization of drifts. On the

other hand, they have larger memory footprints as they need to store the data within

two windows, whereas the sequential analysis techniques do not need to store the in-

coming data.

2.2.3 Concept Drift Characterization

In this context, the term drift characterization is used for describing different prop-

erties of a drift as well as explaining concept changes. For example, some studies

focus on analyzing a specific metric of a drift, e.g. severity, predictability, and fre-

quency [85, 62]. In this respect, Webb et al. [128] propose a comprehensive framework

for quantitative analysis of a drift, e.g. measuring drift magnitude or drift duration.

They also qualitatively categorize drifts into different types based on their occurrence

with respect to time, e.g. sudden or gradual. On the other hand, some studies have

explored techniques for the identification of features that explain the drift. For instance,

the authors of [97] use brushed parallel histograms to visualize concept drifts in

multidimensional problem spaces.

However, the methods developed for detecting and characterizing a drift in data

mining deal with simple data structures (e.g. numerical or categorical variables and

vectors thereof), while in business process drift detection and characterization we seek

to detect and characterize changes in more complex structures, specifically behavioral

relations between process activities or fragments (e.g. concurrency, conflicts, loops).

Thus, methods from the field of concept drift detection and characterization in data



mining cannot be readily transposed to business process drift detection and character-

ization.

2.3 Process Mining

Process Mining starts by collecting information about ongoing processes. Process min-

ing assumes that details about each activity performed in the organization as well as

their order are stored in event logs. Any transactional information system, such as Enterprise

Resource Planning, Customer Relationship Management, Business-to-Business,

Supply Chain Management and Workflow Management systems, produces such logs.

Process mining techniques fall into three broad categories:

• Automatic discovery: Techniques for automatic discovery of process models

based on event logs.

• Conformance checking: Techniques for verifying the conformance of event logs

to process models.

• Process enhancement: Techniques for modifying or extending process models

based on actual models in the organization, recorded in the form of event logs.

A process discovery algorithm is evaluated based on four metrics: fitness, simplicity,

generalization and precision. Most of the time, the quality of a process discovery

algorithm is measured by the percentage of the log that can be reproduced by the

discovered model (fitness). Process discovery algorithms often output spaghetti-like

process models which are difficult to read. Therefore, simplicity is another criterion to consider.

In addition, the ability of the discovered model to generalize the observed behavior

in the log (generalization), and the extent to which the model avoids allowing

behavior not observed in the log (precision) are other metrics for evaluating a process

discovery algorithm [15]. The described metrics are illustrated in Figure 2.5.

Concept drift detection and characterization methods lie in the family of both pro-

cess discovery and conformance checking techniques.

Below, we introduce basic notions such as traces, event logs, event streams and

directly follows relations used as the basis for defining notions related to each method

in the next chapters. The notation used in this thesis is summarized in Appendix A.



[Figure: the four quality dimensions of process discovery: replay fitness ("able to replay event log"), simplicity ("Occam's razor"), generalization ("not overfitting the log") and precision ("not underfitting the log").]

Figure 2.5: Quality metrics for process discovery algorithms [15].

2.3.1 Event Log and Event Stream

Event logs are at the core of all process mining techniques. An event log is a set of

traces, each capturing the sequence of events originated from a given process instance

(case). Each event represents an occurrence of an activity.

Let $\mathcal{L}$ be the set of all activity labels (labels, for short), $\mathcal{C}$ be the set of all case identifiers and $\mathcal{T}$ be the set of timestamps. We then define event and event universe as follows:

Definition 1 (Event, Event universe). An event $e$ is a triple $e = (c, l, t) \in \mathcal{C} \times \mathcal{L} \times \mathcal{T}$ which describes the occurrence of activity $l$ in case $c$ at time $t$. The set of all possible events is called the event universe and is denoted by $\mathcal{E}$.

To identify each component of an event $e = (c, l, t)$ we define the functions $\#_{case}(e) = c$, $\#_{label}(e) = l$ and $\#_{timestamp}(e) = t$.

Definition 2 (Event log, Trace). Let $L$ be an event log over the set of labels $\mathcal{L}$, i.e. $L \in \mathcal{P}(\mathcal{L}^*)$. A trace $\sigma \in L$ is a sequence of events $E_\sigma \subseteq \mathcal{E}$, ordered by their timestamps, with $|E_\sigma| = n$ such that $\sigma = \langle \#_{label}(e_0), \#_{label}(e_1), \ldots, \#_{label}(e_{n-1}) \rangle$. Any sub-sequence of a trace represents a sub-trace.

For example, the following represents an event log with a total of six traces, with

two distinct traces: $L = \{\langle a, b, d \rangle^2, \langle a, b, c, d \rangle^4\}$. For a trace $\langle a, b, c, d \rangle$, $\langle a, b \rangle$ and $\langle b, c, d \rangle$ are two sample sub-traces.

The configuration where events are read individually from an online source is

known as event streaming. An event stream is a potentially infinite sequence of events,

where events are ordered by their timestamps. Events of the same trace do not need to

be consecutive in the event stream, i.e. traces can be “overlapping”. Formally:



Definition 3 (Event stream). An event stream is a partial bijective function $ES : \mathbb{N} \to \mathcal{E}$ that maps every element of the index set $\mathbb{N}$ to $\mathcal{E}$.

Figure 2.6 shows a small portion of an event stream. Note that two subsequent

events may belong to different cases.
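The relation between an event stream and its overlapping traces can be sketched as follows. The `Event` tuple mirrors the triple (case, label, timestamp) of Definition 1; the function name `traces_from_stream` and the sample data are hypothetical.

```python
from collections import defaultdict, namedtuple

# An event is a triple (case, label, timestamp), as in Definition 1.
Event = namedtuple("Event", ["case", "label", "timestamp"])


def traces_from_stream(events):
    """Group events by case, keeping timestamp order within each case."""
    by_case = defaultdict(list)
    for e in sorted(events, key=lambda ev: ev.timestamp):
        by_case[e.case].append(e.label)
    return {case: tuple(labels) for case, labels in by_case.items()}


# Two cases whose events are interleaved in the stream.
stream = [
    Event("c1", "A", 1), Event("c2", "A", 2), Event("c1", "B", 3),
    Event("c2", "B", 4), Event("c1", "C", 5),
]
traces = traces_from_stream(stream)
```

Consecutive events in `stream` alternate between cases `c1` and `c2`, yet each case yields a well-ordered trace.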

[Figure: an event stream ES drawn along a time axis, with its events also shown separately for cases c1, c2 and c3.]

Figure 2.6: Visual example of a small portion of an event stream. Each square box represents an event. Case ids are color-coded (i.e. each case id has a unique background color) and labels in boxes indicate activity labels. The top row of events represents the entire event stream portion, the remaining rows show the individual cases constituting the stream.

Definition 4 (Directly follows relation). Let $L$ be an event log over $\mathcal{L}$ and $a, b \in \mathcal{L}$. There is a directly follows relation from $a$ to $b$, denoted by $a >_L b$, if and only if there is a trace $\sigma = l_1 l_2 l_3 \ldots l_n$ and $i \in \{1, \ldots, n-1\}$ such that $\sigma \in L$, $l_i = a$ and $l_{i+1} = b$.

Directly follows relations can be extracted from event logs and process models. A

directly follows graph is a directed graph whose nodes represent activities and whose

edges represent directly follows relations between activities. Each edge in the directly

follows graph that is derived from an event log may be annotated by a weight, denoting

how often its corresponding directly follows relation is observed in the log. Figure 2.7

shows the directly follows graph derived from the event log

$L = \{\langle a, b, f \rangle^4, \langle f, a, b \rangle^3, \langle e, d, a, b \rangle, \langle d, e, a, b \rangle^2, \langle f, a, b, c, a, b \rangle^3\}$.
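As an illustration, the edge weights of a directly follows graph can be computed from a log as sketched below. The log is represented as a mapping from each distinct trace to its multiplicity; this representation and the function name `directly_follows` are assumptions made for the example.

```python
from collections import Counter


def directly_follows(log):
    """Count how often each directly follows relation is observed.

    `log` maps each distinct trace (a tuple of labels) to its multiplicity.
    """
    dfg = Counter()
    for trace, count in log.items():
        for a, b in zip(trace, trace[1:]):   # consecutive label pairs
            dfg[(a, b)] += count
    return dfg


# The example event log given above.
log = {
    ("a", "b", "f"): 4,
    ("f", "a", "b"): 3,
    ("e", "d", "a", "b"): 1,
    ("d", "e", "a", "b"): 2,
    ("f", "a", "b", "c", "a", "b"): 3,
}
dfg = directly_follows(log)
```

For this log, the edge from a to b receives weight 16 and the edge from f to a receives weight 6.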

2.3.2 Business Process Drift

A business process drift is defined as a (statistically) significant change in the process

behavior [11, 77]. Three primary perspectives in the context of business processes are

the control-flow, data and resource perspectives. A drift may occur in one or more of

these perspectives.



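[Figure: a directed graph over the activities a, b, c, d, e and f, with edges annotated by the frequencies of the corresponding directly follows relations.]

Figure 2.7: Example of a directly follows graph.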

• Control-flow/behavioral perspective Refers to behavioral and structural changes

in a business process model. A list of common control-flow change patterns is

outlined in [129]. In [77], these control-flow changes are classified into three

categories: Insertion (I), e.g. inserting or deleting a fragment, Resequentializa-

tion (R), e.g. parallelizing two sequential fragments, and Optionalization (O),

e.g. embedding an existing fragment in a loop. We also use this classification,

specifically when experimenting with artificial logs, in this thesis. Table 2.1

shows the common control-flow change patterns obtained from [129] and their

categories. For example, an insurance company which used to perform a certain

check on cases after they are processed by case officers now performs the check

at the beginning of the process, before cases are processed any further. Here, a

move change pattern has been applied to the case check fragment, which falls

into the insertion category.

Sometimes, the change is not in the control-flow structure of a process, but in the

behavioral aspects of it. For example, in a loan application process, applications

above $5000 were considered as “high” last year, while this year those above

$10000 are labeled as high, due to the bank’s decision to increase the loan

application limit. In this case, the structure of the process remains unchanged

but the routing of cases changes. We refer to such a change as a branching

frequency change, and still consider it as a control-flow change in this thesis.

• Resource Perspective Refers to changes in resource behavior, e.g. their skills,

utilization, preferences, productivity, collaboration, etc., as well as in the organizational

structure of a process. Examples of change in the resource perspective include

replacing a resource who performs a particular activity, a change in the perfor-

mance of a resource in performing a certain activity, a change in the workload

of a resource, or a change in the collaboration pattern of a resource with another

resource. Pika et al. [94] present a method for detecting drifts in resource behav-

ior based on a set of predefined resource behavior indicators (RBI). To detect a



| Change pattern | Cat. |
|---|---|
| Insert/delete a fragment between two fragments | I |
| Insert/delete a fragment in/from parallel branch | I |
| Insert/delete a fragment in/from conditional branch | I |
| Duplicate a fragment | I |
| Substitute a fragment | I |
| Swap two fragments | I |
| Move a fragment to between two fragments | I |
| Move a fragment into/out of conditional branch | I |
| Move a fragment into/out of parallel branch | I |
| Make fragments mutually exclusive/sequential | R |
| Make fragments parallel/sequential | R |
| Synchronize two fragments | R |
| Make a fragment loopable/non-loopable | O |
| Make a fragment skippable/non-skippable | O |
| Change branching frequency | O |

Table 2.1: Common control-flow change patterns in business processes from [129].

drift they perform statistical tests on a time series that records the evolution of

each RBI over time.

• Data Perspective Refers to changes in the requirement and generation of data in

activities of a process. For example, in a loan application process, reducing the

number of co-signatures required to enable the execution of a particular activity.

Process drifts may be divided into four classes based on the form in which they

manifest themselves over time, as shown in Figure 2.8.

• Sudden drift Refers to a scenario where a current process P1 is substituted with

a new process P2, and from the moment of substitution all process instances are

processed based on the new process, as shown in Figure 2.8a. For example,

requiring a new health check in a citizenship application process due to a new

regulation.

• Gradual drift Refers to a scenario where a current process P1 is substituted

with a new process P2; however, both processes coexist for some time while the

old process is gradually discontinued, as shown in Figure 2.8b. For example, a

new policy in an insurance company requires claim handlers to perform a new

check on each insurance claim. The insurance company decides to start by

performing the check on long-term and high-value claims and over time extend

it to short-term and low-value claims.

• Recurring drift Refers to a scenario where a set of processes, e.g. P1 and P2

in Figure 2.8c, are substituted back and forth with each other. Such drifts can



be divided into periodic and non-periodic, and are often induced by changes in

the external environment in which a business process operates. An example of

a periodic recurrence is in the tourism industry, where a travel agency may de-

ploy different processes during different seasons. An example of a non-periodic

recurrence is a deployment of a different process based on the market condi-

tions. The time of the deployment and its duration are dependent on the market

conditions.

• Incremental drift Refers to a scenario where an existing process P1 is substi-

tuted with a new process Pn by applying smaller incremental changes over a

period of time, resulting in process variants P2, ..., Pn, as shown in Figure 2.8d.

This class of drift is more common in organizations that follow an agile business

process management methodology.

Figure 2.8: Different classes of drifts: (a) sudden, (b) gradual, (c) recurring and (d) incremental. Y-axes indicate process variants, X-axes indicate time, and blue rectangles represent process instances.


Chapter 3

Process Drift Detection

In the introduction, we highlighted some ways in which drift detection can contribute

to the success of a process improvement initiative within an organization. Specifically,

early detection of drifts enables organizations to take timely corrective measures and

avoid any negative consequences that would otherwise result from unplanned changes

in the behavior of their business processes. We also noted that state-of-the-art drift

detection techniques cannot detect drifts in real time from streams of events that

incrementally record the executions of a business process. As such, they may fail to detect,

or may detect only with a long delay, intra-trace drifts, i.e. drifts that occur during the

execution of a process and may also impact ongoing process executions. Furthermore, as

they rely on statistical tests over trace distributions to detect drifts, they do not perform

well with unpredictable processes, e.g. a healthcare process, whose logs exhibit high

trace variability, i.e. a high number of distinct traces over the total number of traces.

To address the identified limitations, in this chapter we propose a fully automated,

online method for detecting process drifts from event streams. We perform statisti-

cal tests over distributions of behavioral relations between activities such as conflict,

causality and concurrency, as observed from two adjacent windows of adjustable size,

which we slide over the stream. Given that behavioral relations between activities are

a type of sub-trace features, the method does not suffer from low accuracy when the

log is highly variable (i.e. for unpredictable processes). We extensively evaluate the

accuracy and scalability of our method by simulating event streams from artificial and

real-life logs. The results show that the approach is fast and highly accurate in detect-

ing common change patterns, and significantly better than the state of the art in process

drift detection.

This chapter is structured as follows. Section 3.1 discusses related work on process


drift detection. Section 3.2 introduces the proposed method while Sections 3.4 and

3.5 present its evaluations on artificial and real-life logs, respectively. Section 3.6

concludes the chapter.

3.1 Related Work

Various methods have been proposed to detect process drifts from event logs [18, 1, 12,

82, 77]. These methods are based on the idea of extracting features (e.g. patterns) from

the traces of an event log. For example, Carmona et al. [18] propose to represent a

log as a polyhedron. This representation is computed for prefixes in a random sample

of the initial traces in the log. The method checks the fitness of subsequent trace

prefixes against the constructed polyhedron. If a significant number of these prefixes

does not lie in the polyhedron, a drift is declared. The method guarantees that drifts

of certain types will always be detected. However, to find a second drift after the

first one, the entire detection process must be restarted, thus adversely affecting the

scalability of the method. In previous experiments we conducted [77], the execution

of the implementation of this method took hours to complete. Another drawback is its inability

to pinpoint the exact moment of the drift.

Accorsi et al. [1] propose to cluster the traces in a moving window of the log,

based on the average distance between each pair of events in the traces. This method

heavily depends on the choice of the window size: a small window size may lead to false

positives while a large window size may lead to false negatives (undetected drifts),

as drifts happening inside the window go undetected. In addition, the method is not

designed to deal with loops, and may fail to detect types of changes that do not cause

significant variations to the distances between activity pairs, e.g. changes involving an

activity being skipped.

Bose et al. [12] propose a method to detect process drifts based on statistical testing

over feature vectors. The method is not fully automated, as the user is asked to identify

the features to be used for drift detection, implying that they have some a-priori knowl-

edge of the possible nature of the drift. Further, this method is unable to identify certain

types of drifts such as inserting a conditional branch or a conditional move, even if the

relevant process activities are selected as features. Finally, similar to Accorsi et al. [1],

the user is required to set a window size for drift detection. Depending on how this

parameter is set, some drifts may be missed. This latter limitation is partially lifted in a

subsequent extension [82], which introduces a notion of adaptive window. The idea is



to increase the window size until it reaches a maximum size or until a drift is detected.

However, this technique requires the user to set a minimum and a maximum window

size. If the minimum window size is too small, minor variations (e.g. noise) may be

misinterpreted as drifts (false positives). Conversely, if the maximum window size is

too large, the execution time is affected and some drifts may go undetected.

Li et al. [74] identify a drift as a difference between the binary activity relations

of causality and concurrency, as well as length-two loop, extracted by the Heuristic

Miner [130], in two overlapping windows of the same size sliding over a stream of

traces. The proposed method suffers from a few problems. First, it does not factor in

the frequency of binary relations when detecting a drift, and as such cannot detect drifts

caused by branching frequency changes. Furthermore, there is no statistical support

for determining whether the identified changes are actually significant. Finally, the

accuracy of drift detection highly depends on the size of the drift detection windows,

which needs to be manually set by the user.

All these methods may miss certain types of changes that are not covered by the

types of features used. Moreover, their scalability is constrained by the need to extract

and analyze a feature space that is potentially very large. Hence, they are not suit-

able for online settings. This motivated us to propose a new method [77] for detecting

process drifts determined by a wide range of typical process change patterns [129].

The method is based on statistical tests over the distribution of runs (an abstraction

of complete traces), as observed in two consecutive time windows. The size of these

windows is adjusted automatically based on changes in log variability. In the exper-

iments with artificial as well as real-life event logs this method outperformed all the

above methods in terms of detection accuracy and scalability. As such, we selected

it as a baseline for the experiments in Section 3.4. As shown in our experiments in

this chapter, this method also does not cater for highly variable event logs. In such

logs each distinct run occurs only a few times, leading to a less reliable statistical test,

and hence too many false negatives. Further, as the method works based on complete

traces, it cannot detect (intra-trace) drifts from event streams.

To the best of our knowledge, the only method that deals with event streams has

been proposed by Burattin et al. [16]. However, this work mainly focuses on the on-

line discovery of process models captured as a set of business constraints (formulated

in Linear Temporal Logic) between events. Any change in the extracted constraints

over time may be considered as a drift. Nonetheless, there is no statistical support

for detecting whether changes are in fact significant, and the exact positions of the



identified drifts are not reported. As such, drift detection accuracy is not evaluated.

In summary, none of the existing process drift detection techniques fully satisfies

the process drift detection criteria outlined in Section 1.2.2.

3.2 Drift Detection Method

From a statistical viewpoint, the problem of business process drift detection can be for-

mulated as follows: identify a time point before and after which there is a statistically

significant difference between the observed process behaviors. Therefore, to detect a

drift we need features that properly capture the behavior of a process. By monitor-

ing and analyzing the feature vectors over time, we can identify the time points where

the feature vectors exhibit statistically significant changes. We explored a few differ-

ent features including Directly Follows relations (direct succession), Follows relations

(succession), Block Structures (extracted from process trees produced by the Inductive

Miner [71]) and α+ Relations [25]. We found that while the directly follows and

follows relations are over-fitting features, block structures are under-fitting features.

However, α+ relations proved to be a suitable level of abstraction for capturing the

behavior of unpredictable processes represented in an event stream.

To detect a process drift we perform a statistical test, namely the G-test of independence,¹

over distributions of α+ relations observed in two adjacent time windows of

adaptive size, sliding along with a stream of events. Basically, the most recent events

are equally divided into reference window (less recent events), and detection window

(more recent events). Each time a new event enters the event stream, the two win-

dows shift forward so that the new event is in the detection window. The set of events

within each window is used to build a corresponding sub-log. This sub-log represents

the process behavior observed within the respective window. The sliding window is a

well-established technique in the concept drift community [39].

Then the α+ relations and their frequencies are extracted from each sub-log, and

used to populate a 2×n matrix, the so-called contingency matrix, where n is the num-

ber of distinct relations. Each column in the contingency matrix corresponds to a

category of a statistical variable, here an α+ relation. The first row in the contingency

matrix contains the frequencies of the relations in the detection window, i.e. the ob-

served frequencies, while the second row contains the frequencies of the relations in

¹ The G-test is a non-parametric statistical hypothesis test which assumes no a-priori knowledge of the statistical distributions. The G-test is a better approximation to the theoretical chi-squared distribution than the chi-squared test [48].



the reference window, i.e. the expected frequencies.

The result of applying the G-test of independence on the contingency matrix is the

significance probability (P–value) that the populations of α+ relations over the two

windows come from the same distribution. If the P–value is above a predefined threshold,²

we fail to reject the null hypothesis, i.e. the frequency distributions of the α+ relations in the

two windows are similar. However, a P–value below the threshold rejects the null

hypothesis, meaning that the α+ relations in the two windows come from different

distributions. In other words, they reflect different process behaviors (process drift).
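The test on the contingency matrix can be sketched in plain Python as follows. This is a minimal illustration rather than the implementation used in this thesis: the function names are hypothetical, columns that are empty in both windows are dropped (an assumption), and the chi-squared survival function is computed with a closed-form recurrence that is valid for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Survival function of the chi-squared distribution (integer df >= 1)."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # closed form for df = 1
    while a < df / 2.0:
        # Recurrence Q(a + 1, y) = Q(a, y) + y^a e^(-y) / Gamma(a + 1).
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def g_test(matrix):
    """G-test of independence on a 2 x n matrix; returns (G, P-value)."""
    cols = [c for c in zip(*matrix) if sum(c) > 0]   # drop empty columns
    if len(cols) < 2:
        return 0.0, 1.0
    rows = list(zip(*cols))
    row_tot = [sum(r) for r in rows]
    col_tot = [sum(c) for c in cols]
    n = sum(row_tot)
    g = 0.0
    for i, row in enumerate(rows):
        for j, obs in enumerate(row):
            if obs > 0:   # cells with zero count contribute nothing
                g += obs * math.log(obs * n / (row_tot[i] * col_tot[j]))
    g = max(2.0 * g, 0.0)
    dof = (len(rows) - 1) * (len(cols) - 1)
    return g, chi2_sf(g, dof)


# Rows: detection window and reference window; columns: relations.
g_same, p_same = g_test([[10, 20, 30], [10, 20, 30]])
g_diff, p_diff = g_test([[50, 30, 20], [20, 30, 50]])
```

With a significance level of 0.05, the first matrix gives no evidence of a drift, whereas the second one leads to rejecting the null hypothesis.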

3.2.1 Intra-trace vs Inter-trace

A drift may occur between complete executions of a process. We call this an inter-

trace drift. For example, a new legislation requires an insurance company to perform

a more stringent verification on new claims, while old claims are exempted. These

however are not the only type of drift. In reality, a drift may also occur during the

execution of a process and may impact ongoing process instances [129]. We call these

intra-trace drifts. For example, an insurance check may need to be removed altogether

due to a contingency plan triggered by severe weather conditions (e.g. a flood). Such

a change may impact new process instances as well as the instances that have already

started, but that have not yet gone through the check to be removed.

In addition, in order to detect a drift using a stream of traces, we have to wait until

each trace completes before we can use it. This delays the detection of the drift. On

the other hand, working on a stream of events allows us to instantly use each observed

event, thereby detecting a drift as soon as possible during the execution of the process.

3.2.2 α+ Relations

In this chapter, we use the α+ relations [25], as an extension of the α relations [118],

to capture the behavior of a process. The α-algorithm defines three exclusive rela-

tions: conflict, concurrency and causality. The α+-algorithm adds two more relations:

length-two loop and length-one loop. The α+ relations are formally defined as follows:

Definition 5 (α+ relations from [25]). Let $L$ be an event log over $\mathcal{L}$. Let $a, b \in \mathcal{L}$:

• $a \triangle_L b$ if and only if there is a trace $\sigma = l_1 l_2 l_3 \ldots l_n$ and $i \in \{1, \ldots, n-2\}$ such that $\sigma \in L$ and $l_i = l_{i+2} = a$ and $l_{i+1} = b$,

• $a \diamond_L b$ if and only if $a \triangle_L b$ and $b \triangle_L a$,

• $a >_L b$ if and only if there is a trace $\sigma = l_1 l_2 l_3 \ldots l_n$ and $i \in \{1, \ldots, n-1\}$ such that $\sigma \in L$ and $l_i = a$ and $l_{i+1} = b$,

• $a \rightarrow_L b$ if and only if $a >_L b$ and ($b \not>_L a$ or $a \diamond_L b$),

• $a \#_L b$ if and only if $a \not>_L b$ and $b \not>_L a$, and

• $a \parallel_L b$ if and only if $a >_L b$ and $b >_L a$, and $a \not\diamond_L b$.

² The typical value of the threshold, i.e. significance level, for the G-test is 0.05 [89].

A length-two loop relation, including a and b, is denoted with $a \triangle_L b$. The frequency

of this relation in a log is the number of occurrences of the substring aba. A

causality relation from a to b is denoted with a→L b. The frequency of this relation

in a log is the number of occurrences of the substring ab. A parallel relation between

a and b is denoted with a ‖L b. The frequency of this relation in a log is the minimum

of the frequencies of the two substrings, ab and ba. A conflict relation between a and

b is denoted with a#Lb, and indicates that there is no trace with the substring ab or ba.

The frequency of this relation in a log is the number of occurrences of a and b. The

α+-algorithm also discovers length-one loop relations as a pre-processing operation.

For example, there is a length-one loop including the activity a in a log if there is a

trace with the substring aa. The frequency of this relation in a log is the number of

occurrences of the substring aa.
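The relations and frequencies described above can be extracted from a (sub-)log along the following lines. This is a simplified sketch, not the α+-algorithm itself: the function name and the tuple-based encoding of relations are hypothetical, length-two loops are only counted for distinct activities, and the frequency of a conflict relation is taken to be the sum of the occurrences of the two activities, which is one possible reading of the description above.

```python
from collections import Counter
from itertools import combinations


def alpha_plus_relations(traces):
    """Map each observed relation to its frequency in the given traces."""
    df = Counter()    # occurrences of each substring ab
    l2 = Counter()    # occurrences of each substring aba with a != b
    occ = Counter()   # occurrences of each activity
    for t in traces:
        occ.update(t)
        for a, b in zip(t, t[1:]):
            df[(a, b)] += 1
        for a, b, c in zip(t, t[1:], t[2:]):
            if a == c and a != b:
                l2[(a, b)] += 1
    rel = {}
    for a in occ:
        if df[(a, a)]:
            rel[("loop1", a)] = df[(a, a)]
    for a, b in combinations(sorted(occ), 2):
        ab, ba = df[(a, b)], df[(b, a)]
        diamond = l2[(a, b)] > 0 and l2[(b, a)] > 0
        if l2[(a, b)]:
            rel[("loop2", a, b)] = l2[(a, b)]
        if l2[(b, a)]:
            rel[("loop2", b, a)] = l2[(b, a)]
        if not ab and not ba:
            rel[("conflict", a, b)] = occ[a] + occ[b]   # assumed reading
        elif ab and ba and not diamond:
            rel[("parallel", a, b)] = min(ab, ba)
        else:
            if ab and (not ba or diamond):
                rel[("causal", a, b)] = ab
            if ba and (not ab or diamond):
                rel[("causal", b, a)] = ba
    return rel


relations = alpha_plus_relations([tuple("abcd"), tuple("acbd"), tuple("abcd")])
loops = alpha_plus_relations([tuple("abab")])
```

The resulting dictionary provides one frequency per relation, i.e. one row of the contingency matrix for the window from which the traces were taken.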

3.2.3 Statistical Testing over Event Streams

This section describes our online drift detection algorithm as presented in Algorithm 1.

The drift detection algorithm has three parameters: 1. eventStream: a stream of events.

2. initWinSize: initial size of the detection and reference windows. 3. maxBufSize:

maximum available memory for the event buffer storing the incoming events, namely

eventBuf . Since the algorithm works online the size of this buffer must not exceed

maxBufSize. Therefore, each time a new event e arrives we first check if the buffer has

reached its maximum size, and if so we shift the events in the buffer and discard the

least recent event (lines 11-13). We then insert the new event into the buffer (line 14).

The first statistical test should be performed when the number of events in the

buffer is 2× initWinSize (line 15). Before each statistical test we adapt the size of

the two windows to improve the accuracy of the approach (line 16). The notion of

adaptive window is explained in Section 3.2.4. The method updateSublogs updates

the sub-logs related to the detection and reference windows, namely detSubLog and

refSubLog, respectively, using the events within their corresponding windows (line



17). The first time this method is called the sub-logs are built from scratch. The α+

relations and their frequencies are extracted from the two sub-logs and populated in

a contingency matrix (line 19). We then perform the G-test of independence on this

contingency matrix and obtain the P–value (line 20). The G-test threshold,

GtestThreshold, is set to 0.05, the significance level conventionally used for this test.
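As an illustration of the test statistic, the sketch below computes the G statistic, G = 2 Σ O ln(O/E), and the degrees of freedom for a contingency matrix whose rows are the two windows and whose columns are the relations. The function name is hypothetical and the code is not taken from ProDrift. The P–value is then the upper tail of a chi-square distribution with the returned degrees of freedom, which could be obtained, for example, with scipy.stats.chi2.sf(g, dof); this step is left out to keep the sketch free of external dependencies.

```python
import math


def g_statistic(matrix):
    """Return the G statistic and degrees of freedom for a table of observed counts."""
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(matrix):
        for j, observed in enumerate(row):
            if observed > 0:  # a zero cell contributes nothing (0 * ln 0 is taken as 0)
                expected = row_totals[i] * col_totals[j] / total
                g += observed * math.log(observed / expected)
    dof = (len(matrix) - 1) * (len(matrix[0]) - 1)
    return 2.0 * g, dof
```

Two windows with identical relation frequencies yield G = 0, whereas windows whose relations do not overlap yield a large G and hence a small P–value.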

Each time the P–value drops below the threshold GtestThreshold, we store the

current event and the current window size in pbtEvent and pbtWinSize, respectively

(lines 24-25). Since any statistical test is subject to sporadic stochastic oscillations, we

introduced an additional filter, namely oscillation filter. The P–value drops have to be

consistent over many consecutive statistical tests in order to avoid reporting incidental

drops in the P–value (oscillations). The size of the oscillation filter is calculated by

function Φ which uses the window size w as input. The number of consecutive tests in

which the P–value is below the threshold GtestThreshold is stored in pbtLen. We detect

a drift as soon as pbtLen reaches Φ(w) (line 27). Our experiments showed that a

value of Φ(w) = w/2 provides a good tradeoff between accuracy and detection delay (cf. Section 3.4.3).

The drift is localized at the event where the P–value dropped consistently below the

threshold, stored at pbtEvent (line 28). Whenever the P–value exceeds the threshold

we reset pbtLen, pbtEvent and pbtWinSize (lines 31-33).
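The oscillation filter (lines 21-34 of Algorithm 1) can be sketched in isolation as follows. The sketch assumes that the P–value and the window size of each test have already been computed and are passed in as two lists; the function name, the use of list indexes in place of events, and the integer division in Φ(w) = w/2 are assumptions made for illustration.

```python
def detect_drifts(p_values, window_sizes, threshold=0.05, phi=lambda w: w // 2):
    """Return the indexes at which a drift is reported, given one P-value per test."""
    drifts = []
    pbt_event, pbt_win_size, pbt_len = None, -1, 0
    for index, (p_value, w) in enumerate(zip(p_values, window_sizes)):
        if p_value < threshold:
            pbt_len += 1
            if pbt_event is None:  # first test of the current run below the threshold
                pbt_event, pbt_win_size = index, w
            if pbt_len == phi(pbt_win_size):
                drifts.append(pbt_event)  # drift localized where the drop began
        else:  # the P-value recovered: reset the run
            pbt_event, pbt_win_size, pbt_len = None, -1, 0
    return drifts
```

With a window size of 6, and hence a filter size of 3, a run of two consecutive low P–values is discarded as an oscillation, while a run of four is reported once, at the index where the run started.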

3.2.4 Adaptive Window

Best practices of using the G-test recommend that no more than 20 percent of the

expected frequencies in the contingency matrix have less than 5 occurrences, to have

a reliable statistical test [48]. Thus, each time before performing the statistical test

we ensure the size of the two windows is large enough to fulfill this requirement.

Even though the larger the window size is the higher the chances that the requirement

of the statistical test is met, a very large window size may increase the number of

new events needed to detect a drift, so-called mean delay. Furthermore, it may also

cause the detection and reference windows to span over multiple drifts, thereby letting

some of the drifts go undetected. Therefore, we need to balance between improving

the reliability of the statistical test, by increasing the window size, and reducing the

detection delay of the method, by decreasing the window size.
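The rule of thumb above can be checked directly on a contingency matrix. The following sketch computes the expected frequency of every cell from the row and column totals and verifies that at most 20 percent of them fall below 5. The function name and default values are illustrative, and a non-empty matrix is assumed.

```python
def meets_expected_frequency_rule(matrix, min_expected=5.0, max_fraction=0.2):
    """Check that at most max_fraction of the expected frequencies are below min_expected."""
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    total = sum(row_totals)
    # Expected frequency of a cell under independence: row total * column total / total.
    expected = [r * c / total for r in row_totals for c in col_totals]
    small = sum(1 for e in expected if e < min_expected)
    return small / len(expected) <= max_fraction
```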

The idea behind our adaptive window originates from the requirement of the sta-

tistical test mentioned above, meaning that on average we aim to have a frequency

of no less than 5 for each of the α+ relations in the contingency matrix. Given that

the maximum number of possible relations over the set of labels (activity names) L



Algorithm 1 Drift Detection Algorithm
 1: procedure DETECTDRIFT(eventStream, initWinSize, maxBufSize)
 2:   eventBuf  /* Event buffer */
 3:   w ← initWinSize  /* Current window size */
 4:   detSubLog, refSubLog  /* Lists of sub-traces within detection and reference windows, respectively */
 5:   GtestThreshold ← 0.05  /* Typical threshold value of G-test */
 6:   pbtEvent ← NIL  /* Current event when P–value drops below GtestThreshold */
 7:   pbtWinSize ← −1  /* Value of w when P–value drops below GtestThreshold */
 8:   pbtLen ← 0  /* # of consecutive tests that P–value remains below GtestThreshold */
 9:   while true do
10:     e ← fetch(eventStream)  /* Fetch a new event e */
11:     if size(eventBuf) = maxBufSize then
12:       shift(eventBuf)
13:     end if
14:     insert(eventBuf, e); ebLength ← length of eventBuf
15:     if ebLength ≥ 2 · initWinSize then
16:       newWinSize ← adWin(eventBuf, w)
17:       updateSublogs(eventBuf, detSubLog, refSubLog, w, newWinSize)
18:       w ← newWinSize
19:       conMat ← buildContingencyMatrix(detSubLog, refSubLog)
20:       pValue ← Gtest(conMat)
21:       if pValue < GtestThreshold then
22:         pbtLen ← pbtLen + 1
23:         if pbtEvent = NIL then
24:           pbtEvent ← e
25:           pbtWinSize ← w
26:         end if
27:         if pbtLen = Φ(pbtWinSize) then
28:           reportDrift(pbtEvent)  /* Drift detected and reported */
29:         end if
30:       else
31:         pbtLen ← 0
32:         pbtEvent ← NIL
33:         pbtWinSize ← −1
34:       end if
35:     end if
36:   end while
37: end procedure



is $|L|^2$, we calculate $|L|$ over both the detection and reference windows, denoted by

$|L_{det}|$ and $|L_{ref}|$, respectively. Multiplying $\max(|L_{det}|, |L_{ref}|)^2$ by 5 makes it likely that

both windows contain enough events to fulfill the requirement of the statistical test.

Hence, the window size w is defined as $w = \max(|L_{det}|, |L_{ref}|)^2 \cdot 5$.

The expansion and the shrinkage of the windows is performed recursively. This

is because each time the windows are, for example, expanded there may be a need to

expand the windows again due to changes in $|L_{det}|$ and/or $|L_{ref}|$. It is worth mentioning

that our adaptive window is not dependent on the initial window size, since

starting from any initial value the window sizes converge to the length needed to fulfill

the requirement of the statistical test. The maximum size each window could grow to

is the length of the event buffer divided by two.
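The adaptive window computation can be sketched as follows. This is an illustrative reading of the description above, not the adWin function of ProDrift: the buffer is assumed to be a list of activity labels with the most recent event last, the recursion is written as a loop that stops when the window size no longer changes (or starts repeating), and the function name is hypothetical.

```python
def adapt_window(event_labels, w, min_frequency=5):
    """Resize both windows until w = max(|L_det|, |L_ref|)^2 * min_frequency is stable."""
    n = len(event_labels)
    max_w = n // 2  # each window may use at most half of the buffer
    w = max(1, min(w, max_w))
    seen = set()
    while w not in seen:  # stop at a fixed point, or when sizes start repeating
        seen.add(w)
        det = event_labels[n - w:]          # detection window: the w most recent events
        ref = event_labels[n - 2 * w:n - w]  # reference window: the w events before it
        labels = max(len(set(det)), len(set(ref)))
        w = min(labels ** 2 * min_frequency, max_w)
    return w
```

For a buffer over three labels the window size converges to 3² · 5 = 45 events regardless of the initial value, which illustrates the independence from the initial window size noted above.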

It is worth mentioning that in the unlikely extreme scenario where the overlapping

between traces is to the extent that each event within a window comes from a distinct

trace, data streaming techniques with a gradual forgetting strategy [39] should be used.

3.2.5 Noise handling

Real-life event streams often contain noisy events. These events may negatively impact

the accuracy of α+ relations discovered from an event stream, leading to lower drift

detection accuracy. To handle drift detection on noisy event streams, we first filter out

infrequent directly follows relations from the reference and detection windows. We

consider a directly follows relation as infrequent if its frequency lies below a certain

threshold, defined as a percentage of the sum of the frequencies of all directly follows

relations in each of the reference and detection windows. In the experiments with

noisy event streams in this thesis, we set this threshold to 10%. The remaining noise-free

directly follows relations are then used to construct α+ relations from the reference

and detection windows.
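The filtering step can be sketched as follows for a single window. The sketch assumes that the window is given as a list of sub-traces (lists of activity labels) and that a relation is kept when its frequency is at least the stated percentage of the total frequency of all directly follows relations in that window. The function name is hypothetical.

```python
from collections import Counter


def filter_infrequent_df(sub_traces, threshold=0.10):
    """Keep the directly follows relations whose frequency reaches the threshold."""
    df = Counter()
    for trace in sub_traces:
        df.update(zip(trace, trace[1:]))  # count each pair of consecutive activities
    cutoff = threshold * sum(df.values())
    return {pair: n for pair, n in df.items() if n >= cutoff}
```

The same function would be applied separately to the reference window and to the detection window.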

Alternatively, more advanced noise filtering techniques such as the one we pro-

posed in [122] can also be used to filter out spurious events from an event stream.

In offline settings, the technique proposed in [22] provides a systematic solution for

removing infrequent activities from an event log.

Time complexity. Each time a new event is received from the stream, we first extract

the α+ relations in each sliding window and count their frequencies, and then perform

the G-test of independence. The worst-case complexity of computing the α+ relations

is quadratic in the cardinality of the label set, i.e. $O(|L|^2)$. Given a contingency matrix

of maximum size $2 \times |L|^2$, the complexity of the G-test is $O(|L|^2)$. Since the two

operations have the same complexity and are executed in sequence, the

complexity of our method is $O(|L|^2)$ for every new event read from the stream.

3.3 Tool Support

We implemented the proposed method as a plug-in for the Apromore platform3 as well

as a standalone open-source tool called ProDrift.4 Figure 3.1 shows a screenshot of the

plug-in in Apromore. The plug-in can be launched by selecting an event log from the

repository within Apromore and pressing “Detect process drifts” from the “Analyze”

menu, as shown in Figure 3.1a. Alternatively, it is possible to click on the menu item

without selecting a log first. In this case, the tool will ask the user to import a log from

their local computer. This second option is particularly useful when the user does not

wish to store their log in the repository that comes with Apromore.

As shown in Figure 3.1b, the plug-in comes with two drift detection configura-

tion options: “event-based” and “trace-based”. The former selects the drift detection

method presented in this chapter, while the latter selects the run-based drift detection

method proposed by Maaradji et al. [77], that is used as the baseline in our experiments

in Section 3.4. When using our method for drift detection we replay the input event

log as an event stream. By default, we use the adaptive window mechanism (cf. Sec-

tion 3.2.4) to automatically set the size of the drift detection windows. Alternatively, it

is possible for the user to select fixed windows of a certain size.

Once the processing of the log is complete, the tool shows a plot of the P–value

of the statistical test, where the position of each detected drift is marked as a circle on

the P–value curve, as shown in Figure 3.1c. Furthermore, the event index at which

each drift is detected and its corresponding date are reported in the list below the

P–value plot. Also, by pressing the “Save Sublogs” button, one can download the sub-

logs, each containing sub-traces (event-based setting) or traces (trace-based setting),

between every two consecutive drifts.

3.4 Evaluation on Artificial Logs

We used ProDrift to assess the goodness of our method in terms of accuracy and scal-

ability in a variety of settings. In the rest of this section we discuss the design of the

3 Available at http://apromore.org/

4 Available at http://apromore.org/platform/tools



Figure 3.1: Drift detection using the ProDrift plug-in within Apromore. (a) Launch ProDrift. (b) Set drift detection parameters (optional). (c) Drift detection results: drifts are marked as circles on the P–value curve of the statistical test, and their locations and dates are reported in the list below the plot.



experiments, the datasets used, the impact that oscillation filter and inter-drift distance

have on our method, and conclude by comparing our method with the method in [77].

3.4.1 Setup

To evaluate the effectiveness of our method, we created a variety of artificial logs

with different configurations, and then replayed these logs as event streams. We first

modeled a base business process using CPN tools, as illustrated in Figure 3.2, and

then used this model to generate the logs.5 The model features 42 different activities,

combined with different intertwined structural patterns: five XOR, six AND structures,

and three loop structures. We built this model in a way that the resulting log is highly

variable. To produce logs that include drifts, we then injected different types of control-

flow changes into the base CPN model.

We applied in turn one out of fifteen simple change patterns [129] to the base

model. These patterns, summarized in Table 3.1, describe different change operations

commonly occurring in business process models, such as inserting/deleting a model

fragment, putting a model fragment in a loop, swapping two fragments, or paralleliz-

ing two sequential fragments. We organized the simple changes into three categories:

Insertion (“I”), Resequentialization (“R”) and Optionalization (“O”) (cf. Table 3.1).

These categories make six possible composite change patterns (“IOR”, “IRO”, “OIR”,

“ORI”, “RIO”, and “ROI”) by nesting the simple patterns within each other. For example,

the composite pattern “IRO” can be obtained by first adding a new activity (“I”),

then making this activity parallel to an existing activity (“R”, cf. pattern pl in Table 3.1) and finally by putting the

whole parallel block into a loop structure (“O”, cf. pattern lp).

Each of these change patterns was applied locally on the base model in such a

way that it is possible during log replay to choose between the base model execution

path and the altered one. For instance, if the applied change pattern was to replace

a process fragment (rp), the CPN model would have a branching point, called drift

toggle, right before this fragment, that allows the execution to follow either the initial

model fragment or the new process fragment. A drift is injected by switching the

toggle on or off. In this way, we can generate intra-trace drifts. For instance, if the

toggle is switched on when trace #500 starts, the traces that started before that trace

and have not yet reached the branching point, will follow the new process behavior,

thus exhibiting the change. These traces will therefore have an intra-trace drift. In the

remainder, whenever we say that a drift has been injected at a given trace number (after

5 http://cpntools.org


Figure 3.2: Artificial process model created in CPN Tools, used as a base model to simulate the artificial event logs.



a given number of traces) it means that the drift toggle has been switched on at the first

event of that given trace number (resp. after that given number of traces have started).

Code  Simple change pattern                                  Category
sre   Insert/delete a fragment between two fragments         I
pre   Insert/delete a fragment in/from parallel branch       I
cre   Insert/delete a fragment in/from conditional branch    I
cp    Duplicate fragment                                     I
rp    Substitute fragment                                    I
sw    Swap two fragments                                     I
sm    Move fragment to between two fragments                 I
cm    Move fragment into/out of conditional branch           I
pm    Move fragment into/out of parallel branch              I
cf    Make two fragments mutually exclusive/sequential       R
pl    Make two fragments parallel/sequential                 R
cd    Synchronize two fragments                              R
lp    Make fragment loopable/non-loopable                    O
cb    Make fragment skippable/non-skippable                  O
fr    Change branching frequency                             O

Table 3.1: Change patterns from [129]

Finally, in order to vary the distance between drifts, for each change pattern we

generated three logs of 2,500, 5,000 and 10,000 traces, and injected drifts by switching

the drift toggle on and off every 10% of the log. This led to an inter-drift distance of

250, 500 and 1,000 traces per change pattern, with 9 drifts per log. The position of

an injected drift is given by the index of the first event in the event stream, after the

drift toggle has been switched on. These indexes are used as the true positives of our

evaluation (the gold standard). Further, for each of the 6 composite change patterns,

we created 3 possible combinations, by changing the type of pattern used. This led to

15 (simple patterns) + 18 (complex patterns) = 33 different variants of the CPN model

times three inter-drift distances, resulting in a total of 99 logs.6 All these logs exhibit

a very high trace variability (80%± 2), measured as the ratio between the number of

distinct traces and the number of total traces in the log. According to our analysis of

real-life logs, this value is very indicative of logs of unpredictable processes, such as

the one used in the second part of this evaluation.
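As a small illustration of the trace variability measure used above, the following sketch computes the ratio between the number of distinct traces and the total number of traces. The function name is hypothetical, and traces are assumed to be given as lists of activity labels.

```python
def trace_variability(traces):
    """Ratio between the number of distinct traces and the total number of traces."""
    distinct = {tuple(trace) for trace in traces}
    return len(distinct) / len(traces)
```

A log in which every trace is different has a variability of 1.0 (100%), while a log consisting of many repetitions of a single trace has a variability close to 0.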

To assess the scalability of our method for online drift detection, we measured the

execution time for each new event read from the stream. To evaluate accuracy, we used

F-score and mean delay. The F-score is computed as the harmonic mean of recall and

precision, where recall measures the proportion of actual drifts that have been detected

and precision measures the proportion of detected drifts that are correct. The mean

delay [52] assesses the ability of the method to find drifts as early as possible in an

event stream, and is measured as the number of events between the actual position of

the drift and the end of the detection window.

6 All the CPN models used for this simulation, the resulting artificial logs, and the detailed evaluation results are available with the software distribution.
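The two accuracy measures can be sketched as follows. The sketch assumes that detected drifts have already been matched to actual drifts, so that the numbers of true positives, false positives and false negatives are known and at least one drift was correctly detected; the matching rule itself is not shown. For the mean delay, each detection position is assumed to be the index of the event at the end of the detection window at the time the drift was reported. Both function names are hypothetical.

```python
def f_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


def mean_delay(actual_positions, detection_positions):
    """Average number of events between each actual drift and its detection."""
    delays = [d - a for a, d in zip(actual_positions, detection_positions)]
    return sum(delays) / len(delays)
```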

3.4.2 Execution Times

We conducted all tests on an Intel i7 2.20GHz with 16GB RAM (64 bit), running Win-

dows 7 and JVM 7 with standard heap space of 2GB, and a stream buffer (maxBufSize)

of 1GB. The time required to update the α+ relations and perform the G-test ranges

from a minimum of 10ms to a maximum of 50ms with an average of 14ms. These

results show that the method is suited for online drift detection, including scenarios

where the inter-arrival time between events is in the order of milliseconds.

3.4.3 Impact of Oscillation Filter

In the first experiment, we measured the impact of the oscillation filter Φ(w) on F-

score and mean delay, by varying its value from w/4 to w, where w is the window

size. Figure 3.3 shows the obtained F-score and mean delay averaged over all change

patterns. As expected, we observe that the F-score increases as the filter value grows

and eventually plateaus when it reaches the sliding window size, by filtering out false

positives. However, a larger filter value causes a much higher delay. On the other

hand, while a smaller filter value leads to a smaller delay, it may induce our method

to consider incidental changes as actual drifts, causing the F-score to drop, though this

still remains above 0.9. As a tradeoff, for the remainder of this evaluation, we used

Φ(w) = w/2. With this parameter being set empirically, our method is completely

automated, and no parameter setting is required from the user.

Figure 3.3: F-score and mean delay (in events) using different oscillation filter values (0.25w to w).

Figure 3.4: F-score and mean delay (in events) using different inter-drift distances (250, 500 and 1,000 traces).



3.4.4 Inter-drift Distance

In the second experiment, we compared the F-score and mean delay obtained on logs

of different inter-drift distances (250, 500 and 1,000), in order to assess the minimum

distance that our method can handle. The results, averaged over all change patterns,

indicate that the method performs similarly for the logs with 500 and 1,000 traces of

inter-drift distance, achieving an F-score of about 0.95 and mean delay of about 2,500

(cf. Fig. 3.4). There is a slight decrease in the F-score and a notable increase in the

mean delay when using a distance of 250 traces. In this case, the two sliding windows

may contain two drifts as these are very close. In such cases, the method may miss one

of the two drifts, leading to a lower recall. These cases however are not very common,

as evidenced by the value of the F-score, which does not go below 0.92.

3.4.5 Comparison with Baseline per Process Change Pattern

In the third experiment, we evaluated the accuracy of our method in detecting each of

the 21 change patterns. Figure 3.5 shows the F-score and mean delay for each change

pattern, averaged over the three log sizes, in comparison with those obtained with the

run-based method [77] (the baseline).

Our method could find all the change patterns with a high F-score (above 0.9 in all

but four cases), and a delay in the range of 2,500 events (approximately 100 traces),

peaking at 4,000 events. When compared to the baseline method, our method out-

performs the baseline in terms of F-score in the majority of change patterns (cf. Fig.

3.5 (top)), while the baseline fails to detect almost half of the simple change patterns

(cre, sw, pl, cd, lp, cb). Since in highly variable logs each distinct run is observed

only a few times, the result of the statistical test is less reliable. Thus, in such logs,

the run-based method can only find drift types whose occurrences replace the current

set of runs with a considerably new set of runs, e.g. when deleting a process fragment

from between two other fragments (pattern sre). On the other hand, our current method

considers events (as opposed to traces) and extracts fine-grained, yet abstract features

that capture the process behavior into a few basic relations. Each drift type would be

represented in a handful of α+ relations, and any change in its frequency would be

“echoed” through its correspondent basic relations, making it easier for the statistical

test to detect such a change. Moreover, our method could always detect the drift faster

than the baseline (cf. Fig. 3.5 (bottom)) as it does not need to wait until a trace is

completed to consider it as an input for the statistical test.



Figure 3.5: F-score and mean delay (in events) per change pattern, obtained with our method (α+) vs. the run-based method [77] (Runs).

3.4.6 Comparison with Baseline over Different Log Variability Rates

In this last experiment with artificial logs, we evaluated our method in comparison with

the baseline, when changing the variability rate of the log. As said before, the trace

variability of a log is the ratio between distinct traces and the total number of traces.

It varies from close to 0%, where all traces are the same, to 100%, where every trace

is distinct. Similarly, we define the run variability as the ratio between distinct runs

and the total number of runs. Depending on the concurrency oracle used, a high trace

variability does not necessarily imply a high run variability. On the other hand, a high

run variability always implies an equal or higher trace variability. For instance, a log

with 50% trace variability results in a run variability of 10% (i.e. on average each run

is repeated 10 times). This is due to the aggregation of traces into runs based on the

concurrency oracle. The baseline method performs relatively well with a log with 10%

run variability. Thus, we studied how F-score and mean delay vary as we increase the

run variability of a log.

For this purpose, we generated a new set of artificial logs as described in Section

3.4.1 with different run variability rates, achieved by varying the loopback branching

probability in the CPN model. For each run variability rate and change pattern, we

generated logs of 10,000 traces. The results of this evaluation are reported in Fig. 3.6.

As the variability of the log increases, the baseline method’s accuracy drops significantly.

This is because the statistical test adopted by this method is inadequate when

the number of distinct runs is large, as their frequency will be low. In contrast, capturing

the process behavior at a lower level of abstraction, as done by the α+ relations, as

opposed to runs, leads to much higher frequencies in the contingency table of the statistical

test, ensuring its reliability. This property is valid regardless of the variability

of the log, which explains the steady performance of our method.

Figure 3.6: F-score and mean delay (in events) per log variability, obtained with our method (α+) vs. the run-based method [77] (Runs). The x-axis reports the run variability with the trace variability in parentheses: 10% (50%), 25% (80%) and 40% (90%).

3.5 Evaluation on Real-life Log

In addition to the experiments with artificial logs, we evaluated our method on the

BPI Challenge (BPIC) 2011 log, and compared the results with those obtained by the

baseline.7 This log records patient treatments in the Gynaecology department of a

Dutch academic hospital. It contains 150,291 events in 1,143 traces, of which

981 are distinct, and 623 labels. We first filtered the noise from this event log, using an

offline noise filter [22], which basically removes infrequent activities. This operation

reduced the number of traces to 1,121, of which 798 are distinct, and the number of

labels to 42, resulting in the same trace and run variability of 71%.

We applied our method on the stream of events obtained by replaying the filtered

log. The average execution time for each new event in the stream was 44ms. As shown

in Fig. 3.7 (left), two drifts were detected at the event indexes of 71,321 and 78,541,

corresponding to the dates 6/9/2007 and 29/11/2007 respectively. The baseline could

not detect any drift as the p-value quickly dropped and remained under the threshold,

as shown in Fig. 3.7 (right).

In order to validate the results, we profiled the number of events per month, shown

in Fig. 3.8 (left). The plot exhibits a sharp and consistent increase in the number of

events between July and Sept. 2007 followed by a sharp and consistent decrease be-

tween Sept. and Dec. 2007. We investigated the log and found that the frequencies of

7 http://dx.doi.org/10.4121/uuid:d9769f3d-0ab0-4fb8-803b-0d1120ffcf54



five activities do increase and then decrease notably over the period in question. More-

over, the number of active cases per month (cf. Fig. 3.8 (right)) decreases gradually

after August 2006. Thus, this variation in the number of events cannot be explained

by new cases. Rather, this phenomenon could be the result of some rework

in the business process. A rework may manifest itself with looping behavior and/or

duplicate activities, which are change patterns our method is able to detect.

In conclusion, while these observations support the hypothesis of the presence of

two drifts in the log, the results should be validated with domain experts.

Figure 3.7: P–value in our method (left, plotted against the event index, with Drift 1 and Drift 2 marked) and in the baseline (right, plotted against the completed trace index) for the BPIC 2011 log.

Figure 3.8: Number of events (left) and active cases per month (right) in the BPIC 2011 log, from Feb. 2005 to Mar. 2008, with Drift 1 and Drift 2 marked.

3.6 Summary

In this chapter, we presented a fully automated method for online detection of business

process drifts from event streams. The method relies on a statistical test over distribu-

tions of behavioral relations observed in two adjacent windows sliding along the event



stream. We proposed an adaptive window technique in order to automatically adjust

the sliding windows size, striking a good tradeoff between accuracy and detection de-

lay. By replaying an event log as an event stream the proposed method can also be

deployed for drift detection in event logs.

We evaluated our method against different degrees of log variability and varying

inter-drift distance, by injecting various change patterns into artificial logs. The results

showed that the method is able to scale up to online settings and detect drifts very

accurately, while outperforming a state-of-the-art baseline in F-score for the majority of the change patterns and in detection delay for all of them.

A second evaluation on a healthcare log with very high variability showed that our

method could detect two drifts that were supported by observations from the log.


Chapter 4

Process Drift Characterization at Activity Level

In the previous chapter, we presented an online automated method for detecting drift

from event streams of business processes. However, as highlighted in the introduction,

detecting a drift without explaining its characteristics does not provide analysts with

a full picture of the changes that occurred in a process. Explaining these characteristics is known as drift

characterization and aims to shed light on what has changed in the behavior of a process.

While early detection of drifts alerts organizations to process changes as they occur,

drift characterization enables them to identify how the process has changed.

To the best of our knowledge, there has not been any attempt to provide a sys-

tematic solution for characterizing process drifts. To fill this gap, in this chapter we

propose a fully automated online method for characterizing process drifts at the level

of individual activities from event streams. For each detected drift, we perform a statis-

tical test to measure the statistical association between the drift and the distributions of

the behavioral relations between activities such as causality, conflict and concurrency,

extracted from the portions of an event stream before and after the drift. We then rank

the relations based on their relative frequency change, and try to match them with a set

of predefined change templates. The best-matching templates are then reported to the

user as the changes underpinning the drift. We extensively evaluated the accuracy of

our method by simulating event streams from artificial and real-life logs. The results

show that the approach is fast and highly accurate in characterizing common change

patterns, and performs significantly better than a state-of-the-art technique for log delta

analysis.

This chapter is structured as follows. Section 4.1 discusses related work on drift


characterization at activity level. Section 4.2 introduces the proposed method while

Sections 4.4 and 4.5 present its evaluation on artificial and real-life logs, respectively.

Section 4.6 concludes the chapter.

4.1 Related Work

As already remarked, existing process drift detection methods only report the existence

of a drift, and while some can also localize it with high accuracy in the log, none

can actually characterize the detected drift. As described in Section 3.1, the authors

in [18] propose to detect a drift by evaluating the fitness of causal constraints extracted

from the post-drift process behavior in a polyhedron built from the pre-drift event

log. To characterize a drift, they then suggest using the same causal constraints to

discover a process model from the post-drift process behavior. However, they do not

provide an actual method, nor do they evaluate the practicality of such an approach.

Furthermore, discovering a process model from the post-drift process behavior without

pinpointing its changes over the drift may not be enough to facilitate the understanding

of changes underlying the drift. Nonetheless, it can be used as the basis for developing

drift characterization solutions.

A possible approach to characterize process drifts is to compare two sub-logs ex-

tracted from event streams before and after a drift and identify their differences. In

this context, Bolt et al. [10] propose a technique for comparing the behavior of dif-

ferent variants of the same process based on observed executions of such variants in

event logs. Given two event logs each corresponding to a process variant, they fol-

low a three-step approach. In the first step, they build a transition system from the

event logs and annotate each of its states and transitions with the measurements of the

variants with respect to a certain process metric. In the second step, they perform a

statistical test between every two sets of measurements of each metric on each state or

transition to identify the differences that are statistically significant. Finally, the identi-

fied differences are highlighted by changing the appearance of the states or transitions.

For example, if the difference between the frequency of executing an activity in two

compared variants is statistically significant, the arc corresponding to that activity in

the transition system is thickened. By using the sub-logs extracted from before and

after a drift as input to this technique, we can identify some of the significant differ-

ences between the pre-drift and post-drift process variants. However, this technique

has several limitations. With respect to the control-flow differences, it is only able to



identify that a certain activity (transition) occurs after a sequence of activities (state)

in one process but not in the other, while missing the structural differences of the pro-

cesses, e.g. the occurrence of two activities in an XOR construct in one process but

not in the other. Furthermore, this technique is not meant to work with event streams

where each sub-log before or after a drift contains partial traces, i.e. traces whose start

events are removed from the stream and/or whose end events are yet to arrive on the

stream. Assuming that we know the start and end activities of the process, one pos-

sible workaround is to build a transition system by only using complete traces within

the pre-drift and post-drift sub-logs. However, this may lead to an incomplete or even

inaccurate transition system as fractions of process behavior that are only captured by

the discarded partial traces are missed by the transition system. This problem is wors-

ened in the event streams of highly variable processes, as almost every trace of such

processes exhibits a unique execution of the process. A sub-log extracted from such

an event stream is likely to only contain partial traces. Van Beest et al. [115] propose

a technique for diagnosing behavioral differences between two event logs. The idea is

to use two prime event structures, i.e. a formalism composed of events and behavioral

relations, such as causality and conflict, for modeling concurrent processes, to loss-

lessly encode the event logs, and by comparing them report their differences as natural

language statements. A problem of this technique when used for drift characterization

is that it reports all differences between the pre-drift and post-drift sub-logs regardless

of the significance of their association with the occurrence of the drift. Furthermore,

similar to the technique proposed by Bolt et al. [10], this technique also does not work

with partial traces. Consequently, it may miss the fractions of process behavior that

are only captured by those traces.

As the technique proposed by Van Beest et al. [115] is able to report a complete

set of control-flow differences between two event logs, we use it as a baseline for the

experiments in Sections 4.4 and 4.5.

4.2 Drift Characterization Method

The purpose of process drift characterization is to identify the differences in the pro-

cess behavior before and after the drift point that best explain the drift. In the previous

chapter, the α+ binary relations (cf. Definition 5) were shown to be suitable for captur-

ing process behavior, in particular in the context of highly variable business processes.

These behavioral relations and their frequencies are extracted from the time window



containing the most recent events of the stream. As a preprocessing operation, each

time this window slides, a snapshot of the process behavior is captured and stored as

a data point. Each binary relation actually represents a dimension of the stored data

point, while the frequency of this relation is the scalar in this dimension. Sliding the

window along the event stream provides us with a set of data points representing snap-

shots of the pre-drift and post-drift process behaviors. These data points are used as

input to our two-stage characterization method.

In Stage 1 we measure the statistical association of each of the α+ relations with the

drift using an information gain metric. Those relations that are significantly associated

with the drift are then ordered based on their explanatory power with respect to the

drift. In Stage 2, the resulting ordered list of relations is fed to a template matching

algorithm, where we find the best-matching templates that characterize the drift. The

identified templates are then reported to the user in natural language. An overview of

our method is shown in Fig. 4.1. The rest of this section describes the method in detail.

[Figure 4.1 shows four steps: drift detection, data points extraction, relations retrieval and ordering, and change templates identification, grouped into Preprocessing, Stage 1 and Stage 2.]

Figure 4.1: Overview of our method for process drift characterization.

4.2.1 Preprocessing: Data Points Extraction

For drift detection, we use our drift detection method (cf. Chapter 3), which works

in online settings with event streams of highly-variable business processes. However,

the drift characterization method proposed here can in principle be used on top of any

process drift detection method.

Our detection technique captures process behavior by extracting α+ binary rela-

tions in two juxtaposed windows of the same size, namely reference and detection

windows, sliding along the event stream. The most recent events are equally divided

into these two windows, where the reference window contains the less recent events,

and the detection window contains the more recent ones. The size of these windows

is adjusted using a formula based on the maximum number of distinct activity labels

within the two windows. This adaptive window sizing ensures that there are enough

events in each window for accurately capturing the process behavior.

We use the detection window as a snapshot of the most recent process behavior.

Each time this window slides with the stream on arrival of a new event, we extract



α+ relations and their frequencies and store them as a multidimensional data point in

a buffer, called the characterization buffer. Each α+ relation represents a dimension of

this data point. As the detection window slides, new data points are added to the

head of the buffer. A drift is detected when the P–value of the statistical test, i.e. the G–test

(cf. Section 3.2), drops below the detection threshold (drift point). At this point we

stop inserting any new data point into the characterization buffer. We then remove the

last w (window size at drift point) data points from the head of the characterization

buffer, as these data points may include the post-drift process behavior. This results

in a set of recent data points that only encode the process behavior from the pre-drift

area. We retain these data points for characterizing the detected drift.

The P–value remains below the threshold until the process behaviors within the

reference and detection windows become statistically similar, in other words until the

process behavior reflected in the event stream starts to stabilize. Therefore, we call

the point where the P–value returns above the detection threshold a stabilization point.

This is where we start inserting new data points into the characterization buffer, as the

detection window only includes the behavior from the post-drift process. We continue

extracting data points from the event stream with the next n incoming events. We de-

fine n as the characterization delay, as it indicates the delay that is needed after the

stabilization point to characterize the drift. Similarly, we consider only the n most

recent pre-drift data points for drift characterization. In Section 4.4.2, we perform an

experiment to determine a suitable characterization delay that leads to high accuracy

in retrieving and ordering the relevant binary relations. The behavioral relations

extraction, explained above, is illustrated in Fig. 4.2.

[Figure 4.2 plots the P–value along the event stream against the detection threshold. It marks the drift point, the stabilization point and the characterization point, the pre-drift and post-drift areas, the w removed data points, and the characterization delay (n).]

Figure 4.2: From drift detection to drift characterization.
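The buffering logic described in this section can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the class name `CharacterizationBuffer`, the method `push`, and the three internal states are hypothetical and not part of ProDrift; the P–value and the window size w are assumed to be supplied by the drift detector for every new data point, and the pre-drift buffer is left unbounded for simplicity.

```python
from collections import deque


class CharacterizationBuffer:
    """Hypothetical sketch of the pre-/post-drift data point buffering."""

    def __init__(self, threshold, n):
        self.threshold = threshold  # detection threshold on the P-value
        self.n = n                  # characterization delay
        self.pre = deque()          # pre-drift data points, newest on the right
        self.post = []              # post-drift data points
        self.state = "stable"       # "stable" -> "drifting" -> "collecting"

    def push(self, point, p_value, w):
        """Feed one data point; return (pre, post) at the characterization point."""
        if self.state == "stable":
            if p_value < self.threshold:           # drift point
                for _ in range(min(w, len(self.pre))):
                    self.pre.pop()                 # drop the w newest points
                while len(self.pre) > self.n:      # keep the n most recent ones
                    self.pre.popleft()
                self.state = "drifting"            # stop inserting data points
            else:
                self.pre.append(point)
        elif self.state == "drifting":
            if p_value >= self.threshold:          # stabilization point
                self.state = "collecting"
                self.post.append(point)
        else:                                      # collecting post-drift points
            self.post.append(point)

        if self.state == "collecting" and len(self.post) == self.n:
            result = (list(self.pre), list(self.post))
            # Assumption: the post-drift points seed the next pre-drift buffer.
            self.pre, self.post, self.state = deque(self.post), [], "stable"
            return result
        return None
```

With a characterization delay of n, the call that delivers the n-th data point after the stabilization point returns the two sets of data points that Stage 1 compares; every other call returns `None`.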



4.2.2 Stage 1: Relevant Binary Relations Retrieval and Ordering

The purpose of the first stage of our approach is to identify and order the α+ binary

relations that are statistically associated with the detected drift. In other words, we

would like to measure the explanatory power of each relation with respect to the de-

tected drift. We approach this issue as a classification problem with the α+ binary

relations, extracted from the event stream, as the explanatory variables, and the bi-

nary target variable defined with the labels pre-drift and post-drift. One might first opt

for a logistic regression model because of its additive and interpretability properties.

However, logistic regression requires little or no correlation among the independent

variables (multicollinearity problem [84]). Such a requirement cannot be guaranteed,

particularly in our case where the binary relations come from the same process

(model). We therefore opted for a less restrictive classification approach, namely a decision tree,

where we use K-sample permutation test (KSPT) in order to measure the statistical as-

sociation between each individual explanatory variable (here a binary relation) and the

target variable (the drift classification variable). Similarly to the information gain, the

permutation test allows us to measure the mutual information between two variables.

We opted for the permutation test since it is more suitable for small sample sizes [36].

We perform a pairwise permutation test to measure the significance of the statistical

association of each binary relation with the target variable (drift). The latter is

encoded with the value 0 (resp. 1) for the pre-drift (resp. post-drift) behavior. If the null

hypothesis of no association cannot be rejected, we discard the relation as it is not

significantly associated with the drift.
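The relevance test can be illustrated with a small Python sketch. It is not the KSPT implementation used in this thesis: it assumes K = 2 groups (pre-drift and post-drift), uses the absolute difference of the group means as test statistic, and approximates the permutation distribution by random shuffling. The function names, the significance level `alpha` and the number of permutations are hypothetical choices made for the example.

```python
import random


def permutation_p_value(pre, post, n_perm=2000, seed=0):
    """Two-sample permutation test on the absolute difference of means."""
    rng = random.Random(seed)

    def gap(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = gap(pre, post)
    pooled = list(pre) + list(post)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of pre-/post-drift points
        if gap(pooled[:len(pre)], pooled[len(pre):]) >= observed:
            hits += 1
    # Add-one correction so that the estimated P-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def relevant_relations(pre_points, post_points, alpha=0.05):
    """Keep the relations whose frequency is significantly tied to the drift.

    Each data point is a dict mapping a relation to its frequency; a relation
    missing from a data point is treated as having frequency 0.
    """
    relations = set().union(*pre_points, *post_points)
    kept = []
    for relation in sorted(relations):
        pre = [point.get(relation, 0) for point in pre_points]
        post = [point.get(relation, 0) for point in post_points]
        if permutation_p_value(pre, post) < alpha:
            kept.append(relation)  # null hypothesis rejected: relation is relevant
    return kept
```

A relation whose frequency is identical before and after the drift yields a P–value of 1 and is discarded, whereas a relation that disappears after the drift is retained.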

As suggested in [36], the KSPT can be applied to identify the relevant features, then

an appropriate distance measure is used to order the selected features. Indeed, despite

identifying the relations that are found to be statistically associated with our binary

drift target variable, some relations may contribute more than others to the change that

occurred. We use a measure that is similar to the chi-squared statistic to measure the

contribution of each relation to the overall change. This metric measures the relative

frequency change (RFC) of each relation, and is defined as $\mathit{RFC} = (O - E)^2 / \max(O, E)$,

where O and E are the average frequencies of a relation before and after the drift point,

respectively. In addition, total relative frequency change (TRFC) is defined as the sum

of the RFCs of all relations. With relations ordered based on their RFCs in descending

order, we can filter out the relations with insignificant RFCs by retaining only the top

relations, summing up to x% of the TRFC, where x% · TRFC is defined as the cumulative

relative frequency change (CRFC). In Section 4.4.3, we perform an experiment to



investigate the impact of varying CRFC on the characterization accuracy.
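The ordering and filtering step of Stage 1 can be sketched as follows. The function name `rank_by_rfc`, the dictionary-based input format, and the alphabetical tie-breaking rule are assumptions made for illustration; the RFC, TRFC and CRFC computations follow the definitions given above, with O and E the average frequencies before and after the drift.

```python
def rank_by_rfc(pre_avg, post_avg, x=0.95):
    """Order relations by RFC and keep the top ones covering x * TRFC.

    pre_avg / post_avg map each relation to its average frequency before and
    after the drift; a missing relation has average frequency 0.
    """
    rfc = {}
    for relation in set(pre_avg) | set(post_avg):
        o = pre_avg.get(relation, 0.0)   # O: average frequency before the drift
        e = post_avg.get(relation, 0.0)  # E: average frequency after the drift
        if max(o, e) > 0:
            rfc[relation] = (o - e) ** 2 / max(o, e)

    # Descending RFC; ties are broken alphabetically (an arbitrary choice).
    ordered = sorted(rfc, key=lambda r: (-rfc[r], r))
    trfc = sum(rfc.values())             # total relative frequency change

    kept, cumulative = [], 0.0
    for relation in ordered:
        if cumulative >= x * trfc:       # CRFC reached: drop the remaining tail
            break
        kept.append(relation)
        cumulative += rfc[relation]
    return kept, rfc
```

For instance, a relation whose average frequency drops from 10 to 5 has an RFC of 2.5, while one that moves from 1.0 to 0.9 has an RFC of about 0.01 and is likely to fall outside the retained CRFC.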

4.2.3 Stage 2: Change Templates Identification

The output of Stage 1 is a list of relations ordered by their explanatory power

(RFC) with respect to the drift, where the first and the last relation in the list

have the highest and the lowest explanatory power, respectively. In Stage 2,

we aim to match the relations with the typical change patterns that best characterize

the drift. To this end, we define a set of templates based on the change patterns

defined in [129] (listed in Table 2.1) at the level of individual activities. Table 4.1 shows

the defined templates. Each template is represented based on α+ binary relations. We

try to match the process relations, obtained from Stage 1, with the binary relations

of the predefined templates. Using a matching confidence metric, we find the best

matching between the templates and the process relations. In the rest of this section, we explain our

template matching algorithm in detail.

Code  Simple change template                               Cat.
sre   Insert/delete an activity between two activities     I
pre   Insert/delete an activity in/from parallel branch    I
cre   Insert/delete an activity in/from conditional branch I
cp    Duplicate an activity                                I
rp    Substitute an activity                               I
sw    Swap two activities                                  I
sm    Move an activity to between two activities           I
cm    Move an activity into/out of conditional branch      I
pm    Move an activity into/out of parallel branch         I
cf    Make activity mutually exclusive/sequential          R
pl    Make activities parallel/sequential                  R
cd    Synchronize two activities                           R
lp    Make activities loopable/non-loopable                O
cb    Make an activity skippable/non-skippable             O
fr    Change branching frequency                           O

Table 4.1: Change templates defined based on change patterns in [129].

Example 1. As a running example, let us assume the output of Stage 1 of our

method is the ordered relation list $\langle e \to f : -,\ e \parallel f : +,\ e \to g : +,\ d \to f : +,\ a \to b : -,\ f \to g : \searrow,\ d \to e : \searrow,\ b \to c : -,\ a \to c : + \rangle$, where $+$ (resp. $-$) indicates that

the relation appeared (resp. disappeared) after the drift, and $\nearrow$ (resp. $\searrow$) indicates

that the frequency of the relation increased (resp. decreased) after the drift.

In the remainder of this chapter, unless otherwise indicated, we use both “feature”

and “relation” to refer to an α+ binary relation between two activity labels. A feature



set is used to represent the α+ relations before or after the drift, and is defined as

follows.

Definition 6 (Feature Set). Let $\mathcal{L}$ be a set of activity labels, and $\mathcal{T}$ a set of binary α+ relation symbols denoting causality ($\to$), concurrency ($\parallel$), conflict ($\#$), and length-one

and length-two loops, respectively (cf. Definition 5). A feature set $F : \mathcal{L} \times \mathcal{L} \rightharpoonup \mathcal{T}$ is a partial function

that yields the type of α+ relation between two labels.

Two feature sets will be used to represent the sets of features discovered before

and after a given drift point, along with a classification of the frequency change of each feature

before and after the drift point. The classification only considers the relations that existed

both before and after the occurrence of the drift, in our example $\{f \to g,\ d \to e\}$. A relation is classified as increasing ($\nearrow$), decreasing ($\searrow$) or not applicable ($\perp$),

depending on whether its frequency increased, decreased, or remained unchanged. A

relation that disappeared (resp. appeared) after the drift does not need to be classified

as it only belongs to the pre-drift (resp. post-drift) feature set. All the features existing

before and after the drift are ordered in terms of their explanatory power. The two

feature sets from before and after the drift, the classification and the ordering functions

form a drift feature set which constitutes the output of the first stage of our method.

Formally, a drift feature set is defined as follows:

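Definition 7 (Drift Feature Set). Let $\mathcal{O} := \{\nearrow, \searrow, \perp\}$ be a set of feature frequency

change types. A drift feature set is a tuple $D := \langle F_{pre}, F_{post}, \mathit{Diff}_D, \sqsubseteq, \mathcal{L} \rangle$, where $F_{pre}$

(resp. $F_{post}$) is the feature set before (resp. after) a drift, $\mathit{Diff}_D$ is a classification

function defined as $\mathit{Diff}_D : F_{pre} \cap F_{post} \to \mathcal{O}$, and $\sqsubseteq$ is a total order on $F_{pre} \cup F_{post}$.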

The following function returns the index of a feature in a given drift feature set.

Definition 8 (Rank). Let $\preceq$ be a total order on a finite set $B$. For all $b \in B$,

$\mathit{Rank}(b, \preceq, B) = |\{b' \in B \mid b' \preceq b\}|$.

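Example 2. With Definition 7, Example 1 is represented as a drift feature set $D_1 = \langle F^{D_1}_{pre}, F^{D_1}_{post}, \mathit{Diff}_D, \sqsubseteq, \mathcal{L} \rangle$, where

$\mathcal{L} = \{a, b, c, d, e, f, g\}$,

$F^{D_1}_{pre} = \{e \to f,\ a \to b,\ f \to g,\ d \to e,\ b \to c\}$,

$F^{D_1}_{post} = \{e \parallel f,\ e \to g,\ d \to f,\ f \to g,\ d \to e,\ a \to c\}$,

$\sqsubseteq\ = \langle e \to f,\ e \parallel f,\ e \to g,\ d \to f,\ a \to b,\ f \to g,\ d \to e,\ b \to c,\ a \to c \rangle$, and

$\mathit{Diff}_D = \{(f \to g, \searrow),\ (d \to e, \searrow)\}$.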
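To make Definitions 7 and 8 more concrete, the drift feature set of the running example can be written down as a small Python data structure. This is only an illustrative encoding: the class `DriftFeatureSet`, the triple representation `(x, symbol, y)` with ASCII symbols `"->"` and `"||"`, and the strings `"up"`, `"down"` and `"none"` for the frequency change types are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class DriftFeatureSet:
    f_pre: set    # features (x, symbol, y) observed before the drift
    f_post: set   # features (x, symbol, y) observed after the drift
    diff: dict    # feature present in both sets -> "up", "down" or "none"
    order: list   # all features, most explanatory first

    def rank(self, feature):
        """Definition 8: number of features ordered at or before `feature`."""
        return self.order.index(feature) + 1

    def marker(self, feature):
        """Annotation used in Example 1: '+', '-', or the frequency change."""
        if feature not in self.f_pre:
            return "+"   # relation appeared after the drift
        if feature not in self.f_post:
            return "-"   # relation disappeared after the drift
        return self.diff[feature]


# Running example (Examples 1 and 2).
ORDER = [("e", "->", "f"), ("e", "||", "f"), ("e", "->", "g"),
         ("d", "->", "f"), ("a", "->", "b"), ("f", "->", "g"),
         ("d", "->", "e"), ("b", "->", "c"), ("a", "->", "c")]

D1 = DriftFeatureSet(
    f_pre={("e", "->", "f"), ("a", "->", "b"), ("f", "->", "g"),
           ("d", "->", "e"), ("b", "->", "c")},
    f_post={("e", "||", "f"), ("e", "->", "g"), ("d", "->", "f"),
            ("f", "->", "g"), ("d", "->", "e"), ("a", "->", "c")},
    diff={("f", "->", "g"): "down", ("d", "->", "e"): "down"},
    order=ORDER,
)

for feature in ORDER:
    print(D1.rank(feature), feature, D1.marker(feature))
```

Since the order is stored as a list, the rank of a feature is simply its 1-based position, which coincides with counting the features ordered at or before it.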

Our drift characterization method aims at explaining a detected drift using prede-

fined change templates. In this regard, we define a set of change templates representing



the typical change patterns [129]. These templates are presented in Table 4.1. A change

template is represented by a process model fragment before the change compared to

another process model fragment after the change.

Consequently, a template is a generic way to describe a typical change pattern.

It enumerates the expected sets of relations before and after the change based on a

change pattern representation. The relations that are present in both process model

fragments, before and after the change, need to be classified based on their expected

frequency evolution in the change pattern. Besides, the importance of every relation in

the change pattern is appended to the template. A template handles variables that can

be instantiated with actual activity labels in a matching operation.

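Definition 9 (Template). Let $\mathcal{V}$ be a set of variables, $\mathcal{T}$ a set of α+ binary relation

symbols, and $\mathcal{O}$ a set of relation frequency change types. A template is a tuple

$T := \langle T_{pre}, T_{post}, \mathit{Diff}_T, \mathcal{S}, \mathcal{V} \rangle$, where $T_{pre} : \mathcal{V} \times \mathcal{V} \rightharpoonup \mathcal{T}$ represents the relations

before the change, $T_{post} : \mathcal{V} \times \mathcal{V} \rightharpoonup \mathcal{T}$ represents the relations after the change,

$\mathit{Diff}_T$ is a classification function defined as $\mathit{Diff}_T : T_{pre} \cap T_{post} \to \mathcal{O}$, and $\mathcal{S}$ is

a function specifying the importance of each relation to the template $T$, defined as

$\mathcal{S} : T_{pre} \cup T_{post} \to (0, 1]$.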

Example 3. Let us assume the two change templates, parallelize activities ($T^{pl}$) and

remove activity ($T^{sre}$), for our example, illustrated in Fig. 4.3 and Fig. 4.4,

respectively. With Definition 9,

$T^{pl} = \langle \{X \to Y,\ W \to X,\ Y \to Z\},\ \{X \parallel Y,\ W \to Y,\ X \to Z,\ W \to X,\ Y \to Z\},\ \{(W \to X, \searrow),\ (Y \to Z, \searrow)\},\ \{(X \to Y, 1),\ (W \to X, 1),\ (Y \to Z, 1),\ (X \parallel Y, 1),\ (W \to Y, 1),\ (X \to Z, 1)\},\ \{W, X, Y, Z\} \rangle$, and

$T^{sre} = \langle \{X \to Y,\ Y \to Z\},\ \{X \to Z\},\ \emptyset,\ \{(X \to Y, 1),\ (Y \to Z, 1),\ (X \to Z, 1)\},\ \{X, Y, Z\} \rangle$.

Figure 4.3: Parallelize activities template ($T^{pl}$)

Figure 4.4: Remove activity template ($T^{sre}$)

In order to explain a drift, the discovered features represented with a drift feature

set are matched to a predefined template. Every variable in the template needs to be

mapped to a label from the drift feature set. This operation is called a valid instantia-

tion, and is defined as follows:

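Definition 10 (Valid Instantiation). Given a drift feature set $D := \langle F_{pre}, F_{post}, \mathit{Diff}_D, \sqsubseteq, \mathcal{L} \rangle$ and a template $T := \langle T_{pre}, T_{post}, \mathit{Diff}_T, \mathcal{S}, \mathcal{V} \rangle$, a valid instantiation of $T$ through

$D$ is a function $I_{D,T} : \mathcal{V} \to \mathcal{L}$ such that

• $T_{pre}(v_1, v_2) = t_1$ iff $F_{pre}(I_{D,T}(v_1), I_{D,T}(v_2)) = t_1$,

• $T_{post}(v_3, v_4) = t_2$ iff $F_{post}(I_{D,T}(v_3), I_{D,T}(v_4)) = t_2$, and

• $\mathit{Diff}_T(v_5, v_6) = \vartheta$ iff $\mathit{Diff}_D(I_{D,T}(v_5), I_{D,T}(v_6)) = \vartheta$.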

Example 4. In our example, we can have two valid instantiations, one per template.

The first instantiation is $I_{D_1,T^{pl}} = \{W : d,\ X : e,\ Y : f,\ Z : g\}$, whereas the second

instantiation is $I_{D_1,T^{sre}} = \{X : a,\ Y : b,\ Z : c\}$.
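The search for valid instantiations can be sketched as a brute-force enumeration in Python. This is an illustrative reading of Definition 10 rather than the algorithm implemented in ProDrift: features are encoded as triples `(x, symbol, y)`, templates and drift feature sets as plain dictionaries, the "iff" conditions are checked by comparing the instantiated template relations with the features restricted to the mapped labels, and all variables are assumed to map to distinct labels. All names are hypothetical.

```python
from itertools import permutations


def instantiate(relations, mapping):
    """Replace template variables by activity labels."""
    return {(mapping[x], t, mapping[y]) for (x, t, y) in relations}


def restrict(features, labels):
    """Keep only the features whose two labels are both in `labels`."""
    return {(x, t, y) for (x, t, y) in features if x in labels and y in labels}


def valid_instantiations(template, drift, labels):
    """Yield every mapping of variables to distinct labels satisfying Def. 10."""
    variables = sorted(template["vars"])
    for combo in permutations(sorted(labels), len(variables)):
        m = dict(zip(variables, combo))
        used = set(combo)
        diff_t = {(m[x], t, m[y]): c for (x, t, y), c in template["diff"].items()}
        diff_d = {f: c for f, c in drift["diff"].items()
                  if f[0] in used and f[2] in used}
        if (instantiate(template["pre"], m) == restrict(drift["pre"], used)
                and instantiate(template["post"], m) == restrict(drift["post"], used)
                and diff_t == diff_d):
            yield m


# Running example: drift feature set D1 and templates T^pl and T^sre.
D1 = {
    "pre": {("e", "->", "f"), ("a", "->", "b"), ("f", "->", "g"),
            ("d", "->", "e"), ("b", "->", "c")},
    "post": {("e", "||", "f"), ("e", "->", "g"), ("d", "->", "f"),
             ("f", "->", "g"), ("d", "->", "e"), ("a", "->", "c")},
    "diff": {("f", "->", "g"): "down", ("d", "->", "e"): "down"},
}
T_PL = {
    "vars": {"W", "X", "Y", "Z"},
    "pre": {("X", "->", "Y"), ("W", "->", "X"), ("Y", "->", "Z")},
    "post": {("X", "||", "Y"), ("W", "->", "Y"), ("X", "->", "Z"),
             ("W", "->", "X"), ("Y", "->", "Z")},
    "diff": {("W", "->", "X"): "down", ("Y", "->", "Z"): "down"},
}
T_SRE = {
    "vars": {"X", "Y", "Z"},
    "pre": {("X", "->", "Y"), ("Y", "->", "Z")},
    "post": {("X", "->", "Z")},
    "diff": {},
}

LABELS = set("abcdefg")
print(list(valid_instantiations(T_PL, D1, LABELS)))
print(list(valid_instantiations(T_SRE, D1, LABELS)))
```

Enumerating all permutations is exponential in the number of template variables; a practical implementation would first prune candidate labels by relation type.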

A confidence is calculated for each matching (valid instantiation) in order to assess

the likelihood of such a matching. The confidence of an instantiation is based on

the Discounted Cumulative Gain (DCG) measure [55], which indicates the quality of

ranking relations in a drift feature set with regards to their predefined importance in a

template. In our method, we consider the same importance of 1 for all the relations of

a template. The confidence of an instantiation is defined as follows.

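Definition 11 (Confidence in an Instantiation). Given a drift feature set $D := \langle F_{pre}, F_{post}, \mathit{Diff}_D, \sqsubseteq, \mathcal{L} \rangle$,

a template $T := \langle T_{pre}, T_{post}, \mathit{Diff}_T, \mathcal{S}, \mathcal{V} \rangle$, and a valid instantiation $I_{D,T} : \mathcal{V} \to \mathcal{L}$,

the confidence $\mathcal{C}(I_{D,T})$ of $D$ matching $T$ through $I_{D,T}$ is:

$$\mathcal{C}(I_{D,T}) = \sum_{(x,y,t) \in T_{pre} \cup T_{post}} \frac{\mathcal{S}(x,y,t)}{\log_2\big(\mathit{Rank}((I_{D,T}(x), I_{D,T}(y), t), \sqsubseteq, F_{pre} \cup F_{post}) + 1\big)}$$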

Example 5. In our example, the six relations of $T^{pl}$ are matched to the features ranked 1, 2, 3, 4, 6 and 7, so the confidence of $I_{D_1,T^{pl}}$ is calculated as follows:

$$\mathcal{C}(I_{D_1,T^{pl}}) = \frac{1}{\log_2(1+1)} + \frac{1}{\log_2(2+1)} + \frac{1}{\log_2(3+1)} + \frac{1}{\log_2(4+1)} + \frac{1}{\log_2(6+1)} + \frac{1}{\log_2(7+1)} \approx 3.25.$$

The confidence of $I_{D_1,T^{sre}}$ is calculated in the same way from the features ranked 5, 8 and 9, and approximates to 1.00.

As we want to find the best-matching template among all matching templates, we

need to rank them based on their confidences. However, as the number of relations

in different templates may not be the same, we need to normalize the confidence of

an instantiation with respect to the maximal confidence of its template. Similarly to

the normalized DCG (nDCG) [55], we first define the notion of ideal confidence of a

template $T$ as the DCG obtained after ordering the relations of $T$ based on their importance

defined by $\mathcal{S}$. The normalized confidence ($\mathit{nC}$) of an instantiation is calculated by

dividing the confidence of the instantiation by the ideal confidence of its template.

Definition 11 (continued). The ideal confidence $\mathit{iC}(T)$ of $T$ is computed as

$$\mathit{iC}(T) = \sum_{(x,y,t) \in T_{pre} \cup T_{post}} \frac{\mathcal{S}(x,y,t)}{\log_2\big(\mathit{Rank}((x,y,t), \geq, \mathit{range}(\mathcal{S})) + 1\big)},$$

and the normalized confidence $\mathit{nC}(I_{D,T})$ of $D$ matching $T$ through $I_{D,T}$ is computed as $\mathit{nC}(I_{D,T}) = \mathcal{C}(I_{D,T}) / \mathit{iC}(T)$.

Example 6. In our example, $\mathit{iC}(T^{pl}) \approx 3.30$ and $\mathit{nC}(I_{D_1,T^{pl}}) \approx 0.98$, whereas $\mathit{iC}(T^{sre}) \approx 2.13$ and $\mathit{nC}(I_{D_1,T^{sre}}) \approx 0.47$. As $\mathit{nC}(I_{D_1,T^{pl}}) \geq \mathit{nC}(I_{D_1,T^{sre}})$, $T^{pl}$ is identified as the

best-matching template with the drift feature set.
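The confidence computation can be reproduced with a few lines of Python. The sketch below evaluates Definitions 8 and 11 literally on the running example, with one term per template relation and an importance of 1 for every relation; the ideal confidence is taken as the DCG of the best possible ranking, i.e. ranks 1 to k for a template with k relations. The function names and the triple encoding of features are assumptions of this sketch.

```python
import math

# Order of the features in the running example, most explanatory first.
ORDER = [("e", "->", "f"), ("e", "||", "f"), ("e", "->", "g"),
         ("d", "->", "f"), ("a", "->", "b"), ("f", "->", "g"),
         ("d", "->", "e"), ("b", "->", "c"), ("a", "->", "c")]

# Relations of the two templates (union of T_pre and T_post).
T_PL = {("X", "->", "Y"), ("W", "->", "X"), ("Y", "->", "Z"),
        ("X", "||", "Y"), ("W", "->", "Y"), ("X", "->", "Z")}
T_SRE = {("X", "->", "Y"), ("Y", "->", "Z"), ("X", "->", "Z")}

I_PL = {"W": "d", "X": "e", "Y": "f", "Z": "g"}
I_SRE = {"X": "a", "Y": "b", "Z": "c"}


def confidence(relations, mapping, order, importance=1.0):
    """DCG-style confidence of an instantiation (Definition 11)."""
    total = 0.0
    for (x, t, y) in relations:
        rank = order.index((mapping[x], t, mapping[y])) + 1  # Definition 8
        total += importance / math.log2(rank + 1)
    return total


def ideal_confidence(k, importance=1.0):
    """Confidence obtained when k relations occupy ranks 1..k."""
    return sum(importance / math.log2(i + 1) for i in range(1, k + 1))


def normalized_confidence(relations, mapping, order):
    return confidence(relations, mapping, order) / ideal_confidence(len(relations))


for name, rel, inst in (("pl", T_PL, I_PL), ("sre", T_SRE, I_SRE)):
    print(name,
          round(confidence(rel, inst, ORDER), 2),
          round(ideal_confidence(len(rel)), 2),
          round(normalized_confidence(rel, inst, ORDER), 2))
```

The normalized confidences rank $T^{pl}$ above $T^{sre}$, in line with Example 6.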

Simultaneous changes. Identifying one template is not enough as a process drift may

involve more than one change. In order to characterize all the simultaneous changes,

each time that a best-matching template with the drift feature set is identified, we re-

move the features that were used for this template instantiation from the drift feature

set. The new resulting drift feature set is then reused for the identification of a new

best-matching template. We repeat this cycle until we cannot find any more templates

that match the remaining features within the drift feature set. It is worth mentioning

that if there are two overlapping changes in the process, i.e. changes that share a non-

empty set of features, only the one with higher nC can be matched with a template.

This is because each time we find a best-matching template we remove the matched

features from the drift feature set. This limits the ability of the proposed method to the

identification of non-overlapping simultaneous changes.
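The iterative identification of simultaneous changes amounts to a greedy loop, sketched below in Python. The sketch assumes that candidate matches (template code, normalized confidence and matched features) have already been computed; the function name `identify_changes`, the dictionary layout, the confidence values and the overlapping candidate `"xx"` are hypothetical and serve only to illustrate the loop.

```python
def identify_changes(features, candidates):
    """Greedily report best-matching templates until no candidate fits.

    `candidates` is a list of dicts with keys "template", "nC" and "features";
    a candidate still matches if all its features remain in the feature set.
    """
    remaining = set(features)
    identified = []
    while True:
        matching = [c for c in candidates
                    if c["features"] and c["features"] <= remaining]
        if not matching:
            break
        best = max(matching, key=lambda c: c["nC"])
        identified.append(best["template"])
        remaining -= best["features"]  # matched features cannot be reused
    return identified, remaining


# Illustrative input based on the running example (nC values are made up).
PL = {("e", "->", "f"), ("e", "||", "f"), ("e", "->", "g"),
      ("d", "->", "f"), ("f", "->", "g"), ("d", "->", "e")}
SRE = {("a", "->", "b"), ("b", "->", "c"), ("a", "->", "c")}
CANDIDATES = [
    {"template": "pl", "nC": 0.9, "features": PL},
    {"template": "sre", "nC": 0.5, "features": SRE},
    # Hypothetical overlapping change sharing a feature with "pl".
    {"template": "xx", "nC": 0.4, "features": {("e", "->", "f"), ("a", "->", "b")}},
]

identified, leftover = identify_changes(PL | SRE, CANDIDATES)
print(identified, leftover)
```

The overlapping candidate is never reported because one of its features is consumed by the higher-ranked match, which mirrors the limitation to non-overlapping simultaneous changes discussed above. Features left over after the loop are the ones reported individually to the user.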

Example 7. In our example, as there is no feature shared between ID1,T pl and ID1,T sre ,

both change templates can be identified. The identified templates, T pl and T sre, are

then reported to the user using the two following statements, respectively:

• Before the drift, activities “e” and “f” were sequential, while after the drift, they

are parallel.

• After the drift, activity “b” is deleted from between activities “a” and “c”.

Finally, we also report the remaining features that are not used in any template

instantiation to the user via statements such as “Before the drift, activity X was fol-

lowed by activity Y, while after the drift it is not” or “Before the drift, activity X was

more frequently followed by activity Y”. This provides the user with useful insight for

further investigation of process changes.

Table 4.2 shows the format of drift characterization statements produced by our

method for each change template.

Time complexity. Given the number of data points 2n, where n is the characteriza-

tion delay, and the maximum possible number of α+ relations |L |2, where L is the

label set, the complexity of our drift characterization method is the maximum of the

worst-case complexities of the following sequential operations: (i) performing KSPT

between the α+ relations and a binary target variable (O(2n · |L |2)), (ii) computing



sre Insert/delete an activity between two activitiesTemplate

X

X

X

X YY

X X

X X

X

Y

X

Y

X

X

X X

X

X YY

X70%

Y30%

X40%

Y60%

X Z X Y Z

W X Z W X

Y

Z

W X

Y

ZW X Z

W X Z W Y Z

U X V W Y Z U Y V W X Z

W Y Z U V W Z U Y V

W Y Z W ZU X V U X

Y

V

W Y Z W ZU X V U X

Y

V

W

Y

X

Z W X ZY

W

Y

X

Z W X ZY

YY

YYX X

Y1 Yn Y1 Yn

W Y ZW Y Z

40%

60%

W

X

Y

70%

30%

W

X

Y

Characterizationstatement

After the drift, activity Y is inserted (resp., deleted from) between activities Xand Z.

pre Insert/delete an activity in/from parallel branchTemplate

X

X

X

X YY

X X

X X

X

Y

X

Y

X

X

X X

X

X YY

X70%

Y30%

X40%

Y60%

X Z X Y Z

W X Z W X

Y

Z

W X

Y

ZW X Z

W X Z W Y Z


W Y Z U V W Z U Y V

W Y Z W ZU X V U X

Y

V

W Y Z W ZU X V U X

Y

V

W

Y

X

Z W X ZY

W

Y

X

Z W X ZY

YY

YYX X

Y1 Yn Y1 Yn

W Y ZW Y Z

40%

60%

W

X

Y

70%

30%

W

X

Y


Insert: After the drift, activity Y is inserted between activities W and Z in aparallel branch (with activity X). Delete: After the drift, activity Y which wasin a parallel branch (with activity X) between activities W and Z is deleted.

cre Insert/delete an activity in/from conditional branchTemplate

X

X

X

X YY

X X

X X

X

Y

X

Y

X

X

X X

X

X YY

X70%

Y30%

X40%

Y60%

X Z X Y Z

W X Z W X

Y

Z

W X

Y

ZW X Z

W X Z W Y Z


W Y Z U V W Z U Y V

W Y Z W ZU X V U X

Y

V

W Y Z W ZU X V U X

Y

V

W

Y

X

Z W X ZY

W

Y

X

Z W X ZY

YY

YYX X

Y1 Yn Y1 Yn

W Y ZW Y Z

40%

60%

W

X

Y

70%

30%

W

X

Y


Insert: After the drift, activity Y is inserted between activities W and Z in aconditional branch (with activity X). Delete: After the drift, activity Y whichwas in a conditional branch (with activity X) between activities W and Z isdeleted.

cp Duplicate an activityTemplate Duplication is the insertion of an existing activity and is discovered in a post-

processing step. As such, it has a similar template as sre/pre/cre.Characterizationstatement

After the drift, activity Y , i.e. a duplicate of activity X , is inserted ... (continueswith sre, pre, or cre).

rp Substitute an activityTemplate

X

X

X

X YY

X X

X X

X

Y

X

Y

X

X

X X

X

X YY

X70%

Y30%

X40%

Y60%

X Z X Y Z

W X Z W X

Y

Z

W X

Y

ZW X Z

W X Z W Y Z


W Y Z U V W Z U Y V

W Y Z W ZU X V U X

Y

V

W Y Z W ZU X V U X

Y

V

W

Y

X

Z W X ZY

W

Y

X

Z W X ZY

YY

YYX X

Y1 Yn Y1 Yn

W Y ZW Y Z

40%

60%

W

X

Y

70%

30%

W

X

Y


After the drift, activity X , which was between activities W and Z, is substitutedby activity Y .

sw Swap two activitiesTemplate

X

X

X

X YY

X X

X X

X

Y

X

Y

X

X

X X

X

X YY

X70%

Y30%

X40%

Y60%

X Z X Y Z

W X Z W X

Y

Z

W X

Y

ZW X Z

W X Z W Y Z


W Y Z U V W Z U Y V

W Y Z W ZU X V U X

Y

V

W Y Z W ZU X V U X

Y

V

W

Y

X

Z W X ZY

W

Y

X

Z W X ZY

YY

YYX X

Y1 Yn Y1 Yn

W Y ZW Y Z

40%

60%

W

X

Y

70%

30%

W

X

Y


After the drift, activity X , which was between activities U and V , is swappedwith activity Y , which was between activities W and Z.

sm Move an activity to between two activitiesTemplate

X

X

X

X YY

X X

X X

X

Y

X

Y

X

X

X X

X

X YY

X70%

Y30%

X40%

Y60%

X Z X Y Z

W X Z W X

Y

Z

W X

Y

ZW X Z

W X Z W Y Z


W Y Z U V W Z U Y V

W Y Z W ZU X V U X

Y

V

W Y Z W ZU X V U X

Y

V

W

Y

X

Z W X ZY

W

Y

X

Z W X ZY

YY

YYX X

Y1 Yn Y1 Yn

W Y ZW Y Z

40%

60%

W

X

Y

70%

30%

W

X

Y


After the drift, activity Y , which was between activities W and Z, has movedto between activities U and V .

cm Move an activity into/out of conditional branchTemplate

X

X

X

X YY

X X

X X

X

Y

X

Y

X

X

X X

X

X YY

X70%

Y30%

X40%

Y60%

X Z X Y Z

W X Z W X

Y

Z

W X

Y

ZW X Z

W X Z W Y Z


W Y Z U V W Z U Y V

W Y Z W ZU X V U X

Y

V

W Y Z W ZU X V U X

Y

V

W

Y

X

Z W X ZY

W

Y

X

Z W X ZY

YY

YYX X

Y1 Yn Y1 Yn

W Y ZW Y Z

40%

60%

W

X

Y

70%

30%

W

X

Y


After the drift, activity Y , which was between activities W and Z, has movedto between activities U and V and in a conditional branch (with activity X).

pm Move an activity into/out of parallel branchTemplate

X

X

X

X YY

X X

X X

X

Y

X

Y

X

X

X X

X

X YY

X70%

Y30%

X40%

Y60%

X Z X Y Z

W X Z W X

Y

Z

W X

Y

ZW X Z

W X Z W Y Z


W Y Z U V W Z U Y V

W Y Z W ZU X V U X

Y

V

W Y Z W ZU X V U X

Y

V

W

Y

X

Z W X ZY

W

Y

X

Z W X ZY

YY

YYX X

Y1 Yn Y1 Yn

W Y ZW Y Z

40%

60%

W

X

Y

70%

30%

W

X

Y


After the drift, activity Y , which was between activities W and Z, has movedto between activities U and V and in a parallel branch with activity X .



cf Make activity mutually exclusive/sequentialTemplate

X

X

X

X YY

X X

X X

X

Y

X

Y

X

X

X X

X

X YY

X70%

Y30%

X40%

Y60%

X Z X Y Z

W X Z W X

Y

Z

W X

Y

ZW X Z

W X Z W Y Z


W Y Z U V W Z U Y V

W Y Z W ZU X V U X

Y

V

W Y Z W ZU X V U X

Y

V

W

Yn

Y1

Z W Y1 ZYn

YY

YYX X

Y1 Yn Y1 Yn

W Y ZW Y Z

40%

60%

W

X

Y

70%

30%

W

X

Y

W

Yn

Y1

Z W Y1 ZYn


Before the drift, activities Y1, . . .Yn were mutually exclusive (resp., sequential),while after the drift, they are sequential (resp., mutually exclusive).

pl Make activities parallel/sequentialTemplate

X

X

X

X YY

X X

X X

X

Y

X

Y

X

X

X X

X

X YY

X70%

Y30%

X40%

Y60%

X Z X Y Z

W X Z W X

Y

Z

W X

Y

ZW X Z

W X Z W Y Z


W Y Z U V W Z U Y V

W Y Z W ZU X V U X

Y

V

W Y Z W ZU X V U X

Y

V

W

Yn

Y1

Z W Y1 ZYn

YY

YYX X

Y1 Yn Y1 Yn

W Y ZW Y Z

40%

60%

W

X

Y

70%

30%

W

X

Y

W

Yn

Y1

Z W Y1 ZYn


Before the drift, activities Y1, . . .Yn were parallel (resp., sequential), while afterthe drift, they are sequential (resp., parallel).

cd Synchronize two activitiesTemplate

X

X

X

X YY

X X

X X

X

Y

X

Y

X

X

X X

X

X YY

X70%

Y30%

X40%

Y60%

X Y Y X

X Z X Y Z

W X Z W X

Y

Z

W X

Y

ZW X Z

W X Z W Y Z


W Y Z U V W Z U Y V

W Y Z W ZU X V U X

Y

V

W Y Z W ZU X V U X

Y

V

W

Yn

Y1

Z W Y1 ZYn

YY

YYX X

Y1 Yn Y1 Yn

W Y ZW Y Z

40%

60%

W

X

Y

70%

30%

W

X

Y

W

Yn

Y1

Z W Y1 ZYn

X

Y

X

Y

ZY

W

X

ZY

W

X


Before the drift, activities x and y were parallel (resp., synchronized), whileafter the drift they are synchronized (resp., parallel).

lp Make activities loopable/non-loopableTemplate

X

X

X

X YY

X X

X X

X

Y

X

Y

X

X

X X

X

X YY

X70%

Y30%

X40%

Y60%

X Z X Y Z

W X Z W X

Y

Z

W X

Y

ZW X Z

W X Z W Y Z


W Y Z U V W Z U Y V

W Y Z W ZU X V U X

Y

V

W Y Z W ZU X V U X

Y

V

W

Y

X

Z W X ZY

W

Y

X

Z W X ZY

YY

YYX X

Y1 Yn Y1 Yn

W Y ZW Y Z

40%

60%

W

X

Y

70%

30%

W

X

Y


After the drift, activities Y1, . . .Yn have become loopable/non-loopable.

cb Make an activity skippable/non-skippable

Template: [diagram]

After the drift, activity Y has become skippable/non-skippable.

fr Change branching frequency

Template: [diagram]

After the drift, following activity W, branch of activity X is more frequently executed, while branch of activity Y is less frequently executed.

Table 4.2: Change templates and their drift characterization statement formats.



the average frequencies and RFCs of the relations ($O(2n \cdot |\mathcal{L}|^2)$), (iii) ordering the relations ($O(|\mathcal{L}|^2 \cdot \log(|\mathcal{L}|^2))$), and (iv) template identification ($O(|\mathcal{L}|^2 \cdot m \cdot |\mathcal{L}|^2!)$).¹ Hence, the time complexity of our method is $O(|\mathcal{L}|^2 \cdot m \cdot |\mathcal{L}|^2!)$. This time complexity is a theoretical upper bound; in practice, the number of relations rarely approaches $|\mathcal{L}|^2$, and not all permutations are verified during template identification (relations are first filtered based on their types, e.g. causality).
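To illustrate how quickly the permutation term in this bound grows, and why filtering relations by type first matters, the following sketch counts the ordered ways of mapping a template of k relations onto a candidate set. The function name and the relation counts are hypothetical illustration values, not figures from the experiments.

```python
import math


def candidate_matchings(num_relations: int, template_size: int) -> int:
    """Ordered ways to map a template of `template_size` relations onto a
    set of `num_relations` candidate relations: nPk = n! / (n - k)!."""
    return math.perm(num_relations, template_size)


# Hypothetical sizes: 50 retrieved relations, a template of 3 relations.
print(candidate_matchings(50, 3))  # 50 * 49 * 48 = 117600

# Assumed effect of type filtering: only 12 relations of the right type remain.
print(candidate_matchings(12, 3))  # 12 * 11 * 10 = 1320
```

Even in this small example, restricting the candidates by relation type reduces the number of permutations to check by roughly two orders of magnitude.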

4.3 Tool Support

We implemented the proposed method as an extension of ProDrift, which is available as a plug-in of Apromore² as well as a standalone open-source tool.³ To enable drift characterization for a detected drift, the user needs to tick the “Drift characterization” checkbox in the configuration panel of the plug-in, as shown in Figure 4.5a. The user then chooses between the two drift characterization configuration options: “activity level” and “fragment level”. To characterize drifts using the method presented in this chapter, the “activity level” option needs to be selected.

By default, the CRFC threshold (cf. Section 4.2.2) is set to 95%, since this value empirically resulted in the highest characterization accuracy in the experiments presented later in this chapter. Alternatively, the user can set a different CRFC threshold in the “Cumulative change” field.

After a drift is detected, it is characterized by the drift characterization method. Once the parsing of the log is complete, the user can inspect the natural language characterization statements of each detected drift by clicking on it in the list below the P-value plot, as shown in Figure 4.5b.


¹ Matching a template of $k$ relations to a drift feature set of $|\mathcal{L}|^2$ relations requires iterating over all possible permutations (${}^{n}P_{k} = |\mathcal{L}|^2! / (|\mathcal{L}|^2 - k)!$). The upper-bound complexity of this operation is $O(|\mathcal{L}|^2!)$. Next, to identify the best-matching template, we iterate over the number of predefined templates $m$. Finally, we need to match simultaneous changes, of which there are $|\mathcal{L}|^2$ in the worst case (where each template has only one relation). The upper-bound time complexity of identifying multiple non-overlapping templates is $O(|\mathcal{L}|^2 \cdot m \cdot |\mathcal{L}|^2!)$.

² Available at http://apromore.org/

³ Available at http://apromore.org/platform/tools

(a) Enable drift characterization and choose the “activity level” configuration for using the drift characterization method presented in this chapter.

(b) Inspect natural language characterization statements for each detected drift.

Figure 4.5: Drift characterization at activity level using the ProDrift plug-in in Apromore.

We used ProDrift to evaluate the effectiveness of our method with different parameter settings. The tool is fed with an event stream replayed from an event log, and reports, for each detected drift, its characterization as a verbalization in natural language, based on the applicable templates. In the rest of this section we discuss the setup of the experiments and a two-pronged evaluation to assess the effectiveness of the relevant relations retrieval and ranking with respect to each individual template, and the accuracy of template identification. Finally, we compare our method with log-to-log comparison.

4.4.1 Setup

We generated an artificial dataset using the same approach and CPN base model as in the previous chapter (cf. Section 3.4); this base model represents a highly variable process. For each simple change template in Table 4.1, we generated a log featuring nine drifts, each injected by alternately activating and deactivating the template within the base model. For instance, for the template “sre” we alternately inserted an activity into, or deleted it from, the process model. For the particular change template “lp”, three logs were generated with length-one, length-two and length-three loops, and the reported results for this template were averaged over these three logs. This resulted in 17 logs, each containing 10,000 traces with nine equidistant drifts of the same change template. To evaluate the characterization of drifts in the context of simultaneous changes, we organized our change templates in three categories: Insertion (“I”), Resequentialization (“R”) and Optionalization (“O”) (cf. Table 4.1). Limited to two and three simultaneous cross-category changes, these categories yield four possible scenarios of simultaneous changes (“IR”, “IO”, “RO”, “RIO”). For each such scenario, two logs were generated by randomly selecting single templates from different categories. For instance, a drift from the simultaneous changes scenario “IR” could simultaneously add a new activity (“I”) and a loop back (“R”) in two different locations of the process. This resulted in eight logs for the simultaneous changes setting. All in all, the dataset contained 25 logs for both single and simultaneous changes.⁴

In these experiments, we used our drift detection method (cf. Chapter 3) to detect drifts, because this method works in online settings with event streams of highly variable business processes.

⁴ All the CPN models used for this simulation, the resulting artificial logs, and the detailed evaluation results are available with the software distribution.



4.4.2 Impact of Characterization Delay on Relations Ordering

In Stage 1 of our method, the KSPT is used to retrieve the relations that are significantly

associated with the drift, and discard the irrelevant ones. Then, the retrieved binary

relations are ordered based on their RFCs with respect to the TRFC that occurred in

the drift. For each detected drift, the ground truth (ideal case) is that the relations

related to the injected drift template are correctly identified and placed at the top of

the returned ordered list. However, some spurious relations may affect the relations

ordering. We use the normalized discounted cumulative gain (nDCG) to evaluate the

accuracy of the relations ordering. The nDCG is a relative measure where a value of

1.0 indicates that the ordered list corresponds to the ground truth, while 0.0 indicates

that none of the relations related to the injected drift template have been retrieved.

This measure is also used for computing the confidence of a template matching, as

explained in Section 4.2.2.
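As a minimal sketch of this measure, the following function computes the nDCG of an ordered relations list under the assumption of binary relevance (a relation is relevant if it belongs to the injected template) and the common logarithmic rank discount. The exact gain and discount used by the method are not restated in this section, so both are assumptions, and the relation names are hypothetical.

```python
import math
from typing import Sequence, Set


def ndcg(ordered: Sequence[str], relevant: Set[str]) -> float:
    """nDCG of an ordered list, assuming binary relevance and a log2 discount."""
    # Discounted gain actually achieved by the returned ordering.
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, relation in enumerate(ordered, start=1)
        if relation in relevant
    )
    # Ideal case: all relevant relations occupy the top positions.
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, len(relevant) + 1))
    return dcg / ideal if ideal > 0 else 0.0


template = {"a>b", "b>c"}                     # relations of the injected template
print(ndcg(["a>b", "b>c", "x>y"], template))  # relevant relations on top
print(ndcg(["x>y", "a>b", "b>c"], template))  # a spurious relation ranked first
print(ndcg(["x>y"], template))                # nothing relevant retrieved
```

Under these assumptions, a spurious relation ranked above the template's relations lowers the score, while spurious relations at the tail of the list leave it unchanged.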

In the first experiment, we study how the accuracy of the ordered binary relations

list is impacted by changing the characterization delay. We vary the characterization

delay from 200 to 1,000 events, and report the mean and the standard deviation of

the nDCG over all the simple change templates, where each template was evaluated

separately over nine injected drifts (cf. Fig. 4.6). In this experiment, we do not apply

any filtering on the ordered binary relations list (CRFC = 100% · TRFC).

Not surprisingly, for a characterization delay of 200 events, the KSPT does not

have enough data to identify the relevant binary relations causing the drift, which leads

to a relatively low average nDCG of around 0.84 and a standard deviation of 0.19

over all templates. Consequently, spurious relations, most often resulting from a slight

change in a branching probability, appear in the ordered relations list. However, we

observe that the accuracy of the relations ordering increases when the characterization

delay grows and eventually plateaus at an average of 0.98 with a standard deviation

of 0.02. As expected, the more data points are fed to the KSPT, the more accurate is

the statistical association between the explanatory variable (here an individual binary

relation) and the target variable (the drift classification variable), and the better the

estimation of the RFC for ordering the relations is. However, the characterization delay cannot grow indefinitely. We therefore select 500 events as a trade-off between

a short characterization delay and a high characterization accuracy (fewer spurious

relations). This value is used as the default delay in the remaining experiments.

We note that the characterization delay not only indicates how many events our method needs to fetch from the event stream to obtain an accurate characterization, but also allows us to infer the minimum inter-drift distance that our method can handle. In other words, to be accurately characterized, the next potential drift must occur at least a number of events equal to this characterization delay (plus one detection window) after the stabilization point (cf. Fig. 4.2).

4.4.3 Impact of Relation Filtering on Characterization Accuracy

As introduced in Section 4.2.2, the ordered relations list resulting from Stage 1 can

be filtered based on the CRFC to discard the relations with insignificant RFCs. Thus, only the top relations whose RFCs cumulatively reach a given proportion of the TRFC are retained. The filtered list is then fed to the template identification stage to find the templates that best match the relations. In this experiment, we study how the

filter affects the accuracy of template identification. We vary the CRFC threshold (x%)

from 70% to 100% (no filtering), and report the F-score of the template identification

averaged over the 25 artificial logs. The F-score is measured as the harmonic mean

of recall and precision, where recall measures the ratio of correctly identified change

templates of a specific type over the total number of injected templates of the same

type, and precision measures the ratio of correctly identified change templates of a

specific type over the total number of identified templates of that same type. Figure

4.7 shows the average accuracy over all templates and per single change, double and

triple simultaneous changes.
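The per-template F-score described above can be sketched as follows. The representation of each drift as a set of template codes, as well as the function name, are assumptions made for illustration only.

```python
from typing import List, Set


def f_score(injected: List[Set[str]], identified: List[Set[str]], template: str) -> float:
    """F-score of one template type over a sequence of drifts.

    injected[i] / identified[i] hold the template codes injected into,
    and reported for, the i-th drift (hypothetical encoding).
    """
    correct = sum(1 for inj, ide in zip(injected, identified)
                  if template in inj and template in ide)
    n_injected = sum(1 for inj in injected if template in inj)
    n_identified = sum(1 for ide in identified if template in ide)
    recall = correct / n_injected if n_injected else 0.0
    precision = correct / n_identified if n_identified else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Three drifts with an injected "lp" change; one is missed and one
# additionally yields a spurious "fr" template.
injected = [{"lp"}, {"lp"}, {"lp"}]
identified = [{"lp"}, {"lp", "fr"}, set()]
print(f_score(injected, identified, "lp"))  # recall 2/3, precision 1
print(f_score(injected, identified, "fr"))  # never injected, so no correct match
```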

As expected, we observe that the F-score increases as the CRFC threshold in-

creases. When the threshold is low, many relations are filtered out, and if only one

relation corresponding to an injected template is discarded then its corresponding tem-

plate will not be matched. On the other hand, when the threshold increases, more

relations remain in the filtered list, thereby increasing the likelihood of matching the

relevant template, leading to a higher recall. However, when no relations are filtered

out (threshold = 100%), spurious relations will be matched with the frequency tem-

plate “fr”. This will impact the precision, explaining the drop in the average F-score

at the threshold value of 100%. As an example, for the change template parallel move

“pm” (with 8 relations), the output of the first stage of our method was an ordered list

of 50 relations. A filter threshold of 70% retains only the top five relations out of 50,

leading to a recall of 0 for this template. On the other hand, a threshold of 90% retains

the top nine relations, leading to a recall of 1. In the remaining experiments we use a

CRFC threshold of 95% that is suitable for both single and simultaneous changes.
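The filtering step discussed in this experiment can be sketched as a cumulative cut-off over the ordered relations list. The sketch assumes that the filter keeps the shortest prefix of the list whose cumulative RFC reaches the chosen share of the TRFC (taken here as the sum of all RFCs); the precise definition is given in Section 4.2.2, and the relation names and RFC values below are made up.

```python
from typing import List, Tuple


def filter_by_crfc(relations: List[Tuple[str, float]],
                   threshold: float = 0.95) -> List[Tuple[str, float]]:
    """Keep the top relations until their cumulative RFC reaches
    `threshold` (a fraction in [0, 1]) of the total RFC."""
    ordered = sorted(relations, key=lambda r: r[1], reverse=True)
    total = sum(rfc for _, rfc in ordered)
    if total <= 0:
        return []
    kept, cumulative = [], 0.0
    for name, rfc in ordered:
        if cumulative / total >= threshold:
            break  # the retained prefix already covers the requested share
        kept.append((name, rfc))
        cumulative += rfc
    return kept


# Hypothetical RFC values for five relations.
relations = [("r1", 40), ("r2", 30), ("r3", 20), ("r4", 5), ("r5", 5)]
print(len(filter_by_crfc(relations, 0.50)))  # a low threshold keeps few relations
print(len(filter_by_crfc(relations, 1.00)))  # no filtering: all relations kept
```

A low threshold may drop a relation that a template needs (hurting recall), whereas a threshold of 100% retains low-RFC relations that may be matched spuriously (hurting precision), which mirrors the trade-off observed in Figure 4.7.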



Figure 4.6: Impact of characterization delay on relevant relations retrieval and ordering. [Plot: ordering accuracy (nDCG), from 0.6 to 1, against characterization delay, from 200 to 1,000 events.]

Figure 4.7: Impact of relation filtering on characterization accuracy. [Plot: F-score, from 0 to 1, against cumulative change x%, from 70 to 100, for single, double and triple templates and the average over all.]

4.4.4 Comparison with Baseline

As discussed in Section 4.1, a possible approach to process drift characterization is

to apply the log-to-log comparison technique in [115] on two sub-logs extracted from

before and after a drift. This technique is designed to compare logs with complete

traces, while in our setting the pre-drift and post-drift sub-logs are extracted from an

event stream, and hence contain many incomplete traces. As a first attempt, we fed

the log comparison technique with the two sub-logs before and after the drift as is, but

as expected, the comparison led to a large number of misleading differences. We then

decided to only use complete traces within the two sub-logs. This was possible as we

knew the start and end activities of the process. For each change template, we evaluated

the accuracy of the differences returned by the technique manually. We calculated

recall by considering the missing differences for a given template as false negatives,

so that a recall of 1 is obtained if a template is fully described by the differences.

Similarly, precision was calculated by considering the statements that were not related

to the template as false positives.

Figure 4.8 reports the F-score obtained for each change template for our method

and for the baseline. Our method had almost a perfect F-score for every template as

it could retain the (great majority of the) relations that were involved in the injected

change template, without returning relations that did not fit the templates. On the other

hand, the baseline produced a low F-score for all the change templates. Admittedly,

this technique had a high average recall of around 0.85 over all logs. However, its

precision was very low due to a high number of false positives (wrong differences

returned). Indeed, the two sub-logs capture partial process behavior, which, even if

similar at the event level, is quite variable at the trace level. This was exacerbated by

the high variability of the process. These results are in line with the findings in the previous chapter on drift detection (cf. Section 3.4.5): techniques based on (abstractions of) complete traces, such as [77], do not perform well when detecting drifts in highly variable logs, whereas finer-grained features such as the α+ relations are more suitable to capture process behavior in high-variability settings.

Figure 4.8: F-score per change template, obtained with our method vs. [115]. [Bar chart: F-score, from 0 to 1, per change template, for our method and log delta.]

We conducted all the experiments on an Intel i7 2.20GHz with 16GB RAM (64 bit),

running Windows 7 and JVM 7 with standard heap space of 4GB. The time required

to extract, order, and then match the α+ relations to the predefined templates for each

drift ranged from a minimum of 410ms to a maximum of 660ms with an average of

530ms. The baseline method took on average 15 seconds to report the differences

between the pre-drift and post-drift sub-logs.

4.5 Evaluation on Real-life Log

We further evaluated our method on the BPI Challenge (BPIC) 2011 log.⁵ We chose this

log, which records patient treatments in a Dutch hospital, because of its high trace

variability (∼ 70%). We prepared the log by filtering out infrequent behavior using

the noise filter in [22] with its default settings. This operation resulted in a log with

1,121 traces, of which 798 are distinct, and 42 activity labels. In the previous chapter

(cf. Section 3.5), we detected two drifts from this filtered log, using our drift detection

method. The two drifts were supported by the observation of a sudden increase, and

a subsequent decrease in the number of events while the number of active cases was

decreasing.

We applied our method for drift characterization in order to identify the change templates that explain these two drifts. Two frequency change templates were identified to characterize the first drift, while the second drift was explained by one frequency change template. This template was symmetric to the first frequency change template identified for the first drift. After investigation, we found that the branch whose probability was identified by the change template as increasing (resp. decreasing) after the first (resp. second) drift point included five activities in a loopback. The increase from 34% to 46% (resp. decrease from 46% to 34%) in the upper branch probability of the identified frequency change template is, in fact, the cause of the increased (resp. decreased) number of events after the first (resp. second) drift. Figure 4.9 depicts the identified template, with the activity labels in their original language.

⁵ http://dx.doi.org/10.4121/uuid:d9769f3d-0ab0-4fb8-803b-0d1120ffcf54

As discussed in Section 4.4.4, the baseline technique for log-to-log comparison

[115] is designed to compare logs with complete traces. However, since there was no

complete trace within the pre-drift and post-drift sub-logs, we ran the baseline tech-

nique using the sub-logs containing only partial traces. Nevertheless, we had to abort

the experiment as it did not complete within six hours.

[Template diagram with activity labels: aanname laboratoriumonderzoek; ordertarief; 190021 klinische opname a002; 190205 klasse 3b a205; 190101 bovenreg.toesl. a101; ligdagen - alle spec.beh.kinderg.-reval.]

Figure 4.9: Identified template for Drift 1 in BPIC 2011 log.

4.6 Summary

In this chapter, we proposed a systematic online method for characterizing process

drifts at the level of individual activities from event streams. The method can charac-

terize multiple simultaneous changes so long as they do not overlap in terms of process

behavior. The strength of our method resides in the features used to encode the process

behavior and its well-grounded statistical approach, which allow us to deal with highly

variable processes. The collection of change templates that we use to describe a drift is

based on a well-established categorization of typical change patterns. We do not claim

this collection to be complete, but it can easily be extended. Furthermore, the change

templates that best characterize the drift are reported to the user as natural language

statements. The method may also be used on top of any process drift detection tech-

nique so long as it is provided with the point (or period) in which a drift occurs. Finally,

by replaying an event log as an event stream the method can as well be deployed for

characterizing drifts in event logs.



We extensively evaluated our method using both highly variable artificial logs and a real-life log. The results on the artificial logs show that our method is highly accurate in characterizing drifts induced by the application of typical process changes to individual activities, and that it has a low characterization delay and a low execution time. Despite the lack of a ground truth to validate our findings on the real-life log, the results were supported by various observations from the log. In addition, the method outperforms a state-of-the-art technique for log-to-log comparison.


Chapter 5

Process Drift Characterization at Fragment Level

Drift detection and characterization play equally important roles in identifying and

explaining undocumented process changes that may over time negatively impact the

performance of a business process. As such, in Chapters 3 and 4 we proposed two au-

tomated methods for detecting and characterizing drifts from event streams of business

processes. The characterization method starts by extracting α+ binary behavioral relations, such as causality, concurrency and conflict, from an event stream before and after a drift, and then filters out the relations unrelated to the drift by performing a statistical test. The remaining relations are mapped to a predefined set of change patterns to

produce statements that explain the drift. However, this method is limited to charac-

terizing changes applied to individual activities, e.g. removing an activity or swapping

two activities. This limitation is in fact due to the low-level abstraction of process be-

havior as captured by the α+ relations. Consequently, changes to process fragments

are either completely missed by this method, e.g. skipping a fragment of two concur-

rent activities, or only partially explained. An example of the latter case is when we

remove a fragment of two mutually exclusive activities from the process, for which this

method only identifies the removal of one of the activities. Another limitation is the

inability to characterize complex changes such as overlapping changes, i.e. changes

that share some behavioral relations, as well as nested changes, i.e. a set of overlapping

changes, each applied to the resulting subprocess from the application of the previous

one.

In the light of the above, this chapter proposes a fully automated method for characterizing process drifts at the level of fragments from event streams. The core idea is to discover two process trees, i.e. block-structured process models, from the portions of

the event stream before and after the drift, and use a process tree transformation tech-

nique to find a minimum-cost sequence of edit operations that transforms the pre-drift

process tree to the post-drift process tree. The underpinning assumption is that edit

operations within such a sequence manifest control-flow changes of the process un-

derlying the drift. Each process (sub)tree represents a single-entry single-exit (SESE)

process fragment. As such, we define a set of fragment-based edit operations, each

representing a change to one or more process fragments. The definition of the edit

operations and the cost of applying them is such that a minimum-cost sequence of edit

operations provides a detailed yet concise explanation of the process changes. That is,

if a change involves an individual activity within the process then it is explained by

one change in the sequence referring to that activity. On the other hand, if a change

involves a fragment of multiple activities, then it is explained by one change in the

sequence referring to that fragment as a whole. Moreover, the hierarchical structure

of a process tree allows the characterization of more complex changes such as over-

lapping changes as well as nested changes. The identified fragment-level changes are

translated into natural language statements based on typical change patterns of busi-

ness processes that we have already used for detection and characterization of drifts in

the previous two chapters.

We extensively evaluated the accuracy and the conciseness of the statements re-

ported by our method by characterizing drifts on event streams simulated from artifi-

cial and real-life event logs in various settings. The results indicate that the proposed

method is fast and highly accurate in characterizing typical change patterns via concise

statements, and performs better than the method proposed in the previous chapter at

characterizing changes applied to fragments of multiple activities, overlapping changes

as well as nested changes.

This chapter is structured as follows. Sections 5.1 and 5.2 discuss related work and

preliminaries, respectively. Sections 5.3, 5.4 and 5.5 illustrate the various ingredients

of the proposed method, divided into process tree discovery, process tree transforma-

tion, and computing characterization statements, respectively. Sections 5.7 and 5.8

present the evaluation on artificial and real-life logs, respectively. Finally, Section 5.9

discusses some factors that influence the accuracy of the proposed method, while Sec-

tion 5.10 concludes the chapter.



5.1 Related Work

In the previous chapter, we discussed that a possible approach to characterize a drift

is to extract two sub-logs from before and after a drift and compare them using a log-

to-log comparison technique, called log-delta analysis [115, 10]. We also pointed out

some limitations of these techniques, e.g. inability to work with partial traces and

hence event streams. Another approach to drift characterization is to first discover two

process models, one from the pre-drift sub-log and the other from the post-drift sub-

log and compare them using a model-to-model comparison technique. In this context,

Armas-Cervantes et al. [8] propose a method for diagnosing behavioral differences

between two process models based on canonically reduced event structures. The idea

is similar to that of Van Beest et al. [115], that is to build two event structures, here

from two process models, and by comparing them report their differences as natural

language statements. However, the differences identified by this method are at the level of individual activities. Consequently, this method reports a large number of differences, especially when the changes are applied to process fragments, or when they occur in a nested way. For example, for a simple fragment-level change, where we parallelize two sequential fragments, each consisting of four activities, this method would report 16 differences, each capturing the parallelization of two activities. Such a large number of differences is difficult to understand and analyze. Another limitation of this method is its high execution time, especially when it needs to compare two large event structures with several differences.

All the approaches described above, as well as the one presented in Chapter 4,

work at the level of individual activities and thus are not suitable for characterizing

process fragment changes. A possible approach to discover process fragments is to

abstract from the low-level behavioral relations in an event stream by discovering a

block-structured process model, where each block represents a single-entry single-exit

(SESE) process fragment. The latter requires a process discovery technique that works

on event streams. A few techniques have been proposed for process discovery on

event streams [17, 98, 16], among which, only the technique developed by Redlich

et al. [98] guarantees the discovery of block-structured process models. However, the

latter technique can only work on completed traces within an event stream, and hence

it misses the behavior represented by partial traces. In this chapter, we adapt Inductive

Miner (IM) [68] to work on event streams. Inductive Miner is an automated process

discovery technique that guarantees the discovery of block-structured process models



in the form of process trees. We chose IM because this algorithm recursively constructs

a process tree from an event log, so it naturally lends itself to be adapted to work on

partial traces. Alternatively, it is possible to incrementally compute a process map

from an event stream using a technique such as the one in [73] and use it as input

to any other process discovery technique that builds a block-structured process model

from a process map, provided this technique can work on partial traces.

A block-structured process model can be represented as a process tree. Therefore,

given two process trees from before and after a drift we can characterize the drift by

finding a sequence of edit operations that transform the pre-drift process tree to the

post-drift process tree. This problem in the algorithms and data structures community

is known as tree edit distance, where a widely studied challenge is to find the mini-

mum number of edit operations to turn an ordered (resp., unordered) tree into another

ordered (resp., unordered) tree. There are several techniques for finding the minimum

tree edit distance between two ordered trees [113, 137, 57, 27, 31, 91] and between

two unordered trees [138, 136, 108, 135, 54, 5, 51, 37, 61]. A process tree, as opposed

to conventional trees, may contain both ordered and unordered nodes at the same time.

The existing tree comparison techniques, however, are designed to transform an ordered (resp. unordered) tree to another ordered (resp. unordered) tree. Moreover, due to the specific syntactic rules of process trees, the basic node deletion/insertion/substitution edit operations defined by these techniques are not suitable to capture the differences between the process behaviors expressed by the two process trees. For example, non-leaf nodes in a process tree must have at least two children, and some leaves may only be parented by certain nodes. Such rules often give rise to situations where the deletion of a node from a process tree triggers a sequence of trivial node deletions that do not change the process behavior expressed by the tree. Therefore, these techniques,

in their current form, cannot be used for the process tree comparison problem. In this

chapter we introduce a process tree comparison technique that finds a minimum-cost

sequence of process tree edit operations needed to transform a process tree to another

process tree. We implement the proposed technique using two alternative search strate-

gies, exhaustive and greedy, and assess the relative merits.

5.2 Preliminaries

This section introduces basic notions such as process trees and fragments. The notation

used in this chapter is summarized in Appendix A.



A tree is an acyclic, connected graph. For a tree $T$, the sets of nodes and edges are denoted by $V(T)$ and $E(T)$, respectively. The size of $T$ is $|V(T)|$ and is denoted by $|T|$. We sometimes write $v \in T$ for $v \in V(T)$. The root node of a tree $T$ is denoted by $\mathrm{root}(T)$. We denote the subtree of $T$ rooted at $v \in T$ by $T\langle v \rangle$.

For each non-root node $v$ in $T$, let $\mathrm{Down}_T(v) \subset V(T)$ be the sequence of nodes on the shortest path from $\mathrm{root}(T)$ to $v$. The parent of $v$ is its adjacent node in $\mathrm{Down}_T(v)$. The parent of the root is undefined. We say $v$ is a child of $u$ if $u$ is the parent of $v$. A node in $\mathrm{Down}_T(v)$ preceding $v$ is called an ancestor of $v$ in $T$. We say $v$ is a descendant of $u$ if $u$ is an ancestor of $v$. Nodes with the same parent are called siblings. A node with no children is called a leaf. A non-leaf node is called an internal node. The set of leaves under an internal node $v \in T$ is denoted by $\mathrm{leaves}(v)$. We denote the label of a node $v$ by $l(v)$.

The depth of $v$ in $T$ is denoted by $\mathrm{dep}(v)$ and equals $|\mathrm{Down}_T(v)| - 1$. The depth of $T$, denoted by $\mathrm{dep}(T)$, equals the maximum depth of its nodes, i.e. $\mathrm{dep}(T) = \max_{v \in V(T)} \mathrm{dep}(v)$. For the nodes $v_1, \ldots, v_n$ in $T$ we define a common ancestor (CA) as a node in $\mathrm{Down}_T(v_1) \cap \ldots \cap \mathrm{Down}_T(v_n)$, and denote it by $\mathrm{CA}(v_1, \ldots, v_n)$. Also, we define the lowest common ancestor (LCA) as the deepest CA, and denote it by $\mathrm{LCA}(v_1, \ldots, v_n)$. Accordingly, for the subtrees $T\langle v_1 \rangle, \ldots, T\langle v_n \rangle$ the lowest common ancestor is denoted by $\mathrm{LCA}(T\langle v_1 \rangle, \ldots, T\langle v_n \rangle)$, and is the same as $\mathrm{LCA}(v_1, \ldots, v_n)$.
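The tree notions above can be sketched in a few lines. The `Node` class and the function names are hypothetical and chosen to mirror the notation; for convenience the sketch also returns a one-node path for the root, for which the text leaves the path undefined.

```python
class Node:
    """A labeled tree node with parent pointers."""

    def __init__(self, label, children=()):
        self.label = label
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def down(v):
    """Sequence of nodes on the path from the root to v (inclusive)."""
    path = []
    while v is not None:
        path.append(v)
        v = v.parent
    return path[::-1]


def dep(v):
    """Depth of v: number of nodes on the root-to-v path, minus one."""
    return len(down(v)) - 1


def lca(*nodes):
    """Deepest node shared by the root-to-node paths of all given nodes."""
    lowest = None
    for level in zip(*(down(v) for v in nodes)):
        if all(n is level[0] for n in level):
            lowest = level[0]
        else:
            break
    return lowest


a, b, c = Node("a"), Node("b"), Node("c")
x = Node("x", [a, b])
r = Node("r", [x, c])
print(dep(a), lca(a, b).label, lca(a, c).label)
```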

A process tree is a rooted labeled tree that provides an abstract hierarchical repre-

sentation of a block-structured workflow net [68]. We define its syntax as follows:

Definition 12 (Process tree). Let $\mathcal{L}$ be a set of activity labels, and $O = \{\times, \rightarrow, \wedge, \circlearrowleft\}$ be a set of operator labels. Then, an activity node $t$ with $l(t) \in \mathcal{L}$ is a process tree, a $\tau$-node with $l(\tau) \in \{\tau\}$ is a process tree, and $\oplus(P_1, \ldots, P_n)$ is a process tree, in which $\oplus$ is a process tree operator node with $l(\oplus) \in O$, and $P_1, \ldots, P_n$ are process trees.

A process tree expresses a language: an activity node t represents the singleton lan-

guage l(t), a τ-node represents the language with the empty trace, and an operator node

represents a certain combination of the languages of its subtrees P1 . . .Pn, depending on

its label. In this chapter, we have the following four operator labels: 1) × expresses the exclusive choice between its subtrees, 2) → expresses the sequential composition of its subtrees, 3) ∧ expresses the concurrent composition of its subtrees, and 4) ⟲ expresses the structured loop of its first subtree (loop body), followed by the alternative loopback path of its second subtree.



For instance, the process tree ×(∧(a,b), ⟲(c,d)) expresses the language {〈a,b〉, 〈b,a〉, 〈c〉, 〈c,d,c〉, 〈c,d,c,d,c〉, . . .}.
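To illustrate how the four operators combine the languages of their subtrees, the following sketch enumerates the traces of a process tree with a bounded number of loopbacks. The tuple encoding of trees and the names `shuffle`, `language` and `max_redo` are assumptions of this sketch, not part of the thesis.

```python
from itertools import product

# Illustrative encoding (an assumption of this sketch): a leaf is a string,
# "τ" is the silent leaf, and an operator node is a tuple (operator, child, ...).


def shuffle(u, v):
    """Return all interleavings of the traces u and v."""
    if not u:
        return {v}
    if not v:
        return {u}
    return ({(u[0],) + w for w in shuffle(u[1:], v)}
            | {(v[0],) + w for w in shuffle(u, v[1:])})


def language(tree, max_redo=2):
    """Traces of a process tree, taking at most max_redo loopbacks per loop."""
    if isinstance(tree, str):
        return {()} if tree == "τ" else {(tree,)}
    op, *kids = tree
    langs = [language(kid, max_redo) for kid in kids]
    if op == "×":   # exclusive choice: union of the child languages
        return set().union(*langs)
    if op == "→":   # sequence: concatenate one trace per child, in order
        return {sum(parts, ()) for parts in product(*langs)}
    if op == "∧":   # concurrency: interleave one trace per child
        result = {()}
        for lang in langs:
            result = {w for u in result for v in lang for w in shuffle(u, v)}
        return result
    if op == "⟲":   # loop: body, then (loopback, body) repeated
        body, redo = langs
        result, frontier = set(body), set(body)
        for _ in range(max_redo):
            frontier = {u + r + b for u in frontier for r in redo for b in body}
            result |= frontier
        return result
    raise ValueError(f"unknown operator: {op}")
```

With `max_redo=2`, the tree `("×", ("∧", "a", "b"), ("⟲", "c", "d"))` yields exactly the five traces listed in the example above; larger bounds add longer loop traces.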

In a process tree P, a leaf node is either an activity node or a τ node, whereas

an internal node is always an operator node and must have at least two children. We

define Γ =L ∪{τ}∪O as a fixed finite alphabet which assigns a label to each node in

a process tree. In a process tree, if an activity node has a unique label, we sometimes

refer to that activity node by its label. The set of activity nodes under an operator node

v∈P is denoted by C(v), and contains the activity nodes in P〈v〉. By replaying an event

log on top of a process tree we can annotate each node of the tree with its execution

frequency. We call the ratio of the frequency of a node v to the frequency of its parent

the relative frequency of v.

The relation between the nodes v1, . . . ,vn in P is defined by the operator of their

LCA, i.e. mutually-exclusive (×), concurrent (∧), sequential (→), or loop (⟲). Accordingly, the relation between the process trees P〈v1〉, . . . ,P〈vn〉 in P is the same as

the relation between the nodes v1, . . . ,vn.

A process tree can contain both ordered and unordered operator nodes. An operator

node ⊕ is unordered if it is commutative, i.e. ⊕(P1, . . . ,Pn) = ⊕(Pn, . . . ,P1), it is

ordered otherwise. The operator nodes × and ∧ are unordered, whereas → and ⟲ are ordered. For example, ×(a,b) = ×(b,a), whereas →(a,b) ≠ →(b,a).

The pre-order index of v in a process tree P is denoted by preP(v), and is the

same as one in an ordered tree when arbitrarily fixing the order of siblings parented by

unordered operator nodes in P. We refer to the node with the pre-order index of i in P

by P[i]. Also, in the process tree examples in this chapter, the number on the left side

of a node indicates its pre-order index.

For a node v and an ordered operator node ⊕ in a process tree P such that v is a

descendant of ⊕, we define a function Rankk returning the rank of v in ⊕.

Definition 13 (Rankk). Let ⪯ be the order on the children of an ordered operator node ⊕ in a process tree P. Also, let v ∈ P be a descendant of ⊕ and c ∈ P be a child of ⊕ such that c ∈ DownP(v). Then Rankk(v,⊕) = |{c′ ∈ children of ⊕ | c′ ⪯ c}|.



Example 8. As an example, let us assume the process tree P = →(×(a,b), ∧(c,d)), whose nodes have the pre-order indices 0 (→), 1 (×), 2 (a), 3 (b), 4 (∧), 5 (c) and 6 (d). The rank of each non-root node P[i] in the →-node P[0] is as follows.

Rankk(P[1], P[0]) = 1, Rankk(P[2], P[0]) = 1, Rankk(P[3], P[0]) = 1,

Rankk(P[4], P[0]) = 2, Rankk(P[5], P[0]) = 2, Rankk(P[6], P[0]) = 2.
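The pre-order index and the rank function can be sketched as follows, assuming process trees are encoded as nested tuples (a leaf is a string, an operator node is a tuple starting with its operator). The function names `preorder`, `size` and `rank` are hypothetical.

```python
def preorder(tree):
    """List all subtrees in pre-order, so that preorder(P)[i] plays the role of P[i]."""
    nodes = [tree]
    if not isinstance(tree, str):
        for child in tree[1:]:
            nodes.extend(preorder(child))
    return nodes


def size(tree):
    """Number of nodes in the (sub)tree."""
    return 1 if isinstance(tree, str) else 1 + sum(size(c) for c in tree[1:])


def rank(tree, i, j=0):
    """Rank of node P[i] in the operator node P[j], both given by pre-order index.

    Returns the 1-based position of the child of P[j] whose subtree contains
    P[i], or None if P[i] is not a proper descendant of P[j].
    """
    op = preorder(tree)[j]
    if isinstance(op, str):
        return None  # a leaf has no children
    start = j + 1  # pre-order index of the first child of P[j]
    for position, child in enumerate(op[1:], start=1):
        end = start + size(child)  # child subtree covers indices [start, end)
        if start <= i < end:
            return position
        start = end
    return None
```

Applied to the tree of Example 8, `rank` returns 1 for indices 1 to 3 and 2 for indices 4 to 6.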

There might be multiple process trees with the same language. For example, the

tree ×(a,×(b, c)) expresses the same language as ×(×(a, b), c). As in this chapter

we compare the structures of two trees to characterize a drift, we need to have one

structurally unique tree for each language. A set of structural reduction rules is introduced in [72]; these rules are guaranteed to preserve the language of a process tree. Repeated application of these rules to a process tree leads to a syntactically unique normal form, i.e.

for each language, there is at most one process tree in normal form. In our example, the

normal form would be ×(a, b, c). In this chapter, we use a subset of these reduction

rules, defined below.

Definition 14 (Reduction rules). Let M, Q, and R be process trees, and let . . . be any

number of process trees (possibly 0). Then, the reduction rules are as follows:

singularity rule

(S) ⊕(M)⇒M with ⊕ ∈ {×,→,∧}

associativity reduction rules

(A×) ×(. . .1 ,×(. . .2))⇒ ×(. . .1 , . . .2)

(A→) →(. . .1 ,→(. . .2), . . .3)⇒ →(. . .1 , . . .2 , . . .3)

(A∧) ∧(. . .1 ,∧(. . .2))⇒ ∧(. . .1 , . . .2)

τ-reduction rules

(T→) →(. . . ,M,τ)⇒ →(. . . ,M)

(T∧) ∧(. . . ,M,τ)⇒ ∧(. . . ,M)

A process tree to which no rule can be applied is in normal form and is called a

canonical process tree.

A singularity rule applies to all operators except ⟲, as a ⟲-node always has two children. This rule is based on the definition of the process tree operators (provided above): a →-node, a ∧-node, or an ×-node with one child has the same behavior as the child itself. The associativity rule applies to the ×, → and ∧ operators and reduces a tree such as ×(a,×(b, c)) to ×(a, b, c). The τ-reduction rules target τ constructs and are defined for the → and ∧ operators. A τ-node as a child of a →-node or a ∧-node does not change the language (T→, T∧).
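A minimal sketch of how the reduction rules of Definition 14 could be applied bottom-up is given below. It assumes the nested-tuple encoding of trees (leaf = string, operator node = tuple) and a hypothetical function name `normalize`. As a simplification, it removes a τ-child and flattens a nested operator at any child position, not only at the positions written in the rules; it does not sort the children of unordered operators.

```python
FLATTEN = {"×", "→", "∧"}  # operators covered by the S and A rules
DROP_TAU = {"→", "∧"}      # operators covered by the τ-reduction rules


def normalize(tree):
    """Apply the singularity, associativity and τ-reduction rules bottom-up."""
    if isinstance(tree, str):
        return tree
    op, *kids = tree
    kids = [normalize(k) for k in kids]
    if op in FLATTEN:
        # Associativity: lift the children of a nested node with the same operator.
        flat = []
        for k in kids:
            if not isinstance(k, str) and k[0] == op:
                flat.extend(k[1:])
            else:
                flat.append(k)
        kids = flat
    if op in DROP_TAU:
        # τ-reduction: drop τ-children, but keep one child if all were τ.
        non_tau = [k for k in kids if k != "τ"]
        kids = non_tau if non_tau else ["τ"]
    if op in FLATTEN and len(kids) == 1:
        # Singularity: an operator with a single child behaves like that child.
        return kids[0]
    return (op, *kids)
```

For instance, `normalize(("×", "a", ("×", "b", "c")))` gives `("×", "a", "b", "c")`, while a ⟲-node such as `("⟲", "a", "τ")` is left unchanged.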

We define a fragment as a process tree representing a single-entry single-exit pro-

cess fragment. Formally:

Definition 15 (Fragment). Let P be a process tree rooted at w.

• P is a fragment.

• Let S = {P〈v1〉, . . . ,P〈vn〉} be the set of process trees under w, where v1, . . . ,vn are children of w, . . . is any number of process trees (possibly 0), and l(w) ∈ {×,∧}. A process tree ⊕(P1, . . . ,Pm) parented by w, where l(⊕) = l(w) and s = {P1, . . . ,Pm} is any non-empty proper subset of S, is a fragment. We call such a fragment a sub-fragment of w.

• Let S = {P〈v1〉, . . . ,P〈vn〉} be the sequence of process trees under w, where v1, . . . ,vn are children of w, . . . is any number of process trees (possibly 0), and l(w) = →. A process tree ⊕(P1, . . . ,Pm) parented by w, where l(⊕) = l(w) and s = {P1, . . . ,Pm} is any non-empty proper subsequence of S, is a fragment provided that any two consecutive elements P〈vi〉, P〈vi+1〉 in s are consecutive in S. We call such a fragment a sub-fragment of w.

Any fragment formed by the nodes within a process tree P is a sub-fragment of P.

We sometimes refer to P〈v〉 by P〈v〉-fragment. Also, a fragment P〈v〉, where v is the

child of a node w, is called a child fragment of w. Furthermore, a fragment f1 = τ is called a τ-fragment, and a fragment f2 ≠ τ is called a non-τ-fragment.

Example 9. As an example, in the process tree ×(a,b,τ) the set of all sub-fragments of × is {×(a,b), ×(b,τ), ×(a,τ), a, b, τ}.



Example 10. As an example, for the process tree →(×(a,b,τ), d, e, f) the set of all sub-fragments is {→(×(a,b,τ), d, e, f), ×(a,b,τ), ×(a,b), ×(b,τ), ×(a,τ), →(d,e,f), →(d,e), →(e,f), a, b, τ, d, e, f}.

5.3 Partial Traces and Process Tree Discovery

Given two sub-logs of partial traces, one extracted from before and the other extracted

from after a drift, our method characterizes the drift in three steps. In the first step, two

process trees P and P′ are discovered from the pre-drift and post-drift sub-logs, respec-

tively. In the second step, a minimum-cost sequence of edit operations that transforms

P into P′ is computed. In the third step, our method constructs characterization state-

ments based on the identified edit operations. An overview of our method is shown in

Figure 5.1. In the rest of this section we illustrate Step 1, while in the next two sections

we cover the other two steps.

[Diagram: the pre-drift and post-drift sub-logs feed process tree discovery, which yields P and P′; process tree transformation then yields a sequence of edit operations, from which the drift characterization statements are constructed.]

Figure 5.1: Overview of our method for process drift characterization.

Due to the traces being derived from streams of events, and our application of

window-based extraction, we might observe some traces only partially. That is, the

start and/or the end of the traces might fall outside of the considered window, as illus-

trated in Figure 5.2. Partial traces can be found outside the area of streams as well:

if an event log is extracted from a running process, one in fact applies a window to

the running process, and every case that is still in progress falls partially outside of the

window. Furthermore, cases that were already started before the event log was being

captured also fall outside of the window.

Partially observed traces might influence discovery, which we illustrate using Figure 5.2. If all traces had been observed completely, in the log L = [〈a,b,c,d,e,f,g〉^3], IM would discover the model →(a,b,c,d,e,f,g). However, the event log observed is Lp = [|b,c,d,e,f,g〉, 〈a,b,c,d,e|]. Without knowledge of partial traces, IM discovers the model →(×(τ,a), b, c, d, ×(τ, →(e,f,g))). This process tree does not capture the meaning of the partial traces well, as it allows a, e, f and g to be skipped, even though there has not been evidence of this skipping in the event log. One could simply remove the partial traces. However, as seen in our example, these traces add vital information, as without them the event log would be empty.

(a) Traces in a window.

(b) A corresponding log with partial traces (their partiality is denoted by |): Lp = [|b,c,d,e,f,g〉, 〈a,b,c,d,e|].

Figure 5.2: Example of partial traces. In the window, some traces are observed partially, as they start and/or end outside of the window. In our example, the first and the last trace are only partially observed.

In this section, we first describe how partial traces can be detected. Second, we

describe an existing process tree discovery algorithm (Inductive Miner (IM)). Third,

we introduce a new process tree discovery algorithm that extends IM by adapting its

steps to handle partial traces better.

5.3.1 Detecting Partial Traces

Two pieces of information constitute knowledge of partial traces: one needs to decide

whether one has seen the first event, and whether the last event has been seen. We refer

to a trace of which we have seen the first event as having a reliable start, and to a trace

of which we have seen the last event as having a reliable end. Traces might have both

an unreliable start and an unreliable end, or both might be reliable. In Figure 5.2, the

first trace has a reliable end and the second trace has a reliable start.

To detect whether a trace has a reliable start or end, one could incorporate domain

knowledge. For instance, it could be known that a trace always starts with a “regis-

tration” step and always ends with an “archive” step. Then, each trace that starts with

“registration” has a reliable start and each trace that ends with “archive” has a reliable



end. Other ways to determine reliability include the use of attribute data. For instance,

attribute data attached to the trace could indicate whether a trace has been completed.

In the absence of any domain knowledge, one could identify the most frequently occurring start and end activities, given some threshold, and mark traces as reliable accordingly.
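The frequency-based heuristic could look as follows. This is only a sketch under stated assumptions: traces are tuples of activity names, the threshold is interpreted as a minimum relative frequency among the non-empty traces, and the names `frequent_boundaries` and `mark_reliability` are hypothetical. The thesis does not prescribe a specific threshold semantics.

```python
from collections import Counter


def frequent_boundaries(traces, threshold=0.2):
    """Start and end activities whose relative frequency reaches the threshold."""
    non_empty = [t for t in traces if t]
    if not non_empty:
        return set(), set()
    starts = Counter(t[0] for t in non_empty)
    ends = Counter(t[-1] for t in non_empty)
    total = len(non_empty)
    return ({a for a, n in starts.items() if n / total >= threshold},
            {a for a, n in ends.items() if n / total >= threshold})


def mark_reliability(traces, threshold=0.2):
    """Attach (reliable_start, reliable_end) flags to every trace."""
    starts, ends = frequent_boundaries(traces, threshold)
    return [(t, bool(t) and t[0] in starts, bool(t) and t[-1] in ends)
            for t in traces]
```

In a log where most traces start with "register" and end with "archive", a trace that starts with another activity is flagged as having an unreliable start, and a trace that ends with another activity is flagged as having an unreliable end.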

Many more ways of detecting partial traces might be proposed. The extension of

IM that is described in this section does not depend on the way reliability is decided.

5.3.2 Discovering Process Models from Partial Traces

We introduce an extension of Inductive Miner (IM) to handle partial traces, namely

Inductive Miner - partial traces (IMpt). We chose Inductive Miner (IM) [68] because

this algorithm recursively constructs a process tree from an event log, so it naturally

lends itself to be adapted to work on partial traces, given that we need to produce

process trees as output. Specifically, in the recursion, IM tries to find a cut of the

event log, consisting of a partition of the activities in the event log and a process tree

operator. This cut describes the most important behavior in the event log. For instance,

the cut (→,{a,b},{c}) denotes that the most important behavior in the event log is

‘some behavior with a and b’ sequentially followed by ‘some behavior with c’. If

such a cut can be found, the event log is split accordingly into sub-logs, and on these

sub-logs IM recurses, thereby constructing a process tree in a top-down manner. The

recursion ends in a base case, for instance if only a single activity remains in the event

log. Alternatively, if no base case applies and a cut cannot be found, a fall through is

selected. Several fall throughs have been defined (see [72]), decreasing in precision, in

the worst case leading to a flower model that allows any behavior with the activities in

the event log (e.g. a ⟲-node over τ and ×(a1, . . . ,an)).

As seen in the example of Figure 5.2, partial traces might introduce skips in the

resulting process model. For each of the steps of IM, we describe the effects of partial

traces, and briefly how IMpt addresses them.

IM detects cuts by considering the directly follows graph of the event log. The

nodes of this graph are the activities in the log, and the edges denote the activities

that were directly followed by other activities in the event log. Furthermore, directly

follows graphs contain information about which activities were observed in the log as

the start or end of a trace.



As an example, we consider an event log Ll:

Ll = [〈a,b,c,a,b,c,a,b〉,〈a,b,c,a,b,c|, |b,c,a,b〉, |c,a|]

The directly follows graph of Ll , without considering partial traces, is shown in Fig-

ure 5.3a.

In the directly follows graph, IM identifies characteristic footprints of the process

tree operators ×, →, ∧ and ⟲. For instance, in Figure 5.3a, the cut (∧,{a,b},{c}) can be identified, as a and b are fully connected to c. For more details on cut detection, please refer to [71]. As a final result, IM would discover the process tree ∧(c,⟲(a,τ),⟲(τ,b)).

However, with the available knowledge of partial traces, this tree does not do Ll

justice. To take partial traces into account, IMpt considers only reliable start and end

activities. For Ll , the directly follows graph then becomes as shown in Figure 5.3b.

In this graph, the cut (⟲,{a,b},{c}) can be identified. As a final result, IMpt would discover the process tree ⟲(→(a,b),c), which matches the intuitive idea of the log

better than the model discovered by IM.
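The difference between the two views of the log can be sketched in code. The sketch below assumes each trace is stored as a triple (events, reliable_start, reliable_end) and uses a hypothetical function `directly_follows`; it only computes the directly follows pairs and the start and end activities, and does not perform cut detection.

```python
from collections import Counter


def directly_follows(log, use_reliability):
    """Directly follows pairs plus start and end activities of a log.

    When use_reliability is True, only reliable starts and ends are recorded
    as start and end activities.
    """
    edges, starts, ends = Counter(), set(), set()
    for events, reliable_start, reliable_end in log:
        edges.update(zip(events, events[1:]))
        if events and (reliable_start or not use_reliability):
            starts.add(events[0])
        if events and (reliable_end or not use_reliability):
            ends.add(events[-1])
    return set(edges), starts, ends


# The example log with partial traces, encoded with reliability flags.
LOG = [
    (("a", "b", "c", "a", "b", "c", "a", "b"), True, True),
    (("a", "b", "c", "a", "b", "c"), True, False),
    (("b", "c", "a", "b"), False, True),
    (("c", "a"), False, False),
]
```

For this log, ignoring reliability makes every activity both a start and an end activity, whereas using reliability leaves a as the only start activity and b as the only end activity. The directly follows pairs are the same in both cases.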

[Directly follows graphs over the activities a, b and c.]

(a) Without considering partial traces. (b) Considering partial traces.

Figure 5.3: Two directly follows graphs for L1.

After a cut has been detected, the event log is split into several sub-logs, based on

the cut that was found. During log splitting, information about the reliability of traces

has to be copied to the sub-logs and adjusted.

For × and ∧, no adjustments are necessary. For instance, for ∧, if the trace had an unreliable end, then both sub-traces have unreliable ends. That is, 〈a,b,c| split on (∧,{a,b},{c}) becomes 〈a,b| and 〈c|.

For → and ⟲, if the to-be-split trace has an unreliable end, the last sub-trace will have an unreliable end, but all other sub-traces will have reliable ends. For instance, 〈a,b| split on (→,{a},{b},{c}) becomes 〈a〉 for {a}, 〈b| for {b}, and no trace for {c}.1 For unreliable starts of traces and for ⟲-cuts, a similar strategy is applied.
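The splitting rule for →-cuts described above can be sketched as follows. The `Trace` class and `split_sequence` are hypothetical names, the sketch assumes that the trace fits the cut (its events appear in the order of the partitions), and the symmetric treatment of unreliable starts is our reading of "a similar strategy".

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set, Tuple


@dataclass(frozen=True)
class Trace:
    events: Tuple[str, ...]
    reliable_start: bool = True
    reliable_end: bool = True


def split_sequence(trace: Trace, parts: Sequence[Set[str]]) -> List[Optional[Trace]]:
    """Split a trace on a sequence cut; None means 'no trace for this sub-log'."""
    chunks, pos = [], 0
    for part in parts:
        begin = pos
        while pos < len(trace.events) and trace.events[pos] in part:
            pos += 1
        chunks.append(trace.events[begin:pos])
    filled = [i for i, chunk in enumerate(chunks) if chunk]
    first = filled[0] if filled else len(parts)
    last = filled[-1] if filled else -1
    result: List[Optional[Trace]] = []
    for i, chunk in enumerate(chunks):
        # Parts before an unreliable start or after an unreliable end may have
        # been executed outside the window, so no (empty) trace is produced.
        if (i < first and not trace.reliable_start) or (i > last and not trace.reliable_end):
            result.append(None)
            continue
        result.append(Trace(
            chunk,
            reliable_start=trace.reliable_start if i <= first else True,
            reliable_end=trace.reliable_end if i >= last else True,
        ))
    return result
```

Splitting 〈a,b| on (→,{a},{b},{c}) with this sketch gives 〈a〉, 〈b| and no trace for {c}; with a reliable end it gives an empty trace for {c} instead.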

1 If the trace had a reliable end, then an empty trace would be introduced in the sub-log for {c}.

Most base cases of IM are unaffected by partial traces. However, if the log contains empty traces, then the base case EMPTYTRACES [72, p195] might remove the empty

traces from the log, recurse and return a ×(τ, .) construct. Like other traces, empty

traces might have unreliable starts or ends. If a trace had an unreliable start, then the

actual trace might have events that fell before the window of observation (similar for

unreliable ends). Therefore, IMpt considers empty traces only if these have a reliable

start and a reliable end.

The concepts of IMpt can be straightforwardly extended to handle infrequent be-

havior (analogous to IM - infrequent [69], which filters noise from the directly follows

graph before cut detection and from the log during log splitting), yielding Inductive

Miner - infrequent - partial traces (IMfpt), and to handle incomplete behavior (anal-

ogous to Inductive Miner - incompleteness [70], which optimises to find the best cut

rather than a perfect cut), yielding Inductive Miner - incompleteness - partial traces

(IMcpt).

Introducing a technique to handle event logs with partial traces yields the need for

conformance checking concepts and techniques that are aware of partial traces as well,

such as fitness and precision in the presence of partial traces. However, defining these

concepts is outside the scope of this thesis.

5.4 Process Tree Transformation

We use IM to discover two process trees from the sub-logs before and after a drift. In

this section, we present a method for finding a sequence of edit operations with the

minimum cost, that transforms the pre-drift process tree P to the post-drift process tree

P′. We first define a set of process tree edit operations and the cost of applying them

in Section 5.4.1. A direct approach to solve the process tree transformation problem

is then to try all possible sequences of edit operations that transform P into P′ and

find the cheapest one. However, there is an infinite number of such sequences and it

may be impossible to enumerate all of them. To prune the search space, we define

a notion of mapping between two process trees, where a valid mapping is one that

represents a sequence of edit operations that transforms P into P′. By defining the cost

of a valid mapping based on the cost of edit operations, we reformulate our goal as

to find a minimum-cost valid mapping between P and P′. By means of a mapping we

substantially prune the search space as we only need to try all possible valid mappings

between P and P′ to find a minimum-cost sequence of edit operations that transforms

P into P′. In Section 5.4.2.1 we present an A* algorithm to compute a minimum-cost



valid mapping between two process trees. As a faster alternative, a greedy algorithm

is presented in Section 5.4.2.2 to approximate such a mapping.

5.4.1 Process Tree Edit Operations

A process tree edit operation is an edit operation applied to a process tree at any step

during its transformation to another process tree. In a process tree transformation prob-

lem the goal is to find a minimum-cost sequence of edit operations to transform one

process tree into another process tree (optimal solution). Hence, the granularity of pro-

cess tree changes expressed in the optimal solution depends on the size of process tree

constructs based on which the edit operations are defined as well as the cost of each

edit operation. For example, consider the transformation of process tree P = →(×(a,b), ∧(c,d)), with root P[0] and ∧-node P[4], into process tree P′ = →(e, ×(a,b,f)), with root P′[0], and assume two edit operations, delete/insert a fragment (of any size), where each edit operation has a unit cost. These two edit operations

yield the optimal solution consisting of two changes: delete P〈P[0]〉-fragment and in-

sert P′〈P′[0]〉-fragment, i.e. delete the original process tree and insert the new process

tree. However, such an abstract explanation does not provide any detail on the actual changes that occurred in the process. On the other hand, assume two edit operations with unit costs which only allow the insertion/deletion of individual nodes in a process tree. For P and P′ in the above example, the optimal solution would become: delete activities c and d, and insert activities e and f. This sequence of changes provides a detailed characterization of the changes in the process trees. However, explaining the changes at the level of activities can become verbose and confusing, especially when changes involve large fragments of a process. As such, we need to define edit operations and

their costs such that the optimal solution characterizes process tree changes in enough

detail while avoiding verbosity. For example, instead of reporting on the deletion of

activities c and d individually, we could report on the deletion of the P〈P[4]〉-fragment

containing those activities without loss of information.

A process tree edit operation represents a change in its underlying process. There-

fore, we define process tree edit operations based on the typical change patterns in

business processes [129], introduced in Chapter 2. We classify each change pattern,



except “synchronize two fragments”, as simple (S) or compound (C), where a com-

pound change pattern is one that can be expressed using multiple simple change pat-

terns. Table 5.1 shows the class of each change pattern. Note that the synchronization

of two fragments introduces unstructuredness into a process model and hence cannot

be used as a basis for defining process tree edit operations. This change pattern is

illustrated with an example in Section 5.5. We set our goal as to find a sequence of

simple changes that fully explains the transformation of the pre-drift process tree P to

the post-drift process tree P′, while satisfying three requirements. 1. To improve the

understandability of the changes, a change in the relation between fragments, e.g. from

sequential to parallel, should only involve fragments that exist in both P and P′. 2. The

changes within the sequence should not overlap, i.e. any two changes should cover

distinct differences between the trees. 3. The sequence of changes needs to be detailed

yet concise. That is, if a change involves an individual activity within the process tree

then it should be explained by one change in the sequence referring to that activity.

On the other hand, if a change involves a fragment of multiple activities then it should

be explained by one change in the sequence referring to that fragment as a whole. To

satisfy these requirements, we first define a set of process tree edit operations based on

the simple change patterns in Definitions 16, 17, 18 and 19. The defined edit operations

can be applied to fragments of any size, from individual activities to larger fragments.

We then search for a minimum-cost sequence of edit operations which transforms the

pre-drift process-tree P into the post-drift process tree P′. In this search, we only con-

sider sequences of edit operations in which edit operations that delete (resp., insert)

fragments occur before (resp., after) edit operations that change the relation between

fragments. Furthermore, by limiting each node within P or P′ to be subject to one edit

operation we ensure that the edit operations within a sequence do not overlap. We also

define the cost of edit operations such that a minimum-cost sequence of edit operations

which transforms P into P′ provides a detailed description of changes within P. In a

post-processing step, we then aggregate the edit operations within a sequence to make

it as concise as possible.

Therefore, based on a defined set of edit operations our goal is to find a minimum

cost sequence of edit operations to transform P into P′ and to subsequently make it as

concise as possible. In this chapter, we use six process tree edit operations: substitution of operators (SUB⊕), substitution of activities (SUBac), deletion of fragments (D_f), deletion of ⟲-operator nodes (D⟲), insertion of fragments (I_f), and insertion of ⟲-operator nodes (I⟲). The relation with the change patterns is shown in Table 5.1.



Code  Change pattern                                        Cat.  Class  Process tree edit operations
sre   Insert/delete a fragment between two fragments        I     S      I_f, D_f
pre   Insert/delete a fragment in/from parallel branch      I     S      I_f, D_f
cre   Insert/delete a fragment in/from conditional branch   I     S      I_f, D_f
cp    Duplicate a fragment                                  I     C
rp    Substitute a fragment                                 I     C      SUBac (covers activity substitution)
sw    Swap two fragments                                    I     C
sm    Move a fragment to between two fragments              I     C
cm    Move a fragment into/out of conditional branch        I     C
pm    Move a fragment into/out of parallel branch           I     C
cf    Make fragments mutually exclusive/sequential          R     S      SUB⊕
pl    Make fragments parallel/sequential                    R     S      SUB⊕
cd    Synchronize two fragments                             R     -      -
lp    Make a fragment loopable/non-loopable                 O     S      I⟲, D⟲
cb    Make a fragment skippable/non-skippable               O     S      I_f, D_f
fr    Change branching frequency                            O     -

Table 5.1: Change patterns from [129] and their relation to our process tree edit operations.

Definition 16 (Process tree edit operations). A process tree edit operation γ transforms

a canonical process tree P into another canonical process tree P′, denoted by P --γ--> P′.

Definition 17 (Substitution operations). We use the following process tree edit opera-

tions for substitution:

SUB⊕ Operator substitution: Let ⊕(M1, . . . ,M2) be a fragment, where . . . is any number of process trees (possibly 0), l(⊕) ∈ {→,×,∧}, and M1 and M2 are process trees. Operator substitution replaces the operator of ⊕ with a different operator in {→,×,∧}. This edit operation cannot be applied to an ×(. . . ,τ)-node, where . . . is any number of process trees, as a →- or ∧-node may not have a τ-child.

SUBac Activity substitution: Applies to activity nodes, where it replaces the activity with a different activity.

Example 11. Figure 5.4a shows two examples of substitution operations, where the

operator of the→(b,×(c,d))-node is substituted with ∧, and activity ‘a’ is substituted

with activity ‘e’.



After the application of each edit operation to a process tree, we reduce the result-

ing tree to normal form by repeatedly applying the reduction rules (cf. Definition 14).

We do not report on the changes in a process tree as a result of the application of

reduction rules, as they do not change the language of the tree.

Example 12. For instance, in →(∧(a,b), c) --SUB⊕(∧ ⇒ →)--> →(→(a,b), c) --A→--> →(a,b,c), after the substitution of the operator of the ∧-node 1 with →, we can reduce the resulting process tree by applying the associativity reduction rule A→.

Definition 18 (Deletion operations). We use the following process tree edit operations

for deletion:

D_f Fragment deletion: Deletes a fragment f. If f is a sub-fragment of an operator node ⊕ and, as a result of deleting it, ⊕ is left with one child, ⊕ will be removed by the singularity reduction rule (S). For example, in Figure 5.4b (left to right) the ∧-node P[1] is subsequently deleted by the singularity reduction rule. If a ⟲-node with fewer than two children remains after applying a fragment deletion, the deleted construct is replaced with a τ-child to keep the number of children of the ⟲-node at 2. Such τ-nodes are called auxiliary τ-nodes.

D⟲ ⟲-operator deletion: Let P = ⊕(. . . ,w(M1,τ), . . .), where . . . is any number of process trees (possibly 0), w is a ⟲-node, and M1 is a process tree. Deletion of the ⟲-node w makes ⊕ the parent of M1 and deletes the τ-node.

Example 13. Figure 5.4b (left to right) shows an example of D_f, where Fragment 1 is deleted. In ⟲(→(a,b), c) --D_f(→(a,b))--> ⟲(τ, c), the deleted →-fragment is replaced by the τ-node 1 to keep the number of children of the ⟲-node at 2. Figure 5.4c (left to right) shows an example of D⟲, where the ⟲-node P[2] is deleted.

Definition 19 (Insertion operations). We use the following process tree edit operations

for insertion:



I_f Fragment insertion: Inserts a fragment (as a child of an operator node or an auxiliary operator node).

As discussed above, the deletion of a fragment may cause its parent to be deleted as well by the singularity reduction rule. Thus, the fragment insertion operation needs to insert auxiliary operator nodes again, to ensure that a fragment insertion can offset a fragment deletion. An auxiliary operator node is an extra non-⟲-operator node, inserted (as a child of an operator node ⊕) in a process tree P (and) as the parent of an inserted fragment and a sub-fragment (of ⊕) in P. An auxiliary operator node defines the relation between the inserted fragment and the sub-fragment.

As explained before, the deletion operations insert τ-leaves if a ⟲-node would, as a result of the deletion, not have 2 children. Similarly, when inserting a fragment as the first child of a ⟲(τ,M1)-node, the τ-node is replaced (and similarly for the symmetric second-child case). Such τ-nodes that are inserted (resp., deleted) as a result of deleting (resp., inserting) child fragments of ⟲-nodes are called auxiliary τ-nodes.

I⟲ ⟲-operator insertion: Inserts a ⟲-node n in a process tree P. As a result of this edit operation, one of the non-τ-sub-fragments of P is inserted as the first child (loop body) of n, while the second child (loopback) of n is a τ-node.

Example 14. Figure 5.4b (right to left) shows an example of I_f, where Fragment 1 is inserted as a child of the auxiliary ∧-node P[1], in a concurrent relation with activity ‘a’. In ⟲(τ, c) --I_f(→(a,b))--> ⟲(→(a,b), c), the τ-node 1 is replaced by the inserted →-fragment to keep the number of children of the ⟲-node at 2. Figure 5.4c (right to left) shows an example of I⟲, where the ⟲-node P[2] is inserted as a child of the →-node P[0] (P′[0]), and as the parent of activity ‘b’.

We defined a set of 6 edit operations based on the simple change patterns in Table 5.1, which allow a detailed characterization of changes in a process tree. Included in this set are the two edit operations, insert/delete a fragment, which alone suffice for explaining any type of change in the structure of a process tree. Therefore, the set of 6 edit operations defined above is complete, i.e. it is possible to characterize any change in the structure of a process tree using the edit operations in this set.



[Diagrams of the process trees P and P′ before and after each edit operation.]

(a) SUBac and SUB⊕

(b) D_f (left to right) and I_f (right to left)

(c) D⟲ (left to right) and I⟲ (right to left)

Figure 5.4: Examples of process tree edit operations.



Each operation has an associated cost θ; these costs are shown in Table 5.2. For our cost function θ, it can be shown that the triangle inequality holds, that is, for all process trees w, u and v and all edit operations x, y and z it holds that θ(w --x--> u) ≤ θ(w --y--> v) + θ(v --z--> u).

Edit operation  Cost θ
SUB⊕            1.
SUBac           1.
D_f             If the deleted fragment is a τ-node, then 1. Otherwise, the number of non-τ leaves in the fragment. Auxiliary nodes have no cost.
D⟲              1.
I_f             If the inserted fragment is a τ-node, then 1. Otherwise, the number of non-τ leaves in the fragment. Auxiliary nodes have no cost.
I⟲              1.

Table 5.2: Costs associated with the process tree edit operations.

5.4.1.1 Edit Operation Sequences

Let S = e1, . . . ,en be a sequence of edit operations that transforms a process tree P into a process tree P′. That is, there is a sequence of process trees P0, . . . ,Pn such that P = P0, P′ = Pn, and Pi−1 --ei--> Pi for 1 ≤ i ≤ n. By extending θ, the cost of the sequence S is given by θ(S) = ∑_{i=1}^{n} θ(ei).

The edit distance d(P,P′) from process tree P to process tree P′ is defined to be the

minimum cost of all sequences of edit operations which transform P into P′, i.e.

d(P,P′) = min{θ(S) | S is a sequence of edit operations which transforms P into P′}

As stated in Section 5.4.1, to improve the understandability of changes within a

sequence of edit operations, it should only be allowed to change the relation between

fragments that exist in both P and P′. This is illustrated in the following example.

Example 15. As an example, consider this sequence of edit operations that transforms process tree P into process tree P′:

$\times(\wedge(a, b, c),\, d) \xrightarrow{SUB_{\oplus}(\wedge,\times)} \times(a, b, c, d) \xrightarrow{D_f(c)} \times(a, b, d)$

where the ∧-node has pre-order index 1. First the operator of the ∧-node 1 is substituted with × by a SUB⊕ edit operation, followed by the application of the reduction rule A× to this node. Then, activity ‘c’ is deleted by a D f edit operation.

The first edit operation describes that the relation between activities ‘a’, ‘b’ and ‘c’

has changed from concurrent in P to mutually exclusive in P′. However, as activity ‘c’



is deleted by the subsequent edit operation, and hence does not exist in P′, describing

a change in the relation between this activity and other activities may be misleading.

This problem can be avoided by applying fragment deletion operations before and

symmetrically fragment insertion operations after other operations in a sequence of

edit operations. In the above example, this could be achieved by reversing the order of

the two edit operations.

As such, we define the following condition for a sequence of edit operations.

Definition 20 (Fragment deletion/insertion order). Let S be a sequence of edit oper-

ations that transforms a process tree P into a process tree P′. It should hold that

fragment deletion operations precede and fragment insertion operations follow other

operations in S.
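The ordering condition of Definition 20 can be checked mechanically. The sketch below is a minimal illustration, assuming each edit operation is represented only by a kind string, with "Df" for fragment deletion and "If" for fragment insertion; these string names and the function `satisfies_fragment_order` are hypothetical.

```python
def satisfies_fragment_order(ops):
    """Return True iff all 'Df' operations come first and all 'If'
    operations come last, with every other operation in between."""
    phase = 0  # 0: fragment deletions, 1: other operations, 2: fragment insertions
    for op in ops:
        rank = 0 if op == "Df" else 2 if op == "If" else 1
        if rank < phase:
            return False  # an operation appears after a later-phase operation
        phase = rank
    return True


# The sequence of Example 15 (SUB⊕ followed by D f) violates the condition;
# reversing the two operations satisfies it.
print(satisfies_fragment_order(["SUB_op", "Df"]))
print(satisfies_fragment_order(["Df", "SUB_op"]))
```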

Furthermore, we consider a smaller number of higher-level edit operations to be more understandable than a larger number of lower-level edit operations. Therefore, given a sequence of edit operations that transforms P into P′, we aggregate the edit operations of the sequence as much as possible to obtain a concise sequence of edit operations. For example, in the transformation of P = →(a, ×(b, c), d), where the ×-node has pre-order index 2, into P′ = →(a, d), the minimum-cost sequence of edit operations {D f (b), D f (c)} can be reduced to the concise sequence {D f (P〈P[2]〉)} by aggregating the two activity deletion operations into the deletion of the fragment containing those deleted activities.

In the remainder of this chapter, unless otherwise indicated, a sequence of edit

operations that transforms P into P′ always refers to a concise sequence of edit opera-

tions.

5.4.1.2 Process Tree Mappings

There are infinitely many different sequences of edit operations that transform P into P′. Therefore, it may be impossible to enumerate all sequences and find the shortest one. In this section, we define structures called process tree mappings to prune the search space further and solve this problem more efficiently. We adapt the mapping between ordered trees by Tai [113], a.k.a. Tai mapping, to work on process trees featuring both ordered and unordered nodes. Figure 5.5 illustrates a sample mapping between two process trees P and P′.




Figure 5.5: Sample mapping between process trees P and P′.

A dotted line connecting a node n ∈ P to a node m ∈ P′ indicates that n is to be substituted with m if l(n) ≠ l(m), and otherwise remains unchanged. Each node in P that is not connected by a dotted line is to be deleted from P, whereas each node in P′ not connected by a dotted line is to be inserted in P. To maintain the hierarchical structure of the trees we add two virtual nodes with the same label as the roots of the trees and always map them to each other. Formally, a process tree mapping is defined as follows.

Definition 21 (Process tree mapping). A process tree mapping between two process trees P and P′ is defined by a triple (M, P, P′), where M is any set of pairs of integers (i, j) satisfying the following conditions:

1) A pair (i, j) ∈ M, where i ≠ −1 and j ≠ −1, indicates that P[i] needs to be substituted with P′[j] if l(P[i]) ≠ l(P′[j]); otherwise it remains unchanged. A pair (i, −1) ∈ M indicates that the node P[i] is to be deleted from P, whereas a pair (−1, j) ∈ M indicates that the node P′[j] is to be inserted in P:

$-1 \le i \le |P|-1 \;\wedge\; -1 \le j \le |P'|-1 \;\wedge\; (i \ne -1 \vee j \ne -1)$

2) Every node of P or P′ is in the mapping:

$\forall\, 0 \le i_1 \le |P|-1 \;\; \exists\, -1 \le j_1 \le |P'|-1 : (i_1, j_1) \in M \;\;\wedge\;\; \forall\, 0 \le j_1 \le |P'|-1 \;\; \exists\, -1 \le i_1 \le |P|-1 : (i_1, j_1) \in M$

3) Each node of P or P′ is mapped at most once:

$\forall\, (i_1, j_1), (i_2, j_2) \in M \text{ with } (i_1 \ne -1 \vee i_2 \ne -1) \wedge (j_1 \ne -1 \vee j_2 \ne -1) : \; i_1 = i_2 \Leftrightarrow j_1 = j_2$

4) For every pair (i ≠ −1, j ≠ −1) ∈ M the following conditions should hold:

a) P[i] is a non-⟲ operator node iff P′[j] is a non-⟲ operator node.

b) P[i] is a ⟲-node iff P′[j] is a ⟲-node.

c) P[i] is an activity node iff P′[j] is an activity node.

d) P[i] is a τ-node iff P′[j] is a τ-node.

e) For any two mapped ⟲-nodes w in P and u in P′, the nodes on the loopbody (resp. loopback) path of w can only be mapped to the nodes on the loopbody (resp. loopback) path of u:

Let w = P[r] and u = P′[s] be ancestors of P[i] and P′[j] in P and P′, such that l(w) = l(u) = ⟲ and (r, s) ∈ M, then P[i] is on the loopbody path of w iff P′[j] is on the loopbody path of u.

5) For every two pairs (i1 ≠ −1, j1 ≠ −1), (i2 ≠ −1, j2 ≠ −1) ∈ M the following conditions should hold:

a) P[i1] is an ancestor (resp., descendant) of P[i2] iff P′[j1] is an ancestor (resp., descendant) of P′[j2].

b) Let w be a common ordered ancestor of P[i1] and P[i2] in P, and u be a common ordered ancestor of P′[j1] and P′[j2] in P′,

if Rankk(P[i1], w) < Rankk(P[i2], w) then Rankk(P′[j1], u) ≤ Rankk(P′[j2], u)

if Rankk(P[i1], w) > Rankk(P[i2], w) then Rankk(P′[j1], u) ≥ Rankk(P′[j2], u)

if Rankk(P′[j1], u) < Rankk(P′[j2], u) then Rankk(P[i1], w) ≤ Rankk(P[i2], w)

if Rankk(P′[j1], u) > Rankk(P′[j2], u) then Rankk(P[i1], w) ≥ Rankk(P[i2], w)

Condition 1 ensures that a node in P or P′ is either mapped to a node in the other tree or to −1. Conditions 2 and 3 ensure that every node in P or P′ is mapped exactly once. Conditions 4a-4d ensure that M complies with the constraints of the substitution edit operations (cf. 17). Condition 4e ensures that for any two mapped ⟲-nodes w in P and u in P′, respectively, the nodes on the loopbody (resp. loopback) path of w can only be mapped to the nodes on the loopbody (resp. loopback) path of u. For the sample mapping in Figure 5.5, M = {(0, 0), (1, 1), (2, 3), (3, −1), (4, −1), (5, 4), (6, 6), (7, 5), (−1, 2), (−1, 7)}.

Condition 5a in conjunction with the previous conditions is sufficient to ensure that, after each touched node P[i] is changed to its paired node P′[j] (if l(P[i]) ≠ l(P′[j])), untouched nodes of P are deleted and untouched nodes of P′ are inserted in P, P and P′ are equivalent, provided that the two process trees only contain unordered operator nodes. However, as mentioned before, a process tree may contain both ordered and unordered operator nodes. Hence, we add condition 5b to preserve the order among siblings in both P and P′. For instance, for the two process trees P and P′ in the sample mapping in Figure 5.5, the two nodes P[1] and P′[1] are the only ordered nodes. Among the descendants of P[1], i.e. {P[2], P[3]}, and the descendants of P′[1], i.e. {P′[2], P′[3]}, P[2] is mapped to P′[3] in the mapping M, i.e. (2, 3) ∈ M. Consequently, P[3] cannot be mapped to P′[2] in M, i.e. (3, 2) ∉ M, as otherwise condition 5b would be violated: Rankk(P[2], P[1]) (= 1) < Rankk(P[3], P[1]) (= 2), but Rankk(P′[3], P′[1]) (= 2) ≰ Rankk(P′[2], P′[1]) (= 1).
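Conditions 1 to 3 of Definition 21 are purely combinatorial and can be checked on the sample mapping quoted above. The following Python sketch is an illustration only: the function name `covers_each_node_once` is hypothetical, and the tree sizes (eight nodes each, including the virtual roots) are an assumption inferred from the indices that occur in M. Conditions 4 and 5 need the tree structure and are not covered.

```python
def covers_each_node_once(M, size_p, size_q):
    """Check conditions 1-3 of a process tree mapping for a set M of index pairs."""
    for i, j in M:
        in_range = -1 <= i <= size_p - 1 and -1 <= j <= size_q - 1
        if not in_range or (i == -1 and j == -1):
            return False  # condition 1 violated
    left = sorted(i for i, _ in M if i != -1)
    right = sorted(j for _, j in M if j != -1)
    # Conditions 2 and 3: every node index appears exactly once on its side.
    return left == list(range(size_p)) and right == list(range(size_q))


M = {(0, 0), (1, 1), (2, 3), (3, -1), (4, -1), (5, 4),
     (6, 6), (7, 5), (-1, 2), (-1, 7)}
print(covers_each_node_once(M, 8, 8))
```

Adding the pair (3, 2) to this M would make indices 3 and 2 appear twice, so the check would fail before the ordering condition 5b is even considered.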

To fully comply with the process tree edit operations and sequences thereof defined

in Section 5.4.1, a mapping needs to satisfy further conditions. We call a mapping

that satisfies those conditions a valid process tree mapping (valid mapping). Before

presenting the formal definition of a valid mapping we define the following notions. In

the remainder of this chapter, unless otherwise indicated, a mapping always refers to a

valid mapping.

Definition 22 (Deleted fragments in a mapping). Let M be a mapping between two process trees P and P′, and let f be a fragment in P. The fragment f is deleted through M if $\forall P[k] \in f : (k, -1) \in M$.

Let S = { f1, . . . , fn} be the set of all deleted fragments in M. A fragment fi ∈ S is a maximal deleted fragment if there is no fj (≠ fi) ∈ S such that fi is a sub-fragment of fj.

Definition 23 (Inserted fragments in a mapping). Let M be a mapping between two process trees P and P′, and let f be a fragment in P′. The fragment f is inserted through M if $\forall P'[k] \in f : (-1, k) \in M$.

Let S = { f1, . . . , fn} be the set of all inserted fragments in M. A fragment fi ∈ S is a maximal inserted fragment if there is no fj (≠ fi) ∈ S such that fi is a sub-fragment of fj.

Example 16. Figure 5.6 shows examples of deleted and inserted fragments in a map-

ping between process trees P and P′. The set of all deleted fragments in this mapping

is S = {b, c, ∧(b,c)}, among which Fragment 1 = ∧(b,c) is a maximal deleted frag-

ment. The set of all inserted fragments in this mapping is S′ = {e, f , →(e, f )}, among

which Fragment 2 =→(e, f ) is a maximal inserted fragment.
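A possible way to compute maximal deleted fragments is sketched below in Python. This is a simplified illustration that assumes a fragment is a complete subtree, which may be narrower than the notion of fragment used in this chapter. The tree →(a, ∧(b, c), d) with pre-order indices 0 to 5, the `children` dictionary and the function name are all hypothetical and are not taken from Figure 5.6.

```python
def maximal_deleted_fragments(children, root, deleted):
    """Return the roots of the maximal fully-deleted subtrees.

    children: dict mapping a node index to the list of its child indices
    deleted:  set of node indices i with (i, -1) in the mapping
    """
    result = []

    def fully_deleted(n):
        return n in deleted and all(fully_deleted(c) for c in children.get(n, []))

    def visit(n):
        if fully_deleted(n):
            result.append(n)  # do not descend: sub-fragments are not maximal
        else:
            for c in children.get(n, []):
                visit(c)

    visit(root)
    return result


# Hypothetical tree: 0 = →, 1 = a, 2 = ∧, 3 = b, 4 = c, 5 = d
children = {0: [1, 2, 5], 2: [3, 4]}
print(maximal_deleted_fragments(children, 0, {2, 3, 4}))
```

Maximal inserted fragments can be obtained symmetrically by running the same procedure on P′ with the set of inserted indices.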

Figure 5.6: Examples of deleted and inserted fragments in a mapping between process trees P and P′. Fragment 1 is a maximal deleted fragment, whereas Fragment 2 is a maximal inserted fragment.

Definition 24 (Auxiliary operator nodes in a mapping). Let M be a mapping between two process trees P and P′. A non-⟲ operator node ⊕ = P[i] (resp., ⊕ = P′[j]) is an auxiliary operator in M if:

i) (i,−1) ∈M (resp., (−1, j) ∈M)

ii) Exactly one child fragment of ⊕ is not a deleted (resp., inserted) fragment in M.

An auxiliary operator node v = P[i] in a mapping corresponds to a node deleted

by the singularity reduction rule after a fragment deletion edit operation (cf. Defini-

tion 18), whereas an auxiliary operator node v = P′[i] in a mapping corresponds to an

auxiliary operator node inserted along with a fragment insertion (cf. Definition 19).

Definition 25 (Auxiliary τ-nodes in a mapping). Let M be a mapping between two process trees P and P′. Also, let v ∈ P (resp., v ∈ P′) be a τ-node parented by a ⟲-node u ∈ P (resp., u ∈ P′). v is an auxiliary τ-node if v is deleted (resp., inserted) in M while u is not deleted (resp., inserted).

An auxiliary τ-node in a mapping corresponds to an auxiliary τ-node inserted (resp., deleted) as a result of deleting (resp., inserting) a child fragment of a ⟲-node by an edit operation D f (resp., I f ) to keep the number of children of the ⟲-node at 2 (cf. Definitions 18 and 19).

Example 17. As an example, in the mapping between process trees P and P′ in Fig-

ure 5.7, the ∧-node 2 in P′ is an auxiliary operator node, inserted along with the

insertion of activity ‘e’, and the τ-node 5 in P is an auxiliary τ-node, deleted as a

result of inserting activity ‘d’.

Figure 5.7: Sample auxiliary operator node, i.e. the ∧-node 2 in P′, and sample auxiliary τ-node, i.e. the τ-node 5 in P, in a mapping between process trees P and P′.

Definition 26 (Trivial operator nodes). Let M be a mapping between two process trees P and P′. A non-⟲ operator node v = P[i] (resp., v = P′[j]) is a trivial operator node in M if v is deleted (resp., inserted), at least two child fragments of v contain some nodes that are not deleted (resp., inserted), and at least one of the following conditions holds for v:

i) There exists an inserted (resp., deleted) operator node v′ in P′ (resp., P) such that

l(v′) = l(v), and that all undeleted (resp., uninserted) leaves under v are mapped

to leaves under v′ and at least one uninserted (resp., undeleted) leaf under v′ is

not mapped to a node under v. Then, we refer to v as an indirectly-trivial operator

node. Let v′ be the deepest node that satisfies this condition, then we refer to v′ as

indirect parent of v.

ii) Let u be the deepest ancestor of v that satisfies one of the following conditions:

• u is mapped to a node u′ in P′ (resp., P).

• u is an indirectly-trivial operator node and a node u′ in P′ (resp., P) is its indirect parent.

• u is an indirect parent for an indirectly-trivial operator node u′ in P′ (resp., P).

such that all undeleted (resp., uninserted) leaves under v are mapped to leaves under u′. Then, one of the following should hold for v and u:

a) l(v) = l(u).

b) l(v) = l(u′).

A trivial deleted operator node corresponds to an operator node deleted by the

associativity reduction rules after the application of an edit operation. Inversely, a

trivial inserted operator node corresponds to an operator node inserted as the root of a sub-fragment of a non-⟲ operator node as a result of applying an edit operation.

Example 18. Figure 5.8 shows examples of trivial operator nodes in mappings. In Figure 5.8a, the ×-node 1 in P is an indirectly-trivial operator node and the ×-node 1 in P′ is its indirect parent (condition i in Definition 26). After substituting the operator of the →-node in the fragment →(×(a, b), c) with ×, resulting in the insertion of the ×-node 1 ∈ P′, the ×-node 1 ∈ P, i.e. the ×(a, b)-node, is deleted by the associativity reduction rule A×. In Figure 5.8b, the →-node 2 is a trivial operator node, as after deleting activity ‘c’, and subsequently the ×-node 1 by a singularity reduction rule, the →-node 2 is deleted by the associativity reduction rule A→ (condition iia in Definition 26). In Figure 5.8c, the ∧-node 1 is a trivial operator node, as after changing the operator of the →-node 0 to ∧, the ∧-node 1 is deleted by the associativity reduction rule A∧ (condition iib in Definition 26).

[Figure 5.8 panels: (a) ×-node 1 in P; (b) →-node 2; (c) ∧-node 1.]

Figure 5.8: Examples of trivial operator nodes in mappings.

Definition 27 (Lowest mapped ancestors). Let M be a mapping between two process

trees P and P′. The lowest mapped ancestors (LMAs) of two nodes v ∈ P and v′ ∈ P′

in M, denoted by LMAsM(v, v′), is a pair (u, u′) of nodes, where u = P[r] and u′ =

P′[s] are ancestors of v and v′, respectively, such that (r, s) ∈M and there is no pair

(m, n) in M, where P[m] is an ancestor of v and P′[n] is an ancestor of v′, such that

dep(P[m])> dep(P[r]) ∧ dep(P′[n])> dep(P′[s]).

Definition 28 (Valid process tree mapping). Given two process trees P and P′, a valid process tree mapping from P to P′ is a mapping M satisfying the following conditions:

1) For every subtree R = P[i](Q1, Q2) in P (resp., R = P′[j](Q1, Q2) in P′), where P[i] (resp., P′[j]) is a ⟲-node, and Q1 and Q2 are process trees, if (i, −1) ∈ M (resp., (−1, j) ∈ M) then Q2 is a deleted fragment (resp., inserted fragment) in M.

2) Let ⊕ be an operator node in P (resp., P′) that is mapped to an operator node in the other tree. If l(⊕) ∈ {→, ∧} (resp., l(⊕) = ⟲) then at least two (resp., one) child fragments of ⊕ should contain some activity nodes that are not deleted (resp., inserted) in M. If l(⊕) = × then at least two child fragments of ⊕ should not be deleted (resp., inserted) fragments, and one of which should contain some activity nodes that are not deleted (resp., inserted).

3) For every pair (i, j) in M such that t = P[i] and t′ = P′[j] are two τ-nodes, let q = P[r] and q′ = P′[s] be the parents of t and t′, respectively. One of the following conditions should hold:

a) There exists a pair (r, s) in M.

b) Let v ∈ P and v′ ∈ P′ be the deepest ancestors of t and t′, respectively, that satisfy one of the following conditions:

• (v, v′) = LMAsM(t, t′) (cf. Definition 27).

• v is an indirectly-trivial operator node and v′ is its indirect parent (cf. Definition 26).

• v′ is an indirectly-trivial operator node and v is its indirect parent.

Let u be an ancestor of t (resp., t′) such that u is on the shortest path from v (resp., v′) to q (resp., q′) and l(u) ∈ {→, ∧}. One of the following conditions should hold for u:

i) u is an auxiliary operator node in M (cf. Definition 24).

ii) The child fragment of u containing t (resp., t′) should contain at least one activity node that is not deleted (resp., inserted) in M.

The above conditions are defined to ensure that a mapping satisfies all the conditions of process tree edit operations and sequences thereof, defined in Section 5.4.1.

As defined for the edit operations I⟲ and D⟲, the second child fragment of a ⟲-node which is to be deleted (resp., inserted) is a τ-node (cf. Definitions 18 and 19). That is, to delete a ⟲-node we first need to delete its second child fragment (if ≠ τ) by a D f operation. And to insert a ⟲-node with a non-τ second child fragment f, we first need to insert the ⟲-node by an I⟲ operation and subsequently insert f as its second child. This is ensured in M by condition 1, which requires the deletion (resp., insertion) of the second child fragment of a deleted (resp., an inserted) ⟲-node in M.

Example 19. As an example, in the mapping between the two process trees P and P′ in Figure 5.9, since the ⟲-node P[1] is deleted, its second child fragment, i.e. activity b, is also deleted.

Figure 5.9: Sample mapping that satisfies condition 1.

As defined in Definition 20, fragment deletions precede and fragment insertions follow all other operations in a sequence of edit operations. Also, as we explained in Section 16, after the application of each edit operation to a process tree we reduce the tree to normal form by applying reduction rules. As a result of the latter some operator nodes may be deleted. Thus, to ensure that an operator node ⊕ in P (resp., in P′) that is mapped to an operator node in P′ (resp., P) cannot be deleted by a reduction rule after (resp., before) the application of all fragment deletions (resp., insertions), we require ⊕ to satisfy condition 2.

Example 20. As an example, the invalid mapping between the two process trees P and P′ in Figure 5.10 does not satisfy condition 2. This is because the ∧-operator node P[1] is mapped to a node in P′, while it does not have at least two child fragments that contain some undeleted activity nodes. As a result, after deleting activity ‘a’, the ∧-node P[1] will also be deleted by the singularity reduction rule S and hence cannot be mapped to a node in P′.


Figure 5.10: Sample invalid mapping that does not satisfy condition 2.

As defined in Definition 14, a τ-node may be deleted by one of the τ-reduction rules, T→ or T∧. As such, we defined condition 3 to ensure that a τ-node to which one of the τ-reduction rules can be applied is always deleted in a mapping. That is, we only allow a τ-node t ∈ P to be mapped to a τ-node t′ ∈ P′ in M if one of the two conditions, 3a or 3b, holds. Condition 3a requires the parents q and q′ of t and t′, respectively, to be mapped in M. For condition 3b we first define two nodes v and v′ as the deepest ancestors of t and t′, respectively, that are either mapped in M or one of them is an indirectly-trivial operator node and the other one is its indirect parent. To avoid the deletion of t or t′ by one of T→ or T∧, we then require each →- or ∧-node u on the shortest path from q (resp., q′) to v (resp., v′) to satisfy one of the two conditions, 3bi or 3bii.

Edit operation | Representation in a mapping (M, P, P′)
SUB⊕ | A non-⟲ operator node n ∈ P mapped to a non-⟲ operator node m ∈ P′ such that l(n) ≠ l(m), or a non-auxiliary nontrivial deleted or inserted non-⟲ operator node.
SUBac | An activity node n ∈ P mapped to an activity node m ∈ P′ such that l(n) ≠ l(m).
D f | A maximal deleted fragment (≠ trivial τ-node).
D⟲ | A deleted ⟲(M1, M2)-node n, where M1 is not a deleted fragment (i.e. n ∉ a maximal deleted fragment).
I f | A maximal inserted fragment (≠ trivial τ-node).
I⟲ | An inserted ⟲(M1, M2)-node n, where M1 is not an inserted fragment (i.e. n ∉ a maximal inserted fragment).

Table 5.3: Process tree edit operations (cf. Section 16) and their representations in a mapping.

Example 21. As an example, consider the mapping between the two process trees

P and P′ in Figure 5.11, where the τ-node P[5] is mapped to the τ-node P′[1], and

LMAsM(P[5], P′[1]) = (P[0], P′[0]). In this mapping, the ∧-node P[2] satisfies condition 3bi, and the →-node P[1] satisfies condition 3bii.


Figure 5.11: Sample mapping that satisfies condition 3.

Table 5.3 illustrates how each edit operation is represented in a mapping.

Definition 29 (Process tree mapping cost). Let M be a mapping between two process trees P and P′. We define the cost of M as follows:

cost(M) = total cost of all node substitutions
  + total cost of all maximal deleted and inserted fragments
  + total cost of all deleted and inserted ⟲-nodes ∉ maximal deleted or inserted fragments
  + total cost of all deleted and inserted non-⟲ operator nodes ∉ maximal deleted or inserted fragments.

We compute each of the first three costs in the same way as we did for the edit operations, while the cost of deleting or inserting a non-⟲ operator node is 1. Auxiliary or trivial nodes have no cost.

Thus, the cost of M is just the cost of the sequence of edit operations consisting of: a SUB⊕ for each operator node substitution or non-auxiliary nontrivial deleted or inserted non-⟲ operator node in M that is not in a maximal deleted or inserted fragment, a SUBac for each activity node substitution in M, a D f (resp., I f ) for each maximal deleted (resp., inserted) fragment (excluding trivial τ-nodes) in M, and a D⟲ (resp., I⟲) for each deleted (resp., inserted) ⟲-operator in M that is not in a maximal deleted (resp., inserted) fragment. It can be shown that d(P, P′) can be determined by a minimum-cost mapping from P to P′. This proof is similar to the proof of Theorem 3.1 in [113] and is omitted. Since d(P, P′) = min{θ(S) | S is a sequence of edit operations which transforms P into P′}, we obtain:

d(P, P′) = min{cost(M) | M is a mapping from P to P′}

Hence, the search for a minimum-cost sequence of edit operations has been reduced to a search for a minimum-cost mapping.

5.4.2 Finding Process Tree Mappings & Lower Bounding Function

In the next two sub-sections we present two algorithms for finding a minimum-cost

mapping between two process trees. Here we define a mapping search tree which is a

data structure to capture the search space of the mapping, based on two different search

strategies: exhaustive (A*) and greedy.

Definition 30 (Mapping search tree). A mapping search tree between two process trees P and P′, denoted by MST(P, P′), is a tree such that the label of the root is 0, the depth is |P|−1, and every internal node has a maximum of |P′| children, each labeled by one of −1, 1, 2, . . . , |P′|−1.



We say that a node v in MST(P, P′) is valid if the following set Mv of pairs of integers forms a mapping between P and P′:

$M_v = \{(dep(w),\, l(w)) \mid w \in Down_{MST(P,P')}(v)\} \;\cup\; \{(r, -1) \mid r \in \{dep(v)+1, \dots, |P|-1\}\} \;\cup\; \{(-1, s) \mid s \in \{1, \dots, |P'|-1\} \setminus \{l(w) \mid w \in Down_{MST(P,P')}(v)\}\}$

In this chapter, we refer to a mapping search tree as one consisting of just valid nodes. Hence, Mv is a mapping in which each node w on DownMST(P, P′)(v) denotes the pair (i, j) such that i = dep(w) and j = l(w). Every node m in P, with r = preP(m), for which r > dep(v) is deleted in Mv ((r, −1) ∈ Mv), and every node n in P′, with s = preP′(n), for which s ∉ {l(w) | w ∈ DownMST(P, P′)(v)} is inserted in Mv ((−1, s) ∈ Mv).

Example 22. Consider process trees P and P′ in Figure 5.12a. Figure 5.12b illustrates

the mapping search tree MST (P, P′). For example, the path 〈0, 2,−1〉 in MST (P, P′)

represents the mapping {(0, 0), (1, 2), (2,−1), (−1, 1)} between P and P′. In this

path, the node labeled with “2” does not have a child with the label “1”, because the

set of pairs {(0, 0), (1, 2)}, in compliance with the condition 5b of mapping, cannot

form a mapping with the pair (2, 1).
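The construction of Mv from a path in the mapping search tree can be reproduced with a few lines of Python. This is a minimal sketch in which a path is given simply as the list of node labels from the root down to v; the function name `mapping_from_path` and this list encoding are assumptions made for illustration. It does not test whether the resulting set is a valid mapping.

```python
def mapping_from_path(path_labels, size_p, size_q):
    """Build M_v from the labels on the root-to-v path of MST(P, P')."""
    # The node at depth d on the path pairs P[d] with its label (or -1).
    pairs = {(depth, label) for depth, label in enumerate(path_labels)}
    # Nodes of P whose pre-order index exceeds dep(v) are deleted.
    pairs |= {(r, -1) for r in range(len(path_labels), size_p)}
    # Nodes of P' whose index is not used as a label on the path are inserted.
    used = set(path_labels)
    pairs |= {(-1, s) for s in range(1, size_q) if s not in used}
    return pairs


# Path <0, 2, -1> of Example 22, with |P| = |P'| = 3.
print(sorted(mapping_from_path([0, 2, -1], 3, 3)))
```

For the path of Example 22 this yields the four pairs (0, 0), (1, 2), (2, −1) and (−1, 1), in agreement with the mapping stated there.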


Figure 5.12: Process trees P and P′ (a) and their mapping search tree (b) in Example 22.

In a mapping M between P and P′, each activity node in P can only be mapped to −1 or to an activity node in P′, and vice versa (conditions 3c and 4 in Definition 21). Moreover, the cost of substituting, inserting or deleting an activity node in M is always 1, and the cost of mapping two activity nodes with the same label is 0. Therefore, the cost of M is at least the minimum cost of mapping the two sets of activity nodes under P and P′. For example, assume C1 = {a, b, c, d} and C2 = {a, c, e, f, g} are the two sets of activity nodes under P and P′, respectively. The cost of M is at least the minimum cost of mapping C1 and C2, i.e. 3, obtained from the activity mapping set S = {(a, a), (b, e), (c, c), (d, f), (−1, g)} between C1 and C2. Given two sets C1 and C2 of activity nodes, Algorithm 2 computes the minimum cost of mapping C1 and C2.

Algorithm 2 Compute the minimum cost of mapping two sets of activity nodes
1: procedure MINMAPPINGCOST(C1, C2)
2:   cost ← 0
3:   for each c ∈ C1 do
4:     for each d ∈ C2 do
5:       if l(c) = l(d) then
6:         C1 ← C1 − {c}
7:         C2 ← C2 − {d}
8:         break
9:       end if
10:    end for
11:  end for
12:  cost ← min(|C1|, |C2|) + ||C1| − |C2||
13:  return cost
14: end procedure

For every node in C1, Algorithm 2 iterates over all nodes in C2 to find a node with the same label. If such a node is found it removes the two nodes from their respective sets (lines 3-8). After processing every node in C1, there is no pair of nodes from C1 and C2 with the same label. The remaining nodes in C1 and C2 are then mapped injectively to each other, constituting min(|C1|, |C2|) mappings. Finally, the remaining ||C1| − |C2|| nodes in C1 or C2 are mapped to −1 (line 12).
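The computation of Algorithm 2 can be sketched in Python as follows. This is not a line-by-line transcription: instead of removing elements while iterating, it uses `collections.Counter` to cancel equally labeled activities, which is assumed here to be an equivalent formulation. Activities are represented by their labels only, and the function name `min_mapping_cost` is chosen for illustration.

```python
from collections import Counter


def min_mapping_cost(c1, c2):
    """Minimum cost of mapping two collections of activity labels."""
    left, right = Counter(c1), Counter(c2)
    common = left & right  # label-wise minimum: pairs matched at cost 0
    n1 = sum((left - common).values())   # unmatched activities under P
    n2 = sum((right - common).values())  # unmatched activities under P'
    # min(n1, n2) substitutions plus |n1 - n2| deletions or insertions
    return min(n1, n2) + abs(n1 - n2)


# The example from the text: C1 = {a, b, c, d}, C2 = {a, c, e, f, g}.
print(min_mapping_cost("abcd", "acefg"))
```

Note that min(n1, n2) + |n1 − n2| equals max(n1, n2), so the bound is simply the size of the larger set of unmatched activities. For the example above, two activities remain in C1 and three in C2, which gives the cost 3 stated in the text.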

5.4.2.1 Exhaustive search

In this section we introduce an A* algorithm that finds the minimal cost mapping be-

tween process trees P and P′, by finding the cheapest path from the root to a leaf

in the mapping search tree MST(P, P′). However, instead of constructing the whole

MST(P, P′) our A* algorithm traverses P in a pre-order manner and only constructs

nodes in MST(P, P′) that are potentially a part of the cheapest path to the leaves.

It is necessary to define two functions g∗(v) and h∗(v) for any instantiation of the A* algorithm. For a node v ∈ MST(P, P′), g∗(v) determines the mapping cost up to v, whereas h∗(v) estimates the cost of mapping the nodes that have not yet been mapped up to v. Let v be a node in MST(P, P′) such that dep(v) = preP(w) and l(v) = preP′(u) or l(v) = −1. Let Y1 and Y2 be the sets of activity nodes in P and P′, respectively. Also, let C1 = C(w) and C2 = C(u) be the sets of activity nodes under w and u (if l(v) ≠ −1), respectively. Furthermore, let Pm and P′m be the sets of nodes in P and P′, respectively, that are already mapped (either to a node in the other process tree or to −1). Then, h∗(v) is defined as follows.

$$h^*(v) = \begin{cases} \mathit{minMappingCost}(C_1, C_2) + \mathit{minMappingCost}(Y_1 \setminus C_1 \setminus P_m,\; Y_2 \setminus C_2 \setminus P'_m) & \text{if } l(v) \ne -1 \\ \mathit{minMappingCost}(Y_1 \setminus P_m,\; Y_2 \setminus P'_m) & \text{if } l(v) = -1 \end{cases}$$

To compute g∗(v) we need to compute and sum the cost of every node substitution,

node deletion, and node insertion induced by the mappings on the path from the root

to v in MST(P, P′). However, as specified in Definition 29 for computing the cost of

a mapping M between two process trees P and P′, the cost of deletion or insertion of

an auxiliary or trivial operator node in M is 0. Whether a deleted or inserted operator

node is considered auxiliary or trivial in a mapping depends on how its descendants

are mapped (cf. Section 5.4.1.2). For example, a deleted operator node o is auxiliary

in M if at most one of its child fragments is not deleted in M. Therefore, to determine

if o is an auxiliary operator node we need to know how its descendants are mapped.

However, as we construct a mapping search tree by traversing P in a pre-order man-

ner, the descendants of o are still unmapped when computing the mapping cost at v.

Thus, to enable computing the cost of a deleted or inserted operator node we assume a

mapping of −1 for each of its descendant nodes that is not already mapped. However,

this assumption is only to assist computing the cost of mapped operator nodes up to

each node on a mapping search tree, and does not imply the deletion or the insertion of

those unmapped descendant nodes. Furthermore, this assumption does not result in the

overestimation of the mapping cost as it actually leads to the temporary consideration

of a deleted or inserted operator node as an auxiliary or trivial operator node, with a

cost of 0.

The A* algorithm computes the value of f ∗(v) = g∗(v) + h∗(v) for each node v in

MST(P, P′), and at each step searches for the node with the lowest f ∗. The A* algo-

rithm for finding the lowest cost mapping between P and P′ is given as Algorithm 3.

The A* Algorithm starts with constructing the root node v of MST(P, P′) from a

mapping between two fake root nodes added to P and P′ (line 4). These fake nodes have

the same label, randomly selected from {→,×,∧}\ ({l(root(P)}∪{l(root(P′))}). A

list L holds child-free nodes in MST(P, P′). The algorithm proceeds with adding nodes

to MST(P, P′) and selecting the node in L with the lowest cost at each step (lines 5-14).

Each time a node is selected its subsequent node in the pre-order traverse of P is first



Algorithm 3 A*

 1: procedure ASTAR(P, P′)
 2:   /* MST(P, P′) is a mapping search tree between P and P′ */
 3:   /* L is a list of triples */
 4:   add the node v labeled by 0 as the root to MST(P, P′)
 5:   while dep(v) ≠ |P|−1 do
 6:     i ← dep(v) + 1
 7:     add the node u such that l(u) = −1 to MST(P, P′) as the child of v
 8:     L ← L ∪ {(u, g∗(u), h∗(u))}
 9:     for each w ∈ P′ do
10:       if (Mv − {(i, −1)}) ∪ {(i, preP′(w))} forms a mapping between P and P′ then
11:         add the node u such that l(u) = preP′(w) to MST(P, P′) as the child of v
12:         L ← L ∪ {(u, g∗(u), h∗(u))}
13:       end if
14:     end for
15:     select (v, g∗(v), h∗(v)) ∈ L such that f∗(v) is minimum
16:     L ← L − {(v, g∗(v), h∗(v))}
17:   end while
18:   return Mv
19: end procedure

deleted (mapped to −1) (lines 7-8), and then mapped to any node of P′ that does not
lead to the violation of mapping conditions (lines 9-14). At each iteration of the while
loop the node v in L with the lowest f∗ is selected (line 15). The while loop halts if
v is a leaf of MST(P, P′). At the end, the algorithm outputs the mapping Mv, i.e. the
minimum-cost mapping between P and P′ (line 18).
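The control flow of Algorithm 3 can be illustrated with a minimal Python sketch. It is not the implementation used in this thesis: the cost function g∗, the heuristic h∗ and the mapping conditions are defined elsewhere in this chapter, so here they are passed in as hypothetical callables (`g`, `h`, `is_valid`). The toy instance at the end uses flat trees, unit costs and a zero heuristic purely for illustration. A search node is represented by its path of labels from the root of MST(P, P′).

```python
import heapq
from itertools import count
from typing import Callable, List, Tuple

# A search node is identified by its path of labels from the root of the MST:
# path[i] is the pre-order index in P' that node i of P is mapped to, or -1.
Path = Tuple[int, ...]


def astar_mapping(size_p: int, size_q: int,
                  g: Callable[[Path], float],
                  h: Callable[[Path], float],
                  is_valid: Callable[[Path], bool]) -> Path:
    """Best-first search over the mapping search tree (cf. Algorithm 3)."""
    tie = count()                      # tie-breaker: the heap never compares paths
    frontier: List[Tuple[float, int, Path]] = []   # plays the role of the list L
    v: Path = (0,)                     # the two fake roots are mapped to each other
    while len(v) != size_p:            # i.e. dep(v) != |P| - 1
        children = [v + (-1,)]         # first, delete the next node of P
        children += [v + (w,) for w in range(size_q) if is_valid(v + (w,))]
        for u in children:
            heapq.heappush(frontier, (g(u) + h(u), next(tie), u))
        _, _, v = heapq.heappop(frontier)          # lowest f* = g* + h*
    return v


# Toy instance (hypothetical): flat trees, unit costs, zero heuristic.
P_LABELS = ["×", "a", "b", "c"]
Q_LABELS = ["×", "a", "b"]


def toy_g(path: Path) -> float:
    # One unit per deleted or relabelled node of P ...
    cost = sum(1 for i, w in enumerate(path)
               if w == -1 or P_LABELS[i] != Q_LABELS[w])
    # ... plus, once the path is complete, one unit per unmapped node of P'.
    if len(path) == len(P_LABELS):
        cost += sum(1 for w in range(len(Q_LABELS)) if w not in path)
    return cost


def toy_h(path: Path) -> float:
    return 0.0                         # trivially admissible heuristic


def toy_is_valid(path: Path) -> bool:
    return path[-1] not in path[:-1]   # each node of P' is used at most once


best = astar_mapping(len(P_LABELS), len(Q_LABELS), toy_g, toy_h, toy_is_valid)
print(best, toy_g(best))
```

The toy validity check only enforces that a node of P′ is used at most once; the actual mapping conditions for process trees are stricter and would replace `toy_is_valid`.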

Example 23. Consider process trees P and P′ in Figure 5.13. Two fake ×-nodes are

added as roots to P and P′, to be mapped at the first step of the A* algorithm. We

illustrate the run of the A* algorithm to construct the MST(P, P′) in Figure 5.14. Here,

the index of a node in MST(P, P′) represents the value g∗(v) + h∗(v) for that node. At

each step, the node with the minimum cost in L, highlighted with gray, is selected and

deleted from L. The children of this node are then added in the following step both to

the MST(P, P′) and to L. Also, the mapping corresponding to the path from the root

to this node is illustrated by dotted lines on the two trees on the left. Each number on

the left side of the MST(P, P′) indicates the pre-order index of the node in P that is

mapped to nodes in P′ via the nodes in that depth of MST(P, P′).

At step (a) of the running example, the two added fake roots are mapped to each
other, constructing the root of MST(P, P′). At step (b), the two children of the root
of MST(P, P′) are added by mapping the →-node in P, with the pre-order index of 1,
to −1 and to the →-node in P′, with the pre-order index of 1. As a result of deleting
the →-node in P (mapping to −1) the →-node in P′ is also deleted, since there is no
other operator node in P that can potentially be mapped to it. Here, the value of g∗




Figure 5.13: Process trees P and P′ in Example 23.

for mapping the →-node in P to −1 is 0 because, as explained before, the algorithm
initially considers a deleted/inserted operator node as an auxiliary or trivial operator
node if it does not know how its descendants are mapped. As these two added child
nodes have the same cost, one of them is randomly selected, here the mapping to −1.
At step (c), the node ‘a’ in P is mapped to −1, to ‘b’ and to ‘a’ in P′, respectively.
Among the existing nodes in L, the mapping of the →-node in P to the →-node in P′
is selected as it has the lowest cost of 1. At step (d), the children of this node are
added, i.e. again mapping the node ‘a’ to −1, to ‘b’ and to ‘a’ in P′, respectively, and
the node with the lowest cost is selected. At step (e), the algorithm maps the node
‘b’ in P to −1, to ‘b’ and to ‘a’ in P′, respectively, and selects the mapping to ‘b’ as the
minimum-cost node. At step (f), the children of this node are added, i.e. mapping
the node ‘c’ to −1, and to ‘a’ in P′, respectively. Note that mapping the node ‘c’ to
the node ‘a’ makes the deleted (resp., inserted) →-node in P (resp., P′) on the path
〈0, 1, −1, 3〉 non-auxiliary and non-trivial as it does not satisfy the conditions of an
auxiliary or a trivial operator node anymore (cf. Definition 28). The minimum-cost
node at this step is the mapping of ‘a’ to −1 on the path 〈0, 1, −1〉. At step (g),
the node ‘b’ in P is again mapped to −1, to ‘b’ and to ‘a’ in P′, respectively, with
the mapping to ‘b’ being the one with the minimum cost among all nodes. Finally, at
step (h), the node ‘c’ in P is mapped to −1 and to the node ‘a’ in P′, with the latter
mapping forming the minimum-cost node. At this step the A* algorithm terminates as
the node v with the lowest mapping cost is a leaf in MST(P, P′). The minimum-cost
path from the root to v is 〈0, 1, −1, 2, 3〉, and the minimum-cost mapping between P
and P′ is Mv = {(0, 0), (1, 1), (2, −1), (3, 2), (4, 3)}, with the cost of 2.

Time Complexity

It is known that the problem of computing the tree edit distance between two unordered

trees is NP-hard [138]. A process tree may contain unordered operator nodes, such as

×-nodes and ∧-nodes, as well as ordered operator nodes, such as →-nodes and ⟲-nodes.
Therefore, the problem of computing the minimum-cost mapping between two



Figure 5.14: The running example of the A* algorithm for the process trees P and P′
in Figure 5.13, shown as panels (a) to (h), one per step of the search.



process trees is also NP-hard.

5.4.2.2 Greedy Search

As mentioned in the previous section, the problem of computing the minimum-cost

mapping between two process trees is NP-hard. Consequently, depending on how

different the two process trees are, the A* algorithm may not be able to compute the

optimal solution within a reasonable time. Therefore, in this section we introduce a fast

greedy algorithm, illustrated in Algorithm 4, that approximates the optimal solution.

Algorithm 4 Greedy

 1: procedure GREEDY(P, P′, threshold)
 2:   /* MST(P, P′) is a mapping search tree between P and P′ */
 3:   /* L is a list of triples */
 4:   add the node v labeled by 0 as the root to MST(P, P′)
 5:   while dep(v) ≠ |P|−1 do
 6:     i ← dep(v) + 1
 7:     add the node z such that l(z) = −1 to MST(P, P′) as the child of v
 8:     if P[i] is not an operator node then
 9:       L ← L ∪ {(z, g∗(z), h∗(z))}
10:     end if
11:     for each w ∈ P′ do
12:       if (Mv − {(i, −1)}) ∪ {(i, preP′(w))} forms a mapping between P and P′ then
13:         add the node u such that l(u) = preP′(w) to MST(P, P′) as the child of v
14:         L ← L ∪ {(u, g∗(u), h∗(u))}
15:       end if
16:     end for
17:     select (v, g∗(v), h∗(v)) ∈ L such that f∗(v) is minimum
18:     y ← P[i]
19:     if y is an operator node then
20:       if v ≠ null then
21:         y′ ← P′[l(v)]
22:         ubc ← max(|C(y)|, |C(y′)|) /* Upper bound for the cost of mapping two activity sets C(y) and C(y′) */
23:         mmc ← minMappingCost(C(y), C(y′))
24:         matchingScore ← (ubc − mmc)/ubc
25:         if matchingScore < threshold then
26:           v ← z
27:         end if
28:       else
29:         v ← z
30:       end if
31:     end if
32:     L ← ∅
33:   end while
34:   return Mv
35: end procedure

The Greedy algorithm is similar to the A* algorithm. The latter finds a mapping

between P and P′ that has the lowest global cost. As such, it may process each node in



P multiple times. In contrast, the greedy algorithm selects and fixes a locally optimal
mapping for each node in P at each step of constructing MST(P, P′). This is performed
by clearing the list L containing the child-free nodes in MST(P, P′) at the end of each
step (line 32). In addition, the greedy algorithm has a different strategy for mapping
operator nodes. For every operator node y in P, this algorithm first finds an operator
node y′ in P′ with the lowest mapping cost (lines 11-17). Next, it computes a matching
score between y and y′. The matching score measures the similarity of the activity nodes
under y and y′, and lies in the range [0,1], where 0 indicates that there is no pair (a,b) of
activities in C(y) × C(y′) such that l(a) = l(b), whereas 1 indicates that
{l(a) | a ∈ C(y)} = {l(b) | b ∈ C(y′)}. The nodes y and y′ are mapped if the matching
score between them is at least the threshold. However, if there is no node in P′ that can
be mapped to y, or if the matching score is below the threshold, y is mapped to −1 by
selecting z as the optimal node in MST(P, P′) (lines 18-31).
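The matching score of lines 22-24 of Algorithm 4 can be sketched as follows. The function minMappingCost is not spelled out in this section, so the sketch substitutes a hypothetical stand-in, `min_mapping_cost_unit`, which assumes unit costs for substitution, deletion and insertion and assumes that activity labels are unique within each set. Under these assumptions the score reduces to the number of shared labels divided by the size of the larger set, which agrees with the two boundary cases described above.

```python
from typing import Iterable, Optional


def min_mapping_cost_unit(labels_y: Iterable[str], labels_y2: Iterable[str]) -> int:
    """Hypothetical stand-in for minMappingCost (unit costs, unique labels)."""
    a, b = set(labels_y), set(labels_y2)
    # Equal labels map at cost 0; every other node is substituted, deleted or inserted.
    return max(len(a), len(b)) - len(a & b)


def matching_score(labels_y: Iterable[str], labels_y2: Iterable[str]) -> float:
    a, b = set(labels_y), set(labels_y2)
    ubc = max(len(a), len(b))          # upper bound for the mapping cost
    if ubc == 0:
        return 1.0                     # assumption: two empty sets count as identical
    mmc = min_mapping_cost_unit(a, b)
    return (ubc - mmc) / ubc


def keep_operator_mapping(score: Optional[float], threshold: float) -> bool:
    """False means the operator node is mapped to -1 (the node z is selected)."""
    return score is not None and score >= threshold


print(matching_score(["a", "b", "c"], ["a", "b"]))
```

A score of `None` models the case in which no node of P′ can be mapped to y at all.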

Time Complexity

Given two process trees P and P′, the time complexity of the greedy algorithm is

dominated by the complexity of the while loop (line 5). The complexity of this loop

is the maximum of the worst-case complexity of three sequential steps. This loop

iterates |P| times. At each iteration, we first map a node y ∈ P to every node in P′
that satisfies the mapping conditions (lines 11-16), thus O(|P| · |P′|). We then select
the mapping between y and a node in P′ with the lowest cost (line 17). The latter step
requires, in the worst case, iterating over |P′| mappings, thus O(|P| · |P′|). If y is an
operator node for which we are able to find a mapping node y′ ∈ P′, then we compute
the minimum cost of mapping two sets of activity nodes under y and y′ (line 23), thus
O(|P|² · |P′|). Therefore, the worst-case complexity of the while loop and so that of the
greedy algorithm is O(|P|² · |P′|).

From Process Tree Mapping to Sequence of Edit Operations

Finally, from a given mapping between two process trees P and P′, we extract a concise

sequence of edit operations that transforms P into P′. In that, we need to satisfy the

fragment deletion/insertion order condition of sequence of edit operations (cf. Defini-

tion 20), that requires fragment deletions (resp., insertions) to precede (resp., to follow)

other operations. Accordingly, we extract fragment deletions first and fragment inser-

tions last from the mapping. Furthermore, edit operations of the same type in each step

are ordered based on the lexicographical order of the fragments to which they are applied.
We extract a concise sequence of edit operations from a mapping by performing
the following steps in this order.

1) Fragment deletion
   for i = 1 to dep(P) do
     Add a D_f operation for every maximal deleted fragment (≠ trivial τ-node) rooted at depth i of P.
   end for

2) Activity Substitution
   Add a SUBac operation for every activity node substitution.

3) Operator Substitution, ⟲-Operator deletion, and ⟲-Operator insertion
   for i = 1 to dep(P) do
     Add a SUB⊕ operation for every non-auxiliary nontrivial deleted non-⟲ operator node at depth i of P.
     Add a D⟲ operation for every deleted ⟲-node at depth i of P.
   end for
   for i = dep(P′) to 1 do
     Add a SUB⊕ operation for every non-auxiliary nontrivial inserted non-⟲ operator node at depth i of P′.
     Add a SUB⊕ operation for every operator node substitution at depth i of P′.
     Add an I⟲ operation for every inserted ⟲-node at depth i of P′.
   end for

4) Fragment insertion
   for i = dep(P′) to 1 do
     Add an I_f operation for every maximal inserted fragment (≠ trivial τ-node) rooted at depth i of P′.
   end for
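The ordering imposed by the four steps above can be sketched as a sort key. This is a simplified illustration, not the extraction procedure itself: it assumes the edit operations have already been read off the mapping, and the `EditOp` record, its field names and the kind strings ("D_f", "SUB_ac", "SUB_op", "D_loop", "I_loop", "I_f") are hypothetical. The two kinds of SUB⊕ operations in the second loop of step 3 are collapsed into a single kind.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class EditOp:
    kind: str        # "D_f", "SUB_ac", "SUB_op", "D_loop", "I_loop" or "I_f"
    depth: int       # depth of the affected node or fragment root
    fragment: str    # textual form of the fragment, used for tie-breaking
    tree: str = "P"  # "P" (pre-drift tree) or "P'" (post-drift tree)


def order_key(op: EditOp) -> Tuple[int, int, int, str]:
    if op.kind == "D_f":                                     # step 1: top-down over P
        return (1, op.depth, 0, op.fragment)
    if op.kind == "SUB_ac":                                  # step 2
        return (2, 0, 0, op.fragment)
    if op.kind in ("SUB_op", "D_loop") and op.tree == "P":   # step 3, first loop
        return (3, op.depth, 0 if op.kind == "SUB_op" else 1, op.fragment)
    if op.kind in ("SUB_op", "I_loop") and op.tree == "P'":  # step 3, second loop
        return (4, -op.depth, 0 if op.kind == "SUB_op" else 1, op.fragment)
    if op.kind == "I_f":                                     # step 4: bottom-up over P'
        return (5, -op.depth, 0, op.fragment)
    raise ValueError(f"unexpected edit operation: {op}")


def order_edit_operations(ops: List[EditOp]) -> List[EditOp]:
    """Fragment deletions first, fragment insertions last, ties by fragment text."""
    return sorted(ops, key=order_key)


ops = [
    EditOp("I_f", 1, "e", "P'"),
    EditOp("D_f", 2, "b"),
    EditOp("SUB_ac", 3, "a"),
    EditOp("I_loop", 2, "x", "P'"),
    EditOp("D_f", 1, "c"),
    EditOp("SUB_op", 1, "y"),
]
for op in order_edit_operations(ops):
    print(op.kind, op.depth, op.fragment)
```

Encoding the order as a key keeps the fragment deletion/insertion order condition in one place; negating the depth yields the bottom-up traversal of P′.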

5.5 Construct Drift Characterization Statements

The output of the previous section is a sequence of edit operations that transforms a

pre-drift process tree P into a post-drift process tree P′. In this section, we construct

a sequence of characterization statements based on a given sequence of edit opera-

tions. As explained in Section 5.4.1, each edit operation describes a simple change in



Table 5.1. By aggregating the simple changes obtained from a sequence of edit op-

erations in a post-processing step we create compound changes (cf. Table 5.1). This

further reduces the number of changes reported to the user and creates higher-level

changes that are easier to interpret. Each remaining simple change and each created

compound change is then reported to the user as a natural language statement.

5.5.1 Simple Change Patterns

Here we describe how each simple change pattern is captured by an edit operation.

• Insert/delete a fragment (sre, pre, cre) The application of a D_f edit operation
on a non-τ-fragment represents a fragment deletion, whereas the application of
an I_f edit operation on a non-τ-fragment represents a fragment insertion. The
fragment insertion/deletion is serial (sre), parallel (pre), or conditional (cre) if
the parent node of the inserted/deleted fragment is a →-node, a ∧-node, or an
×-node, respectively. Also, the insertion/deletion of a fragment in/from the loop-body
(resp., loopback) of a ⟲-node is considered as a serial (resp., conditional)
fragment insertion/deletion. For example, in the transformation of P into P′ in
Figure 5.15a, Fragment 1 is deleted from between activity ‘a’ and activity ‘c’
(serial deletion), whereas Fragment 2 is inserted in a conditional branch with
activity ‘d’.

• Make fragments mutually exclusive/parallel/sequential (cf, pl) The application
of a SUB⊕ operation on an operator node v changes the relation between the
child fragments of v. For example, in the transformation of P into P′ in
Figure 5.15b, Fragment 1 precedes activity ‘c’ in P, but, after the substitution of
the operator of the →(∧(a,b),c)-node with ×, they are mutually exclusive in P′.
In addition to the change patterns cf and pl, the relation between two fragments
can also be changed from mutually exclusive to parallel, and vice versa. However,
this change pattern is not defined as one of the common change patterns in [129].
For example, in Figure 5.15b, activities ‘d’ and ‘e’ were mutually exclusive in
P, but after the substitution of the operator of the ×-node P[5] with ∧, they
are parallel in P′.

• Make a fragment loopable/non-loopable (lp) The insertion (resp., deletion) of
a ⟲-node by an I⟲ (resp., D⟲) edit operation as the parent of a fragment makes
that fragment loopable (resp., non-loopable). For example, in the transformation
of P into P′ in Figure 5.15c, Fragment 1 in P has become loopable in P′ with the
insertion of the ⟲-node P′[2].

• Make a fragment skippable/non-skippable (cb) The insertion (resp., deletion)
of a τ-node by an I_f (resp., D_f) edit operation under an ×-node makes the other child
fragments of the ×-node skippable (resp., non-skippable). For example, in the
transformation of P into P′ in Figure 5.15d, with the insertion of the τ-node
P′[6], Fragment 1 has become skippable in P′.
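The rule that assigns a code to a fragment insertion or deletion depends only on the operator of the parent node. A minimal sketch of that rule follows; the function name, the string encoding of operators and the `loop_part` parameter ("body" or "loopback") are hypothetical and chosen for illustration only.

```python
from typing import Optional

# Parent operator of the inserted/deleted fragment -> change pattern code.
_CODE_BY_PARENT = {"→": "sre", "∧": "pre", "×": "cre"}


def insertion_deletion_code(parent_operator: str,
                            loop_part: Optional[str] = None) -> str:
    """Classify a fragment insertion/deletion as serial, parallel or conditional."""
    if parent_operator == "⟲":
        if loop_part == "body":        # loop-body: treated as serial
            return "sre"
        if loop_part == "loopback":    # loopback: treated as conditional
            return "cre"
        raise ValueError("loop_part must be 'body' or 'loopback' for a ⟲-node")
    try:
        return _CODE_BY_PARENT[parent_operator]
    except KeyError:
        raise ValueError(f"unknown operator: {parent_operator!r}") from None


print(insertion_deletion_code("×"))
```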

Figure 5.15: Examples of transforming a process tree P into a process tree P′ by the
application of simple changes: (a) insert/delete a fragment; (b) make fragments
mutually exclusive/parallel/sequential; (c) make a fragment loopable/non-loopable;
(d) make a fragment skippable/non-skippable.

5.5.2 Compound Change Patterns

By aggregating simple change patterns we can construct compound change patterns.

We describe these compound patterns below.

• Duplicate a fragment An inserted fragment f2 in a process tree is a duplicate
fragment if there is another fragment f1 in the tree, such that stringify(f2) =
stringify(f1), where stringify is a recursive function that converts a fragment to
a unique and stable textual representation and is defined as follows.

– For an activity node or a τ-node v, stringify(v) = l(v).
– For a fragment F = ⊕(F1, . . . , Fn) such that l(⊕) ∈ {→, ⟲},
  stringify(F) = l(⊕)(stringify(F1) + . . . + stringify(Fn)).
– For a fragment F = ⊕(F1, . . . , Fn) such that l(⊕) ∈ {×, ∧},
  stringify(F) = l(⊕)(stringify(Fi) + . . . + stringify(Fm)), where 1 ≤ i, . . . , m ≤ n
  and {stringify(Fi), . . . , stringify(Fm)} is an ordered sequence obtained by
  arranging the sequence {stringify(F1), . . . , stringify(Fn)} in ascending
  alphabetical order.

For example, in the transformation of P into P′ in Figure 5.16a, Fragment 2 in P′
is a duplicate of Fragment 1, since Fragment 2 is inserted, and stringify(Fragment 2) =
stringify(Fragment 1).

• Substitute a fragment (rp) The application of a SUBac edit operation represents

an activity substitution, and the application of a SUB⊕ edit operation represents

an operator substitution. To discover a fragment substitution we need to abstract

from the operator and the activity substitutions within the fragment. A fragment

f in P is substituted by a fragment f′ in P′ if at least one node within f is
substituted by a node within f′, and every other node within f (resp., f′) either
is substituted by (resp., substitutes) a node within f′ (resp., f) or is deleted
(resp., inserted). For example, in the transformation of P into P′ in Figure 5.16b,
Fragment 1 in P is substituted with Fragment 2 in P′.

• Swap two fragments (sw) In the transformation of P into P′, two fragments
f1 and f2 in P are swapped if they are substituted by two fragments f′1 and f′2
in P′, respectively, such that stringify(f1) = stringify(f′2) and stringify(f2) =
stringify(f′1). For example, in the transformation of P into P′ in Figure 5.16c,
Fragment 1 and Fragment 2 in P are swapped as they are substituted by Fragment 2
and Fragment 1 in P′, respectively, and stringify(Fragment 1 in P) =
stringify(Fragment 2 in P′) = “ab” and stringify(Fragment 2 in P) =
stringify(Fragment 1 in P′) = “bc”.

• Move a fragment (sm, pm, cm) The combination of deleting a fragment f from
P and inserting a fragment f′ in P such that stringify(f) = stringify(f′) represents
a move of the fragment f within P. The fragment move is serial (sm),
parallel (pm), or conditional (cm) if the parent node of the inserted fragment f′
is a →-node, a ∧-node, or an ×-node, respectively. For example, in the
transformation of P into P′ in Figure 5.16d, Fragment 1 has moved to a conditional
branch with activity ‘e’.

• Change branching frequency (fr) In the transformation of P into P′, let v ∈ P

be an ×-node with no deleted or inserted children. Also let c be a child frag-

ment of v that is not substituted by another fragment. We define the relative

frequency of c as the ratio between the frequency of c and the frequency of v,

and express it as a percentage by multiplying it by 100. A significant change

in the relative frequency of c over the transformation of P into P′ represents a

change of branching frequency. Let freqB and freqA be the relative frequencies
of c in P and P′, respectively. We compute the relative frequency change of c
by |freqB − freqA| · 100 / avg(freqB, freqA), where the function avg(freqB, freqA)
computes the average of freqB and freqA. The significance of the relative frequency
change can be defined by the user. To focus on more significant branching
frequency changes, in the evaluation sections of this chapter we consider a relative
frequency change of at least 50% as a significant change, provided that the relative
frequency of the fragment is at least 25% in P or P′. To compute the frequency of
nodes in a process tree we replay its underlying event log on top of the process
tree. For example, in the transformation of P into P′ in Figure 5.16e, the relative
frequency of Fragment 1 has changed from 40% in P to 70% in P′ (a relative
frequency change of |40 − 70| · 100 / 55 ≈ 54.5%), while the relative frequency of
activity ‘c’ has changed from 30% in P to 10% in P′ (a relative frequency change
of |30 − 10| · 100 / 20 = 100%).
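Two computations that the compound patterns above rely on, the canonical string of a fragment and the relative frequency change of a branch, can be sketched in Python as follows. The `Node` class and all function names are hypothetical. The parentheses in the definition of stringify are read here as literal characters and "+" as plain string concatenation, which is an interpretation rather than something the definition states explicitly; with multi-character labels a separator would be needed to keep the representation unambiguous. The default thresholds are the 50% and 25% values used in the evaluation.

```python
from dataclasses import dataclass
from typing import Tuple

UNORDERED = {"×", "∧"}   # operators whose children may be reordered freely


@dataclass(frozen=True)
class Node:
    label: str                          # activity label, "τ", or an operator symbol
    children: Tuple["Node", ...] = ()


def stringify(node: Node) -> str:
    """Canonical text of a fragment, invariant to child order under × and ∧."""
    if not node.children:               # activity node or τ-node
        return node.label
    parts = [stringify(child) for child in node.children]
    if node.label in UNORDERED:
        parts.sort()                    # ascending order for unordered operators
    return f"{node.label}({''.join(parts)})"


def relative_frequency_change(freq_before: float, freq_after: float) -> float:
    """|freqB - freqA| * 100 / avg(freqB, freqA), with frequencies in percent."""
    average = (freq_before + freq_after) / 2
    if average == 0:
        return 0.0                      # assumption: a branch never taken did not change
    return abs(freq_before - freq_after) * 100 / average


def is_significant(freq_before: float, freq_after: float,
                   min_change: float = 50.0, min_freq: float = 25.0) -> bool:
    return (relative_frequency_change(freq_before, freq_after) >= min_change
            and max(freq_before, freq_after) >= min_freq)


a, b, c = Node("a"), Node("b"), Node("c")
print(stringify(Node("∧", (b, a))), stringify(Node("→", (b, a))))
print(relative_frequency_change(40, 70), is_significant(40, 70))
```

With such a canonical string, the duplicate, swap and move patterns reduce to equality tests between the strings of deleted, inserted and substituted fragments.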

5.5.3 Nested Changes

As defined in the previous chapter (cf. Section 4.2), multiple changes that share some

behavioral relations between activities, e.g. causality or concurrency, are called over-

lapping changes. For example, in the transformation of the process tree P to the process

tree P′ in Figure 5.17, there are two overlapping changes: 1. activity ‘b’ is deleted, 2.

change in the relation between activity ‘c’ and activity ‘d’ from sequential in the left

process tree to mutually exclusive in the right process tree. When applied in isolation,

these two changes share the causal relation b→ c. The application of the first change

deletes this behavioral relation, while the application of the second change decreases

the frequency of its execution. By abstracting from the low-level behavioral relations

between activities, IM discovers relations between fragments within a process tree.

Consequently, overlapping changes are isolated from each other, and can be character-

ized in the same way as the non-overlapping ones. For example in Figure 5.17, in P,



Figure 5.16: Examples of transforming a process tree P into a process tree P′ by the
application of compound changes: (a) duplicate a fragment; (b) substitute a fragment;
(c) swap two fragments; (d) move a fragment; (e) change branching frequency.



activity ‘b’ precedes the fragment →(c,d), whereas in P′, activity ‘b’ is deleted and

the operator of the root of the fragment→(c,d) is substituted with ×, resulting in the

fragment ×(c,d).

Figure 5.17: Example of transforming a process tree P into a process tree P′ by the
application of overlapping changes.

Nested changes are a set of overlapping changes, each applied to the resulting

process subtree from the application of its previous change. The hierarchical structure

of a process tree allows us to characterize the changes applied to the inner structure of

a fragment and those applied to the fragment as a whole independently of each other.

In Section 5.4.2.2, we explained in what order we traverse process trees to extract a sequence
of edit operations from a mapping. We apply the changes in the same order that we
extracted them to transform a process tree P into a process tree P′. For example, to
transform the process tree P into the process tree P′ in Figure 5.18, we first make
activity ‘b’ and activity ‘c’ sequential, resulting in the fragment →(b,c). Next, a loop
structure is placed over this fragment, by inserting the ⟲-node P″[2]. Finally, activity
‘d’ is inserted in a parallel branch with the fragment ∧(⟲(→(b,c),τ),d).

Figure 5.18: Example of transforming a process tree P into a process tree P′ by the
application of nested changes.

Table 5.4 shows the format of drift characterization statements produced by our

method for each change pattern.



Code | Change pattern | Drift characterization statement format
sre | Insert/delete a fragment between two fragments | After the drift, fragment f1 = . . . is inserted (resp., deleted from) between fragments f2 = . . . and f3 = . . ..
pre | Insert/delete a fragment in/from parallel branch | After the drift, fragment f1 = . . . is inserted in (resp., deleted from) a parallel branch with fragment f2 = . . ..
cre | Insert/delete a fragment in/from conditional branch | After the drift, fragment f1 = . . . is inserted in (resp., deleted from) a conditional branch with fragment f2 = . . ..
cp | Duplicate a fragment | After the drift, fragment f1 = . . ., i.e. a duplicate of fragment f2 = . . ., is inserted ... (continues with sre, pre, or cre).
rp | Substitute a fragment | After the drift, fragment f1 = . . . is substituted by fragment f2 = . . ..
sw | Swap two fragments | After the drift, fragments f1 = . . . and f2 = . . . are swapped.
sm | Move a fragment to between two fragments | After the drift, fragment f1 = . . . has moved to between fragments f2 = . . . and f3 = . . ..
cm | Move a fragment into/out of conditional branch | After the drift, fragment f1 = . . . has moved to a conditional branch with fragment f2 = . . ..
pm | Move a fragment into/out of parallel branch | After the drift, fragment f1 = . . . has moved to a parallel branch with fragment f2 = . . ..
cf | Make fragments mutually exclusive/sequential | Before the drift, fragments f1 = . . . , . . . and fn = . . . were mutually exclusive (resp., sequential), while after the drift they are sequential (resp., mutually exclusive).
pl | Make fragments parallel/sequential | Before the drift, fragments f1 = . . . , . . . and fn = . . . were parallel (resp., sequential), while after the drift they are sequential (resp., parallel).
lp | Make a fragment loopable/non-loopable | After the drift, fragment f1 = . . . has become loopable/non-loopable.
cb | Make a fragment skippable/non-skippable | After the drift, fragment f1 = . . . has become skippable/non-skippable.
fr | Change branching frequency | Before the drift, after the ×-node ⊕ the branch of fragment f1 = . . . was executed x% of the time, while after the drift it is executed y% of the time.

Table 5.4: Change patterns from [129] and their drift characterization statement formats.
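As an illustration of how such statement formats can be instantiated, the following sketch fills a few of the templates from Table 5.4 with concrete fragments. The dictionary, the function name `verbalize` and the slot names are hypothetical; only the wording of the templates follows the table, and only a subset of the codes is covered.

```python
# Hypothetical template store keyed by the change pattern codes of Table 5.4.
# Slots in braces are filled with a textual rendering of each fragment.
TEMPLATES = {
    "sre": "After the drift, fragment {f1} is {action} between fragments {f2} and {f3}.",
    "pre": "After the drift, fragment {f1} is {action} a parallel branch with fragment {f2}.",
    "rp": "After the drift, fragment {f1} is substituted by fragment {f2}.",
    "sw": "After the drift, fragments {f1} and {f2} are swapped.",
    "lp": "After the drift, fragment {f1} has become {state}.",
}


def verbalize(code: str, **slots: str) -> str:
    """Instantiate the statement template of a change pattern code."""
    try:
        template = TEMPLATES[code]
    except KeyError:
        raise ValueError(f"unsupported change pattern code: {code!r}") from None
    return template.format(**slots)


print(verbalize("sw", f1="→(a,b)", f2="c"))
print(verbalize("pre", f1="d", action="inserted in", f2="→(b,c)"))
print(verbalize("lp", f1="→(b,c)", state="loopable"))
```

Keeping the "resp." alternatives as slots (`action`, `state`) lets one template serve both directions of a pattern, mirroring the way the table pairs insertion with deletion.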



5.5.4 Unsupported patterns

The only change pattern from Table 5.1 that our method is unable to support is the

Synchronize two fragments change pattern. This pattern refers to changes where two

parallel fragments are synchronized, or vice versa. As discussed in Section 5.4.1, this

pattern introduces unstructuredness into a process model and hence cannot be used as

a basis for defining process tree edit operations. Figure 5.19a shows an example of

this change pattern. In this example, before the change, activity ‘b’ was performed

in parallel with activity ‘c’, while after the change, activity ‘b’ precedes activity ‘c’.

Observe that the synchronization change pattern is different from the change pattern

where we sequentialize two parallel fragments (“pl” in Table 5.1) by transforming the

parallel block containing the fragments to a sequential block without impairing the

structuredness of the process. However, in the synchronization change pattern two

parallel fragments are synchronized by directly connecting one fragment to the other.

This results in the loss of structuredness between the two branches in the parallel block

that contains the two synchronized fragments. Consequently, to discover a structured

process model, IM needs to generalize from the behavior of this unstructured block. As

such, the resulting process tree does not precisely represent the synchronization change

applied to the process, i.e. it also represents false process changes. Figure 5.19b shows

the process trees corresponding to the process models in Figure 5.19a discovered by

IM. Activity ‘c’, which was in parallel with activity ‘b’ and mutually exclusive with

activity ‘d’ in process tree P, is performed after the parallel block in process tree P′.

Furthermore, activities ‘c’ and ‘d’ can be skipped in P′. Although the occurrence of ‘c’

after ‘b’ is accurately represented in P′, there are also several false changes in P′ such

as the occurrence of ‘c’ after ‘d’ or the occurrence of ‘e’ after ‘b’ by skipping ‘c’.

5.6 Tool Support

As for the methods presented in Chapters 3 and 4, we also implemented the method

for characterizing drifts at the level of process model fragments as an extension of

ProDrift, available both as a standalone tool and as a plug-in of Apromore. To enable

the characterization of a detected drift, the user needs to tick the “Drift characterization”

checkbox in the configuration panel of the plug-in, as shown in Figure 5.20a. It is

then required to choose between the two drift characterization configuration options:

“activity level” and “fragment level”. To characterize drifts using the method presented

in this chapter, the “fragment level” option needs to be selected.



(a) Petri net models before and after the synchronization.

(b) Process trees P and P′ discovered by IM before and after the synchronization.

Figure 5.19: Example of the Synchronize two fragments change pattern. Activity ‘b’ and activity ‘c’ are synchronized.

By default, the tool uses the A* algorithm (cf. Section 5.4.2.1) to search for a min-

imum cost sequence of edit operations that transforms the pre-drift process tree to the

post-drift process tree, resulting in a complete and concise set of drift characterization

statements. As a more efficient alternative, the greedy algorithm (cf. Section 5.4.2.2)

may be selected to speed up the search at the price of a less concise characterization.

Furthermore, to discover process trees from noisy logs we use IMfpt, i.e. a variant of

IM with noise filtering capabilities that works with partial traces. By default, we set

the value of the noise filtering threshold of IMfpt to 10%, as this value effectively

handled noise in our experiments with artificial and real-life logs, as reported later in

this chapter. Alternatively, it is also possible for the user to set a different threshold by

changing the value of the “Drift characterization noise filter” field.

After a drift is detected it is characterized by the drift characterization method.

Once the parsing of the log is complete, by clicking on each drift on the list of detected

drifts, the user can inspect its natural language characterization statements, as shown

in Figure 5.20b.

The new variants of Inductive Miner, including IMpt, IMfpt and IMcpt, have been

implemented as plug-ins of the ProM framework² [124] and their source code is publicly

available.

² http://promtools.org



(a) Enable drift characterization and choose the “fragment level” configuration to use the drift characterization method presented in this chapter.

(b) Inspect natural language characterization statements for each detected drift.

Figure 5.20: Drift characterization at fragment level using the ProDrift plug-in in Apromore.




To evaluate the effectiveness of our method we used ProDrift to conduct experiments

on artificial and real-life event logs with different parameter settings. The tool is fed

with an event stream replayed from an event log, and outputs, for each detected drift, its

characterization statements in natural language. In the rest of this section, we present

the results of our evaluation on artificial logs. Specifically, we measured the accuracy

of drift characterization, the conciseness of the statements produced to characterize

such drifts and the time performance, and compared the results against our activity-

based characterization method, presented in the previous chapter. In the next section,

we present the results of our evaluation on real-life logs.

5.7.1 Setup

We generated an artificial dataset using the same CPN³ base model as in Chapter 3 (cf. Figure 3.2). This model represents a block-structured process consisting of 42 activities, five XOR, six AND, and three loop structures, modeled in an intertwined way, and produces highly variable event logs with a trace variability of around 80%. For each change pattern in Table 5.1, except “Duplicate a Fragment”, we generated five logs, each featuring two drifts applied to fragments of a different size, from one to five. Note that as IM does not discover process trees with duplicate activities, we do not experiment with logs containing drifts caused by a fragment duplication. Nonetheless, the process tree transformation algorithms presented in this chapter can be applied to process trees with duplicate fragments and are able to identify the insertion (resp., deletion) of a duplicate fragment in (resp., from) a process tree. Also, label duplication techniques such as the ones introduced in [26, 75] can be used to pre-process a log before applying IM. For each generated log we simulated 3,000 traces, with drifts injected during the simulation at 1,000-trace intervals. The first drift is injected by applying a change pattern to the base model, and the second drift is injected by reversing the applied change and reverting to the base model.

We also evaluated our methods in more complex settings by simulating logs featuring drifts caused by multiple non-overlapping simultaneous changes (i.e. composite changes) as well as nested changes. To create such logs, we divided our change patterns into three categories, as described in Chapter 2: Insertion (“I”), Resequentialization (“R”) and Optionalization (“O”) (cf. Table 5.1). Limited to three cross-category changes, these categories make six possible scenarios for each of the composite changes and nested changes (“IOR”, “IRO”, “OIR”, “ORI”, “RIO”, “ROI”). For each such scenario, five logs were generated by randomly selecting one template from each category and applying them to fragments of a certain size, from one to five. For example, a drift from the composite change scenario “IOR” could simultaneously delete a fragment of size two (“I”), add a loop over a fragment of size two (“O”), and parallelize two sequential fragments of size two (“R”) in three different locations of the process. As another example, a drift from the nested change scenario “IOR” could first parallelize two sequential fragments of size three (“R”), then add a loop over the two parallelized fragments (“O”), and finally insert a fragment of size three in a conditional branch with the resulting loop fragment (“I”). This resulted in 30 logs for each of the composite and nested change settings. Overall, we obtained a collection of 65 logs with single changes, 30 logs with composite changes, and 30 logs with nested changes, each containing 3,000 traces with two equidistant drifts involving one or multiple fragments of a certain size.

³ http://cpntools.org

For each such log, we also generated two variants with 2.5% and 5% noise by

inserting random events into the traces of the log. Altogether, the artificial dataset

contained 375 logs.⁴
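The size of the dataset follows directly from the setup described above. The short sketch below recomputes it; the variable names are illustrative, and the list of single change patterns is taken from the codes in Table 5.4 with “cp” (Duplicate a fragment) left out.

```python
from itertools import permutations

# Single change patterns used in the experiments ("cp" is excluded).
single_patterns = ["sre", "pre", "cre", "rp", "sw", "sm", "cm",
                   "pm", "cf", "pl", "lp", "cb", "fr"]
fragment_sizes = range(1, 6)  # one log per fragment size, from one to five

# Six orderings of the three categories Insertion, Optionalization and
# Resequentialization.
scenarios = ["".join(p) for p in permutations("IOR")]

single_logs = len(single_patterns) * len(fragment_sizes)
composite_logs = len(scenarios) * len(fragment_sizes)
nested_logs = len(scenarios) * len(fragment_sizes)

# Each log exists in a noise-free variant and with 2.5% and 5% noise.
noise_levels = [0.0, 2.5, 5.0]
total_logs = (single_logs + composite_logs + nested_logs) * len(noise_levels)

print(scenarios)
print(single_logs, composite_logs, nested_logs, total_logs)
```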

5.7.2 Accuracy of Drift Characterization: Fragment-based vs Activity-based

In the first experiment, we evaluate and compare the accuracy of our fragment-based

characterization method in characterizing drifts detected in the artificial logs versus

that of our activity-based characterization method, presented in the previous chapter.

To ensure that we use the same sub-logs as the activity-based characterization method, we used the same method for drift detection in our experiments with the artificial and real-life event streams in this chapter. Furthermore, we also used the same strategy as the activity-based characterization method to extract pre-drift and post-drift sub-logs after the detection of a drift. Specifically, to discover the pre-drift and post-drift process trees, we use the two sub-logs of partial traces built, respectively, from the events in the reference window as the P-value drops below the threshold, and from the events in the detection window as the P-value rises above the threshold. By doing so, we try to obtain pre-drift and post-drift process trees that only represent the actual process behaviors before and after a drift. It is worth noting that our fragment-based drift characterization method can be applied on top of any drift detection technique that works on event streams or trace streams. The only required input to our method is a pair of sub-logs containing partial or complete traces from before and after a drift.

⁴ All the CPN models used for this simulation, the resulting artificial logs, and the detailed evaluation results are available with the software distribution.

The output of both the fragment-based characterization method and the activity-

based characterization method is a list of statements explaining the changes underpin-

ning a drift. To compare the accuracy of the reported statements by the two methods

we use F-score, i.e. the harmonic mean of recall and precision, where recall measures

the ratio of reported statements relevant to the drift over the total number of statements

required to explain the drift, and precision measures the ratio of reported statements

relevant to the drift over the total number of reported statements. The relevance of a

statement to a drift is assessed manually such that a statement is considered to be rel-

evant to the drift if it describes at least a fraction of the changes applied to the process

in order to inject that drift. We count the number of statements required by a method

to explain a drift based on the changes applied to the process to inject that drift and the

abstraction level of the characterization statements produced by that method. For ex-

ample, to explain deleting a fragment of three activities, the activity-based characteri-

zation method needs three statements, one per activity, since it is designed to character-

ize changes at the level of individual activities. On the other hand, the fragment-based

method needs only one statement to explain the same fragment deletion.
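The accuracy measures defined above can be written down as a small sketch. The function name and the example counts are illustrative assumptions; the counts are not taken from the experiments.

```python
from typing import Tuple


def f_score(relevant_reported: int, total_reported: int,
            total_required: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) for one characterized drift.

    precision = relevant reported statements / all reported statements
    recall    = relevant reported statements / statements required
    F-score   = harmonic mean of precision and recall
    """
    precision = relevant_reported / total_reported if total_reported else 0.0
    recall = relevant_reported / total_required if total_required else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Illustrative case: a fragment of three activities is deleted. An
# activity-level method needs three statements; suppose it reports one
# statement and that statement is relevant.
precision, recall, f = f_score(relevant_reported=1, total_reported=1,
                               total_required=3)
print(precision, round(recall, 2), round(f, 2))
```

In this hypothetical case the precision is perfect but the recall is one third, so the F-score is 0.5. This is the mechanism by which a method that reports only part of a fragment-level change loses accuracy mainly through recall.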

For the experiments in this section we use the A* algorithm (cf. Section 5.4.2.1) to compute

the edit operations that transform the pre-drift process tree into the post-drift process tree. We also

use the activity-based characterization method with its default parameter settings.

5.7.2.1 Fragments of Different Size

Figure 5.21 shows the average F-score over all logs of a certain fragment size, with

and without noise, for the fragment-based as well as the activity-based characteriza-

tion methods. Figure 5.21a shows that the accuracy of our fragment-based character-

ization method is not influenced by the size of fragments involved in a drift, as the

average F-score remains around 0.99 for all fragment sizes, over all noise-free logs.

On the other hand, the average F-score of the activity-based characterization method

drops as the fragment size increases, being on average around 0.85, 0.56, 0.38, 0.28

and 0.21 for fragments of size one, two, three, four and five over all noise-free logs,

respectively. This is explained by the fact that this method is limited to characteriz-

ing changes to fragments of size one, i.e. individual activities. For a change involving



larger fragments this method either fails to characterize the change or can only partially

characterize it, resulting in a significant drop in the recall, from 0.82 for fragments of

size one to 0.13 for those of size five. However, the precision of the activity-based char-

acterization method is not influenced as much by the increase in the size of fragments,

dropping from 0.98 for fragments of size one to 0.82 for those of size five.

For the experiments with logs that contain noise we used IMfpt, i.e. a variant

of Inductive Miner that filters out infrequent behavior in the logs of partial traces,

discovering noise-free pre-drift and post-drift process trees. To avoid introducing false

differences between the pre-drift and post-drift process trees as a result of filtering, a

process behavior is treated as noise if it does not meet the filtering requirements on

both sides of the drift. This significantly improved the accuracy of our fragment-based

characterization method in experiments with noisy logs. We set the noise filtering

threshold parameter of IMfpt to 10% for the experiments with these logs. The results

for the logs with 2.5% and 5% noise, in Figures 5.21b and 5.21c, suggest that both

characterization methods can to a great extent handle different levels of noise injected

in the logs. The accuracy of the fragment-based method incurs a slight decrease of

around 15% for both 2.5% and 5% noise, with F-score being above 0.82 averaged over

all logs of the same fragment size. This is mostly caused by a decrease in the precision

of this method from 0.99, averaged over all fragment sizes, for noise-free logs, to

0.76 and 0.73 for logs with 2.5% and 5% noise, respectively. The average F-score

of the activity-based characterization method also drops by around 10% per fragment

size for logs with 2.5% and 5% noise. The precision of this method also drops from

0.91, averaged over all fragment sizes, for noise-free logs, to 0.7 and 0.67 for logs with

2.5% and 5% noise, respectively. The activity-based characterization method uses

a statistical technique to filter out spurious relations from the extracted α+ relations

before matching them with change templates. With regard to the impact of fragment

size on the characterization accuracy of the two methods, we observe similar trends

as for the noise-free logs. The accuracy of the fragment-based characterization method is

not affected by the fragment size, whereas that of the activity-based characterization

method drops significantly as fragments become larger.

5.7.2.2 Process Change Patterns

Figure 5.22 reports the average F-score for each single, composite and nested change pattern over all fragment sizes, with and without noise in the logs, for the fragment-based as well as the activity-based characterization methods. In this figure, we distinguish the composite change patterns from the nested ones by appending “_c” and “_n” to their names, respectively. The results of the experiment in Figure 5.22a show that in the absence of noise in the logs, the fragment-based characterization method has a perfect F-score of 1 for all the single change patterns and for all but four of the composite and nested change patterns, namely IOR_c, IRO_c, IRO_n, and RIO_n. For these four patterns, the process trees discovered by IM had minor imprecisions, leading to some false statements. On the other hand, the activity-based characterization method has an F-score in the range of 0.5–0.6 for most of the single and composite change patterns, with “cb” having the lowest F-score of 0.31, and “rp” and “sw” having the highest F-score of 0.7.

Figure 5.21: Average F-score over all logs with different noise ratios per fragment size, obtained with the fragment-based characterization method vs. the activity-based characterization method. Panels: (a) noise ratio = 0%, (b) noise ratio = 2.5%, (c) noise ratio = 5%. Axes: fragment size (x) vs. F-score (y).

For the nested change patterns the activity-based characterization method, as ex-

pected, performs poorly, with a maximum F-score of 0.34 for “ROI n”. This is due to

the inherent inability of this method to characterize nested changes. For the logs with

noise, as shown in Figure 5.22b and Figure 5.22c, the fragment-based characterization

method incurs a small drop in accuracy but still filters out most of the injected

noise in the logs, and achieves a higher F-score than the activity-based method for all

single, composite and nested change patterns. The F-score falls to around 0.8 for most

single change patterns for both 2.5% and 5% noise, and to around 0.9 and 0.85 for



most composite and nested change patterns for 2.5% and 5% noise, respectively. The

activity-based characterization method also handles the injected noise well and only

incurs slight drops in its F-score. As explained before, this method can inherently filter

out infrequent relations formed by spurious events.

Figure 5.22: Average F-score over all fragment sizes per single, composite, and nested change pattern, obtained with the fragment-based characterization method vs. the activity-based characterization method. Axes: change pattern (x) vs. F-score (y).



5.7.2.3 Singleton Fragments

The results of the previous experiments show that the fragment-based characteriza-

tion method on average outperforms the activity-based characterization method in all

change patterns over different fragment sizes. However, the latter is engineered to

characterize non-overlapping activity-level changes. Therefore, in the last experiment

in this section we study how the two methods compare in characterizing changes to

singleton fragments, i.e. activities. Figure 5.23 shows the F-score for singleton frag-

ments per single, composite and nested change patterns for the fragment-based and the

activity-based characterization methods. For noise-free logs, as shown in Figure 5.23a,

the fragment-based characterization method achieves a perfect F-score of 1 for all the

change patterns except “IRO_n”, for which the process trees discovered by IM were not

precise, leading to some false statements. The activity-based characterization method

also has an F-score of 1 for all but two of the single and composite change patterns,

namely “lp” and “OIR_c”. However, as expected, it still fails to fully characterize

the nested changes, with a minimum F-score of 0.18 for “OIR_n”, and an F-score of

around 0.5 for the rest. For the logs with 2.5% and 5% noise, the activity-based

characterization method has better F-scores than the fragment-based characterization method

for almost half of the single and composite change patterns, e.g. “sw”, “cf”, “cb”,

“ORI_c” and “ROI_c”, while for the rest they perform equally well. On the other hand,

the fragment-based method outperforms the activity-based method for all the nested ones.

Overall, the experimental results in this section show that while both methods

are noise-tolerant, the fragment-based characterization method is able to accurately

characterize single, composite, and nested changes involving fragments of any size.

On the other hand, the activity-based characterization method is well-suited for non-

overlapping activity-level changes, though it fails to accurately characterize changes

involving larger fragments, overlapping changes, as well as nested changes. As such,

the two methods are complementary.

5.7.3 Verbalization Conciseness: Fragment-based vs Activity-based

In this section, we study how our fragment-based characterization method compares to our activity-based characterization method with regard to the number of statements required to fully characterize the various change patterns, where each statement reports the occurrence of one change. As observed in the previous experiments, the activity-based characterization method often fails to characterize changes that involve non-singleton fragments and hence does not report any statement, or it may partially identify them, resulting in a small number of statements being reported. Thus, the actual number of statements reported by this method is not a good indicator of its verbalization conciseness. To avoid this problem, in Figure 5.24 we count the number of statements each method would require to report all process model changes behind each change pattern, if it could fully identify them. Further, as the activity-based characterization method does not support the nested changes, we exclude them from the comparison.

Figure 5.23: Average F-score for singleton fragments per single, composite and nested change pattern, obtained with the fragment-based characterization method vs. the activity-based characterization method. Axes: change pattern (x) vs. F-score (y).

We can see that the activity-based method requires a substantially larger number of statements than the fragment-based method (5.5 compared to 1 on average over all simple change patterns), especially when drifts involve multiple large process fragments, as in the case of composite patterns (17.5 compared to 3 on average). Reporting many activity-level differences is a common limitation of methods that, like the activity-based characterization method, rely on low-level representations of the process behavior.

Figure 5.24: Average number of statements over all fragment sizes required by our fragment-based characterization method vs. our activity-based characterization method for characterizing each change pattern. Axes: change pattern (x) vs. number of statements (y).

5.7.4 Verbalization Conciseness: Exhaustive vs Greedy

The characterization accuracy of our fragment-based characterization method is de-

pendent on that of IM in discovering pre-drift and post-drift process trees. If a process

tree discovered with IM misrepresents the process behavior recorded in the event log,

e.g. due to the imprecision of IM, then that behavior will produce a false characteri-

zation statement. Consequently, the choice of the search algorithm for computing the



sequence of edit operations that transforms a pre-drift process tree to a post-drift pro-

cess tree only impacts the number of reported statements to the user. For example,

consider the two transformations of process tree P into process tree P′ in Figure 5.25. In Figure 5.25a,

P is transformed to P′ by moving activity ‘a’ to a parallel branch with activity ‘c’,

whereas in Figure 5.25b, P is transformed to P′ by first swapping activities ‘a’ and ‘b’,

and then making activities ‘a’ and ‘c’ parallel. Although both of these are correct, the

first way is preferred as it is more concise.

Figure 5.25: Two sample transformations, (a) and (b), of process tree P into process tree P′.

In this section, we evaluate the verbalization conciseness of our fragment-based

characterization method by counting the number of characterization statements re-

ported by our method using the A* algorithm versus the greedy algorithm. As ex-

plained in Section 5.5, drift characterization statements are produced based on simple

as well as compound changes, where each compound change is an aggregation of mul-

tiple simple changes. The threshold parameter of the greedy algorithm, which indicates

the minimum matching score between two mapped operator nodes, can be manually

set by the user. As the greedy algorithm has a low execution time the user may try

different threshold values and select one that results in the lowest number of reported

statements. For the experiments in this section, we set the threshold parameter of the

greedy algorithm to 0.6, i.e. two operator nodes are mapped if their matching score

is at least 0.6. Intuitively, a matching score of 0.6 means that the two nodes are more

similar than dissimilar, and therefore they should be matched. This is also consistent

with previous experiments on model matching in the context of process model merg-

ing, where a value of 0.6 was used [64]. Figure 5.26 reports the average number of

statements produced by our method using the A* algorithm vs the greedy algorithm,

over all fragment sizes, per change pattern, with and without noise. For noise-free

logs, the statements reported by our method for all but four of the single, composite

and nested change patterns (“IOR_c”, “IRO_c”, “IRO_n”, “RIO_n”) were all accurate,

as the F-score of our method for these changes was 1 (cf. Figure 5.22a). As shown in Figure 5.26a,

using the A* algorithm our method is able to characterize each single change



pattern with one statement, averaged over fragments of size one to five. As the F-score

of our method was 1 for the same change patterns in noise-free logs (cf. Figure 5.22a), these

results show that the number of statements reported by our method for a change pattern

is independent of the size of the fragments to which the change pattern is applied. With

regard to the complex change patterns, our method with the A* algorithm on average,

across all fragment sizes, produces around 3 statements, one per applied change, for all

but three of the composite and nested change patterns. For those three change patterns,

namely “IOR_c”, “ORI_c”, and “RIO_n”, the pre-drift and post-drift sub-logs for larger

fragments did not contain sufficient process behavior for IM to precisely discover the

fragments to which the changes were applied. As such, IM split those fragments into

smaller fragments, leading to our method producing more statements. For the noise-

free logs, our method produces a similar number of statements when using the greedy

algorithm for most of the single, composite and nested change patterns. However, for

some change patterns, e.g. “sm” and “cm”, the greedy algorithm leads to more statements,

with the largest difference being for “OIR_n” with 6.6 statements against 3.6

statements reported by our method when using the A* algorithm.
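The role of the matching-score threshold discussed in this section can be illustrated with a simplified sketch. It is not the greedy algorithm of Section 5.4.2.2: it assumes that matching scores between operator nodes of the two trees are already available in a dictionary, and the function name, node identifiers and scores are hypothetical.

```python
from typing import Dict, Tuple


def greedy_mapping(scores: Dict[Tuple[str, str], float],
                   threshold: float = 0.6) -> Dict[str, str]:
    """Greedily map pre-drift nodes to post-drift nodes.

    Candidate pairs are visited by decreasing matching score. A pair is
    accepted if its score reaches the threshold and neither of its nodes
    has been mapped already.
    """
    mapping: Dict[str, str] = {}
    used_targets = set()
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for (source, target), score in ranked:
        if score < threshold:
            break  # all remaining pairs score lower still
        if source in mapping or target in used_targets:
            continue
        mapping[source] = target
        used_targets.add(target)
    return mapping


# Hypothetical matching scores between operator nodes of P and P'.
scores = {
    ("p1", "q1"): 0.9,
    ("p1", "q2"): 0.7,
    ("p2", "q2"): 0.65,
    ("p3", "q3"): 0.5,  # below the threshold, so p3 stays unmapped
}
print(greedy_mapping(scores))
print(greedy_mapping(scores, threshold=0.5))
```

In this simplified setting, lowering the threshold maps more nodes, while raising it leaves more nodes unmapped, so they would have to be explained by insertions and deletions. Which setting yields fewer statements depends on the trees, which is why trying several values is a reasonable strategy when the search is cheap.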

The injection of noise in the logs, as shown in Figures 5.26b and 5.26c, slightly

increases the average number of statements reported by our method for all change

patterns over fragments of size one to five. For these logs, our method produced some

false statements, each explaining a change that was not applied to the process as part

of the drift injection. Furthermore, in some cases the injected noise caused IM to

split a large fragment involved in a change into multiple smaller fragments, causing

our method to produce more statements to explain the change. Similar to the noise-

free logs, the A* and the greedy algorithms perform similarly for most of the change

patterns with 2.5% and 5% noise. The largest difference was for the simple change

pattern “cm” with 5% noise, where our method produces 1.8 statements on average

using the A* algorithm versus 5.1 statements on average using the greedy algorithm.

5.7.5 Time Performance

We conducted all the experiments on an Intel i7 2.20GHz with 16GB RAM (64 bit),

running Windows 7 and JVM 8 with a heap space of 10GB. The time required to dis-

cover two process trees from the pre-drift and post-drift sub-logs, compute a sequence

of edit operations to transform the pre-drift process tree to the post-drift process tree

using the A* algorithm and construct characterization statements for each drift ranged

from a minimum of 2ms to a maximum of 68sec with an average of 870ms. To perform



Figure 5.26: Average number of statements over all fragment sizes per change pattern reported by our fragment-based characterization method using the A* algorithm vs the greedy algorithm. [Three bar charts; x-axis: change pattern; y-axis: number of statements; series: A* and Greedy.]



the same operation, the greedy algorithm took from a minimum of 2ms to a maximum

of 100ms with an average of 30ms. This means that the greedy algorithm is almost 30

times faster than its A* counterpart. The bulk of the time spent by our method went

to computing a sequence of edit operations to transform the pre-drift process tree to

the post-drift process tree. Although in most cases the A* algorithm finds the optimal

solution within a reasonable time, for two process trees with several changes it may be

more efficient to use the greedy algorithm. Finally, the activity-based characterization

method took on average 510ms to characterize each drift.

5.8 Evaluation on Real-life Logs

We further evaluated our method on two real-life event logs, one from a ticketing

management process and the other from an insurance claim handling process. For the

experiments in this section, we used IMfpt for discovering noise-free pre-drift and post-

drift process trees, by setting its noise filtering threshold parameter to 10%. We also

used the A* algorithm to compute the shortest sequence of edit operations to transform

the pre-drift process tree to the post-drift process tree. Furthermore, as explained

in Section 5.5.2 we considered a relative frequency change of at least 50% as a sig-

nificant change, where the relative frequency of the fragment is at least 25% in the

pre-drift or post-drift process tree. The first real-life log,5 obtained from 4TU Data

Centrum,6 contains events from a ticketing management process of the help desk of an

Italian software company. This log contains 21348 events, from 14 activities, and 4580

traces, out of which 226 are distinct. We used the drift detection technique presented in

Chapter 3, by initializing its adaptive windows with 1000 events, and detected 2 drifts

in this log. The first drift occurs at the event index 8757, corresponding to the date

July 25th 2011, and the second one occurs at the event index 17307, corresponding

to the date September 11th 2012. We characterized these two drifts by applying our

fragment-based method to the sub-logs extracted from before and after each drift. The

transformation of the pre-drift process tree to the post-drift process tree over the first

drift is illustrated in Figure 5.27. For the first drift, our method produced a single statement,

reporting on the possibility of skipping the sub-tree marked as “Fragment 1” in

Figure 5.27 after the occurrence of the drift. We did not have access to a ground truth

to validate the obtained results. Therefore, as an alternative we analyzed the directly

follows graphs of the sub-logs from before and after the drift, shown in Figures 5.28a

5 https://doi.org/10.4121/uuid:0c60edf1-6f83-4e75-9367-4c63b3e9d5bb

6 https://data.4tu.nl/repository/



and 5.28b, respectively, to verify the accuracy of the results. We observed the ap-

pearance of a directly follows relation from activity “Assign seriousness” to activity

“Resolve ticket” after the drift. This finding aligns with the output of our method for

this drift.
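The cross-check described above can be sketched in a few lines of Python. This is only an illustration: the two sub-logs below are hypothetical traces built from the activity names of the ticketing process, not data taken from the real log, and the helper name `directly_follows` is an assumption of this sketch.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity a is directly followed by activity b."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


# Hypothetical sub-logs (illustrative only, not the real event data).
pre_drift = [
    ["Assign seriousness", "Take in charge ticket", "Resolve ticket", "Closed"],
    ["Assign seriousness", "Take in charge ticket", "Wait", "Resolve ticket", "Closed"],
]
post_drift = pre_drift + [
    ["Assign seriousness", "Resolve ticket", "Closed"],
]

dfg_pre = directly_follows(pre_drift)
dfg_post = directly_follows(post_drift)

# Arcs that exist only after the drift.
new_arcs = sorted(set(dfg_post) - set(dfg_pre))
print(new_arcs)
```

In this toy example the only new arc runs from “Assign seriousness” to “Resolve ticket”, which is the kind of evidence used above to support a statement about a fragment becoming skippable.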

Figure 5.27: Transformation of pre-drift process tree to post-drift process tree over the first drift in the ticketing management process. [Panels: pre-drift process tree and post-drift process tree; the affected sub-tree is marked “Fragment 1”.]


Figure 5.28: Directly follows graphs of the ticketing management process before and after the first drift. [Panels: (a) directly follows graph before the first drift; (b) directly follows graph after the first drift.]

For the second drift, our method discovered two changes. The transformation of the

pre-drift process tree to the post-drift process tree over the second drift is illustrated in

Figure 5.29. The first discovered change indicates a significant decrease in the relative

frequency of the τ-node 8 from 80% to 40%, while the second change indicates a

significant increase in the relative frequency of activity “Wait” from 18% to 51%.



These two changes are related as activity “Wait” and the τ-node 8 are parented by the

same ×-node 5. To evaluate the accuracy of these changes, we drew the directly

follows graphs of the sub-logs from before and after the drift in Figures 5.30a and 5.30b.

The pre-drift and post-drift graphs show that the frequencies of the outgoing arcs from

activity “Take in charge ticket” to activities “Resolve ticket”, “Wait”, and “Require

upgrade” have changed from 151 (80% of total), 35 (18% of total), and 4 (2% of

total) to 66 (40% of total), 82 (51% of total), and 14 (9% of total), respectively, out of

which the first two changes are considered as significant and reported by our method.

Moreover, in the corresponding process trees, the change in the relative frequency of

activity “Resolve ticket” manifests itself by a change in the relative frequency of the

τ-node 8. These findings conform to the characterization of this drift by our method.
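The significance rule applied to this drift can be illustrated with a short sketch. It assumes that the relative change is measured against the pre-drift frequency, which is only one possible reading of the rule (the exact definition is the one given in Section 5.5.2). The function name, its parameters and the branch labels are hypothetical; the percentages are the ones reported above.

```python
def is_significant(pre_pct, post_pct, min_change=0.5, min_freq=25.0):
    """Decide whether a relative-frequency change is reported.

    Assumptions of this sketch: the change is measured relative to the
    pre-drift frequency, and the fragment must reach min_freq percent
    either before or after the drift.
    """
    if max(pre_pct, post_pct) < min_freq:
        return False
    if pre_pct == 0:
        return post_pct > 0
    return abs(post_pct - pre_pct) / pre_pct >= min_change


# Relative frequencies (in percent) before and after the second drift.
branches = {
    "tau-node 8": (80, 40),
    "Wait": (18, 51),
    "Require upgrade": (2, 9),
}

reported = [name for name, (pre, post) in branches.items()
            if is_significant(pre, post)]
print(reported)
```

Under these assumptions the first two branches are reported, while “Require upgrade” is not, because its relative frequency stays below 25% both before and after the drift.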

Our method completed the characterization of the first and the second drifts in

330ms and 350ms, respectively. It is worth mentioning that since the drift detection

technique is designed to detect sudden drifts, gradual changes that occurred in the process

over the period from the first drift to the second drift, e.g. the insertion of activity

“Require upgrade”, did not trigger the detection of another drift over this period.

Figure 5.29: Transformation of pre-drift process tree to post-drift process tree over the second drift in the ticketing management process. [Panels: pre-drift and post-drift process trees; the branches of ×-node 5 are annotated with relative frequencies 80%, 18% and 2% before the drift and 40%, 51% and 9% after the drift; change label: “Change branching frequency”.]

We also applied the activity-based characterization method to the discovered drifts

in this log, but this method failed to characterize the drifts as it did not report any

changes.

In the second experiment, we employed our fragment-based method to character-

ize drifts in an event log originating from the claims management system of a large

Australian insurance company. The log consists of 61413 events, referring to twelve

distinct activities, and 16365 traces, out of which 172 are distinct. It records cases

of a windscreen claims handling process over a period of 13 months between 2011


Figure 5.30: Directly follows graphs of ticketing management process before and after the second drift. [Panels: (a) directly follows graph before the second drift; (b) directly follows graph after the second drift.]

and 2012. Using our drift detection technique (see Chapter 3) with an adaptive win-

dow size initialized with 7000 events, we detected one drift in this log, at the event

index 13821, corresponding to the date September 19th, 2011. Next, we used our

fragment-based method to characterize this drift. The transformation of the pre-drift

process tree to the post-drift process tree over this drift is illustrated in Figure 5.31. Our

method discovered that Fragment 1 consisting of three sequential activities “Identify

Nil Recovery or Settlement Potential”, “Review Invoice - Motor Glass”, and “Conduct

File Review” in the pre-drift process tree is substituted by Fragment 2 consisting of

two concurrent activities “Confirm Nil Recovery or Settlement Potential” and “Invoice

Paid” in the post-drift process tree. Our method completed the characterization of this

drift in 280ms. We then validated these results with a business analyst from the insur-

ance company, who confirmed our findings and explained the reasons underlying the

identified changes.

Before the drift, by performing activity “Identify Nil Recovery or Settlement Po-

tential” the company tried to claim a fraction of the money paid for every accident

case from other insurance companies involved in the accident. However, as perform-

ing this task for all cases proved to be costly, they decided to perform it only for

cases with certain characteristics, e.g. cases whose cost is below a certain threshold.

Therefore they substituted this activity by a new activity, named “Confirm Nil Recov-

ery or Settlement Potential”. Moreover, during the same time period they automated



Figure 5.31: Transformation of pre-drift process tree to post-drift process tree over the drift in the claim handling process. [Panels: pre-drift process tree containing Fragment 1 and post-drift process tree containing Fragment 2.]

invoice payments, by removing two activities “Review Invoice - Motor Glass” and

“Conduct File Review” and introducing a new activity, named “Invoice Paid”. These

two changes resulted in the substitution of Fragment 1 consisting of three sequential

activities “Identify Nil Recovery or Settlement Potential”, “Review Invoice - Motor

Glass” and “Conduct File Review” in the pre-drift process tree by Fragment 2 consist-

ing of two concurrent activities “Confirm Nil Recovery or Settlement Potential” and

“Invoice Paid” in the post-drift process tree.

We also applied the activity-based characterization method to the discovered drift

in this log. However, this method could only explain the removal of activity “Identify

Nil Recovery or Settlement Potential” from the process, and failed to discover the other

changes.

5.9 Discussion

The accuracy of our fragment-level drift characterization method is dependent on the

accuracy of the process trees discovered by IM from the sub-logs before and after a

drift. The accuracy of a process tree is measured by means of fitness and precision.

Fitness indicates how much of the process behavior in the log is reproducible by the

discovered process tree, while precision quantifies the fraction of the behavior allowed

by the process tree that is actually seen in the log.
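As a rough intuition for these two measures, the following sketch computes set-based versions of fitness and precision over finite sets of traces. This is a deliberate simplification and not the way these measures are computed in practice, where models may allow infinitely many traces and partial matches are taken into account. The function names and the toy models are assumptions of this sketch.

```python
def fitness(log_traces, model_traces):
    """Fraction of distinct log traces that the model can reproduce."""
    log_set = set(log_traces)
    return len(log_set & set(model_traces)) / len(log_set)


def precision(log_traces, model_traces):
    """Fraction of the traces allowed by the model that are seen in the log."""
    model_set = set(model_traces)
    return len(model_set & set(log_traces)) / len(model_set)


log = [("a", "b", "c"), ("a", "c", "b")]

# A model that allows exactly the observed behavior.
precise_model = {("a", "b", "c"), ("a", "c", "b")}

# A more general model that allows every ordering of a, b and c.
general_model = {
    ("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c"),
    ("b", "c", "a"), ("c", "a", "b"), ("c", "b", "a"),
}

print(fitness(log, precise_model), precision(log, precise_model))
print(fitness(log, general_model), precision(log, general_model))
```

Both toy models reproduce the whole log, so their fitness is the same, but the more general model allows four orderings that never occur in the log and therefore has a lower precision.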

IM discovers a process tree by recursively structuring process behavior in a log into

smaller process trees in a top-down manner. At a recursion where IM fails to discover

a process tree that precisely expresses observed process behavior, it generalizes from

the behavior by selecting a fall through, resulting in the discovery of a process tree



with lower precision. Several fall throughs have been defined (see [72]), decreasing in

precision, in the worst case leading to a flower model that allows any behavior with the

activities in the event log. This latter issue is known as over-generalization of process

behavior. Here, we discuss a few factors that may contribute to this issue.

The pre-drift and post-drift sub-logs used for drift characterization should be large

enough to contain a behaviorally representative sample of process executions. In our

experiments in this chapter, we used sub-logs extracted from the events within drift

detection windows (cf. Chapter 3) to discover process trees from before and after

a drift. The size of these windows adapts to the behavioral variability of the log,

ensuring that there are sufficient events within each window to fully capture the process

behavior.

A related problem that often leads to over-generalization is when there are devia-

tions from the normal process behavior in the log, a.k.a. noise. To fit these deviations

into the discovered process tree IM may have to over-generalize from the main process

behavior. One way to tackle this problem is to filter out the noise from the log before

discovering a process tree. There are several noise filtering techniques for event logs,

e.g. [22], and for event streams, e.g. [122]. Alternatively, we can use IMfpt, i.e. a

variant of IM that filters out infrequent behavior (noise) from a log before discovering

a process tree. To avoid introducing false differences between pre-drift and post-drift

process trees as a result of noise filtering, a process behavior should be treated as noise

only if it does not meet the filtering requirements in both pre-drift and post-drift sub-logs.

Finally, if the process behavior in a log is generated by an unstructured process, IM

may over-generalize from the behavior to discover a block-structured process tree.
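The rule for consistent noise filtering across the two sub-logs can be sketched as follows. The sketch works on whole trace variants and a single share threshold, which is an assumption made for brevity and not the filtering mechanism of IMfpt; all function names and the toy sub-logs are hypothetical.

```python
from collections import Counter


def variant_shares(traces):
    """Share of each trace variant among all traces of a sub-log."""
    counts = Counter(tuple(t) for t in traces)
    total = sum(counts.values())
    return {variant: n / total for variant, n in counts.items()}


def noise_in(traces, threshold):
    """Variants that fall below the threshold in a single sub-log."""
    return {v for v, s in variant_shares(traces).items() if s < threshold}


def shared_noise(pre, post, threshold):
    """Variants that fall below the threshold in BOTH sub-logs."""
    pre_s, post_s = variant_shares(pre), variant_shares(post)
    return {v for v in set(pre_s) | set(post_s)
            if pre_s.get(v, 0.0) < threshold and post_s.get(v, 0.0) < threshold}


# Hypothetical sub-logs: the variant a,c,b sits just below 10% before the
# drift and just above 10% after it; a,b,b,c is rare in both.
pre = [["a", "b", "c"]] * 90 + [["a", "c", "b"]] * 9 + [["a", "b", "b", "c"]]
post = [["a", "b", "c"]] * 88 + [["a", "c", "b"]] * 11 + [["a", "b", "b", "c"]]

print(noise_in(pre, 0.10))            # filtering the pre-drift sub-log alone
print(shared_noise(pre, post, 0.10))  # filtering both sub-logs consistently
```

Filtering the pre-drift sub-log on its own would drop the variant a,c,b there while keeping it after the drift, which would surface as a difference between the two trees even though its share barely changed. Filtering jointly removes only the variant that is infrequent on both sides.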

However, even in the scenarios where IM discovers a flower model our method

may still be able to characterize some process changes. To investigate the usability

of our method for logs for which IM discovers a flower model, we simulated a log

from a partially unstructured Petri net model shown in Figure 5.32a. We injected a

drift in the log by applying three changes to the model: • we made Fragment 1, i.e.

an internally unstructured SESE fragment, and activity ‘h’ mutually-exclusive; • we

deleted activity ‘d’ from Fragment 1; and • we swapped the two activities ‘f’ and ‘g’.

We used the drift detection method presented in Chapter 3 to detect the drift. The

transformation of the pre-drift process tree to the post-drift process tree over this drift

is shown in Figure 5.32b. Our method was able to identify the first two changes, while

it missed the third change. The first change is identified as it is applied to Fragment

1 as a whole and activity ‘h’, while the deletion of activity ‘d’ is identifiable without



any knowledge of the internal structure of Fragment 1. On the other hand, IM over-

generalized the internal behavior of Fragment 1 and discovered a flower model as the

internal model of this fragment. As such, our method missed the third change, i.e. the

swap of the two activities ‘f’ and ‘g’. This experiment showed that our method can still

be used for characterizing external changes as well as certain types of internal changes,

e.g. insertion/deletion of activities, applied to fragments containing flower models.
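The reason why a deletion remains visible while a swap does not can be shown with a toy model of a flower. A flower over a set of activities accepts any sequence of those activities, so only its activity set carries information. The activity sets below are assumed for illustration (activities ‘b’ to ‘g’ inside Fragment 1 before the drift, and the same set without ‘d’ afterwards), and the function name is hypothetical.

```python
def flower_accepts(alphabet, trace):
    """A flower model accepts any sequence over its activity set."""
    return all(activity in alphabet for activity in trace)


# Assumed activity sets of the flower discovered for Fragment 1.
pre_alphabet = {"b", "c", "d", "e", "f", "g"}
post_alphabet = {"b", "c", "e", "f", "g"}

# A deleted activity changes the activity set, so it is detectable.
deleted = pre_alphabet - post_alphabet
inserted = post_alphabet - pre_alphabet
print(deleted, inserted)

# A swap only changes the order of activities. Both orders are accepted by
# the flower before and after the drift, so the swap leaves no trace.
for alphabet in (pre_alphabet, post_alphabet):
    print(flower_accepts(alphabet, ["b", "f", "g"]),
          flower_accepts(alphabet, ["b", "g", "f"]))
```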

Figure 5.32: Example of drift characterization in a partially unstructured process. [Panels: (a) Petri net models before and after the drift; (b) transformation of the pre-drift process tree to the post-drift process tree over the drift, annotated with the changes “Make Fragment 1 and activity 'h' mutually-exclusive” and “Delete activity 'd'”.]

To conclude, the process discovery component of our drift characterization method

is isolated from other components, and as such IM can be replaced by any other process

discovery technique that is able to discover process trees from event streams. There-

fore, future advancements in process discovery from event streams can also enhance

the accuracy of our method.

5.10 Summary

In this chapter, we presented a robust, automated method for characterizing process

drifts at the level of fragments, from event streams. We first adapted a state-of-the-

art process discovery technique, Inductive Miner (IM), to discover process trees, i.e.



block-structured process models, from event streams. Next, we used this technique to

discover two process trees, one from the portion of an event stream just before a given

drift, and the other from the portion of stream just after the drift. We then presented

a process tree transformation technique that finds a minimum-cost sequence of edit

operations to transform a pre-drift process tree to a post-drift process tree. The search

for such a sequence is guided by means of process tree mappings, and is supported

by two search algorithms, an exhaustive A*-based algorithm and a fast greedy algo-

rithm, which find the optimal solution or a close approximation of it. The definition of

edit operations and their costs is such that the method is able to characterize changes

applied to fragments of any size, from individual activities to larger fragments. More-

over, the hierarchical structure of process trees allows the characterization of complex

changes such as overlapping changes as well as nested changes. Furthermore, as the

edit operations are defined based on a well-established set of typical business process

change patterns, the identified fragment-level changes can easily be translated into con-

cise natural language statements based on those patterns. Finally, the proposed method

can also characterize process drifts detected from event logs of complete traces, and

can also be used on top of any process drift detection technique so long as it is fed with

a pre-drift and a post-drift sub-log.

We extensively evaluated our method for fragment-level drift characterization us-

ing both highly variable artificial logs, with and without noise, as well as two real-life

logs. The results on the artificial logs show that our method is fast, noise-tolerant,

highly accurate and concise in characterizing drifts induced by the application of typi-

cal process changes to fragments of different size. Furthermore, when using the greedy

algorithm for process tree transformation, the method can scale up to the extent that it

can work in real-time. In the experiments with real-life logs, our method could fully

characterize the identified drifts. Despite the lack of a ground truth to validate the re-

sults in the experiment with the log of the ticketing management process, the results

were supported by various observations from the log. For the experiment with the

log of the insurance claims management process, a business analyst who works with

the process in question confirmed our findings. While our fragment-level drift char-

acterization method outperforms the activity-level characterization method presented

in Chapter 4, as far as non-overlapping and overlapping fragments are concerned, the

latter method provides more accurate results when drifts involve individual activities.


Chapter 6

Conclusion

Today’s business processes are designed to support flexibility and change to remain

effective and efficient in the dynamic environments in which they operate. This allows

process stakeholders to change the way in which they execute processes in response

to various factors such as changes in regulations, supply, demand as well as internal

changes in resource capacity or workload, or simply changes in seasonal factors. Some

process changes are planned ahead, while others occur unexpectedly and are often

undocumented. Examples of the latter are process changes undertaken by individuals

as workarounds in emergency situations or changes due to the replacement of human

resources. Such changes over time may reduce process performance and in general

undermine process improvement initiatives.

In this regard, several techniques have been proposed to detect process drifts, i.e.

statistically significant changes in the process behavior. However, existing techniques

have some limitations. First, they do not work with event streams, and as such are not

able to detect intra-trace drifts, or they detect them with a long delay. Although de-

tecting drifts in an offline setting from historical event logs is helpful for post-mortem

analysis, organizations can fully exploit the benefits of drift detection only when it is

deployed in an online setting over streams of events, as it enables process stakeholders

to take timely corrective measures and avoid or reduce the impact of unintended con-

sequences. Furthermore, existing techniques do not perform well with highly-variable

business processes, e.g. hospital processes, whose logs feature high trace variability.

Finally, they only focus on the detection of drifts in event logs without providing any

solution for characterizing process changes underpinning the drifts.

In this thesis we tackled three research questions:

i) How to detect a drift from an event stream or event log of a business process?


ii) How to characterize a process drift at activity level from an event stream or event

log of a business process?

iii) How to characterize a process drift at fragment level from an event stream or event

log of a business process?

To tackle the first research question, we proposed a fully-automated method for de-

tecting drifts from event streams of business processes. We performed statistical tests

over distributions of behavioral relations between activities such as causality, conflict

and concurrency, as observed from two juxtaposed windows of adjustable size, sliding

along with the stream. Given that behavioral relations between activities are a type

of sub-trace features, the method does not suffer from low accuracy when the log is

highly variable. Furthermore, the method is capable of detecting inter-trace as well as

intra-trace drifts. By replaying an event log as an event stream the proposed method

can also be used for detecting drifts in event logs.
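For readers unfamiliar with these behavioral relations, the sketch below derives causality, concurrency and conflict from the directly-follows relation of a small set of traces, using the basic definitions of the α algorithm. It is a simplification: it works on complete traces rather than on a sliding window over an event stream, and it ignores the length-two loop relations handled by the α+ algorithm. The function name and the example traces are hypothetical.

```python
def behavioral_relations(traces):
    """Classify each unordered pair of activities.

    "->" : causality from the first to the second activity
    "<-" : causality from the second to the first activity
    "||" : concurrency (each directly follows the other somewhere)
    "#"  : conflict (neither ever directly follows the other)
    """
    activities = sorted({a for t in traces for a in t})
    follows = {(a, b) for t in traces for a, b in zip(t, t[1:])}
    relations = {}
    for i, a in enumerate(activities):
        for b in activities[i + 1:]:
            ab, ba = (a, b) in follows, (b, a) in follows
            if ab and ba:
                relations[(a, b)] = "||"
            elif ab:
                relations[(a, b)] = "->"
            elif ba:
                relations[(a, b)] = "<-"
            else:
                relations[(a, b)] = "#"
    return relations


# Hypothetical traces in which b and c are executed in either order.
window = [["a", "b", "c", "d"], ["a", "c", "b", "d"]]
relations = behavioral_relations(window)
for pair, symbol in relations.items():
    print(pair, symbol)
```

Counting how often each relation is observed in two adjacent windows yields the kind of frequency distributions that a statistical test can then compare.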

To tackle the second research question, we proposed a fully-automated method for

characterizing process drifts at the level of individual activities from event streams.

For each detected drift, we first extract behavioral relations between activities such as

causality, concurrency and conflict, from event streams before and after a drift. By per-

forming a statistical test we then assess the significance of associations between each

behavioral relation and the drift. This allows us to identify relations with the highest

explanatory power with respect to the drift. Those relations are then mapped to a pre-

defined set of change templates. Finally, the best-matching templates are reported to

the user as natural language statements. The collection of change templates that we

use to describe a drift is based on a well-established categorization of common busi-

ness process change patterns and can also easily be extended. Moreover, the method

may also be used on top of any process drift detection technique so long as it is provided with

the point at which a drift occurs. By replaying an event log as an event stream, the

proposed method can also be used to characterize drifts in event logs. Furthermore,

the method can scale up to the extent that it can work in real-time. To the best of

our knowledge, this is the first method that provides a systematic solution to the drift

characterization problem.

To tackle the third question, we presented a fully-automated method for charac-

terizing process drifts at the level of fragments, from event streams. We first adapted

Inductive Miner to discover process trees from event streams. Next, we used this tech-

nique to discover two process trees, one from the portion of an event stream just before

a given drift, and the other from the portion of stream just after the drift. We then


presented a process tree transformation technique that finds a minimum-cost sequence

of edit operations to transform a pre-drift process tree to a post-drift process tree. The

search for such a sequence is guided by means of process tree mappings, and is supported

by two search algorithms, an exhaustive A*-based algorithm and a fast greedy algorithm,

which, respectively, find and closely approximate the optimal solution. The definition

of edit operations and their costs is such that the method is able to characterize changes

applied to fragments of any size, from individual activities to larger fragments. More-

over, the hierarchical structure of process trees allows the characterization of complex

changes such as overlapping changes as well as nested changes. As the edit operations

are defined based on common change patterns, the identified fragment-level changes

can easily be translated into concise natural language statements based on those pat-

terns. Furthermore, the proposed method can characterize process drifts detected from

event logs of complete traces, and can also be used on top of any process drift detection

technique so long as it is fed with a pre-drift and a post-drift sub-log. Finally,

when using the greedy algorithm for process tree transformation, the method can scale

up to the extent that it can work in real-time. To the best of our knowledge, this is the

first method that can characterize a process drift at the level of fragments.

We implemented the proposed methods as plug-ins for the open-source process

analytics platform Apromore as well as a standalone command-line tool, namely Pro-

Drift. Using the latter, we extensively evaluated the accuracy and scalability of the

three proposed methods by simulating event streams from highly variable artificial

and real-life logs. The results of the experiments show that the three methods sat-

isfy all the evaluation criteria defined in this thesis. Specifically, the proposed drift

detection method is able to detect process drifts induced by the application of com-

mon change patterns with high accuracy and minimum delay. In doing so, it does

not need any manual intervention and can scale up to the extent that it can work in

real-time. The proposed drift characterization methods are able to work without any

manual intervention and can accurately characterize common change patterns via ex-

planatory natural language statements. By comparing the two methods for character-

izing changes involving fragments of different sizes, in event streams with and without

noise, we observed that while both methods can handle noise well, the method pro-

posed in Chapter 4 is well-suited for non-overlapping activity-level changes, as it uses

features with lower levels of abstraction to capture the process behavior, and benefits

from a statistically-grounded mechanism for identifying change patterns that best ex-

plain a drift. On the other hand, the method proposed in Chapter 5 performs better



at characterizing changes applied to fragments of multiple activities as well as nested

changes. The accuracy of the latter method is dependent on the accuracy of process

trees discovered by IM from before and after a drift. As such, this method is capable

of identifying a complete set of process changes underpinning a drift so long as they

manifest themselves in the discovered process trees.

There are several ways the work presented in this thesis can be extended. As de-

scribed in Chapter 2, there are four classes of drifts: sudden, gradual, recurring and

incremental. The drift detection method in Chapter 3 focuses on detecting sudden

drifts. As such, when applied to event streams containing other classes of drifts it

either fails to detect them or it detects them as sudden drifts. Therefore, an avenue

for future work is to devise methods for detecting other classes of drifts from event

streams. For gradual drifts, our method may easily be extended by the same strategy

used in [78] for detecting gradual drifts in trace streams. That is, to apply a statistical

test to the behavioral relations between two consecutive sudden drifts to determine if

those sudden drifts represent separate changes, or they define the start and end of a

single gradual drift. A recurring drift may also be detected by first detecting two con-

secutive sudden drifts and then using a statistical test to determine if the distributions of

behavioral relations before the first and after the second drift are the same. Similarly,

an incremental drift may be identified by detecting a sequence of minor sudden drifts

using statistical tests on the distributions of behavioral relations in smaller windows

sliding over the event stream between two major sudden drifts.
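The idea for recurring drifts can be illustrated with a small sketch. It uses Pearson's chi-square test of homogeneity purely as an example of a suitable statistical test, implemented by hand and compared against the standard critical value for two degrees of freedom at a significance level of 0.05. The relation counts are hypothetical, and the sketch assumes every relation is observed at least once across the two samples.

```python
def chi_square_statistic(observed_a, observed_b):
    """Pearson chi-square statistic for a 2 x k contingency table."""
    total_a, total_b = sum(observed_a), sum(observed_b)
    grand_total = total_a + total_b
    statistic = 0.0
    for a, b in zip(observed_a, observed_b):
        column_total = a + b
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Critical value of the chi-square distribution, 2 degrees of freedom, alpha = 0.05.
CRITICAL_VALUE = 5.991

# Hypothetical frequencies of three behavioral relations in three periods.
before_first_drift = [120, 40, 40]
between_drifts = [60, 90, 50]
after_second_drift = [118, 43, 39]

changed = chi_square_statistic(before_first_drift, between_drifts)
returned = chi_square_statistic(before_first_drift, after_second_drift)

print(changed > CRITICAL_VALUE)    # behavior differs between the drifts
print(returned < CRITICAL_VALUE)   # behavior after the second drift matches the start
```

In this toy setting the distribution changes after the first drift and is statistically indistinguishable from the original one after the second drift, which is the signature of a recurring drift.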

Another avenue for future work is to characterize other classes of drifts. In this

respect, it is particularly interesting to characterize gradual drifts and understand how

process behavior transitions over time as well as incremental drifts, and identify pro-

cess changes in each increment.

The drift detection and characterization methods presented in this thesis assume

no a-priori knowledge of process models, i.e. they only rely on event data to detect

and characterize a process drift. However, some of the drifts identified may be mod-

eled in a corresponding business process. As such, by looking at a normative model

of the process we can differentiate between actual drifts in the process behavior and

drifts caused by the observation of a process behavior that is modeled but has not been

executed before (false positives). Therefore, another direction for future work is to en-

hance the proposed methods to benefit from valuable information provided by existing

process models when detecting and characterizing process drifts. A starting point here

would be to use a conformance checking technique [117, 123, 86, 41, 101] to verify


how much of the process behavior observed in the event data after a detected drift can

be replayed by a corresponding process model.

A change may impact more than one process within an organization, e.g. a new

regulation requires a new check in multiple processes of an organization. Therefore, a

further direction for future work is to study the relation between changes in different

processes within an organization and develop methods to identify organizational level

changes. For example, discovering drifts over the same time period in the behavior

of multiple processes of an organization may indicate the existence of an underlying

change on the organizational level during that time period.
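
The example above can be sketched as a simple grouping of drift times across processes. This is a minimal illustration only: the function name, the numeric timestamps, the `window` tolerance and the `min_processes` threshold are assumptions introduced here, and a real method would also need to account for detection delay.

```python
def organizational_change_candidates(drifts, window, min_processes=2):
    """Group drifts of different processes that fall in the same period.

    drifts        -- dict mapping a process name to a list of drift times
    window        -- maximum distance from the first drift of a group
    min_processes -- minimum number of distinct processes in a group
    Returns a list of (first_time, last_time, sorted_process_names).
    """
    events = sorted((t, p) for p, times in drifts.items() for t in times)
    candidates = []
    i = 0
    while i < len(events):
        start = events[i][0]
        j = i
        # Extend the group while drifts stay within the window.
        while j < len(events) and events[j][0] - start <= window:
            j += 1
        processes = {p for _, p in events[i:j]}
        if len(processes) >= min_processes:
            candidates.append((start, events[j - 1][0], sorted(processes)))
            i = j  # continue after the group just reported
        else:
            i += 1
    return candidates
```

Each returned tuple marks a time period in which several processes drifted together, which may point to an underlying organizational-level change.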

Another opportunity for future work is to study the interplay between changes

in the process control flow and changes in other process perspectives. For example,

changes in the control flow may be induced by changes in the data or resource per-

spectives of the process. A starting point is to look at the work in [94], which analyses

the dynamics of human resource behavior as observed from event logs, as well as the

time series analysis approach in [53], which detects cause-effect relations between a

set of business process characteristics and process performance indicators.

In Chapter 4 we pre-defined a set of change templates and proposed a method to

identify valid instantiations of those templates to characterize a drift. However, drifts

may also occur due to changes that follow a different pattern than those already defined.

Consequently, such drifts do not engender the instantiation of any of those pre-defined

change templates. Therefore, another avenue for future work is to develop a technique

to automatically learn new change templates from detected drifts.

Another avenue for future work is to provide a visual description of the change

patterns identified by a drift characterization method as a simple and effective way to

communicate the characteristics of the drift, as in [8, 21].


Appendix A

Notation

The notation used in this thesis is summarized below.

Notation                  Meaning

L                         Set of activity labels
α+                        α+ algorithm [25]
a >_L b                   Directly follows relation from label a to label b in log L
a △_L b                   Length-two loop relation from label a to label b in log L
a ◇_L b                   Length-two loop relation between label a and label b in log L
a →_L b                   Causality relation from label a to label b in log L
a ‖_L b                   Parallel relation between label a and label b in log L
a #_L b                   Conflict relation between label a and label b in log L
Φ(w)                      Returns the size of oscillation filter, i.e. the number of consecutive statistical tests whose P-value should remain below a certain threshold to declare a drift
RFC                       Relative frequency change of an α+ relation over a drift
TRFC                      Total relative frequency change, i.e. the sum of RFCs of all α+ relations over a drift
CRFC                      Cumulative relative frequency change, computed by x% · TRFC, where x% indicates the proportion of TRFC over a drift that is used for drift characterization
Rank(b, �, B)             Returns the index of b in finite set B with total order �
I_{D,T}                   Valid instantiation of template T through drift feature set D, mapping every variable in T to a label from D
C(I_{D,T})                Confidence of drift feature set D matching template T through I_{D,T}, indicating the quality of ranking relations in D with regards to their predefined importance in T
iC(T)                     Ideal confidence of template T, indicating the highest possible confidence for T
nC(I_{D,T})               Normalized confidence of drift feature set D matching template T through I_{D,T}, obtained by dividing C(I_{D,T}) by iC(T)
V(T)                      Set of nodes in tree T
E(T)                      Set of edges in tree T
|T|                       Size of tree T, equaling |V(T)|
root(T)                   Root node of tree T
T⟨v⟩                      Subtree of tree T rooted at node v ∈ T


Down_T(v)                 Sequence of nodes on the shortest path from root(T) to node v ∈ T
leaves(v)                 Set of leaves under internal node v
l(v)                      Label of node v
dep(v)                    Depth of node v ∈ T, equaling |Down_T(v)| − 1
dep(T)                    Depth of tree T, equaling the maximum depth of its nodes
CA(v1, ..., vn)           Set of common ancestors of nodes v1, ..., vn in tree T, i.e. nodes in Down_T(v1) ∩ ... ∩ Down_T(vn)
LCA(v1, ..., vn)          Lowest common ancestor of nodes v1, ..., vn in tree T, i.e. the deepest node in CA(v1, ..., vn)
LCA(T⟨v1⟩, ..., T⟨vn⟩)    Lowest common ancestor of subtrees T⟨v1⟩, ..., T⟨vn⟩, i.e. LCA(v1, ..., vn)
×                         Exclusive choice operator
∧                         Concurrency operator
→                         Sequence operator
⟲                         Loop operator
P = ⊕(P1, ..., Pn)        Process tree P rooted at operator node ⊕ with subtrees P1 ... Pn
τ-node                    Leaf node t in process tree, representing the language with the empty trace, l(t) ∈ {τ}
C(v)                      Set of activity nodes under operator node v ∈ P, containing the activity nodes in P⟨v⟩
pre_P(v)                  Pre-order index of node v in process tree P
P[i]                      Node with the pre-order index of i in P
Rankk(v, ⊕)               Returns the rank of node v in ordered operator node ⊕
S                         Singularity reduction rule
A_×                       Associativity reduction rule for × operator
A_∧                       Associativity reduction rule for ∧ operator
A_→                       Associativity reduction rule for → operator
T_→                       τ reduction rule for → operator
T_∧                       τ reduction rule for ∧ operator
SUB_⊕                     Operator substitution edit operation
SUB_ac                    Activity substitution edit operation
D_f                       Fragment deletion edit operation
D_⟲                       ⟲-operator deletion edit operation
I_f                       Fragment insertion edit operation
I_⟲                       ⟲-operator insertion edit operation
LMAs_M(v, v′)             Lowest mapped ancestors (LMAs) of nodes v and v′ in mapping M
MST(P, P′)                Mapping search tree between process trees P and P′
g∗(v)                     Returns the mapping cost up to node v in a mapping search tree
h∗(v)                     Returns an estimation of the cost of mapping nodes in P and P′ that have not yet been mapped up to node v ∈ MST(P, P′)
stringify(F)              Returns a unique and stable textual representation of fragment F
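
The tree notation above can be made concrete with a small sketch. It is illustrative only: the representation of a tree as a child-to-parent dictionary and the function names are assumptions made here, chosen to mirror Down_T(v), dep(v), dep(T), CA and LCA as defined in the table.

```python
def down(parent, v):
    """Down_T(v): nodes on the path from the root to v (both included)."""
    path = []
    while v is not None:
        path.append(v)
        v = parent[v]  # the root maps to None
    return path[::-1]


def dep(parent, v):
    """dep(v) = |Down_T(v)| - 1."""
    return len(down(parent, v)) - 1


def tree_depth(parent):
    """dep(T): the maximum depth over all nodes of the tree."""
    return max(dep(parent, v) for v in parent)


def common_ancestors(parent, *nodes):
    """CA(v1, ..., vn): intersection of Down_T(v1), ..., Down_T(vn)."""
    return set.intersection(*(set(down(parent, v)) for v in nodes))


def lca(parent, *nodes):
    """LCA(v1, ..., vn): the deepest node in CA(v1, ..., vn)."""
    return max(common_ancestors(parent, *nodes), key=lambda v: dep(parent, v))
```

Because Down_T(v) contains v itself, a node counts as its own ancestor under these definitions, so the LCA of a node and one of its descendants is that node.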


Bibliography

[1] R. Accorsi and T. Stocker. Discovering workflow changes with time-based trace

clustering. In Data-Driven Process Discovery and Analysis. Springer, 2012.

[2] I. Ada and M. R. Berthold. Eve: a framework for event detection. Evolving

systems, 4(1):61–70, 2013.

[3] M. Adams, A. H. Ter Hofstede, D. Edmond, and W. M. Van Der Aalst. Worklets:

A service-oriented implementation of dynamic flexibility in workflows. In OTM

Confederated International Conferences “On the Move to Meaningful Internet

Systems”, pages 291–308. Springer, 2006.

[4] M. J. Adams. Facilitating dynamic flexibility and exception handling for work-

flows. PhD thesis, Queensland University of Technology, 2007.

[5] T. Akutsu, D. Fukagawa, A. Takasu, and T. Tamura. Exact algorithms for com-

puting the tree edit distance between unordered trees. Theoretical Computer

Science, 412(4-5):352–364, 2011.

[6] H. H. Ang, V. Gopalkrishnan, I. Zliobaite, M. Pechenizkiy, and S. C. Hoi. Pre-

dictive handling of asynchronous concept drifts in distributed environments.

IEEE Transactions on Knowledge and Data Engineering, 25(10):2343–2355,

2013.

[7] P. Arabie and L. J. Hubert. An overview of combinatorial data. Clustering and

classification, page 5, 1996.

[8] A. Armas-Cervantes, P. Baldan, M. Dumas, and L. García-Bañuelos. Behavioral

comparison of process models based on canonically reduced event structures. In

BPM. Springer, 2014.

[9] P. Berkhin. A survey of clustering data mining techniques. In Grouping multi-

dimensional data, pages 25–71. Springer, 2006.


[10] A. Bolt, M. de Leoni, and W. M. van der Aalst. Process variant comparison: Us-

ing event logs to detect differences in behavior and business rules. Information

Systems, 2017.

[11] R. P. J. C. Bose, W. M. P. van der Aalst, I. Zliobaite, and M. Pechenizkiy. Han-

dling concept drift in process mining. In CAiSE. Springer, 2011.

[12] R. P. J. C. Bose, W. M. P. van der Aalst, I. Zliobaite, and M. Pechenizkiy. Deal-

ing with concept drifts in process mining. IEEE Transactions on NNLS, 2014.

[13] A. Bouchachia. Fuzzy classification in dynamic environments. Soft Computing,

15(5):1009–1022, 2011.

[14] J. C. Buijs, M. La Rosa, H. A. Reijers, B. F. van Dongen, and W. M. van der

Aalst. Improving business process models using observed behavior. In Inter-

national Symposium on Data-Driven Process Discovery and Analysis, pages

44–59. Springer, 2012.

[15] J. C. Buijs, B. F. Van Dongen, and W. M. van Der Aalst. On the role of fitness,

precision, generalization and simplicity in process discovery. In OTM Confederated

International Conferences “On the Move to Meaningful Internet Systems”,

pages 305–322. Springer, 2012.

[16] A. Burattin, M. Cimitile, F. M. Maggi, and A. Sperduti. Online Discovery of

Declarative Process Models from Event Streams. IEEE Trans. on Services Com-

puting, 8:833–846, 2015.

[17] A. Burattin, A. Sperduti, and W. M. van der Aalst. Control-flow discovery from

event streams. In Evolutionary Computation (CEC), 2014 IEEE Congress on,

pages 2420–2427. IEEE, 2014.

[18] J. Carmona and R. Gavalda. Online techniques for dealing with concept drift

in process mining. In International Symposium on Intelligent Data Analysis.

Springer, 2012.

[19] P. Castagliola, G. Celano, S. Fichera, and G. Nenes. The variable sample size

t control chart for monitoring short production runs. The International Journal

of Advanced Manufacturing Technology, 66(9-12):1353–1366, 2013.



[20] G. Celano, A. Costa, and S. Fichera. Statistical design of variable sample size

and sampling interval x control charts with run rules. The International Journal

of Advanced Manufacturing Technology, 28(9-10):966–977, 2006.

[21] A. A. Cervantes, N. R. van Beest, M. La Rosa, M. Dumas, and L. García-

Bañuelos. Interactive and incremental business process model repair. In OTM



[22] R. Conforti, M. La Rosa, and A. H. ter Hofstede. Filtering out infrequent be-

havior from business process event logs. IEEE Transactions on Knowledge and

Data Engineering, 29(2):300–314, 2017.

[23] T. Dasu, S. Krishnan, S. Venkatasubramanian, and K. Yi. An information-

theoretic approach to detecting changes in multi-dimensional data streams. In

Proc. Symp. on the Interface of Statistics, Computing Science, and Applications.

Citeseer, 2006.

[24] I. Davies, P. Green, M. Rosemann, M. Indulska, and S. Gallo. How do practi-

tioners use conceptual modeling in practice? Data & Knowledge Engineering,

58(3):358–380, 2006.

[25] A. A. de Medeiros, B. F. van Dongen, W. M. P. Van der Aalst, and A. Weijters.

Process mining: Extending the α-algorithm to mine short loops. Technical

report, 2004.

[26] J. de San Pedro and J. Cortadella. Discovering duplicate tasks in transition

systems for the simplification of process models. In International Conference

on Business Process Management, pages 108–124. Springer, 2016.

[27] E. D. Demaine, S. Mozes, B. Rossman, and O. Weimann. An optimal de-

composition algorithm for tree edit distance. ACM Transactions on Algorithms

(TALG), 6(1):2, 2009.

[28] A. Dries and U. Rückert. Adaptive concept drift detection. Statistical Analysis

and Data Mining: The ASA Data Science Journal, 2(5-6):311–327, 2009.

[29] R. O. Duda, P. E. Hart, et al. Pattern classification and scene analysis, volume 3.

Wiley New York, 1973.



[30] C. Dugast, P. Beyerlein, and R. Haeb-Umbach. Application of clustering tech-

niques to mixture density modelling for continuous-speech recognition. In

Acoustics, Speech, and Signal Processing, 1995. ICASSP-95., 1995 Interna-

tional Conference on, volume 1, pages 524–527. IEEE, 1995.

[31] S. Dulucq and H. Touzet. Decomposition algorithms for the tree edit distance

problem. Journal of Discrete Algorithms, 3(2):448–471, 2005.

[32] M. Dumas, M. La Rosa, J. Mendling, H. A. Reijers, et al. Fundamentals of

business process management, volume 1. Springer, 2013.

[33] R. Elwell and R. Polikar. Incremental learning of concept drift in nonstation-

ary environments. IEEE Transactions on Neural Networks, 22(10):1517–1531,

2011.

[34] D. Fahland and W. M. van der Aalst. Model repair—aligning process models to

reality. Information Systems, 47:220–243, 2015.

[35] G. Forman. Tackling concept drift by temporal inductive transfer. In Proceed-

ings of the 29th annual international ACM SIGIR conference on Research and

development in information retrieval, pages 252–259. ACM, 2006.

[36] E. Frank and I. H. Witten. Using a permutation test for attribute selection in de-

cision trees. In International Conference on Machine Learning. Morgan Kauf-

mann, 1998.

[37] D. Fukagawa, T. Tamura, A. Takasu, E. Tomita, and T. Akutsu. A clique-based

method for the edit distance between unordered trees and its application to anal-

ysis of glycan structures. BMC bioinformatics, 12(1):S13, 2011.

[38] J. Gama, P. Medas, G. Castillo, and P. Rodrigues. Learning with drift detec-

tion. In Brazilian symposium on artificial intelligence, pages 286–295. Springer,

2004.

[39] J. Gama, I. Zliobaite, A. Bifet, M. Pechenizkiy, and A. Bouchachia. A survey

on concept drift adaptation. ACM Computing Surveys (CSUR), 2014.

[40] J. Gao, W. Fan, J. Han, and P. S. Yu. A general framework for mining concept-

drifting data streams with skewed distributions. In Proceedings of the 2007

SIAM International Conference on Data Mining, pages 3–14. SIAM, 2007.



[41] L. García-Bañuelos, N. R. van Beest, M. Dumas, M. La Rosa, and W. Mertens.

Complete and interpretable conformance checking of business processes. IEEE

Transactions on Software Engineering, 44(3):262–290, 2018.

[42] J. Gebauer and F. Schober. Information system flexibility and the cost effi-

ciency of business processes. Journal of the Association for Information Sys-

tems, 7(3):8, 2006.

[43] J. B. Gomes, E. Menasalvas, and P. A. Sousa. Learning recurring concepts from

data streams with a context-aware ensemble. In Proceedings of the 2011 ACM

symposium on applied computing, pages 994–999. ACM, 2011.

[44] C. W. Günther, S. Rinderle-Ma, M. Reichert, W. M. Van Der Aalst, and J.

Recker. Using process mining to learn from process changes in evolutionary

systems. International Journal of Business Process Integration and Manage-

ment, 3(1):61–78, 2008.

[45] T. Hagerup and C. Rüb. A guided tour of Chernoff bounds. Information

processing letters, 33(6):305–308, 1990.

[46] J. Han, M. Kamber, and J. Pei. Data mining, southeast asia edition: Concepts

and techniques. Morgan kaufmann, 2006.

[47] S. Haridy, A. Maged, S. Kaytbay, and S. Araby. Effect of sample size on the

performance of shewhart control charts. The International Journal of Advanced

Manufacturing Technology, 90(1-4):1177–1185, 2017.

[48] P. Harremoes and G. Tusnady. Information divergence is more χ2-distributed

than the χ2-statistics. IEEE ISIT, pages 533–537, 2012.

[49] P. Heinl, S. Horn, S. Jablonski, J. Neeb, K. Stein, and M. Teschke. A com-

prehensive approach to flexibility in workflow management systems. In ACM

SIGSOFT Software Engineering Notes, volume 24, pages 79–88. ACM, 1999.

[50] D. P. Helmbold and P. M. Long. Tracking drifting concepts by minimizing

disagreements. Machine learning, 14(1):27–45, 1994.

[51] S. Higuchi, T. Kan, Y. Yamamoto, and K. Hirata. An a* algorithm for computing

edit distance between rooted labeled unordered trees. In JSAI-isAI Workshops,




[52] S.-S. Ho. A martingale framework for concept change detection in time-varying

data streams. In Proc. of ICML, pages 321–327. ACM, 2005.

[53] B. F. Hompes, A. Maaradji, M. La Rosa, M. Dumas, J. C. Buijs, and W. M.

van der Aalst. Discovering causal factors explaining business process perfor-

mance variation. In International Conference on Advanced Information Systems

Engineering, pages 177–192. Springer, 2017.

[54] Y. Horesh, R. Mehr, and R. Unger. Designing an a* algorithm for calculating

edit distance between rooted-unordered trees. Journal of Computational Biol-

ogy, 13(6):1165–1176, 2006.

[55] K. Järvelin and J. Kekäläinen. Cumulated gain-based evaluation of IR

techniques. ACM Transactions on Information Systems (TOIS), 2002.

[56] K. Jensen. Coloured petri nets. In Petri nets: central models and their proper-

ties, pages 248–299. Springer, 1987.

[57] P. N. Klein. Computing the edit-distance between unrooted ordered trees. In

ESA, volume 98, pages 91–102. Springer, 1998.

[58] R. Klinkenberg. Learning drifting concepts: Example selection vs. example

weighting. Intelligent data analysis, 8(3):281–300, 2004.

[59] R. Klinkenberg and T. Joachims. Detecting concept drift with support vector

machines. In ICML, pages 487–494, 2000.

[60] J. Z. Kolter and M. A. Maloof. Dynamic weighted majority: A new ensemble

method for tracking concept drift. In Data Mining, 2003. ICDM 2003. Third

IEEE International Conference on, pages 123–130. IEEE, 2003.

[61] S. Kondo, K. Otaki, M. Ikeda, and A. Yamamoto. Fast computation of the

tree edit distance between unordered trees using ip solvers. In International

Conference on Discovery Science, pages 156–167. Springer, 2014.

[62] P. Kosina, J. Gama, and R. Sebastiao. Drift severity metric. In ECAI, pages

1119–1120, 2010.

[63] S. Kullback and R. A. Leibler. On information and sufficiency. The annals of

mathematical statistics, 22(1):79–86, 1951.



[64] M. La Rosa, M. Dumas, R. Uba, and R. Dijkman. Business process model

merging: An approach to business process consolidation. ACM Transactions on

Software Engineering and Methodology (TOSEM), 22(2):11, 2013.

[65] M. La Rosa, H. A. Reijers, W. M. Van Der Aalst, R. M. Dijkman, J. Mendling,

M. Dumas, and L. García-Bañuelos. Apromore: An advanced process model

repository. Expert Systems with Applications, 38(6):7029–7040, 2011.

[66] C. Lanquillon. Enhancing text classification to improve information filtering.

PhD thesis, Otto-von-Guericke-Universität Magdeburg, Universitätsbibliothek,

2001.

[67] M. M. Lazarescu, S. Venkatesh, and H. H. Bui. Using multiple windows to track

concept drift. Intelligent data analysis, 8(1):29–59, 2004.

[68] S. J. J. Leemans, D. Fahland, and W. M. P. van der Aalst. Discovering block-

structured process models from event logs - A constructive approach. In J. M.

Colom and J. Desel, editors, Application and Theory of Petri Nets and Concur-

rency - 34th International Conference, PETRI NETS 2013, Milan, Italy, June

24-28, 2013. Proceedings, volume 7927 of Lecture Notes in Computer Science,



[69] S. J. J. Leemans, D. Fahland, and W. M. P. van der Aalst. Discovering block-

structured process models from event logs containing infrequent behaviour. In

N. Lohmann, M. Song, and P. Wohed, editors, Business Process Management

Workshops - BPM 2013 International Workshops, Beijing, China, August 26,

2013, Revised Papers, volume 171 of Lecture Notes in Business Information

Processing, pages 66–78. Springer, 2013.


[70] S. J. J. Leemans, D. Fahland, and W. M. P. van der Aalst. Discovering block-

structured process models from incomplete event logs. In G. Ciardo and E.

Kindler, editors, Application and Theory of Petri Nets and Concurrency - 35th

International Conference, PETRI NETS 2014, Tunis, Tunisia, June 23-27, 2014.

Proceedings, volume 8489 of Lecture Notes in Computer Science, pages 91–

110. Springer, 2014.

[71] S. J. Leemans, D. Fahland, and W. M. P. van der Aalst. Discovering block-

structured process models from event logs - a constructive approach. In International

Conference on Applications and Theory of Petri Nets and Concurrency.

Springer, 2013.

[72] S. Leemans. Robust process mining with guarantees. PhD thesis,

Eindhoven University of Technology, 2017.

[73] V. Leno, A. Armas-Cervantes, M. Dumas, M. La Rosa, and F. M. Maggi. Dis-

covering process maps from event streams. In Proceedings of the 2018 Interna-

tional Conference on Software and System Process, pages 86–95. ACM, 2018.

[74] T. Li, T. He, Z. Wang, Y. Zhang, and D. Chu. Unraveling process evolution by

handling concept drifts in process mining. In Services Computing (SCC), 2017

IEEE International Conference on, pages 442–449. IEEE, 2017.

[75] X. Lu, D. Fahland, F. J. van den Biggelaar, and W. M. van der Aalst. Handling

duplicated tasks in process discovery by refining event labels. In International

Conference on Business Process Management, pages 90–107. Springer, 2016.

[76] H. Luo and Z. Wu. Optimal np control charts with variable sample sizes or

variable sampling intervals. Economic Quality Control, 17(1):39–61, 2002.

[77] A. Maaradji, M. Dumas, M. La Rosa, and A. Ostovar. Fast and Accurate Busi-

ness Process Drift Detection. In Proc. of BPM, 2015.

[78] A. Maaradji, M. Dumas, M. La Rosa, and A. Ostovar. Detecting sudden and

gradual drifts in business processes from execution traces. IEEE Transactions

on Knowledge and Data Engineering, 29(10):2140–2154, 2017.

[79] F. M. Maggi, A. Burattin, M. Cimitile, and A. Sperduti. Online process discov-

ery to detect concept drifts in ltl-based declarative process models. In On the

Move to Meaningful Internet Systems: OTM 2013 Conferences, pages 94–111.

Springer, 2013.

[80] O. Maimon and L. Rokach. Data mining and knowledge discovery handbook,

volume 2. Springer, 2005.

[81] M. Maisenbacher and M. Weidlich. Handling concept drift in predictive process

monitoring. In Services Computing (SCC), 2017 IEEE International Conference

on, pages 1–8. IEEE, 2017.



[82] J. Martjushev, R. J. C. Bose, and W. M. van der Aalst. Change point detec-

tion and dealing with gradual and multi-order dynamics in process mining. In

International Conference on Business Informatics Research, pages 161–178.

Springer, 2015.

[83] D. Massart and L. Kaufman. The interpretation of analytical chemical data by

the use of cluster analysis. Chemical analysis. Wiley, 1983.

[84] S. Menard. Applied logistic regression analysis. Sage, 2002.

[85] L. L. Minku, A. P. White, and X. Yao. The impact of diversity on online ensem-

ble learning in the presence of concept drift. IEEE Transactions on knowledge

and Data Engineering, 22(5):730–742, 2010.

[86] J. Munoz-Gama, J. Carmona, and W. M. Van Der Aalst. Single-entry single-exit

decomposed conformance checking. Information Systems, 46:102–122, 2014.

[87] S. Muthukrishnan, E. van den Berg, and Y. Wu. Sequential change detection

on data streams. In Data Mining Workshops, 2007. ICDM Workshops 2007.

Seventh IEEE International Conference on, pages 551–550. IEEE, 2007.

[88] K. Nishida and K. Yamauchi. Detecting concept drift using statistical testing. In

International conference on discovery science, pages 264–269. Springer, 2007.

[89] R. Nuzzo. Statistical errors. Nature, 506(13):150–152, 2014.

[90] E. S. Page. Continuous inspection schemes. Biometrika, 41(1/2):100–115,

1954.

[91] M. Pawlik and N. Augsten. Rted: a robust algorithm for the tree edit distance.

Proceedings of the VLDB Endowment, 5(4):334–345, 2011.

[92] M. Pesic and W. M. Van der Aalst. A declarative approach for flexible busi-

ness processes management. In International conference on business process

management, pages 169–180. Springer, 2006.

[93] C. A. Petri. Kommunikation mit automaten. 1962.

[94] A. Pika, M. T. Wynn, C. J. Fidge, A. H. ter Hofstede, M. Leyer, and W. M. P.

van der Aalst. An extensible framework for analysing resource behaviour us-

ing event logs. In International Conference on Advanced Information Systems

Engineering, pages 564–579. Springer, 2014.



[95] A. Polyvyanyy, S. Smirnov, and M. Weske. On application of structural de-

composition for process model abstraction. In BPSC, pages 110–122. Citeseer,

2009.

[96] A. Polyvyanyy, W. M. Van Der Aalst, A. H. Ter Hofstede, and M. T. Wynn.

Impact-driven process model repair. ACM Transactions on Software Engineer-

ing and Methodology (TOSEM), 25(4):28, 2017.

[97] K. B. Pratt and G. Tschapek. Visualizing concept drift. In Proc. of the ninth

ACM SIGKDD international conference on knowledge discovery and data min-

ing. ACM, 2003.

[98] D. Redlich, T. Molka, W. Gilani, G. S. Blair, and A. Rashid. Scalable dynamic

business process discovery with the constructs competition miner. In SIMPDA,

pages 91–107, 2014.

[99] M. Reichert and B. Weber. Enabling flexibility in process-aware information

systems: challenges, methods, technologies. Springer Science & Business Me-

dia, 2012.

[100] H. A. Reijers. Design and control of workflow processes: business process

management for the service industry. Springer-Verlag, 2003.

[101] D. Reißner, R. Conforti, M. Dumas, M. La Rosa, and A. Armas-Cervantes. Scalable

conformance checking of business processes. In OTM Confederated International

Conferences “On the Move to Meaningful Internet Systems”, pages

607–627. Springer, 2017.

[102] M. R. Reynolds Jr and J. C. Arnold. Ewma control charts with variable sample

sizes and variable sampling intervals. IIE transactions, 33(6):511–530, 2001.

[103] S. Roberts. Control chart tests based on geometric moving averages. Techno-

metrics, 1(3):239–250, 1959.

[104] G. J. Ross, N. M. Adams, D. K. Tasoulis, and D. J. Hand. Exponentially

weighted moving average charts for detecting concept drift. Pattern recogni-

tion letters, 33(2):191–198, 2012.

[105] H. Schonenberg, R. Mans, N. Russell, N. Mulyar, and W. van der Aalst. Process

flexibility: A survey of contemporary approaches. In Advances in enterprise

engineering I, pages 16–30. Springer, 2008.



[106] D. W. Scott. Multivariate density estimation: theory, practice, and visualization,

volume 383. John Wiley & Sons, 2009.

[107] R. Sebastiao and J. Gama. Change detection in learning histograms from data

streams. In Portuguese Conference on Artificial Intelligence, pages 112–123.

Springer, 2007.

[108] D. Shasha, J.-L. Wang, K. Zhang, and F. Y. Shih. Exact and approximate algo-

rithms for unordered tree matching. IEEE Transactions on Systems, Man, and

Cybernetics, 24(4):668–678, 1994.

[109] W. A. Shewhart. Economic control of quality of manufactured product. ASQ

Quality Press, 1931.

[110] A. N. Shiryaev. On optimum methods in quickest detection problems. Theory

of Probability & Its Applications, 8(1):22–46, 1963.

[111] A. Shiryaev. The problem of the most rapid detection of a disturbance in a

stationary process. In Soviet Math. Dokl, volume 2, 1961.

[112] A. Shiryaev. On stochastic models and optimal methods in the quickest detec-

tion problems. Theory of Probability & Its Applications, 53(3):385–401, 2009.

[113] K.-C. Tai. The tree-to-tree correction problem. Journal of the ACM (JACM),

26(3):422–433, 1979.

[114] A. Tsymbal. The problem of concept drift: definitions and related work. Com-

puter Science Department, Trinity College Dublin, 106(2), 2004.

[115] N. R. van Beest, M. Dumas, L. García-Bañuelos, and M. La Rosa. Log delta

analysis: Interpretable differencing of business process event logs. In Proc. of

BPM. Springer, 2015.

[116] W. Van Der Aalst. Process mining: discovery, conformance and enhancement

of business processes. Springer Science & Business Media, 2011.

[117] W. Van der Aalst, A. Adriansyah, and B. van Dongen. Replaying history on pro-

cess models for conformance checking and performance analysis. Wiley Inter-

disciplinary Reviews: Data Mining and Knowledge Discovery, 2(2):182–192,

2012.



[118] W. Van der Aalst, T. Weijters, and L. Maruster. Workflow mining: Discovering

process models from event logs. IEEE Transactions on Knowledge & Data

Engineering, (9):1128–1142, 2004.

[119] W. M. P. van der Aalst. Process Mining: Discovery, Conformance and Enhance-

ment of Business Processes. Springer, 2011.

[120] W. M. Van der Aalst. The application of petri nets to workflow management.

Journal of circuits, systems, and computers, 8(01):21–66, 1998.

[121] W. M. van Der Aalst, M. Pesic, and H. Schonenberg. Declarative workflows:

Balancing between flexibility and support. Computer Science-Research and

Development, 23(2):99–113, 2009.

[122] S. J. van Zelst, M. F. Sani, A. Ostovar, R. Conforti, and M. La Rosa. Filtering

spurious events from event streams of business processes. 2018.

[123] S. K. vanden Broucke, J. Munoz-Gama, J. Carmona, B. Baesens, and J. Van-

thienen. Event-based real-time decomposed conformance analysis. In OTM



[124] E. Verbeek, J. C. A. M. Buijs, B. F. van Dongen, and W. M. P. van der Aalst.

Prom 6: The process mining toolkit. In M. L. Rosa, editor, Proceedings of the

Business Process Management 2010 Demonstration Track, Hoboken, NJ, USA,

September 14-16, 2010, volume 615 of CEUR Workshop Proceedings. CEUR-

WS.org, 2010.

[125] A. R. Hevner, S. T. March, J. Park, and S. Ram. Design science in information

systems research. MIS quarterly, 28(1):75–105, 2004.

[126] P. Vorburger and A. Bernstein. Entropy-based concept shift detection. In Data

Mining, 2006. ICDM’06. Sixth International Conference on, pages 1113–1118.

IEEE, 2006.

[127] A. Wald. Sequential analysis. Courier Corporation, 1973.

[128] G. I. Webb, R. Hyde, H. Cao, H. L. Nguyen, and F. Petitjean. Characterizing

concept drift. Data Mining and Knowledge Discovery, 2016.



[129] B. Weber, M. Reichert, and S. Rinderle-Ma. Change patterns and change sup-

port features–enhancing flexibility in process-aware information systems. DKE,

2008.

[130] A. Weijters, W. M. van Der Aalst, and A. A. De Medeiros. Process mining with

the heuristics miner-algorithm. Technische Universiteit Eindhoven, Tech. Rep.

WP, 166:1–34, 2006.

[131] M. Weske. Business process management architectures. In Business Process

Management, pages 333–371. Springer, 2012.

[132] G. Widmer and M. Kubat. Learning in the presence of concept drift and hidden

contexts. Machine learning, 23(1):69–101, 1996.

[133] Z. Wu, M. Yang, M. B. Khoo, and P. Castagliola. What are the best sample sizes

for the xbar and cusum charts? International Journal of Production Economics,

131(2):650–662, 2011.

[134] F. Zhang and E. D’Hollander. Using hammock graphs to structure programs.

IEEE Transactions on Software Engineering, 30(4):231–245, 2004.

[135] K. Zhang. A constrained edit distance between unordered labeled trees. Algo-

rithmica, 15(3):205–222, 1996.

[136] K. Zhang and T. Jiang. Some max snp-hard results concerning unordered la-

beled trees. Information Processing Letters, 49(5):249–254, 1994.

[137] K. Zhang and D. Shasha. Simple fast algorithms for the editing distance be-

tween trees and related problems. SIAM journal on computing, 18(6):1245–

1262, 1989.

[138] K. Zhang, R. Statman, and D. Shasha. On the editing distance between un-

ordered labeled trees. Information processing letters, 42(3):133–139, 1992.

[139] I. Zliobaite, A. Bifet, M. Gaber, B. Gabrys, J. Gama, L. Minku, and K. Musial.

Next challenges for adaptive learning systems. ACM SIGKDD Explorations

Newsletter, 14(1):48–55, 2012.

