33 process cubes - eindhoven university of technology results generated using procube page 37 . page...

57
33 PAGE 0 Process Cubes Wil van der Aalst AIS Meeting 13-2-2014

Upload: others

Post on 01-Feb-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

  • 33

    PAGE 0

    Process Cubes

    Wil van der Aalst

    AIS Meeting 13-2-2014

  • DSC/e's Motto

    Turning Data into Value

  • What

    happened?

  • Why did

    it happen?

  • What will

    happen?

  • What is the

    best that

    can happen?

  • Data Science

    aims to

    answer such

    questions

    Process Mining

  • Motivation

    PAGE 7

    4.56 compute average

    create model

    (freq., sum, variance, mode, …)

    (alpha, heuristic, fuzzy, social,

    dotted, compliance, etc., etc.)

  • 2 dimensions: model = number

    PAGE 8

    hours: 50.5

    number: 32 passed

    failed

    TU/e HSE QUT

    hours: 48.5

    number: 55

    hours: 23.2

    number: 49

    hours: 23.5

    number: 23

    hours: 10.5

    number: 4

    hours: 24.5

    number: 8

  • 2 dimensions: higher-level models

    PAGE 9

    passed

    failed

    TU/e HSE QUT

  • Process Cubes

    PAGE 10

    W.M.P. van der Aalst.

    Process Cubes: Slicing, Dicing, Rolling Up

    and Drilling Down Event Data for Process

    Mining. In M. Song, M. Wynn, and J. Liu, editors, Asia Pacific

    Conference on Business Process Management (AP-

    BPM 2013), volume 159 of Lecture Notes in Business

    Information Processing, pages 1-22. Springer-Verlag,

    Berlin, 2013.

    10.1007/978-3-319-02922-1_1

  • Process Cube: Dimensions and Cells

    PAGE 11

    time

    win

    dow

    event class

    case

    typ

    e

    process cell

    event

    New behaviorprocess cube

    cell model

    cell sublog

    case

    s

    time

    ct

    ec

    tw

    dimension

    caseactivitytimestampresourcetype….

    event

  • Example Dimensions

    PAGE 12

    time location

    group

    concept

    drift

    analysis cross-

    organizational

    process

    mining

    clustering and

    classification

    acbe abce ade

    acbe abce ade

    acbe abce ade

    cases may

    be

    distributed

    over

    multiple

    cells, the

    same event

    may belong

    to multiple

    cells

  • Example

    PAGE 13

    gold

    silver

    normal

  • Another Example

    PAGE 14

    >100k

    50k

    >50k & 100k

    http://www.win.tue.nl/coselog/wiki/_detail/participants.png?id=start

  • Another Example

    PAGE 15

  • Another Example

    PAGE 16

    {1,2,3,4}

    {5,6,7}

    {8,9,10}

  • Roll up and drill down

    PAGE 17

    time

    win

    dow

    event class

    case

    typ

    e A

    G

    B

    C

    D

    E

    F

    1: ACDEG2: BCFG3: BCFG

    4: ACEDG5: ACFG

    6: ACEDG7: BCEDG8: BCDEG

    1: AC4: AC5: AC6: AC

    1: CDEG4: CEDG5: CFG

    6: CEDG

    2: BC3: BC7: BC8: BC

    2: CFG3: CFG

    7: CEDG8: CDEG

    A

    C

    B

    C

    GC

    D

    E

    F

    GC

    D

    E

    F

    time

    win

    dow

    event class

    case

    typ

    e

    drill down

    gold

    cu

    sto

    mer

    silv

    er c

    ust

    om

    er

    2012

    2013

    2012

    2013

    sales delivery

    roll up

  • Two foundational ways of spitting event

    data: horizontal or vertical

    PAGE 18

    A

    start complete

    G

    B

    C

    D

    E

    F

    1: ACDEG2: BCFG3: BCFG

    4: ACEDG5: ACFG

    6: ACEDG7: BCEDG8: BCDEG

    1: AC2: BC3: BC4: AC5: AC6: AC7: BC8: BC

    A

    G

    B

    C

    D

    E

    FC

    1: CDEG2: CFG3: CFG

    4: CEDG5: CFG

    6: CEDG7: CEDG8: CDEG

    split

    horizontally

    A C F G

    B C F G

    B C

    D

    G

    E

    A C

    D

    G

    E

    1: ACDEG4: ACEDG6: ACEDG

    7: BCEDG8: BCDEG

    5: ACFG

    2: BCFG3: BCFG

    merge

    horizontally

    split

    vertically

    merge

    vertically

  • Ingredients

    PAGE 19

    Event Base (EB) with

    (raw) event data

    Process Cube Structure (PCS)

    defining dimensions

    independent of content

    Process Cube View (PCV)

    based on PCS that can be

    linked to real events

  • Event Base (E,P, π)

    • Set of events E

    • Set of properties P

    • Partial mapping π

    (πp(e)=v means event e has property p with value v)

    PAGE 20

  • Slightly more formal

    PAGE 21

  • Process Cube Structure (D,type,hier)

    • D is the set of dimensions

    • type defines the type of each dimension (string, int,

    etc.)

    • hier defines the hierarchy per dimension (does not

    need to be a tree, elements at the same level may be

    partially overlapping)

    PAGE 22

    caseactivityresourcetypetotaltime

    Process Cube Dimensions

  • Compatible: D P and correct types

    PAGE 23

    reso

    urce

    activity

    case

    typ

    e

    Event Base (EB) with

    (raw) event data

    Process Cube Structure

    (PCS) defining

    dimensions D

    independent of content

  • Slightly more formal

    PAGE 24

  • Process Cube View (Dsel , sel)

    • View on a process cube structure (D,type,hier).

    • Dsel D are the selected dimensions.

    • sel is a function determining the structure of each

    dimension in D .

    PAGE 25

    reso

    urce

    activity

    case

    typ

    e

    reso

    urce

    activity

  • Slightly more formal

    PAGE 26

  • Process Cube View Example (1/3):

    Dimension resource is not selected

    PAGE 27

    reso

    urce

    activity

    case

    typ

    e

    activity

    case

    typ

    e

    PCV = (Dsel , sel) PCS = (D,type,hier)

  • Process Cube View Example (2/3):

    Dimension activity is coarsened

    PAGE 28

    reso

    urce

    activityca

    se t

    ype

    e.g., per subprocess rather than per activity

    reso

    urce

    activity

    case

    typ

    e

    PCV = (Dsel , sel) PCS = (D,type,hier)

  • Process Cube View Example (3/3):

    Dimension case type is filtered

    PAGE 29

    reso

    urce

    activity

    case

    typ

    e

    part of the case type dimension is not selected

    reso

    urce

    activity

    case

    typ

    e

    PCV = (Dsel , sel) PCS = (D,type,hier)

  • Materialized Process Cube View

    PAGE 30

    activity

    case

    typ

    e

    reso

    urce

    activity

    case

    typ

    e

    reso

    urce

    activity

    case

    typ

    e

    3 x 2 x 1 = 6 event logs

    2 x 2 x 3 = 12 event logs

    3 x 1 x 3 = 9 event logs

  • An event may be part of multiple cells!!

    • departments may

    overlap

    • subprocesses share

    interfaces

    • contracts involving

    multiple parties

    • ….

    PAGE 31

    activity

    case

    typ

    e

    • events are unique,

    • but may be part of multiple cells

  • Slightly more formal

    PAGE 32

  • OLAP (Online Analytical Processing)

    PAGE 33

  • OLAP operations can be formalized (see W.M.P. van der Aalst. Process Cubes: Slicing, Dicing, Rolling Up and Drilling Down Event Data

    for Process Mining. In AP-BPM 2013, volume 159 of Lecture Notes in Business Information

    Processing, pages 1-22. Springer-Verlag, Berlin, 2013.)

    PAGE 34

  • Initial Implementation

    (work of Tatiana Mamaliga)

    PAGE 35 http://www.processmining.org/_media/blogs/pub2013/masterthesistatianamamaliga.pdf

  • Architecture

    PAGE 36

  • Results Generated Using ProCube

    PAGE 37

  • PAGE 38

    problem 1

    performance

  • Sparsity is a problem when unloading

    cells to ProM

    PAGE 39

    Process cubes tend to

    be sparse. Palo does

    not handle this well.

    Ongoing work of Shengnan Guo.

  • PAGE 40

    problem 2

    interpretation

  • PAGE 41

  • Two variants of the same "model" …

    PAGE 42

    comparing models is closely related to configurable process models

  • PAGE 43

    link to decomposed

    process mining

  • Remember

    PAGE 44

    A

    start complete

    G

    B

    C

    D

    E

    F

    1: ACDEG2: BCFG3: BCFG

    4: ACEDG5: ACFG

    6: ACEDG7: BCEDG8: BCDEG

    1: AC2: BC3: BC4: AC5: AC6: AC7: BC8: BC

    A

    G

    B

    C

    D

    E

    FC

    1: CDEG2: CFG3: CFG

    4: CEDG5: CFG

    6: CEDG7: CEDG8: CDEG

    split

    horizontally

    A C F G

    B C F G

    B C

    D

    G

    E

    A C

    D

    G

    E

    1: ACDEG4: ACEDG6: ACEDG

    7: BCEDG8: BCDEG

    5: ACFG

    2: BCFG3: BCFG

    merge

    horizontally

    split

    vertically

    merge

    vertically

    both process

    cubes and

    decomposed

    process mining

    split event logs

  • Decomposing Conformance Checking

    PAGE 45

    SNprocessmodel

    Levent log

    decomposition technique

    conformancechecking

    technique

    M1submodel

    L1sublog

    conformance check

    M2submodel

    L2sublog

    conformance check

    Mnsubmodel

    Lnsublog

    conformance check

    conformance diagnostics

    decompose model

    decompose event log

    e.g., maximal decomposition, passage-based decomposition, or SESE/RPST-based decomposition

    e.g., A* based alignments, token-based replay, or simple replay until first deviation

    yields a (valid) activity partitioning

    See "divide and conquer"

    framework by Eric Verbeek.

  • Example of a valid decomposition

    PAGE 46

    a

    start t1

    SN1

    a

    c

    e

    c2

    t1

    t4

    t6SN3

    ab

    d

    e

    c1 c3

    t1

    t2

    t3

    t5

    t6

    SN2

    d

    g

    h

    e

    c5

    f

    t5

    t6

    t7 t8

    t9

    t10

    c6

    c7

    SN5

    c

    d

    c4

    t4

    t5

    SN4

    g

    h

    end

    f

    t8

    t9

    t10

    t11c8

    c9

    SN6

    a

    start

    b

    c

    d

    g

    h

    e

    end

    c1

    c2

    c3

    c4

    c5t1

    f

    t2

    t3

    t4

    t5

    t6

    t7 t8

    t9

    t10

    t11

    c6

    c7 c8

    c9

    Log can be split in the same way!

  • Example of an alignment for observed

    trace a,b,c,d,e,c,d,g,f

    PAGE 47

    a

    start

    b

    c

    d

    g

    h

    e

    end

    c1

    c2

    c3

    c4

    c5t1

    f

    t2

    t3

    t4

    t5

    t6

    t7 t8

    t9

    t10

    t11

    c6

    c7 c8

    c9

    a

    start t1

    SN1

    a

    c

    e

    c2

    t1

    t4

    t6SN3

    ab

    d

    e

    c1 c3

    t1

    t2

    t3

    t5

    t6

    SN2

    d

    g

    h

    e

    c5

    f

    t5

    t6

    t7 t8

    t9

    t10

    c6

    c7

    SN5

    c

    d

    c4

    t4

    t5

    SN4

    g

    h

    end

    f

    t8

    t9

    t10

    t11c8

    c9

    SN6

    Etc.

    a,b,c,d,e,c,d,g,f

  • Conformance checking can be

    decomposed !!!

    • General result for any valid decomposition: Any

    event log or trace is perfectly fitting the overall

    model if and only if it is also fitting all the individual

    fragments

    PAGE 48

    a

    start

    b

    c

    d

    g

    h

    e

    end

    c1

    c2

    c3

    c4

    c5t1

    f

    t2

    t3

    t4

    t5

    t6

    t7 t8

    t9

    t10

    t11

    c6

    c7 c8

    c9

    a

    start t1

    SN1

    a

    c

    e

    c2

    t1

    t4

    t6SN3

    ab

    d

    e

    c1 c3

    t1

    t2

    t3

    t5

    t6

    SN2

    d

    g

    h

    e

    c5

    f

    t5

    t6

    t7 t8

    t9

    t10

    c6

    c7

    SN5

    c

    d

    c4

    t4

    t5

    SN4

    g

    h

    end

    f

    t8

    t9

    t10

    t11c8

    c9

    SN6

    Wil van der Aalst, Decomposing Petri nets for process mining:

    A generic approach. Distributed and Parallel Databases,

    Volume 31, Issue 4, pp 471-507, 2013

  • Decomposing Process Discovery

    PAGE 49

    Mprocessmodel

    Levent log

    decomposition technique

    process discoverytechnique

    M1submodel

    L1sublog

    discovery

    M2submodel

    L2sublog

    Mnsubmodel

    Lnsublog

    compose model

    decompose event log

    discovery discovery

    e.g., causal graph based on frequencies is decomposed using passages or SESE/RPST

    e.g., language/state-based region discovery, variants of alpha algorithm, genetic process mining

    yields a (valid) activity partitioning

    See "divide and conquer"

    framework by Eric Verbeek.

  • Learn more about decomposing process

    mining problems?

    • W.M.P. van der Aalst. Decomposing Petri Nets for Process Mining: A Generic Approach. Distributed and Parallel

    Databases, 31(4):471-507, 2013.

    • W.M.P. van der Aalst. A General Divide and Conquer Approach for Process Mining. In M. Ganzha, L. Maciaszek,

    and M. Paprzycki, editors, Federated Conference on Computer Science and Information Systems (FedCSIS

    2013), pages 1-10. IEEE Computer Society, 2013.

    • W.M.P. van der Aalst and E. Verbeek. Process Discovery and Conformance Checking Using Passages.

    Fundamenta Informaticae, 2014.

    • W.M.P. van der Aalst. Decomposing Process Mining Problems Using Passages. In S. Haddad and L. Pomello,

    editors, Applications and Theory of Petri Nets 2012, volume 7347 of Lecture Notes in Computer Science, pages

    72-91. Springer-Verlag, Berlin, 2012.

    • J. Munoz-Gama, J. Carmona, and W.M.P. van der Aalst. Hierarchical Conformance Checking of Process Models

    Based on Event Logs. In J.M. Colom and J. Desel, editors, Applications and Theory of Petri Nets 2013, volume

    7927 of Lecture Notes in Computer Science, pages 291-310. Springer-Verlag, Berlin, 2013.

    • J. Munoz-Gama, J. Carmona, and W.M.P. van der Aalst. Conformance Checking in the Large: Partitioning and

    Topology. In F. Daniel, J. Wang, and B. Weber, editors, International Conference on Business Process

    Management (BPM 2013), volume 8094 of Lecture Notes in Computer Science, pages 130-145. Springer-Verlag,

    Berlin, 2013.

    • J. Munoz-Gama, J. Carmona, and W.M.P. van der Aalst. Single-Entry Single-Exit Decomposed Conformance

    Checking. Information Systems, 2014.

    • E. Verbeek and W.M.P. van der Aalst. Decomposing Replay Problems: A Case Study. In D. Moldt and H. Roelke,

    editors, Proceedings of the International Workshop on Petri Nets in Software Engineering (PNSE 2013), volume

    989 of CEUR Workshop Proceedings, pages 213-232. CEUR-WS.org, 2013.

    PAGE 50

  • PAGE 51

    Challenges

  • PAGE 52

  • PAGE 53

    formal

    (not just a

    picture)

    fast

    (should not

    take years) sound

    (result should

    at least be free

    of deadlocks,

    etc.)

    ability to balance

    all conformance

    dimensions

    (fitness, precision,

    generalization, and

    simplicity) incl.

    noise

    provide

    guarantees

    (not just a best

    effort)

    1

    2

    3 4

    5

    process

    discovery

    challenge

  • Selected challenges in process mining

    • Process mining in the Large (DeLiBiDa project)

    − decomposition & distribution (SESE, passages, etc.)

    − streaming data

    − not just discovery (conformance, repair, ext., pred.!)

    • Mining with/for stochastic models

    − likelihood of models, alignments, delays

    • Operational support

    − cases!

    • Data-aware and resource-aware process mining

    − decision points, work distribution, etc.

    − exploiting more informative event logs

    − low-level logs versus higher-level processes

    PAGE 54

  • Selected challenges in process mining

    • Process of process mining

    − workflow support, reporting

    − parameter tuning, experimentation

    − links to other types of mining (e.g., text mining)

    − repositories (models, data, and analysis techniques)

    − guidelines

    • Log extraction

    − correlation, n:m relations, etc.

    − continuous to discrete ("mine your own body")

    • Comparative process mining

    − process cubes, concept drift, etc.

    • Process improvement

    − optimization (improve your past), simulation, cigar box

    PAGE 55

  • Conclusion

    • Challenges in process mining:

    concept drift, comparing processes,

    cases, and organizations, need for

    contextual analysis, limited

    scalability, performance problems.

    • Unifying view: Process Cubes.

    • Similar to OLAP but also many

    differences.

    • Related to decomposing process

    mining problems.

    • Initial implementations, but just the

    starting point.

    PAGE 56