vit-pla: visual interactive tool for process log analysis · vit-pla: visual interactive tool for...

1
VIT-PLA: Visual Interactive Tool for Process Log Analysis Sen Yang 1 , Xin Dong 1 , Moliang Zhou 1 , Xinyu Li 1 , Shuhong Chen 1 , Rachel Webman 2 , Aleksandra Sarcevic 3 , Ivan Marsic 1 , Randall Burd 2 1 Rutgers University, Piscataway, NJ 2 Children’s Nat’l Medical Center, Washington, DC 3 Drexel University, Philadelphia, PA Abstract Techniques for analyzing and visualizing process or workflow data have been developed and applied in a wide range of domains (Van Der Aalst, W. et al. 2011; Monroe, M. et al. 2013). Visual analysis of large process logs and integration of statistical analysis, however, have been limited. We introduce the Visual Interactive Tool for Process Log Analysis (VIT-PLA) that provides a simplified process log visualization and performs statistical correlation analysis on process attributes. We demonstrate its use by applying it to an artificial dataset and running a preliminary analysis of trauma team task data collected from a medical emergency department. Data Preprocessing: Sequencing of Traces Process sequencing is necessary before more advanced processing. Activities coded in a process log usually have start and end timestamps (some logs may not include end time) for each activity. Idle time may exist between activities, and some activities may be executed concurrently. In process mining, process traces are usually sequenced by ascending order of the start time of activities. Future Direction In upcoming studies, we seek for: In addition to hierarchical clustering, we will evaluate other clustering algorithms (e.g. KNN, k-means, HMM-based clustering). Instead of deciding the cluster number manually, we plan on building a function that suggests cluster number based on some cluster metric. Apply our approach on more real world datasets. Acknowledgement This research is supported by National Institutes of Health under grant number 1R01LM011834-01A1. Preliminary Case Study I (a) (b) (c) Software: VIT-PLA is still under development, we have not published it yet. Please contact Sen Yang (sy [email protected]) if you have workflow data that you would like to analyze with VIT-PLA 4 6 5 1 7 8 9 10 2 3 Objective 1. Simplify visualization of large process logs (workflow data) 2. Uncover the correlation between process trace clusters and process trace attributes T1 a b c d d T2 a b c c a d d T1 a b c d c T2 a b c a d time concurrent activities idle times Sequencing: Step 1: Reorder concurrent activates Step 2: Remove the idle time between activities (a) (b) Method The core methods in VIT-PLA can be summarized as: 1) Clustering of process traces (workflow data) 2) Aggregation of process traces and selection of cluster prototype (representative of the other traces in the cluster) 3) Regression analysis to explore underlying knowledge 4) Interactive visualization of process traces and statistical analysis results. Process Attributes Process Traces Trace Clustering Data Aggregation Visualization Regression Analysis Statistics Clusters Cluster Membership Cluster Prototypes (1) (2) (3) (4) Method 1 A B C D E 2 A B B D E 3 A B D E 4 A B D E 5 A B C D D E 6 B A C D 7 B A C D E 8 B A D E 9 E C D 10 E B C D 1 A B C D E 2 A B B D E 3 A B D E 4 A B D E 5 A B C D D E 6 B A C D 7 B A C D E 8 B A D E 9 E C D 10 E B C D Cluster 1 Cluster 2 Cluster 3 Cluster 1 (5) A B D E Cluster 2 (3) B A C D E Cluster 3 (2) E C D Cluster Prototype (Medoid) Clustering Aggregate Input Process Log Case Study II Figure 5. (a) Workflow model (drawn based on BPMN) given by domain expert describing the initial evaluation of trauma, (b) Simplified visualization of 171 traces using four cluster prototypes, (c) Alignment view of four cluster prototypes (d) p- value for binomial logistic regression coefficients Findings: We find that one cluster (cluster 1) whose cluster prototype follows the model occurs more often during the day and another cluster (cluster 2) whose cluster prototype deviates from the model occurs more often at night. This association finding supports previous work showing decreased compliance with trauma protocols (ATLS) at night. (Carter, Elizabeth A., et al. 2013) Figure 4. Visualization of artificially generated dataset. (a) Alignment view of all 500 process traces; (b) Simplified visualization of 500 process traces using 10 cluster prototypes; (c) Alignment view of 10 cluster prototypes. Findings: Sequential order of consensus tasks (tasks that occur more than or equal to 50% in the column) is “ACEGFDHIB”; Pattern “HIJ” is repeated in two of the ten clusters (cluster 1 and cluster 2); Activity C is performed late in one cluster (cluster 5); Activity D is performed late in one cluster (cluster 3) and omitted in another (cluster 7) Multinomial Logistic Regression (one-vs.-one): ln = = = ln = 1− σ =1 −1 = = 0 + 1 1 + 2 2 +⋯+ , = 1, … , − 1 (1) Binomial Logistic Regression (one-vs.-rest): ln = = ln = 1− = = 0 + 1 1 + 2 2 +⋯+ , = 1, … , (2) Figure 1. An example showing data clustering and aggregation. The cluster prototype used here is cluster medoid. Figure 2. An example of two types of trace alignment: (a) Context-Aware and (b) Duration-Aware. The sequences at the bottom of (a) and (b) are consensus sequences derived from the data. A gap symbol “-” or white space is inserted if a match cannot be found. The five process traces shown here are from Cluster 1 in Figure 1. Figure 3. VIT-PLA GUI design

Upload: others

Post on 08-Oct-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: VIT-PLA: Visual Interactive Tool for Process Log Analysis · VIT-PLA: Visual Interactive Tool for Process Log Analysis Sen Yang1, Xin Dong1, Moliang Zhou1, Xinyu Li1, Shuhong Chen1,

VIT-PLA: Visual Interactive Tool for Process Log AnalysisSen Yang1, Xin Dong1, Moliang Zhou1, Xinyu Li1, Shuhong Chen1, Rachel Webman2, Aleksandra Sarcevic3, Ivan Marsic1, Randall Burd 2

1Rutgers University, Piscataway, NJ2Children’s Nat’l Medical Center, Washington, DC

3Drexel University, Philadelphia, PA

AbstractTechniques for analyzing and visualizing process or workflowdata have been developed and applied in a wide range ofdomains (Van Der Aalst, W. et al. 2011; Monroe, M. et al.2013). Visual analysis of large process logs and integration ofstatistical analysis, however, have been limited. We introducethe Visual Interactive Tool for Process Log Analysis (VIT-PLA)that provides a simplified process log visualization andperforms statistical correlation analysis on process attributes.We demonstrate its use by applying it to an artificial datasetand running a preliminary analysis of trauma team task datacollected from a medical emergency department.

Data Preprocessing: Sequencing of TracesProcess sequencing is necessary before more advancedprocessing. Activities coded in a process log usually have startand end timestamps (some logs may not include end time) foreach activity. Idle time may exist between activities, and someactivities may be executed concurrently. In process mining,process traces are usually sequenced by ascending order of thestart time of activities.

Future DirectionIn upcoming studies, we seek for:• In addition to hierarchical clustering, we will evaluate other

clustering algorithms (e.g. KNN, k-means, HMM-basedclustering).

• Instead of deciding the cluster number manually, we plan onbuilding a function that suggests cluster number based onsome cluster metric.

• Apply our approach on more real world datasets.

AcknowledgementThis research is supported by National Institutes of Healthunder grant number 1R01LM011834-01A1.

Preliminary Case Study I

(a)

(b)

(c)

Software: VIT-PLA is still under development, we have not published it yet. Please contact SenYang ([email protected]) if you have workflow data that you would like to analyzewith VIT-PLA

4

65

1

7 8

9

10

2

3

Objective1. Simplify visualization of large process logs (workflow data)2. Uncover the correlation between process trace clusters and

process trace attributes

T1 a b c d d

T2 a b c c a d

d

T1 a b c d

c

T2 a b c a d

time

concurrent activities

idle times

Sequencing:Step 1: Reorder concurrent activates

Step 2: Remove the idle time between activities

(a)

(b)

MethodThe core methods in VIT-PLA can be summarized as:1) Clustering of process traces (workflow data)2) Aggregation of process traces and selection of cluster

prototype (representative of the other traces in the cluster)3) Regression analysis to explore underlying knowledge4) Interactive visualization of process traces and statistical

analysis results.

Process

Attributes

Process

Traces

Trace

Clustering

Data

Aggregation

VisualizationRegression

Analysis

Statistics

Clusters

Cluster

MembershipCluster

Prototypes

(1)(2)

(3)(4)

Method

1 A B C D E

2 A B B D E

3 A B D E

4 A B D E

5 A B C D D E

6 B A C D

7 B A C D E

8 B A D E

9 E C D

10 E B C D

1 A B C D E

2 A B B D E

3 A B D E

4 A B D E

5 A B C D D E

6 B A C D

7 B A C D E

8 B A D E

9 E C D

10 E B C D

Cluster 1

Cluster 2

Cluster 3

Cluster 1 (5) A B D E

Cluster 2 (3) B A C D E

Cluster 3 (2) E C D

Cluster Prototype (Medoid)Clustering Aggregate

Input Process Log

Case Study II

Figure 5. (a) Workflow model (drawn based on BPMN) given bydomain expert describing the initial evaluation of trauma, (b)Simplified visualization of 171 traces using four clusterprototypes, (c) Alignment view of four cluster prototypes (d) p-value for binomial logistic regression coefficients

Findings: We find that one cluster (cluster 1) whose clusterprototype follows the model occurs more often during the dayand another cluster (cluster 2) whose cluster prototypedeviates from the model occurs more often at night. Thisassociation finding supports previous work showing decreasedcompliance with trauma protocols (ATLS) at night. (Carter,Elizabeth A., et al. 2013)

Figure 4. Visualization of artificially generated dataset. (a) Alignmentview of all 500 process traces; (b) Simplified visualization of 500 processtraces using 10 cluster prototypes; (c) Alignment view of 10 clusterprototypes.

Findings:• Sequential order of consensus

tasks (tasks that occur morethan or equal to 50% in thecolumn) is “ACEGFDHIB”;

• Pattern “HIJ” is repeated in twoof the ten clusters (cluster 1and cluster 2);

• Activity C is performed late inone cluster (cluster 5);

• Activity D is performed late inone cluster (cluster 3) andomitted in another (cluster 7)

Multinomial Logistic Regression (one-vs.-one):

ln𝑃 𝐶=𝑖

𝑃 𝐶=𝐽= ln

𝑃 𝐶=𝑖

1− σ𝑗=1𝐽−1

𝑃 𝐶=𝑗= 𝛽𝑖0 + 𝛽𝑖1𝑥𝑖1 +

𝛽𝑖2𝑥𝑖2 +⋯+ 𝛽𝑖𝐾𝑥𝑖𝐾 , 𝑖 = 1, … , 𝐾 − 1 (1)

Binomial Logistic Regression (one-vs.-rest):

ln𝑃 𝐶=𝑖

𝑃 𝐶≠𝑖= ln

𝑃 𝐶=𝑖

1− 𝑃 𝐶=𝑖= 𝛽𝑖0 + 𝛽𝑖1𝑥𝑖1 +

𝛽𝑖2𝑥𝑖2 +⋯+ 𝛽𝑖𝐾𝑥𝑖𝐾 , 𝑖 = 1, … , 𝐾 (2)

Figure 1. An example showing data clustering andaggregation. The cluster prototype used here iscluster medoid.

Figure 2. An example of two types of trace alignment:(a) Context-Aware and (b) Duration-Aware. Thesequences at the bottom of (a) and (b) are consensussequences derived from the data. A gap symbol “-” orwhite space is inserted if a match cannot be found.The five process traces shown here are from Cluster 1in Figure 1.

Figure 3. VIT-PLA GUI design