Page 1

DATA ENVELOPMENT ANALYSIS

A Tool for Data Mining and Analytics

Joe Zhu

School of Business, Worcester Polytechnic Institute

Worcester, MA 01609

jzhu@wpi.edu

www.deafrontier.net

Page 2

What is DEA?

DEA was developed and first published in 1978

Non-parametric approach to estimating production functions

Thus, we have multiple inputs and multiple outputs (of a production function)

DEA tries to identify the efficient units

Page 3

What is DEA exactly?

It is more than a production efficiency estimate

It is a balanced benchmarking tool (Sherman and Zhu, 2013) that enables companies to benchmark and locate best practices that are not visible through other commonly used management methodologies

It helps executives study the top-performing units, identify the best practice, transfer that knowledge throughout the organization to enhance performance, and test assumptions that might be counter-productive

Page 4

A tool for benchmarking

If one benchmarks the performance of computers, it is natural to consider different features (screen size and resolution, memory size, processing speed, hard disk size, and others). One would then have to classify these features into “inputs” and “outputs” in order to apply a proper DEA analysis. However, these features may not actually represent inputs and outputs at all in the standard notion of production

Page 5

DEA - revisit

A rule for classifying metrics:

Multiple inputs: the smaller the better

Multiple outputs: the larger the better

Page 6

DMU

The definition of a DMU (decision-making unit) is generic and flexible

Numerous applications are found in finance, marketing, transportation, sports, accounting, energy, sustainability, fishery, insurance, and other areas

Page 7

(Relative) Efficiency

The term ‘efficiency’ here represents best practice

Under general benchmarking, it does not necessarily mean ‘production efficiency’

We may refer to the DEA score as a form of ‘overall performance’ of an organization

An example: measuring the quality of care in treating heart-attack patients

Some measures can be used in DEA to yield a composite measure of quality indicators: Patients Given Aspirin at Arrival, Patients Given Beta Blocker at Discharge, etc.

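Page 8

Mathematical Model

Here $x_{ij}$ and $y_{rj}$ are the $i$-th input and the $r$-th output of DMU $j$, and $o$ is the DMU under evaluation.

\[
\begin{aligned}
\max\ & z = \sum_{r=1}^{s} \mu_r y_{ro} \\
\text{subject to}\ & \sum_{r=1}^{s} \mu_r y_{rj} - \sum_{i=1}^{m} \nu_i x_{ij} \le 0, \quad j = 1, 2, \ldots, n; \\
& \sum_{i=1}^{m} \nu_i x_{io} = 1; \\
& \mu_r, \nu_i \ge 0.
\end{aligned}
\]

Dual

\[
\begin{aligned}
\theta^{*} = \min\ & \theta \\
\text{subject to}\ & \sum_{j=1}^{n} \lambda_j x_{ij} \le \theta x_{io}, \quad i = 1, 2, \ldots, m; \\
& \sum_{j=1}^{n} \lambda_j y_{rj} \ge y_{ro}, \quad r = 1, 2, \ldots, s; \\
& \lambda_j \ge 0, \quad j = 1, 2, \ldots, n.
\end{aligned}
\]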
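For intuition, here is a minimal Python sketch of the simplest special case of the model on this slide: one input and one output per DMU (m = s = 1), all values positive. In that case no LP solver is needed, because the constant-returns score of a DMU reduces to its output/input ratio divided by the best ratio among all DMUs. The function name `ccr_single_ratio` and the data are hypothetical, chosen only for illustration; with several inputs or outputs, each DMU's score has to be obtained by solving the linear program.

```python
def ccr_single_ratio(inputs, outputs):
    """Efficiency scores for DMUs with ONE input and ONE output each.

    Assumes strictly positive data. With a single input and output the
    score of DMU o equals (y_o / x_o) / max_j (y_j / x_j).
    """
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have the same length")
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)  # ratio achieved by the best-practice DMU(s)
    return [r / best for r in ratios]


# Hypothetical data for four DMUs (not taken from the slides).
x = [2.0, 4.0, 5.0, 8.0]   # input: the smaller the better
y = [2.0, 6.0, 5.0, 4.0]   # output: the larger the better
scores = ccr_single_ratio(x, y)

for j, score in enumerate(scores, start=1):
    print(f"DMU {j}: {score:.3f}")
```

A score of 1 marks a best-practice DMU; every other score is that DMU's ratio as a fraction of the best ratio observed.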

Page 9

Business Analytics by Data Envelopment Analysis (DEA)

Descriptive Analytics: gain insight from historical data

Predictive Analytics: forecasting

Prescriptive Analytics: recommend decisions using optimization, simulation, etc.

Decisive Analytics: support human decisions with visual analytics

Page 10

DATA ENVELOPMENT ANALYSIS

DEA is a DATA ANALYSIS tool

Data Mining and Knowledge Discovery by DEA

More than Relative Efficiency

Page 11

Sample Size

DEA is not a form of regression model

It is meaningless to apply a sample size requirement to DEA

If there are too many performance metrics relative to the number of DMUs, a significant portion of the DMUs is likely to be benchmarked as best practice, with a ratio of 1

One can use certain DEA approaches to reduce the number of best-practice DMUs

Page 12

Regression analysis

Page 13

Numerous Models/Approaches

One modification to DEA is called stratification.

• Stratification results in many efficiency frontiers.

The first frontier contains all DMUs with the highest efficiency; the procedure then continues down each stratified level until all DMUs have been included.
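The peeling idea behind stratification can be sketched in Python for the simplest case, one input and one output per DMU with positive values, where the best-practice set at each level is just the set of DMUs with the highest output/input ratio among those remaining. The function name `stratify`, the tolerance, and the data are hypothetical; with more metrics, each level would require solving the DEA model on the remaining DMUs.

```python
def stratify(inputs, outputs, tol=1e-9):
    """Peel off successive best-practice frontiers.

    Single-input, single-output sketch: at each level, the DMUs whose
    output/input ratio is the highest among the remaining DMUs form the
    frontier; they are removed and the process repeats until no DMU is left.
    Returns a list of levels, each a list of DMU indices.
    """
    remaining = list(range(len(inputs)))
    levels = []
    while remaining:
        ratios = {j: outputs[j] / inputs[j] for j in remaining}
        best = max(ratios.values())
        frontier = [j for j in remaining if ratios[j] >= best * (1 - tol)]
        levels.append(frontier)
        remaining = [j for j in remaining if j not in frontier]
    return levels


# Hypothetical data for four DMUs (not taken from the slides).
x = [2.0, 4.0, 5.0, 8.0]
y = [2.0, 6.0, 5.0, 4.0]
levels = stratify(x, y)

for depth, members in enumerate(levels, start=1):
    print(f"Level {depth}: DMUs {[j + 1 for j in members]}")
```

Each DMU ends up in exactly one level, so the levels partition the data set from the best frontier down to the last.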

Data Envelopment Analysis

Page 14

Network Structure

Page 15

Ship Block Manufacturing Process Performance Evaluation

Page 16

Shipbuilding process

Business & Service Computing Laboratory

For effective ship construction:

A ship is divided into properly sized blocks in the design stage

All blocks are manufactured (or assembled) into the body of a ship

The main processes of shipbuilding consist of several work stages:

Design

Cutting & Forming

Assembly

Pre-Outfitting & Painting

Pre-Erection

Erection

Quay

Page 17

Management of block manufacturing process (BMP)

Effective block manufacturing process (BMP) management has been regarded as one of the most important issues in the shipbuilding industry

Many blocks are assembled into a ship, and each block has a complex manufacturing process

For example, a large ship usually needs more than 250 different blocks, each manufactured through a different process according to the ship’s type and size

Thus, effective and efficient BMP performance enables a reduction of the overall shipbuilding period and thereby the cost

- If any one block includes unnecessary work stages, the resulting inefficient resource assignment or long queuing times in the storage yard will have a negative effect on the overall shipbuilding period and productivity

For effective management of BMP performance, a practical and accurate performance evaluation method that considers various factors reflecting real manufacturing processes and situations is crucial

Page 18

Practical difficulties in evaluating BMP performance

For effective BMP management, shipbuilding companies have implemented production information systems (e.g., BAMS (Block Assembly Monitoring System) or RPMS (Real-time Progress Management System))

But these systems focus only on work scheduling, process monitoring, and work automation

There are at least two practical difficulties in evaluating BMP performance

1) There are many block assembly types (e.g., Sub-assembly, Unit-assembly, and Grand-assembly), and each assembly type is in turn classified into one of three form types (e.g., Small, Curved, and Large)

2) There are discrepancies between actual and planned work in the form of time gaps due to various problems (e.g., work delay, urgent work, and the convergence of blocks at the end of the process)

Generally, there is a 5 to 9 day delay between planned work and performed work

Page 19

Goal of this research

This research addresses the above two practical difficulties in evaluating BMP performance

This research proposes an integrated, systematic approach to evaluating the performance of BMPs in the shipbuilding industry by integrating process mining (PM) and DEA

Diagram labels: Database in shipbuilding company; Data extraction; Data pre-processing; Generation of block manufacturing processes, by process mining (PM); Performance evaluation of BMP, by data envelopment analysis (DEA); Guideline for improving the performance of underperforming BMPs

Page 20

Proposed method

Page 21

Extract sample log data from the database based on the defined attributes

Defined attributes:

Attributes   Identification   Activity     Time                              Schedule                Material
Data         Block ID         Operations   Start and End time of operation   Planned working times   Welding amount

Sample log data:

Identification   Activity    Time
Block ID         Operation   End time of unit task
101              C1          2012/05/24 11:00
101              G9          2012/06/07 12:00
101              S6          2012/05/24 14:00
102              C1          2012/05/25 11:00
102              G9          2012/06/08 12:00
102              S6          2012/05/25 14:00
104              C1          2012/05/29 10:00
104              G9          2012/06/08 16:00
104              H2          2012/05/22 12:00
104              S6          2012/05/29 17:00
105              C1          2012/06/01 11:00
105              G9          2012/06/13 11:00
105              H2          2012/05/30 11:00

Consider block ID 101

It includes three operations: C1, G9, and S6

A BMP is generated as an operations flow from the extracted log data

We arrange these operations by End time in ascending order

The sequence of operations C1 → S6 → G9 is the BMP of block ID 101

Generation of BMPs:

Block ID   Sequence of operations
101        C1 → S6 → G9
102        C1 → S6 → G9
104        H2 → C1 → S6 → G9
105        H2 → C1 → G9

The generated BMPs are then subjected to performance evaluation
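The generation step described on this slide (group the log by block ID, then order each block's operations by end time) can be sketched in a few lines of Python. The log rows are the sample rows shown on the slide; the names `LOG` and `generate_bmps` are hypothetical and are not taken from the original study.

```python
from collections import defaultdict
from datetime import datetime

# Sample log rows from the slide: (block ID, operation, end time of unit task).
LOG = [
    (101, "C1", "2012/05/24 11:00"), (101, "G9", "2012/06/07 12:00"),
    (101, "S6", "2012/05/24 14:00"), (102, "C1", "2012/05/25 11:00"),
    (102, "G9", "2012/06/08 12:00"), (102, "S6", "2012/05/25 14:00"),
    (104, "C1", "2012/05/29 10:00"), (104, "G9", "2012/06/08 16:00"),
    (104, "H2", "2012/05/22 12:00"), (104, "S6", "2012/05/29 17:00"),
    (105, "C1", "2012/06/01 11:00"), (105, "G9", "2012/06/13 11:00"),
    (105, "H2", "2012/05/30 11:00"),
]


def generate_bmps(log):
    """Return {block ID: operations ordered by end time, ascending}."""
    events = defaultdict(list)
    for block_id, operation, end_time in log:
        stamp = datetime.strptime(end_time, "%Y/%m/%d %H:%M")
        events[block_id].append((stamp, operation))
    return {
        block_id: [operation for _, operation in sorted(block_events)]
        for block_id, block_events in sorted(events.items())
    }


bmps = generate_bmps(LOG)
for block_id, sequence in bmps.items():
    print(block_id, " -> ".join(sequence))
```

Run on the sample rows, this reproduces the four sequences listed in the "Generation of BMPs" table.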

Page 22

Proposed method

Block clustering

Generated BMPs are heterogeneous since there are many kinds of BMPs

For a more accurate performance evaluation, our intention is to evaluate homogeneous BMPs

Therefore, we classify BMPs into several peer groups by their similarity

The similarity of BMPs is measured by a similarity index, which is calculated from two vectors:

Task vector: based on the presence or absence of the same operations in two BMPs

Transition vector: based on the sequential relationship of the operations in two BMPs

The task vector and transition vector take values from 0 to 1, with values closer to 1 indicating that two BMPs are more similar
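The slides do not give the formula for the similarity index, so the following Python sketch is only one plausible reading, under stated assumptions: the task comparison is taken as the Jaccard similarity of the two sets of operations, the transition comparison as the Jaccard similarity of the two sets of consecutive operation pairs, and the index as their weighted mean. The function names and the equal weighting are hypothetical; the original study may define the vectors and combine them differently.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, in [0, 1]."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty collections are treated as identical
    return len(a & b) / len(a | b)


def transitions(bmp):
    """Consecutive operation pairs, e.g. [C1, S6, G9] -> (C1,S6), (S6,G9)."""
    return list(zip(bmp, bmp[1:]))


def similarity_index(bmp_a, bmp_b, task_weight=0.5):
    """Assumed index: weighted mean of task and transition similarity."""
    task_sim = jaccard(bmp_a, bmp_b)
    transition_sim = jaccard(transitions(bmp_a), transitions(bmp_b))
    return task_weight * task_sim + (1 - task_weight) * transition_sim


# BMPs of blocks 101, 104 and 105 from the earlier sample log.
bmp_101 = ["C1", "S6", "G9"]
bmp_104 = ["H2", "C1", "S6", "G9"]
bmp_105 = ["H2", "C1", "G9"]

print(round(similarity_index(bmp_101, bmp_104), 3))
print(round(similarity_index(bmp_101, bmp_105), 3))
```

Under these assumptions, block 101 is closer to block 104 (which only adds H2 at the start) than to block 105 (which shares no transition with it), so a clustering step based on the index would tend to group 101 with 104 first.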

Page 23

Performance evaluation

Each BMP is regarded as a DMU, and only BMPs in the same group are considered for performance evaluation

Due to the nature of our performance metrics, we use a DEA model, developed recently by Lim & Zhu (2013), in which some performance metrics have target levels

In our case, the performance metrics are selected based on the extracted log data.

We conducted a questionnaire survey of 30 shipbuilding operating experts to obtain information on which factors are most critical to BMP performance

Page 24

Case study from a Korean shipbuilding company

Condition of Experiment

Event logs from two projects, exported from a Block Assembly Monitoring System (BAMS), were used.

Eighty-six blocks are generated from the log data and then classified into six clusters

In general, production planners assign the work resources and establish the production scheduling based on the block types defined by the empirical knowledge of shipbuilding operating experts. We refer to these defined block types in deciding the number of clusters

Page 25

Case study

Clustering results, including the number of blocks and the process characteristics of each cluster:

Cluster name   # of blocks   Process characteristics of cluster
C1             12            Block assembly work in work shop #5.
C2             9             Grand assembly processes in work shop #5 after Unit assembly in work shop #4.
C3             7             Component work in work shop ‘C’.
C4             9             Unit assembly and Grand assembly in work shop #3 after Component and Plate works in work shop ‘C’ and ‘P’.
C5             19            Grand assembly in work shop #2 after Component work in work shop ‘C’.
C6             30            Grand assembly or Special Ship assembly in work shop #1 and 2.

We aggregate all BMPs in cluster C5 to show a concrete instance of the clustering result

The aggregated model of all BMPs in C5 represents the BMPs performed in work shop #2

Page 26

Case study

The performance metrics are calculated and their descriptive statistics are listed

Columns: (1) Total execution time (hour); (2) Waiting time (hour); (3) Gap between planned and actual working (day); (4) Number of unit tasks; (5) Material amount (m)

Group        Stat   (1)       (2)       (3)     (4)    (5)
All BMPs     Min    247.0     28.0      -10.0   5.0    112.8
All BMPs     Max    1,910.7   1,434.0   5.0     25.0   719.2
All BMPs     Avg    320.0     95.0      -2.1    14.0   481.8
BMPs in C1   Min    247.0     37.0      -7.0    6.0    84.3
BMPs in C1   Max    1,732.6   1,433.0   5.0     15.0   410.1
BMPs in C1   Avg    307.0     108.9     -1.0    9.0    234.6
BMPs in C2   Min    276.0     52.0      -3.0    5.0    72.8
BMPs in C2   Max    1,910.7   933.0     3.0     14.0   510.4
BMPs in C2   Avg    357.0     146.8     -1.5    8.0    281.1
BMPs in C3   Min    250.0     32.0      -8.0    6.0    74.3
BMPs in C3   Max    1,040.5   809.0     4.0     15.0   489.0
BMPs in C3   Avg    231.0     126.2     -1.0    9.0    293.7
BMPs in C4   Min    261.0     28.0      -8.0    8.0    105.3
BMPs in C4   Max    1,213.5   1,434.0   2.0     21.0   607.0
BMPs in C4   Avg    269.0     80.5      -3.2    14.0   323.1
BMPs in C5   Min    257.0     61.0      -4.0    9.0    139.2
BMPs in C5   Max    1,802.1   1,023.0   2.0     24.0   689.7
BMPs in C5   Avg    315.0     104.0     -1.0    15.0   497.1
BMPs in C6   Min    251.0     45.0      -10.0   5.0    123.5
BMPs in C6   Max    1,910.7   1,434.0   5.0     25.0   719.2
BMPs in C6   Avg    330.4     110.7     -4.5    15.0   498.5

Page 27

Case study

The evaluation results are summarized

Average performance scores of BMPs:

Group        Average performance
All BMPs     0.60
BMPs in C1   0.61
BMPs in C2   0.69
BMPs in C3   0.43
BMPs in C4   0.54
BMPs in C5   0.70
BMPs in C6   0.62

Performance scores of BMPs in C5:

Blocks     Score   Blocks     Score   Blocks     Score   Blocks     Score
1XXX_622   1       1XXX_632   0.84    2XXX_653   0.67    2XXX_621   0.54
2XXX_509   1       1XXX_653   0.81    1XXX_642   0.64    1XXX_621   0.46
2XXX_622   1       2XXX_652   0.78    1XXX_652   0.61    1XXX_110   0.23
2XXX_631   1       2XXX_632   0.73    2XXX_643   0.58    2XXX_110   0.18
2XXX_642   1       1XXX_643   0.69    1XXX_631   0.55

Five blocks (1XXX_622, 2XXX_509, 2XXX_622, 2XXX_631, 2XXX_642) are identified as best practice, whereas the remaining 14 blocks are underperforming.

In particular, 1XXX_110 and 2XXX_110 are the most underperforming blocks.

Most of the best-practice blocks have the same BMP: Comp 101-‘C’ → Grand 201-‘P’ → Grand 202-‘3’ → Grand 203-‘3’ → Grand 301-‘3’
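As a quick consistency check, the C5 scores listed on this slide can be summarized in Python: blocks with a score of 1 are taken as best practice, the rest as underperforming, and the mean is compared with the reported C5 average. The scores are copied from the slide; the variable names are hypothetical.

```python
# Performance scores of the 19 BMPs in cluster C5, as listed on the slide.
C5_SCORES = {
    "1XXX_622": 1.0, "2XXX_509": 1.0, "2XXX_622": 1.0, "2XXX_631": 1.0,
    "2XXX_642": 1.0, "1XXX_632": 0.84, "1XXX_653": 0.81, "2XXX_652": 0.78,
    "2XXX_632": 0.73, "1XXX_643": 0.69, "2XXX_653": 0.67, "1XXX_642": 0.64,
    "1XXX_652": 0.61, "2XXX_643": 0.58, "1XXX_631": 0.55, "2XXX_621": 0.54,
    "1XXX_621": 0.46, "1XXX_110": 0.23, "2XXX_110": 0.18,
}

# A score of 1 marks a best-practice block; anything lower is underperforming.
best_practice = sorted(b for b, s in C5_SCORES.items() if s >= 1.0 - 1e-9)
underperforming = sorted(b for b, s in C5_SCORES.items() if s < 1.0 - 1e-9)

average = sum(C5_SCORES.values()) / len(C5_SCORES)
worst_two = sorted(C5_SCORES, key=C5_SCORES.get)[:2]

print(len(best_practice), "best-practice blocks:", best_practice)
print(len(underperforming), "underperforming blocks")
print("average score:", round(average, 2))
print("lowest scores:", worst_two)
```

The counts (5 and 14), the rounded mean (0.70), and the two lowest-scoring blocks agree with the figures stated on the slide.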

Page 28

Case study

We analyze the underperforming BMPs (blocks 2XXX_110 and 1XXX_110) from the operations execution and resource utilization perspectives

For the analysis of an underperforming block from the operations execution perspective, we compare the planned operations flow, which is managed by production schedulers, with the actual operations flow of block 2XXX_110

The actual operations flows for all best-practice blocks are the same as the planned operations flow

On the other hand, the actual operations flows for the underperforming BMPs are different from the planned operations flows

Grand 201-‘P’ and Grand 201-‘3’ have very similar operation characteristics, but their work shops and items are different

Grand 201-‘3’ was chosen at the worker's discretion for its similar operation characteristics

As a result, block 2XXX_110 might have incurred a longer waiting time and execution time

Page 29

Conclusion

We proposed an integrated approach to BMP performance evaluation in the shipbuilding industry by using process mining (PM) and DEA

Through application of the proposed approach, we verified its effectiveness and practicality

Shipbuilding operations experts, moreover, agreed that the provided guidelines can be valuable in establishing additional strategies for improving the performance and productivity of block manufacturing

It can be said that this research makes a constructive contribution to practical block performance evaluation in the shipbuilding industry


Page 31

United Network for Organ Sharing (UNOS)

Many variables and observations related to lung and heart transplants.

Need for fair and accurate predictions of survival time and quality of life.

The ability of medical professionals to accurately predict the best donor/recipient pairings may be flawed or biased.

Variables contributing to accurate predictions may be many and complex, with poorly understood relationships.

Reduction of large datasets is important.

Page 32

Dataset

Data concerning donors/recipients for lung/heart transplants.

• Over 400 variables and 100,000+ observations (BIG DATA ANALYTICS)

• 24 variables chosen by Oztekin et al. [2]

• Cleaning reduces the data to 12,744 observations.

Variables       Explanation             Variable type
Donor Age       Years                   Cont.
Recipient Age   Years                   Cont.
ABO_MAT         ABO match level         Ordinal
EINT            Ethnicity match level   Binary
GINT            Gender match level      Binary
GTIME           Graft survival time     Cont.
Etc.

Page 33

DEANN Methodology

Variables are chosen according to contribution → Data is preprocessed using DEA → ANN is trained → Predictions

Metrics are chosen according to importance, with no need to be few in number.

Preprocessing with DEA allows better training of the ANN.

ANN is applicable to “fuzzy” situations.

Page 34

DEANN Methodology

Flowchart (12,744 records): START the DEANN methodology; Preprocess the dataset using DEA; Determine the efficient DMUs; Conduct the initial training of ANN; Perform the first prediction via ANN; Measure the overall multi-class accuracy; Test: Is accuracy satisfactory? Yes: STOP the DEANN methodology. No: Update the dataset being analyzed.
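One possible reading of the flowchart boxes on this slide, written as a Python control loop. This is a sketch under assumptions, not the authors' implementation: the order of the steps and the loop back from "No" to the DEA preprocessing step are inferred from the box labels, and `run_deann`, `select_efficient`, `train`, and `update` are hypothetical placeholders for the DEA step, the ANN training step, and the dataset update step. The toy stand-ins at the bottom exist only so the loop can be executed.

```python
from collections import Counter


def multiclass_accuracy(predicted, actual):
    """Share of records whose predicted class equals the actual class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def run_deann(records, select_efficient, train, update,
              target_accuracy, max_rounds=10):
    """Repeat: DEA preprocessing -> training -> prediction -> accuracy test.

    records: list of (features, label) pairs.
    Returns the last trained model and the accuracy measured in each round.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    current, history, model = list(records), [], None
    for _ in range(max_rounds):
        efficient = select_efficient(current)   # determine the efficient DMUs
        model = train(efficient)                # train on efficient data
        predicted = [model(features) for features, _ in current]
        accuracy = multiclass_accuracy(predicted, [lab for _, lab in current])
        history.append(accuracy)
        if accuracy >= target_accuracy:         # "Yes" branch: stop
            break
        current = update(current, efficient)    # "No" branch: update dataset
    return model, history


# Toy stand-ins (hypothetical): all records count as efficient, and the
# "model" always predicts the majority class of its training data.
def toy_train(training_records):
    majority = Counter(lab for _, lab in training_records).most_common(1)[0][0]
    return lambda features: majority


toy_records = [((1,), "a"), ((2,), "a"), ((3,), "b"), ((4,), "a")]
model, history = run_deann(
    toy_records,
    select_efficient=lambda recs: recs,
    train=toy_train,
    update=lambda recs, efficient: recs,
    target_accuracy=0.7,
)
print(history)
```

In a real application the placeholders would be replaced by a DEA routine that keeps the efficient observations and by an actual neural-network training routine.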

Page 35

DEA Results

Stratification yielded 12 efficiency levels.

Individual levels yielded a higher correlation between the recipient functional status and the input variables than many (or all) levels considered together.

The ANN is trained on one or more of these levels using ten-fold cross-validation.

DEA allows efficient observations to be utilized so that outlying transplants do not result in poor training of the ANN.

DEANN allows the ANN to be trained on efficient data, which will result in accurate predictions and faster training time.