DATA ENVELOPMENT ANALYSIS
A Tool for Data Mining and Analytics
Joe Zhu
School of Business, Worcester Polytechnic Institute
Worcester, MA
www.deafrontier.net
What is DEA?
DEA was developed and published in 1978
It is a non-parametric approach to estimating production functions
Thus, we have multiple inputs and multiple outputs (of a production function)
DEA tries to identify the efficient units
What is DEA exactly?
More than a production efficiency estimate
It is balanced benchmarking (Sherman and Zhu, 2013) that enables companies to benchmark and locate best practices that are not visible through other commonly used management methodologies
It helps executives study the top-performing units, identify the best practice, transfer the valuable knowledge throughout the organization to enhance performance, and test assumptions that might be counter-productive
A tool for benchmarking
If one benchmarks the performance of computers, it is natural to consider different features (screen size and resolution, memory size, process speed, hard disk size, and others). One would then have to classify these features into “inputs” and “outputs” in order to apply a proper DEA analysis. However, these features may not actually represent inputs and outputs at all, in the standard notion of production.
DEA - revisit
A rule for classifying metrics
Multiple inputs: the smaller the better
Multiple outputs: the larger the better
DMU (decision making unit)
The definition of a DMU is generic and flexible
Numerous applications are found in areas of finance, marketing, transportation, sports, accounting, energy, sustainability, fishery, insurance and others
(Relative) Efficiency
The term ‘efficiency’ here represents best practice
Under general benchmarking, it does not necessarily mean ‘production efficiency’
We may refer to the DEA score as a form of ‘overall performance’ of an organization
An example: measuring the quality of care in the case of treating heart-attack patients
Some measures can be used in DEA to yield a composite measure of quality indicators: Patients Given Aspirin at Arrival, Patients Given Beta Blocker at Discharge, etc.
Mathematical Model
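\[
\begin{aligned}
\max\ & z = \sum_{r=1}^{s} \mu_r y_{ro} \\
\text{subject to}\ & \sum_{r=1}^{s} \mu_r y_{rj} - \sum_{i=1}^{m} \nu_i x_{ij} \le 0, \quad j = 1, 2, \ldots, n; \\
& \sum_{i=1}^{m} \nu_i x_{io} = 1; \\
& \mu_r,\ \nu_i \ge 0.
\end{aligned}
\]

Dual

\[
\begin{aligned}
\theta^{*} = \min\ & \theta \\
\text{subject to}\ & \sum_{j=1}^{n} \lambda_j x_{ij} \le \theta x_{io}, \quad i = 1, 2, \ldots, m; \\
& \sum_{j=1}^{n} \lambda_j y_{rj} \ge y_{ro}, \quad r = 1, 2, \ldots, s; \\
& \lambda_j \ge 0, \quad j = 1, 2, \ldots, n.
\end{aligned}
\]

Here x_{ij} and y_{rj} are the i-th input and r-th output of DMU j, and DMU o is the unit under evaluation.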
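To make the score concrete, here is a minimal sketch of the special case with one input and one output, assuming the model is the constant-returns-to-scale form. In that case the optimal score of a DMU reduces to its output/input ratio divided by the best ratio in the sample. The function name `ccr_efficiency_1x1` and the data are hypothetical; the general case with several inputs and outputs needs a linear programming solver.

```python
def ccr_efficiency_1x1(inputs, outputs):
    """Relative efficiency for one input and one output per DMU.

    Each DMU's output/input ratio is divided by the best ratio in the
    sample, so best-practice DMUs score exactly 1.
    """
    if len(inputs) != len(outputs) or not inputs:
        raise ValueError("need one input and one output value per DMU")
    if any(x <= 0 for x in inputs) or any(y <= 0 for y in outputs):
        raise ValueError("this sketch assumes strictly positive data")
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical data: four DMUs, input = staff hours, output = units served.
staff_hours = [10.0, 20.0, 30.0, 40.0]
units_served = [20.0, 30.0, 60.0, 40.0]
scores = ccr_efficiency_1x1(staff_hours, units_served)
print(scores)
```

In this toy sample the first and third DMUs share the best ratio and form the best practice; the others are scored relative to them.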
Business Analytics by Data Envelopment Analysis (DEA)
Descriptive Analytics: Gain insight from historical data
Predictive Analytics: Forecasting
Prescriptive Analytics: Recommend decisions using optimization, simulation, etc.
Decisive Analytics: Support human decisions with visual analytics
DATA ENVELOPMENT ANALYSIS
DEA is a DATA ANALYSIS tool
Data Mining and Knowledge Discovery by DEA
More than Relative Efficiency
Sample Size
DEA is not a form of regression model
It is meaningless to apply a sample size requirement to DEA
If there are too many performance metrics given the number of DMUs, it is likely that a significant portion of DMUs will be benchmarked as the best practice with a ratio of 1
One can use certain DEA approaches to reduce the number of best-practice DMUs
Regression analysis
Numerous Models/Approaches
One modification to DEA is called stratification.
• Stratification results in many efficiency frontiers. The first represents all DMUs with the highest efficiency, and so on down each stratified level until all DMUs have been included.
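The peeling idea behind stratification can be sketched as follows. This is a simplified illustration that assumes one input and one output, so that each frontier is simply the set of DMUs with the best remaining output/input ratio; with several metrics, each level would instead require re-running the DEA model on the remaining DMUs. The function name `stratify` and the data are hypothetical.

```python
def stratify(inputs, outputs):
    """Peel off successive best-practice frontiers (one input, one output).

    Level 1 holds the DMUs with the best ratio; they are removed and the
    process repeats until every DMU has been assigned to a level.
    """
    remaining = {name: outputs[name] / inputs[name] for name in inputs}
    levels = []
    while remaining:
        best = max(remaining.values())
        frontier = sorted(n for n, r in remaining.items() if r == best)
        levels.append(frontier)
        for name in frontier:
            del remaining[name]
    return levels


# Hypothetical data for five DMUs.
inputs = {"A": 10.0, "B": 20.0, "C": 30.0, "D": 40.0, "E": 10.0}
outputs = {"A": 20.0, "B": 30.0, "C": 60.0, "D": 40.0, "E": 15.0}
levels = stratify(inputs, outputs)
print(levels)
```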
Data Envelopment Analysis
Network Structure
Ship Block Manufacturing Process Performance Evaluation
Shipbuilding process
Business & Service Computing Laboratory
Main processes of shipbuilding consist of several work stages:
Design → Cutting & Forming → Assembly → Pre-Outfitting & Painting → Pre-Erection → Erection → Quay
For effective ship construction:
A ship is divided into properly sized blocks in the design stage
All blocks are manufactured (or assembled) into the body of a ship
Management of block manufacturing process (BMP)
Effective block manufacturing process (BMP) management has been regarded as one of the most important issues in the shipbuilding industry
Many blocks are assembled into a ship, and each block has a complex manufacturing process
A large ship usually needs more than 250 different blocks, each manufactured through a different process according to the ship’s type and size
Thus, effective and efficient BMP performance enables a reduction of the overall shipbuilding period and thereby the cost
For example, if any one block includes unnecessary work stages, the related inefficient resource assignment or long queuing times in the storage yard will have a negative effect on the overall shipbuilding period and productivity
For effective management of BMP performance, a practical and accurate performance evaluation method that considers various factors reflecting real manufacturing processes and situations is crucial
Practical difficulties in evaluating BMP performance
For effective BMP management, shipbuilding companies have implemented production information systems (e.g. BAMS (Block Assembly Monitoring System) or RPMS (Real-time Progress Management System)…)
But these systems only focus on work scheduling, process monitoring and work automation
There are at least two practical difficulties in evaluating BMP performance
1) There are many block assembly types (e.g. Sub-assembly, Unit-assembly, and Grand-assembly...) and each assembly type is in turn classified into one of three form types (e.g. Small, Curved, and Large…)
2) There are discrepancies between actual and planned work in the form of time gaps due to various problems (e.g. work delay, urgent work, and the convergence of blocks at the end of the process…)
Generally, there is a 5~9 day delay between planned work and performed work
Goal of this research
This research addresses the above two practical difficulties in evaluating BMP performance
This research proposes an integrated systematic approach to evaluate the performance of BMP in the shipbuilding industry by integrating process mining (PM) and DEA
[Diagram: Database in shipbuilding company → Data Extraction → Data pre-processing → Generation of block manufacturing processes → Performance evaluation of BMP → Guideline for improving the performance of underperforming BMPs; methods used: Process mining (PM), Data envelopment analysis (DEA)]
Proposed method
Defined attributes
Attributes  Identification  Activity    Time                             Schedule               Material
Data        Block ID        Operations  Start and End time of operation  Planned working times  Welding amount

Extract sample log data from the database based on the defined attributes
Identification  Activity   Time
Block ID        Operation  End time of unit task
101             C1         2012/05/24 11:00
101             G9         2012/06/07 12:00
101             S6         2012/05/24 14:00
102             C1         2012/05/25 11:00
102             G9         2012/06/08 12:00
102             S6         2012/05/25 14:00
104             C1         2012/05/29 10:00
104             G9         2012/06/08 16:00
104             H2         2012/05/22 12:00
104             S6         2012/05/29 17:00
105             C1         2012/06/01 11:00
105             G9         2012/06/13 11:00
105             H2         2012/05/30 11:00

BMP is generated as a form of operations flow from the extracted log data
Consider block ID 101
It includes three operations: C1, G9 and S6
We arrange these operations by End time in ascending order
The sequence of operations C1 S6 G9 is the BMP of block ID 101

Generation of BMPs
Block ID  Sequence of operations
101       C1 S6 G9
102       C1 S6 G9
104       H2 C1 S6 G9
105       H2 C1 G9
The generated BMPs are then subjected to performance evaluation
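The generation step can be reproduced on the sample log with a short sketch: group the unit tasks by block ID and sort each group by end time. The log rows are the ones shown on the slide; the function name `generate_bmps` and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Sample log rows from the slide: (block ID, operation, end time of unit task).
LOG = [
    ("101", "C1", "2012/05/24 11:00"),
    ("101", "G9", "2012/06/07 12:00"),
    ("101", "S6", "2012/05/24 14:00"),
    ("102", "C1", "2012/05/25 11:00"),
    ("102", "G9", "2012/06/08 12:00"),
    ("102", "S6", "2012/05/25 14:00"),
    ("104", "C1", "2012/05/29 10:00"),
    ("104", "G9", "2012/06/08 16:00"),
    ("104", "H2", "2012/05/22 12:00"),
    ("104", "S6", "2012/05/29 17:00"),
    ("105", "C1", "2012/06/01 11:00"),
    ("105", "G9", "2012/06/13 11:00"),
    ("105", "H2", "2012/05/30 11:00"),
]


def generate_bmps(log):
    """Return {block ID: operations ordered by ascending end time}."""
    by_block = defaultdict(list)
    for block_id, operation, end_time in log:
        finished = datetime.strptime(end_time, "%Y/%m/%d %H:%M")
        by_block[block_id].append((finished, operation))
    return {
        block_id: [op for _, op in sorted(events)]
        for block_id, events in by_block.items()
    }


bmps = generate_bmps(LOG)
for block_id, sequence in bmps.items():
    print(block_id, " ".join(sequence))
```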
Proposed method
Block clustering
Generated BMPs are heterogeneous since there are many kinds of BMPs
For a more accurate performance evaluation, our intention is to evaluate homogeneous BMPs
Therefore, we classify BMPs into several peer groups by their similarity
The similarity of BMPs is measured by the similarity index, which is calculated from two vectors:
Task vector: based on the presence or absence of the same operations in two BMPs
Transition vector: based on the sequential relationship of the operations in two BMPs
The task vector and transition vector take values from 0 to 1, with values closer to 1 indicating that two BMPs are more similar
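The slides do not give the formula for the similarity index, so the following is only a sketch of the idea under stated assumptions: task similarity is taken as the Jaccard overlap of the operation sets, transition similarity as the Jaccard overlap of consecutive operation pairs, and the two are averaged with equal weight. The function names and the weighting are hypothetical, not the authors' definition.

```python
def jaccard(a, b):
    """Share of common elements between two sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_index(bmp_a, bmp_b, task_weight=0.5):
    """Blend task and transition similarity of two operation sequences.

    Assumption: Jaccard overlap and equal weights stand in for the
    unspecified task-vector and transition-vector calculation.
    """
    task_sim = jaccard(set(bmp_a), set(bmp_b))
    transitions_a = set(zip(bmp_a, bmp_a[1:]))
    transitions_b = set(zip(bmp_b, bmp_b[1:]))
    transition_sim = jaccard(transitions_a, transitions_b)
    return task_weight * task_sim + (1 - task_weight) * transition_sim


# BMPs of blocks 101, 102 and 105 from the sample log.
bmp_101 = ["C1", "S6", "G9"]
bmp_102 = ["C1", "S6", "G9"]
bmp_105 = ["H2", "C1", "G9"]

same = similarity_index(bmp_101, bmp_102)
different = similarity_index(bmp_101, bmp_105)
print(same, different)
```

Under these assumptions identical BMPs score 1, while blocks 101 and 105 share two of four operations and no transitions, so their score is low.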
Performance evaluation
Each BMP is regarded as a DMU, and only BMPs in the same group are considered for performance evaluation
Due to the nature of our performance metrics, we use a DEA model, recently developed by Lim & Zhu (2013), in which some performance metrics have target levels
In our case, the performance metrics are selected based on the extracted log data.
We conducted a questionnaire survey of 30 shipbuilding operating experts to obtain information on which factors are most critical to BMP performance
Case study from a Korean shipbuilding company
Condition of Experiment
Two projects’ event logs exported from a Block Assembly Monitoring System (BAMS) were used.
Eighty-six blocks are generated from the log data, which are then classified into six clusters
In general, production planners assign the work resources and establish the production scheduling based on the block types defined by the empirical knowledge of shipbuilding operating experts. We refer to these defined block types in deciding the number of clusters
Case study
Clustering results including the number of blocks and the process characteristics of each cluster
Cluster Name  # of Blocks  Process characteristics of cluster
C1            12           Block assembly work in work shop #5.
C2            9            Grand assembly processes in work shop #5 after Unit assembly in work shop #4.
C3            7            Component work in work shop ‘C’
C4            9            Unit assembly and Grand assembly in work shop #3 after Component and Plate works in work shop ‘C’ and ‘P’.
C5            19           Grand assembly in work shop #2 after Component work in work shop ‘C’.
C6            30           Grand assembly or Special Ship assembly in work shop #1 and 2.
We aggregate all BMPs in cluster C5 to show a concrete instance of the clustering result
The aggregated model of all BMPs in C5 represents BMPs performed in work shop #2
Case study
The performance metrics are calculated and the descriptive statistics for them are listed
Columns: (1) Total execution time (Hour), (2) Waiting time (Hour), (3) Gap between planned and actual working (Day), (4) Number of unit tasks, (5) Material amount (m)

Group       Stat  (1)      (2)      (3)    (4)   (5)
All BMPs    Min   247.0    28.0     -10.0  5.0   112.8
            Max   1,910.7  1,434.0  5.0    25.0  719.2
            Avg   320.0    95.0     -2.1   14.0  481.8
BMPs in C1  Min   247.0    37.0     -7.0   6.0   84.3
            Max   1,732.6  1,433.0  5.0    15.0  410.1
            Avg   307.0    108.9    -1.0   9.0   234.6
BMPs in C2  Min   276.0    52.0     -3.0   5.0   72.8
            Max   1,910.7  933.0    3.0    14.0  510.4
            Avg   357.0    146.8    -1.5   8.0   281.1
BMPs in C3  Min   250.0    32.0     -8.0   6.0   74.3
            Max   1,040.5  809.0    4.0    15.0  489.0
            Avg   231.0    126.2    -1.0   9.0   293.7
BMPs in C4  Min   261.0    28.0     -8.0   8.0   105.3
            Max   1,213.5  1,434.0  2.0    21.0  607.0
            Avg   269.0    80.5     -3.2   14.0  323.1
BMPs in C5  Min   257.0    61.0     -4.0   9.0   139.2
            Max   1,802.1  1,023.0  2.0    24.0  689.7
            Avg   315.0    104.0    -1.0   15.0  497.1
BMPs in C6  Min   251.0    45.0     -10.0  5.0   123.5
            Max   1,910.7  1,434.0  5.0    25.0  719.2
            Avg   330.4    110.7    -4.5   15.0  498.5
Case study
The evaluation results are summarized

Average performance scores of BMPs
Group                All BMPs  BMPs in C1  BMPs in C2  BMPs in C3  BMPs in C4  BMPs in C5  BMPs in C6
Average performance  0.60      0.61        0.69        0.43        0.54        0.70        0.62

Performance scores of BMPs in C5
Blocks    Score  Blocks    Score  Blocks    Score  Blocks    Score
1XXX_622  1      1XXX_632  0.84   2XXX_653  0.67   2XXX_621  0.54
2XXX_509  1      1XXX_653  0.81   1XXX_642  0.64   1XXX_621  0.46
2XXX_622  1      2XXX_652  0.78   1XXX_652  0.61   1XXX_110  0.23
2XXX_631  1      2XXX_632  0.73   2XXX_643  0.58   2XXX_110  0.18
2XXX_642  1      1XXX_643  0.69   1XXX_631  0.55

Five blocks (1XXX_622, 2XXX_509, 2XXX_622, 2XXX_631, 2XXX_642) are determined as the best practice, whereas the remaining 14 blocks are underperforming.
In particular, 1XXX_110 and 2XXX_110 are the most underperforming blocks.
Most of the best-practice blocks have the same BMP: Comp 101-‘C’, Grand 201-‘P’, Grand 202-‘3’, Grand 203-‘3’, Grand 301-‘3’
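The C5 figures quoted on this slide can be cross-checked with a few lines of arithmetic. The scores below are copied from the C5 table; the variable names are only illustrative.

```python
# Performance scores of the 19 BMPs in cluster C5, as listed on the slide.
C5_SCORES = {
    "1XXX_622": 1.0, "2XXX_509": 1.0, "2XXX_622": 1.0,
    "2XXX_631": 1.0, "2XXX_642": 1.0,
    "1XXX_632": 0.84, "1XXX_653": 0.81, "2XXX_652": 0.78,
    "2XXX_632": 0.73, "1XXX_643": 0.69,
    "2XXX_653": 0.67, "1XXX_642": 0.64, "1XXX_652": 0.61,
    "2XXX_643": 0.58, "1XXX_631": 0.55,
    "2XXX_621": 0.54, "1XXX_621": 0.46, "1XXX_110": 0.23, "2XXX_110": 0.18,
}

# Best practice means a score of exactly 1; everything else underperforms.
best_practice = sorted(b for b, s in C5_SCORES.items() if s == 1.0)
underperforming = sorted(b for b, s in C5_SCORES.items() if s < 1.0)

# Average score of the cluster and the two lowest-scoring blocks.
average = sum(C5_SCORES.values()) / len(C5_SCORES)
worst_two = sorted(C5_SCORES, key=C5_SCORES.get)[:2]

print(len(best_practice), len(underperforming), round(average, 2), worst_two)
```

The counts (5 best-practice and 14 underperforming blocks) and the rounded average agree with the 0.70 reported for C5 in the summary table.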
Case study
We analyze the underperforming BMPs (blocks 2XXX_110 and 1XXX_110) from the operations execution and resource utilization perspectives
For the analysis of the underperforming block from the operations execution perspective, we compare the difference between the planned operations flow, which is managed by production schedulers, and the actual operations flow of block 2XXX_110
The actual operations flows for all best-practice blocks are the same as the planned operations flow
On the other hand, the actual operations flows for the underperforming BMPs are different from the planned operations flows
Grand 201-‘P’ and Grand 201-‘3’ have very similar operation characteristics, but the work shop and items for these are different
Grand 201-‘3’ was chosen at the worker's discretion for its similar operation characteristics
As a result, block 2XXX_110 might have incurred a longer waiting time and execution time
Conclusion
We proposed an integrated approach to BMP performance evaluation in the shipbuilding industry by using process mining (PM) and DEA
Through application of the proposed approach, we verified its effectiveness and practicality
Shipbuilding operations experts, moreover, agreed that the provided guidelines can be valuable in establishing additional strategies for improving the performance and productivity of block manufacturing
It can be said that this research makes a constructive contribution to practical block performance evaluation in the shipbuilding industry
United Network for Organ Sharing (UNOS)
Many variables and observations related to lung and heart transplants.
Need for fair and accurate predictions of survival time and quality of life.
Ability of medical professionals to accurately predict best donor/recipient pairings may be flawed/biased.
Variables contributing towards accurate predictions may be many, complex, and have poorly understood relationships.
Reduction of large datasets is important.
Dataset
Data concerning donor/recipient for lung/heart transplants.
• Over 400 variables and 100,000+ observations: BIG DATA ANALYTICS
• 24 variables chosen by Oztekin et al. [2]
• Can reduce to 12,744 observations from cleaning.

Variables      Explanation            Variable type
Donor Age      Years                  Cont.
Recipient Age  Years                  Cont.
ABO_MAT        ABO match level        Ordinal
EINT           Ethnicity match level  Binary
GINT           Gender match level     Binary
GTIME          Graft survival time    Cont.
Etc…
DEANN Methodology
Variables are chosen according to contribution → Data is preprocessed using DEA → ANN is trained → Predictions
Metrics chosen according to importance with no need to be few in number.
Preprocessing with DEA allows better training of ANN.
ANN is applicable for “fuzzy” situations.
DEANN Methodology
[Flowchart boxes: START the DEANN methodology; 12,744 records; Preprocess the dataset using DEA; Determine the efficient DMUs; Conduct the initial training of ANN; Perform the first prediction via ANN; Measure the overall multi-class accuracy; Test: Is accuracy satisfactory? (Yes: STOP the DEANN methodology; No: Update the dataset being analyzed)]
DEA Results
Stratification yielded 12 efficiency levels.
Individual levels yielded a higher correlation between the recipient functional status and the input variables when compared to consideration of many (or all) levels.
The ANN is trained on one or more of these levels using ten-fold cross validation.
DEA allows efficient observations to be utilized so that outlying transplants do not result in poor training of the ANN.
DEANN allows the ANN to be trained from efficient data, which will result in accurate predictions/faster training time.
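For readers unfamiliar with ten-fold cross validation, the splitting step can be sketched as below. The slides do not say how the folds were formed or which records entered each run, so the shuffling, the seed, and the use of all 12,744 cleaned records are assumptions made purely to illustrate the mechanics; `k_fold_indices` is a hypothetical helper.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation.

    Records are shuffled once, dealt into k folds, and each fold serves
    as the held-out test set exactly once.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold


# Assumption: all 12,744 cleaned records are split; a single efficiency
# level would be split the same way with a smaller n_records.
splits = list(k_fold_indices(12744, k=10))
for train, test in splits[:2]:
    print(len(train), len(test))
```

Each record is used for testing once and for training in the other nine rounds, which is what makes the accuracy estimate less dependent on a single split.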