
Page 1: Exploiting data preparation to enhance mining and knowledge discovery

Intelligent Database Systems Lab

國立雲林科技大學 National Yunlin University of Science and Technology

Advisor: Dr. Hsu

Graduate: Keng-Wei Chang

Authors: Balaji Rajagopalan, Mark W. Isken

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS-PART C: APPLICATIONS AND REVIEWS, VOL. 31, NO. 4, NOVEMBER 2001

Page 2: Outline

Motivation
Objective
Introduction
Data Preparation
Research Method
Results

N.Y.U.S.T.

I.M.

Page 3: Motivation

Organizational data is being used for mining and knowledge discovery.

Such data is not amenable to mining in its natural form.

Page 4: Objective

Show that data enhancement, through the introduction of new attributes along with judicious aggregation of existing attributes, results in higher-quality knowledge discovery.

Examine the differential impact of data enhancement on the performance of different mining algorithms.

Page 5: Introduction

The exponential growth of information presents knowledge workers with a tremendous volume of data.

Knowledge management solutions:
  Knowledge repository
  Knowledge sharing
  Knowledge discovery

Page 6: Data Preparation

A framework is presented, based on prior research in knowledge discovery:
  Data quality
  Data characteristics
  Data preparation

Page 7: Research Method

A data set from a large tertiary care hospital in the United States was used.

Topics covered:
  A. Problem Domain
  B. Data
  C. Clustering Algorithms for Knowledge Discovery
  D. Entropy-Based Metrics for Cluster Quality Assessment
  E. Rule Extraction Metrics

Page 8: Problem Domain

The problem is the allocation of inpatient beds.

The harder part is quantifying resource use for a manageable set of patient types.

Quantitative resource use: the sequence of hospital units visited and the corresponding length of stay.

Patient type: a group of patients consuming a similar level of hospital resources.

Page 9: Problem Domain

This is referred to as the patient classification problem.

There is a trade-off between too few and too many patient types.

The key is to identify the set of patient types.

Page 10: Data

Inpatient obstetrical and gynecological (OB/GYN) patient flow.

The data set has numerous fields:
  demographics
  physician information
  ICD9-CM diagnostic and procedure codes
  diagnosis-related groups (DRGs)

Page 11: Data

Almost 500 DRGs are defined.

DRGs in the range 353-384 are related to OB/GYN.

These DRGs were grouped into five DRG types.
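The DRG filtering and grouping step might be sketched as follows. This is an illustration only: the slide gives the OB/GYN range (353-384, assumed inclusive here) but not the assignment of individual DRGs to the five types, so `type_of` is a hypothetical caller-supplied mapping and the type labels are placeholders.

```python
OBGYN_DRGS = range(353, 385)  # DRG codes 353-384, assumed inclusive


def is_obgyn(drg):
    """True if the DRG code falls in the OB/GYN range given on the slide."""
    return drg in OBGYN_DRGS


def to_drg_type(drg, type_of):
    """Map an OB/GYN DRG code to a DRG type via a caller-supplied mapping.

    Returns None for codes outside the OB/GYN range or absent from the mapping.
    """
    if not is_obgyn(drg):
        return None
    return type_of.get(drg)


# Placeholder mapping: the real assignment to five types is not in the slides.
example_type_of = {353: "type_1", 370: "type_2", 384: "type_5"}
example_types = [to_drg_type(d, example_type_of) for d in (353, 370, 127, 384)]
```

Keeping the mapping outside the function means the domain-specific grouping can be revised without touching the filtering logic.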

Page 12: Clustering Algorithms for Knowledge Discovery

K-means and Kohonen self-organizing maps.

Similarity: Euclidean distance function

$$ d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} $$
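As a rough illustration of how the Euclidean distance is used in clustering, the sketch below computes the distance and performs the nearest-centroid assignment that K-means repeats on each iteration. The points and centroids are made-up values, not data from the study, and the function names are chosen here for illustration.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to point (the K-means assignment step)."""
    return min(range(len(centroids)), key=lambda j: euclidean(point, centroids[j]))


# Hypothetical two-variable records and centroids, for illustration only.
centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(1.0, 2.0), (9.0, 8.0), (3.0, 4.0)]
labels = [nearest_centroid(p, centroids) for p in points]
```

A full K-means run would alternate this assignment with recomputing each centroid as the mean of its assigned points; only the distance-based step is shown here.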

Page 13: Entropy-Based Metrics for Cluster Quality Assessment

Entropy of cluster $j$:

$$ E_j = \sum_{i} p_{ij} \log_2 \frac{1}{p_{ij}} $$

Weighted entropy: a weighted average entropy measure for a cluster solution, with weights based on cluster size.

Purity of cluster $j$:

$$ P_j = \max_{i} p_{ij} $$

Here $n_{ij}$ is the number of cases having a DRG type of $i$ in cluster $j$, and

$$ p_{ij} = \frac{n_{ij}}{\sum_{l} n_{lj}} $$
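The cluster-quality metrics can be sketched in a few lines. Each cluster is represented by its list of per-DRG-type case counts (the n_ij for a fixed j). The weighting n_j / n used for the weighted entropy is an assumption consistent with "weights based on cluster size"; the slides do not spell out the exact weights. The counts are invented for illustration.

```python
import math


def cluster_entropy(counts):
    """Entropy E_j in bits of one cluster, from its per-type case counts n_ij."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cluster has no cases")
    # p_ij = n_ij / sum_l n_lj; types with zero cases contribute nothing.
    probs = [n / total for n in counts if n > 0]
    return sum(p * math.log2(1.0 / p) for p in probs)


def cluster_purity(counts):
    """Purity P_j: share of the cluster taken by its most common type."""
    return max(counts) / sum(counts)


def weighted_entropy(clusters):
    """Average of E_j over clusters, weighted by cluster size (assumed n_j / n)."""
    n = sum(sum(c) for c in clusters)
    return sum(sum(c) / n * cluster_entropy(c) for c in clusters)


# Hypothetical solution: two clusters, three DRG types.
clusters = [[8, 0, 0], [2, 2, 0]]
```

Lower entropy and higher purity both indicate clusters that are more homogeneous with respect to DRG type.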

Page 14: Rule Extraction Metrics

Most of the extracted rules are expected to show a high degree of resonance with our domain knowledge.

Page 15: Results

This section details the data enhancements relevant to this study.

  A. Data Preparation: Basics
  B. Mining and Knowledge Discovery
  C. Differential Impact Based on Clustering Method
  D. Usefulness of Knowledge Discovered
  E. Limitations
  F. Implications for Research and Practice

Page 16: Data Preparation: Basics

The data set included fields that represent the patient's path through the hospital units and the associated lengths of stay along that path.
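One way to picture path and length-of-stay fields is as an ordered list of (unit, days) pairs that can be aggregated per unit. The sketch below is only an illustration of that idea: the unit names, the day values, and the choice of per-unit totals are assumptions, not the fields used in the study.

```python
from collections import defaultdict


def los_by_unit(path):
    """Total length of stay per hospital unit along a patient's path."""
    totals = defaultdict(float)
    for unit, days in path:
        totals[unit] += days
    return dict(totals)


def total_los(path):
    """Overall length of stay across the whole path."""
    return sum(days for _, days in path)


# Hypothetical path: the patient returns to unit_A after a stay in unit_B.
path = [("unit_A", 0.5), ("unit_B", 2.0), ("unit_A", 1.0)]
per_unit = los_by_unit(path)
```

Aggregating repeated visits to the same unit turns a variable-length sequence into a fixed set of numeric attributes that a clustering algorithm can consume.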

Page 17: Data Preparation: Basics

Three data sets are considered in order to illustrate the impact of data preparation.

ED1: eight numeric variables.

Page 18: Data Preparation: Basics

ED2: in addition to the ED1 variables, ED2 adds five nominal variables.

Both DRG and CCS were designed to serve as aggregate measures of hospital resource consumption.

Page 19: Data Preparation: Basics

ED3: in addition to the ED2 variables, ED3 contains two binary variables:
  whether or not the patient gave birth during the visit
  whether or not the patient gave birth via C-section
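The nesting of the three data sets (ED1 inside ED2 inside ED3) can be sketched as field lists. Only the variable counts and kinds come from the slides; every field name below is a hypothetical placeholder, and the sample record holds invented values.

```python
# Hypothetical field names; the slides give only counts and kinds.
NUMERIC_FIELDS = [f"num_{k}" for k in range(1, 9)]  # ED1: eight numeric variables
NOMINAL_FIELDS = [f"nom_{k}" for k in range(1, 6)]  # ED2 adds five nominal variables
BINARY_FIELDS = ["gave_birth", "c_section"]         # ED3 adds two binary variables

FIELD_SETS = {
    "ED1": NUMERIC_FIELDS,
    "ED2": NUMERIC_FIELDS + NOMINAL_FIELDS,
    "ED3": NUMERIC_FIELDS + NOMINAL_FIELDS + BINARY_FIELDS,
}


def project(record, dataset):
    """Keep only the fields of a record that belong to the named data set."""
    return {name: record[name] for name in FIELD_SETS[dataset]}


# Invented record containing every field, for illustration only.
record = {name: 0.0 for name in NUMERIC_FIELDS}
record.update({name: "unknown" for name in NOMINAL_FIELDS})
record.update({"gave_birth": 1, "c_section": 0})

ed1_view = project(record, "ED1")
ed3_view = project(record, "ED3")
```

Building each data set as a projection of one full record makes it easy to run the same clustering code on ED1, ED2, and ED3 and compare the results.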

Page 20: Mining and Knowledge Discovery

Page 21: Mining and Knowledge Discovery

Page 22

Page 23: Differential Impact Based on Clustering Method

Page 24: Usefulness of Knowledge Discovered

Page 25: Limitations

The findings may not be exactly applicable in every case.

Only two data mining algorithms were examined: K-means and Kohonen self-organizing maps.

The study is illustrative, not exhaustive.

Domain knowledge played a critical role in the data preparation process.

Page 26: Implications for Research and Practice

The study provides empirical evidence demonstrating the impact of data preparation on mining and knowledge discovery.

Engage in a comparative investigation of multiple algorithms.

Page 27: Personal opinion …