Data Mining, Data Pattern, Machine Learning (Week 2)


Upload: s4vana

Post on 10-Apr-2018






Page 1: Data Mining, Data Pattern, Machine Learning(Week 2

8/8/2019 Data Mining, Data Pattern, Machine Learning(Week 2 1/19


Data Mining, Data Pattern and Machine Learning



• “…the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner.”

Hand, Mannila & Smyth

• “…an interdisciplinary field bringing together techniques from machine learning, pattern recognition, statistics, databases, and visualization to address the issue of information extraction from large data bases.”

Evangelos Simoudis in Cabena et al.

• “…the extraction of implicit, previously unknown, and potentially useful information from data.”

Witten & Frank



Why Has Data Mining Appeared?

• Large volumes of data stored by organizations in a competitive environment, combined with advances in technologies which can be applied to the data

• Background and evolution

• The need for exploratory data analysis

 –  Niche marketing, customer retention, the internet, online interaction, scientific discovery

• The means to implement Data Mining

 –  data warehouses, computing power, effective modelling




Structural Pattern of Data



Structural Pattern of Data --cont--



Machine Learning

• To learn:

 –  To get knowledge of by study, experience, or being taught

 –  To become aware by information or from observation

 –  To commit to memory

 –  To be informed

 –  To receive instruction

• Learning:

 –  Things learn when they change their behavior in a way that makes them perform better in the future



Machine Learning --cont--

• Machine Learning involves learning in a practical, not a theoretical, sense

• Interested in techniques for finding and describing structural patterns in data, as a tool for helping to explain that data and make predictions from it



Data Mining

• Preliminary Analysis

 –  Much interesting information can be found by querying the data set

 –  May be supported by a visualisation of the data set

• Choose one or more modelling approaches

• There are (at least?) two styles of data mining

 –  Hypothesis testing

 –  Knowledge discovery

• The styles and approaches are not mutually exclusive
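The preliminary-analysis step (querying the data set before any modelling) might look like the sketch below. It uses Python's standard `sqlite3` module on an in-memory table; the `purchases` table, its columns, and its rows are invented for illustration and are not from the slides.

```python
import sqlite3

# Hypothetical purchase records: (customer, region, amount)
rows = [
    ("ann", "north", 120.0),
    ("bob", "north", 80.0),
    ("cat", "south", 200.0),
    ("dan", "south", 70.0),
    ("eve", "south", 60.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)

# A simple aggregate query: number of purchases and mean spend per region
summary = conn.execute(
    "SELECT region, COUNT(*), AVG(amount) FROM purchases "
    "GROUP BY region ORDER BY region"
).fetchall()
conn.close()

for region, n, mean_amount in summary:
    print(f"{region}: {n} purchases, mean {mean_amount:.2f}")
```

A plain aggregate like this often answers a question directly, before any model is chosen.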



The Process of Knowledge Discovery

• Pre-processing

 –  data selection

 –  cleaning

 –  coding

• Data Mining

 –  select a model

 –  apply the model

• Analysis of results and assimilation

 –  Take action and measure the results



Data Selection

• Identify the relevant data, both internal and external to the organisation

• Select the subset of the data appropriate for

• Store the data in a database separate from the operational systems



Data Pre-Processing

• Cleaning

 –  Domain consistency: replace certain values with


 –  De-duplication: the same customer may be added to the database (DB) on each purchase transaction

 –  Disambiguation: highlighting ambiguities for a decision by the user

• e.g., if names differed slightly but addresses were the same
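Two of the cleaning steps above can be sketched as follows: dropping exact duplicate records, and flagging near-matching names at the same address for a decision by the user. This is only an illustration; the records, the `similar` helper, and the 0.8 similarity threshold are assumptions, and the string comparison uses `difflib.SequenceMatcher` from the standard library.

```python
from difflib import SequenceMatcher

# Hypothetical customer records: (name, address)
records = [
    ("John Smith", "12 High St"),
    ("John Smith", "12 High St"),   # exact duplicate
    ("Jon Smith", "12 High St"),    # similar name, same address
    ("Mary Jones", "3 Park Rd"),
]

# Remove exact repeats while keeping first-seen order
unique = list(dict.fromkeys(records))

def similar(a, b, threshold=0.8):
    """Return True when two strings are close but not identical."""
    return a != b and SequenceMatcher(None, a, b).ratio() >= threshold

# Disambiguation: flag pairs for review; do not merge them automatically
ambiguous = [
    (r1, r2)
    for i, r1 in enumerate(unique)
    for r2 in unique[i + 1:]
    if r1[1] == r2[1] and similar(r1[0], r2[0])
]

for r1, r2 in ambiguous:
    print("Possible same customer:", r1, "vs", r2)
```

The flagged pairs are only reported, in keeping with the slide's point that the user makes the final decision.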




Data Pre-Processing --cont--

• Enrichment

 –  Additional fields are added to records from external sources which may be vital in establishing relationships.

 –  e.g., take addresses and replace them with regional codes

 –  e.g., transform birth dates into age ranges

• It is often necessary to convert continuous data into range data for categorisation purposes.
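Converting birth dates into age ranges could be sketched as below. The reference date, the ten-year band width, and the helper names `age_on` and `age_range` are assumptions chosen for the example.

```python
from datetime import date

def age_on(birth, today):
    """Whole years between birth and today."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)

def age_range(age, width=10):
    """Map an age to a fixed-width band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

# Hypothetical reference date and birth dates
today = date(2019, 8, 8)
births = [date(1985, 3, 14), date(1999, 12, 1), date(1960, 8, 8)]
bands = [age_range(age_on(b, today)) for b in births]
print(bands)
```

The same banding idea applies to any continuous field, such as income or transaction amount.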



Data Mining Tasks

• Various taxonomies exist. E.g., Berry & Linoff define 6 tasks:

 –  Classification

 –  Estimation (a.k.a. regression)

 –  Prediction

 –  Association Rule Discovery (a.k.a. Affinity Grouping)

 –  Clustering

 –  Description

• The tasks are also referred to as operations. Cabena et al. define 4 operations:

 –  Predictive Modelling

 –  Database Segmentation (a.k.a. clustering)

 –  Link Analysis

 –  Deviation Detection

• Beware! Different authors use different names for the same technique, operation or task.




Classification

• Classification involves considering the features of some object then assigning it to some pre-defined class, for example:


 –  Which phone numbers are fax numbers

 –  Which customers are high-value
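As one possible illustration of assigning an object to a pre-defined class from its features, the sketch below uses a 1-nearest-neighbour rule. The slides do not name a specific classifier, so this choice, the two features, and the training examples are all assumptions.

```python
import math

# Hypothetical labelled examples: (annual_spend, visits_per_month) -> class
training = [
    ((5200.0, 9.0), "high-value"),
    ((4800.0, 7.0), "high-value"),
    ((600.0, 1.0), "standard"),
    ((900.0, 2.0), "standard"),
]

def classify(features):
    """Assign the class of the closest training example (1-nearest neighbour)."""
    _, label = min(training, key=lambda ex: math.dist(ex[0], features))
    return label

print(classify((5000.0, 8.0)))
print(classify((700.0, 1.0)))
```

The features are left unscaled here, so annual spend dominates the distance; a real application would normally rescale them first.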




Estimation

• Regression deals with numerically valued outcomes rather than the discrete categories that occur in classification, for example:


 –  Estimating family income
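A minimal sketch of estimating a numeric outcome is ordinary least squares with a single predictor. The predictor (years of education), the income figures, and the function name `estimate_income` are invented for illustration.

```python
# Hypothetical data: years of education vs. family income (in thousands)
xs = [10, 12, 14, 16, 18]
ys = [30.0, 38.0, 45.0, 55.0, 62.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least squares for one predictor: y = intercept + slope * x
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

def estimate_income(years):
    """Return the fitted income (in thousands) for a given number of years."""
    return intercept + slope * years

print(round(estimate_income(15), 2))
```

Unlike a classifier, the function returns a value on a continuous scale rather than a class label.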




Prediction

• Essentially the same as classification and estimation, but involves future behavior

• Historical data is used to build a model

• The model developed is then applied to current inputs to predict future outputs

 –  Predict which customers will respond to an advertising promotion

 –  Classifying loan applications
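The build-then-apply pattern described above can be sketched with a deliberately simple model: the historical response rate per age band. The campaign history, the customer names, and the 0.5 threshold are assumptions for the example, not data from the slides.

```python
from collections import defaultdict

# Hypothetical historical campaign data: (age_band, responded)
history = [
    ("18-29", True), ("18-29", True), ("18-29", False),
    ("30-49", False), ("30-49", False), ("30-49", True), ("30-49", False),
    ("50+", False), ("50+", False),
]

# Build the model: observed response rate for each age band
counts = defaultdict(lambda: [0, 0])  # band -> [responses, total]
for band, responded in history:
    counts[band][0] += int(responded)
    counts[band][1] += 1
model = {band: yes / total for band, (yes, total) in counts.items()}

def will_respond(band, threshold=0.5):
    """Predict a response when the historical rate meets the threshold."""
    return model.get(band, 0.0) >= threshold

# Apply the model to current customers to predict future behaviour
current = {"ann": "18-29", "bob": "30-49", "cat": "50+"}
targets = [name for name, band in current.items() if will_respond(band)]
print(targets)
```

Any classifier or regression model could take the place of the rate table; the point is that it is fitted on past data and applied to new inputs.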



Association Rule Discovery

• Association Rule Discovery is also referred to as Market Basket Analysis, or Affinity Grouping

• A common example is discovering which items are bought together at the supermarket. Once this is known, decisions can be made on, for example:

 –  how to arrange items on the shelves

 –  which items should be promoted together
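Finding items that are bought together is usually quantified with support and confidence. The sketch below computes both for item pairs; the baskets and the helper name `rule_stats` are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical supermarket baskets
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]

# Count how many baskets contain each item and each (sorted) pair of items
item_counts = Counter(item for b in baskets for item in b)
pair_counts = Counter(
    pair for b in baskets for pair in combinations(sorted(b), 2)
)

def rule_stats(antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    both = pair_counts[tuple(sorted((antecedent, consequent)))]
    support = both / len(baskets)
    confidence = both / item_counts[antecedent]
    return support, confidence

print(rule_stats("bread", "butter"))
print(rule_stats("butter", "bread"))
```

Support is the share of all baskets containing both items; confidence is the share of baskets with the antecedent that also contain the consequent, so the two directions of a rule can differ.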




Clustering

• Clustering is also sometimes referred to as segmentation (though this has other meanings in other fields)

• In clustering there are no pre-defined classes. A similarity measure is used to group records. The user must attach meaning to the clusters formed

• Clustering often precedes some other data mining task, for example:

 –  once customers are separated into clusters, a promotion might be carried out based on market basket analysis of the resulting cluster
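Grouping records by a similarity measure with no pre-defined classes can be sketched with k-means in one dimension. The slides do not specify an algorithm, so k-means, the starting centres, and the spend figures are assumptions.

```python
def kmeans_1d(values, centres, iterations=10):
    """Lloyd's algorithm in one dimension with fixed starting centres."""
    for _ in range(iterations):
        groups = [[] for _ in centres]
        for v in values:
            # Similarity measure: absolute distance to each centre
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # Move each centre to the mean of its group (keep it if the group is empty)
        centres = [
            sum(g) / len(g) if g else c for g, c in zip(groups, centres)
        ]
    return centres, groups

# Hypothetical annual spend per customer
spend = [100, 120, 90, 110, 950, 1000, 1050]
centres, groups = kmeans_1d(spend, centres=[0.0, 500.0])
print(centres)
print(groups)
```

The algorithm returns only groups and their centres; deciding that one group means "occasional shoppers" and the other "big spenders" is left to the user.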



Deviation Detection

• Records whose attributes deviate from the norm by significant amounts are also called outliers

• Application areas include:

 –  fraud detection

 –  tracing defects

• Visualization techniques and statistical techniques are useful in finding outliers

• A cluster which contains only a few records may in fact represent outliers
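One simple statistical technique for finding outliers is a z-score test: flag values that lie more than a chosen number of standard deviations from the mean. The transaction amounts and the threshold of 2.0 below are assumptions for illustration.

```python
from statistics import mean, stdev

def outliers(values, z_threshold=2.0):
    """Return values more than z_threshold standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - mu) > z_threshold * sigma]

# Hypothetical transaction amounts
amounts = [20, 22, 19, 21, 23, 20, 18, 22, 21, 400]
flagged = outliers(amounts)
print(flagged)
```

Because the outlier itself inflates the mean and standard deviation, this test can miss deviations in small samples; more robust variants use the median instead.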