Post on 13-Jan-2016

Data Mining: Classification & Prediction

Hosam Al-Samarraie, PhD.

Centre for Instructional Technology & Multimedia

Universiti Sains Malaysia

What Does Data Mining Do?

• Extract patterns from data
– Pattern? A mathematical (numeric and/or symbolic) relationship among data items.
• Types of patterns
– Association
– Classification & Prediction
– Cluster (segmentation)

Knowledge Discovery

Steps in a Knowledge Discovery process

Supervised vs. Unsupervised Learning

• Supervised learning (classification)
– Supervision: The training data (observations, constructs, variables, eye-movement parameters, etc.) are accompanied by labels indicating the class of each observation (output, dependent variable, known class, etc.). The result is a model to be tested.
• Unsupervised learning (clustering & association)
– Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data

Classification vs. Prediction

Classification: predicts categorical class labels. It constructs a model from the training set and the values (class labels) of a classifying attribute, then uses that model to classify new data.

Prediction (Regression): similar to classification, but the model estimates unknown or missing continuous values rather than class labels.
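The contrast can be sketched in a few lines of Python: a classifier returns one label from a fixed set, while a numeric predictor returns a number. The data, the threshold rule, and the function names below are illustrative assumptions and do not come from the slides.

```python
# Illustrative only: data, names and the threshold are assumptions.

def classify_tenure(years):
    """Classification: the output is one of a fixed set of class labels."""
    return "yes" if years > 3 else "no"


def fit_line(xs, ys):
    """Numeric prediction: least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    b = covariance / variance
    a = mean_y - b * mean_x
    return a, b


# Hypothetical training data: years of service and salary (in thousands).
years = [1, 2, 3, 4, 5]
salary = [30.0, 35.0, 40.0, 45.0, 50.0]

a, b = fit_line(years, salary)
predicted_salary = a + b * 6          # a continuous value
predicted_class = classify_tenure(6)  # a categorical label
```

Both functions map an input to an output; only the type of the output differs.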

Classification

My DV

My IV

Classification: A Two-Step Process

• Model construction: describing a set of predetermined classes
– Each case/instance is assumed to belong to a predefined class, as determined by the class label attribute (DV)
– The set of cases used for model construction is called the training set
• Model usage: for classifying future or unknown objects
– Estimate accuracy of the model
  • The known label of each test sample is compared with the classified result from the model
  • Accuracy rate is the percentage of test set samples that are correctly classified by the model
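The accuracy estimate described above can be written as a short function. This is a minimal sketch; the label lists are made-up values used only to exercise it.

```python
def accuracy_rate(known_labels, classified_labels):
    """Percentage of test samples whose classified label matches the known label."""
    if not known_labels or len(known_labels) != len(classified_labels):
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(k == c for k, c in zip(known_labels, classified_labels))
    return 100.0 * correct / len(known_labels)


# Hypothetical test set: known labels versus the model's output.
known = ["yes", "no", "yes", "yes", "no"]
classified = ["yes", "no", "no", "yes", "no"]
rate = accuracy_rate(known, classified)  # 4 of 5 samples match
```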

Classification Process (1): Model Construction

Training Data

Classification Algorithms

IF Hosam = ‘Senior lecturer’ OR years > 3 THEN tenured = ‘yes’

Classifier (Model)

Classification Process (2): Use the Model in Prediction

Classifier

Testing Data

Unseen Data

(Anwer, Associate, 4)

Bonus?
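As a rough sketch, the IF-THEN rule from the model-construction slide can be applied to the unseen tuple in Python. Two assumptions are made here: the attribute tested against ‘Senior lecturer’ is taken to be the rank (the slide writes "Hosam" in that position), and the tuple is read as (name, rank, years). The slide asks about a bonus, whereas the rule outputs `tenured`; the sketch keeps the rule's own output.

```python
def tenured(rank, years):
    """The slide's rule, with `rank` assumed to be the tested attribute."""
    return "yes" if rank == "Senior lecturer" or years > 3 else "no"


# Unseen tuple from the slide, read as (name, rank, years): an assumption.
name, rank, years = ("Anwer", "Associate", 4)
prediction = tenured(rank, years)  # the years > 3 condition holds
```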


Learning and using a model

• Learning
– Learning algorithm takes instances of concept as input
– Produces a structural description (model) as output

Input: concept to learn → Learning algorithm → Model

• Prediction
– Model takes new instance as input
– Outputs prediction

Input → Model → Prediction
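The two stages can be sketched with the simplest possible learner: one that always predicts the most frequent class in the training data. This baseline is chosen only to keep the example short; the instance format and names are assumptions.

```python
from collections import Counter


def learn_majority_class(instances):
    """Learning: takes labelled instances as input, returns a model as output."""
    labels = [label for _features, label in instances]
    most_common_label = Counter(labels).most_common(1)[0][0]

    def model(new_instance):
        # Prediction: this trivial model ignores the features entirely.
        return most_common_label

    return model


# Hypothetical training instances: (features, class label).
training = [
    ({"years": 2}, "no"),
    ({"years": 5}, "yes"),
    ({"years": 7}, "yes"),
]
model = learn_majority_class(training)   # Input -> Learning algorithm -> Model
prediction = model({"years": 1})         # Input -> Model -> Prediction
```

A real learning algorithm would return a model that inspects the features, but the interface (learn once, then predict on new instances) stays the same.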

Other Classification Techniques

• Decision tree analysis, J48 (most popular)
• Neural networks
• Support vector machines (most popular)
• Naïve Bayes (most popular)

Classification by Decision Tree Induction

Decision tree
• A flow-chart-like tree structure
• Internal node denotes a test on an attribute
• Branch represents an outcome of the test
• Leaf nodes represent class labels or class distribution
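One way to picture this structure is as nested Python data. The tree below is hand-built for illustration, borrowing attribute names from the animal example later in these slides; it is not the output of any particular algorithm.

```python
# A leaf is a plain string (a class label). An internal node is a pair
# (attribute, branches), where branches maps each test outcome to a subtree.
tree = ("Stirrup", {
    "Yes": "Horse",
    "No": ("Dewlap", {
        "Yes": "Sheep",
        "No": "Rabbit",
    }),
})


def classify(node, instance):
    """Follow one branch per test until a leaf (class label) is reached."""
    while not isinstance(node, str):
        attribute, branches = node
        node = branches[instance[attribute]]
    return node


label = classify(tree, {"Stirrup": "No", "Dewlap": "Yes"})
```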

Accuracy Measures

Most accuracy measures are derived from the classification matrix (also called the confusion matrix). This matrix summarizes the correct and incorrect classifications that a classifier produced for a certain dataset. Rows and columns of the confusion matrix correspond to the true and predicted classes respectively.
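A confusion matrix with this layout (rows are true classes, columns are predicted classes) can be built as follows. The predicted labels are invented for the sake of the example.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """matrix[i][j] counts samples of true class i that were predicted as class j."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[index[true]][index[predicted]] += 1
    return matrix


classes = ["Horse", "Sheep", "Rabbit"]
true_labels = ["Horse", "Horse", "Sheep", "Rabbit", "Rabbit", "Sheep", "Horse"]
# Hypothetical classifier output with two mistakes.
predicted = ["Horse", "Sheep", "Sheep", "Rabbit", "Horse", "Sheep", "Horse"]

matrix = confusion_matrix(true_labels, predicted, classes)
```

Correct classifications sit on the diagonal; every off-diagonal cell is a specific kind of error (for example, a Horse classified as a Sheep).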


ROC Curves

• Receiver operating characteristic (ROC)
• Summarizes and presents the performance of any binary classification model
• Shows the model's ability to distinguish between false and true positives
• ROC curves are commonly used to show how the number of correctly classified positive examples varies with the number of incorrectly classified negative examples.
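The points of an ROC curve can be computed by sweeping a decision threshold over the classifier's scores. The sketch below reports rates (counts divided by the number of positives or negatives) rather than raw counts, which is the usual way the curve is plotted; the scores and labels are invented.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    labels: 1 for positive, 0 for negative. An instance is predicted
    positive when its score is at or above the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical classifier scores and true labels.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
points = roc_points(scores, labels)
```

Lowering the threshold can only add predicted positives, so the curve moves up (more true positives) or right (more false positives) at each step.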

ROC vs Precision & Recall (PR)

Classification?

• I use a classifier to identify the characteristics of each animal; the resulting model is used later for testing predictions.

Tail  Hoof  Rib  Dewlap  Stirrup  Reins  Twist  Animal
Yes   Yes   No   No      Yes      Yes    No     Horse
Yes   Yes   No   No      Yes      Yes    No     Horse
No    Yes   No   Yes     No       No     Yes    Sheep
Yes   No    Yes  No      No       No     No     Rabbit
Yes   No    Yes  No      No       No     No     Rabbit
No    Yes   No   Yes     No       No     Yes    Sheep
Yes   Yes   No   No      Yes      Yes    No     Horse

Prediction?

• We have the characteristics but do not know which animal they belong to!

Tail  Hoof  Rib  Dewlap  Stirrup  Reins  Twist  Animal
Yes   Yes   No   No      Yes      Yes    No     ?
Yes   Yes   No   No      Yes      Yes    No     ?
No    Yes   No   Yes     No       No     Yes    ?
Yes   No    Yes  No      No       No     No     ?
Yes   No    Yes  No      No       No     No     ?
No    Yes   No   Yes     No       No     Yes    ?
Yes   Yes   No   No      Yes      Yes    No     ?
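To make the two tables concrete, here is a small sketch that learns a decision tree from the labelled animal table and then fills in the "?" column. It uses a simplified ID3-style procedure (pick the attribute with the lowest weighted entropy, then recurse); this is not Weka's J48, and all function names are assumptions. Attribute values are normalised to Yes/No.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["Tail", "Hoof", "Rib", "Dewlap", "Stirrup", "Reins", "Twist"]

# Rows of the labelled (classification) table.
TRAINING = [
    (("Yes", "Yes", "No", "No", "Yes", "Yes", "No"), "Horse"),
    (("Yes", "Yes", "No", "No", "Yes", "Yes", "No"), "Horse"),
    (("No", "Yes", "No", "Yes", "No", "No", "Yes"), "Sheep"),
    (("Yes", "No", "Yes", "No", "No", "No", "No"), "Rabbit"),
    (("Yes", "No", "Yes", "No", "No", "No", "No"), "Rabbit"),
    (("No", "Yes", "No", "Yes", "No", "No", "Yes"), "Sheep"),
    (("Yes", "Yes", "No", "No", "Yes", "Yes", "No"), "Horse"),
]


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def split_entropy(rows, column):
    """Weighted entropy of the labels after splitting the rows on one column."""
    groups = {}
    for values, label in rows:
        groups.setdefault(values[column], []).append(label)
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())


def build_tree(rows, columns):
    labels = [label for _values, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not columns:
        return majority  # leaf: a class label
    best = min(columns, key=lambda c: split_entropy(rows, c))
    remaining = [c for c in columns if c != best]
    branches = {}
    for value in {values[best] for values, _label in rows}:
        subset = [(v, l) for v, l in rows if v[best] == value]
        branches[value] = build_tree(subset, remaining)
    return (best, branches, majority)  # internal node: a test on one attribute


def classify(node, values):
    while not isinstance(node, str):
        column, branches, majority = node
        # An outcome never seen in training falls back to the majority class.
        node = branches.get(values[column], majority)
    return node


tree = build_tree(TRAINING, list(range(len(ATTRIBUTES))))

# The prediction table repeats the same attribute rows, with the animal unknown.
unlabelled = [values for values, _label in TRAINING]
predictions = [classify(tree, values) for values in unlabelled]
```

Because the unlabelled rows repeat attribute combinations seen in training, the tree recovers every animal here; on genuinely new data the accuracy would have to be estimated on a separate test set.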

Summary

• Classification predicts class labels
• Numeric prediction models continuous-valued functions
• Two steps of classification:
– 1) Training
– 2) Testing and using
• Now let's check it out using Weka
