Data Mining Joyeeta Dutta-Moscato July 10, 2013


Page 1: Data Mining Joyeeta Dutta-Moscato July 10, 2013. Wherever we have large amounts of data, we have the need for building systems capable of learning information

Data Mining

Joyeeta Dutta-Moscato

July 10, 2013

Page 2

Data Mining

Wherever we have large amounts of data, we need to build systems capable of learning information from the data:

– predictions in medicine
– text and web page classification
– speech recognition

Learning the underlying patterns is useful to:

– predict the presence of a disease in future patients
– describe the dependencies between diseases and symptoms

Data Mining focuses on the discovery of (previously) unknown properties from data, using techniques from Machine Learning.

Page 3

Data

• 4 attributes / features
• Each attribute has values
• 3 × 3 × 2 × 2 = 36 possible combinations
• 14 combinations present in this example
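The count of 36 possible combinations comes from multiplying the number of values each attribute can take. The sketch below enumerates them. The slide's data table is not part of this transcript, so the attribute names and values used here are assumptions, taken from the weather dataset in the Witten, Frank and Hall book listed in the acknowledgments.

```python
from itertools import product

# Assumed attribute values; the original table is not in this transcript.
attributes = {
    "outlook": ["sunny", "overcast", "rainy"],
    "temperature": ["hot", "mild", "cool"],
    "humidity": ["high", "normal"],
    "windy": [True, False],
}

# Each possible instance picks exactly one value per attribute.
combinations = list(product(*attributes.values()))
print(len(combinations))  # 3 * 3 * 2 * 2 = 36
```

Only 14 of these 36 combinations appear in the example data, so any learned model has to say something about the 22 it never saw.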

Page 4

A set of rules to predict whether we will get to play could look like this:

If outlook = sunny and humidity = high then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity = normal then play = yes
If none of the above then play = yes

This is a decision list: the rules are applied in order, and the first rule that matches gives the prediction.

Data Prediction
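The rules on this slide translate almost line for line into code. This is a minimal sketch; the function name `predict_play` and the choice to pass `windy` as a boolean are illustrative assumptions, not part of the slides.

```python
def predict_play(outlook, humidity, windy):
    """Apply the rules in order; the first rule that matches decides."""
    if outlook == "sunny" and humidity == "high":
        return "no"
    if outlook == "rainy" and windy:
        return "no"
    if outlook == "overcast":
        return "yes"
    if humidity == "normal":
        return "yes"
    return "yes"  # none of the above


print(predict_play("sunny", "high", False))   # no
print(predict_play("rainy", "normal", True))  # no: rule 2 matches before rule 4
```

The order matters: on a rainy, windy day with normal humidity, rule 4 alone would say "yes", but rule 2 is checked first.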

Page 5

Decision Tree Learning

F = { <Outlook, Humidity, Wind, Temp> → Play Tennis? }

The goal is to create a model that predicts the value of a target variable based on several input variables.
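One way to picture such a model is as a nested structure where each internal node tests one input variable and each leaf holds a predicted value. The tree on the slide is not part of this transcript, so the tree below is an assumption: it is the commonly shown PlayTennis tree, and the value names ("Sunny", "Strong", and so on) are illustrative.

```python
# Internal node: {"attribute": name, "branches": {value: subtree}}
# Leaf: a plain string holding the predicted label.
tree = {
    "attribute": "Outlook",
    "branches": {
        "Sunny": {"attribute": "Humidity",
                  "branches": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"attribute": "Wind",
                 "branches": {"Strong": "No", "Weak": "Yes"}},
    },
}


def classify(node, instance):
    """Follow the branch matching the instance until a leaf is reached."""
    while isinstance(node, dict):
        node = node["branches"][instance[node["attribute"]]]
    return node


example = {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Strong", "Temp": "Hot"}
print(classify(tree, example))  # Yes
```

Note that this particular tree never tests Temp: a tree is free to ignore input variables that do not help the prediction.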

Page 6

Decision Tree Learning

Problem Setting
• Set of possible instances X
  Each instance x in X is a feature vector x = <x1, x2, ..., xn>
• Unknown target function f: X → Y
  Y is discrete valued
• Set of function hypotheses H = { h | h: X → Y }
  Each hypothesis h is a decision tree

Input:
• Training examples {<x(i), y(i)>} of unknown target function f

Output:
• Hypothesis h ∈ H that best approximates target function f
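The input/output description can be made concrete with a toy hypothesis space. The sketch below scores each hypothesis by how many training examples it gets wrong and returns the best one. The training examples and the three hand-written hypotheses are hypothetical; a real decision tree learner does not enumerate H like this but searches it greedily.

```python
# Hypothetical training examples <x, y>: x = (outlook, humidity), y = play label.
training = [
    (("sunny", "high"), "no"),
    (("sunny", "normal"), "yes"),
    (("overcast", "high"), "yes"),
    (("rainy", "high"), "yes"),
    (("rainy", "normal"), "yes"),
]


# A tiny hypothesis space H: each h maps an instance x to a label.
def h_always_yes(x):
    return "yes"


def h_humidity(x):
    return "no" if x[1] == "high" else "yes"


def h_outlook_then_humidity(x):
    # A depth-2 tree: test outlook first, then humidity on the sunny branch.
    if x[0] == "sunny":
        return "no" if x[1] == "high" else "yes"
    return "yes"


H = [h_always_yes, h_humidity, h_outlook_then_humidity]


def training_errors(h):
    """Number of training examples on which h disagrees with the label."""
    return sum(1 for x, y in training if h(x) != y)


best = min(H, key=training_errors)
print(best.__name__, training_errors(best))
```

"Best approximates f" is measured here on the training examples only, because f itself is unknown.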

Page 7

Supervised Learning

Given a set of training examples of the form {(x1, y1), …, (xn, yn)}, a learning algorithm seeks a function g: X → Y, where X is the input space and Y is the output space.

Example:
- Classify the universe of music into ‘like’ and ‘dislike’ for one person
- Training set: A list of songs that the person heard and marked as ‘like’ or ‘dislike’
- Task: Infer a function of features (of these songs) to predict what other songs the person will like

Page 8

Supervised Learning

Given a model family, we are interested in finding the best model parameters, such that the misfit (measured by an error function) between the data and the model is minimized.

An optimal scenario will allow for the algorithm to correctly determine the class labels for unseen instances.
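As a small illustration of "finding the best model parameters", the sketch below takes a one-parameter model family y = w · x and picks the w with the smallest squared error from a grid of candidates. The data, the model family and the grid are all hypothetical choices made for this example.

```python
# Hypothetical data that roughly follows y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]


def error(w):
    """Misfit between model w * x and the data: sum of squared differences."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys))


# Candidate parameter values 0.0, 0.1, ..., 4.0.
candidates = [i / 10 for i in range(41)]
best_w = min(candidates, key=error)
print(best_w)  # 2.0
```

A grid search is only practical for very few parameters; it is used here because it makes the "minimize the error function" step explicit.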

Page 9

Supervised Learning

Considerations:

• The learning algorithm must generalize from the training data to unseen situations in a "reasonable" way: Avoid overfitting

• Bias-variance tradeoff

• Number of training examples versus model complexity

Page 10

Supervised Learning

Common methods of supervised learning:

• Regression
  X discrete or continuous → Y continuous
  Examples:
  – debt, equity, orders, sales → stock price
  – age, height, weight, race, VKORC1 genotype, CYP2C9 genotype → warfarin dose

• Classification
  X discrete or continuous → Y discrete
  Examples:
  – family history, history of head trauma, age, gender, race, APOE status → Alzheimer’s disease
  – arrangement of pixels in handwritten digit → “3”

Page 11

Supervised Learning

• Linear Regression

Fitting the model to the data

Objective: Minimize mean squared error
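For a straight line y = a + b · x, the parameters that minimize the mean squared error have a closed form: b is the covariance of x and y divided by the variance of x, and a makes the line pass through the point of means. The sketch below applies that formula; the function names and the data are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x (minimizes mean squared error)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


def mean_squared_error(a, b, xs, ys):
    """Average squared difference between predictions and targets."""
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(xs, ys)
print(a, b, mean_squared_error(a, b, xs, ys))
```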

Page 12

Regression

Does a mean squared error of 0 (i.e. no difference between prediction and target) mean this is the best model?

Overfitting

The real test of the ‘best model’ is its performance on data it has not been trained on.
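The point can be shown with two models trained on the same noisy data: one that memorizes the training set and one that fits a straight line. This is a sketch with hypothetical data (targets are 2x plus a little noise in the training set, and exactly 2x in the held-out set); the "memorizer" is a stand-in for any model flexible enough to reach zero training error.

```python
train = [(1.0, 2.5), (2.0, 3.6), (3.0, 6.4), (4.0, 7.7), (5.0, 10.3)]
test = [(1.5, 3.0), (2.5, 5.0), (3.5, 7.0), (4.5, 9.0)]


def memorizer(x):
    """Return the target of the closest training input (copies the noise too)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def fit_line(data):
    """Least-squares straight line through the data, returned as a function."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in data)
         / sum((x - mean_x) ** 2 for x, _ in data))
    a = mean_y - b * mean_x
    return lambda x: a + b * x


def mse(model, data):
    """Mean squared error of a model on a list of (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


line = fit_line(train)
print("memorizer: train", mse(memorizer, train), "test", mse(memorizer, test))
print("line:      train", mse(line, train), "test", mse(line, test))
```

The memorizer reaches a training error of exactly 0, yet the line, whose training error is above 0, does better on the held-out points.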

Page 13

Regression

What does this mean about the relationship between x and y?

Page 14

Classification

• Linear classifier
  Hard threshold

• Logistic regression
  Soft threshold
  Uses the logistic function, which takes values between 0 and 1
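The difference between the two thresholds is easy to see side by side. In the sketch below, z stands for the classifier's score for an input (for a linear model, a weighted sum of the features); that reading of z and the function names are assumptions made for illustration.

```python
import math


def hard_threshold(z):
    """Linear classifier output: jumps from 0 to 1 at z = 0."""
    return 1 if z >= 0 else 0


def logistic(z):
    """Logistic function: rises smoothly from 0 to 1, with value 0.5 at z = 0."""
    return 1.0 / (1.0 + math.exp(-z))


for z in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(z, hard_threshold(z), round(logistic(z), 3))
```

The soft output can be read as a degree of confidence in class 1, which the hard threshold cannot express.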

Page 15

Other common methods in Supervised Learning

• Support Vector Machines

• Artificial Neural Networks (can also be unsupervised)

• K-nearest neighbor

• Graphical models, Bayesian models

More sophisticated algorithms are needed for data that are not linearly separable
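Of the methods listed, k-nearest neighbor is the shortest to write down: a new point gets the majority label of the k training points closest to it. This is a minimal sketch with hypothetical 2-D data, using Euclidean distance and k = 3 as assumed defaults.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Majority label among the k training points closest to the query."""
    neighbors = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D points from two classes.
training = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.0), "B"),
]
print(knn_predict(training, (2.0, 2.0)))  # A
print(knn_predict(training, (6.0, 7.0)))  # B
```

Because the prediction depends only on nearby points, the method draws no single straight boundary, so it can also handle classes that are not linearly separable.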

Page 16

Unsupervised Learning

Learn relationships among the inputs x1, …, xn.

No y is given.

Clustering
– Group inputs based on some measure of similarity
– Common “first pass” exploratory data mining technique

Page 17

Hierarchical Clustering

A method of cluster analysis which aims to partition the data into groups whose members are “close” to each other according to some distance metric.
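One common way to do this is bottom-up: start with every point in its own group and repeatedly merge the two closest groups. The sketch below does that for 1-D points. The choice of single linkage (distance between groups = distance between their closest members), the stopping rule and the data are assumptions for illustration; the slides do not specify a variant.

```python
def agglomerative(points, num_clusters):
    """Bottom-up single-linkage clustering of 1-D points."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    while len(clusters) > num_clusters:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters


points = [1.0, 1.2, 1.5, 8.0, 8.3]
print(agglomerative(points, 2))
```

Recording the sequence of merges, rather than stopping at a fixed number of clusters, gives the full hierarchy.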

Page 18

k-means Clustering

A method of cluster analysis which aims to partition the data into k clusters in which each observation belongs to the cluster with the nearest mean.
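The usual procedure alternates two steps: assign each observation to the nearest mean, then recompute each mean from its assigned observations. Here is a minimal sketch on 1-D data. The starting centers, the fixed iteration count and the data are assumptions for illustration, and the result of k-means in general depends on where the centers start.

```python
def k_means(points, centers, iterations=10):
    """Alternate assignment and update steps on 1-D points."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster
        # (a center with no points stays where it is).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = k_means(points, centers=[0.0, 5.0])
print(centers)   # [2.0, 11.0]
print(clusters)  # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```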

Page 19

Acknowledgments

Shyam Visweswaran, Dept. of Biomedical Informatics

Tom Mitchell, Dept. of Machine Learning, CMU

“Data Mining: Practical Machine Learning Tools and Techniques” Ian H. Witten, Eibe Frank, Mark A. Hall