Introduction to Data Mining and Machine Learning Techniques

Iza Moise, Evangelos Pournaras, Dirk Helbing


Overview

Main principles of data mining

• Definition

• Steps of a data mining process

• Supervised vs. unsupervised data mining

Applications

Data mining functionalities


Definition

Data mining is the automated process of discovering interesting (non-trivial, previously unknown, insightful and potentially useful) information or patterns, as well as descriptive, understandable, and predictive models from (large-scale) data.

Also known as:

• Knowledge discovery in databases (KDD)

• Data analysis

• Information harvesting

• Business intelligence


Goals

• search for consistent patterns and/or systematic relationships in the data

• validate the findings by applying the detected patterns to new subsets of data

• predict new findings on new datasets


Data Mining is...

• k-means clustering

• decision trees

• neural networks

• Bayesian networks


Data Mining is not...

• Data warehousing

• SQL / Ad Hoc Queries / Reporting

• Software Agents

• Online Analytical Processing (OLAP)

• Data Visualization


Knowledge Discovery Process


Steps of a data mining process

1. Exploration

2. Model building and validation

3. Deployment → prediction


Exploration

• data preparation
→ data cleaning, data transformation, etc.

• elaborate exploratory analyses using a wide variety of graphical and statistical methods, depending on the nature of the analytic problem

• determine the general nature of models that can be taken into account in the next stage
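
The exploration step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed workflow: the records, the field names (`age`, `income`) and the helpers `clean` and `standardize` are hypothetical, and mean imputation is only one of many possible cleaning strategies.

```python
from statistics import mean, pstdev

# Hypothetical raw records; None marks a missing value.
raw = [
    {"age": 25, "income": 30000},
    {"age": 40, "income": None},
    {"age": 35, "income": 52000},
    {"age": 50, "income": 61000},
]

def clean(records, field):
    """Data cleaning: fill missing values of `field` with the mean of the known ones."""
    known = [r[field] for r in records if r[field] is not None]
    fill = mean(known)
    # Build new dicts so the raw records are left unchanged.
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]

def standardize(values):
    """Data transformation: rescale values to zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

cleaned = clean(raw, "income")
incomes = [r["income"] for r in cleaned]
z_scores = standardize(incomes)

# Simple exploratory statistics on the cleaned attribute.
summary = {"min": min(incomes), "max": max(incomes), "mean": mean(incomes)}
print(summary)
print([round(z, 2) for z in z_scores])
```

Summary statistics like these are what help decide which kinds of models are worth considering in the next stage.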


Model building and validation, and Deployment

• Model building and validation:
→ consider various models and choose the best one
→ validate the model on existing data

• Deployment:
→ apply the model to new data
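
As a rough sketch of these two steps, the example below fits two candidate models on a training subset, picks the one with the lower error on held-out existing data, and then applies it to new inputs. The data, the two candidates (`fit_constant`, `fit_line`) and the even/odd split are assumptions made for illustration; the slides do not name specific models or a validation scheme.

```python
from statistics import mean

def fit_constant(xs, ys):
    """Baseline candidate: always predict the mean target seen in training."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Second candidate: least-squares line y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def mse(model, xs, ys):
    """Mean squared error of `model` on the given data."""
    return mean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])

# Hypothetical existing data, roughly following y = 2x + 1.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 16.9]

# Model building: fit every candidate on the training half.
train_x, train_y = xs[::2], ys[::2]
valid_x, valid_y = xs[1::2], ys[1::2]
candidates = {"constant": fit_constant, "line": fit_line}
models = {name: fit(train_x, train_y) for name, fit in candidates.items()}

# Validation: score each candidate on existing data it was not fitted on.
errors = {name: mse(model, valid_x, valid_y) for name, model in models.items()}
best_name = min(errors, key=errors.get)

# Deployment: apply the chosen model to new data.
deployed = models[best_name]
predictions = [deployed(x) for x in [9, 10]]
print(best_name, predictions)
```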


Two styles of data mining: Predictive and Descriptive

1. Predictive
→ Predict the value of a specific attribute (target or dependent) based on the values of other attributes (explanatory or predictors)

2. Descriptive
→ Derive patterns that summarise the relationships among data points


Supervised vs. Unsupervised

Supervised

• predictive or directed

• useful when you have a specific target value to predict about your data

• Classification, Regression, Anomaly Detection

Unsupervised

• descriptive or undirected

• finds hidden structure and relations within the data

• Clustering, Association, Feature Extraction


Supervised Data Mining

• a pre-specified target variable

• explanatory variables (predictors) ↔ one or more dependent variables (targets)

• goal: specify relationships between predictors and target

• training: the model is given many training examples for which the target is known

• testing: the model is applied to data for which the target is unknown to the model
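
The training/testing split above can be illustrated with a one-nearest-neighbour classifier. This is a sketch chosen for brevity, not a method the slides prescribe; the predictors (height in cm, weight in kg), the labels and the function name `nearest_neighbor_predict` are hypothetical.

```python
import math

def nearest_neighbor_predict(train_X, train_y, x):
    """Return the target of the training example closest to x (Euclidean distance)."""
    best_index = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best_index]

# Training: predictors together with their known targets.
# For this method, "training" simply means storing the examples.
train_X = [(150, 50), (160, 55), (180, 85), (190, 95)]
train_y = ["small", "small", "large", "large"]

# Testing: the model sees only the predictors; the targets are
# held back and used afterwards to score the predictions.
test_X = [(155, 52), (185, 90)]
test_y = ["small", "large"]

predicted = [nearest_neighbor_predict(train_X, train_y, x) for x in test_X]
accuracy = sum(p == t for p, t in zip(predicted, test_y)) / len(test_y)
print(predicted, accuracy)
```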


Unsupervised Data Mining

• determine the existence of classes or clusters in the data

• exploratory analysis

• all variables are treated in the same way
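
A small k-means sketch shows what finding clusters without a target variable can look like. Both coordinates of each point are treated identically and no labels are supplied. The points, the choice of k = 2, and the use of the first k points as initial centroids are assumptions made to keep the example deterministic.

```python
import math

def k_means(points, k, iterations=10):
    """Plain k-means: alternate between assigning points and updating centroids.

    Initial centroids are the first k points (an assumption for determinism;
    random initialisation is more common in practice).
    """
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical two-dimensional data with two visible groups.
points = [(1, 1), (8, 8), (1.5, 2), (9, 9.5), (1, 0.5), (8.5, 9)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```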


Fielded Applications

• Web mining
→ PageRank, Search Engine, Recommenders, Social Networks

• Screening Images

• Diagnosis

• Marketing and Sales
→ Market basket analysis, Direct marketing

• Biology, biomedicine, astronomy, chemistry


Beers and Diapers
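
Beer and diapers is the usual anecdote for market basket analysis: checking whether two items are bought together more often than chance would suggest. The sketch below computes the standard support, confidence and lift measures for the rule "diapers → beer". The baskets are invented for illustration and do not come from the slides or from any real dataset.

```python
# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"chips", "milk"},
    {"beer", "diapers", "milk"},
    {"diapers", "bread"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent) from the baskets."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence relative to the consequent's baseline frequency.

    A value above 1 suggests the items co-occur more often than if independent.
    """
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

rule = ({"diapers"}, {"beer"})
print("support:   ", support(rule[0] | rule[1], baskets))
print("confidence:", confidence(*rule, baskets))
print("lift:      ", lift(*rule, baskets))
```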


Data mining functionalities


1. Supervised Data Mining

• Classification

• Regression

• Outlier detection

• Frequent pattern mining

2. Unsupervised Data Mining

• Clustering

• Feature Extraction

→ definition

→ real use-cases

→ method

→ pros and cons
