Introduction to Machine Learning and Data Mining Advanced Information Systems and Business Analytics for Air Transportation M.Sc. Air Transport Management May 16-21, 2016 Slides prepared by Prof. N. Kemal Üre

Introduction to Machine Learning and Data Mining

Advanced Information Systems and Business Analytics for Air Transportation
M.Sc. Air Transport Management

May 16-21, 2016

Slides prepared by Prof. N. Kemal Üre

A Framework for Business Analytics

What is Machine Learning?

Study of algorithms that can learn and make predictions from data

[Diagram: Data → Model → Prediction]

• Also referred to as predictive modeling or predictive analytics
• Strong ties with statistics, computer science and optimization
• A wide range of applications: spam filtering, optical character recognition (OCR), search engines and computer vision

What is Machine Learning?

• How is Machine Learning (ML) different from Data Mining and Statistics?

• Statistics
– Sub-field of mathematics
– Inference of probabilistic models
– The main objective is understanding the underlying data generation process

• Data Mining (DM)
– Carried out by a person, uses methods from statistics and ML
– Usually works with massive datasets with problematic entries
– Gains preliminary insight and makes predictions

ML/DM Process

Source: Kantardzic

Types of Data

Data Preparation

• Transformations

– Normalization

• Decimal Scaling

• Min-max normalization

• Standard Deviation Normalization

– Smoothing

Source: Kantardzic
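The three normalization variants listed on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function names and the sample values are assumptions, and the formulas follow the usual textbook definitions (divide by a power of ten, rescale to a target range, subtract the mean and divide by the standard deviation).

```python
import statistics

def decimal_scaling(values):
    """Divide by the smallest power of 10 that brings every |v| below 1."""
    largest = max(abs(v) for v in values)
    k = 0
    while largest / 10 ** k >= 1:
        k += 1
    return [v / 10 ** k for v in values]

def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale linearly so the smallest value maps to new_min, the largest to new_max."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant column carries no information; map it to new_min.
        return [new_min for _ in values]
    return [new_min + (v - lo) / span * (new_max - new_min) for v in values]

def standard_deviation_normalization(values):
    """Subtract the mean and divide by the sample standard deviation (z-scores)."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical feature column, e.g. departure delays in minutes.
delays = [5, 20, 35, 50]
print(decimal_scaling(delays))
print(min_max(delays))
print(standard_deviation_normalization(delays))
```

Min-max scaling is sensitive to extreme values, since a single outlier stretches the range, whereas standard deviation normalization does not bound the result to a fixed interval. Which one fits depends on the model that consumes the data.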

Data Preparation

• Missing Data

• Time Dependent Data

Source: Kantardzic

Data Preparation

• Outliers

Source: Kantardzic

Primary ML/DM Problems

• Supervised Learning

– Data is labeled <x_i,y_i>

– Learn the association between x and y

• Unsupervised Learning

– Data is unlabeled, we only have x_i

– Learn the structure and patterns in x

• Reinforcement Learning

– Learn how to "control" a dynamic system

Supervised Learning

• Classification

• Regression

Classification

• Predict the class of the input variable

• Function approximation approach y = f(x)
• Probabilistic approach P(y|x)

Source: Murphy 2011
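The two views on this slide are closely related: a probabilistic classifier estimates P(y|x), and a function f(x) is obtained by picking the most probable class. The sketch below illustrates this with one-dimensional logistic regression trained by gradient descent. Logistic regression is only one possible model and is not named in the slides; the data, learning rate and function names are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict_proba(x, w, b):
    """Probabilistic approach: return an estimate of P(y=1 | x)."""
    return sigmoid(w * x + b)

def classify(x, w, b):
    """Function approximation approach: y = f(x), the most probable class."""
    return 1 if predict_proba(x, w, b) >= 0.5 else 0

# Hypothetical standardized feature (zero mean) with binary labels.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(xs, ys)
print([classify(x, w, b) for x in xs])
print(predict_proba(0.5, w, b), predict_proba(2.0, w, b))
```

The probabilistic output carries more information than the hard label: a point close to the decision boundary gets a probability near 0.5, which signals an uncertain prediction.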

Classification Examples

Document Classification, Spam Filtering, Hand-written Digit Recognition

Classification Examples

Face Detection

Credit Risk Calculation

Classification for Delay Prediction

Source: Rebollo, Balakrishnan 2014

Regression

• Like classification, but the output variable is continuous
• Curve fitting and model selection

Regression

Beware of the noise in the data!
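Curve fitting, model selection and the warning about noise can be illustrated together. The sketch below fits polynomials of several degrees to noisy samples of a straight line and compares them on held-out points, which is one common way to select a model. Everything specific here is an assumption: the true line 2x + 1, the noise level, the candidate degrees and the function names.

```python
import random

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (A^T A) c = A^T y."""
    n = degree + 1
    # Augmented matrix of the normal equations.
    aug = [[sum(x ** (i + j) for x in xs) for j in range(n)]
           + [sum(y * x ** i for x, y in zip(xs, ys))]
           for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(aug[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (aug[i][n] - rest) / aug[i][i]
    return coeffs  # coeffs[k] multiplies x**k

def predict(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))

def mse(coeffs, xs, ys):
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Noisy samples of an assumed true relationship y = 2x + 1.
rng = random.Random(0)
xs = [i / 19 for i in range(20)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]

# Hold out every second point for validation.
train_x, train_y = xs[0::2], ys[0::2]
val_x, val_y = xs[1::2], ys[1::2]

val_errors = {d: mse(fit_polynomial(train_x, train_y, d), val_x, val_y)
              for d in range(4)}
best_degree = min(val_errors, key=val_errors.get)
print(val_errors, best_degree)
```

Training error alone always favors the most flexible curve, because extra coefficients can chase the noise. Scoring on points the fit has not seen is what exposes that behavior.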

Regression Examples

• Predict tomorrow’s stock market price given current market conditions and other possible side information.

• Predict the age of a viewer watching a given video on YouTube.

• Predict the location in 3d space of a robot arm end effector, given control signals (torques) sent to its various motors.

• Predict the amount of prostate specific antigen (PSA) in the body as a function of a number of different clinical measurements.

• Predict the temperature at any location inside a building using weather data, time, door sensors, etc.

Source: Murphy 2011

Regression for Predicting Ticket Prices

Source: Gini 2011

Unsupervised Learning

• Clustering

• Learning Graphs

• Matrix Completion

Clustering

• Segment the data into different groups
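One widely used way to segment data into groups is k-means (Lloyd's algorithm), sketched below in plain Python. The slides do not say which clustering method they use, so this is an illustrative choice; the sample points and the initial centroids are assumptions.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, max_iters=100):
    """Lloyd's algorithm: alternate an assignment step and a centroid update."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda k: squared_distance(p, centroids[k]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(dim) / len(members)
                                     for dim in zip(*members)))
            else:
                updated.append(old)  # leave an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return labels, centroids

# Hypothetical 2-D locations forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, centroids=[points[0], points[-1]])
print(labels)     # [0, 0, 0, 1, 1, 1]
print(centroids)
```

The result depends on the number of clusters and on the starting centroids, both of which must be supplied. In practice the algorithm is often run from several random starts and the best segmentation is kept.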

Clustering Examples

Astronomy, Social Networks

Clustering for Delivery Network

Bayesian Networks

Network structure: prepared depends on smart and study; pass depends on smart, prepared and fair.

p(smart) = .8    p(study) = .6    p(fair) = .9

p(prep | …)    smart    ¬smart
study          .9       .7
¬study         .5       .1

p(pass | …)    smart           ¬smart
               prep   ¬prep    prep   ¬prep
fair           .9     .7       .7     .2
¬fair          .1     .1       .1     .1

Query: What is the probability that a student is smart, given that they pass the exam?
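The query on this slide can be answered by brute-force enumeration: sum the joint probability over all unobserved variables, once for "smart and pass" and once for "pass", then divide. The sketch below assumes that in each repeated pair of row or column labels in the tables the second entry is the negation (not smart, not study, not prepared, not fair), and that prepared depends on smart and study while pass depends on smart, prepared and fair. The function and variable names are hypothetical.

```python
from itertools import product

P_SMART, P_STUDY, P_FAIR = 0.8, 0.6, 0.9

# p(prepared=True | smart, study), keyed by (smart, study).
P_PREP = {(True, True): 0.9, (False, True): 0.7,
          (True, False): 0.5, (False, False): 0.1}

# p(pass=True | smart, prepared, fair), keyed by (smart, prepared, fair).
P_PASS = {(True, True, True): 0.9, (True, False, True): 0.7,
          (False, True, True): 0.7, (False, False, True): 0.2,
          (True, True, False): 0.1, (True, False, False): 0.1,
          (False, True, False): 0.1, (False, False, False): 0.1}

def bernoulli(p_true, value):
    """Probability of a binary variable taking `value` when P(True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(smart, study, prep, fair, passed):
    """Joint probability as the product of the network's local factors."""
    return (bernoulli(P_SMART, smart)
            * bernoulli(P_STUDY, study)
            * bernoulli(P_FAIR, fair)
            * bernoulli(P_PREP[(smart, study)], prep)
            * bernoulli(P_PASS[(smart, prep, fair)], passed))

def prob_smart_given_pass():
    p_pass = 0.0
    p_smart_and_pass = 0.0
    # Enumerate every assignment of the unobserved variables with pass=True.
    for smart, study, prep, fair in product([True, False], repeat=4):
        p = joint(smart, study, prep, fair, True)
        p_pass += p
        if smart:
            p_smart_and_pass += p
    return p_smart_and_pass / p_pass

print(round(prob_smart_given_pass(), 3))  # 0.886
```

Under these assumptions, observing a pass raises the probability of being smart from the prior of 0.8 to roughly 0.89. Enumeration grows exponentially with the number of variables, so larger networks rely on inference algorithms that exploit the graph structure.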

Bayesian Networks

[Figure: "Asia" network with nodes Visit to Asia, Smoking, Tuberculosis, Lung Cancer, Bronchitis, Abnormality in Chest, X-Ray, Dyspnea]

BN Application: Fare Value and Passenger Behavior

What is the expected fare value for a specific passenger behavior?

Can predictive models be developed for reservation changes and no-show rates for individual passengers on individual itineraries?

Source: Booz Allen

Matrix Completion

Source: Murphy 2011

MC for Image Recovery

Source: Murphy 2011

MC for Product Recommendation

• Filtering: Given my purchase history, what is my next likely purchase?
• Collaborative Filtering: Given the purchase history of customers similar to me, what is my next likely purchase?

Source: Murphy 2011
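The collaborative filtering idea, recommending what similar customers bought, can be sketched with a user-based scheme: measure how similar every other customer is to the target customer, then score each item the target has not bought by the similarity-weighted purchases of the others. Cosine similarity is one common choice and is an assumption here, as are the customer names, the item list and the purchase matrix.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend(purchases, user, items):
    """Return the unseen item with the highest similarity-weighted score."""
    target = purchases[user]
    # Only items the target user has not bought are candidates.
    scores = {item: 0.0 for i, item in enumerate(items) if not target[i]}
    for other, history in purchases.items():
        if other == user:
            continue
        weight = cosine_similarity(target, history)
        for i, item in enumerate(items):
            if item in scores and history[i]:
                scores[item] += weight
    return max(scores, key=scores.get) if scores else None

# Hypothetical binary purchase histories (1 = bought), one row per customer.
items = ["seat upgrade", "extra baggage", "lounge access", "car rental"]
purchases = {
    "ana":   [1, 1, 0, 0],
    "ben":   [1, 1, 1, 0],
    "chloe": [0, 0, 0, 1],
    "deniz": [1, 0, 1, 0],
}

print(recommend(purchases, "ana", items))  # lounge access
```

Here "ben" and "deniz" overlap with "ana" and both bought lounge access, while "chloe" shares no purchases with her and therefore contributes nothing to the car rental score.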

Collaborative Filtering Challenges

• Data Sparsity

• Scalability

• Synonymy

• Gray Sheep

• Attacks

Beyond the User-Item Matrix

Source: Shi 2014

Product Recommendation System For Airlines

Source: Barth 2014

Reinforcement Learning

Maze Exploration

Source: Geramifard 2011

RL Application - Maintenance Optimization

• A machine/component degradation model

• Maintenance costs money but restores the machine to its original state

• If not maintained, the machine eventually breaks down

• At which degradation state is it optimal to repair the machine?

Source: Bertsekas 2006
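The maintenance question can be posed as a small Markov decision process and solved with value iteration. Note that value iteration is a dynamic programming method that assumes the degradation model is known; when it is not, a learning method such as Q-learning would estimate the same quantities from experience. The sketch below is illustrative only: the number of wear levels, the degradation probability, the costs and the discount factor are all made-up assumptions.

```python
GAMMA = 0.95          # discount factor (assumed)
P_DEGRADE = 0.3       # chance of moving one wear level per step (assumed)
REPAIR_COST = 5.0     # planned maintenance, restores level 0 (assumed)
FAILURE_COST = 20.0   # unplanned replacement after a breakdown (assumed)
N_LEVELS = 5          # wear levels 0..4; level 4 means broken (assumed)

def q_values(state, values):
    """Expected discounted cost of each action available in `state`."""
    if state == N_LEVELS - 1:
        # Broken: the only option is an expensive replacement.
        return {"replace": FAILURE_COST + GAMMA * values[0]}
    stay = (1 - P_DEGRADE) * values[state]
    worsen = P_DEGRADE * values[state + 1]
    return {
        "operate": GAMMA * (stay + worsen),
        "repair": REPAIR_COST + GAMMA * values[0],
    }

def value_iteration(sweeps=1000):
    """Repeatedly apply the Bellman update, then read off the greedy policy."""
    values = [0.0] * N_LEVELS
    for _ in range(sweeps):
        values = [min(q_values(s, values).values()) for s in range(N_LEVELS)]
    policy = []
    for s in range(N_LEVELS):
        q = q_values(s, values)
        policy.append(min(q, key=q.get))
    return values, policy

values, policy = value_iteration()
print(policy)  # ['operate', 'operate', 'operate', 'repair', 'replace']
```

With these numbers the cheapest policy keeps operating until the last wear level before breakdown and repairs there, because discounting rewards postponing the repair cost while a breakdown is four times as expensive. Adding an operating cost that grows with wear, or a higher failure cost, would move the repair threshold earlier.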

RL Application – Active Web Advertising

Source: Silver 2013