Source: u.cs.biu.ac.il/~jkeshet/teaching/iml2015/iml2015_tirgul1.pdf

Tirgul #1: Gentle Start

Introduction to Machine Learning, Bar Ilan University (89-511)

Administration

● 70% Final Exam

● 30% Assignments

● Theoretical and Practical Assignments:

○ Python (prerequisite)

○ Appear in the final exam in some form (exact, similar, conclusions)

[email protected]

[email protected]

What is Machine Learning?

“[Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed.”

Arthur Samuel, 1959

“A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”

Tom Mitchell, 1997

Traffic pattern at a busy intersection

Task T

Experience E

Future Traffic Pattern

Performance P

Machine Learning in Real Life

Plays a key role in a wide range of critical applications and tasks

Recommendation System

Recommend a new series/movie based on past selections (*create a profile)

Recommendation System

Recommend a new item based on your profile

Search Engine

Given keywords, propose links ranked by relevance

History

Image Recognition

Recognize faces in pictures based on tagged images

Is this Martha?

Machine Learning Problems

● Is this cancer?
● What movie should I watch next?
● Who is this?
● What did you say?
● Should I invest in this stock?
● Is it going to rain tomorrow?
● etc.

Complex problems that cannot be solved by numerical means alone

ML task type distinction: Supervised vs Unsupervised

Supervised vs Unsupervised

Supervised Learning

The program is “trained” on a predefined set of “training examples”, which then facilitate its ability to reach accurate conclusions when given new data.

Unsupervised Learning

The program is given a bunch of data and must find patterns and relationships therein.

Supervised Learning

Supervised Learning

● Goal: Develop a finely tuned predictor function (a.k.a “hypothesis”).

● “Learning” consists of using sophisticated mathematical algorithms to optimize this function.

● Given input data, it will accurately predict the desired value.

[Diagram: x → predictor → ŷ]

Supervised Learning

The process flow

1. Train/tweak the predictor according to a given data set
○ training examples
○ (x, y), where x is an instance and y is its target value

2. Test the predictor on new data
○ use it in the real world
○ evaluate performance

TRAIN MODE

Training and modifying the predictor function according to a given data set (i.e. the training set).

Goal: a well-trained weight vector on the target distribution (not specific to the training set).

[Diagram: x → predictor → ŷ; ŷ is compared with y, the predictor suffers the loss (cost function) and the weight vector is corrected]

[Figure: satisfaction rate plotted against cellphone representation]

TRAIN MODE: Epoch

● A single pass over the entire* dataset.
● In most cases a single pass (epoch) is not enough for suitable training (i.e. for the weight vector to converge).

Cost Function (Loss)

● Measures the difference between two possible values from the label domain
● ŷ and y: used for correction and performance evaluation
● ŷ ≠ y: error rate
● Depends on the task and the framework
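
As a rough illustration of "depends on the task", the sketch below shows two commonly used cost functions: the zero-one loss (which averages to the error rate) for class labels, and the squared loss for real-valued targets. The function names and the sample values are assumptions made for this example, not part of the slides.

```python
def zero_one_loss(y_hat, y):
    """1 when the prediction differs from the target (y_hat != y), else 0."""
    return 0 if y_hat == y else 1


def squared_loss(y_hat, y):
    """Squared difference, a common choice for real-valued targets."""
    return (y_hat - y) ** 2


def average_loss(loss, predictions, targets):
    """Mean loss over paired predictions and targets."""
    total = sum(loss(y_hat, y) for y_hat, y in zip(predictions, targets))
    return total / len(targets)


# Hypothetical predictions vs. targets for a yes/no task.
error_rate = average_loss(zero_one_loss, [1, 0, 1, 1], [1, 1, 1, 0])

# Hypothetical predicted vs. true satisfaction rates.
mean_squared = average_loss(squared_loss, [7.0, 1.0], [7.3, 1.2])
```

With the zero-one loss the average is exactly the fraction of examples where ŷ ≠ y, which is the error rate mentioned on the slide.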

TRAIN MODE

Batch Learning

for each epoch:
    for i = 1 to |Training Set|:
        1. pick a pair (x, y) ∈ (X, Y)
        2. predict ŷ using the hypothesis
        3. compare performance: ŷ vs. y
    update the hypothesis accordingly
return the hypothesis

Iterative Learning

for each epoch:
    for i = 1 to |Training Set|:
        1. pick a pair (x, y) ∈ (X, Y)
        2. predict ŷ using the hypothesis
        3. compare performance: ŷ vs. y
        4. update w accordingly
return an averaged version of the hypothesis
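
The two loops above can be sketched in Python as follows. This is only an illustration under stated assumptions: the hypothesis is taken to be a linear predictor w·x, the comparison uses the squared loss, and the update is a gradient step. The slides do not fix any of these choices, and the function names, learning rates, and toy data are hypothetical.

```python
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def batch_learning(data, epochs=200, lr=0.5):
    """One update per epoch; w stays fixed while the errors are accumulated."""
    dim = len(data[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for x, y in data:
            err = dot(w, x) - y              # compare y_hat vs. y
            for j in range(dim):
                grad[j] += err * x[j]
        # single update after the full pass over the training set
        w = [wj - lr * gj / len(data) for wj, gj in zip(w, grad)]
    return w


def iterative_learning(data, epochs=500, lr=0.1, seed=0):
    """One update per example; returns the average of all intermediate w."""
    rng = random.Random(seed)
    dim = len(data[0][0])
    w = [0.0] * dim
    w_sum = [0.0] * dim
    steps = 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)                   # visit the examples in random order
        for x, y in order:
            err = dot(w, x) - y
            w = [wj - lr * err * xj for wj, xj in zip(w, x)]  # update immediately
            w_sum = [s + wj for s, wj in zip(w_sum, w)]
            steps += 1
    return [s / steps for s in w_sum]


# Toy data generated from y = 2*x + 1; the constant feature 1.0 acts as a bias.
data = [([x, 1.0], 2.0 * x + 1.0) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
w_batch = batch_learning(data)
w_iter = iterative_learning(data)
```

On this toy data both variants end up close to the weights (2, 1) that generated it. The averaged iterative result is slightly less exact because the average includes the early, poor hypotheses.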

TRAIN MODE

Batch Learning

● The hypothesis stays constant while computing the error associated with each sample in the input.
● More efficient in terms of the number of computations.

Iterative Learning

● The hypothesis is updated constantly, so its error calculation uses different weights for each input sample.
● More practical for big data sets.

Both converge to the same minimum (explained in future lessons).

TEST MODE

Once the training process is done, our trained predictor model (i.e. the hypothesis) is ready for action! Evaluate its performance.

[Diagram: x → predictor → ŷ; ŷ is compared with y]

Performance evaluation: accuracy/error rate

TEST MODE

for i = 1 to |Test Set|:
    1. pick the pair (xᵢ, yᵢ) ∈ (X, Y)
    2. predict ŷ using the hypothesis
    3. compare performance: ŷ vs. yᵢ
return the average performance (error rate)
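
A minimal sketch of this test loop, assuming the hypothesis is any Python callable that maps an instance to a label. The threshold rule and the four test pairs are made up for illustration.

```python
def error_rate(hypothesis, test_set):
    """Fraction of test pairs (x, y) for which the prediction differs from y."""
    mistakes = 0
    for x, y in test_set:
        y_hat = hypothesis(x)        # predict using the (fixed) hypothesis
        if y_hat != y:               # compare y_hat vs. y
            mistakes += 1
    return mistakes / len(test_set)


def threshold_hypothesis(x):
    """Hypothetical trained predictor: label 1 when the feature exceeds 0.5."""
    return 1 if x[0] > 0.5 else 0


# Hypothetical held-out test pairs.
test_set = [([0.1], 0), ([0.4], 0), ([0.6], 1), ([0.9], 0)]
rate = error_rate(threshold_hypothesis, test_set)
accuracy = 1.0 - rate
```

Note that the hypothesis is never modified here; in test mode it is only evaluated.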

Online Learning

for t = 1 to T:
    1. pick the pair (xₜ, yₜ) ∈ (X, Y)
    2. predict ŷ using the hypothesis
    3. compare performance: ŷ vs. yₜ
    4. update the hypothesis

● The hypothesis is always being updated
● Performance is evaluated according to the current hypothesis
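
One possible way to make the online loop concrete is the classic perceptron update, used here only as a stand-in since the slides do not name a specific update rule. Labels are assumed to be +1/-1, and the stream of examples is hypothetical.

```python
def sign(value):
    return 1 if value >= 0 else -1


def online_perceptron(stream, dim):
    """Predict, compare, and update on every round; count the mistakes made."""
    w = [0.0] * dim
    mistakes = 0
    for x, y in stream:
        y_hat = sign(sum(wi * xi for wi, xi in zip(w, x)))   # current hypothesis
        if y_hat != y:
            mistakes += 1
            w = [wi + y * xi for wi, xi in zip(w, x)]        # perceptron update
    return w, mistakes


# Hypothetical stream: label +1 when the first feature is positive.
rounds = [([1.0, 1.0], 1), ([-1.0, 1.0], -1), ([2.0, 1.0], 1), ([-2.0, 1.0], -1)]
stream = rounds * 5
w, mistakes = online_perceptron(stream, dim=2)
```

There is no separate test phase: each prediction is scored against the hypothesis as it stands in that round, and the mistake count is the running performance measure.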

Online Learning

Pivot Example: Smartphone satisfaction rate predictor

Goal: predict the satisfaction rate for a given phone

[Diagram: pipeline of stage #1 and stage #2, ending in the prediction ŷ ∈ Y]

#1 Feature Extraction

● The input to a learning algorithm lives in a numerical* domain.
● Features → numerical representation (e.g. a vector).
○ Too many → harder to learn (noise, large dimension)
○ Too few → missing critical properties (can’t learn!)
● x ∈ ℝᵈ
● Goal: representing distinguishing characteristics
○ More luck/art/magic than science

Possible Features

● Color
● Weight
● Screen Size
● Camera quality
● Apps
● Touch Screen
● Scrolling button
● Keyboard
● Manufacturer
● Price
● # of buttons
● USB Connection
● LTE/GSM
● GPS
● Size
● Radio

What to choose?

<Touch-Screen, Weight, Price, Color> → Satisfaction Rate (label domain)

<1, 0.86, 300, 1> → 7.3
<0, 5.32, 20, 4> → 1.2

Touch-Screen: yes/no. Color: 1 - Black, 2 - White, 3 - Orange, etc.
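
The mapping from a phone to a vector like the ones above could be sketched as follows. The dictionary keys, the `COLOR_CODES` table, and the choice of 1 for "yes" are assumptions made for this example; the slides only give the resulting numbers.

```python
# Color codes as listed on the slide (1 - Black, 2 - White, 3 - Orange).
COLOR_CODES = {"black": 1, "white": 2, "orange": 3}


def extract_features(phone):
    """Map a phone description to <Touch-Screen, Weight, Price, Color>."""
    return [
        1.0 if phone["touch_screen"] else 0.0,   # assumed encoding: yes -> 1, no -> 0
        float(phone["weight"]),
        float(phone["price"]),
        float(COLOR_CODES[phone["color"]]),
    ]


# Hypothetical raw description matching the first example vector.
phone = {"touch_screen": True, "weight": 0.86, "price": 300, "color": "black"}
x = extract_features(phone)
```

Coding colors as consecutive integers imposes an ordering on them (Orange "greater than" Black) that has no real meaning, which is one reason feature design takes care.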

<Comedy rate, Drama rate, Kevin Spacey in it, ...>

<2, 5, 1, ...>

<4, 4, 0, ...>

[Figure: satisfaction rate plotted against feature #1*]

[Diagram: pipeline of stage #1 and stage #2, ending in the prediction ŷ]

#2 Learning Algorithm

● Goal: finding appropriate weights for the features
○ What’s more/less important?
● Optimization* problem
○ Find the setup that maximizes the prediction accuracy
● Output: weight vector* w
○ Also known as the hypothesis h(x)
● Inference: w·x = score
○ Use the score according to the task

Goal: weight vector (w)

● Inference score: w·x ∈ ℝ
● The dot product between w and x needs to be defined
○ w: vector
○ wᵀ·x instead of w·x

Weighting Features

<Touch-Screen, Weight, Price, Color>

w = <w1, w2, w3, w4>

Coefficient vector: weights the importance of each feature
○ Ignoring useless/noisy features (very small weight)
○ Focusing on distinguishing features (|wᵢ| large)

w1·Touch-Screen + w2·Weight + w3·Price + w4·Color = SCORE
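
The weighted sum can be computed as a plain dot product. In the sketch below the weight values are invented purely for illustration (a learned w would come out of the training process); the two feature vectors are the ones from the earlier smartphone example.

```python
def score(w, x):
    """Weighted sum w1*x1 + ... + wd*xd of the feature values."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same dimension")
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical weights for <Touch-Screen, Weight, Price, Color>.
# The zero weight on Color means that feature is ignored entirely.
w = [2.0, -1.0, 0.01, 0.0]

x1 = [1, 0.86, 300, 1]
x2 = [0, 5.32, 20, 4]
s1 = score(w, x1)
s2 = score(w, x2)
```

With these made-up weights the first phone scores higher than the second, in line with the ordering of their satisfaction rates (7.3 vs. 1.2).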

[Figure: satisfaction rate plotted against feature #1*]

Perfect predictor?

Supervised Problems

● Regression machine learning systems: systems where the value being predicted falls somewhere on a continuous spectrum.
○ How much? / How many?
○ w·x = SCORE → target output
● Classification machine learning systems: systems where we seek a yes-or-no (or class) prediction.
○ w·x = SCORE → predicted class
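
To show how the same score can serve both problem types, here is a small sketch. Thresholding the score at zero for yes/no decisions and taking the highest-scoring weight vector for several classes are common conventions assumed here, not something the slides specify; all names and numbers are hypothetical.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def predict_regression(w, x):
    """Regression: the score itself is the predicted value."""
    return dot(w, x)


def predict_binary(w, x, threshold=0.0):
    """Yes/no classification: compare the score with a threshold."""
    return 1 if dot(w, x) >= threshold else 0


def predict_multiclass(class_weights, x):
    """Several classes: one weight vector per class, pick the highest score."""
    return max(class_weights, key=lambda label: dot(class_weights[label], x))


w = [0.5, -1.0]          # hypothetical weights
x = [4.0, 1.0]           # hypothetical instance
value = predict_regression(w, x)
label = predict_binary(w, x)

class_weights = {"low": [-1.0, 0.0], "mid": [0.0, 1.0], "high": [1.0, 0.0]}
best = predict_multiclass(class_weights, x)
```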

[Diagram: pipeline of stage #1 and stage #2, ending in the prediction ŷ]

Course Focus

*From this point on, assume the data is represented as a numerical vector.

Unsupervised Learning

Unsupervised Learning

● Task: find relationships within data
● No training examples are used in this process.
● The program is given a data set and tasked with finding patterns and correlations therein.
● Example: identifying close-knit groups of friends in a social network
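
As one concrete instance of finding groups without labels, the sketch below runs a bare-bones k-means clustering. The slides do not name an algorithm, so k-means is an assumption here, and the points are toy data standing in for any numerical representation of the items to group.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))


def kmeans(points, k, iters=10):
    """Alternate between assigning points to centers and recomputing the centers."""
    centers = [list(p) for p in points[:k]]      # simple init: the first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        for c in range(k):
            if clusters[c]:                      # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(clusters[c]) for col in zip(*clusters[c])]
    labels = [nearest(p, centers) for p in points]
    return labels, centers


# Toy data: two visibly separate groups; no target values are given.
points = [(0, 0), (5, 5), (0, 1), (1, 0), (5, 6), (6, 5)]
labels, centers = kmeans(points, k=2)
```

The algorithm only ever sees the points themselves; the group indices in `labels` are discovered, not provided.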

Active Learning

Active Learning

If the learning algorithm is allowed to choose the data from which it learns (to be “curious,” if you will), it will perform better with less training.

For any supervised learning system to perform well, it must often be trained on hundreds (even thousands) of labeled instances.

Sometimes these labels come at little or no cost, such as the “spam” flag you mark on unwanted email messages, or the five-star rating you might give to films on a social networking website.

Active Learning

For many sophisticated supervised learning tasks, labeled instances are very difficult, time-consuming, or expensive to obtain (e.g. speech recognition).

Active learning systems attempt to overcome the labeling bottleneck by asking queries in the form of unlabeled instances to be labeled by an oracle.

The active learner aims to achieve high accuracy using as few labeled instances as possible, thereby minimizing the cost of obtaining labeled data.
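
A toy sketch of the query loop, under strong simplifying assumptions: instances are integers, the true concept is a single threshold, and the learner always queries the instance in the middle of the region whose label it cannot yet infer. The `oracle` function, the threshold value 62, and the budget are hypothetical.

```python
def oracle(x, true_threshold=62):
    """Stands in for the human labeler: label 1 when x reaches the hidden threshold."""
    return 1 if x >= true_threshold else 0


def active_learn_threshold(pool, ask, budget):
    """Query only the instances whose label is still uncertain."""
    unlabeled = sorted(pool)
    labeled = []
    largest_neg = None       # largest instance known to be labeled 0
    smallest_pos = None      # smallest instance known to be labeled 1
    for _ in range(budget):
        # Instances outside this region have labels the learner can already infer.
        uncertain = [
            x for x in unlabeled
            if (largest_neg is None or x > largest_neg)
            and (smallest_pos is None or x < smallest_pos)
        ]
        if not uncertain:
            break
        x = uncertain[len(uncertain) // 2]       # most informative query
        y = ask(x)
        labeled.append((x, y))
        unlabeled.remove(x)
        if y == 1:
            smallest_pos = x
        else:
            largest_neg = x
    return labeled, largest_neg, smallest_pos


pool = list(range(101))
labeled, largest_neg, smallest_pos = active_learn_threshold(pool, oracle, budget=10)
```

In this setting the learner pins down the boundary after labeling only a handful of the 101 instances, whereas labeling instances in arbitrary order could require many more queries.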

Questions?