Source: u.cs.biu.ac.il/~jkeshet/teaching/iml2015/iml2015_tirgul1.pdf
TRANSCRIPT
Tirgul #1: Gentle Start
Introduction to Machine Learning, Bar-Ilan University (89-511)
Administration
● 70% Final Exam
● 30% Assignments
● Theoretical and Practical Assignments:
○ Python (prerequisite)
○ Appear in the final exam in some
form (exact, similar, conclusions)
What is Machine Learning?
“[Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed.”
Arthur Samuel, 1959
“A computer program is said to learn from experience E with respect to some task T and some performance measure P, if
its performance on T, as measured by P, improves with experience E.”
Tom Mitchell, 1997
Example: traffic at a busy intersection
● Experience E: observed traffic patterns at the intersection
● Task T: predicting the future traffic pattern
● Performance P: how accurately the future pattern is predicted
Machine Learning in Real Life
Key role player in a wide range of critical applications and tasks
Recommendation System
Recommend a new series\movie based on past selections (create a profile)
Recommendation System
Recommend a new item based on your profile
Search Engine
Given keywords, propose links ranked by relevance
History
Image Recognition
Recognize faces in pictures based on tagged images
Is this Martha?
Machine Learning Problems
● Is this cancer?
● What movie should I watch next?
● Who is this?
● What did you say?
● Should I invest in this stock?
● Is it going to rain tomorrow?
● etc.
Complex problems that cannot be solved by numerical means alone
ML task type distinction: Supervised vs. Unsupervised
Supervised vs Unsupervised
Supervised Learning
The program is “trained” on a predefined set of “training examples”, which then facilitate its ability to reach accurate conclusions when given new data.
Unsupervised Learning
The program is given a bunch of data and must find patterns and relationships therein.
Supervised Learning
Supervised Learning
● Goal: Develop a finely tuned predictor function (a.k.a. a “hypothesis”).
● “Learning” consists of using sophisticated mathematical algorithms to optimize this function.
● Given input data, it will accurately predict the desired value.
x → ŷ
Supervised Learning
The process flow
1. Train\Tweak the predictor according to a given data set
   ○ training examples
   ○ (x, y), where x is an instance and y is its target value
2. Test the predictor on new data
   ○ use it in the real world
   ○ evaluate performance
TRAIN MODE
Training and modifying the predictor function according to a given data set (i.e. the training set)
Goal: a weight vector that is well trained on the target distribution (not specific to the training set)
Suffer the loss (cost function) → correct the weight vector
(Figure: cellphone feature representation plotted against satisfaction rate)
TRAIN MODE: Epoch
● A single pass over the entire* dataset.
● In most cases a single pass (epoch) is not enough for suitable training (the weight vector has not yet converged).
Cost Function (Loss)
● Measures the difference between two possible values from the label domain
● ŷ and y - for correction and performance evaluation
● ŷ ≠ y - error rate
● Depends on the task and the framework
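The idea of suffering a loss can be made concrete with two standard examples. A minimal sketch in Python (the function names are illustrative, not taken from the slides): the zero-one loss is exactly the error-rate notion above (ŷ ≠ y), while the squared loss is a common choice when the label domain is continuous.

```python
def zero_one_loss(y_hat, y):
    """1 if the prediction differs from the label, else 0 (classification)."""
    return 0 if y_hat == y else 1

def squared_loss(y_hat, y):
    """Penalizes large deviations quadratically (regression)."""
    return (y_hat - y) ** 2
```

Which loss is appropriate depends on the task and the framework, exactly as the slide says.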
TRAIN MODE: Batch Learning

for each epoch:
    for i = 1 to |Training Set|:
        1. pick a pair (xᵢ, yᵢ) ∈ (X, Y)
        2. predict ŷ using the hypothesis
        3. compare performance ŷ vs. yᵢ
    update the hypothesis accordingly (once per epoch)
return the hypothesis
Iterative Learning

for each epoch:
    for i = 1 to |Training Set|:
        1. pick a pair (xᵢ, yᵢ) ∈ (X, Y)
        2. predict ŷ using the hypothesis
        3. compare performance ŷ vs. yᵢ
        4. update w accordingly
return an averaged version of the hypothesis
TRAIN MODE: Batch Learning
● The hypothesis stays constant while computing the error associated with each sample in the input
● More efficient in terms of # of computations.
Iterative Learning
● Constantly updates the hypothesis; its error calculation uses different weights for each input sample.
● More practical in the case of big data sets.
both converge to the same minimum (explained in future lessons)
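The contrast between the two schemes can be sketched on a toy 1-D linear predictor ŷ = w·x with squared loss. Everything here (the data, the learning rate, the function names) is an illustrative assumption; the point is only where the update happens: once per epoch in batch mode, after every sample in iterative mode.

```python
# Toy data (x, y) with true relationship y = 2x
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
lr = 0.05  # learning rate (illustrative choice)

def batch_epoch(w):
    # Hypothesis stays fixed while the error of every sample is computed,
    # then a single update is applied at the end of the pass.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def iterative_epoch(w):
    # Hypothesis is updated after each individual sample.
    for x, y in data:
        w -= lr * 2 * (w * x - y) * x
    return w

w_batch = w_iter = 0.0
for _ in range(100):
    w_batch = batch_epoch(w_batch)
    w_iter = iterative_epoch(w_iter)
# Both weight values converge toward the same minimum, w = 2.
```

This mirrors the slide's claim: the two procedures take different paths but converge to the same minimum.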
TEST MODE
Once the training process is done, our trained predictor model (i.e. the hypothesis) is ready for action! Evaluate its performance.
Performance Evaluation: Accuracy \ Error rate
TEST MODE
for i = 1 to |Test Set|:
    1. pick the pair (xᵢ, yᵢ) ∈ (X, Y)
    2. predict ŷ using the hypothesis
    3. compare performance ŷ vs. yᵢ
return average performance (error rate)
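The test loop above translates almost line-for-line into Python. The hypothesis and test set below are illustrative assumptions; note that the hypothesis is fixed throughout, it is only evaluated, never updated.

```python
def hypothesis(x):
    # Toy fixed classifier: predict 1 if the input exceeds 0.5 (assumption)
    return 1 if x > 0.5 else 0

test_set = [(0.9, 1), (0.2, 0), (0.7, 0), (0.1, 0)]  # (x, y) pairs

# Count the mistakes (ŷ ≠ y) and average over the test set
errors = sum(1 for x, y in test_set if hypothesis(x) != y)
error_rate = errors / len(test_set)  # 1 mistake out of 4 → 0.25
```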
Online Learning
for t = 1 to T:
    1. pick the pair (xₜ, yₜ) ∈ (X, Y)
    2. predict ŷ using the hypothesis
    3. compare performance ŷ vs. yₜ
    4. update the hypothesis

● Always updating the hypothesis
● Performance is evaluated according to the current hypothesis
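A classic instance of this loop is the perceptron, sketched here as an assumed example (the slides do not name a specific online learner). Each round the current hypothesis predicts first, performance is charged against that prediction, and only then is the weight vector updated.

```python
# Stream of (x, y) examples with labels in {-1, +1} (illustrative data)
stream = [([1.0, 0.0], 1), ([0.0, 1.0], -1), ([1.0, 1.0], 1), ([0.0, 2.0], -1)]

w = [0.0, 0.0]   # current hypothesis
mistakes = 0     # cumulative online performance measure

for x, y in stream:
    score = sum(wi * xi for wi, xi in zip(w, x))
    y_hat = 1 if score >= 0 else -1          # predict with the CURRENT hypothesis
    if y_hat != y:                           # performance charged before the update
        mistakes += 1
        w = [wi + y * xi for wi, xi in zip(w, x)]  # perceptron update
```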
Pivot Example: Smartphone satisfaction-rate predictor
Goal: predict the satisfaction rate for a given phone
Pipeline: #1 Feature Extraction → #2 Learning Algorithm → Prediction ŷ ∈ Y
#1 Feature Extraction
● Learning algorithms' inputs exist in numerical* domains.
● Features → numerical representation (e.g. a vector).
  ○ Too many → harder to learn (noise, large dimension)
  ○ Too few → missing critical properties (can't learn!)
● x ∈ ℝᵈ
● Goal: representing distinguishing characteristics
  ○ More luck\art\magic than science
Possible Features
● Color
● Weight
● Screen Size
● Camera quality
● Apps
● Touch Screen
● Scrolling button
● Keyboard
● Manufacturer
● Price
● # of buttons
● USB Connection
● LTE\GSM
● GPS
● Size
● Radio
What to choose???
<Touch-Screen, Weight, Price, Color> → Satisfaction Rate (label domain)
<1, 0.86, 300, 1> → 7.3
<0, 5.32, 20, 4> → 1.2
(Touch-Screen: 1 = yes, 0 = no; Color: 1 - Black, 2 - White, 3 - Orange, etc.)
<Comedy rate, Drama rate, Kevin Spacey in it, ...>
<2, 5, 1, ...>
<4, 4, 0, ...>
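Feature extraction like the table above is just a mapping from raw properties to numbers. A minimal sketch following the slide's <Touch-Screen, Weight, Price, Color> layout; the function name and the color encoding dictionary are illustrative assumptions:

```python
# Assumed color encoding, matching the slide's "1 - Black, 2 - White, 3 - Orange"
COLOR_CODES = {"black": 1, "white": 2, "orange": 3}

def encode_phone(touch_screen, weight, price, color):
    """Map raw phone properties to a numerical vector x ∈ R^4."""
    return [1 if touch_screen else 0, weight, price, COLOR_CODES[color]]

x = encode_phone(touch_screen=True, weight=0.86, price=300, color="black")
# x matches the slide's first training example: <1, 0.86, 300, 1>
```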
(Figure: feature #1 plotted against satisfaction rate)
#2 Learning Algorithm
● Goal: finding appropriate weights for the features
  ○ What's more\less important?
● An optimization* problem
  ○ find the setup that maximizes the prediction accuracy
● Output: weight vector* w
  ○ also known as the hypothesis h(x)
● Inference: w·x = score
  ○ use the score according to the task
Goal: the weight vector w
● Inference score: w·x ∈ ℝ
● The dot product between w and x needs to be defined
  ○ w: a vector
  ○ wᵀx instead of w·x
Weighting Features

<Touch-Screen, Weight, Price, Color>
w = <w1, w2, w3, w4>

● A coefficient vector weighting the importance of each feature
  ○ ignoring useless\noisy features (very small weight)
  ○ focusing on distinguishing features (|wᵢ| large)

w1·Touch-Screen + w2·Weight + w3·Price + w4·Color = SCORE
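The weighted sum above is exactly a dot product. A short sketch, reusing the slide's first phone example; the particular weight values are assumptions chosen to show a near-zero weight "ignoring" the Color feature:

```python
def score(w, x):
    """Inference score: the dot product w·x."""
    assert len(w) == len(x)
    return sum(wi * xi for wi, xi in zip(w, x))

w = [2.0, -0.5, -0.01, 0.0]   # zero weight → Color is ignored entirely
x = [1, 0.86, 300, 1]          # <Touch-Screen, Weight, Price, Color>
s = score(w, x)                # 2.0 - 0.43 - 3.0 + 0.0 = -1.43
```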
(Figure: feature #1 plotted against satisfaction rate)
A Perfect Predictor?
Supervised Problems
● Regression machine learning systems: systems where the predicted value falls somewhere on a continuous spectrum.
  ○ How much? \ How many?
  ○ w·x = SCORE → target output
● Classification machine learning systems: systems where we seek a yes-or-no or class prediction.
  ○ w·x = SCORE → predicted class
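The two bullet points differ only in how the same score w·x is interpreted. A sketch under that reading (the weight values are assumptions; thresholding at zero with ±1 labels is one common convention, not the only one):

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def predict_regression(w, x):
    # Regression: the score itself IS the target output
    return dot(w, x)

def predict_classification(w, x):
    # Binary classification: threshold the score into a class in {-1, +1}
    return 1 if dot(w, x) >= 0 else -1

w = [0.5, -1.0]  # illustrative weight vector
```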
Course Focus
*From this point on, assume the data is represented as a numerical vector
Unsupervised Learning
Unsupervised Learning
● Task: find relationships within data
● No training examples are used in this process.
● The algorithm is given a data set and tasked with finding patterns and correlations therein.
● Example: identifying close-knit groups of friends in a social network
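A tiny concrete instance of pattern-finding without labels is k-means clustering, sketched here in 1-D with k = 2 (the data and initialization are illustrative assumptions; the slides do not specify an algorithm):

```python
# Unlabeled data: no target values, only the points themselves
points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]
c1, c2 = points[0], points[3]  # initialize the two centers from the data

for _ in range(10):
    # Assign each point to its nearest center, then move each center
    # to the mean of its assigned group.
    g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
    g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
    c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
# The two clusters (around 1 and around 9) are discovered without any labels.
```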
Active Learning
Active Learning
If the learning algorithm is allowed to choose the data from which it learns (to be “curious,” if you will), it can perform better with less training.
For any supervised learning system to perform well, it must often be trained on hundreds (even thousands) of labeled instances.
Sometimes these labels come at little or no cost, such as the “spam” flag you mark on unwanted email messages, or the five-star rating you might give to films on a social networking website.
Active Learning
In many sophisticated supervised learning tasks, labeled instances are very difficult, time-consuming, or expensive to obtain (e.g. speech recognition).
Active learning systems attempt to overcome the labeling bottleneck by asking queries in the form of unlabeled instances to be labeled by an oracle.
The active learner aims to achieve high accuracy using as few labeled instances as possible, thereby minimizing the cost of obtaining labeled data.
Questions?