Source: u.cs.biu.ac.il/~jkeshet/teaching/iml2015/iml2015_tirgul1.pdf
TRANSCRIPT
Tirgul #1: Gentle Start
Introduction to Machine Learning, Bar-Ilan University (89-511)
Administration
● 70% Final Exam
● 30% Assignments
● Theoretical and Practical Assignments:
○ Python (prerequisite)
○ Appear in the final exam in some
form (exact, similar, conclusions)
What is Machine Learning?
“[Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed.”
Arthur Samuel, 1959
“A computer program is said to learn from experience E with respect to some task T and some performance measure P, if
its performance on T, as measured by P, improves with experience E.”
Tom Mitchell, 1997
Example: traffic at a busy intersection
● Experience E: observed traffic patterns at the intersection
● Task T: predicting the future traffic pattern
● Performance P: how accurately the future pattern is predicted
Machine Learning in Real Life
Key role player in a wide range of critical applications and tasks
Recommendation System
Recommend a new series\movie based on past selections (create a profile)
Recommendation System
Recommend a new item based on your profile
Search Engine
Given keywords, propose links ranked by relevance
History
Image Recognition
Recognize faces in pictures based on tagged images
Is this Martha?
Machine Learning Problems
● Is this cancer?
● What movie should I watch next?
● Who is this?
● What did you say?
● Should I invest in this stock?
● Is it going to rain tomorrow?
● etc.
Complex problems that cannot be solved by numerical means alone
ML task type distinction: Supervised vs. Unsupervised
Supervised vs Unsupervised
Supervised Learning
The program is “trained” on a predefined set of “training examples”, which then facilitate its ability to reach accurate conclusions when given new data.
Unsupervised Learning
The program is given a bunch of data and must find patterns and relationships therein.
Supervised Learning
Supervised Learning
● Goal: Develop a finely tuned predictor function (a.k.a. a “hypothesis”).
● “Learning” consists of using sophisticated mathematical algorithms to optimize this function.
● Given input data, it will accurately predict the desired value.
x → ŷ
Supervised Learning
The process flow
1. Train\Tweak the predictor according to a given data set
   ○ training examples
   ○ (x, y), where x is an instance and y is its target value
2. Test the predictor on new data
   ○ use it in the real world
   ○ evaluate performance
TRAIN MODE
Training and modifying the predictor function according to a given data set (i.e. the training set)
Goal: a weight vector that is well trained on the target distribution (not specific to the training set)
Suffer the loss (cost function) → correct the weight vector
(Figure: cellphone feature representation plotted against satisfaction rate)
TRAIN MODE: Epoch
● A single pass over the entire* dataset.
● In most cases a single pass (epoch) is not enough for suitable training (the weight vector has not yet converged).
Cost Function (Loss)
● Measures the difference between two possible values from the label domain
● ŷ and y - for correction and performance evaluation
● ŷ ≠ y - error rate
● Depends on the task and the framework
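The idea of suffering a loss can be made concrete with two standard examples. A minimal sketch in Python (the function names are illustrative, not taken from the slides): the zero-one loss is exactly the error-rate notion above (ŷ ≠ y), while the squared loss is a common choice when the label domain is continuous.

```python
def zero_one_loss(y_hat, y):
    """1 if the prediction differs from the label, else 0 (classification)."""
    return 0 if y_hat == y else 1

def squared_loss(y_hat, y):
    """Penalizes large deviations quadratically (regression)."""
    return (y_hat - y) ** 2
```

Which loss is appropriate depends on the task and the framework, exactly as the slide says.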
TRAIN MODE: Batch Learning

for each epoch:
    for i = 1 to |Training Set|:
        1. pick a pair (xᵢ, yᵢ) ∈ (X, Y)
        2. predict ŷ using the hypothesis
        3. compare performance ŷ vs. yᵢ
    update the hypothesis accordingly (once per epoch)
return the hypothesis
Iterative Learning

for each epoch:
    for i = 1 to |Training Set|:
        1. pick a pair (xᵢ, yᵢ) ∈ (X, Y)
        2. predict ŷ using the hypothesis
        3. compare performance ŷ vs. yᵢ
        4. update w accordingly
return an averaged version of the hypothesis
TRAIN MODE: Batch Learning
● The hypothesis stays constant while computing the error associated with each sample in the input
● More efficient in terms of # of computations.
Iterative Learning
● Constantly updates the hypothesis; its error calculation uses different weights for each input sample.
● More practical in the case of big data sets.
both converge to the same minimum (explained in future lessons)
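The contrast between the two schemes can be sketched on a toy 1-D linear predictor ŷ = w·x with squared loss. Everything here (the data, the learning rate, the function names) is an illustrative assumption; the point is only where the update happens: once per epoch in batch mode, after every sample in iterative mode.

```python
# Toy data (x, y) with true relationship y = 2x
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
lr = 0.05  # learning rate (illustrative choice)

def batch_epoch(w):
    # Hypothesis stays fixed while the error of every sample is computed,
    # then a single update is applied at the end of the pass.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def iterative_epoch(w):
    # Hypothesis is updated after each individual sample.
    for x, y in data:
        w -= lr * 2 * (w * x - y) * x
    return w

w_batch = w_iter = 0.0
for _ in range(100):
    w_batch = batch_epoch(w_batch)
    w_iter = iterative_epoch(w_iter)
# Both weight values converge toward the same minimum, w = 2.
```

This mirrors the slide's claim: the two procedures take different paths but converge to the same minimum.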
TEST MODE
Once the training process is done, our trained predictor model (i.e. the hypothesis) is ready for action! Evaluate its performance.
Performance Evaluation: Accuracy \ Error rate
TEST MODE
for i = 1 to |Test Set|:
    1. pick the pair (xᵢ, yᵢ) ∈ (X, Y)
    2. predict ŷ using the hypothesis
    3. compare performance ŷ vs. yᵢ
return average performance (error rate)
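The test loop above translates almost line-for-line into Python. The hypothesis and test set below are illustrative assumptions; note that the hypothesis is fixed throughout, it is only evaluated, never updated.

```python
def hypothesis(x):
    # Toy fixed classifier: predict 1 if the input exceeds 0.5 (assumption)
    return 1 if x > 0.5 else 0

test_set = [(0.9, 1), (0.2, 0), (0.7, 0), (0.1, 0)]  # (x, y) pairs

# Count the mistakes (ŷ ≠ y) and average over the test set
errors = sum(1 for x, y in test_set if hypothesis(x) != y)
error_rate = errors / len(test_set)  # 1 mistake out of 4 → 0.25
```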
Online Learning
for t = 1 to T:
    1. pick the pair (xₜ, yₜ) ∈ (X, Y)
    2. predict ŷ using the hypothesis
    3. compare performance ŷ vs. yₜ
    4. update the hypothesis

● Always updating the hypothesis
● Performance is evaluated according to the current hypothesis
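A classic instance of this loop is the perceptron, sketched here as an assumed example (the slides do not name a specific online learner). Each round the current hypothesis predicts first, performance is charged against that prediction, and only then is the weight vector updated.

```python
# Stream of (x, y) examples with labels in {-1, +1} (illustrative data)
stream = [([1.0, 0.0], 1), ([0.0, 1.0], -1), ([1.0, 1.0], 1), ([0.0, 2.0], -1)]

w = [0.0, 0.0]   # current hypothesis
mistakes = 0     # cumulative online performance measure

for x, y in stream:
    score = sum(wi * xi for wi, xi in zip(w, x))
    y_hat = 1 if score >= 0 else -1          # predict with the CURRENT hypothesis
    if y_hat != y:                           # performance charged before the update
        mistakes += 1
        w = [wi + y * xi for wi, xi in zip(w, x)]  # perceptron update
```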
Pivot Example: Smartphone satisfaction-rate predictor
Goal: predict the satisfaction rate for a given phone
Pipeline: #1 Feature Extraction → #2 Learning Algorithm → Prediction ŷ ∈ Y
#1 Feature Extraction
● Learning algorithms' inputs exist in numerical* domains.
● Features → numerical representation (e.g. a vector).
  ○ Too many → harder to learn (noise, large dimension)
  ○ Too few → missing critical properties (can't learn!)
● x ∈ ℝᵈ
● Goal: representing distinguishing characteristics
  ○ More luck\art\magic than science
Possible Features
● Color
● Weight
● Screen Size
● Camera quality
● Apps
● Touch Screen
● Scrolling button
● Keyboard
● Manufacturer
● Price
● # of buttons
● USB Connection
● LTE\GSM
● GPS
● Size
● Radio
What to choose???
<Touch-Screen, Weight, Price, Color> → Satisfaction Rate (label domain)
<1, 0.86, 300, 1> → 7.3
<0, 5.32, 20, 4> → 1.2
(Touch-Screen: 1 = yes, 0 = no; Color: 1 - Black, 2 - White, 3 - Orange, etc.)
<Comedy rate, Drama rate, Kevin Spacey in it, ...>
<2, 5, 1, ...>
<4, 4, 0, ...>
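Feature extraction like the table above is just a mapping from raw properties to numbers. A minimal sketch following the slide's <Touch-Screen, Weight, Price, Color> layout; the function name and the color encoding dictionary are illustrative assumptions:

```python
# Assumed color encoding, matching the slide's "1 - Black, 2 - White, 3 - Orange"
COLOR_CODES = {"black": 1, "white": 2, "orange": 3}

def encode_phone(touch_screen, weight, price, color):
    """Map raw phone properties to a numerical vector x ∈ R^4."""
    return [1 if touch_screen else 0, weight, price, COLOR_CODES[color]]

x = encode_phone(touch_screen=True, weight=0.86, price=300, color="black")
# x matches the slide's first training example: <1, 0.86, 300, 1>
```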
(Figure: feature #1 plotted against satisfaction rate)
#2 Learning Algorithm
● Goal: finding appropriate weights for the features
  ○ What's more\less important?
● An optimization* problem
  ○ find the setup that maximizes the prediction accuracy
● Output: weight vector* w
  ○ also known as the hypothesis h(x)
● Inference: w·x = score
  ○ use the score according to the task
Goal: the weight vector w
● Inference score: w·x ∈ ℝ
● The dot product between w and x needs to be defined
  ○ w: a vector
  ○ wᵀx instead of w·x
Weighting Features

<Touch-Screen, Weight, Price, Color>
w = <w1, w2, w3, w4>

● A coefficient vector weighting the importance of each feature
  ○ ignoring useless\noisy features (very small weight)
  ○ focusing on distinguishing features (|wᵢ| large)

w1·Touch-Screen + w2·Weight + w3·Price + w4·Color = SCORE
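The weighted sum above is exactly a dot product. A short sketch, reusing the slide's first phone example; the particular weight values are assumptions chosen to show a near-zero weight "ignoring" the Color feature:

```python
def score(w, x):
    """Inference score: the dot product w·x."""
    assert len(w) == len(x)
    return sum(wi * xi for wi, xi in zip(w, x))

w = [2.0, -0.5, -0.01, 0.0]   # zero weight → Color is ignored entirely
x = [1, 0.86, 300, 1]          # <Touch-Screen, Weight, Price, Color>
s = score(w, x)                # 2.0 - 0.43 - 3.0 + 0.0 = -1.43
```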
(Figure: feature #1 plotted against satisfaction rate)
A Perfect Predictor?
Supervised Problems
● Regression machine learning systems: systems where the predicted value falls somewhere on a continuous spectrum.
  ○ How much? \ How many?
  ○ w·x = SCORE → target output
● Classification machine learning systems: systems where we seek a yes-or-no or class prediction.
  ○ w·x = SCORE → predicted class
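The two bullet points differ only in how the same score w·x is interpreted. A sketch under that reading (the weight values are assumptions; thresholding at zero with ±1 labels is one common convention, not the only one):

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def predict_regression(w, x):
    # Regression: the score itself IS the target output
    return dot(w, x)

def predict_classification(w, x):
    # Binary classification: threshold the score into a class in {-1, +1}
    return 1 if dot(w, x) >= 0 else -1

w = [0.5, -1.0]  # illustrative weight vector
```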
Course Focus
*From this point on, assume the data is represented as a numerical vector
Unsupervised Learning
Unsupervised Learning
● Task: find relationships within data
● No training examples are used in this process.
● The algorithm is given a data set and tasked with finding patterns and correlations therein.
● Example: identifying close-knit groups of friends in a social network
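A tiny concrete instance of pattern-finding without labels is k-means clustering, sketched here in 1-D with k = 2 (the data and initialization are illustrative assumptions; the slides do not specify an algorithm):

```python
# Unlabeled data: no target values, only the points themselves
points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]
c1, c2 = points[0], points[3]  # initialize the two centers from the data

for _ in range(10):
    # Assign each point to its nearest center, then move each center
    # to the mean of its assigned group.
    g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
    g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
    c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
# The two clusters (around 1 and around 9) are discovered without any labels.
```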
Active Learning
Active Learning
If the learning algorithm is allowed to choose the data from which it learns (to be “curious,” if you will), it can perform better with less training.
For any supervised learning system to perform well, it must often be trained on hundreds (even thousands) of labeled instances.
Sometimes these labels come at little or no cost, such as the “spam” flag you mark on unwanted email messages, or the five-star rating you might give to films on a social networking website.
Active Learning
In many sophisticated supervised learning tasks, labeled instances are very difficult, time-consuming, or expensive to obtain (e.g. speech recognition).
Active learning systems attempt to overcome the labeling bottleneck by asking queries in the form of unlabeled instances to be labeled by an oracle.
The active learner aims to achieve high accuracy using as few labeled instances as possible, thereby minimizing the cost of obtaining labeled data.
Questions?