
CSC 311: Introduction to Machine Learning

Lecture 1 - Introduction

Murat A. Erdogdu & Richard Zemel

University of Toronto

Intro ML (UofT) CSC311-Lec1 1 / 54


This course

Broad introduction to machine learning

- First half: algorithms and principles for supervised learning
  - nearest neighbors, decision trees, ensembles, linear regression, logistic regression, SVMs
- Unsupervised learning: PCA, K-means, mixture models
- Basics of reinforcement learning

Coursework is aimed at advanced undergrads. We will use multivariate calculus, probability, and linear algebra.


Course Information

Course Website: https://erdogdu.github.io/csc311/

Main source of information is the course webpage; check regularly! We will also use Quercus for announcements & grades.

Did you receive the announcement today?

We will use Piazza for discussions.

Sign up: https://piazza.com/utoronto.ca/fall2019/csc311/home

Your grade does not depend on your participation on Piazza. It’s just a good way to ask questions and to discuss with your instructor, TAs, and peers. We will only allow questions that are related to the course materials/assignments/exams.


Course Information

While cell phones and other electronics are not prohibited in lecture, talking, recording, or taking pictures in class is strictly prohibited without the consent of your instructor. Please ask before doing so!

http://www.illnessverification.utoronto.ca is the only acceptable form of direct medical documentation.

For accessibility services: If you require additional academic accommodations, please contact UofT Accessibility Services as soon as possible, studentlife.utoronto.ca/as.


Course Information

Recommended readings will be given for each lecture. But the following will be useful throughout the course:

Hastie, Tibshirani, and Friedman: “The Elements of Statistical Learning”

Christopher Bishop: “Pattern Recognition and Machine Learning”, 2006.

Kevin Murphy: “Machine Learning: a Probabilistic Perspective”, 2012.

David MacKay: “Information Theory, Inference, and Learning Algorithms”, 2003.

Shai Shalev-Shwartz & Shai Ben-David: “Understanding Machine Learning: From Theory to Algorithms”, 2014.

There are lots of freely available, high-quality ML resources.


Requirements and Marking

4 assignments

- Combination of pen & paper derivations and short programming exercises
- Weights to be decided (expect 10% each), for a total of 40%

Midterm

- Oct. 21, 6–7pm (tentative)
- Worth 20% of course mark

Final Exam

- Three hours
- Date and time TBA
- Worth 40% of course mark


Final Exam

Everybody must take the final exam! No exceptions.

Time/location will be announced later.


More on Assignments

Collaboration on the assignments is not allowed. Each student is responsible for his/her own work. Discussion of assignments should be limited to clarification of the handout itself, and should not involve any sharing of pseudocode or code or simulation results. Violation of this policy is grounds for a semester grade of F, in accordance with university regulations.

The schedule of assignments will be posted on the course webpage.

Assignments should be handed in by deadline; a late penalty of 10% per day will be assessed thereafter (up to 3 days, then submission is blocked).

Extensions will be granted only in special situations, and you will need a Student Medical Certificate or a written request approved by the course coordinator at least one week before the due date.


Related Courses

More advanced ML courses such as csc413 (neural nets) and csc412 (probabilistic graphical models) both build upon the material in this course.

If you’ve already taken an applied statistics course, there will be some overlap. Sorry.

This is the first time this course is listed only as an undergrad course.


What is learning?

“The activity or process of gaining knowledge or skill by studying, practicing, being taught, or experiencing something.”

Merriam-Webster dictionary

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

Tom Mitchell


What is machine learning?

For many problems, it’s difficult to program the correct behavior by hand

- recognizing people and objects
- understanding human speech

Machine learning approach: program an algorithm to automatically learn from data, or from experience

Why might you want to use a learning algorithm?

- hard to code up a solution by hand (e.g. vision, speech)
- system needs to adapt to a changing environment (e.g. spam detection)
- want the system to perform better than the human programmers
- privacy/fairness (e.g. ranking search results)


What is machine learning?

It’s similar to statistics...

- Both fields try to uncover patterns in data
- Both fields draw heavily on calculus, probability, and linear algebra, and share many of the same core algorithms

But it’s not statistics!

- Stats is more concerned with helping scientists and policymakers draw good conclusions; ML is more concerned with building autonomous agents
- Stats puts more emphasis on interpretability and mathematical rigor; ML puts more emphasis on predictive performance, scalability, and autonomy


Relations to AI

Nowadays, “machine learning” is often brought up with “artificial intelligence” (AI)

AI does not necessarily imply a learning-based system

- Symbolic reasoning
- Rule-based systems
- Tree search
- etc.

Learning-based system → learned based on the data → more flexibility, good at solving pattern recognition problems.


Relations to human learning

Human learning is:

- Very data efficient
- An entire multitasking system (vision, language, motor control, etc.)
- Takes at least a few years :)

For serving specific purposes, machine learning doesn’t have to look like human learning in the end.

It may borrow ideas from biological systems (e.g. neural networks).

It may perform better or worse than humans.


What is machine learning?

Types of machine learning

- Supervised learning: have labeled examples of the correct behavior
- Reinforcement learning: learning system receives a reward signal, tries to learn to maximize the reward signal
- Unsupervised learning: no labeled examples – instead, looking for interesting patterns in the data


History of machine learning

1957: Perceptron algorithm (implemented as a circuit!)

1959: Arthur Samuel wrote a learning-based checkers program that could defeat him

1969: Minsky and Papert’s book Perceptrons (limitations of linear models)

1980s: Some foundational ideas

- Connectionist psychologists explored neural models of cognition
- 1984: Leslie Valiant formalized the problem of learning as PAC learning
- 1986: Backpropagation (re-)discovered by Geoffrey Hinton and colleagues
- 1988: Judea Pearl’s book Probabilistic Reasoning in Intelligent Systems introduced Bayesian networks


History of machine learning

1990s: the “AI Winter”, a time of pessimism and low funding

But looking back, the ’90s were also sort of a golden age for ML research

- Markov chain Monte Carlo
- variational inference
- kernels and support vector machines
- boosting
- convolutional networks

2000s: applied AI fields (vision, NLP, etc.) adopted ML

2010s: deep learning

- 2010–2012: neural nets smashed previous records in speech-to-text and object recognition
- increasing adoption by the tech industry
- 2016: AlphaGo defeated the human Go champion


History of machine learning

ML conferences selling out like Beyonce tickets.

Source: medium.com/syncedreview/

This year NeurIPS registration starts on Sep 6, 11 am EDT.


Computer vision: Object detection, semantic segmentation, pose estimation, and almost every other task is done with ML.

Instance segmentation - Link


Speech: Speech to text, personal assistants, speaker identification...


NLP: Machine translation, sentiment analysis, topic modeling, spam filtering.


Playing Games

DOTA2 - Link


E-commerce & Recommender Systems: Amazon, Netflix, ...


Why this class?

Why not jump straight to csc412/413, and learn neural nets first?

The principles you learn in this course will be essential to really understand neural nets.

- 4 weeks of csc413 were devoted to background material covered in this course!

The techniques in this course are still the first things to try for a new ML problem.

- E.g., try logistic regression before building a deep neural net!

There’s a whole world of probabilistic graphical models.


Why this class?

2017 Kaggle survey of data science and ML practitioners: what data science methods do you use at work?


ML Workflow

ML workflow sketch:

1. Should I use ML on this problem?
   - Is there a pattern to detect?
   - Can I solve it analytically?
   - Do I have data?

2. Gather and organize data.
   - Preprocessing, cleaning, visualizing.

3. Establish a baseline.

4. Choose a model, loss, regularization, ...

5. Optimization (could be simple, could be a PhD...).

6. Hyperparameter search.

7. Analyze performance and mistakes, and iterate back to step 4 (or 2).
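
The baseline step in the workflow above can be very cheap. As one possible sketch (not something the slides prescribe), a majority-class predictor gives a number that any real model should beat. The function names and the toy labels below are hypothetical and only for illustration.

```python
import numpy as np

def majority_class_baseline(train_labels):
    """Return the most frequent label in the training set."""
    values, counts = np.unique(train_labels, return_counts=True)
    return values[np.argmax(counts)]

def accuracy(predictions, targets):
    """Fraction of predictions that match the targets."""
    return float(np.mean(np.asarray(predictions) == np.asarray(targets)))

# Toy labels, made up for illustration only.
train_t = np.array([0, 1, 1, 1, 0, 1])
val_t = np.array([1, 0, 1, 1])

# Always predict the most common training label.
baseline_label = majority_class_baseline(train_t)
baseline_preds = np.full(val_t.shape, baseline_label)
baseline_acc = accuracy(baseline_preds, val_t)
```

If a trained model cannot beat `baseline_acc`, that usually points back to step 2 or step 4 rather than to more hyperparameter search.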


Implementing machine learning systems

You will often need to derive an algorithm (with pencil and paper), and then translate the math into code.

Array processing (NumPy)

- vectorize computations (express them in terms of matrix/vector operations) to exploit hardware efficiency
- This also makes your code cleaner and more readable!
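
To make the vectorization point concrete, here is a small sketch that computes the same quantity twice: once with explicit Python loops and once as a single broadcasted NumPy expression. The function names and the toy arrays are assumptions chosen for illustration.

```python
import numpy as np

def squared_distances_loop(X, x):
    """Squared Euclidean distance from x to each row of X, with explicit loops."""
    N, D = X.shape
    out = np.zeros(N)
    for i in range(N):
        for j in range(D):
            out[i] += (X[i, j] - x[j]) ** 2
    return out

def squared_distances_vectorized(X, x):
    """Same computation as one broadcasted array expression."""
    diff = X - x                      # broadcasting subtracts x from every row
    return np.sum(diff ** 2, axis=1)  # sum over the feature dimension

# Toy data, made up for illustration only.
X = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
x = np.array([0.0, 0.0])
loop_result = squared_distances_loop(X, x)
vec_result = squared_distances_vectorized(X, x)
```

Both versions return the same array; the vectorized one is shorter and pushes the inner loops into compiled code.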


Implementing machine learning systems

Neural net frameworks: PyTorch, TensorFlow, Theano, etc.

- automatic differentiation
- compiling computation graphs
- libraries of algorithms and network primitives
- support for graphics processing units (GPUs)

Why take this class if these frameworks do so much for you?

- So you know what to do if something goes wrong!
- Debugging learning algorithms requires sophisticated detective work, which requires understanding what goes on beneath the hood.
- That’s why we derive things by hand in this class!
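
One common piece of that detective work is checking a hand-derived gradient against a finite-difference estimate. The sketch below is an illustration under stated assumptions, not part of the lecture: it assumes a mean squared-error loss for a linear model, and all names (`loss`, `grad_by_hand`, `grad_numerical`) are hypothetical.

```python
import numpy as np

def loss(w, X, t):
    """Assumed example loss: half the mean squared error of a linear model."""
    residual = X @ w - t
    return 0.5 * np.mean(residual ** 2)

def grad_by_hand(w, X, t):
    """Gradient of the loss above, derived with pencil and paper."""
    return X.T @ (X @ w - t) / X.shape[0]

def grad_numerical(f, w, eps=1e-6):
    """Central finite-difference estimate of the gradient of f at w."""
    g = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        g[j] = (f(w + step) - f(w - step)) / (2 * eps)
    return g

# Random toy problem with a fixed seed so the check is repeatable.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
t = rng.normal(size=5)
w = rng.normal(size=3)

analytic = grad_by_hand(w, X, t)
numeric = grad_numerical(lambda v: loss(v, X, t), w)
max_abs_diff = float(np.max(np.abs(analytic - numeric)))
```

A large `max_abs_diff` would suggest a mistake in either the derivation or its translation into code.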


Preliminaries and kNN


Introduction

Today (and for the next 5 weeks) we’re focused on supervised learning.

This means we’re given a training set consisting of inputs and corresponding labels, e.g.

Task                      Inputs            Labels
object recognition        image             object category
image captioning          image             caption
document classification   text              document category
speech-to-text            audio waveform    text
...                       ...               ...


Input Vectors

What an image looks like to the computer:

[Image credit: Andrej Karpathy]


Input Vectors

Machine learning algorithms need to handle lots of types of data: images, text, audio waveforms, credit card transactions, etc.

Common strategy: represent the input as an input vector in $\mathbb{R}^d$

- Representation = mapping to another space that’s easy to manipulate
- Vectors are a great representation since we can do linear algebra!
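
As a minimal sketch of mapping an input to a vector in $\mathbb{R}^d$, the snippet below flattens a tiny grayscale image into a vector of raw pixel intensities. The 2x3 array and the rescaling to [0, 1] are assumptions made up for illustration.

```python
import numpy as np

# A made-up 2x3 grayscale "image"; real images are just larger arrays.
image = np.array([[0, 128, 255],
                  [64, 32, 16]], dtype=np.uint8)

# Flatten row by row and rescale pixel values to [0, 1].
x = image.reshape(-1).astype(np.float64) / 255.0
d = x.shape[0]  # dimension of the input vector
```

Here `d` is 6, one coordinate per pixel; a 32 x 32 color image would give d = 32 * 32 * 3 = 3072 in the same way.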


Input Vectors

Can use raw pixels:

Can do much better if you compute a vector of meaningful features.


Input Vectors

Mathematically, our training set consists of a collection of pairs of an input vector $\mathbf{x} \in \mathbb{R}^d$ and its corresponding target, or label, $t$

- Regression: $t$ is a real number (e.g. stock price)
- Classification: $t$ is an element of a discrete set $\{1, \ldots, C\}$
- These days, $t$ is often a highly structured object (e.g. image)

Denote the training set $\{(\mathbf{x}^{(1)}, t^{(1)}), \ldots, (\mathbf{x}^{(N)}, t^{(N)})\}$

- Note: these superscripts have nothing to do with exponentiation!


Nearest Neighbors

Suppose we’re given a novel input vector x we’d like to classify.

The idea: find the nearest input vector to $\mathbf{x}$ in the training set and copy its label.

Can formalize “nearest” in terms of Euclidean distance

$$\|\mathbf{x}^{(a)} - \mathbf{x}^{(b)}\|_2 = \sqrt{\sum_{j=1}^{d} \left(x_j^{(a)} - x_j^{(b)}\right)^2}$$

Algorithm:

1. Find example $(\mathbf{x}^*, t^*)$ (from the stored training set) closest to $\mathbf{x}$. That is:

$$\mathbf{x}^* = \operatorname*{argmin}_{\mathbf{x}^{(i)} \in \text{train. set}} \text{distance}(\mathbf{x}^{(i)}, \mathbf{x})$$

2. Output $y = t^*$

Note: we don’t need to compute the square root. Why?
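
The two-step algorithm can be sketched in a few lines of NumPy. The square root is skipped because it is monotonically increasing, so the argmin over squared distances picks the same training point. The function name and the toy data are hypothetical.

```python
import numpy as np

def nearest_neighbor_predict(X_train, t_train, x):
    """Copy the label of the training point closest to x."""
    # Squared Euclidean distances; the square root would not change the argmin.
    sq_dists = np.sum((X_train - x) ** 2, axis=1)
    nearest = np.argmin(sq_dists)
    return t_train[nearest]

# Toy training set, made up for illustration only.
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
t_train = np.array([0, 0, 1])

y_far = nearest_neighbor_predict(X_train, t_train, np.array([3.0, 3.5]))
y_near = nearest_neighbor_predict(X_train, t_train, np.array([0.2, 0.1]))
```

Ties in distance are broken by `np.argmin`, which returns the first minimum; that choice is arbitrary and not specified by the slides.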


Nearest Neighbors: Decision Boundaries

We can visualize the behavior in the classification setting using a Voronoi diagram.


Nearest Neighbors: Decision Boundaries

Decision boundary: the boundary between regions of input space assigned to different categories.


Nearest Neighbors: Decision Boundaries

Example: 2D decision boundary


k-Nearest Neighbors

[Pic by Olga Veksler]

Nearest neighbors sensitive to noise or mis-labeled data (“class noise”). Solution?

Smooth by having k nearest neighbors vote

Algorithm (kNN):

1. Find k examples $\{\mathbf{x}^{(i)}, t^{(i)}\}$ closest to the test instance $\mathbf{x}$

2. Classification output is majority class

$$y = \operatorname*{argmax}_{t^{(z)}} \sum_{r=1}^{k} \delta(t^{(z)}, t^{(r)})$$

$\delta(a, b) = 1$ if $a = b$, 0 otherwise.
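
A minimal sketch of the kNN vote, assuming squared Euclidean distance and a simple count of labels among the k closest training points. The function name and the 1-D toy data are hypothetical; one training label is deliberately wrong to show how voting smooths over class noise.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, t_train, x, k):
    """Majority vote among the k training points closest to x."""
    sq_dists = np.sum((X_train - x) ** 2, axis=1)
    neighbor_idx = np.argsort(sq_dists)[:k]          # indices of the k nearest
    votes = Counter(t_train[neighbor_idx].tolist())  # sum of delta terms per label
    return votes.most_common(1)[0][0]                # argmax over labels

# Toy data: the label at x = 1.0 plays the role of class noise.
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
t_train = np.array([0, 1, 0, 0, 1])
query = np.array([1.1])

y_1nn = knn_predict(X_train, t_train, query, k=1)
y_3nn = knn_predict(X_train, t_train, query, k=3)
```

With k = 1 the query copies the noisy label; with k = 3 the two clean neighbors outvote it. When votes tie, `most_common` returns the label it encountered first, which is an arbitrary tie-break not taken from the slides.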


K-Nearest neighbors

k=1

[Image credit: “The Elements of Statistical Learning”]


K-Nearest neighbors

k=15

[Image credit: “The Elements of Statistical Learning”]


k-Nearest Neighbors

Tradeoffs in choosing k?

Small k

- Good at capturing fine-grained patterns
- May overfit, i.e. be sensitive to random idiosyncrasies in the training data

Large k

- Makes stable predictions by averaging over lots of examples
- May underfit, i.e. fail to capture important regularities

Rule of thumb: $k < \sqrt{n}$, where $n$ is the number of training examples


K-Nearest neighbors

We would like our algorithm to generalize to data it hasn’t seen before.

We can measure the generalization error (error rate on new examples)using a test set.

[Image credit: “The Elements of Statistical Learning”]


Validation and Test Sets

k is an example of a hyperparameter, something we can’t fit as part of the learning algorithm itself

We can tune hyperparameters using a validation set:

The test set is used only at the very end, to measure the generalization performance of the final configuration.
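
The train/validation/test protocol can be sketched as follows: pick k by validation accuracy, then evaluate on the test set once. All data, the candidate list, and the helper names are made up for illustration; real splits would come from shuffling a single dataset.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, t_train, x, k):
    """Majority vote among the k training points closest to x."""
    sq_dists = np.sum((X_train - x) ** 2, axis=1)
    neighbor_idx = np.argsort(sq_dists)[:k]
    return Counter(t_train[neighbor_idx].tolist()).most_common(1)[0][0]

def accuracy(X_train, t_train, X_eval, t_eval, k):
    """Fraction of evaluation points that kNN labels correctly."""
    preds = [knn_predict(X_train, t_train, x, k) for x in X_eval]
    return float(np.mean(np.array(preds) == t_eval))

# Made-up 1-D data; the label at x = 2.0 is deliberately noisy.
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [6.0], [7.0], [8.0], [9.0]])
t_train = np.array([0, 0, 1, 0, 1, 1, 1, 1])
X_val = np.array([[1.9], [2.4], [6.5], [0.5]])
t_val = np.array([0, 0, 1, 0])
X_test = np.array([[2.6], [8.5]])
t_test = np.array([0, 1])

# Tune k on the validation set only.
candidate_ks = [1, 3, 5]
val_acc = {k: accuracy(X_train, t_train, X_val, t_val, k) for k in candidate_ks}
best_k = max(candidate_ks, key=lambda k: val_acc[k])  # first k with the top score

# The test set is touched exactly once, after k has been fixed.
test_acc = accuracy(X_train, t_train, X_test, t_test, best_k)
```

On this toy data k = 1 is hurt by the noisy label and k = 3 is selected. Breaking ties toward the smallest candidate is an arbitrary choice in this sketch.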


Pitfalls: The Curse of Dimensionality

Low-dimensional visualizations are misleading! In high dimensions, “most” points are far apart.

If we want the nearest neighbor to be closer than $\epsilon$, how many points do we need to guarantee it?

The volume of a single ball of radius $\epsilon$ is $O(\epsilon^d)$

The total volume of $[0, 1]^d$ is 1.

Therefore $O\left(\left(\frac{1}{\epsilon}\right)^d\right)$ balls are needed to cover the volume.
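
To get a feel for how quickly the covering count grows, the sketch below evaluates $(1/\epsilon)^d$ for a few dimensions. It ignores the constants hidden in the big-O, so the numbers are orders of magnitude only; the function name is hypothetical.

```python
def points_needed(eps, d):
    """Order-of-magnitude count (1/eps)^d from the covering argument (constants ignored)."""
    return (1.0 / eps) ** d

# Same target distance eps = 0.1, increasing dimension d.
for d in (1, 2, 10, 100):
    print(d, points_needed(0.1, d))
```

With $\epsilon = 0.1$, one dimension needs on the order of 10 points, while 100 dimensions needs on the order of $10^{100}$, far more than any dataset could contain.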

[Image credit: “The Elements of Statistical Learning”]


Pitfalls: The Curse of Dimensionality

In high dimensions, “most” points are approximately the same distance. (Homework question coming up...)
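
The claim that distances concentrate can be checked empirically with a short simulation. This is an informal illustration, not a proof: it assumes points drawn uniformly from the unit cube, and the function name, sample size, and seed are arbitrary choices.

```python
import numpy as np

def relative_spread(d, n_points=200, seed=0):
    """Std / mean of the distances from one random query to uniform points in [0, 1]^d."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n_points, d))
    query = rng.uniform(size=d)
    dists = np.sqrt(np.sum((X - query) ** 2, axis=1))
    return float(dists.std() / dists.mean())

spread_low = relative_spread(d=2)      # distances vary a lot relative to their mean
spread_high = relative_spread(d=1000)  # distances are nearly identical
```

A small ratio in high dimensions means the nearest and the farthest training points are almost equally far away, which makes "nearest" much less informative.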

Saving grace: some datasets (e.g. images) may have low intrinsic dimension, i.e. lie on or near a low-dimensional manifold. So nearest neighbors sometimes still works in high dimensions.


Pitfalls: Normalization

Nearest neighbors can be sensitive to the ranges of different features.

Often, the units are arbitrary:

Simple fix: normalize each dimension to be zero mean and unit variance. I.e., compute the mean $\mu_j$ and standard deviation $\sigma_j$, and take

$$\tilde{x}_j = \frac{x_j - \mu_j}{\sigma_j}$$

Caution: depending on the problem, the scale might be important!
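
A minimal sketch of the normalization fix in NumPy. One assumption goes beyond the slide: the mean and standard deviation are computed on the training set and then reused for any other data, so that all inputs are mapped by the same transformation. The function name and the toy features are hypothetical.

```python
import numpy as np

def standardize(X_train, X_other):
    """Scale each feature to zero mean and unit variance using training statistics."""
    mu = X_train.mean(axis=0)     # per-feature mean
    sigma = X_train.std(axis=0)   # per-feature standard deviation (assumed nonzero)
    return (X_train - mu) / sigma, (X_other - mu) / sigma

# Toy features on very different scales, made up for illustration only.
X_train = np.array([[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0]])
X_val = np.array([[2.0, 2000.0]])
X_train_n, X_val_n = standardize(X_train, X_val)
```

Without this step the second feature would dominate every Euclidean distance simply because its units produce larger numbers. A constant feature would give a zero standard deviation and needs separate handling.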


Pitfalls: Computational Cost

Number of computations at training time: 0

Number of computations at test time, per query (naïve algorithm)

- Calculate D-dimensional Euclidean distances with N data points: O(ND)
- Sort the distances: O(N log N)

This must be done for each query, which is very expensive by the standards of a learning algorithm!

Need to store the entire dataset in memory!

Tons of work has gone into algorithms and data structures for efficient nearest neighbors with high dimensions and/or large datasets.
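
As one small illustration of reducing the per-query cost (not one of the specialized data structures alluded to above), the full sort can be replaced by a partial selection. The sketch below uses `np.argpartition`, which typically runs in linear time, and then sorts only the k selected entries. The function name and toy data are hypothetical.

```python
import numpy as np

def k_nearest_indices(X_train, x, k):
    """Indices of the k training points closest to x, ordered nearest first."""
    sq_dists = np.sum((X_train - x) ** 2, axis=1)         # O(ND) distance pass
    candidates = np.argpartition(sq_dists, k - 1)[:k]     # k smallest, in no particular order
    return candidates[np.argsort(sq_dists[candidates])]   # sort only those k entries

# Toy 1-D training inputs, made up for illustration only.
X_train = np.array([[5.0], [1.0], [9.0], [2.0], [7.0]])
idx = k_nearest_indices(X_train, np.array([0.0]), k=2)
```

The O(ND) distance computation still dominates and the whole dataset still has to be stored, so this only removes the O(N log N) sorting term.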


Example: Digit Classification

Decent performance when lots of data

[Slide credit: D. Claus]

Example: Digit Classification

KNN can perform a lot better with a good similarity measure.

Example: shape contexts for object recognition. In order to achieve invariance to image transformations, they tried to warp one image to match the other image.

- Distance measure: average distance between corresponding points on warped images

Achieved 0.63% error on MNIST, compared with 3% for Euclidean KNN.

Competitive with conv nets at the time, but required careful engineering.

[Belongie, Malik, and Puzicha, 2002. Shape matching and object recognition using shape contexts.]


Example: 80 Million Tiny Images

80 Million Tiny Images was the first extremely large image dataset. It consisted of color images scaled down to 32 × 32.

With a large dataset, you can find much better semantic matches, and KNN can do some surprising things.

Note: this required a carefully chosen similarity metric.

[Torralba, Fergus, and Freeman, 2007. 80 Million Tiny Images.]


Example: 80 Million Tiny Images

[Torralba, Fergus, and Freeman, 2007. 80 Million Tiny Images.]

Conclusions

Simple algorithm that does all its work at test time; in a sense, no learning!

Can control the complexity by varying k

Suffers from the Curse of Dimensionality

Next time: decision trees, another approach to regression and classification


Questions?
