Perceptrons and Linear Classifiers William Cohen 2-4-2008


Page 1:

Perceptrons and Linear Classifiers

William Cohen

2-4-2008

Page 2:

Announcement: no office hours for William this Friday 2/8

Page 3:

Dave Touretzky’s Gallery of CSS Descramblers

Page 4:

Linear Classifiers

• Let's simplify life by assuming:
– Every instance is a vector of real numbers, x = (x1, …, xn). (Notation: boldface x is a vector.)
– There are only two classes, y = +1 and y = -1.

• A linear classifier is a vector w of the same dimension as x that is used to make this prediction:

ŷ = sign(w1 x1 + w2 x2 + … + wn xn) = sign(w · x)

where sign(z) = +1 if z ≥ 0, and -1 if z < 0.
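As a concrete sketch of the prediction rule (the weight vector and instances below are made-up numbers, not from the slides), it is just a dot product followed by sign:

```python
import numpy as np

def sign(z):
    """sign(z): +1 if z >= 0, else -1 (ties broken toward +1 here)."""
    return 1 if z >= 0 else -1

def predict(w, x):
    """Linear classifier: y_hat = sign(w . x)."""
    return sign(np.dot(w, x))

# A toy weight vector and two instances (hypothetical numbers).
w = np.array([2.0, -1.0])
print(predict(w, np.array([1.0, 0.5])))   # 2*1 - 1*0.5 = 1.5 > 0 -> +1
print(predict(w, np.array([0.0, 3.0])))   # 2*0 - 1*3 = -3.0 < 0 -> -1
```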

Page 5:

[Figure: the weight vector w and -w, with two instances projected onto w, giving x1 · w and x2 · w]

Visually, x · w is the distance you get if you "project x onto w".

The line perpendicular to w divides the vectors classified as positive from the vectors classified as negative.

In 3d the line becomes a plane; in 4d, the plane becomes a hyperplane; and so on.

Page 6:

[Figure: separating hyperplanes perpendicular to w; images from Wolfram MathWorld, Mediaboost.com, and Geocities.com/bharatvarsha1947]

ŷ = sign(w1 x1 + w2 x2 + … + wn xn) = sign(w · x)

Page 7:

Notice that the separating hyperplane goes through the origin… if we don't want this we can preprocess our examples by adding a constant feature:

x = (x1, x2, …, xn)  →  x = (1, x1, x2, …, xn)

so that the prediction

ŷ = sign(w1 x1 + w2 x2 + … + wn xn) = sign(w · x)

becomes

ŷ = sign(w0 · 1 + w1 x1 + w2 x2 + … + wn xn) = sign(w · x)

and w0 plays the role of a threshold.
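A minimal illustration of this preprocessing trick (the numbers are made up; w[0] is the bias weight w0 that shifts the hyperplane off the origin):

```python
import numpy as np

def add_bias(x):
    """Prepend a constant 1 so w[0] acts as a bias/threshold term."""
    return np.concatenate(([1.0], x))

x = np.array([2.0, 3.0])
x_prime = add_bias(x)           # array([1., 2., 3.])
w = np.array([-4.0, 1.0, 0.5])  # w[0] = -4 shifts the separator off the origin
print(np.sign(np.dot(w, x_prime)))  # -4 + 2 + 1.5 = -0.5 -> -1.0
```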

Page 8:

What have we given up?

Symbolic features can be re-encoded as binary indicator features, one per feature-value pair:

x = (x_outlook=sunny, x_outlook=overcast, x_outlook=rain, x_temp=hot, x_temp=mild, x_temp=cool, …, x_n)

with the class encoded as y = +1 or y = -1. For instance, example 7 with Outlook = overcast and Humidity = normal becomes

x7 = (x_outlook=sunny, x_outlook=overcast, x_outlook=rain, …) = (0, 1, 0, …)
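A sketch of this indicator (one-hot) encoding for two of the symbolic features; the value orderings are my assumption:

```python
def one_hot(value, values):
    """Indicator encoding of one symbolic feature: a 1 in the
    position of the observed value, 0 elsewhere."""
    return [1 if value == v else 0 for v in values]

OUTLOOK = ["sunny", "overcast", "rain"]
TEMP = ["hot", "mild", "cool"]

def encode(outlook, temp):
    """Concatenate the per-feature indicator vectors."""
    return one_hot(outlook, OUTLOOK) + one_hot(temp, TEMP)

print(encode("overcast", "mild"))  # [0, 1, 0, 0, 1, 0]
```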

Page 9:

What have we given up?

• Not much!
– Practically, it's a little harder to understand a particular example (or classifier)
– Practically, it's a little harder to debug
• You can still express the same information
• You can analyze things mathematically much more easily

Page 10:

Naïve Bayes as a Linear Classifier

Consider Naïve Bayes with two classes (+1, -1) and binary features (0,1).

Page 11:

Naïve Bayes as a Linear Classifier

Page 12:

Naïve Bayes as a Linear Classifier

Classify as +1 exactly when the "log odds" log [ P(y=+1|x) / P(y=-1|x) ] is positive.

Page 13:

Naïve Bayes as a Linear Classifier

Write pi = P(xi = 1 | y = +1) and qi = P(xi = 1 | y = -1).

Page 14:

Naïve Bayes as a Linear Classifier

Page 15:

Naïve Bayes as a Linear Classifier

• Summary:
– NB is a linear classifier
– Weights wi have a closed form
  • which is fairly simple, expressed in log-odds

ŷ = sign(w · x)

Proceedings of ECML-98, 10th European Conference on Machine Learning
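The closed form can be checked numerically. Below is a sketch (the per-feature probabilities p, q and the class priors are hypothetical values, not from the slides) verifying that the linear log-odds rule agrees with the direct Naïve Bayes computation on all inputs:

```python
import math

# Hypothetical per-feature probabilities for two binary features:
# p[i] = P(x_i = 1 | y = +1), q[i] = P(x_i = 1 | y = -1).
p = [0.8, 0.3]
q = [0.2, 0.6]
prior_pos, prior_neg = 0.5, 0.5

# Closed-form weights in log-odds; w0 collects the x-independent terms.
w = [math.log(p[i] * (1 - q[i]) / (q[i] * (1 - p[i]))) for i in range(len(p))]
w0 = math.log(prior_pos / prior_neg) + sum(
    math.log((1 - p[i]) / (1 - q[i])) for i in range(len(p)))

def nb_linear(x):
    """Sign of the NB log odds, computed as a linear function of x."""
    s = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s >= 0 else -1

def nb_direct(x):
    """The same decision via the naive Bayes joint likelihoods."""
    pos, neg = prior_pos, prior_neg
    for i, xi in enumerate(x):
        pos *= p[i] if xi else (1 - p[i])
        neg *= q[i] if xi else (1 - q[i])
    return 1 if pos >= neg else -1

# The two formulations agree on every binary input.
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    assert nb_linear(x) == nb_direct(x)
```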

Page 16:

An Even Older Linear Classifier

• 1957: The perceptron algorithm: Rosenblatt
– WP: "A handsome bachelor, he drove a classic MGA sports car and was often seen with his cat named Tobermory. He enjoyed mixing with undergraduates, and for several years taught an interdisciplinary undergraduate honors course entitled "Theory of Brain Mechanisms" that drew students equally from Cornell's Engineering and Liberal Arts colleges…this course was a melange of ideas .. experimental brain surgery on epileptic patients while conscious, experiments on .. the visual cortex of cats, ... analog and digital electronic circuits that modeled various details of neuronal behavior (i.e. the perceptron itself, as a machine)."
– Built on work of Hebb (1949); also developed by Widrow-Hoff (1960)

• 1960: Perceptron Mark 1 Computer – hardware implementation

Page 17:

Bell Labs TM 59-1142-11– Datamation 1961 – April 1 1984 Special Edition of CACM

Page 18:

An Even Older Linear Classifier

• 1957: The perceptron algorithm: Rosenblatt (see the WP quote above)
– Built on work of Hebb (1949); also developed by Widrow-Hoff (1960)
• 1960: Perceptron Mark 1 Computer – hardware implementation
• 1969: Minsky & Papert book shows perceptrons limited to linearly separable data, and Rosenblatt dies in boating accident
• 1970's: learning methods for two-layer neural networks
• Mid-late 1980's (Littlestone & Warmuth): mistake-bounded learning & analysis of Winnow method; early-mid 1990's, analyses of perceptron/Widrow-Hoff

Page 19:

Experimental evaluation of Perceptron vs. WH and Experts (Winnow-like methods) in SIGIR-1996 (Lewis, Schapire, Callan, Papka) and (Cohen & Singer).

Freund & Schapire, 1998-1999 showed the "kernel trick" and averaging/voting worked.

Page 20:

The voted perceptron

A sends B the instance xi.
B computes ŷi = sign(vk · xi) and sends the prediction ŷi back to A.
A sends B the true label yi.
If mistake: vk+1 = vk + yi xi
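The mistake-driven update above can be sketched as a training loop (the tiny dataset is made up, and for simplicity this is the plain last-hypothesis perceptron, not the voted variant):

```python
import numpy as np

def perceptron(examples, epochs=10):
    """Mistake-driven perceptron: on each mistake, v <- v + y*x."""
    dim = len(examples[0][0])
    v = np.zeros(dim)
    for _ in range(epochs):
        for x, y in examples:
            y_hat = 1 if np.dot(v, x) >= 0 else -1
            if y_hat != y:
                v = v + y * x
    return v

# A tiny linearly separable dataset (made-up points).
data = [(np.array([2.0, 1.0]), 1),
        (np.array([1.0, 2.0]), 1),
        (np.array([-1.0, -1.5]), -1),
        (np.array([-2.0, -0.5]), -1)]
v = perceptron(data)
# After convergence, v classifies every training point correctly.
assert all((1 if np.dot(v, x) >= 0 else -1) == y for x, y in data)
```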

Page 21:

[Figure: the target u and -u, with v1 = +x1]

(1) A target u. (2) The guess v1 after one positive example.

Page 22:

[Figure: the guess v2 built from v1 and ±x2 in the two cases below]

(3a) The guess v2 after the two positive examples: v2 = v1 + x2

(3b) The guess v2 after one positive and one negative example: v2 = v1 - x2

I want to show two things:

1. The v's get closer and closer to u: v · u increases with each mistake

2. The v's do not get too large: v · v grows slowly
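The two claims above each take one line to check, under the standard assumptions (my restatement of the setup) that every example has margin γ with respect to u, i.e. yi (u · xi) ≥ γ with ||u|| = 1, and norm at most R, i.e. ||xi|| ≤ R:

```latex
% Claim 1: each mistake moves v toward u by at least \gamma.
u \cdot v_{k+1} = u \cdot (v_k + y_i x_i)
               = u \cdot v_k + y_i (u \cdot x_i)
               \ge u \cdot v_k + \gamma

% Claim 2: each mistake grows \|v\|^2 by at most R^2,
% because a mistake means y_i (v_k \cdot x_i) \le 0.
\|v_{k+1}\|^2 = \|v_k\|^2 + 2 y_i (v_k \cdot x_i) + \|x_i\|^2
             \le \|v_k\|^2 + R^2

% Combining, after k mistakes (starting from v_1 = 0):
% k\gamma \le u \cdot v_{k+1} \le \|v_{k+1}\| \le R\sqrt{k},
% so the number of mistakes satisfies k \le R^2 / \gamma^2.
```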

Page 23:

[Figure: same two cases as the previous slide]

(3a) The guess v2 after the two positive examples: v2 = v1 + x2

(3b) The guess v2 after one positive and one negative example: v2 = v1 - x2

On each mistake, u · v grows by yi (u · xi) > γ, the margin of the example with respect to u.

Page 24:

[Figure: same two cases as the previous slide]

Page 25:

Page 26:

On-line to batch learning

1. Pick a vk at random according to mk/m, the fraction of examples it was used for.

2. Predict using the vk you just picked.

3. (Actually, use some sort of deterministic approximation to this).
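The deterministic approximations of step 3 can be sketched as follows (the hypotheses and survival counts below are made-up): either take an mk-weighted vote of the vk's predictions, or predict with the mk-weighted average of the vk's themselves:

```python
import numpy as np

def voted_predict(history, x):
    """history: list of (v_k, m_k) pairs, where m_k is the number of
    examples v_k survived. Weighted majority vote of the signs."""
    total = sum(m * (1 if np.dot(v, x) >= 0 else -1) for v, m in history)
    return 1 if total >= 0 else -1

def averaged_predict(history, x):
    """Cheaper variant: predict with the m_k-weighted average of the v_k."""
    v_avg = sum(m * v for v, m in history)
    return 1 if np.dot(v_avg, x) >= 0 else -1

# Hypothetical run: three hypotheses with their survival counts.
history = [(np.array([0.0, 1.0]), 2),
           (np.array([1.0, 1.0]), 5),
           (np.array([2.0, 0.5]), 3)]
x = np.array([1.0, -0.5])
print(voted_predict(history, x), averaged_predict(history, x))  # 1 1
```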

Page 27:

The voted perceptron

Page 28:

Some more comments

Perceptrons are like support vector machines (SVMs):

1. SVMs search for something that looks like u: i.e., a vector w where ||w|| is small and the margin for every example is large

2. You can use "the kernel trick" with perceptrons
• Replace x · w with (x · w + 1)^d
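A sketch of the kernel trick applied to the perceptron (dual form; the XOR-style dataset is made up): the weight vector is never built explicitly, only kernel evaluations (x · z + 1)^d are used:

```python
import numpy as np

def K(x, z, d=2):
    """Polynomial kernel: (x . z + 1)^d, a dot product in an
    expanded feature space that is never built explicitly."""
    return (np.dot(x, z) + 1) ** d

def kernel_perceptron(examples, epochs=10, d=2):
    """Dual perceptron: v is kept implicitly as sum_j a_j y_j phi(x_j),
    so v . phi(x) = sum_j a_j y_j K(x_j, x)."""
    a = [0] * len(examples)
    for _ in range(epochs):
        for i, (x, y) in enumerate(examples):
            s = sum(a[j] * yj * K(xj, x, d)
                    for j, (xj, yj) in enumerate(examples))
            if (1 if s >= 0 else -1) != y:
                a[i] += 1  # a mistake adds y_i x_i, i.e. bumps a_i
    return a

# XOR-like data: not linearly separable, but separable with d=2.
data = [(np.array([1.0, 1.0]), -1), (np.array([-1.0, -1.0]), -1),
        (np.array([1.0, -1.0]), 1), (np.array([-1.0, 1.0]), 1)]
a = kernel_perceptron(data)
preds = [1 if sum(a[j] * yj * K(xj, x)
                  for j, (xj, yj) in enumerate(data)) >= 0 else -1
         for x, _ in data]
assert preds == [y for _, y in data]
```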

Page 29:

Experimental Results

Page 30:

Task: classifying hand-written digits for the post office

Page 31:

More Experimental Results (Linear kernel, one pass over the data)