
MORE CLASSIFIERS

Upload: diana-lee

Post on 13-Dec-2015

TRANSCRIPT

Page 1: MORE CLASSIFIERS. AGENDA: Key concepts for all classifiers, Precision vs recall, Biased sample sets, Linear classifiers, Intro to neural networks

MORE CLASSIFIERS

Page 2:

AGENDA

Key concepts for all classifiers
Precision vs recall
Biased sample sets
Linear classifiers
Intro to neural networks

Page 3:

RECAP: DECISION BOUNDARIES

With continuous attributes, a decision boundary is the surface in example space that splits positive from negative examples

[Decision tree diagram: tests x1 >= 20, x2 >= 10, and x2 >= 15 with T/F branches, partitioning the x1–x2 plane into positive and negative regions.]

Page 4:


BEYOND ERROR RATES

Page 5:

BEYOND ERROR RATE

Predicting security risk: predicting “low risk” for a terrorist is far worse than predicting “high risk” for an innocent bystander (but maybe not for 5 million of them)

Searching for images: returning irrelevant images is worse than omitting relevant ones


Page 6:

BIASED SAMPLE SETS

Often there are orders of magnitude more negative examples than positive

E.g., all images of Kris on Facebook: if I classify all images as “not Kris”, I’ll have >99.99% accuracy

Examples of Kris should count much more than non-Kris!

Page 7:

FALSE POSITIVES

[Plot in x1–x2 space: the true decision boundary and the learned decision boundary.]

Page 8:

FALSE POSITIVES

[Plot in x1–x2 space: true and learned decision boundaries. A new query falling between them is an example incorrectly predicted to be positive.]

Page 9:

FALSE NEGATIVES

[Plot in x1–x2 space: true and learned decision boundaries. A new query falling between them is an example incorrectly predicted to be negative.]

Page 10:

PRECISION VS. RECALL

Precision: # of relevant documents retrieved / # of total documents retrieved

Recall: # of relevant documents retrieved / # of total relevant documents

Both are numbers between 0 and 1


Page 11:

PRECISION VS. RECALL

Precision: # of true positives / (# true positives + # false positives)

Recall: # of true positives / (# true positives + # false negatives)

A precise classifier is selective; a classifier with high recall is inclusive

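The count-based definitions above can be sketched in a few lines of Python (a minimal illustration; the example label lists are made up):

```python
def precision_recall(predicted, actual):
    """Compute precision and recall from parallel lists of 0/1 labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# A selective classifier: it predicts few positives, all of them correct
p, r = precision_recall([1, 0, 0, 0], [1, 1, 1, 0])
# precision = 1.0, recall = 1/3
```

Note how being selective (predicting positive rarely) boosts precision at the cost of recall, matching the selective/inclusive contrast above.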

Page 12:

REDUCING FALSE POSITIVE RATE

[Plot in x1–x2 space: the learned decision boundary shifted relative to the true boundary so as to produce fewer false positives.]

Page 13:

REDUCING FALSE NEGATIVE RATE

[Plot in x1–x2 space: the learned decision boundary shifted relative to the true boundary so as to produce fewer false negatives.]

Page 14:

PRECISION-RECALL CURVES

Measure precision vs. recall as the decision boundary is tuned

[Plot: precision (vertical axis) against recall (horizontal axis); a perfect classifier sits at precision = recall = 1, with the actual performance curve below it.]

Page 15:

PRECISION-RECALL CURVES

Measure precision vs. recall as the decision boundary is tuned

[Plot: precision–recall curve with operating points labeled “penalize false negatives”, “penalize false positives”, and “equal weight”.]

Page 16:

PRECISION-RECALL CURVES

Measure precision vs. recall as the decision boundary is tuned

Page 17:

PRECISION-RECALL CURVES

Measure precision vs. recall as the decision boundary is tuned

[Plot: two precision–recall curves; the higher curve indicates better learning performance.]

Page 18:

OPTION 1: CLASSIFICATION THRESHOLDS

Many learning algorithms (e.g., probabilistic models, linear models) give a real-valued output v(x) that needs thresholding for classification:

v(x) > t => positive label given to x
v(x) < t => negative label given to x

We may want to tune the threshold t to get fewer false positives or false negatives

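A minimal sketch of threshold tuning, assuming a made-up scoring function v(x):

```python
def classify(v, xs, t):
    """Label x positive (1) when v(x) > t, else negative (0)."""
    return [1 if v(x) > t else 0 for x in xs]

score = lambda x: x - 5.0              # hypothetical real-valued output v(x)
xs = [2.0, 4.9, 5.1, 8.0]
low = classify(score, xs, 0.0)         # [0, 0, 1, 1]
high = classify(score, xs, 2.0)        # raising t: fewer positives predicted
```

Sweeping t from low to high trades false positives for false negatives, which is exactly how a precision–recall curve is traced out.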

Page 19:

OPTION 2: WEIGHTED DATASETS

Weighted datasets: attach a weight w to each example to indicate how important it is

Instead of counting the “# of errors”, count the “sum of weights of errors”

Or construct a resampled dataset D’ where each example is duplicated proportionally to its w

As the relative weight of positive vs. negative examples is tuned from 0 to 1, the precision-recall curve is traced out

Page 20:

LINEAR CLASSIFIERS: MOTIVATION

A decision tree produces axis-aligned decision boundaries

Can we accurately classify data like this?

[Plot: example data in the x1–x2 plane that no axis-aligned boundary separates well.]

Page 21:

PLANE GEOMETRY

Any line in 2D can be expressed as the set of solutions (x,y) to the equation ax+by+c=0 (an implicit surface)

ax+by+c > 0 is one side of the line
ax+by+c < 0 is the other side
ax+by+c = 0 is the line itself

[Diagram: a line in the x–y plane with coefficients a, b indicated.]

Page 22:

PLANE GEOMETRY

In 3D, a plane can be expressed as the set of solutions (x,y,z) to the equation ax+by+cz+d=0

ax+by+cz+d > 0 is one side of the plane
ax+by+cz+d < 0 is the other side
ax+by+cz+d = 0 is the plane itself

[Diagram: a plane in x–y–z space with coefficients a, b, c indicated.]

Page 23:

LINEAR CLASSIFIER

In d dimensions,

c0 + c1*x1 + … + cd*xd = 0

is a hyperplane.

Idea:
Use c0 + c1*x1 + … + cd*xd > 0 to denote positive classifications
Use c0 + c1*x1 + … + cd*xd < 0 to denote negative classifications
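A hypothetical hyperplane classifier along these lines (the coefficients are made up for illustration):

```python
def linear_classify(c, x):
    """Sign of c0 + c1*x1 + ... + cd*xd decides the label.
    c has d+1 coefficients (c[0] is the constant term); x has d features."""
    v = c[0] + sum(ci * xi for ci, xi in zip(c[1:], x))
    return 1 if v > 0 else 0

# Hypothetical hyperplane x1 + x2 - 10 = 0 in 2D (c0 = -10, c1 = 1, c2 = 1)
pos = linear_classify([-10, 1, 1], [8, 5])   # 8 + 5 - 10 =  3 > 0 -> 1
neg = linear_classify([-10, 1, 1], [2, 3])   # 2 + 3 - 10 = -5 < 0 -> 0
```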

Page 24:


PERCEPTRON

[Perceptron diagram: inputs x1, …, xn with weights wi feed a summation Σ and a threshold function g, producing output y.]

y = f(x,w) = g(Σi=1,…,n wi xi)

[Plot: positive (+) and negative (−) examples in the x1–x2 plane separated by the line w1 x1 + w2 x2 = 0; g(u) is plotted as a step function of u.]

Page 25:

A SINGLE PERCEPTRON CAN LEARN

A disjunction of boolean literals, e.g., x1 ∨ x2 ∨ x3

The majority function

Page 26:

A SINGLE PERCEPTRON CAN LEARN

A disjunction of boolean literals, e.g., x1 ∨ x2 ∨ x3

The majority function

XOR?

Page 27:


PERCEPTRON LEARNING RULE

θ ← θ + x(i) (y(i) − g(θT x(i)))    (g outputs either 0 or 1; y is either 0 or 1)

If the output is correct, the weights are unchanged
If g is 0 but y is 1, then the weight on attribute i is increased
If g is 1 but y is 0, then the weight on attribute i is decreased

Converges if the data is linearly separable, but oscillates otherwise
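The update rule above can be sketched as a training loop (a minimal illustration; the step activation and the OR dataset are assumptions for the demo):

```python
def g(u):
    """Step activation: outputs 1 when u > 0, else 0."""
    return 1 if u > 0 else 0

def train_perceptron(examples, d, epochs=100):
    """Perceptron rule: theta <- theta + x * (y - g(theta . x)).
    Examples are (x, y) pairs with x including a constant 1 feature."""
    theta = [0.0] * d
    for _ in range(epochs):
        for x, y in examples:
            err = y - g(sum(t * xi for t, xi in zip(theta, x)))
            if err:
                theta = [t + xi * err for t, xi in zip(theta, x)]
    return theta

# OR function (linearly separable): x = (1, x1, x2)
data = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1)]
theta = train_perceptron(data, 3)
```

On this separable dataset the loop converges after a few epochs; on XOR it would oscillate forever, as the slide notes.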

Page 28:

PERCEPTRON

y = f(x,w) = g(Σi=1,…,n wi xi)

[Plot: positive (+) and negative (−) examples in the x1–x2 plane with an unlabeled query point “?”; g(u) is plotted as a step function of u.]

Page 29:

UNIT (NEURON)

[Neuron diagram: inputs x1, …, xn with weights wi feed a summation and activation g, producing output y.]

y = g(Σi=1,…,n wi xi)

g(u) = 1/[1 + exp(-au)]
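The sigmoid unit above in code, with a made-up weight vector:

```python
import math

def sigmoid(u, a=1.0):
    """Soft threshold g(u) = 1 / (1 + exp(-a*u)); a controls the steepness."""
    return 1.0 / (1.0 + math.exp(-a * u))

def unit_output(w, x):
    """y = g(sum_i w_i x_i) for a single neuron."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

out = unit_output([1.0, -1.0], [2.0, 2.0])   # u = 0, so y = 0.5
```

Unlike the step function, g is differentiable everywhere, which is what makes gradient-based training of networks possible later in the lecture.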

Page 30:

NEURAL NETWORK

Network of interconnected neurons

[Diagram: two connected units, each computing y = g(Σi wi xi) from its inputs.]

Acyclic (feed-forward) vs. recurrent networks

Page 31:

TWO-LAYER FEED-FORWARD NEURAL NETWORK

[Diagram: inputs connect to a hidden layer with weights w1j; the hidden layer connects to the output layer with weights w2k.]

Page 32:

NETWORKS WITH HIDDEN LAYERS

Can represent XOR and other nonlinear functions

Common neuron types: soft perceptron (sigmoid), radial basis functions, linear, …

As the number of hidden units increases, so does the network’s capacity to learn functions with more nonlinear features

How to train hidden layers?

Page 33:

BACKPROPAGATION (PRINCIPLE)

Treat the problem as one of minimizing the error between the example label and the network output, given the example and network weights as input:

Error(xi, yi, w) = (yi − f(xi, w))^2

Sum this error term over all examples:

E(w) = Σi Error(xi, yi, w) = Σi (yi − f(xi, w))^2

Minimize E using an optimization algorithm; stochastic gradient descent is typically used

Page 34:

The gradient direction is orthogonal to the level sets (contours) of E and points in the direction of steepest increase

Pages 35-42:

Gradient descent: iteratively move in the direction of the negative gradient, w ← w − α ∇E(w)

[Plots: successive frames showing the iterates descending the contours of E toward a minimum.]

Page 43:

STOCHASTIC GRADIENT DESCENT

For each example (xi,yi), take a gradient descent step to reduce the error for (xi,yi) only.


Page 44:

STOCHASTIC GRADIENT DESCENT

Objective function values (measured over all examples) settle into a local minimum over time

The step size must be reduced over time, e.g., as O(1/t)

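A minimal SGD sketch for a single sigmoid unit with an O(1/t) step size (the squared-error gradient formula and the toy dataset are assumptions for illustration):

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def sgd(examples, d, epochs=200):
    """SGD on E(w) = sum_i (y_i - g(w . x_i))^2 for one sigmoid unit,
    taking a step per example with step size decaying as O(1/t)."""
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        for x, y in examples:
            t += 1
            alpha = 1.0 / t                                 # O(1/t) step size
            out = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            grad = -2.0 * (y - out) * out * (1.0 - out)     # dError/d(w.x)
            w = [wi - alpha * grad * xi for wi, xi in zip(w, x)]
    return w

# Toy OR-like dataset; x includes a constant 1 feature
data = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1)]
w = sgd(data, 3)
```

Each step reduces the error for one example only, so the total objective wanders but trends down; the decaying step size is what lets it settle.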

Page 45:

NEURAL NETWORKS: PROS AND CONS

Pros:
Bioinspiration is nifty
Can represent a wide variety of decision boundaries
Complexity is easily tunable (number of hidden nodes, topology)
Easily extendable to regression tasks

Cons:
Haven’t gotten close to unlocking the power of the human (or cat) brain
Complex boundaries need lots of data
Slow training
Mostly lukewarm feelings in mainstream ML (although the “deep learning” variant is en vogue now)

Page 46:

NEXT CLASS

Another guest lecture