Linear Classifiers: Expressiveness

Machine Learning, Spring 2018

The slides are mainly from Vivek Srikumar

Lecture outline

• Linear classifiers: Introduction

• What functions do linear classifiers express?

• Least Squares Method for Regression

Where are we?

• Linear classifiers: Introduction

• What functions do linear classifiers express?
  – Conjunctions and disjunctions
  – m-of-n functions
  – Not all functions are linearly separable
  – Feature space transformations
  – Exercises

• Least Squares Method for Regression

Which Boolean functions can linear classifiers represent?

• Linear classifiers are an expressive hypothesis class

• Many Boolean functions are linearly separable
  – Not all, though
  – Recall: Decision trees can represent any Boolean function

Conjunctions and disjunctions

y = x1 ∧ x2 ∧ x3 is equivalent to “y = 1 if x1 + x2 + x3 ≥ 3”

x1  x2  x3 | y | x1 + x2 + x3 − 3 | sign
 0   0   0 | 0 |        −3        |  0
 0   0   1 | 0 |        −2        |  0
 0   1   0 | 0 |        −2        |  0
 0   1   1 | 0 |        −1        |  0
 1   0   0 | 0 |        −2        |  0
 1   0   1 | 0 |        −1        |  0
 1   1   0 | 0 |        −1        |  0
 1   1   1 | 1 |         0        |  1
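A quick way to convince yourself of the equivalence is to enumerate all eight rows in code. A minimal check in Python (the sign convention here, predicting 1 when the sum reaches the threshold, matches the table above):

    from itertools import product

    # Verify: x1 AND x2 AND x3 agrees with the linear threshold rule
    # "predict 1 iff x1 + x2 + x3 - 3 >= 0" on all 8 Boolean inputs.
    for x1, x2, x3 in product([0, 1], repeat=3):
        y = x1 and x2 and x3                       # the Boolean conjunction
        y_hat = 1 if x1 + x2 + x3 - 3 >= 0 else 0  # the linear threshold unit
        assert y == y_hat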

How about negations?

Negations are okay too. In general, use 1 − x in the linear threshold unit if x is negated.

y = x1 ∧ x2 ∧ ¬x3

is equivalent to

y’ = 1 if x1 + x2 - x3 ≥ 2
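The same kind of enumeration confirms the 1 − x trick; a minimal sketch:

    from itertools import product

    # Verify the negation trick: substituting (1 - x3) for the negated
    # variable turns x1 AND x2 AND (NOT x3) into "x1 + x2 - x3 >= 2".
    for x1, x2, x3 in product([0, 1], repeat=3):
        y = x1 and x2 and (1 - x3)             # conjunction with x3 negated
        y_hat = 1 if x1 + x2 - x3 >= 2 else 0  # threshold after the 1 - x trick
        assert y == y_hat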

Exercise: What would the linear threshold function be if the conjunctions here were replaced with disjunctions?

Questions?

m-of-n functions

m-of-n rules:
• There is a fixed set of n variables
• y = true if, and only if, at least m of them are true
• All other variables are ignored

Suppose there are five Boolean variables: x1, x2, x3, x4, x5

What is a linear threshold unit that is equivalent to the classification rule “at least 2 of {x1, x2, x3}”?

x1 + x2 + x3 ≥ 2
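Note that x4 and x5 simply get weight 0. A small check over all 32 assignments of the five variables (a sketch, not from the slides):

    from itertools import product

    # "At least 2 of {x1, x2, x3}": the ignored variables x4 and x5 have
    # weight 0, so they can never change the prediction.
    for x1, x2, x3, x4, x5 in product([0, 1], repeat=5):
        y = (x1 + x2 + x3) >= 2                          # the m-of-n rule (m=2, n=3)
        y_hat = (1*x1 + 1*x2 + 1*x3 + 0*x4 + 0*x5) >= 2  # the threshold unit
        assert y == y_hat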

Questions?

Parity is not linearly separable

[Figure: positive (+) and negative (−) examples plotted in the (x1, x2) plane, with the two classes interleaved]

Can’t draw a line to separate the two classes

Questions?

Not all functions are linearly separable

• XOR is not linear (see the brute-force check after this list)
  – y = (x1 ∨ x2) ∧ (¬x1 ∨ ¬x2)
  – Parity cannot be represented as a linear classifier
    • f(x) = 1 if the number of 1’s is even
• Many non-trivial Boolean functions
  – y = (x1 ∧ x2) ∨ (x3 ∧ ¬x4)
  – The function is not linear in the four variables
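The XOR claim can be sanity-checked numerically. The brute-force search below over a coarse grid of weights is a demonstration, not a proof, and the grid is an illustrative choice, not from the slides:

    from itertools import product

    # XOR truth table as ((x1, x2), label) pairs
    xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

    def separates(w1, w2, b):
        """Does sgn(b + w1*x1 + w2*x2) label every XOR example correctly?"""
        return all((w1 * x1 + w2 * x2 + b >= 0) == bool(y)
                   for (x1, x2), y in xor_data)

    grid = [i / 2 for i in range(-10, 11)]  # weights and bias in [-5, 5]
    print(any(separates(w1, w2, b)
              for w1, w2, b in product(grid, repeat=3)))  # False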

Even these functions can be made linear

The trick: Change the representation

These points are not separable in 1-dimension by a line

What is a one-dimensional line, by the way?

The blown up feature space

The trick: Use feature conjunctions

Transform points: Represent each point x in 2 dimensions by (x, x²)

Now the data is linearly separable in this space!

For example, the point x = −2 maps to (−2, 4).
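Here is a minimal stand-in for the plotted data (the actual points appear only in the figure, so these values are illustrative): negatives near the origin, positives farther out, not separable by any single threshold on x, but separable after the map x ↦ (x, x²):

    # 1-D labeled points as (x, label): positives flank the negatives,
    # so no single threshold on x can separate the classes.
    points = [(-2, 1), (-1, 0), (1, 0), (2, 1)]

    # After mapping x -> (x, x**2), the horizontal line x2 = 2 separates
    # them in the new 2-D space (e.g., x = -2 maps to (-2, 4)).
    transformed = [((x, x ** 2), y) for x, y in points]
    for (x, x_sq), y in transformed:
        assert (x_sq >= 2) == bool(y)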

Exercise

How would you use the feature transformation idea to make XOR in two dimensions linearly separable in a new space?

To answer this question, you need to think about a function that maps examples from two-dimensional space to a higher-dimensional space.
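If you want to check a candidate answer, here is one well-known mapping (a spoiler, so try the exercise first): add the conjunction x1x2 as a third feature.

    # Lift (x1, x2) to (x1, x2, x1*x2). In the lifted space the threshold
    # unit x1 + x2 - 2*x1*x2 >= 1 computes XOR exactly.
    xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    for (x1, x2), y in xor_data:
        z1, z2, z3 = x1, x2, x1 * x2   # the lifted representation
        assert (z1 + z2 - 2 * z3 >= 1) == bool(y)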

Almost linearly separable data

[Figure: positive (+) and negative (−) examples in the (x1, x2) plane, almost separated by the line b + w1 x1 + w2 x2 = 0, with a few noisy points on the wrong side]

The classifier: sgn(b + w1 x1 + w2 x2)

Training data is almost separable, except for some noise

How much noise do we allow for?

Linear classifiers: An expressive hypothesis class

• Many functions are linear

• Often a good guess for a hypothesis space

• Some functions are not linear
  – The XOR function
  – Non-trivial Boolean functions

• But there are ways of making them linear in a higher dimensional feature space

Why is the bias term needed?

[Figure: positive (+) and negative (−) examples in the (x1, x2) plane, separated by the line b + w1 x1 + w2 x2 = 0, which does not pass through the origin]

If b is zero, then we are restricting the learner only to hyperplanes that go through the origin (w1 x1 + w2 x2 = 0)

May not be expressive enough
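A toy illustration (the data is made up for this note, not from the slides): four points on the positive x1-axis, positives at x1 = 1, 2 and negatives at x1 = 3, 4. With b = 0 the prediction sgn(w1 x1) is the same for every point, so no hyperplane through the origin separates them, while a nonzero bias fixes it:

    from itertools import product

    data = [((1, 0), 1), ((2, 0), 1), ((3, 0), 0), ((4, 0), 0)]

    def separates(w1, w2, b):
        return all((w1 * x1 + w2 * x2 + b >= 0) == bool(y)
                   for (x1, x2), y in data)

    grid = [i / 2 for i in range(-10, 11)]
    print(any(separates(w1, w2, 0) for w1, w2 in product(grid, repeat=2)))
    # False: no line through the origin works
    print(separates(-1.0, 0.0, 2.5))
    # True: with a bias, -x1 + 2.5 >= 0 separates the classes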
