Computational Learning Theory
• PAC
• IID
• VC Dimension
• SVM
Kunstmatige Intelligentie / RuG
Marius Bulacu
The Problem
• Why does learning work?
• How do we know that the learned hypothesis h is close to the target function f if we do not know what f is?
answer provided by
computational learning theory
The Answer
• Any hypothesis h that is consistent with a sufficiently large number of training examples is unlikely to be seriously wrong.
Therefore it must be:
Probably Approximately Correct
PAC
The Stationarity Assumption
• The training and test sets are drawn randomly from the same population of examples using the same probability distribution.
Therefore training and test data are
Independently and Identically Distributed
IID
“the future is like the past”
How many examples are needed?
• m: number of examples (the sample complexity)
• ε: probability that h and f disagree on an example
• δ: probability that a wrong hypothesis consistent with all examples exists
• |H|: size of the hypothesis space

m ≥ (1/ε) (ln(1/δ) + ln |H|)
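The bound m ≥ (1/ε)(ln(1/δ) + ln |H|) can be evaluated directly. A minimal sketch (the function name and the example numbers are illustrative, not from the slides):

```python
import math

def sample_complexity(epsilon, delta, h_size):
    """Smallest m satisfying the PAC bound m >= (1/eps)(ln(1/delta) + ln|H|)."""
    return math.ceil((1.0 / epsilon) * (math.log(1.0 / delta) + math.log(h_size)))

# Boolean functions on 10 attributes: |H| = 2^(2^10)
m = sample_complexity(epsilon=0.1, delta=0.05, h_size=2 ** (2 ** 10))
# m grows only logarithmically in |H| and 1/delta, but linearly in 1/epsilon
```

Note the logarithms: doubling |H| adds only (ln 2)/ε extra examples to the requirement.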
Formal Derivation
• H: the set of all possible hypotheses (containing the target function f)
• Hbad: the set of “wrong” hypotheses, those whose error exceeds ε:
  P(x : h(x) ≠ f(x)) > ε  for every h ∈ Hbad
• A wrong hypothesis agrees with one random example with probability at most 1 − ε, so it is consistent with m independent examples with probability at most (1 − ε)^m.
• The probability that Hbad contains a hypothesis consistent with all m examples is then
  P(∃ h ∈ Hbad consistent with m examples) ≤ |Hbad| (1 − ε)^m ≤ |H| (1 − ε)^m
• Requiring this probability to be at most δ and using 1 − ε ≤ e^(−ε):
  |H| (1 − ε)^m ≤ δ  ⟹  m ≥ (1/ε) (ln(1/δ) + ln |H|)
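The derivation can be sanity-checked empirically. The sketch below (the tiny 3-bit domain and all names are illustrative) takes H to be all 256 Boolean functions on 3 inputs, draws m examples according to the bound, and estimates how often a seriously wrong hypothesis stays consistent with every example:

```python
import math
import random

random.seed(0)
N_POINTS = 8                       # domain: the 8 points of {0,1}^3
H_SIZE = 2 ** N_POINTS             # each int encodes one Boolean function as a truth table
EPS, DELTA = 0.25, 0.05

def value(h, x):                   # output of hypothesis h on domain point x
    return (h >> x) & 1

def error(h, f):                   # fraction of the uniform domain where h and f disagree
    return sum(value(h, x) != value(f, x) for x in range(N_POINTS)) / N_POINTS

m = math.ceil((1 / EPS) * (math.log(1 / DELTA) + math.log(H_SIZE)))

TRIALS = 300
bad_survives = 0
for _ in range(TRIALS):
    f = random.randrange(H_SIZE)                          # random target function
    examples = [random.randrange(N_POINTS) for _ in range(m)]
    # does any hypothesis with error > EPS agree with f on every drawn example?
    if any(error(h, f) > EPS and all(value(h, x) == value(f, x) for x in examples)
           for h in range(H_SIZE)):
        bad_survives += 1

rate = bad_survives / TRIALS       # empirically stays (well) below DELTA
```

The bound is loose: the observed rate is typically far smaller than δ, because the union bound over Hbad is pessimistic.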
What if the hypothesis space is infinite?
• We can’t use our result for finite H; we need some other measure of the complexity of H:
  – the Vapnik-Chervonenkis (VC) dimension: the size of the largest set of points that hypotheses in H can label in all possible ways (“shatter”)
SVM (1): Kernels
• Complicated separation boundary in the original feature space (f1, f2)
• Simple separation boundary, a hyperplane, in the mapped space (f1, f2, f3)
(figure: the same data plotted in (f1, f2) and, after the mapping, in (f1, f2, f3))
• Kernels: polynomial, radial basis, sigmoid
• A kernel performs an implicit mapping to a higher-dimensional space where linear separation is possible.
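The “implicit mapping” can be written out for the degree-2 polynomial kernel; this sketch shows that the kernel computes the same inner product the explicit 3-D feature map φ would, without ever constructing φ(x):

```python
import math

def poly_kernel(x, y):
    """Degree-2 polynomial kernel on 2-D inputs: K(x, y) = (x . y)^2."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def phi(x):
    """Explicit feature map to 3-D: phi(x) = (x1^2, sqrt(2) x1 x2, x2^2)."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

x, y = (1.0, 2.0), (3.0, 0.5)
implicit = poly_kernel(x, y)                            # stays in the 2-D space
explicit = sum(a * b for a, b in zip(phi(x), phi(y)))   # inner product in 3-D
# implicit and explicit agree up to floating-point rounding
```

This is why the kernel trick is cheap: the algorithm only ever needs inner products, so the high-dimensional space never has to be materialized.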
SVM (2): Max Margin
• From all the possible separating hyperplanes, select the “best” one: the one that gives the max margin.
• The training points closest to that hyperplane are the support vectors.
• The solution is found by quadratic optimization; this is the “learning” step.
• Max margin gives good generalization.
(figure: points in (f1, f2) with the max-margin separating hyperplane and its support vectors)
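In the simplest case, one support vector per class, the quadratic optimization has a closed-form answer: the max-margin hyperplane is the perpendicular bisector of the segment joining the two support vectors. A sketch under that assumption (the points and the helper name are illustrative):

```python
def max_margin_two_points(x_pos, x_neg):
    """Hard-margin hyperplane w.x + b = 0 when x_pos and x_neg are the only support vectors.
    Take w = 2 (x_pos - x_neg) / ||x_pos - x_neg||^2 and pick b so that
    w.x_pos + b = +1 and w.x_neg + b = -1 (the canonical margin constraints)."""
    d = [p - n for p, n in zip(x_pos, x_neg)]
    d2 = sum(c * c for c in d)
    w = [2 * c / d2 for c in d]
    b = -sum(wi * (p + n) for wi, p, n in zip(w, x_pos, x_neg)) / 2
    return w, b

w, b = max_margin_two_points((2.0, 2.0), (0.0, 0.0))
pos_val = w[0] * 2.0 + w[1] * 2.0 + b          # +1 at the positive support vector
neg_val = b                                    # -1 at the negative support vector
margin = 1 / sum(wi * wi for wi in w) ** 0.5   # geometric margin 1/||w|| = half the gap
```

With more points, the general quadratic program picks out which points are support vectors; here they are given, which is what makes the closed form possible.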