MACHINE LEARNING
Probably Approximately Correct (PAC) Learning

Alessandra Giordani
Department of Information Engineering and Computer Science
University of Trento
Email: [email protected]

Objectives: defining a well-defined statistical framework

What can we learn, and how can we decide if our
learning is effective?
Efficient learning with many parameters
Trade-off between generalization and training-set error
How to represent real-world objects

PAC Learning Definition (1)

Let c be the function (i.e. a concept) we want to learn.
Let h be the learned concept and x an instance (e.g. a person).

error(h) = Prob[c(x) ≠ h(x)]

It would be useful if we could guarantee:

Pr(error(h) > ε) < δ

Given a target error ε, the probability of making a larger
error is less than δ.
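
To make the definition concrete, here is a minimal sketch (mine, not from the slides) that estimates error(h) by Monte Carlo sampling; the concept c, the hypothesis h, and the uniform sampling distribution are all illustrative assumptions.

```python
import random

def true_concept(x):
    # Hypothetical target concept c: "medium-built" as a height interval.
    return 160 <= x <= 180

def hypothesis(x):
    # Hypothetical learned concept h, slightly tighter than c.
    return 162 <= x <= 178

def estimate_error(c, h, n_samples=100_000):
    """Monte Carlo estimate of error(h) = Prob[c(x) != h(x)],
    assuming x is drawn uniformly from [140, 200]."""
    mistakes = 0
    for _ in range(n_samples):
        x = random.uniform(140, 200)
        if c(x) != h(x):
            mistakes += 1
    return mistakes / n_samples

# c and h disagree on [160, 162) and (178, 180]: error ≈ 4/60 ≈ 0.067
print(estimate_error(true_concept, hypothesis))
```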

PAC Learning Definition (2)

This methodology is called Probably Approximately
Correct (PAC) Learning.
The smaller ε and δ are, the better the learning is.

Problem:
Given ε and δ, determine the size m of the training set.
Such a size may be independent of the learning algorithm.
Let us do it for a simple learning problem.

A simple learning problem

Learning the concept of medium-built people from
examples:
The interesting features are Height and Weight.
The training set of examples has cardinality m
(m people for whom we know whether they are medium-built,
together with their height and weight).
Find m to learn this concept well.
The adjective “well” can be made precise in terms of
error probability.

Graphical Representation of the target learning problem

[Figure: the target concept c and the hypothesis h drawn as axis-parallel rectangles in the Height-Weight plane, bounded by Height-Min/Height-Max and Weight-Min/Weight-Max; h lies inside c.]

Learning Algorithm and Learning Function Class

1. If no positive examples of the concept are available
   ⇒ the learned concept is NULL
2. Else the concept is the smallest rectangle (parallel
   to the axes) containing all positive examples
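
A minimal sketch of this learner (mine, not the slides'; the function names and data layout are assumptions):

```python
def learn_rectangle(examples):
    """examples: list of ((height, weight), label) pairs.
    Returns None if there are no positive examples (rule 1), else the
    tightest axis-parallel rectangle (h_min, h_max, w_min, w_max)
    containing all positives (rule 2)."""
    positives = [x for x, label in examples if label]
    if not positives:
        return None
    heights = [h for h, w in positives]
    weights = [w for h, w in positives]
    return (min(heights), max(heights), min(weights), max(weights))

def classify(rect, point):
    """True iff point falls inside the learned rectangle."""
    if rect is None:
        return False
    h, w = point
    h_min, h_max, w_min, w_max = rect
    return h_min <= h <= h_max and w_min <= w <= w_max
```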

We don’t consider other complex hypotheses.

How good is our algorithm?

An example x is misclassified if it falls between the
two rectangles.
Let ε be the measure of that area
⇒ the error probability of h, error(h), is ε.
Under which assumption? (That instances are drawn from a
fixed distribution D, and the area is measured according to D.)

[Figure: c and h, with the region between them of measure ε and the remainder of measure 1- ε.]

Proving PAC Learnability

Given an error ε and a probability δ, how many
training examples m are needed to learn the concept?
We can find a bound on δ, i.e. on the probability of
learning a function h with an error > ε.
For this purpose, let us compute the probability of
selecting a hypothesis h which:
  correctly classifies the m training examples, and
  shows an error greater than ε.
Such an h is a bad hypothesis.

Probability of Bad Hypotheses

Given x, P(h(x) = c(x)) < 1- ε,
since the error of a bad hypothesis is greater than ε.
Given ε, the m examples all fall in the rectangle h with
probability < (1- ε)^m.
The probability of choosing a bad hypothesis h is
< (1- ε)^m ⋅ N,
where N is the number of hypotheses with an error > ε.

Upper-bound Computation

If we set a bound on the probability of bad hypotheses,
N ⋅ (1- ε)^m < δ,
we would be done, but we don’t know N
⇒ we have to find a bound independent of the number
of bad hypotheses.
Let us divide our rectangle into four strips of area ε/4.

Initial Example

[Figure: the Height-Weight rectangles c and h again, now with a strip t of area ε/4 along one edge of c.]

A bad hypothesis cannot intersect more than 3 strips at a time

Bad hypotheses with error > ε are contained in those
having an error = ε.
To intersect 3 strips I can increase the rectangle’s
length, but I must then decrease its height to keep
an area ≤ 1- ε.

[Figure: hypothesis rectangles of area 1- ε touching at most 3 of the 4 strips.]

Upper-bound computation (2)

A bad hypothesis has error > ε ⇒ it has an area < 1- ε.
A rectangle of area < 1- ε cannot intersect all 4 strips ⇒ if the
examples fall into all 4 strips, they cannot all be consistent
with the same bad hypothesis.
A necessary condition to have a bad hypothesis is that all the m
examples are outside of at least one strip.
In other words, only when the m examples are outside one of the 4
strips may we have a bad hypothesis
⇒ the probability of “all outside at least one strip” ≥ the
probability of a bad hypothesis.

Logic view

Bad hypothesis ⇒ all examples outside at least one strip
(the converse is not true)

A ⇒ B implies P(A) ≤ P(B)

P(bad hyp.) ≤ P(all examples outside one strip)

Upper-bound computation (3)

P(x outside the target strip) = 1- ε/4
P(m points outside the target strip) = (1- ε/4)^m
P(m points outside at least one strip) < 4 ⋅ (1- ε/4)^m

⇒ P(error(h) > ε) < 4 ⋅ (1- ε/4)^m
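
As a sanity check, here is my own simulation sketch (not part of the slides; the unit-square domain, the particular target rectangle, and the trial count are arbitrary choices) comparing the empirical frequency of error(h) > ε for the tightest-rectangle learner against the bound 4 ⋅ (1- ε/4)^m:

```python
import random

def simulate_bound(m, eps, trials=2000):
    """Empirical P(error(h) > eps) for the tightest-rectangle learner
    vs. the theoretical bound 4 * (1 - eps/4)**m. Points are uniform
    on the unit square; the target c is a fixed rectangle (assumed)."""
    c_x0, c_x1, c_y0, c_y1 = 0.2, 0.8, 0.3, 0.9  # target concept c
    c_area = (c_x1 - c_x0) * (c_y1 - c_y0)
    bad = 0
    for _ in range(trials):
        pts = [(random.random(), random.random()) for _ in range(m)]
        pos = [(x, y) for x, y in pts
               if c_x0 <= x <= c_x1 and c_y0 <= y <= c_y1]
        if not pos:
            h_area = 0.0  # NULL hypothesis misses all of c
        else:
            xs = [p[0] for p in pos]
            ys = [p[1] for p in pos]
            h_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
        # h lies inside c, so error(h) = area(c) - area(h) under the uniform law
        if c_area - h_area > eps:
            bad += 1
    return bad / trials, 4 * (1 - eps / 4) ** m

print(simulate_bound(m=148, eps=0.1))  # (empirical rate, theoretical bound)
```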

Solving for m

-ln(1-y) = y + y^2/2 + y^3/3 + …
⇒ ln(1-y) = -y - y^2/2 - y^3/3 - … < -y
⇒ (1-y) < e^(-y); it holds strictly for y > 0, as in our case.

Starting from m > ln(δ/4) / ln(1- ε/4) (derived on the next slide):
⇒ m > ln(δ/4) / ln(e^(-ε/4))
⇒ m > ln(δ/4) / (-ε/4)  ⇒  m > ln(δ/4) ⋅ (4/(-ε))
⇒ m > ln((δ/4)^(-1)) ⋅ (4/ε)  ⇒  m > (4/ε) ⋅ ln(4/δ)

Solving for m (continued)

Our upper bound must be lower than δ, i.e.
4 ⋅ (1- ε/4)^m < δ
⇒ (1- ε/4)^m < δ/4
⇒ m ⋅ ln(1- ε/4) < ln(δ/4)
⇒ m > ln(δ/4) / ln(1- ε/4)
(the “<” turns into “>” because ln(1- ε/4) < 0)

Numeric Examples

  ε    |   δ   |   m
-------+-------+-------
 0.1   | 0.1   |   148
 0.1   | 0.01  |   240
 0.1   | 0.001 |   332
-------+-------+-------
 0.01  | 0.1   |  1476
 0.01  | 0.01  |  2397
 0.01  | 0.001 |  3318
-------+-------+-------
 0.001 | 0.1   | 14756
 0.001 | 0.01  | 23966
 0.001 | 0.001 | 33176
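
A small sketch (mine, not from the slides) that reproduces this table from the bound m > (4/ε) ⋅ ln(4/δ). Taking the ceiling is the safe choice since m must exceed the bound, so an entry may differ by one from the table, which appears to round instead:

```python
from math import ceil, log

def sample_size(eps, delta):
    """Smallest integer m satisfying m > (4/eps) * ln(4/delta)."""
    return ceil((4 / eps) * log(4 / delta))

for eps in (0.1, 0.01, 0.001):
    for delta in (0.1, 0.01, 0.001):
        print(f"eps={eps:<6} delta={delta:<6} m={sample_size(eps, delta):>6}")
```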

Formal PAC-Learning Definition

Let f be the function we want to learn, f: X→I, f ∈ F
D is a probability distribution on X,
used to draw training and test sets
h ∈ H,
where h is the learned function and H is the class of such functions
m is the training-set size
error(h) = Prob[f(x) ≠ h(x)]

F is a PAC-learnable function class if there is a learning
algorithm such that for each f ∈ F, for every distribution D over X and
for all 0 < ε, δ < 1, it produces h with P(error(h) > ε) < δ.

Lower Bound on training-set size

Let us reconsider the first bound that we found:
h is bad: error(h) > ε
P(f(x) = h(x)) on all m examples is lower than (1- ε)^m.
Multiplying by the number of bad hypotheses, we bound
the probability of selecting a bad hypothesis:

P(bad hypothesis) < N ⋅ (1- ε)^m < δ
P(bad hypothesis) < N ⋅ (e^(-ε))^m = N ⋅ e^(-εm) < δ
⇒ m > (1/ε) ⋅ (ln(1/δ) + ln N)

This is a general lower bound.
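
A sketch of this bound as a helper (the function name is my own):

```python
from math import ceil, log

def sample_size_finite_class(n_hypotheses, eps, delta):
    """Smallest integer m with m > (1/eps) * (ln(1/delta) + ln N),
    where N is the size of the finite hypothesis class."""
    return ceil((1 / eps) * (log(1 / delta) + log(n_hypotheses)))
```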

Example

Suppose we want to learn a boolean function of n
variables.
The maximum number of different functions is 2^(2^n)

⇒ m > (1/ε) ⋅ (ln(1/δ) + ln(2^(2^n)))
     = (1/ε) ⋅ (ln(1/δ) + 2^n ⋅ ln 2)

Some Numbers

  n  |   ε  |   δ  |   m
-----+------+------+-------
  5  | 0.1  | 0.1  |   245
  5  | 0.1  | 0.01 |   268
  5  | 0.01 | 0.1  |  2450
  5  | 0.01 | 0.01 |  2680
-----+------+------+-------
 10  | 0.1  | 0.1  |  7123
 10  | 0.1  | 0.01 |  7146
 10  | 0.01 | 0.1  | 71230
 10  | 0.01 | 0.01 | 71460
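
Continuing the sketch above with N = 2^(2^n) (small differences from the slide’s figures can arise from how ln 2 is rounded):

```python
for n in (5, 10):
    for eps in (0.1, 0.01):
        for delta in (0.1, 0.01):
            m = sample_size_finite_class(2 ** (2 ** n), eps, delta)
            print(f"n={n:<3} eps={eps:<5} delta={delta:<5} m={m}")
```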

Computational Learning Theory

Sample Complexity

True Error of the Hypothesis

References

PAC learning:
Artificial Intelligence: A Modern Approach (Second Edition),
Stuart Russell and Peter Norvig.
http://www.cis.temple.edu/~ingargio/cis587/readings/pac.html
Machine Learning, Tom Mitchell, McGraw-Hill.