CSC446 : Pattern Recognition
Prof. Dr. Mostafa G. M. Mostafa
Faculty of Computer & Information Sciences
Computer Science Department
AIN SHAMS UNIVERSITY
Lecture Note 3:
Mathematical Foundations
ASU-CSC446 : Pattern Recognition. Prof. Dr. Mostafa Gadal-Haqq slide - 1
Appendix, Pattern Classification and PRML
CSC446 : Pattern Recognition
Readings: Chapter 1 in Bishop’s PRML
Data Modeling (Regression)
Learning: Data Modeling
• Assume we have example pairs (x, y) and we
want to learn the mapping 𝑭: 𝑿 → 𝒀 to predict y
for future values of x.
Example (figure): the underlying function $y(x) = \sin(2\pi x)$
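The regression setting above can be simulated with a short Python sketch. As an assumption (not taken from the slides), the N inputs are evenly spaced in [0, 1] and Gaussian noise with standard deviation 0.3 is added to $\sin(2\pi x)$; the noisy targets are called `t` to keep them apart from the model output y(x). The helper name `make_sine_data` is hypothetical.

```python
import math
import random

def make_sine_data(n, noise_std=0.3, seed=0):
    """Return (xs, ts): n inputs evenly spaced in [0, 1] and noisy targets.

    Each target is t = sin(2*pi*x) + Gaussian noise with standard deviation
    noise_std. The defaults (0.3, seed 0) are illustrative assumptions.
    """
    rng = random.Random(seed)                  # private generator: reproducible draws
    xs = [i / (n - 1) for i in range(n)] if n > 1 else [0.0]
    ts = [math.sin(2 * math.pi * x) + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ts

xs, ts = make_sine_data(10)
for x, t in zip(xs, ts):
    print(f"x = {x:.3f}   t = {t:+.3f}")
```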
Polynomial Curve Fitting
• Problem: many possible mapping functions
𝑭: 𝑿 → 𝒀 exist!
Which one should we choose?
• We could choose the one that minimizes
the error on the training data.
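The error formula itself did not survive on this slide. A common choice, and the one used in Chapter 1 of Bishop's PRML, is the sum-of-squares error $E(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N} \{y(x_n, \mathbf{w}) - t_n\}^2$, where $t_n$ is the target for input $x_n$. The sketch below assumes that choice; `poly` and `sum_of_squares_error` are hypothetical helper names, and the toy data are made up.

```python
def poly(x, w):
    """Evaluate y(x, w) = w[0] + w[1]*x + ... + w[M]*x**M with Horner's rule."""
    result = 0.0
    for coef in reversed(w):
        result = result * x + coef
    return result

def sum_of_squares_error(w, xs, ts):
    """E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2 (assumed error measure)."""
    return 0.5 * sum((poly(x, w) - t) ** 2 for x, t in zip(xs, ts))

# Hypothetical toy data and two candidate models; the lower E(w) is preferred.
xs = [0.0, 0.5, 1.0]
ts = [1.0, 2.0, 3.0]
print(sum_of_squares_error([1.0, 2.0], xs, ts))   # 0.0  (the line t = 1 + 2x fits exactly)
print(sum_of_squares_error([2.0], xs, ts))        # 1.0  (a constant model)
```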
• Fitting polynomials of different orders (models) to
the data:
$y(x) = w_0$   $y(x) = w_0 + w_1 x$
$y(x) = w_0 + w_1 x + w_2 x^2$   $y(x) = w_0 + w_1 x + w_2 x^2 + \dots + w_8 x^8$
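One way to fit a polynomial of order M is to minimize the sum-of-squares error, which leads to a linear system (the normal equations) for the weights. The sketch below assumes that error measure and solves the system with plain Gaussian elimination; it is meant for small M only, because the normal equations become badly conditioned as M grows. `fit_polynomial` is a hypothetical name.

```python
def fit_polynomial(xs, ts, M):
    """Least-squares weights [w_0, ..., w_M] for y(x, w) = sum_j w_j x^j.

    Minimizing sum_n (y(x_n, w) - t_n)^2 gives the normal equations A w = b with
    A[i][j] = sum_n x_n^(i+j) and b[i] = sum_n x_n^i t_n. They are solved here by
    Gaussian elimination with partial pivoting (adequate for small M only).
    """
    n = M + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(x ** i * t for x, t in zip(xs, ts)) for i in range(n)]
    for col in range(n):                                   # forward elimination
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    w = [0.0] * n
    for i in reversed(range(n)):                           # back substitution
        w[i] = (b[i] - sum(A[i][k] * w[k] for k in range(i + 1, n))) / A[i][i]
    return w

# Noise-free data from t = 1 + 2x + 3x^2 (hypothetical): M = 2 should recover (1, 2, 3).
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ts = [1 + 2 * x + 3 * x ** 2 for x in xs]
for M in (0, 1, 2):
    print(M, [round(w, 4) for w in fit_polynomial(xs, ts, M)])
```

For M = 0 the fit reduces to the mean of the targets, which is a quick sanity check on the solver.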
Overfitting
• At M = 9 we get zero training error, BUT
the highest test error.
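The zero training error at M = 9 can be reproduced without solving a linear system: with N = 10 distinct inputs, the least-squares polynomial of order 9 passes through every training point, so it coincides with the Lagrange interpolating polynomial. The sketch below uses that interpolant as a stand-in for the M = 9 fit (a swapped-in technique, chosen for numerical robustness). N = 10, the noise level 0.3, the seed, and the 100 test points are assumptions.

```python
import math
import random

def lagrange_interpolant(xs, ts):
    """Return the unique polynomial of degree <= N-1 through all N points (x_n, t_n)."""
    def y(x):
        total = 0.0
        for j, (xj, tj) in enumerate(zip(xs, ts)):
            basis = 1.0
            for k, xk in enumerate(xs):
                if k != j:
                    basis *= (x - xk) / (xj - xk)
            total += tj * basis
        return total
    return y

def rms_error(model, xs, ts):
    """Root-mean-square error sqrt(mean_n (y(x_n) - t_n)^2)."""
    return math.sqrt(sum((model(x) - t) ** 2 for x, t in zip(xs, ts)) / len(xs))

rng = random.Random(1)                         # assumed seed

def target(x):
    """Noisy observation of sin(2*pi*x); noise std 0.3 is an assumption."""
    return math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.3)

train_x = [i / 9 for i in range(10)]           # N = 10 training inputs
train_t = [target(x) for x in train_x]
test_x = [rng.random() for _ in range(100)]    # fresh inputs from the same source
test_t = [target(x) for x in test_x]

y9 = lagrange_interpolant(train_x, train_t)    # plays the role of the M = 9 fit
print("training RMS error:", rms_error(y9, train_x, train_t))
print("test RMS error:    ", rms_error(y9, test_x, test_t))
```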
Effect of Data Size
• As the number of data samples N increases, even a
higher-order model gets closer to the real data model.
(Figure: two panels, both with M = 9, for increasing N.)
Performance Evaluation
• The generalization error is the true error over the whole
population of examples; it is what we would like to optimize.
– The sample mean error only approximates it.
• Two ways to assess the generalization error are:
• Theoretical: the law of large numbers
– statistical bounds on the difference between the true and
sample mean errors
• Practical: use a separate data set with m data
samples to test the model
(Mean) test error = the average of the per-sample errors over the m test samples
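A minimal sketch of the practical approach: fit a model on one data set and report its mean error on a separate set of m samples. Assumptions: the per-sample error is the squared difference, the data are noisy samples of $\sin(2\pi x)$ (noise std 0.3, seed 0), and the model is the order-0 polynomial, whose least-squares weight is simply the mean training target. `mean_error` and `sample` are hypothetical helpers.

```python
import math
import random

def mean_error(model, xs, ts):
    """Mean per-sample error (1/m) * sum_i (model(x_i) - t_i)^2 over a data set of size m."""
    return sum((model(x) - t) ** 2 for x, t in zip(xs, ts)) / len(xs)

rng = random.Random(0)                                   # assumed seed

def sample(n):
    """Draw n hypothetical examples: x uniform in [0, 1), t = sin(2*pi*x) + N(0, 0.3^2)."""
    xs = [rng.random() for _ in range(n)]
    return xs, [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.3) for x in xs]

train_x, train_t = sample(10)
test_x, test_t = sample(1000)                            # separate test set, m = 1000

w0 = sum(train_t) / len(train_t)                         # least-squares fit for M = 0

def model(x):
    """Order-0 polynomial: predicts the same value w0 for every input."""
    return w0

print("mean training error:", mean_error(model, train_x, train_t))
print("mean test error:    ", mean_error(model, test_x, test_t))
```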
Assignment 1
1. Derive equations for estimating the
parameters w from the sample data for
the cases M = 1 and M = 2.
2. Use these equations to plot E(w) against w
for each M, taking the estimated values of w
as the midpoints of the w range.
CSC446 : Pattern Recognition
Readings: Appendix A
Probability & Statistics
1- Probability Theory
• Randomness:
– We call a phenomenon random if individual outcomes
are uncertain but there is nonetheless a regular
distribution of outcomes in a large number of
repetitions.
• Probability:
– The probability of any outcome of a random phenomenon
is the proportion of times the outcome would occur in a
very long series of repetitions.
– Probability is the long-term relative frequency.
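The relative-frequency view of probability can be checked by simulation. This sketch tosses a simulated coin with an assumed P(head) = 0.5 and prints the fraction of heads for increasing numbers of tosses; the fraction should settle near 0.5 as the number of repetitions grows. The function name and seed are hypothetical.

```python
import random

def relative_frequency(p_heads, n_tosses, seed=0):
    """Fraction of heads in n_tosses simulated tosses of a coin with P(head) = p_heads."""
    rng = random.Random(seed)
    heads = sum(1 for _ in range(n_tosses) if rng.random() < p_heads)
    return heads / n_tosses

for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(0.5, n))
```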
• Discrete random variables:
– Let x ∈ X, where the sample space is X = {v1, v2, ... , vm}.
– We denote by $p_i$ the probability that x = vi:
$p_i = \Pr\{x = v_i\}, \quad i = 1, \dots, m.$
• where the $p_i$ must satisfy the following two conditions:
$p_i \ge 0 \quad \text{and} \quad \sum_{i=1}^{m} p_i = 1.$
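The two conditions on the $p_i$ translate directly into a check. A small tolerance is used for the sum because floating-point addition of values such as 1/6 need not give exactly 1; the tolerance value and the function name are assumptions.

```python
import math

def is_valid_pmf(probs, tol=1e-9):
    """Check the two conditions on p_1..p_m: every p_i >= 0 and sum_i p_i = 1."""
    return all(p >= 0 for p in probs) and math.isclose(sum(probs), 1.0, abs_tol=tol)

print(is_valid_pmf([1 / 6] * 6))          # True  (fair die)
print(is_valid_pmf([0.5, 0.6, -0.1]))     # False (sums to 1 but has a negative entry)
print(is_valid_pmf([0.2, 0.3]))           # False (sums to 0.5)
```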
• Equally likely outcomes:
“Equally likely outcomes are outcomes that
have the same probability of occurring.”
• Examples:
– Rolling a fair die
– Tossing a fair coin
• P(x) is a “Uniform Distribution”
• Equally likely outcomes:
• If we have ten identical balls, numbered from 0 to 9, in a box,
find the probability of randomly drawing a ball with a positive
number divisible by 3:
– the event space (desired outcomes): A = {3, 6, 9}.
– the sample space (possible outcomes): S = {0, 1, 2, . . . , 9}.
• Since the drawing is at random, each outcome is equally
likely to occur, i.e.: P(0) = P(1) = P(2) = … = P(9) = 1/10
• P(A) = (number of outcomes in A) / (number of outcomes in S)
= 3/10 = 0.3
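The counting rule P(A) = |A| / |S| for equally likely outcomes can be written with exact fractions. Note that 0 is itself divisible by 3, so the event is restricted here to positive numbers to obtain A = {3, 6, 9}; without that restriction, A = {0, 3, 6, 9} and P(A) = 2/5. The helper `probability` is hypothetical.

```python
from fractions import Fraction

def probability(event, sample_space):
    """P(A) = |A| / |S| for equally likely outcomes (only outcomes inside S count)."""
    return Fraction(len(event & sample_space), len(sample_space))

S = set(range(10))                                  # balls numbered 0..9
A = {n for n in S if n > 0 and n % 3 == 0}          # positive numbers divisible by 3
print(sorted(A), probability(A, S), float(probability(A, S)))   # [3, 6, 9] 3/10 0.3
```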
• Biased outcomes (non-uniform dist.):
“Biased outcomes are outcomes that have
different probabilities of occurring.”
• Examples:
– Rolling an unfair die
– Tossing an unfair coin
• P(x) is a “Non-uniform Dist.”
• Biased outcomes (non-uniform dist.):
• A biased coin, twice as likely to come up tails as
heads, is tossed twice:
– What is the probability that at least one head occurs?
• Solution:
– Sample space = {HH, HT, TH, TT}
– P(H= head) = 1/3 , P(T= tail) =2/3
– Sample points and their probabilities:
• P(HT) = 1/3 × 2/3 = 2/9   P(HH) = 1/3 × 1/3 = 1/9
• P(TH) = 2/3 × 1/3 = 2/9   P(TT) = 2/3 × 2/3 = 4/9
– Answer: P(HH) + P(HT) + P(TH) = 1/9 + 2/9 + 2/9 = 5/9 ≈ 0.56 (equivalently, 1 − P(TT) = 1 − 4/9)
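The biased-coin answer can be confirmed by enumerating the sample space with exact fractions. As in the solution above, the two tosses are assumed independent, so each sample point's probability is the product of the single-toss probabilities.

```python
from fractions import Fraction
from itertools import product

P = {"H": Fraction(1, 3), "T": Fraction(2, 3)}    # tails twice as likely as heads

# Each sample point is a pair of tosses, e.g. "HT"; independent tosses multiply.
outcomes = {a + b: P[a] * P[b] for a, b in product("HT", repeat=2)}
at_least_one_head = sum(p for o, p in outcomes.items() if "H" in o)

print(at_least_one_head)              # 5/9
print(float(at_least_one_head))
```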
• Probability and Language
What’s the probability that a random word (from a random
dictionary page) is a verb?
• Solution:
$P(\text{drawing a verb}) = \dfrac{\#\text{ of ways to get a verb}}{\#\text{ of all words}}$
• # of all words: just count all the words in the dictionary.
• # of ways to get a verb: the number of words that are verbs.
• If a dictionary has 50,000 entries, and 10,000 of them are verbs,
then:
• P(Verb) = 10,000/50,000 = 1/5 = 0.20
• Conditional Probability
– A way to reason about the outcome of an
experiment based on partial information:
• In a word-guessing game, the first letter of the word
is a “t”. How likely is it that the second letter is an “h”?
• How likely is it that a person has a disease, given that a
medical test was negative?
• A spot shows up on a radar screen. How likely is it that
it corresponds to an aircraft?
• I saw your friend. How likely is it that I will see you?
• Conditional Probability
• Let A and B be events.
• P(A|B) = the probability of event A occurring given that event B occurs.
• Definition (for P(B) > 0):
$P(A \mid B) = \dfrac{P(A, B)}{P(B)}$
(Figure: Venn diagram of A, B, and their intersection A,B.)
Note: P(A,B) = P(A|B) · P(B)
Also: P(A,B) = P(B,A)
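The definition $P(A \mid B) = P(A, B) / P(B)$ can be applied directly to a table of counts, using relative frequencies as probabilities. The counts below are hypothetical, chosen only for illustration, and `p` is a hypothetical helper.

```python
from fractions import Fraction

# Hypothetical counts of 100 trials, keyed by (A occurred?, B occurred?).
counts = {(True, True): 15, (True, False): 15, (False, True): 25, (False, False): 45}
total = sum(counts.values())

def p(a=None, b=None):
    """Probability estimated from the counts; None means 'either outcome'."""
    hits = sum(c for (ea, eb), c in counts.items()
               if (a is None or ea == a) and (b is None or eb == b))
    return Fraction(hits, total)

p_a_given_b = p(a=True, b=True) / p(b=True)          # P(A|B) = P(A,B) / P(B)
print(p_a_given_b)                                   # 3/8
print(p(a=True))                                     # 3/10: knowing B changes our belief in A
assert p(a=True, b=True) == p_a_given_b * p(b=True)  # P(A,B) = P(A|B) P(B)
```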
• Conditional Probability
• One of the following 30 items is chosen at random.
• What is P(X), the probability that it is an X?
• What is P(X|red), the probability that it is an X given that it
is red?
• Statistically independent events
– Variables x and y are said to be
statistically independent if and only if:
$P(x, y) = P(x)\,P(y)$
– That is, knowing the value of x does not
give us any additional knowledge about
the possible value of y.
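Independence can be tested on a joint probability table by comparing each entry with the product of its marginals. The two example tables (two fair dice, and a die paired with the parity of its own value) are illustrative assumptions; `independent` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

def independent(joint):
    """True if P(x, y) == P(x) P(y) for every (x, y) in a joint table {(x, y): prob}."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    px = {x: sum(joint.get((x, y), 0) for y in ys) for x in xs}   # marginal P(x)
    py = {y: sum(joint.get((x, y), 0) for x in xs) for y in ys}   # marginal P(y)
    return all(joint.get((x, y), 0) == px[x] * py[y] for x, y in product(xs, ys))

two_dice = {(a, b): Fraction(1, 36) for a in range(1, 7) for b in range(1, 7)}
die_and_parity = {(a, a % 2): Fraction(1, 6) for a in range(1, 7)}   # y determined by x

print(independent(two_dice))         # True
print(independent(die_and_parity))   # False
```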
• Marginal Probability
• Conditional Probability
• Joint Probability
• The Rules of Probability
– Sum Rule: $p(X) = \sum_{Y} p(X, Y)$
– Product Rule: $p(X, Y) = p(Y \mid X)\,p(X) = p(X \mid Y)\,p(Y)$
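The sum and product rules can be checked numerically on a small joint table. The distribution below is hypothetical; exact fractions make the product-rule checks hold without rounding error.

```python
from fractions import Fraction

# Hypothetical joint distribution p(X, Y) with X in {a, b} and Y in {0, 1, 2}.
joint = {("a", 0): Fraction(1, 10), ("a", 1): Fraction(2, 10), ("a", 2): Fraction(1, 10),
         ("b", 0): Fraction(3, 10), ("b", 1): Fraction(1, 10), ("b", 2): Fraction(2, 10)}
X_vals, Y_vals = ("a", "b"), (0, 1, 2)

# Sum rule: p(X) = sum_Y p(X, Y) and p(Y) = sum_X p(X, Y).
p_X = {x: sum(joint[x, y] for y in Y_vals) for x in X_vals}
p_Y = {y: sum(joint[x, y] for x in X_vals) for y in Y_vals}

# Conditionals: p(Y|X) = p(X, Y) / p(X) and p(X|Y) = p(X, Y) / p(Y).
p_Y_given_X = {(y, x): joint[x, y] / p_X[x] for x in X_vals for y in Y_vals}
p_X_given_Y = {(x, y): joint[x, y] / p_Y[y] for x in X_vals for y in Y_vals}

# Product rule: p(X, Y) = p(Y|X) p(X) = p(X|Y) p(Y) for every cell.
for x in X_vals:
    for y in Y_vals:
        assert joint[x, y] == p_Y_given_X[y, x] * p_X[x] == p_X_given_Y[x, y] * p_Y[y]
print(p_X, p_Y)
```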
• Bayes' Theorem
$p(Y \mid X) = \dfrac{p(X \mid Y)\,p(Y)}{p(X)}$
where
$p(X) = \sum_{Y} p(X \mid Y)\,p(Y)$
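Bayes' theorem, with the denominator expanded by the sum and product rules, $p(X) = \sum_Y p(X \mid Y)\,p(Y)$, answers questions such as the medical-test example on the conditional-probability slide. All numbers below (prevalence and test accuracy) are hypothetical, as is the helper name `bayes_posterior`.

```python
def bayes_posterior(prior, likelihoods):
    """p(Y|X) for every class Y, given p(Y) ('prior') and p(X|Y) ('likelihoods').

    Both arguments are dicts keyed by class; the evidence is p(X) = sum_Y p(X|Y) p(Y).
    """
    evidence = sum(likelihoods[y] * prior[y] for y in prior)
    return {y: likelihoods[y] * prior[y] / evidence for y in prior}

# Hypothetical numbers: 1% prevalence; the test misses 10% of sick people and
# correctly reports 95% of healthy people as negative. X = "the test is negative".
prior = {"disease": 0.01, "healthy": 0.99}
p_negative = {"disease": 0.10, "healthy": 0.95}
posterior = bayes_posterior(prior, p_negative)
print(posterior["disease"])      # P(disease | negative test)
```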
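• Probability mass function, P(x), for a discrete variable x:
$P(x) \ge 0 \quad \text{and} \quad \sum_{x \in X} P(x) = 1$
– For a continuous variable with probability density p(x),
the cumulative distribution is:
$P(x < z) = \int_{-\infty}^{z} p(x)\,dx$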
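For a continuous variable, $P(x < z) = \int_{-\infty}^{z} p(x)\,dx$ can be approximated numerically. This sketch uses the trapezoidal rule on an exponential density $p(x) = \lambda e^{-\lambda x}$ for $x \ge 0$ (an assumed example, with λ = 2), whose exact cumulative distribution $1 - e^{-\lambda z}$ is printed alongside for comparison. The function names are hypothetical.

```python
import math

def cdf_numeric(pdf, lower, z, steps=10_000):
    """Approximate P(x < z) = integral of pdf from lower to z with the trapezoidal rule.

    'lower' stands in for -infinity; the pdf is assumed to be zero below it.
    """
    h = (z - lower) / steps
    inner = sum(pdf(lower + k * h) for k in range(1, steps))
    return h * (0.5 * pdf(lower) + inner + 0.5 * pdf(z))

lam = 2.0                                                # assumed rate parameter

def exp_pdf(x):
    """Exponential density: lam * exp(-lam * x) for x >= 0, else 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

for z in (0.5, 1.0, 3.0):
    print(z, cdf_numeric(exp_pdf, 0.0, z), 1 - math.exp(-lam * z))
```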
2- Statistics
• Statistics is the science of collecting, organizing, and interpreting numerical
facts, which we call data.
• A good way of looking at data is to draw its histogram
(frequency distribution).
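A histogram simply counts how many values fall into each of a set of bins. The sketch below uses equal-width bins and hypothetical exam scores; the bin count, the range, and the function name are assumptions.

```python
def histogram(data, n_bins, lo, hi):
    """Count how many values fall into each of n_bins equal-width bins on [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in data:
        if lo <= v <= hi:                                  # ignore values outside the range
            idx = min(int((v - lo) / width), n_bins - 1)   # v == hi goes into the last bin
            counts[idx] += 1
    return counts

scores = [55, 61, 64, 68, 70, 71, 73, 75, 78, 80, 82, 85, 91, 97]   # hypothetical data
for i, c in enumerate(histogram(scores, 5, 50, 100)):
    left = 50 + 10 * i
    print(f"{left}-{left + 10}: {'*' * c}")
```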
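• Univariate Gaussian/Normal Density:
– A density that is analytically tractable
– Continuous density
– Many processes are asymptotically Gaussian
$p(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\dfrac{1}{2}\left(\dfrac{x - \mu}{\sigma}\right)^{2}\right], \qquad \int_{-\infty}^{\infty} p(x)\,dx = 1$
Where:
μ = mean (or expected value) of x
σ² = expected squared deviation, or variance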
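A quick numerical check of the univariate normal density: integrate $p(x)$, $x\,p(x)$, and $(x - \mu)^2 p(x)$ with the midpoint rule. The results should be close to 1, μ, and σ² respectively. The parameter values μ = 1.5 and σ = 0.7 are assumed for illustration, and the ±10σ integration range relies on the tails being negligible.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Univariate normal density N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)

def integrate(f, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

mu, sigma = 1.5, 0.7                                   # assumed example parameters
lo, hi = mu - 10 * sigma, mu + 10 * sigma              # tails beyond 10 sigma are negligible
area = integrate(lambda x: normal_pdf(x, mu, sigma), lo, hi)
mean = integrate(lambda x: x * normal_pdf(x, mu, sigma), lo, hi)
var = integrate(lambda x: (x - mu) ** 2 * normal_pdf(x, mu, sigma), lo, hi)
print(area, mean, var)                                 # approximately 1, mu, sigma^2
```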
• Univariate Gaussian/Normal Density (figure): the standard normal density, u ~ N(0, 1)
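• Multivariate Normal Density
– The multivariate normal density in d dimensions is:
$p(\mathbf{x}) = \dfrac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left[-\dfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{t}\,\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu})\right]$
where:
x = (x1, x2, …, xd)^t = the multivariate random variable
μ = (μ1, μ2, …, μd)^t = the mean vector
Σ = the d×d covariance matrix; |Σ| and Σ⁻¹ are its determinant
and inverse, respectively.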
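A sketch of the d-dimensional normal density. Instead of forming $\Sigma^{-1}$ and $|\Sigma|$ explicitly, it uses a Cholesky factorization $\Sigma = L L^t$ (a swapped-in numerical technique): the quadratic form equals $\mathbf{y}^t\mathbf{y}$ where $L\mathbf{y} = \mathbf{x} - \boldsymbol{\mu}$, and $|\Sigma| = (\prod_i L_{ii})^2$. Σ is assumed symmetric positive definite; the function names and example covariance are hypothetical.

```python
import math

def cholesky(S):
    """Lower-triangular L with S = L L^t (S must be symmetric positive definite)."""
    d = len(S)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def mvn_pdf(x, mu, S):
    """Multivariate normal density N(mu, S) in d dimensions, evaluated at x."""
    d = len(x)
    L = cholesky(S)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    y = []                                      # forward substitution: L y = x - mu
    for i in range(d):
        y.append((diff[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in y)                # (x - mu)^t S^-1 (x - mu)
    det = math.prod(L[i][i] for i in range(d)) ** 2   # |S| = (prod_i L_ii)^2
    return math.exp(-0.5 * quad) / ((2 * math.pi) ** (d / 2) * math.sqrt(det))

mu = [0.0, 0.0]
S = [[2.0, 0.6], [0.6, 1.0]]                    # assumed covariance matrix
print(mvn_pdf([0.5, -0.3], mu, S))
```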
• Multivariate Density: Statistically Independent
– If xi and xj are statistically independent, then
σij = 0.
– If this holds for every pair i ≠ j (so Σ is diagonal), p(x) reduces to the product of the
univariate normal densities for the components of
x. That is, if p(xi) ~ N(xi | μi, σi²):
$p(\mathbf{x}) = p(x_1, x_2, \dots, x_d) = p(x_1)\,p(x_2) \cdots p(x_d) = \prod_{i=1}^{d} p(x_i)$
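With a diagonal covariance matrix, $|\Sigma| = \prod_i \sigma_i^2$ and $\Sigma^{-1} = \mathrm{diag}(1/\sigma_i^2)$, so the multivariate formula should agree with the product of univariate densities. The sketch compares the two for assumed example values; the function names are hypothetical.

```python
import math

def normal_pdf(x, mu, var):
    """Univariate N(mu, var) density."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def mvn_pdf_diagonal(x, mu, variances):
    """d-dimensional normal density with covariance diag(variances), in multivariate form."""
    d = len(x)
    quad = sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mu, variances))  # S^-1 = diag(1/v)
    det = math.prod(variances)                                              # |S| = prod(v)
    return math.exp(-0.5 * quad) / ((2 * math.pi) ** (d / 2) * math.sqrt(det))

x, mu, variances = [0.3, -1.2, 2.0], [0.0, -1.0, 1.5], [1.0, 0.5, 2.0]   # assumed values
joint = mvn_pdf_diagonal(x, mu, variances)
product = math.prod(normal_pdf(xi, mi, v) for xi, mi, v in zip(x, mu, variances))
print(joint, product)        # equal up to floating-point rounding
```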
• Multivariate Normal Density
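– From the multivariate normal density, the loci of
points of constant density are hyperellipsoids on
which the quadratic form $(\mathbf{x} - \boldsymbol{\mu})^{t}\,\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu})$ is
constant.
– The quantity
$r^2 = (\mathbf{x} - \boldsymbol{\mu})^{t}\,\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu})$
is sometimes called the squared Mahalanobis
distance from x to μ.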
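The squared Mahalanobis distance for a 2-D example, using the closed-form inverse of a 2 × 2 matrix (the restriction to d = 2, the covariance values, and the helper name are assumptions). Points at equal Mahalanobis distance lie on the same constant-density ellipse even when their Euclidean distances from μ differ.

```python
import math

def mahalanobis_sq(x, mu, S):
    """r^2 = (x - mu)^t S^-1 (x - mu) for a 2x2 covariance matrix S (2-D only)."""
    (a, b), (c, d) = S
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]      # inverse of a 2x2 matrix
    dx = [x[0] - mu[0], x[1] - mu[1]]
    return sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))

mu = [0.0, 0.0]
S = [[4.0, 0.0], [0.0, 1.0]]          # assumed: variance 4 along x1, 1 along x2
for p in ([2.0, 0.0], [0.0, 1.0], [0.0, 2.0]):
    print(p, "Euclidean:", math.dist(p, mu), "Mahalanobis:", math.sqrt(mahalanobis_sq(p, mu, S)))
```

Here [2, 0] and [0, 1] both have r = 1 although their Euclidean distances are 2 and 1, because the variance along x1 is four times larger.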
Multivariate Normal Density
Expected values:
• The expected value, mean, or average of the random variable
x is defined by:
$\mathcal{E}[x] = \mu = \sum_{x \in X} x\,P(x)$
• If f(x) is any function of x, the expected value of f is defined
by:
$\mathcal{E}[f(x)] = \sum_{x \in X} f(x)\,P(x)$
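• The second moment of x is defined by:
$\mathcal{E}[x^2] = \sum_{x \in X} x^2\,P(x)$
• The variance of x is defined by:
$\mathrm{Var}[x] = \sigma^2 = \mathcal{E}[(x - \mu)^2] = \sum_{x \in X} (x - \mu)^2\,P(x)$
where σ is the standard deviation of x.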
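The definitions of mean, second moment, and variance for a discrete variable, applied to a fair die (an assumed example) with exact fractions. The final assertion checks the standard identity $\mathrm{Var}[x] = \mathcal{E}[x^2] - \mathcal{E}[x]^2$; `expect` is a hypothetical helper.

```python
from fractions import Fraction

P = {v: Fraction(1, 6) for v in range(1, 7)}          # fair die: uniform P(x)

def expect(f, P):
    """E[f(x)] = sum over x in X of f(x) P(x)."""
    return sum(f(x) * p for x, p in P.items())

mean = expect(lambda x: x, P)                         # E[x] = mu
second_moment = expect(lambda x: x ** 2, P)           # E[x^2]
variance = expect(lambda x: (x - mean) ** 2, P)       # E[(x - mu)^2] = sigma^2
print(mean, second_moment, variance)                  # 7/2 91/6 35/12
assert variance == second_moment - mean ** 2          # Var[x] = E[x^2] - E[x]^2
```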
3- Mathematical Notations
Next Time
Bayesian Decision Theory