Transcript
Page 1: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

CE-717: Machine Learning Sharif University of Technology

M. Soleymani

Review (Probability & Linear Algebra)

Page 2: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Outline

Axioms of probability theory

Joint probability, conditional probability, Bayes theorem

Discrete and continuous random variables

Probability mass and density functions

Expected value, variance, standard deviation

Expectation for two variables

covariance, correlation

Some probability distributions

Gaussian distribution

Linear Algebra

2

Page 3: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Basic Probability Elements

3

Sample space (ฮฉ): set of all possible outcomes (or worlds)

Outcomes are assumed to be mutually exclusive.

An event ๐ด is a certain set of outcomes (i.e., subset of ฮฉ).

A random variable is a function defined over the sample

space

Gender: ๐‘†๐‘ก๐‘ข๐‘‘๐‘’๐‘›๐‘ก๐‘  โ†’ {๐‘š๐‘Ž๐‘™๐‘’, ๐‘“๐‘’๐‘š๐‘Ž๐‘™๐‘’}

Height: ๐‘†๐‘ก๐‘ข๐‘‘๐‘’๐‘›๐‘ก๐‘  โ†’ โ„+

Page 4: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Probability Space

A probability space is defined as a triple (ฮฉ, ๐น, ๐‘ƒ):

A sample space ฮฉ โ‰  โˆ… that contains the set of all possible

outcomes (outcomes also called states of nature)

A set ๐น whose elements are called events. The events are

subsets of ฮฉ.๐น should be a โ€œBorel Fieldโ€.

๐‘ƒ represents the probability measure that assigns

probabilities to events.

4

Page 5: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Probability Axioms (Kolomogrov)

5

Axioms define a reasonable theory of uncertainty

Kolomogrovโ€™s probability axioms

๐‘ƒ(๐ด) โ‰ฅ 0 (โˆ€๐ด โŠ† ฮฉ)

๐‘ƒ ฮฉ = 1

๐‘ƒ ๐ด โˆช ๐ต = ๐‘ƒ ๐ด + ๐‘ƒ ๐ต โˆ’ ๐‘ƒ(๐ด โˆฉ ๐ต) (โˆ€๐ด, ๐ต โŠ† ฮฉ)

ฮฉ

A B

๐ด โˆฉ ๐ต

Page 6: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Random Variables

Random variables: Variables in probability theory

Domain of random variables:Boolean,discrete or continuous

Probability distribution: the function describing probabilities

of possible values of a random variable

๐‘ƒ ๐ท๐‘–๐‘๐‘’ = 1 =1

6, ๐‘ƒ(๐ท๐‘–๐‘๐‘’ = 2) =

1

6, โ€ฆ

6

Page 7: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Random Variables

Random variable is a function that maps every outcome

in ฮฉ to a real (complex) number.

To define probabilities easily as functions defined on (real)

numbers.

To compute expectation, variance,โ€ฆ

Head

Real line0 1

Tail

7

Page 8: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Base Definitions

8

Joint probability distribution

The rules of probability (sum and product rule)

Bayesโ€™ theorem

Independence

new evidence may be irrelevant

Page 9: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Joint Probability Distribution

9

Probability of all combinations of the values for a set of

random variables.

If two or more random variables are considered together, they can

be described in terms of their joint probability

Example: Joint probability of features

๐‘ƒ(๐‘‹1, ๐‘‹2, โ€ฆ , ๐‘‹๐‘‘)

Page 10: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Two Fundamental Rules

10

Sum rule:

๐‘ƒ ๐‘Œ =

๐‘‹

๐‘ƒ(๐‘‹, ๐‘Œ)

Product rule:๐‘ƒ ๐‘‹, ๐‘Œ = ๐‘ƒ ๐‘‹ ๐‘Œ ๐‘ƒ(๐‘Œ)

Page 11: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Chain Rule

Chain rule is derived by successive application of product rule:

๐‘ƒ(๐‘‹1, โ€ฆ , ๐‘‹๐‘›)= ๐‘ƒ(๐‘‹1, โ€ฆ , ๐‘‹๐‘›โˆ’1 ) ๐‘ƒ(๐‘‹๐‘›|๐‘‹1, โ€ฆ , ๐‘‹๐‘›โˆ’1)= ๐‘ƒ(๐‘‹1, โ€ฆ , ๐‘‹๐‘›โˆ’2) ๐‘ƒ(๐‘‹๐‘›โˆ’1|๐‘‹1, โ€ฆ , ๐‘‹๐‘›โˆ’2) ๐‘ƒ(๐‘‹๐‘›|๐‘‹1, โ€ฆ , ๐‘‹๐‘›โˆ’1)= โ€ฆ

= ๐‘ƒ(๐‘‹1) ๐‘–=2

๐‘›

๐‘ƒ(๐‘‹๐‘–|๐‘‹1, โ€ฆ , ๐‘‹๐‘–โˆ’1)

11

Page 12: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Sum Rule: Example

12

[Bishop, Section1.2]

Page 13: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Conditional Probability

๐‘ƒ(๐‘‹|๐‘Œ) = ๐‘ƒ(๐‘‹, ๐‘Œ) ๐‘ƒ(๐‘Œ) if ๐‘ƒ(๐‘‹) > 0

Obtained from the product rule

๐‘ƒ(๐‘‹|๐‘Œ) obeys the same rules as probabilities

๐‘‹๐‘ƒ(๐‘‹|๐‘Œ) = 1

13

Page 14: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Conditional Probability

For statistically dependent variables, knowing the value of

one variable may allow us to better estimate the other.

All probabilities in effect are conditional probabilities

E.g., ๐‘ƒ(๐ด) = ๐‘ƒ(๐ด | ๐‘œ๐‘ข๐‘Ÿ ๐‘๐‘Ž๐‘๐‘˜๐‘”๐‘Ÿ๐‘œ๐‘ข๐‘›๐‘‘ ๐‘˜๐‘›๐‘œ๐‘ค๐‘™๐‘’๐‘‘๐‘”๐‘’)

Type equation here.

๐ด๐ต

๐‘ƒ(. |๐ต)ฮฉ

ฮฉ

Renormalize the probability of events jointly occur with B

14

Page 15: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Conditional Probability: Example Rolling a fair dice

๐ด : the outcome is an even number

๐ต : the outcome is a prime number

๐‘ƒ ๐ด ๐ต =๐‘ƒ(๐ด, ๐ต)

๐‘ƒ(๐ต)=

1/6

1/2=

1

3

Meningitis(๐‘€) & Stiff neck (๐‘†)

๐‘ƒ ๐‘€ = 1, ๐‘† = 1 =1

5000

๐‘ƒ ๐‘€ = 1 =1

2000

๐‘ƒ ๐‘† = 1 ๐‘€ = 1 =๐‘ƒ ๐‘€ = 1, ๐‘† = 1

๐‘ƒ(๐‘€ = 1)= 0.4

15

Page 16: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Conditional Probability: Another Example

16

[Bishop, Section1.2]

Page 17: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Bayes Theorem

๐‘ƒ ๐‘Œ ๐‘‹ =๐‘ƒ ๐‘‹ ๐‘Œ ๐‘ƒ ๐‘Œ

๐‘ƒ ๐‘‹

Obtained form the product rule and the symmetry property

๐‘ƒ ๐‘‹, ๐‘Œ = ๐‘ƒ ๐‘Œ, ๐‘‹

๐‘ƒ ๐‘‹, ๐‘Œ = ๐‘ƒ ๐‘‹|๐‘Œ ๐‘ƒ ๐‘Œ = ๐‘ƒ(๐‘Œ|๐‘‹)๐‘ƒ ๐‘‹

In some problems, it may be difficult to compute ๐‘ƒ(๐‘Œ|๐‘‹)directly, yet we might have information about ๐‘ƒ(๐‘‹|๐‘Œ).

๐‘ƒ(๐ถ๐‘Ž๐‘ข๐‘ ๐‘’|๐ธ๐‘“๐‘“๐‘’๐‘๐‘ก) = ๐‘ƒ(๐ธ๐‘“๐‘“๐‘’๐‘๐‘ก|๐ถ๐‘Ž๐‘ข๐‘ ๐‘’) ๐‘ƒ(๐ถ๐‘Ž๐‘ข๐‘ ๐‘’) / ๐‘ƒ(๐ธ๐‘“๐‘“๐‘’๐‘๐‘ก)

17

Page 18: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Bayes Theorem

Often it would be useful to derive the rule a bit further:

๐‘ƒ ๐‘Œ ๐‘‹ =๐‘ƒ ๐‘‹ ๐‘Œ ๐‘ƒ ๐‘Œ

๐‘ƒ ๐‘‹=

๐‘ƒ ๐‘‹ ๐‘Œ ๐‘ƒ ๐‘Œ

๐‘Œ๐‘ƒ ๐‘‹ ๐‘Œ ๐‘ƒ ๐‘Œ

18

Page 19: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Bayes Theorem: Example

19

Meningitis(๐‘€) & Stiff neck (๐‘†)

๐‘ƒ ๐‘€ = 1 =1

5000

๐‘ƒ ๐‘† = 1 = 0.01

๐‘ƒ ๐‘† = 1 ๐‘€ = 1 = 0.7

๐‘ƒ ๐‘€ = 1 ๐‘† = 1 =?

๐‘ƒ(๐‘€ = 1|๐‘† = 1) = ๐‘ƒ ๐‘† = 1 ๐‘€ = 1 ๐‘ƒ(๐‘€ = 1) /๐‘ƒ(๐‘† = 1)= 0.7 ร— 0.0002/0.01 = 0.0014

Page 20: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Prior and Posterior Probabilities

Prior or unconditional probabilities: belief in the absence of

any other evidence

e.g.,๐‘ƒ ๐‘† = 1 = 0.01

Posterior or conditional probabilities: belief in the presence of

evidences

e.g.,๐‘ƒ ๐‘† = 1 ๐‘€ = 1) = 0.7

20

Page 21: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Independence of Random Variables

๐‘‹ and ๐‘Œ are independent iff

๐‘ƒ(๐‘‹|๐‘Œ) = ๐‘ƒ(๐‘‹)

๐‘ƒ ๐‘Œ ๐‘‹ = ๐‘ƒ ๐‘Œ

๐‘ƒ(๐‘‹, ๐‘Œ) = ๐‘ƒ(๐‘‹) ๐‘ƒ(๐‘Œ)

Knowing ๐‘Œ tells us nothing about ๐‘‹ (and vice versa)

21

Page 22: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Probability Mass Function (PMF)

Probability Mass Function (PMF) shows the probability

for each value of a discrete random variable

Each impulse magnitude is equal to the probability of the

corresponding outcome

Example: PMF of a fair dice

๐‘ƒ(๐‘‹ = ๐‘ฅ) โ‰ฅ 0

๐‘ฅโˆˆ๐‘‹

๐‘ƒ(๐‘‹ = ๐‘ฅ) = 1

22

๐‘ƒ(๐‘‹)

๐‘‹

Page 23: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Probability Density Function (PDF)

Probability Density Function (PDF) is defined for

continuous random variables

The probability of ๐‘ฅ โˆˆ (๐‘ฅ0, ๐‘ฅ0 + ๐›ฟ๐‘ฅ) is ๐‘(๐‘ฅ0) ร— ๐›ฟ๐‘ฅ (for ๐›ฟ๐‘ฅ โ†’ 0)

๐‘(๐‘ฅ): probability density over ๐‘ฅ

23

๐‘(๐‘ฅ)

๐‘(๐‘ฅ) โ‰ฅ 0

๐‘ ๐‘ฅ ๐‘‘๐‘ฅ = 1

๐‘ฅ

๐‘(๐‘ฅ0)

๐‘ฅ0

Page 24: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Cumulative Distribution Function (CDF)

Cumulative Distribution Function (CDF)

Defined as the integration of PDF

Similarly defined on discrete variables (summation instead of integration)

Non-decreasing

Right Continuous

๐น(โˆ’โˆž) = 0

๐น โˆž = 1

๐‘‘๐น(๐‘ฅ)

๐‘‘๐‘ฅ= ๐‘ ๐‘ฅ

๐‘ƒ ๐‘ข โ‰ค ๐‘ฅ โ‰ค ๐‘ฃ = ๐น ๐‘ฃ โˆ’ ๐น(๐‘ข)

24

๐‘ฅ

๐น ๐‘ฅ

๐‘(๐‘ฅ)

๐ถ๐ท๐น ๐‘ฅ = ๐น ๐‘ฅ = โˆ’โˆž

๐‘ฅ

๐‘(๐‘ฅ)

Page 25: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Distribution Statistics

Basic descriptors of spatial distributions:

Mean value

Variance & standard deviation

Moments

Covariance & correlation

25

Page 26: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Expected Value

Expected (or mean) value: weighted average of all possible

values of the random variable

Expectation of a discrete random variable ๐‘‹:

๐ธ ๐‘ฅ = ๐‘ฅ๐‘ฅ๐‘(๐‘ฅ)

Expectation of a function of a discrete random variable ๐‘‹ :

๐ธ ๐‘“(๐‘ฅ) = ๐‘ฅ๐‘“(๐‘ฅ)๐‘(๐‘ฅ)

Expected value of a continuous random variable ๐‘‹ :

๐ธ ๐‘ฅ = ๐‘ฅ๐‘ ๐‘ฅ ๐‘‘๐‘ฅ

Expectation of a function of a continuous random variable ๐‘‹ :

๐ธ ๐‘“(๐‘ฅ) = ๐‘“(๐‘ฅ)๐‘ ๐‘ฅ ๐‘‘๐‘ฅ

26

Page 27: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Expected Value

27

For expectation of a function of several variables, a

subscript is used to specify the variables is being average

over

Examples:

๐ธ๐‘ฅ ๐‘“ ๐‘ฅ, ๐‘ฆ = ๐‘ฅ ๐‘ ๐‘ฅ ๐‘“(๐‘ฅ, ๐‘ฆ)

๐ธ๐‘ฅ|๐‘ฆ ๐‘“ ๐‘ฅ, ๐‘ฆ = ๐‘ฅ ๐‘ ๐‘ฅ|๐‘ฆ ๐‘“(๐‘ฅ, ๐‘ฆ)

Other notation:๐ธ๐‘ฅ[๐‘“(๐‘ฅ, ๐‘ฆ)|๐‘ฆ]

๐ธ๐‘ฅ,๐‘ฆ ๐‘“ ๐‘ฅ, ๐‘ฆ = ๐‘ฅ ๐‘ฆ ๐‘ ๐‘ฅ, ๐‘ฆ ๐‘“(๐‘ฅ, ๐‘ฆ)

Page 28: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Variance

Variance: a measure of how far values of a random

variable are spread out around its expected value

๐‘‰๐‘Ž๐‘Ÿ ๐‘ฅ = ๐ธ ๐‘ฅ โˆ’ ๐ธ ๐‘ฅ 2

= ๐ธ ๐‘ฅ2 โˆ’ ๐ธ ๐‘ฅ 2

Standard deviation: square root of variance:

ฯƒ๐‘ฅ = ๐‘‰๐‘Ž๐‘Ÿ[๐‘ฅ]

28

Page 29: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Moments

Moments nth order moment of a random variable ๐‘‹:

๐ธ ๐‘ฅ๐‘›

Normalized nth order moment:

๐ธ ๐‘ฅ โˆ’ ๐ธ ๐‘ฅ ๐‘›

The first order moment is the mean value.

The second order moment is the variance added by thesquare of the mean.

29

Page 30: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Correlation & Covariance

Correlation

๐ถ๐‘Ÿ๐‘Ÿ ๐‘ฅ, ๐‘ฆ = ๐ธ๐‘ฅ,๐‘ฆ ๐‘ฅ๐‘ฆ

Covariance is the correlation of mean removed variables:

๐ถ๐‘œ๐‘ฃ ๐‘ฅ, ๐‘ฆ = ๐ธ๐‘ฅ,๐‘ฆ (๐‘ฅ โˆ’ ๐ธ ๐‘ฅ )(๐‘ฆ โˆ’ ๐ธ ๐‘ฆ )

30

Discrete RVs

๐‘ฅ

๐‘ฆ๐‘ฅ๐‘ฆ ๐‘(๐‘ฅ, ๐‘ฆ)

Page 31: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Covariance: Example

31

๐ถ๐‘œ๐‘ฃ ๐‘ฅ, ๐‘ฆ = 0 ๐ถ๐‘œ๐‘ฃ ๐‘ฅ, ๐‘ฆ = 0.9๐‘ฅ

๐‘ฆ

๐‘ฅ

๐‘ฆ

Page 32: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Covariance Properties

32

The covariance value shows the tendency of the pair of

RVs to increase together

๐ถ๐‘œ๐‘ฃ๐‘ฅ๐‘ฆ > 0 โˆถ ๐‘ฅ and ๐‘ฆ tend to increase together

๐ถ๐‘œ๐‘ฃ๐‘ฅ๐‘ฆ < 0 : ๐‘ฅ tends to decrease when ๐‘ฆ increases

๐ถ๐‘œ๐‘ฃ๐‘ฅ๐‘ฆ = 0 : no linear correlation between ๐‘ฅ and ๐‘ฆ

Page 33: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Pearsonโ€™s Product Moment Correlation

33

๐œŒ๐‘‹๐‘Œ =๐ถ๐‘œ๐‘ฃ(๐‘‹, ๐‘Œ)

๐œŽ๐‘‹๐œŽ๐‘Œ=

๐ธ ๐‘‹ โˆ’ ๐œ‡๐‘‹ ๐‘Œ โˆ’ ๐œ‡๐‘Œ๐œŽ๐‘‹๐œŽ๐‘Œ

Defined only if both ๐œŽ๐‘‹ and ๐œŽ๐‘Œ are finite and nonzero.

๐œŒ๐‘‹๐‘Œ shows the degree of linear dependence between ๐‘‹ and ๐‘Œ.

โˆ’1 โ‰ค ๐œŒ๐‘‹๐‘Œโ‰ค 1

๐ธ ๐‘‹ โˆ’ ๐œ‡๐‘‹ ๐‘Œ โˆ’ ๐œ‡๐‘Œ2 โ‰ค ๐ธ ๐‘‹ โˆ’ ๐œ‡๐‘‹

2 ๐ธ ๐‘Œ โˆ’ ๐œ‡๐‘Œ2 according to Cauchy-

Schwarz inequality (๐ถ๐‘‹๐‘Œ โ‰ค ๐œŽ๐‘‹๐œŽ๐‘Œ)

๐œŒ๐‘‹๐‘Œ = 1 shows a perfect positive linear relationship and ๐œŒ๐‘‹๐‘Œ = โˆ’1

shows a perfect negative linear relationship

Page 34: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Pearsonโ€™s Correlation: Examples

34

[Wikipedia]

๐œŒ๐‘‹๐‘Œ =๐ถ๐‘œ๐‘ฃ(๐‘‹, ๐‘Œ)

๐œŽ๐‘‹๐œŽ๐‘Œ

Page 35: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Orthogonal, Uncorrelated & Independent RVs

Orthogonal random variables (๐ธ ๐‘ฅ๐‘ฆ = 0)

๐ถ๐‘Ÿ๐‘Ÿ ๐‘ฅ, ๐‘ฆ = 0

Uncorrelated random variables (๐ธ (๐‘ฅ โˆ’ ๐ธ[๐‘ฅ])(๐‘ฆ โˆ’ ๐ธ[๐‘ฆ])= 0)

๐ถ๐‘œ๐‘ฃ ๐‘ฅ, ๐‘ฆ = 0

Independent random variables โ‡’ ๐ถ๐‘œ๐‘ฃ ๐‘ฅ, ๐‘ฆ = 0

๐ถ๐‘œ๐‘ฃ ๐‘ฅ, ๐‘ฆ = 0 โ‡ Independent random variables

35

Page 36: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Covariance Matrix

If ๐’™ is a vector of random variables (๐‘‘ -dim random

vector):

Covariance matrix indicates the tendency of each pair of RVs

to vary together

36

๐œฎ =๐ธ((๐‘ฅ1 โˆ’ ๐œ‡1)(๐‘ฅ1 โˆ’ ๐œ‡1)) โ‹ฏ ๐ธ((๐‘ฅ1 โˆ’ ๐œ‡1)(๐‘ฅ๐‘‘ โˆ’ ๐œ‡๐‘‘))

โ‹ฎ โ‹ฑ โ‹ฎ๐ธ((๐‘ฅ๐‘‘ โˆ’ ๐œ‡๐‘‘)(๐‘ฅ1 โˆ’ ๐œ‡1)) โ‹ฏ ๐ธ((๐‘ฅ๐‘‘ โˆ’ ๐œ‡๐‘‘)(๐‘ฅ๐‘‘ โˆ’ ๐œ‡๐‘‘))

๐œฎ = ๐ธ[ ๐’™ โˆ’ ๐๐’™ ๐’™ โˆ’ ๐๐’™๐‘‡]

๐๐’™ =

๐œ‡1โ‹ฎ๐œ‡๐‘‘

=๐ธ(๐‘ฅ1)

โ‹ฎ๐ธ(๐‘ฅ๐‘‘)

Page 37: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Covariance Matrix: Two Variables

37

ฮฃ = ๐ถ =๐œŽ12 ๐œŽ12

๐œŽ21 ๐œŽ22

๐œฎ =1 00 1 ๐œฎ =

1 0.90.9 1

๐‘‹

๐‘Œ

๐‘‹

๐‘Œ

๐œŽ21 = ๐œŽ12 = ๐ถ๐‘œ๐‘ฃ(๐‘‹, ๐‘Œ)

Page 38: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Covariance Matrix

38

๐œŽ๐‘–๐‘— shows the covariance of ๐‘‹๐‘– and ๐‘‹๐‘—:

๐œŽ๐‘–๐‘— = ๐œŽ๐‘—๐‘– = ๐ถ๐‘œ๐‘ฃ(๐‘‹๐‘– , ๐‘‹๐‘—)

๐œฎ =

๐œŽ12 ๐œŽ12

๐œŽ21 ๐œŽ22

โ‹ฏ ๐œŽ1๐‘‘โ‹ฏ ๐œŽ2๐‘‘

โ‹ฎ โ‹ฎ๐œŽ๐‘‘1 ๐œŽ๐‘‘2

โ‹ฑ โ‹ฎโ‹ฏ ๐œŽ๐‘‘

2

Page 39: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Sums of Random Variables

๐‘ = ๐‘‹ + ๐‘Œ

Mean: ๐ธ[๐‘ง] = ๐ธ[๐‘ฅ] + ๐ธ[๐‘ฆ]

Variance:๐‘‰๐‘Ž๐‘Ÿ(๐‘) = ๐‘‰๐‘Ž๐‘Ÿ(๐‘‹) + ๐‘‰๐‘Ž๐‘Ÿ(๐‘Œ) + 2๐ถ๐‘œ๐‘ฃ(๐‘‹, ๐‘Œ)

If ๐‘‹,๐‘Œ independent:๐‘‰๐‘Ž๐‘Ÿ(๐‘) = ๐‘‰๐‘Ž๐‘Ÿ(๐‘‹) + ๐‘‰๐‘Ž๐‘Ÿ(๐‘Œ)

Distribution:

,( ) ( , )

( ) ( ) ( ) ( )

Z X Y

X Y X Y

p z p x z x dx

p x p y p x p z x dx

39

independence

Page 40: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Some Famous Probability Density Functions

Uniform

Gaussian (Normal)

1

p x U a U bb a

2

22.1

,. 2

x

p x e N

๐‘ฅ

40

๐‘ฅ~๐‘ˆ(๐‘Ž, ๐‘)๐‘(๐‘ฅ)

๐‘๐‘Ž

1

๐‘ โˆ’ ๐‘Ž

๐‘ฅ~๐‘(๐œ‡, ๐œŽ2)

๐‘ฅ

๐‘(๐‘ฅ)

1

2๐œ‹๐œŽ

๐œ‡

Page 41: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Gaussian (Normal) Distribution

41

68% within ๐œ‡ โˆ’ ๐œŽ,๐œ‡ + ๐œŽ

95% within ๐œ‡ โˆ’ 2๐œŽ,๐œ‡ + 2๐œŽ

It is widely used to model the distribution of continuousvariables

Standard Normal distribution: ๐œ‡ = 0, ๐œŽ = 1

Page 42: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Some Famous Probability Density Functions

42

Exponential

0

0 0

x

x e xp x e U x

x

๐‘ฅ

๐‘(๐‘ฅ)

0

Page 43: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Some Famous Probability Mass Functions

Bernoulli: ๐‘ฅ โˆˆ 0,1

๐‘ ๐‘ฅ = 1 โˆ’ ๐œ‡ 1โˆ’๐‘ฅ๐œ‡๐‘ฅ

Binomial

1n k k

nP X k p p

k

43

๐‘ฅ~๐ต๐‘–๐‘›๐‘œ๐‘š๐‘–๐‘Ž๐‘™(๐‘, ๐‘›)

๐‘˜๐‘›๐‘

๐‘ƒ(๐‘ฅ = ๐‘˜)

๐‘ƒ(๐‘ฅ)

0 1

๐œ‡

Page 44: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Multivariate Gaussian Distribution

๐’™ is a vector of ๐‘‘ Gaussian variables

๐‘ ๐’™ ~๐‘ ๐, ๐œฎ =1

2๐œ‹ ๐‘‘/2 ๐œฎ 1/2๐‘’โˆ’

12

๐’™โˆ’๐ ๐‘‡๐œฎโˆ’1 ๐’™โˆ’๐

44

๐ =

๐œ‡1โ‹ฎ๐œ‡๐‘‘

=๐ธ(๐‘ฅ1)

โ‹ฎ๐ธ(๐‘ฅ๐‘‘)

๐œฎ = ๐ธ ๐’™ โˆ’ ๐ ๐’™ โˆ’ ๐ ๐‘‡

Page 45: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Multivariate Gaussian Distribution

45

The covariance matrix is always symmetric and positive semi-

definite

Multivariate Gaussian is completely specified by ๐‘‘ + ๐‘‘(๐‘‘ + 1)/2 parameters

Special cases of :

= ๐œŽ2๐‘ฐ : Independent random variables with the same variance

(circularly symmetric Gaussian)

Digonal matrix =๐œŽ12 โ€ฆ ๐ŸŽโ‹ฎ โ‹ฑ โ‹ฎ๐ŸŽ โ‹ฏ ๐œŽ๐‘‘

2: Independent random variables with

different variances

Page 46: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Multivariate Gaussian Distribution

Level Surfaces

46

The Gaussian distribution will be constant on surfaces in

๐’™-space for which:๐’™ โˆ’ ๐ ๐‘‡๐œฎโˆ’๐Ÿ ๐’™ โˆ’ ๐ = ๐ถ

Principal axes of the hyper-ellipsoids are the eigenvectors of .

Bivariate Gaussian: Curves of constant density are

ellipses.

Hyper-ellipsoid

Page 47: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Bivariate Gaussian distribution

๐œ†1 and ๐œ†2 are the eigenvalues of (๐œ†1 โ‰ฅ ๐œ†2) and ๐’—1 and

๐’—2 are the corresponding eigenvectors

47

๐’—1๐’—2

๐‘™1๐‘™2

๐‘™1๐‘™2

=๐œ†1

๐œ†2

Page 48: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Gaussian Distribution Properties

Some attracting properties of the Gaussian distribution:

Marginal and conditional distributions of a Gaussian are also Gaussian

After a linear transformation, a Gaussian distribution is again Gaussian

There exists a linear transformation that diagonalizes the covariance matrix

(whitening transform).

It converts the multivariate normal distribution into a spherical one.

Gaussian is a distribution that maximizes the entropy

Gaussian is stable and infinitely divisible

Central Limit Theorem

Some distributions can be estimated by Gaussian distribution when their

parameter value is sufficiently large (e.g., Binomial)

48

Page 49: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Central Limit Theorem

(under mild conditions)

Suppose i.i.d. (Independent Identically Distributed) RVs ๐‘‹๐‘– (๐‘–= 1,โ€ฆ , ๐‘) with finite variances

Let ๐‘†๐‘ = ๐‘–=1๐‘ ๐‘‹๐‘– be the sum of these RVs

Distribution of ๐‘†๐‘ converges to a normal distribution as ๐‘increases, regardless to the distribution of the RVs.

Example:

49

๐‘‹๐‘–~ uniform, ๐‘– = 1,โ€ฆ , ๐‘

๐‘†๐‘ =1

๐‘

๐‘–=1

๐‘

๐‘‹๐‘–

๐‘†1 ๐‘†2 ๐‘†10

Page 50: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Linear Algebra: Basic Definitions

Matrix ๐‘จ:

Matrix Transpose

Symetric matrix ๐‘จ = ๐‘จ๐‘‡

Vector ๐’‚

11 12 1

21 22 2

1 2

...

...[ ]

... ... ... ...

...

n

n

ij m n

m m mn

a a a

a a aa

a a a

A

1 ,1T

ij jib a i n j m B A

1

1... [ ,..., ]T

n

n

a

a a

a

a a

50

Page 51: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Linear Mapping

Linear function

๐‘“(๐’™ + ๐’š) = ๐‘“(๐’™) + ๐‘“(๐’š) โˆ€๐’™, ๐’š โˆˆ ๐‘‰

๐‘“(๐‘Ž๐’™) = ๐‘Ž๐‘“(๐’™) โˆ€๐’™ โˆˆ ๐‘‰, ๐‘Ž โˆˆ ๐น

A linear function: ๐‘“ ๐’™ = ๐‘ค1๐‘ฅ1 +โ‹ฏ+๐‘ค๐‘‘๐‘ฅ๐‘‘ = ๐’˜๐‘‡ ๐’™

In general, a matrix ๐–mร—d = ๐’˜1 โ‹ฏ ๐’˜๐‘š๐‘‡ can be used to

denote a map ๐‘“:โ„๐‘‘ โ†’ โ„๐‘š where ๐‘“๐‘– ๐’™ = ๐‘ค11๐‘ฅ1 +โ‹ฏ+๐‘ค๐‘‘1๐‘ฅ๐‘‘ = ๐’˜๐‘–

๐‘‡๐’™

51

Page 52: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Inner (dot) product

Matrix multiplication

. .

[ ] [ ]

[ ] ,

ij m p ij p n

T

ij m n ij i j

a b

c c A B

A B

AB = C

Linear Algebra: Basic Definitions

1

,n

T

i i

i

a b

a b a b

52

i-th row of A j-th column of B

Page 53: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Inner Product

Inner (dot) product

Length (Euclidean norm) of a vector

๐’‚ is normalized iff ๐’‚ 2 = 1

Angle between vectors

Orthogonal vectors ๐’‚ and ๐’ƒ:

Orthonormal set of vectors ๐’‚1, ๐’‚2, โ€ฆ , ๐’‚๐‘›:

โˆ€๐‘–, ๐‘— ๐’‚๐‘–๐‘‡๐’‚๐‘— =

1 ๐‘– = ๐‘—

0 ๐‘œ.๐‘ค.53

๐’‚ 2 = ๐’‚๐‘‡๐’‚ =

๐‘–=1

๐‘‘

๐‘Ž๐‘–2

๐’‚๐‘‡๐’ƒ =

๐‘–=1

๐‘‘

๐‘Ž๐‘–๐‘๐‘–

๐’‚๐‘‡๐’ƒ = 0

๐‘๐‘œ๐‘ ๐œƒ =๐’‚๐‘‡๐’ƒ

๐’‚ 2 ๐’ƒ 2

Page 54: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Linear Independence

A set of vectors is linearly independent if no vector is

a linear combination of other vectors.

๐‘1๐’—1 + ๐‘2๐’—2+ . . . + ๐‘๐‘˜๐’—๐‘˜ = 0 โ‡’

๐‘1 = ๐‘2 = . . . = ๐‘๐‘˜ = 0

54

Page 55: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Matrix Determinant and Trace

Determinant

๐‘‘๐‘’๐‘ก(๐‘จ๐‘ฉ) = ๐‘‘๐‘’๐‘ก(๐‘จ) ร— ๐‘‘๐‘’๐‘ก(๐‘ฉ)

Trace

1

[ ]n

jj

j

tr A a

1

det( ) ; 1,.... ;

( 1) det( )

n

ij ij

j

i j

ij ij

A a A i n

A M

55

Page 56: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Matrix Inversion

Inverse of ๐ด๐‘›ร—๐‘›:

๐ดโˆ’1 exists iff det(๐ด) โ‰  0 (๐ด is nonsingular)

Singular:det(๐ด) = 0

ill-conditioned:๐ด is nonsingular but close to being singular

Pseudo-inverse for a non square matrix ๐ด# = ๐ด๐‘‡๐ด โˆ’1๐ด๐‘‡

๐ด๐‘‡๐ด is not singular

๐ด#๐ด = ๐ผ

1 nAB BA I B A

56

Page 57: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Matrix Rank

๐‘Ÿ๐‘Ž๐‘›๐‘˜(๐‘จ) : maximum number of linearly independent

columns or rows of A.

๐‘จ๐‘šร—๐‘›: ๐‘Ÿ๐‘Ž๐‘›๐‘˜(๐‘จ) โ‰ค min(๐‘š, ๐‘›)

Full rank ๐‘จ๐‘›ร—๐‘› : ๐‘Ÿ๐‘Ž๐‘›๐‘˜(๐‘จ) = ๐‘› iff ๐‘จ is nonsingular

(det(๐‘จ) โ‰  0)

57

Page 58: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Eigenvectors and Eigenvalues

det( ) 0n A I

1

( )n

j

j

tr

A

1

det( )n

j

j

A A

๐ด๐’— = ๐œ†๐’—

Characteristic equation:

n-th order polynomial, with n roots

58

Page 59: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Eigenvector: Example

59

๐‘จ =2 11 2

[wikipedia]

Page 60: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Eigenvectors and Eigenvalues:

Symmetric Matrix

For a symmetric matrix, the eigenvectors corresponding

to distinct eigenvalues are orthogonal

These eigenvectors can be used to form an orthonormal

set (โˆ€๐‘– โ‰  ๐‘— ๐’—๐‘–๐‘‡๐’—๐‘— = 0 and ๐’—๐‘– = 1)

60

Page 61: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Eigen Decomposition: Symmetric Matrix

๐‘ฝ = ๐’—1 โ€ฆ ๐’—๐‘ , ๐œฆ =๐œ†1 โ€ฆ ๐ŸŽโ‹ฎ โ‹ฑ โ‹ฎ๐ŸŽ โ€ฆ ๐œ†๐‘

๐‘จ๐‘ฝ = ๐‘ฝ๐œฆ โ‡’ ๐‘จ๐‘ฝ๐‘ฝ๐‘‡ = ๐‘ฝ๐œฆ๐‘ฝ๐‘‡๐‘ฝ๐‘ฝ๐‘‡=๐‘ฐ

๐‘จ = ๐‘ฝ๐œฆ๐‘ฝ๐‘‡

Eigen decomposition of a symmetric matrix: ๐‘จ = ๐‘ฝ๐œฆ๐‘ฝ๐‘‡

61

Page 62: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Positive Definite Matrix

Symmetric ๐ด๐‘›ร—๐‘› is positive definite iff:

โˆ€๐’™ โˆˆ โ„๐‘› โ‡’ ๐’™๐‘‡๐ด๐’™ > 0

Eigen values of a positive define matrix are positive:

โˆ€๐‘–, ๐œ†๐‘– > 0

62

Page 63: Sharif University of Technologyce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdfSharif University of Technology M. Soleymani Review (Probability & Linear Algebra)

Vector Derivatives

๐œ•๐’™๐‘‡๐‘จ๐’™

๐œ•๐’™= (๐‘จ + ๐‘จ๐‘‡)๐’™

๐œ•๐’ƒ๐‘‡๐’™

๐œ•๐’™= ๐’ƒ

You can see more on the vector derivatives in the

uploaded review materials

63


Top Related