
Introducing Probability and Statistics

Dr. Andrew Baczkowski

Notes originally by Dr. Robert G Aykroyd

Edited by Dr. Stuart Barber

University of Leeds, Department of Statistics

27th September 2018

Overview

Course Structure

Today

Thursday: AM: 09:30-11:00 (LT22); PM: 13:30-16:00 (LT22)

Tomorrow

Friday: AM: 10:00-12:00, R Training, Chemical and Process Engineering Cluster G.06

Course Background

Motivation & Explanation

Reminder of basics in probability and statistics.

Notes taken from UG Level 2 module Mathematical Statistics.

Emphasises the theoretical basis, not data analysis.

Fast paced; do not expect to understand everything.

Stop me and ask if you have questions.

Will not cover everything in the notes.

Can be a resource for future studies.

Course Contents

1. Basic probability.
2. Conditional probability.
3. Standard distributions.
4. Linear regression.
5. Classical estimation.
6. The normal distribution.
7. Derived distributions.
8. Bayesian estimation.
Summary.

1. Basic Probability

1.1 Introduction

What is probability?

Probability measures the likelihood, or chance, of some event occurring.

Probability 0 means that the event is impossible.

Probability 1 means that the event is certain.

1.2 Events and axioms

What are the rules?

Let A and B represent events, with Ω the sample space.

The (Kolmogorov) axioms:

K1 Pr(A) ≥ 0 for any event A,
K2 Pr(Ω) = 1 for any sample space Ω,
K3 Pr(A ∪ B) = Pr(A) + Pr(B) for any mutually exclusive events A and B (that is, when A ∩ B = ∅).

General addition rule

Clearly, these are very basic properties, but they are sufficient to allow many complex rules to be derived, such as:

Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B).
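A small Python sketch (not from the original notes) can verify the general addition rule by enumerating a fair die roll, with A the event "even" and B the event "greater than 3":

```python
# Check Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B) on a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}          # sample space
A = {2, 4, 6}                       # event: roll is even
B = {4, 5, 6}                       # event: roll is greater than 3

def pr(event):
    return len(event) / len(omega)  # equally likely outcomes

lhs = pr(A | B)                     # Pr(A ∪ B) = 4/6
rhs = pr(A) + pr(B) - pr(A & B)     # 3/6 + 3/6 − 2/6 = 4/6
print(lhs, rhs)                     # both 0.666...
```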

1.3 Random variables

Definitions

Whenever the outcome of a random experiment is a number, the experiment can be described using a random variable, denoted X, Y, Z for example.

Discrete random variables have finite, or countably infinite, range spaces.

Continuous random variables have uncountably infinite range spaces.

Probability functions

Probability mass function, p(x), if discrete.

Probability density function, f (x), if continuous.

1.4 Expectation and variance

Expectation

E[X] = ∑_x x p(x) for discrete X, or ∫ x f(x) dx for continuous X.

Expectation of the square

E[X²] = ∑_x x² p(x) for discrete X, or ∫ x² f(x) dx for continuous X.

Variance

Var(X) = E[(X − µ)²] = E[X²] − E[X]², where µ = E[X].
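As an illustrative sketch (a fair six-sided die, not an example from the notes), the discrete formulas can be evaluated directly in Python:

```python
# E[X], E[X²] and Var(X) for a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
p = 1 / 6                                # uniform pmf

ex = sum(x * p for x in outcomes)        # E[X] = 3.5
ex2 = sum(x**2 * p for x in outcomes)    # E[X²] = 91/6 ≈ 15.167
var = ex2 - ex**2                        # Var(X) = E[X²] − E[X]² ≈ 2.917
print(ex, ex2, var)
```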

2. Conditional Probability

2.1 Definitions

Joint probability functions

Probability mass function, p(x, y), if discrete.

Probability density function, f(x, y), if continuous.

Marginal probability functions

Probability mass functions, if discrete,

pX(x) = ∑_y p(x, y) or pY(y) = ∑_x p(x, y).

Probability density functions, if continuous,

fX(x) = ∫ f(x, y) dy or fY(y) = ∫ f(x, y) dx.

Conditional probability functions

Probability mass functions, if discrete,

pX|Y(x|y) = p(x, y) / pY(y) or pY|X(y|x) = p(x, y) / pX(x).

Probability density functions, if continuous,

fX|Y(x|y) = f(x, y) / fY(y) or fY|X(y|x) = f(x, y) / fX(x).
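To make the discrete definitions concrete, here is a minimal Python sketch using a made-up joint pmf table (the numbers are purely illustrative, and NumPy is assumed to be available):

```python
import numpy as np

# Illustrative joint pmf p(x, y): X in {0, 1} (rows), Y in {0, 1, 2} (columns).
p = np.array([[0.10, 0.20, 0.10],
              [0.20, 0.25, 0.15]])
assert np.isclose(p.sum(), 1.0)          # a valid pmf sums to 1

p_X = p.sum(axis=1)                      # marginal pX(x): sum over y
p_Y = p.sum(axis=0)                      # marginal pY(y): sum over x

# Conditional pmf pX|Y(x|y) = p(x, y) / pY(y); one column per value of y.
p_X_given_Y = p / p_Y
print(p_X, p_Y)
print(p_X_given_Y.sum(axis=0))           # each column sums to 1
```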

2.2 Independent random variables

Definition

Two random variables X and Y are independent if and only if

p(x, y) = pX(x) pY(y) for all x, y, when X and Y are discrete;

f(x, y) = fX(x) fY(y) for all x, y, when X and Y are continuous.
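In the discrete case this can be checked numerically: the joint table factorises exactly when it equals the outer product of its marginals. A sketch with a made-up product-form table, assuming NumPy:

```python
import numpy as np

# X and Y are independent iff p(x, y) = pX(x) pY(y) for every cell.
p = np.array([[0.08, 0.12, 0.20],          # illustrative table of product form
              [0.12, 0.18, 0.30]])
p_X = p.sum(axis=1)                        # (0.4, 0.6)
p_Y = p.sum(axis=0)                        # (0.2, 0.3, 0.5)
print(np.allclose(p, np.outer(p_X, p_Y)))  # True: the joint factorises
```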

2.3 Expectation and correlation

Expectation

E[h(X,Y)] = ∑_x ∑_y h(x, y) p(x, y) for discrete X, Y, or ∫_x ∫_y h(x, y) f(x, y) dy dx for continuous X, Y.

Correlation

Corr(X,Y) = Cov(X,Y) / √(Var(X) Var(Y)),

where Cov(X,Y) = E[(X − µX)(Y − µY)] is the covariance of X and Y, and µX = E[X] and µY = E[Y].

Interpretation

If X and Y are independent, then

the covariance is zero and hence the correlation is zero, and

the variables are said to be uncorrelated.

Warning

Note, however, that in general uncorrelated does not mean the variables are independent.
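A classic counterexample (not covered in the notes) is X symmetric about zero with Y = X²: Y is completely determined by X, yet the covariance vanishes. A quick Monte Carlo sketch, assuming NumPy:

```python
import numpy as np

# X standard normal, Y = X²: Cov(X, Y) = E[X³] − E[X] E[X²] = 0,
# so Corr(X, Y) = 0 even though Y is a function of X.
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = x**2

cov = np.mean((x - x.mean()) * (y - y.mean()))
corr = cov / np.sqrt(x.var() * y.var())
print(corr)    # close to 0, despite X and Y being dependent
```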

2.4 Total probability and Bayes Theorem

Definition

Let the events B1, B2, . . . , Bk partition the sample space.

For B1, B2, . . . , Bk to be a partition of the sample space Ω, they must be

mutually exclusive, that is Bi ∩ Bj = ∅ (for i ≠ j), and

exhaustive, that is B1 ∪ B2 ∪ · · · ∪ Bk = Ω.

Total probability

Pr(A) = ∑_{j=1}^{k} Pr(A|Bj) Pr(Bj).

Introducing Probability and Statistics

2. Conditional Probability

2.4 Total probability and Bayes Theorem

Bayes theorem

Simple version

Pr(B|A) = Pr(A|B) Pr(B) / Pr(A), when Pr(A) > 0.

General version

Pr(Bi|A) = Pr(A|Bi) Pr(Bi) / ∑_{j=1}^{k} Pr(A|Bj) Pr(Bj), for i = 1, . . . , k.
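As a worked sketch (the numbers are purely illustrative, not from the notes), take the partition B1 = "has a condition" with prior 0.01 and B2 its complement, and let A = "test positive" with Pr(A|B1) = 0.95 and Pr(A|B2) = 0.05:

```python
# Total probability and Bayes theorem over a partition {B1, B2}.
prior = [0.01, 0.99]                 # Pr(B1), Pr(B2)
likelihood = [0.95, 0.05]            # Pr(A|B1), Pr(A|B2)

pr_A = sum(l * b for l, b in zip(likelihood, prior))           # total probability
posterior = [l * b / pr_A for l, b in zip(likelihood, prior)]  # Bayes theorem
print(pr_A)        # 0.059
print(posterior)   # Pr(B1|A) ≈ 0.161: the condition remains fairly unlikely
```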

3. Standard distributions

3.1 Example distributions

Binomial distribution, B(n, π)

The binomial distribution can be defined as the number of successes in n independent Bernoulli trials with two possible outcomes (success and failure) with probabilities π and 1 − π.

p(x) = (n choose x) π^x (1 − π)^(n−x), for x = 0, 1, ..., n (0 < π < 1).

E[X] = nπ, Var(X) = nπ(1 − π).
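These quantities are available directly in Python via scipy.stats (assumed available); a minimal check with n = 10 and π = 0.3:

```python
from scipy.stats import binom

n, pi = 10, 0.3
print(binom.pmf(3, n, pi))       # Pr(X = 3) ≈ 0.2668
mean, var = binom.stats(n, pi)   # returns mean and variance by default
print(mean, var)                 # 3.0 = nπ and 2.1 = nπ(1 − π)
```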

Poisson distribution, Po(λ)

The Poisson distribution is often used as a model for the number of occurrences of rare events in time or space.

p(x) = e^(−λ) λ^x / x!, for x = 0, 1, ... (λ > 0).

E[X] = λ, Var(X) = λ.
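A matching sketch with scipy.stats, again assuming the library is available:

```python
from scipy.stats import poisson

lam = 2.5
print(poisson.pmf(0, lam))                  # Pr(X = 0) = e^(−2.5) ≈ 0.0821
print(poisson.mean(lam), poisson.var(lam))  # both equal λ = 2.5
```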

Exponential distribution, exp(λ)

The exponential distribution is often used to describe the time between events which occur at random, or to model “lifetimes”. It possesses the so-called “memoryless” property.

f(x) = λ e^(−λx), for x ≥ 0 (λ > 0).

E[X] = 1/λ, Var(X) = 1/λ².
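The memoryless property, Pr(X > s + t | X > s) = Pr(X > t), can be checked numerically with scipy.stats (assumed available); note that SciPy parameterises the exponential by scale = 1/λ:

```python
from scipy.stats import expon

lam = 0.5
X = expon(scale=1 / lam)       # exp(λ) with scale = 1/λ

s, t = 2.0, 3.0
lhs = X.sf(s + t) / X.sf(s)    # sf(x) is the survival function Pr(X > x)
rhs = X.sf(t)
print(lhs, rhs)                # both e^(−λt) ≈ 0.2231
```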

Normal distribution, N(µ, σ²)

The normal (or Gaussian) distribution is the most widely used. It is convenient to use, often fits data well and can be theoretically justified (via the central limit theorem).

f(x) = (1 / √(2πσ²)) exp(−(x − µ)² / (2σ²)), for −∞ < x < ∞.

E[X] = µ, Var(X) = σ².
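The central limit theorem justification can be illustrated by simulation (a sketch assuming NumPy): means of n uniform(0, 1) draws are approximately N(1/2, 1/(12n)).

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 30, 100_000
means = rng.uniform(size=(reps, n)).mean(axis=1)   # reps sample means

print(means.mean())                        # ≈ 0.5, the population mean
print(means.std(), np.sqrt(1 / (12 * n)))  # both ≈ 0.0527, i.e. σ/√n
```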

3.2 MGFs

Definition

The moment generating function (mgf) of a random variable X is defined as

MX(t) = E[e^(tX)] = ∑_x e^(tx) pX(x) if X is discrete, or ∫ e^(tx) fX(x) dx if X is continuous.

Properties

The mgf is unique to a probability distribution.

By considering the (Taylor) power series expansion

MX(t) = ∑_{r=0}^{∞} (t^r / r!) E[X^r],

we see that E[X^r] is the coefficient of t^r/r!.

Moments can easily be found by differentiation

E[X^r] = (d^r/dt^r) MX(t) |_{t=0},

i.e. E[X^r] is the rth derivative of MX(t) evaluated at t = 0.
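As a sketch of this recipe using symbolic differentiation (SymPy assumed available), take X ~ Po(λ), whose mgf is MX(t) = exp(λ(e^t − 1)):

```python
import sympy as sp

t, lam = sp.symbols('t lam', positive=True)
M = sp.exp(lam * (sp.exp(t) - 1))     # mgf of Po(lam)

EX = sp.diff(M, t).subs(t, 0)         # first derivative at t = 0: E[X] = λ
EX2 = sp.diff(M, t, 2).subs(t, 0)     # second derivative at t = 0: E[X²] = λ + λ²
print(sp.simplify(EX), sp.simplify(EX2 - EX**2))   # mean λ and variance λ
```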

Properties (cont.)

If X has mgf MX(t) and Y = aX + b, where a and b are constants, then the mgf of Y is

MY(t) = e^(bt) MX(at).

If X and Y are independent random variables with mgfs MX(t) and MY(t) respectively, then Z = X + Y has mgf

MZ(t) = MX(t) MY(t).
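For instance (a SymPy sketch, not from the notes), multiplying the mgfs of independent Po(λ1) and Po(λ2) variables gives the mgf of Po(λ1 + λ2), so the sum of independent Poissons is Poisson:

```python
import sympy as sp

t, l1, l2 = sp.symbols('t lam1 lam2', positive=True)

M_X = sp.exp(l1 * (sp.exp(t) - 1))           # mgf of Po(lam1)
M_Y = sp.exp(l2 * (sp.exp(t) - 1))           # mgf of Po(lam2)
M_sum = sp.exp((l1 + l2) * (sp.exp(t) - 1))  # mgf of Po(lam1 + lam2)

print(sp.simplify(M_X * M_Y - M_sum))        # 0: M_Z(t) = M_X(t) M_Y(t)
```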

3.3 Sampling

The sampling distribution

Suppose we have a random sample x1, . . . , xn.

Summarise using the sample mean and variance, x̄ and s².

Repeated sampling leads to different values; this is due to sampling variation.

The distribution of the summary statistic is the sampling distribution.
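Sampling variation is easy to see by simulation (a NumPy sketch with an illustrative exponential population of mean 2):

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 25, 10_000
samples = rng.exponential(scale=2.0, size=(reps, n))   # reps samples of size n

xbars = samples.mean(axis=1)     # one sample mean per repetition
print(xbars.mean())              # close to the population mean, 2.0
print(xbars.std())               # close to σ/√n = 2/5 = 0.4
```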

Summary

We have learnt about:

Definitions of events and the axioms.

Basic probability rules.

Random variables.

Joint and conditional probability.

Expectation, variance and correlation.

Total probability and Bayes Theorem.

A few simple distributions.

Generating functions.

Sampling distributions.

End of Lecture 1.