
Introducing Probability and Statistics

Dr. Andrew Baczkowski

Notes originally by Dr. Robert G Aykroyd

Edited by Dr. Stuart Barber

University of Leeds, Department of Statistics

27th September 2018

Overview

Course Structure

Today

Thursday: AM: 09:30-11:00 (LT22); PM: 13:30-16:00 (LT22)

Tomorrow

Friday: AM: 10:00-12:00, R Training, Chemical and Process Engineering Cluster G.06

Course Background

Motivation & Explanation

Reminder of basics in probability and statistics.

Notes taken from UG Level 2 module Mathematical Statistics.

Emphasises the theoretical basis, not data analysis.

Fast paced; do not expect to understand everything.

Stop me and ask if you have questions.

Will not cover everything in the notes.

Can be a resource for future studies.

Course Contents

1. Basic probability.
2. Conditional probability.
3. Standard distributions.
4. Linear regression.
5. Classical estimation.
6. The normal distribution.
7. Derived distributions.
8. Bayesian estimation.
Summary.

1. Basic Probability

1.1 Introduction

What is probability?

Probability measures the likelihood, or chance, of some event occurring.

Probability 0 means that the event is impossible.

Probability 1 means that the event is certain.

1.2 Events and axioms

What are the rules?

Let A and B represent events, with Ω the sample space.

The (Kolmogorov) axioms:

K1 Pr(A) ≥ 0 for any event A,
K2 Pr(Ω) = 1 for any sample space Ω,
K3 Pr(A ∪ B) = Pr(A) + Pr(B) for any mutually exclusive events A and B (that is, when A ∩ B = ∅).

General addition rule

Clearly, these are very basic properties, but they are sufficient to allow many complex rules to be derived, such as:

Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B).
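A small Python sketch (not from the original notes) can verify the general addition rule by enumerating a fair die roll, with A the event "even" and B the event "greater than 3":

```python
# Check Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B) on a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}          # sample space
A = {2, 4, 6}                       # event: roll is even
B = {4, 5, 6}                       # event: roll is greater than 3

def pr(event):
    return len(event) / len(omega)  # equally likely outcomes

lhs = pr(A | B)                     # Pr(A ∪ B) = 4/6
rhs = pr(A) + pr(B) - pr(A & B)     # 3/6 + 3/6 − 2/6 = 4/6
print(lhs, rhs)                     # both 0.666...
```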

1.3 Random variables

Definitions

Whenever the outcome of a random experiment is a number, the experiment can be described using a random variable, denoted X, Y, Z for example.

Discrete random variables have finite, or countably infinite, range spaces.

Continuous random variables have uncountably infinite range spaces.

Probability functions

Probability mass function, p(x), if discrete.

Probability density function, f (x), if continuous.

1.4 Expectation and variance

Expectation

E[X] = ∑_x x p(x) for discrete X, or ∫ x f(x) dx for continuous X.

Expectation of the square

E[X²] = ∑_x x² p(x) for discrete X, or ∫ x² f(x) dx for continuous X.

Variance

Var(X) = E[(X − µ)²] = E[X²] − E[X]², where µ = E[X].
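As an illustrative sketch (a fair six-sided die, not an example from the notes), the discrete formulas can be evaluated directly in Python:

```python
# E[X], E[X²] and Var(X) for a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
p = 1 / 6                                # uniform pmf

ex = sum(x * p for x in outcomes)        # E[X] = 3.5
ex2 = sum(x**2 * p for x in outcomes)    # E[X²] = 91/6 ≈ 15.167
var = ex2 - ex**2                        # Var(X) = E[X²] − E[X]² ≈ 2.917
print(ex, ex2, var)
```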

2. Conditional Probability

2.1 Definitions

Joint probability functions

Probability mass function, p(x, y), if discrete.

Probability density function, f(x, y), if continuous.

Marginal probability functions

Probability mass functions, if discrete,

pX(x) = ∑_y p(x, y) or pY(y) = ∑_x p(x, y).

Probability density functions, if continuous,

fX(x) = ∫ f(x, y) dy or fY(y) = ∫ f(x, y) dx.

Conditional probability functions

Probability mass functions, if discrete,

pX|Y(x|y) = p(x, y) / pY(y) or pY|X(y|x) = p(x, y) / pX(x).

Probability density functions, if continuous,

fX|Y(x|y) = f(x, y) / fY(y) or fY|X(y|x) = f(x, y) / fX(x).
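To make the discrete definitions concrete, here is a minimal Python sketch using a made-up joint pmf table (the numbers are purely illustrative, and NumPy is assumed to be available):

```python
import numpy as np

# Illustrative joint pmf p(x, y): X in {0, 1} (rows), Y in {0, 1, 2} (columns).
p = np.array([[0.10, 0.20, 0.10],
              [0.20, 0.25, 0.15]])
assert np.isclose(p.sum(), 1.0)          # a valid pmf sums to 1

p_X = p.sum(axis=1)                      # marginal pX(x): sum over y
p_Y = p.sum(axis=0)                      # marginal pY(y): sum over x

# Conditional pmf pX|Y(x|y) = p(x, y) / pY(y); one column per value of y.
p_X_given_Y = p / p_Y
print(p_X, p_Y)
print(p_X_given_Y.sum(axis=0))           # each column sums to 1
```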

2.2 Independent random variables

Definition

Two random variables X and Y are independent if and only if

p(x, y) = pX(x) pY(y) for all x, y, when X and Y are discrete;

f(x, y) = fX(x) fY(y) for all x, y, when X and Y are continuous.
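In the discrete case this can be checked numerically: the joint table factorises exactly when it equals the outer product of its marginals. A sketch with a made-up product-form table, assuming NumPy:

```python
import numpy as np

# X and Y are independent iff p(x, y) = pX(x) pY(y) for every cell.
p = np.array([[0.08, 0.12, 0.20],          # illustrative table of product form
              [0.12, 0.18, 0.30]])
p_X = p.sum(axis=1)                        # (0.4, 0.6)
p_Y = p.sum(axis=0)                        # (0.2, 0.3, 0.5)
print(np.allclose(p, np.outer(p_X, p_Y)))  # True: the joint factorises
```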

2.3 Expectation and correlation

Expectation

E[h(X,Y)] = ∑_x ∑_y h(x, y) p(x, y) for discrete X, Y, or ∫_x ∫_y h(x, y) f(x, y) dy dx for continuous X, Y.

Correlation

Corr(X,Y) = Cov(X,Y) / √(Var(X) Var(Y)),

where Cov(X,Y) = E[(X − µX)(Y − µY)] is the covariance of X and Y, and µX = E[X] and µY = E[Y].

Interpretation

If X and Y are independent, then

the covariance is zero and hence the correlation is zero, and

the variables are said to be uncorrelated.

Warning

Note, however, that in general uncorrelated does not mean the variables are independent.
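A classic counterexample (not covered in the notes) is X symmetric about zero with Y = X²: Y is completely determined by X, yet the covariance vanishes. A quick Monte Carlo sketch, assuming NumPy:

```python
import numpy as np

# X standard normal, Y = X²: Cov(X, Y) = E[X³] − E[X] E[X²] = 0,
# so Corr(X, Y) = 0 even though Y is a function of X.
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = x**2

cov = np.mean((x - x.mean()) * (y - y.mean()))
corr = cov / np.sqrt(x.var() * y.var())
print(corr)    # close to 0, despite X and Y being dependent
```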

2.4 Total probability and Bayes Theorem

Definition

Let the events B1, B2, . . . , Bk partition the sample space.

For B1, B2, . . . , Bk to be a partition of the sample space Ω, they must be

mutually exclusive, that is Bi ∩ Bj = ∅ (for i ≠ j), and

exhaustive, that is B1 ∪ B2 ∪ · · · ∪ Bk = Ω.

Total probability

Pr(A) = ∑_{j=1}^{k} Pr(A|Bj) Pr(Bj).

Introducing Probability and Statistics

2. Conditional Probability

2.4 Total probability and Bayes Theorem

Bayes theorem

Simple version

Pr(B|A) = Pr(A|B) Pr(B) / Pr(A), when Pr(A) > 0.

General version

Pr(Bi|A) = Pr(A|Bi) Pr(Bi) / ∑_{j=1}^{k} Pr(A|Bj) Pr(Bj), for i = 1, . . . , k.
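As a worked sketch (the numbers are purely illustrative, not from the notes), take the partition B1 = "has a condition" with prior 0.01 and B2 its complement, and let A = "test positive" with Pr(A|B1) = 0.95 and Pr(A|B2) = 0.05:

```python
# Total probability and Bayes theorem over a partition {B1, B2}.
prior = [0.01, 0.99]                 # Pr(B1), Pr(B2)
likelihood = [0.95, 0.05]            # Pr(A|B1), Pr(A|B2)

pr_A = sum(l * b for l, b in zip(likelihood, prior))           # total probability
posterior = [l * b / pr_A for l, b in zip(likelihood, prior)]  # Bayes theorem
print(pr_A)        # 0.059
print(posterior)   # Pr(B1|A) ≈ 0.161: the condition remains fairly unlikely
```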

3. Standard distributions

3.1 Example distributions

Binomial distribution, B(n, π)

The binomial distribution can be defined as the number of successes in n independent Bernoulli trials with two possible outcomes (success and failure) with probabilities π and 1 − π.

p(x) = (n choose x) π^x (1 − π)^(n−x), for x = 0, 1, ..., n (0 < π < 1).

E[X] = nπ, Var(X) = nπ(1 − π).
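These quantities are available directly in Python via scipy.stats (assumed available); a minimal check with n = 10 and π = 0.3:

```python
from scipy.stats import binom

n, pi = 10, 0.3
print(binom.pmf(3, n, pi))       # Pr(X = 3) ≈ 0.2668
mean, var = binom.stats(n, pi)   # returns mean and variance by default
print(mean, var)                 # 3.0 = nπ and 2.1 = nπ(1 − π)
```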

Poisson distribution, Po(λ)

The Poisson distribution is often used as a model for the number of occurrences of rare events in time or space.

p(x) = e^(−λ) λ^x / x!, for x = 0, 1, ... (λ > 0).

E[X] = λ, Var(X) = λ.
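A matching sketch with scipy.stats, again assuming the library is available:

```python
from scipy.stats import poisson

lam = 2.5
print(poisson.pmf(0, lam))                  # Pr(X = 0) = e^(−2.5) ≈ 0.0821
print(poisson.mean(lam), poisson.var(lam))  # both equal λ = 2.5
```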

Exponential distribution, exp(λ)

The exponential distribution is often used to describe the time between events which occur at random, or to model “lifetimes”. It possesses the so-called “memoryless” property.

f(x) = λ e^(−λx), for x ≥ 0 (λ > 0).

E[X] = 1/λ, Var(X) = 1/λ².
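The memoryless property, Pr(X > s + t | X > s) = Pr(X > t), can be checked numerically with scipy.stats (assumed available); note that SciPy parameterises the exponential by scale = 1/λ:

```python
from scipy.stats import expon

lam = 0.5
X = expon(scale=1 / lam)       # exp(λ) with scale = 1/λ

s, t = 2.0, 3.0
lhs = X.sf(s + t) / X.sf(s)    # sf(x) is the survival function Pr(X > x)
rhs = X.sf(t)
print(lhs, rhs)                # both e^(−λt) ≈ 0.2231
```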

Normal distribution, N(µ, σ²)

The normal (or Gaussian) distribution is the most widely used. It is convenient to use, often fits data well and can be theoretically justified (via the central limit theorem).

f(x) = (1 / √(2πσ²)) exp(−(x − µ)² / (2σ²)), for −∞ < x < ∞.

E[X] = µ, Var(X) = σ².
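The central limit theorem justification can be illustrated by simulation (a sketch assuming NumPy): means of n uniform(0, 1) draws are approximately N(1/2, 1/(12n)).

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 30, 100_000
means = rng.uniform(size=(reps, n)).mean(axis=1)   # reps sample means

print(means.mean())                        # ≈ 0.5, the population mean
print(means.std(), np.sqrt(1 / (12 * n)))  # both ≈ 0.0527, i.e. σ/√n
```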

3.2 MGFs

Definition

The moment generating function (mgf) of a random variable X is defined as

MX(t) = E[e^(tX)] = ∑_x e^(tx) pX(x) if X is discrete, or ∫ e^(tx) fX(x) dx if X is continuous.

Properties

The mgf is unique to a probability distribution.

By considering the (Taylor) power series expansion

MX(t) = ∑_{r=0}^{∞} (t^r / r!) E[X^r],

we see that E[X^r] is the coefficient of t^r/r!.

Moments can easily be found by differentiation

E[X^r] = (d^r/dt^r) MX(t) |_{t=0},

i.e. E[X^r] is the rth derivative of MX(t) evaluated at t = 0.
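As a sketch of this recipe using symbolic differentiation (SymPy assumed available), take X ~ Po(λ), whose mgf is MX(t) = exp(λ(e^t − 1)):

```python
import sympy as sp

t, lam = sp.symbols('t lam', positive=True)
M = sp.exp(lam * (sp.exp(t) - 1))     # mgf of Po(lam)

EX = sp.diff(M, t).subs(t, 0)         # first derivative at t = 0: E[X] = λ
EX2 = sp.diff(M, t, 2).subs(t, 0)     # second derivative at t = 0: E[X²] = λ + λ²
print(sp.simplify(EX), sp.simplify(EX2 - EX**2))   # mean λ and variance λ
```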

Properties (cont.)

If X has mgf MX(t) and Y = aX + b, where a and b are constants, then the mgf of Y is

MY(t) = e^(bt) MX(at).

If X and Y are independent random variables with mgfs MX(t) and MY(t) respectively, then Z = X + Y has mgf

MZ(t) = MX(t) MY(t).
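For instance (a SymPy sketch, not from the notes), multiplying the mgfs of independent Po(λ1) and Po(λ2) variables gives the mgf of Po(λ1 + λ2), so the sum of independent Poissons is Poisson:

```python
import sympy as sp

t, l1, l2 = sp.symbols('t lam1 lam2', positive=True)

M_X = sp.exp(l1 * (sp.exp(t) - 1))           # mgf of Po(lam1)
M_Y = sp.exp(l2 * (sp.exp(t) - 1))           # mgf of Po(lam2)
M_sum = sp.exp((l1 + l2) * (sp.exp(t) - 1))  # mgf of Po(lam1 + lam2)

print(sp.simplify(M_X * M_Y - M_sum))        # 0: M_Z(t) = M_X(t) M_Y(t)
```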

3.3 Sampling

The sampling distribution

Suppose we have a random sample x1, . . . , xn.

Summarise using the sample mean and variance, x̄ and s².

Repeated sampling leads to different values; this is due to sampling variation.

The distribution of the summary statistic is the sampling distribution.
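Sampling variation is easy to see by simulation (a NumPy sketch with an illustrative exponential population of mean 2):

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 25, 10_000
samples = rng.exponential(scale=2.0, size=(reps, n))   # reps samples of size n

xbars = samples.mean(axis=1)     # one sample mean per repetition
print(xbars.mean())              # close to the population mean, 2.0
print(xbars.std())               # close to σ/√n = 2/5 = 0.4
```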

Summary

We have learnt about:

Definitions of events and the axioms.

Basic probability rules.

Random variables.

Joint and conditional probability.

Expectation, variance and correlation.

Total probability and Bayes Theorem.

A few simple distributions.

Generating functions.

Sampling distributions.

End of Lecture 1.