TRANSCRIPT
Introducing Probability and Statistics
Dr. Andrew Baczkowski
Notes originally by Dr. Robert G Aykroyd
Edited by Dr. Stuart Barber
University of Leeds, Department of Statistics
27th September 2018
Overview
Course Structure
Today
Thursday: AM 09:30-11:00 (LT22); PM 13:30-16:00 (LT22)
Tomorrow
Friday: AM 10:00-12:00, R Training, Chemical and Process Engineering Cluster G.06
Course Background
Motivation & Explanation
Reminder of basics in probability and statistics.
Notes taken from UG Level 2 module Mathematical Statistics.
Emphasises theoretical basis, and not data analysis.
Fast paced, so do not expect to understand everything first time.
Stop me and ask if you have questions.
Will not cover all in notes.
Can be a resource for future studies.
Course Contents
1. Basic probability.
2. Conditional probability.
3. Standard distributions.
4. Linear regression.
5. Classical estimation.
6. The normal distribution.
7. Derived distributions.
8. Bayesian estimation.
Summary.
1. Basic Probability
1.1 Introduction
What is probability?
Probability measures the likelihood, or chance, of some event occurring.
Probability 0 means the event is impossible.
Probability 1 means the event is certain.
1.2 Events and axioms
What are the rules?
Let A and B represent events, with Ω the sample space.
The (Kolmogorov) axioms:
K1 Pr(A) ≥ 0 for any event A,
K2 Pr(Ω) = 1 for any sample space Ω,
K3 Pr(A ∪ B) = Pr(A) + Pr(B) for any mutually exclusive events A and B (that is, when A ∩ B = ∅).
General addition rule
Clearly, these are very basic properties, but they are sufficient to allow many complex rules to be derived, such as:
Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B).
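The general addition rule can be checked by enumeration on a small sample space. A minimal Python sketch, where the die and the events A and B are illustrative choices rather than anything from the notes:

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event: roll is even
B = {4, 5, 6}   # event: roll is greater than 3

def pr(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))

lhs = pr(A | B)
rhs = pr(A) + pr(B) - pr(A & B)
print(lhs, rhs)  # both 2/3
```

Using `Fraction` keeps the arithmetic exact, so the two sides agree identically rather than only to floating-point precision.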
1.3 Random variables
Definitions
Whenever the outcome of a random experiment is a number, the experiment can be described using a random variable, denoted X, Y, Z for example.
Discrete random variables have finite, or countably infinite,range spaces.
Continuous random variables have uncountably infinite rangespaces.
Probability functions
Probability mass function, p(x), if discrete.
Probability density function, f (x), if continuous.
1.4 Expectation and variance
Expectation
E[X] = ∑_x x p(x) for discrete X, or ∫ x f(x) dx for continuous X.
Expectation of the square
E[X²] = ∑_x x² p(x) for discrete X, or ∫ x² f(x) dx for continuous X.
Variance
Var(X) = E[(X − µ)²] = E[X²] − (E[X])², where µ = E[X].
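These definitions can be applied directly to a small pmf. A Python sketch using a fair die (an illustrative choice, not from the notes):

```python
from fractions import Fraction

# pmf of a fair six-sided die: p(x) = 1/6 for x = 1, ..., 6
p = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p[x] for x in p)            # E[X]
mean_sq = sum(x**2 * p[x] for x in p)      # E[X^2]
var = mean_sq - mean**2                    # Var(X) = E[X^2] - (E[X])^2
print(mean, var)  # 7/2 and 35/12
```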
2. Conditional Probability
2.1 Definitions
Joint probability functions
Probability mass function, p(x , y), if discrete.
Probability density function, f (x , y), if continuous.
Marginal probability functions
Probability mass functions, if discrete:
pX(x) = ∑_y p(x, y) or pY(y) = ∑_x p(x, y).
Probability density functions, if continuous:
fX(x) = ∫ f(x, y) dy or fY(y) = ∫ f(x, y) dx.
Conditional probability functions
Probability mass functions, if discrete:
pX|Y(x|y) = p(x, y)/pY(y) or pY|X(y|x) = p(x, y)/pX(x).
Probability density functions, if continuous:
fX|Y(x|y) = f(x, y)/fY(y) or fY|X(y|x) = f(x, y)/fX(x).
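The discrete conditional pmf is the joint pmf divided by the relevant marginal. A Python sketch, again with a hypothetical joint table (values are assumptions):

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y) on {0, 1} x {0, 1}.
p = {(0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
     (1, 0): Fraction(2, 8), (1, 1): Fraction(2, 8)}

def p_Y(y):
    """Marginal pmf of Y."""
    return sum(v for (x_, y_), v in p.items() if y_ == y)

def p_X_given_Y(x, y):
    """Conditional pmf pX|Y(x|y) = p(x, y) / pY(y)."""
    return p[(x, y)] / p_Y(y)

print(p_X_given_Y(0, 1))  # (3/8) / (5/8) = 3/5
```

As a consistency check, the conditional pmf sums to 1 over x for each fixed y.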
2.2 Independent random variables
Definition
Two random variables X and Y are independent if and only if
p(x, y) = pX(x) pY(y) for all x, y (discrete case), or
f(x, y) = fX(x) fY(y) for all x, y (continuous case).
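In the discrete case, independence can be tested cell by cell: the joint pmf must factorise into the product of its marginals everywhere. A Python sketch with two illustrative joint pmfs (both tables are assumptions):

```python
from fractions import Fraction

def is_independent(p):
    """Check p(x, y) == pX(x) * pY(y) for every cell of a joint pmf."""
    xs = {x for x, _ in p}
    ys = {y for _, y in p}
    pX = {x: sum(p[(x, y)] for y in ys) for x in xs}
    pY = {y: sum(p[(x, y)] for x in xs) for y in ys}
    return all(p[(x, y)] == pX[x] * pY[y] for x in xs for y in ys)

# Product joint pmf: independent by construction.
indep = {(x, y): Fraction(1, 4) for x in (0, 1) for y in (0, 1)}
# A joint pmf that does not factorise (Y always equals X).
dep = {(0, 0): Fraction(1, 2), (0, 1): Fraction(0),
       (1, 0): Fraction(0), (1, 1): Fraction(1, 2)}

print(is_independent(indep), is_independent(dep))  # True False
```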
2.3 Expectation and correlation
Expectation
E[h(X, Y)] = ∑_x ∑_y h(x, y) p(x, y) for discrete X, Y, or ∫∫ h(x, y) f(x, y) dy dx for continuous X, Y.
Correlation
Corr(X, Y) = Cov(X, Y)/√(Var(X) Var(Y)),
where Cov(X, Y) = E[(X − µX)(Y − µY)] is the covariance of X and Y, and µX = E[X] and µY = E[Y].
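All of these quantities are expectations of functions h(X, Y), so they can be computed from a joint pmf with one helper. A Python sketch using a hypothetical joint pmf in which Y always equals X (an assumption chosen so the correlation is exactly 1):

```python
import math

# Hypothetical joint pmf where Y always equals X.
p = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}

E = lambda h: sum(h(x, y) * v for (x, y), v in p.items())  # E[h(X, Y)]
mu_x, mu_y = E(lambda x, y: x), E(lambda x, y: y)
cov = E(lambda x, y: (x - mu_x) * (y - mu_y))              # Cov(X, Y)
var_x = E(lambda x, y: (x - mu_x) ** 2)
var_y = E(lambda x, y: (y - mu_y) ** 2)
corr = cov / math.sqrt(var_x * var_y)
print(corr)  # 1.0
```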
Interpretation
If X and Y are independent, then
the covariance is zero, and hence the correlation is zero, and
the variables are said to be uncorrelated.
Warning
Note, however, that in general uncorrelated does not mean the variables are independent.
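The standard counterexample makes this warning concrete: take X uniform on {−1, 0, 1} and Y = X². The covariance is zero, yet Y is a function of X. A Python sketch:

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}; Y = X^2.
p = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

E = lambda h: sum(h(x, y) * v for (x, y), v in p.items())
cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)
print(cov)  # 0: X and Y are uncorrelated

# Yet X and Y are clearly dependent: p(0, 0) != pX(0) * pY(0).
pX0 = sum(v for (x, y), v in p.items() if x == 0)   # 1/3
pY0 = sum(v for (x, y), v in p.items() if y == 0)   # 1/3
print(p[(0, 0)], pX0 * pY0)  # 1/3 versus 1/9
```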
2.4 Total probability and Bayes Theorem
Definition
Let the events B1,B2, . . . ,Bk partition the sample space.
For B1, B2, . . . , Bk to be a partition of the sample space Ω, they must be
mutually exclusive, that is Bi ∩ Bj = ∅ for i ≠ j, and
exhaustive, that is B1 ∪ B2 ∪ · · · ∪ Bk = Ω.
Total probability
Pr(A) = ∑_{j=1}^{k} Pr(A|Bj) Pr(Bj).
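A small numerical example of the total probability formula. Suppose two machines produce 60% and 40% of a factory's output, with defect rates 1% and 5% respectively; all numbers here are assumptions for illustration, not from the notes. A Python sketch:

```python
from fractions import Fraction

# Partition: item made by machine 1 or machine 2.
pr_B = [Fraction(6, 10), Fraction(4, 10)]
# Pr(defective | machine j).
pr_A_given_B = [Fraction(1, 100), Fraction(5, 100)]

# Total probability: Pr(A) = sum_j Pr(A|Bj) Pr(Bj).
pr_A = sum(pa * pb for pa, pb in zip(pr_A_given_B, pr_B))
print(pr_A)  # 13/500, i.e. 0.026
```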
Bayes theorem
Simple version
Pr(B|A) = Pr(A|B) Pr(B)/Pr(A) when Pr(A) > 0.
General version
Pr(Bi|A) = Pr(A|Bi) Pr(Bi) / ∑_{j=1}^{k} Pr(A|Bj) Pr(Bj), for i = 1, . . . , k.
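Bayes theorem is often illustrated with screening tests. In the hypothetical example below (all numbers are assumptions), a condition has 1% prevalence and the test has a 95% detection rate and a 5% false-positive rate; the denominator is computed with the total probability formula. A Python sketch:

```python
from fractions import Fraction

pr_D = Fraction(1, 100)              # prevalence Pr(D)
pr_pos_given_D = Fraction(95, 100)   # Pr(+ | D)
pr_pos_given_notD = Fraction(5, 100) # Pr(+ | not D)

# Total probability: Pr(+) = Pr(+|D)Pr(D) + Pr(+|not D)Pr(not D).
pr_pos = pr_pos_given_D * pr_D + pr_pos_given_notD * (1 - pr_D)
# Bayes theorem: Pr(D|+) = Pr(+|D)Pr(D) / Pr(+).
pr_D_given_pos = pr_pos_given_D * pr_D / pr_pos
print(pr_D_given_pos, float(pr_D_given_pos))  # 19/118, about 0.161
```

Despite the accurate-looking test, a positive result only gives about a 16% chance of having the condition, because the condition is rare.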
3. Standard distributions
3.1 Example distributions
Binomial distribution, B(n, π)
The binomial distribution can be defined as the number of successes in n independent Bernoulli trials, each with two possible outcomes (success and failure) with probabilities π and 1 − π.
p(x) = (n choose x) π^x (1 − π)^(n−x), x = 0, 1, ..., n (0 < π < 1).
E[X] = nπ, Var(X) = nπ(1 − π).
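The pmf, mean, and variance formulas can be checked exactly for a specific case. A Python sketch with the illustrative choice n = 10, π = 0.3:

```python
from fractions import Fraction
from math import comb

n, pi = 10, Fraction(3, 10)
# Binomial pmf: p(x) = C(n, x) pi^x (1 - pi)^(n - x).
pmf = {x: comb(n, x) * pi**x * (1 - pi)**(n - x) for x in range(n + 1)}

mean = sum(x * p for x, p in pmf.items())
var = sum(x**2 * p for x, p in pmf.items()) - mean**2
print(sum(pmf.values()), mean, var)  # 1, 3, 21/10
```

Exact arithmetic confirms E[X] = nπ = 3 and Var(X) = nπ(1 − π) = 21/10.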
Poisson distribution, Po(λ)
The Poisson distribution is often used as a model for the numberof occurrences of rare events in time or space.
p(x) = e^(−λ) λ^x / x!, x = 0, 1, ... (λ > 0).
E[X] = λ, Var(X) = λ.
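The Poisson support is infinite, but for small λ the tail is negligible, so the mean and variance can be verified by truncating the sum. A Python sketch with the illustrative choice λ = 2:

```python
from math import exp, factorial

lam = 2.0
pmf = lambda x: exp(-lam) * lam**x / factorial(x)

# Truncate the infinite support; the tail beyond x = 60 is negligible for lam = 2.
total = sum(pmf(x) for x in range(60))
mean = sum(x * pmf(x) for x in range(60))
var = sum(x**2 * pmf(x) for x in range(60)) - mean**2
print(round(total, 10), round(mean, 10), round(var, 10))  # ~1, ~2, ~2
```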
Exponential distribution, exp(λ)
The exponential distribution is often used to describe the time between events which occur at random, or to model “lifetimes”. It possesses the so-called “memoryless” property.
f(x) = λe^(−λx), x ≥ 0 (λ > 0).
E[X] = 1/λ, Var(X) = 1/λ².
Normal distribution, N(µ, σ2)
The normal (or Gaussian) distribution is the most widely used. It is convenient to use, often fits data well and can be theoretically justified (via the central limit theorem).
f(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)), −∞ < x < ∞.
E[X] = µ, Var(X) = σ².
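There is no closed form for normal probabilities, but the density can be checked numerically: it should integrate to 1 and have mean µ. A crude Riemann-sum sketch in Python with the illustrative choice µ = 1, σ² = 4:

```python
from math import exp, pi, sqrt

mu, sigma2 = 1.0, 4.0
f = lambda x: exp(-0.5 * (x - mu)**2 / sigma2) / sqrt(2 * pi * sigma2)

# Crude Riemann sum over [-30, 30]; the tails beyond are negligible.
dx = 0.001
xs = [-30 + i * dx for i in range(60000)]
total = sum(f(x) * dx for x in xs)
mean = sum(x * f(x) * dx for x in xs)
print(round(total, 6), round(mean, 6))  # ~1.0 and ~1.0
```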
3.2 MGFs
Definition
The moment generating function (mgf) of a random variable X is defined as
MX(t) = E[e^(tX)] = ∑_x e^(tx) pX(x) if discrete, or ∫ e^(tx) fX(x) dx if continuous.
Properties
The mgf is unique to a probability distribution.
By considering the (Taylor) power series expansion
MX(t) = ∑_{r=0}^{∞} (t^r/r!) E[X^r],
we see that E[X^r] is the coefficient of t^r/r!.
Moments can easily be found by differentiation:
E[X^r] = d^r/dt^r MX(t) evaluated at t = 0,
i.e. E[X^r] is the rth derivative of MX(t) at t = 0.
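This differentiation rule can be checked numerically with finite differences. Using the known mgf of the Poisson distribution, M(t) = exp(λ(e^t − 1)), the first and second derivatives at t = 0 should recover E[X] = λ and E[X²] = λ + λ². A Python sketch with the illustrative choice λ = 2:

```python
from math import exp

lam = 2.0
M = lambda t: exp(lam * (exp(t) - 1))   # mgf of Po(lam)

h = 1e-4
m1 = (M(h) - M(-h)) / (2 * h)            # central difference: ~E[X] = lam
m2 = (M(h) - 2 * M(0) + M(-h)) / h**2    # second difference: ~E[X^2] = lam + lam^2
print(round(m1, 4), round(m2, 4))  # ~2.0 and ~6.0
```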
Properties (cont.)
If X has mgf MX(t) and Y = aX + b, where a and b are constants, then the mgf of Y is
MY(t) = e^(bt) MX(at).
If X and Y are independent random variables with mgfs MX(t) and MY(t) respectively, then Z = X + Y has mgf
MZ(t) = MX(t) MY(t).
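The product rule for sums of independent variables can be verified by direct enumeration. A Python sketch for two independent fair dice (an illustrative choice), evaluating both sides at a fixed t:

```python
from math import exp, isclose

# X, Y independent fair dice; Z = X + Y.
t = 0.3
M_X = sum(exp(t * x) / 6 for x in range(1, 7))
# mgf of Z by direct enumeration over the 36 equally likely joint outcomes.
M_Z = sum(exp(t * (x + y)) / 36 for x in range(1, 7) for y in range(1, 7))
print(isclose(M_Z, M_X * M_X))  # True
```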
3.3 Sampling
The sampling distribution
Suppose we have a random sample x1, . . . , xn.
Summarise using the sample mean and variance, x̄ and s².
Repeated sampling leads to different values: this is sampling variation.
The distribution of the summary statistic is the sampling distribution.
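Sampling variation is easy to see by simulation: draw many samples, compute the mean of each, and look at how those means are spread. A Python sketch using repeated samples of n = 30 die rolls (all choices here are illustrative); the sample means should vary around E[X] = 3.5 with variance roughly Var(X)/n = (35/12)/30:

```python
import random
import statistics

random.seed(1)

# Simulate the sampling distribution of the mean of n = 30 fair-die rolls.
n, reps = 30, 5000
sample_means = [statistics.mean(random.randint(1, 6) for _ in range(n))
                for _ in range(reps)]

print(round(statistics.mean(sample_means), 2))      # ~3.5
print(round(statistics.variance(sample_means), 3))  # ~0.097, i.e. (35/12)/30
```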
Summary
We have learnt about:
Definitions of events and the axioms.
Basic probability rules.
Random variables.
Joint and conditional probability.
Expectation, variance and correlation.
Total probability and Bayes Theorem.
A few simple distributions.
Generating functions.
Sampling distributions.
End of Lecture 1.