University of Kent
Chapter 5: Common Distributions
In this chapter we examine four of the distributions that will be frequently encountered later in the course.
5.1 The Normal Distribution
The normal distribution is the most widely used distribution in statistics. Continuous data such as mass, length, etc, can often be modelled using a normal distribution.
The normal distribution has two parameters: the mean (μ) and the variance (σ²). If a random variable X has a normal distribution then we can write this as:
X ~ N[μ, σ²].
A normal distribution with μ = 0 and σ² = 1 is referred to as a standard normal distribution (and a random variable with this distribution is usually denoted Z).
Important result: If X is a random variable distributed as N[μ, σ²], then
Z = (X − μ)/σ ~ N[0, 1].
The process of subtracting the mean and dividing by the standard deviation is referred to as standardisation:
General Normal: X ~ N[μ, σ²]  →  Standard Normal: Z ~ N[0, 1].
[Figure: the pdf of N[0,1] (plotted as dnorm(x)) and the pdf of N[0,9] (plotted as dnorm(x, sd = 3)).]
Example: The fully grown lengths (in mm) of a certain insect can be regarded as having the following normal distribution:
X ~ N[64, 16]. What is the probability that an insect has length less than 59 mm?
Applying the standardisation formula,
P(X < 59) = P(Z < (59 − 64)/4) = P(Z < −1.25).
Thus, from tables, P(X < 59) = 1 − P(Z < 1.25) = 1 − 0.8944 = 0.1056.
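As a quick numerical check of this example, the standard normal CDF can be computed from Python's standard library via the error function (a sketch; the helper name `phi` is ours, not from the course):

```python
import math

def phi(z):
    """CDF of the standard normal, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Insect lengths: X ~ N[64, 16], so sigma = 4.
z = (59 - 64) / 4            # standardise: z = -1.25
print(round(phi(z), 4))      # P(X < 59) = P(Z < -1.25) -> 0.1056
```

This agrees with the table value 0.1056 obtained above.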
5.1.1 Percentage points
Definition: Consider a random variable X with some distribution. The (upper) 100α% point is the value of x such that:
P(X > x) = α.
For the standard normal distribution, we will denote the (upper) 100α% point by z(α), i.e.: P(Z > z(α)) = α.
In statistical tables (e.g. Lindley and Scott), there is a separate percentage point table covering the most used values of α. In Lindley and Scott, P represents 100α, and x(P) represents the value of z(α).
Extract:
P      α      x(P)
10%    0.10   1.2816
5%     0.05   1.6449
2%     0.02   2.0537
1%     0.01   2.3263
0.1%   0.001  3.0902
Example 1: Let X ~ N[50, 16]. Find the value of x such that P(X > x) = 0.05, i.e. find the (upper) 5% point.
If X ~ N[50, 16], then Z = (X − 50)/4 ~ N[0, 1].
The 5% point for the standard normal is z(0.05) = 1.6449.
Thus, the 5% point for a N[50, 16] distribution can be obtained by solving
(x − 50)/4 = 1.6449.
So, the 5% point is x = 50 + 4 × 1.6449 = 56.58.
Example 2: Let Z ~ N[0, 1]. Find the value of z such that P(Z < z) = 0.01 (i.e. find the lower 1% point).
The upper 1% point for a standard normal is z(0.01) = 2.3263. Therefore, P(Z > 2.3263) = 0.01.
By symmetry, we must also have P(Z < -2.3263) = 0.01. So, the lower 1% point is –2.3263.
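The table look-ups in both examples can be reproduced with `statistics.NormalDist` from the Python standard library (a sketch, not part of the course materials):

```python
from statistics import NormalDist

Z = NormalDist(0, 1)
# Upper 5% point: P(Z > z) = 0.05, i.e. z = inv_cdf(0.95).
print(round(Z.inv_cdf(0.95), 4))    # 1.6449, as in the table extract
# Lower 1% point: by symmetry, the negative of the upper 1% point.
print(round(Z.inv_cdf(0.01), 4))    # -2.3263
# Example 1 revisited: the 5% point of N[50, 16] obtained directly.
print(round(NormalDist(50, 4).inv_cdf(0.95), 2))   # 56.58
```

Note that `inv_cdf` takes the lower-tail probability, so the upper 100α% point is `inv_cdf(1 - α)`.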
5.2 The chi-squared distribution
5.2.1 Introduction
The chi-squared (χ²) distribution has a single parameter called the degrees of freedom; this can be any positive integer. The χ² distribution with n degrees of freedom is denoted χ²_n.
Probability density function: If X ~ χ²_n, then the p.d.f. of X (for x > 0) is given by:
f(x) = x^(n/2 − 1) e^(−x/2) / (2^(n/2) Γ(n/2)),
for x > 0.
This density is written in terms of the gamma function. Some of the key properties of this function are:
Γ(x) = (x − 1)Γ(x − 1);
Γ(x) = (x − 1)! if x is a natural number.
The degrees of freedom, n, define the shape of the density. For n < 3, the density has a mode at zero. For n ≥ 3, the mode moves further away from zero as n increases. The shapes of some specific densities are given below.
5.2.2 Finding probabilities
Probabilities associated with the χ² distribution can be looked up in probability tables. Lindley and Scott list the d.o.f. (which they denote ν) along the top of each column. Then for each value x listed, the values in the table are the probability that X < x.
Extracts:
ν = 3.0              ν = 7.0
x     P(X < x)       x      P(X < x)
0.0   0.0000         1.0    0.0052
0.5   0.0811         2.0    0.0402
1.0   0.1987         3.0    0.1150
1.5   0.3177         4.0    0.2202
2.0   0.4276         5.0    0.3400
2.5   0.5247         6.0    0.4603
3.0   0.6084         7.0    0.5711
3.5   0.6792         8.0    0.6674
4.0   0.7385         9.0    0.7473
etc                  10.0   0.8114
Example 1: If X ~ χ²_3, then P(X < 2.5) = 0.5247.
Example 2: Suppose X ~ χ²_7. Find P(X > 10).
[Figure: graph of several chi-squared densities, for n = 1, 2, 4 and 8.]
Now, from tables we can find P(X < 10) = 0.8114, so P(X > 10) = 1 − 0.8114 = 0.1886.
5.2.3 Percentage points
The 100α% point for the χ²_n distribution is denoted χ²_n(α). Therefore, if X ~ χ²_n, then
P(X > χ²_n(α)) = α.
The percentage points of the distribution are in a separate table in Lindley and Scott.
Extract:
P         99     95     10     5      1
ν = 1.0   0.000  0.004  2.706  3.841  6.635
ν = 2.0   0.020  0.103  4.606  5.991  9.210
ν = 3.0   0.115  0.352  6.251  7.815  11.34
ν = 4.0   0.297  0.711  7.779  9.488  13.28
ν = 5.0   0.554  1.145  9.236  11.07  15.09
ν = 6.0   0.872  1.635  10.64  12.59  16.81
ν = 7.0   1.239  2.167  12.02  14.07  18.48
ν = 8.0   1.646  2.733  13.36  15.51  20.09
In this table, the degrees of freedom ν for the χ² distribution are listed going down the rows and P is 100α.
The chi-squared distribution is not symmetric (unlike the normal distribution). So if we want a lower percentage point (i.e. a value of x such that P(X < x) = α), then we can't simply negate the corresponding upper percentage point. Instead we need to find χ²_n(1 − α).
Example 1: Let X ~ χ²_8. Find the lower 1% point (i.e. the value of x such that P(X < x) = 0.01).
The lower 1% point is denoted χ²_8(0.99), and from the table its value is 1.646.
Example 2: Suppose X ~ χ²_10. Find the value of t for which P(X > t) = 0.1321.
Here, t would be the 13.21% point for the χ²_10 distribution. But 0.1321 is a non-standard value of α, so we need to use the distribution function table to find t.
P(X > t) = 0.1321 ⇒ P(X < t) = 1 − 0.1321 = 0.8679. Going through the distribution function table we find that t = 15.
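Without tables, such chi-squared tail probabilities can be approximated by simulation, using the fact (see Section 5.5) that a χ²_n variable is a sum of n squared standard normals. A stdlib-only sketch (helper name and sample size are ours):

```python
import random

random.seed(1)

def chi2_sample(df):
    """One draw from chi-squared_df: a sum of df squared N[0,1] draws."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(df))

N = 200_000
draws = [chi2_sample(7) for _ in range(N)]
# P(X > 10) for X ~ chi-squared_7; the tables above give 0.1886.
p_gt_10 = sum(d > 10 for d in draws) / N
print(round(p_gt_10, 3))
```

With 200,000 draws the Monte Carlo error is under about 0.002, so the estimate should land close to the tabulated 0.1886.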
5.3 The Student t-distribution
5.3.1 Introduction
Definition: Suppose that we have two independent random variables Y and Z, such that: Y ~ N[0, 1] and Z ~ χ²_n.
Then the random variable X defined by
X = Y / √(Z/n)
has a t-distribution with n degrees of freedom, denoted t_n.
The t-distribution is symmetric about zero and its general shape is like the bell shape of a normal distribution. However, the tails of the t-distribution can approach zero much more slowly than those of the normal distribution, i.e. the t-distribution is more heavy-tailed than the normal. The degrees of freedom define how heavy-tailed the t-distribution is.
Note: The t-distribution with n = 1 is sometimes referred to as the Cauchy distribution. This is so heavy-tailed that its mean and variance do not exist! (This is because the integrals specifying the mean and variance are not absolutely convergent.)
Important note: The density of a t-distribution converges to that of the standard normal as n → ∞.
The diagram below shows how the t-distribution varies for different degrees of freedom.
5.3.2 Probabilities
[Figure: comparison of several t densities (t_2, t_5, t_20) with the standard normal.]
Probabilities associated with the t-distribution can be looked up in tables. In Lindley and Scott, the degrees of freedom are again denoted by ν and are listed along the top of the columns. Then for each value t listed, the values in the table are the probability that X < t.
Example 1: Let X ~ t_3. Then P(X < 2.5) = 0.9561.
Example 2: Let X ~ t_12. Find P(X > 2.5).
P(X > 2.5) = 1 - P(X < 2.5) = 1 - 0.986 = 0.014.
5.3.3 Percentage points
The 100α% point for the t_n distribution is denoted by t_n(α). If X ~ t_n, then: P(X > t_n(α)) = α.
Percentage points for the t-distribution are tabulated separately. The degrees of freedom ν for the distribution are listed down the rows and P = 100α.
Example 1: Find the 5% point for t_6.
Directly from tables, this is seen to be t_6(0.05) = 1.943. (Thus P(X > 1.943) = 0.05.)
As the t-distribution is symmetric, finding lower percentage points is simple.
Example 2: Let X ~ t_10. Find the value of t such that P(X < t) = 0.01 (i.e. find the lower 1% point).
The upper 1% point is t_10(0.01) = 2.764. But, by symmetry,
P(X > 2.764) = 0.01 ⇒ P(X < −2.764) = 0.01. So, the lower 1% point, t, is −2.764.
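The symmetry argument can be checked by simulation, building t_10 draws from the definition in Section 5.3.1 (a stdlib sketch; helper names and the seed are ours):

```python
import math
import random

random.seed(2)

def t_sample(df):
    """One draw from t_df: N[0,1] divided by the square root of chi-squared_df / df."""
    y = random.gauss(0, 1)
    z = sum(random.gauss(0, 1) ** 2 for _ in range(df))
    return y / math.sqrt(z / df)

N = 200_000
draws = [t_sample(10) for _ in range(N)]
upper = sum(d > 2.764 for d in draws) / N    # should be near 0.01
lower = sum(d < -2.764 for d in draws) / N   # by symmetry, also near 0.01
print(round(upper, 3), round(lower, 3))
```

Both empirical tail frequencies should be close to 0.01, illustrating that the lower 1% point is minus the upper 1% point.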
Note: To find non-standard percentage points (such as the 12.5% point, for example), we need to use the t-distribution function table.
5.4 The (Fisher's) F-distribution
5.4.1 Introduction
Definition: Consider two independent random variables Y and Z such that
Y ~ χ²_n and Z ~ χ²_m.
The random variable X defined by
X = (Y/n) / (Z/m)
is then said to have an F-distribution with n and m degrees of freedom, denoted F_{n,m}.
The F-distribution therefore has two parameters, both of which are degrees of freedom. The order of the degrees of freedom is important! The F_{n,m} distribution is not the same as the F_{m,n} distribution.
Note: The density for the F-distribution is only defined for positive values of x. The values of the two degrees of freedom define the shape of the distribution. Plots of the F-distribution for various values of n and m are shown below.
[Figure: graphs of several F densities, for (n, m) = (2, 2), (4, 4), (8, 8) and (20, 20).]
[Figure: graphs of several more F densities, for (n, m) = (2, 4), (4, 2), (5, 10) and (10, 20).]
Lindley and Scott do not have tables for looking up probabilities associated with the F-distribution.
5.4.2 Percentage points
Separate tables giving 10, 5, 2.5, 1, 0.5 and 0.1 percentage points for F-distributions with different combinations of degrees of freedom can be found in Lindley and Scott.
We will denote the (upper) 100α% point for the F_{n,m} distribution by F_{n,m}(α). If X ~ F_{n,m}, then:
P(X > F_{n,m}(α)) = α.
In the table of the 100α percentage points for the F-distribution, the first degrees of freedom, n, is denoted ν1 and listed along the columns. The second degrees of freedom, m, is denoted by ν2 and listed down the rows.
Extract: 1% points of the F-distribution
ν1        1      2      3      4      5
ν2 = 1    4052   4999   5403   5625   5764
ν2 = 2    98.50  99.00  99.17  99.25  99.30
ν2 = 3    34.12  30.82  29.46  28.71  28.24
ν2 = 4    21.20  18.00  16.69  15.98  15.52
ν2 = 5    16.26  13.27  12.06  11.39  10.97
Example: Find the 5% point for both the F_{n,m} and the F_{m,n} distributions (i.e. with the degrees of freedom interchanged).
From the 5% points table:
Notice that these are not the same.
The tables in Lindley and Scott give the upper percentage points only, i.e. they give the values of x such that P(X > x) = α, for small values of α. Since the F-distribution is not symmetric, to find lower percentage points we cannot simply use the negative of the corresponding upper percentage point: the density is in fact not even defined for x < 0.
5.4.3 Finding lower percentage points
The (upper) 1% point for an F_{5,3} distribution is 28.24. We write F_{5,3}(0.01) = 28.24.
Result: Suppose that X ~ F_{n,m}. Then 1/X ~ F_{m,n}.
Proof:
If X ~ F_{n,m}, then X = (Y/n)/(Z/m), where Y ~ χ²_n and Z ~ χ²_m are independent, and so
1/X = (Z/m)/(Y/n).
But by definition of the F-distribution, this means that 1/X ~ F_{m,n},
as required.
We can use this result to find lower percentage points for F-distributions:
Important result: The lower 100α% percentage point for the F_{n,m} distribution is the reciprocal of the upper 100α% percentage point of the F_{m,n} distribution.
Proof: If X ~ F_{n,m} and x represents the lower 100α% percentage point for this distribution, then P(X < x) = α. But
P(X < x) = P(1/X > 1/x) = α.
As 1/X ~ F_{m,n}, then 1/x is (by definition) the upper 100α% percentage point of the F_{m,n}
distribution.
So, x = 1/F_{m,n}(α).
Example 1: Let X ~ F_{n,m}. Suppose we wish to find x such that P(X < x) = 0.05, i.e. we want to find the lower 5% point of the F_{n,m} distribution.
The lower 5% point of the F_{n,m} distribution is the reciprocal of the upper 5% point of the F_{m,n} distribution.
So, x = 1/F_{m,n}(0.05).
Example 2: Suppose X ~ F_{n,m}. Find the upper and lower 10% points.
The upper 10% point can be found directly from tables: F_{n,m}(0.10).
The lower 10% point is the reciprocal of the upper 10% point of the F_{m,n} distribution:
Lower 10% point = 1/F_{m,n}(0.10).
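The reciprocal rule can be illustrated numerically from the definition of the F-distribution. The sketch below (stdlib only; the choice n = 5, m = 10, the seed and the sample size are ours) compares the empirical lower 5% point of F_{5,10} with the reciprocal of the empirical upper 5% point of F_{10,5}:

```python
import random

random.seed(3)

def chi2_sample(df):
    """One chi-squared_df draw as a sum of df squared N[0,1] draws."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(df))

def f_sample(n, m):
    """One draw from F_{n,m}: a ratio of independent scaled chi-squareds."""
    return (chi2_sample(n) / n) / (chi2_sample(m) / m)

N = 100_000
f_5_10 = sorted(f_sample(5, 10) for _ in range(N))
f_10_5 = sorted(f_sample(10, 5) for _ in range(N))
lower5 = f_5_10[int(0.05 * N)]     # empirical lower 5% point of F_{5,10}
upper5 = f_10_5[int(0.95 * N)]     # empirical upper 5% point of F_{10,5}
print(round(lower5, 3), round(1 / upper5, 3))   # these should roughly agree
```

Up to simulation noise, the two printed values coincide, as the important result predicts.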
Exercise: Suppose X ~ F_{n,m}. Find the upper and lower 1% points.
5.5 Some additional facts about distributions
1) If X_1, …, X_n are independent with X_i ~ N[μ_i, σ_i²], i = 1, …, n, then
X_1 + … + X_n ~ N[μ_1 + … + μ_n, σ_1² + … + σ_n²];
2) If Z_1, …, Z_n are i.i.d. as N[0, 1], then (a) Z_i² ~ χ²_1 for i = 1, 2, …, n;
(b) Z_1² + … + Z_n² ~ χ²_n.
3) If X_1, …, X_n are independent with X_i ~ χ²_{m_i}, i = 1, …, n, then
X_1 + … + X_n ~ χ²_m,
where m = m_1 + … + m_n.
4) If X ~ t_n then X² ~ F_{1,n}.
These results are not proved in this course.
Chapter 6: Sampling Distributions
6.1 Parameters
The purpose of many statistical investigations is to learn about the distribution of some random variable X. Many aspects about X's distribution may be of interest, but attention often focuses on one or two particular population characteristics.
Example 1: A bakery needs to decide how many loaves of fresh bread it should put out on its shelves each day. If they put out too many, then they will lose money as stale bread will not sell, and if they put out too few, then they will lose potential sales. Therefore, to help the bakery make its order, interest might focus on the mean number of loaves, μ, usually sold on a particular day.
Example 2: Suppose that a company has the job of packing a certain breakfast cereal into boxes, so that each box contains approximately 500g of cereal. The weight of cereal in each box varies around 500g due to the variability of the cereal product. The company wants to check that the amount going into each box doesn't vary too much about 500g: weights greater than 500g will lose the company money and weights less than 500g could lead to customer dissatisfaction. In this case, attention may focus on the variability of weights in the boxes as described by σ, the standard deviation of weights.
Example 3: When testing a new drug, a doctor might not be interested so much in the number of people cured by the drug, but rather the proportion, π, of people who are cured by the drug.
We call μ, σ and π population parameters. To learn about such parameters, we can observe a random sample of n observations, x_1, …, x_n, and then use these data to calculate estimates for the parameter(s) of interest. For example, a sample mean x̄ could be used to estimate μ.
Definition: Any quantity computed from values in a sample is called a (sample) statistic.
Example: All the numerical summaries introduced in Chapter 2 are statistics as they are all calculated from values in the random sample. This includes statistics such as the sample mean (which utilises all the observations in its calculation) and the sample median (which only takes account of the middle observations).
It is important to realise that there is a difference between population parameters and sample statistics. The population parameter is a characteristic of the distribution of the random variable, is typically unknown and cannot be observed. By contrast, a statistic is a characteristic of the sample and can be observed. For example, the population mean μ has some fixed (but unknown) value. On the other hand, the sample mean, x̄, can be observed and therefore can be known for a particular sample. The observed value of x̄, however, can vary from sample to sample (as different samples will give different values of x̄). The value of a statistic, therefore, is subject to sampling variability.
Definition: As a statistic is a function of the random variables X_1, …, X_n, it is itself a random variable. The distribution of a statistic is called its sampling distribution.
The sampling distribution of a statistic describes the long-run behaviour of the statistic's values when many different samples, each of size n, are obtained and the value of the statistic is computed for each sample.
6.2 The sampling distribution of the sample mean
To investigate the sampling distribution for X̄, we will consider several experiments.
Experiment 1: We generate 500 random samples (each of size n) from N[100, 400]. For each of these 500 samples we calculate x̄, so we have a random sample of 500 observations from the sampling distribution of X̄. This was repeated for n = 5, 20, 50.
Observations: In each case the distribution seems roughly normal and it is clear that each of these histograms is centred roughly at 100 (the mean of the normal distribution from which the samples were generated). We can also see that as the sample size n increases, the variability in the sampling distributions decreases (look carefully at the scales on the horizontal axes).
These points can also be seen if we look at some statistics relating to each histogram above:
Sample size   Mean     Standard deviation
n = 5         100.07   8.17
n = 20        99.83    4.40
n = 50        100.05   2.81
We will do a similar set of experiments to see what the sampling distribution for X̄ is like when we are not sampling from the normal distribution.
Experiment 2: We generate 500 random samples (each of size n) from a uniform U[0,1] distribution. Again, for each of these 500 samples we calculate x̄, so we have a random sample of 500 observations from the sampling distribution of X̄. This was repeated for n = 5, 10, 20, 50.
Note: If X ~ U[0, 1], then E[X] = 0.5 and Var[X] = 1/12 (so s.d. = 0.289).
Observations: The shapes of the histograms relating to the sample means look increasingly more like normal distributions as n increases- this is despite the data being sampled from a uniform distribution. The histograms in each case seem to centre on 0.5 (the mean of the U[0, 1] distribution). Also, the variability of the sampling distributions is decreasing as the sample size becomes larger.
The mean and standard deviation for the data in the four situations above are given below:
Sample size   Mean    Standard deviation
n = 5         0.491   0.133
n = 10        0.504   0.095
n = 20        0.502   0.068
n = 50        0.499   0.042
Important Result: For an independent random sample X_1, …, X_n from a distribution with mean μ and variance σ², the sampling distribution for X̄ has the following properties:
1. E[X̄] = μ.
2. Var[X̄] = σ²/n. The standard deviation of X̄ (often called the standard error) is therefore
σ/√n.
3. If each X_i ~ N[μ, σ²], then X̄ ~ N[μ, σ²/n] regardless of the size of n.
4. If X_1, …, X_n are not normally distributed then when n is large (say at least 30) the
distribution of X̄ is approximately N[μ, σ²/n].
Proof
1. E[X̄] = E[(X_1 + … + X_n)/n] = (1/n)(E[X_1] + … + E[X_n]) = (1/n)(nμ) = μ (as required).
2. Because we are assuming that the random variables are independent, we can also write:
Var[X̄] = Var[(X_1 + … + X_n)/n] = (1/n²)(Var[X_1] + … + Var[X_n]) = (1/n²)(nσ²) = σ²/n (as required).
3. A linear combination of normally distributed random variables also has a normal distribution. The mean and variance are as given above.
4. Not proved here.
Note: Part (4) of the above result is the Central Limit Theorem, an extremely powerful and useful result in Statistics.
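Experiment 1 is easy to reproduce with Python's standard library (a sketch; the helper name and seed are ours). With σ = 20, the standard errors should be close to 20/√n, i.e. about 8.94, 4.47 and 2.83, matching the table above:

```python
import math
import random
import statistics

random.seed(4)

def xbar(n, mu=100, sigma=20):
    """Sample mean of n i.i.d. draws from N[mu, sigma^2] = N[100, 400]."""
    return statistics.fmean(random.gauss(mu, sigma) for _ in range(n))

results = {}
for n in (5, 20, 50):
    means = [xbar(n) for _ in range(500)]                    # 500 sample means
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(n, round(results[n][0], 2), round(results[n][1], 2))
```

Each row prints the sample size, the mean of the 500 sample means (near 100), and their standard deviation (near σ/√n).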
Example 1: X_1, …, X_20 are independently and identically distributed N[30, 5]. Find the sampling distribution for X̄.
Here n = 20 and so X̄ ~ N[30, 5/20] = N[30, 0.25].
Example 2: X_1, …, X_n are i.i.d. Po(10) random variables, with n large. What approximately is the sampling distribution for X̄?
The sample size can be considered large enough for the Central Limit Theorem to be applied. The sampling distribution can therefore be considered approximately normal. A Po(10) distribution has mean and variance equal to 10, therefore
X̄ ~ N[10, 10/n] (roughly).
6.3 Sampling distribution of the sample proportion
In many statistical investigations we are interested in learning about the proportion of individuals, or objects, in a population that possess a specified property. For example, we might be interested in what proportion of patients are alive 5 years after diagnosis of a particular cancer, or we might be interested in the proportion of UK adults who would like a ban on blood-sports. Denote the true population proportion of interest by π. Note that π is a population parameter.
To learn about π, we could observe a random sample in which each of the n observations is either a "success" or a "failure". The sample proportion, p, is given by:
p = (number of successes) / n.
The sample proportion is clearly a sample statistic. It makes sense to use p to learn about π. We are therefore interested in the sampling distribution for p.
To investigate the sampling distribution for p, we will look at 2 experiments in which we generate random samples of observed values of p.
Experiment 1: Suppose that we generate 500 samples of size n where each sampled value is either a success (with probability π = 0.25) or a failure (with probability 1 − π = 0.75). We then calculate the observed proportion of "successes" in each of the 500 samples. We will do this for n = 5, 10, 25 and 50.
Observations: For a sample of size 5, the possible values of p are 0, 0.2, 0.4, 0.6, 0.8 and 1. The sampling distribution for p gives the probability of each of these 6 values. The histogram for the case n = 5 is positively skewed. As n increases, the histograms become more and more symmetrical, and in fact when n = 50 the histogram clearly resembles a normal curve centred on 0.25. In addition, increasing the sample size decreases the range of observed values for p.
Experiment 2: Once again we will generate 500 samples, but this time we will have the sample sizes n = 10, 25, 50 and 100, and we will take the true proportion of successes, π, to be 0.07. So once again each observation in each sample is either a success (S) with probability 0.07, or a failure (F) with probability 0.93.
Observations: When n = 10, the possible values for p are 0, 0.1, 0.2, …, 1. The histogram for the 500 samples is very positively skewed and no values greater than 0.4 were observed for p. [Notice how in the previous experiment, the density for p was not very skewed when n = 10.]
As n increases to 25 and 50, the histograms still look positively skewed. However, when the sample size reaches 100, the histogram is beginning to look slightly more normal. Therefore we note that in this experiment we need larger sample sizes than in Experiment 1 before the sampling distribution for p looks approximately normal.
We also note that increasing the sample size again results in a narrowing in the range of observed values for p.
Thus, to summarise the observations from this experiment: the densities are roughly centred about π = 0.07; the variance for p decreases as n increases; and as the sample size increases, the density for p becomes approximately normal. However, the density tends to normality much more slowly than when we had π = 0.25. Therefore, it appears that the rate at which the sampling distribution for p tends to normality depends not only on the sample size n, but also on the value of π.
Important result:
If p is the sample proportion of successes in a random sample of size n, where π is the true proportion of successes, then the following results hold:
1. The expected value of p is π.
2. The standard error (i.e. s.d.) of p is √(π(1 − π)/n).
3. When n is sufficiently large, the sampling distribution for p is approximately normal.
Note: The further the value of π is from 0.5, the larger the value of n must be in order for the normal approximation of the sampling distribution for p to be accurate.
Rule of thumb: If both nπ ≥ 5 and n(1 − π) ≥ 5, then we may use the normal approximation for p.
Proof: Let X = total number of successes in the sample. Then X ~ Bi[n, π] and so:
E[X] = nπ,  V[X] = nπ(1 − π),  sd[X] = √(nπ(1 − π)).
But, by definition, the sample proportion p = X/n, and so
E[p] = E[X]/n = π.
Also, V[p] = V[X]/n² = π(1 − π)/n.
Taking square roots, we get the required standard error for p.
Proof of the normality approximation is simply an application of the Central Limit Theorem, so that for large n
p ~ N[π, π(1 − π)/n], approximately.
Example 1: Suppose that the proportion of women who believe that they are underpaid is 0.55.
a) If we had a random sample of size 10, could we assume that the sampling distribution for p is approximately normal?
b) For a random sample of 400, what are the mean value and standard deviation for p?
c) In a sample of size 400, what is the probability that we observe the proportion of women who believe they are underpaid to be greater than 0.6?

a) π = 0.55 and n = 10, so nπ = 5.5 and n(1 − π) = 4.5. As both of these are not ≥ 5, we cannot assume that the distribution of p is normal with only a sample size of 10.
b) n = 400, so:
E[p] = π = 0.55
V[p] = π(1 − π)/n = (0.55 × 0.45)/400 = 0.000619
sd[p] = 0.0249.
For n = 400, nπ = 220 and n(1 − π) = 180, and so p's distribution can be considered approximately normal. Therefore:
p ~ N[0.55, 0.000619].
c) P(p > 0.6) = P(Z > (0.6 − 0.55)/0.0249) = P(Z > 2.01) = 1 − 0.9778 = 0.0222.
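Parts (b) and (c) can be checked with `statistics.NormalDist` from the Python standard library (a sketch, not part of the course materials):

```python
import math
from statistics import NormalDist

pi, n = 0.55, 400
se = math.sqrt(pi * (1 - pi) / n)       # standard error of p
print(round(se, 4))                     # 0.0249
# Normal approximation: p ~ N[0.55, se^2], so P(p > 0.6) is one minus the CDF at 0.6.
prob = 1 - NormalDist(pi, se).cdf(0.6)
print(round(prob, 4))                   # ~0.022, matching part (c)
```

Any small discrepancy from the hand calculation comes only from rounding the standard error to 0.0249 before standardising.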
Example 2: Suppose that the true proportion of individuals with a particular disease is 0.02. What minimum sample size would be needed before p's distribution can be assumed to be approximately normal?
For approximate normality we need nπ ≥ 5 and n(1 − π) ≥ 5. Now:
n(0.02) ≥ 5 ⇒ n ≥ 250;
n(0.98) ≥ 5 ⇒ n ≥ 5.102.
Therefore, to assume approximate normality for p, we would need a sample size of at least 250.
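The same minimum can be found mechanically from the rule of thumb (a trivial sketch):

```python
pi = 0.02
n = 1
# Increase n until both rule-of-thumb conditions hold.
while not (n * pi >= 5 and n * (1 - pi) >= 5):
    n += 1
print(n)   # smallest n satisfying both conditions -> 250
```

The binding condition is nπ ≥ 5, since π is far below 0.5.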
Exercise: 90% of the population are right-handed. In a sample of 200 people, what is the probability that the sample proportion who are right-handed is less than 0.86?
6.4 Sampling distribution of the sample variance
When we want to learn about the variance, σ², of a population, it is natural to first look towards the sample variance, S². We are therefore interested in the sampling distribution for S².
In general, the sampling distribution for S² does not follow any fixed rules, and so here we will only look at the case when X_1, …, X_n are i.i.d. N[μ, σ²].
Important result: If X_1, …, X_n are i.i.d. N[μ, σ²] where μ is unknown, then
(n − 1)S²/σ² ~ χ²_{n−1}.
Proof: The proof will not be given in this course.
Experiment: To demonstrate that this result does in fact hold in practice, 500 samples were generated from N[100, 400] for various sample sizes n, and the value of
(n − 1)s²/σ² was calculated for each of the 500 samples. Histograms of these samples
then demonstrate what the sampling distribution for (n − 1)S²/σ² looks like in each case.
Observations:
In the case when n = 3, the histogram for the sample of 500 observations of (n − 1)s²/σ² is
heavily positively skewed and resembles a χ²_2 distribution. The histograms for the other cases, where n = 5, 10 and 20, also resemble chi-squared distributions (the respective degrees of freedom should be 4, 9 and 19).
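This can also be checked by simulation in the standard library: if (n − 1)S²/σ² really follows a χ²_{n−1} distribution, its long-run mean should be n − 1 (a sketch; the seed and constants are ours):

```python
import random
import statistics

random.seed(5)

mu, sigma, n, reps = 100, 20, 10, 2000
scaled = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]   # one N[100, 400] sample
    s2 = statistics.variance(sample)                       # sample variance, divisor n - 1
    scaled.append((n - 1) * s2 / sigma ** 2)
# The mean of a chi-squared_{n-1} distribution is n - 1 = 9.
print(round(statistics.fmean(scaled), 2))
```

With 2000 replications the printed mean should fall close to 9, consistent with the χ²_9 sampling distribution for n = 10.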