faculty of life sciences - biostatistics.dk · faculty of life sciences the binomial distribution...
TRANSCRIPT
Faculty of Life Sciences
The binomial distribution
Claus Ekstrøm & Ib SkovgaardE-mail: [email protected]
Program
• Independent trials
• The binomial distribution
• Estimation, mean and standard deviation• Normal approximation• Inference for the binomial distribution
• Comparison of to binomial distributions
Slide 2 � Statistics for Life Science (Week 7-1) � The binomial distribution
Independent trials
• n trials
• Two possible results: success, failure
• Same probability of success, p.
• Trials are independent
Slide 3 � Statistics for Life Science (Week 7-1) � The binomial distribution
Example: germination of seeds
Assume that each seed hasgermination probability 0.6
Consider n = 3 seeds.
• What is the probability forprecisely one of the seeds togerminate?
• What is the probability for atleast one seed to germinate?
What are the probabilities forlarger n?
Slide 4 � Statistics for Life Science (Week 7-1) � The binomial distribution
Probability tree
First trial
Second
trial
Third
trial
FFF , (1−p)3F
1−p
FFS , (1−p)(1−p)pSp
F1−p
Third
trial
FSF , (1−p)p(1−p)F
1−p
FSS , (1−p)ppSp
Sp
F1−p
Second
trial
Third
trial
SFF , p(1−p)(1−p)F
1−p
SFS , p(1−p)pSp
F1−p
Third
trial
SSF , pp(1−p)F
1−p
SSS , pppSp
Sp
Sp
Slide 5 � Statistics for Life Science (Week 7-1) � The binomial distribution
The binomial distribution
Let Y denote the number of successes from n independent trials.Then the binomial distribution is given by
P(j �successes�) = P(Y = j) =
(n
j
)·pj · (1−p)n−j ,
where the binomial coe�cients are de�ned as(n
j
)=
n!
j!(n− j)!
We writeY ∼ bin(n,p)
Slide 6 � Statistics for Life Science (Week 7-1) � The binomial distribution
Binomial distributions
0 5 10 15 20
0.00
0.05
0.10
0.15
0.20
0.25
0.30
Pro
babi
lity
p=0.10
0 5 10 15 20
0.00
0.05
0.10
0.15
0.20
0.25
0.30
Pro
babi
lity
p=0.25
0 5 10 15 20
0.00
0.05
0.10
0.15
0.20
0.25
0.30
Pro
babi
lity
p=0.50
0 5 10 15 20
0.00
0.05
0.10
0.15
0.20
0.25
0.30
Pro
babi
lity
p=0.80
Slide 7 � Statistics for Life Science (Week 7-1) � The binomial distribution
Mean, variance and standard deviation
For a binomially distributed variable Y ∼ bin(n,p)
the mean isEY = n ·p,
the variance isvar(Y ) = n ·p · (1−p),
and hence the standard deviation is
SD(Y ) =√n ·p · (1−p).
Note that Y is a sum of n binomials with n = 1 (n zero-one trials),and this agrees with the rule of adding means and variances.
Slide 8 � Statistics for Life Science (Week 7-1) � The binomial distribution
Example: CG islands in DNA
In parts of the DNA consisting occurrence of the duo CG amongthe sequences of bases A, C, G and T, is more frequent than inmost parts. These parts of the DNA are called CG-islands.
Suppose that the normal frequency of the duo CG is 0.06, and thata certain sample contains 125754 duos. If 7800 CG duos are foundin the sample, does that indicate that the CG-frequency is elevatedin the sample?
What is P(Y ≥ 7800)?
Slide 9 � Statistics for Life Science (Week 7-1) � The binomial distribution
Approximation with the normal distribution
This approximation may be useful when n is large. bin(n,p) is
approximated by N(np,np(1−p)):
P(Y ≤ y)≈ Φ
((y +0.5)−np√
np(1−p)
)If Y ∼ bin(125754,0.06) this gives
P(Y ≥ 7800) ≈ 1−Φ
((7800−0.5)−125754 ·0.06√
125754 ·0.06(1−0.06)
)= 1−0.9987 = 0.0013.
Slide 10 � Statistics for Life Science (Week 7-1) � The binomial distribution
Binomial approximations
0 5 10 15 20
0.00
0.05
0.10
0.15
0.20
0.25
0.30
Pro
babi
lity
p=0.10
0 5 10 15 20
0.00
0.05
0.10
0.15
0.20
0.25
0.30
Pro
babi
lity
p=0.25
0 5 10 15 20
0.00
0.05
0.10
0.15
0.20
0.25
0.30
Pro
babi
lity
p=0.50
0 5 10 15 20
0.00
0.05
0.10
0.15
0.20
0.25
0.30
Pro
babi
lity
p=0.80
Slide 11 � Statistics for Life Science (Week 7-1) � The binomial distribution
Inference
The estimate of the parameter p is
p̂ =number of successes
number of trials.
with corresponding standard error
SE(p̂) =s√n
=
√p̂(1− p̂)
n.
Hence, a con�dence interval for the probability of �success�, p, is
p̂±1.96 ·√
p̂(1− p̂)
n.
Slide 12 � Statistics for Life Science (Week 7-1) � The binomial distribution
Inference � hypothesis testing
Alternative to con�dence interval: test the hypothesis H0 : p = p0.Results, that are more extreme than the observed gives the p-value:
p-value =
∑Extreme y 's
P(Y = y)
where the sum is over y 's shownin red in the graph, (namely thosewith a smaller H0-probability thanthe observed).
0 2 4 6 80.
00.
10.
20.
30.
4P
roba
bilit
y0 2 4 6 8
0.0
0.1
0.2
0.3
0.4
Slide 13 � Statistics for Life Science (Week 7-1) � The binomial distribution
Gender of o�spring from speckled bear
Over a decade 63 speckledbears were born inEuropean zoo's. Of these40 were males and 23 werefemales.
Problem: Could this bedue to chance if the twogenders were equally likely?
For Brown bear 6 males and 12 females were born.
Slide 14 � Statistics for Life Science (Week 7-1) � The binomial distribution
Comparison of two proportions
Lad Y1 ∼ bin(n1,p1) and Y2 ∼ bin(n2,p2). Interested in thedi�erence
p1−p2
We know thatp̂1 =
y1
n1and p̂2 =
y2
n2.
Hence,
p̂1−p2 = p̂1− p̂2 =y1
n1− y2
n2.
Da Y1 and Y2 is independent is
Var(p̂1− p̂2) = Var(p̂1) +Var(p̂2) =p̂1(1− p̂1)
n1+
p̂2(1− p̂2)
n2
Slide 15 � Statistics for Life Science (Week 7-1) � The binomial distribution
Comparison of two proportions � II
Using
SE(p̂1− p̂2) =
√p̂1(1− p̂1)
n1+
p̂2(1− p̂2)
n2.
we get the 95% con�dence interval for the di�erence between thetwo proportions, p1−p2:
p̂1− p̂2±1.96 ·
√p̂1(1− p̂1)
n1+
p̂2(1− p̂2)
n2.
Slide 16 � Statistics for Life Science (Week 7-1) � The binomial distribution
Animal o�spring III
To investigate the stability of the many male-births for Kirk'sdik-dik, a comparison was made with a previous study.
Males Females
Last 10 years 154 96Previous study 169 134
Do the two studies seem to agree?
Slide 17 � Statistics for Life Science (Week 7-1) � The binomial distribution
Lecture summary: main points
• The binomial distribution
• When should it be used?• Estimation, con�dence intervals, tests of hypotheses• Approximation by the normal distribution
• Comparison of success probabilities for two binomials.
Slide 18 � Statistics for Life Science (Week 7-1) � The binomial distribution