faculty of life sciences - biostatistics.dk · faculty of life sciences the binomial distribution...

5
=

Upload: others

Post on 03-Sep-2019

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Faculty of Life Sciences - Biostatistics.dk · Faculty of Life Sciences The binomial distribution Claus Ekstrøm & Ib Skovgaard E-mail:ims@life.ku.dk Program Independent trials The

Faculty of Life Sciences

The binomial distribution

Claus Ekstrøm & Ib SkovgaardE-mail: [email protected]

Program

• Independent trials

• The binomial distribution

• Estimation, mean and standard deviation• Normal approximation• Inference for the binomial distribution

• Comparison of to binomial distributions

Slide 2 � Statistics for Life Science (Week 7-1) � The binomial distribution

Independent trials

• n trials

• Two possible results: success, failure

• Same probability of success, p.

• Trials are independent

Slide 3 � Statistics for Life Science (Week 7-1) � The binomial distribution

Example: germination of seeds

Assume that each seed hasgermination probability 0.6

Consider n = 3 seeds.

• What is the probability forprecisely one of the seeds togerminate?

• What is the probability for atleast one seed to germinate?

What are the probabilities forlarger n?

Slide 4 � Statistics for Life Science (Week 7-1) � The binomial distribution

Page 2: Faculty of Life Sciences - Biostatistics.dk · Faculty of Life Sciences The binomial distribution Claus Ekstrøm & Ib Skovgaard E-mail:ims@life.ku.dk Program Independent trials The

Probability tree

First trial

Second

trial

Third

trial

FFF , (1−p)3F

1−p

FFS , (1−p)(1−p)pSp

F1−p

Third

trial

FSF , (1−p)p(1−p)F

1−p

FSS , (1−p)ppSp

Sp

F1−p

Second

trial

Third

trial

SFF , p(1−p)(1−p)F

1−p

SFS , p(1−p)pSp

F1−p

Third

trial

SSF , pp(1−p)F

1−p

SSS , pppSp

Sp

Sp

Slide 5 � Statistics for Life Science (Week 7-1) � The binomial distribution

The binomial distribution

Let Y denote the number of successes from n independent trials.Then the binomial distribution is given by

P(j �successes�) = P(Y = j) =

(n

j

)·pj · (1−p)n−j ,

where the binomial coe�cients are de�ned as(n

j

)=

n!

j!(n− j)!

We writeY ∼ bin(n,p)

Slide 6 � Statistics for Life Science (Week 7-1) � The binomial distribution

Binomial distributions

0 5 10 15 20

0.00

0.05

0.10

0.15

0.20

0.25

0.30

Pro

babi

lity

p=0.10

0 5 10 15 20

0.00

0.05

0.10

0.15

0.20

0.25

0.30

Pro

babi

lity

p=0.25

0 5 10 15 20

0.00

0.05

0.10

0.15

0.20

0.25

0.30

Pro

babi

lity

p=0.50

0 5 10 15 20

0.00

0.05

0.10

0.15

0.20

0.25

0.30

Pro

babi

lity

p=0.80

Slide 7 � Statistics for Life Science (Week 7-1) � The binomial distribution

Mean, variance and standard deviation

For a binomially distributed variable Y ∼ bin(n,p)

the mean isEY = n ·p,

the variance isvar(Y ) = n ·p · (1−p),

and hence the standard deviation is

SD(Y ) =√n ·p · (1−p).

Note that Y is a sum of n binomials with n = 1 (n zero-one trials),and this agrees with the rule of adding means and variances.

Slide 8 � Statistics for Life Science (Week 7-1) � The binomial distribution

Page 3: Faculty of Life Sciences - Biostatistics.dk · Faculty of Life Sciences The binomial distribution Claus Ekstrøm & Ib Skovgaard E-mail:ims@life.ku.dk Program Independent trials The

Example: CG islands in DNA

In parts of the DNA consisting occurrence of the duo CG amongthe sequences of bases A, C, G and T, is more frequent than inmost parts. These parts of the DNA are called CG-islands.

Suppose that the normal frequency of the duo CG is 0.06, and thata certain sample contains 125754 duos. If 7800 CG duos are foundin the sample, does that indicate that the CG-frequency is elevatedin the sample?

What is P(Y ≥ 7800)?

Slide 9 � Statistics for Life Science (Week 7-1) � The binomial distribution

Approximation with the normal distribution

This approximation may be useful when n is large. bin(n,p) is

approximated by N(np,np(1−p)):

P(Y ≤ y)≈ Φ

((y +0.5)−np√

np(1−p)

)If Y ∼ bin(125754,0.06) this gives

P(Y ≥ 7800) ≈ 1−Φ

((7800−0.5)−125754 ·0.06√

125754 ·0.06(1−0.06)

)= 1−0.9987 = 0.0013.

Slide 10 � Statistics for Life Science (Week 7-1) � The binomial distribution

Binomial approximations

0 5 10 15 20

0.00

0.05

0.10

0.15

0.20

0.25

0.30

Pro

babi

lity

p=0.10

0 5 10 15 20

0.00

0.05

0.10

0.15

0.20

0.25

0.30

Pro

babi

lity

p=0.25

0 5 10 15 20

0.00

0.05

0.10

0.15

0.20

0.25

0.30

Pro

babi

lity

p=0.50

0 5 10 15 20

0.00

0.05

0.10

0.15

0.20

0.25

0.30

Pro

babi

lity

p=0.80

Slide 11 � Statistics for Life Science (Week 7-1) � The binomial distribution

Inference

The estimate of the parameter p is

p̂ =number of successes

number of trials.

with corresponding standard error

SE(p̂) =s√n

=

√p̂(1− p̂)

n.

Hence, a con�dence interval for the probability of �success�, p, is

p̂±1.96 ·√

p̂(1− p̂)

n.

Slide 12 � Statistics for Life Science (Week 7-1) � The binomial distribution

Page 4: Faculty of Life Sciences - Biostatistics.dk · Faculty of Life Sciences The binomial distribution Claus Ekstrøm & Ib Skovgaard E-mail:ims@life.ku.dk Program Independent trials The

Inference � hypothesis testing

Alternative to con�dence interval: test the hypothesis H0 : p = p0.Results, that are more extreme than the observed gives the p-value:

p-value =

∑Extreme y 's

P(Y = y)

where the sum is over y 's shownin red in the graph, (namely thosewith a smaller H0-probability thanthe observed).

0 2 4 6 80.

00.

10.

20.

30.

4P

roba

bilit

y0 2 4 6 8

0.0

0.1

0.2

0.3

0.4

Slide 13 � Statistics for Life Science (Week 7-1) � The binomial distribution

Gender of o�spring from speckled bear

Over a decade 63 speckledbears were born inEuropean zoo's. Of these40 were males and 23 werefemales.

Problem: Could this bedue to chance if the twogenders were equally likely?

For Brown bear 6 males and 12 females were born.

Slide 14 � Statistics for Life Science (Week 7-1) � The binomial distribution

Comparison of two proportions

Lad Y1 ∼ bin(n1,p1) and Y2 ∼ bin(n2,p2). Interested in thedi�erence

p1−p2

We know thatp̂1 =

y1

n1and p̂2 =

y2

n2.

Hence,

p̂1−p2 = p̂1− p̂2 =y1

n1− y2

n2.

Da Y1 and Y2 is independent is

Var(p̂1− p̂2) = Var(p̂1) +Var(p̂2) =p̂1(1− p̂1)

n1+

p̂2(1− p̂2)

n2

Slide 15 � Statistics for Life Science (Week 7-1) � The binomial distribution

Comparison of two proportions � II

Using

SE(p̂1− p̂2) =

√p̂1(1− p̂1)

n1+

p̂2(1− p̂2)

n2.

we get the 95% con�dence interval for the di�erence between thetwo proportions, p1−p2:

p̂1− p̂2±1.96 ·

√p̂1(1− p̂1)

n1+

p̂2(1− p̂2)

n2.

Slide 16 � Statistics for Life Science (Week 7-1) � The binomial distribution

Page 5: Faculty of Life Sciences - Biostatistics.dk · Faculty of Life Sciences The binomial distribution Claus Ekstrøm & Ib Skovgaard E-mail:ims@life.ku.dk Program Independent trials The

Animal o�spring III

To investigate the stability of the many male-births for Kirk'sdik-dik, a comparison was made with a previous study.

Males Females

Last 10 years 154 96Previous study 169 134

Do the two studies seem to agree?

Slide 17 � Statistics for Life Science (Week 7-1) � The binomial distribution

Lecture summary: main points

• The binomial distribution

• When should it be used?• Estimation, con�dence intervals, tests of hypotheses• Approximation by the normal distribution

• Comparison of success probabilities for two binomials.

Slide 18 � Statistics for Life Science (Week 7-1) � The binomial distribution