what is the probability that of 10 newborn babies at least 7 are boys? p(girl) = p(boy) = 0.5...

What is the probability that of 10 newborn babies at least 7 are boys?

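$$p(k) = \binom{n}{k} p^k q^{n-k}$$

$$p(k > 6) = \binom{10}{7} 0.5^7 \cdot 0.5^3 + \binom{10}{8} 0.5^8 \cdot 0.5^2 + \binom{10}{9} 0.5^9 \cdot 0.5^1 + \binom{10}{10} 0.5^{10} \cdot 0.5^0 = 0.172$$

[Figure: binomial distribution for n = 10 and p = 0.5; x-axis k from 0 to 10]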

p(girl) = p(boy) = 0.5
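The sum above can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper name `binom_pmf` is an illustrative choice, not something from the lecture.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# At least 7 boys among 10 newborns, with p(boy) = 0.5
p_at_least_7 = sum(binom_pmf(k, 10, 0.5) for k in range(7, 11))
print(round(p_at_least_7, 3))  # 0.172
```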

Lecture 10: Important statistical distributions

Bernoulli distribution

$$p(k) = \binom{n}{k} p^k q^{n-k}$$

$$F(k) = p(x \le k) = \sum_{i=0}^{k} \binom{n}{i} p^i q^{n-i}$$

The Bernoulli or binomial distribution comes from the Taylor expansion of the binomial

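$$(p + q)^n = \sum_{i=0}^{n} \binom{n}{i} p^i q^{n-i} = (p + (1 - p))^n = 1$$

[Figure: Bernoulli distribution for n = 10 and p = 0.2; x-axis k from 0 to 10, y-axis p(k)]

$$p(k) = \binom{10}{k} 0.2^k \cdot 0.8^{10-k}$$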

Bernoulli or binomial distribution

Assume the probability to find a certain disease in a tree population is 0.01. A bio-monitoring program surveys 10 stands of trees and takes in each case a random sample of 100 trees (n = 1000 trees in total). How large is the probability that in these stands 1, 2, 3, and more than 3 cases of this disease will occur?

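$$p(1) = \binom{1000}{1} 0.01^1 \cdot 0.99^{999} = 0.0004$$

$$p(2) = \binom{1000}{2} 0.01^2 \cdot 0.99^{998} = 0.0022$$

$$p(3) = \binom{1000}{3} 0.01^3 \cdot 0.99^{997} = 0.0074$$

$$p(k > 3) = 1 - p(k \le 3) = 1 - \sum_{i=0}^{3} p(i)$$

$$p(k > 3) = 1 - \left[ \binom{1000}{0} 0.01^0 \cdot 0.99^{1000} + \binom{1000}{1} 0.01^1 \cdot 0.99^{999} + \binom{1000}{2} 0.01^2 \cdot 0.99^{998} + \binom{1000}{3} 0.01^3 \cdot 0.99^{997} \right] = 0.99$$

Mean, variance, standard deviation

$$\mu = np = 1000 \cdot 0.01 = 10$$

$$\sigma^2 = npq = 1000 \cdot 0.01 \cdot 0.99 = 9.9$$

$$\sigma = \sqrt{9.9} = 3.146$$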
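As a cross-check of the Bernoulli solution, here is a short sketch that recomputes the probabilities and moments for n = 1000 and p = 0.01. The variable and function names are illustrative assumptions.

```python
from math import comb, sqrt

n, p = 1000, 0.01   # 10 stands x 100 trees, disease probability 0.01
q = 1 - p

def binom_pmf(k):
    """Binomial probability of exactly k diseased trees among n."""
    return comb(n, k) * p**k * q**(n - k)

mean = n * p              # np
variance = n * p * q      # npq
sd = sqrt(variance)

# Complement rule: P(k > 3) = 1 - [p(0) + p(1) + p(2) + p(3)]
p_more_than_3 = 1 - sum(binom_pmf(k) for k in range(4))

for k in (1, 2, 3):
    print(k, round(binom_pmf(k), 4))
print(round(p_more_than_3, 2), mean, round(variance, 1), round(sd, 3))
```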

What happens if the number of trials n becomes larger and larger and the event probability p becomes smaller and smaller?

$$\lim_{n \to \infty,\ p \to 0,\ np = \lambda} \binom{n}{k} p^k q^{n-k} = \frac{\lambda^k}{k!} e^{-\lambda}$$

$$p(X = k) = \frac{\lambda^k}{k!} e^{-\lambda}$$

Poisson distribution

$$p(k) = \binom{n}{k} p^k q^{n-k}, \qquad \lambda = np$$
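The limit can be illustrated numerically. The sketch below holds np = 10 fixed, lets n grow, and compares the binomial probability of k = 3 with the Poisson value; the chosen values of n and k are arbitrary illustrations.

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return lam**k / factorial(k) * exp(-lam)

lam, k = 10, 3
errors = []
for n in (100, 1000, 100000):
    b = binom_pmf(k, n, lam / n)        # p shrinks as n grows, np stays 10
    errors.append(abs(b - poisson_pmf(k, lam)))
    print(n, b, poisson_pmf(k, lam))
```

The absolute difference shrinks as n increases, which is the behaviour the limit describes.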

The distribution of rare events

Assume the probability to find a certain disease in a tree population is 0.01. A bio-monitoring program surveys 10 stands of trees and takes in each case a random sample of 100 trees. How large is the probability that in these stands 1, 2, 3, and more than 3 cases of this disease will occur?

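$$\lambda = np = 1000 \cdot 0.01 = 10$$

Poisson solution

$$p(1) = \frac{10^1}{1!} e^{-10} = 0.00045$$

$$p(2) = \frac{10^2}{2!} e^{-10} = 0.0023$$

$$p(3) = \frac{10^3}{3!} e^{-10} = 0.0076$$

Bernoulli solution

$$p(1) = 0.0004, \qquad p(2) = 0.0022, \qquad p(3) = 0.0074$$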

The probability that no infected tree will be detected

$$p(0) = \frac{10^0}{0!} e^{-10} = e^{-10} = 0.000045$$

$$p(0) = e^{-\lambda}$$

The probability of more than three infected trees

$$p(k \le 3) = p(0) + p(1) + p(2) + p(3) = 0.000045 + 0.00045 + 0.0023 + 0.0076 \approx 0.010$$

$$p(k > 3) = 1 - 0.010 = 0.990$$

Bernoulli solution

$$p(k > 3) = 0.99$$
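The Poisson solution can be recomputed in a few lines. This is a sketch under the lecture's assumption that λ = np = 10; the names are illustrative.

```python
from math import exp, factorial

lam = 1000 * 0.01   # lambda = np

def poisson_pmf(k):
    """Poisson probability of exactly k infected trees."""
    return lam**k / factorial(k) * exp(-lam)

p_up_to_3 = sum(poisson_pmf(k) for k in range(4))
p_more_than_3 = 1 - p_up_to_3

for k in range(4):
    print(k, poisson_pmf(k))
print(p_up_to_3, p_more_than_3)
```

The result for more than three infected trees is close to 0.99, in line with the Bernoulli solution.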

[Figure: Poisson distributions for λ = 2, λ = 3, λ = 4, and λ = 6; x-axis k from 0 to 13]

Variance, mean: $\sigma^2 = \mu = \lambda$

Skewness: $\gamma = 1 / \sqrt{\lambda}$

What is the probability in Duży Lotek of three cumulations (draws without a winner) in a row if 14 000 000 people bet the first time, 20 000 000 the second time, and 30 000 000 the third time?

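The probability to win is

$$p = \frac{6! \, 43!}{49!} \approx \frac{1}{14000000}$$

$$\lambda_1 = 14000000 \cdot \frac{1}{14000000} = 1$$

$$\lambda_2 = 20000000 \cdot \frac{1}{14000000} = 1.428571$$

$$\lambda_3 = 30000000 \cdot \frac{1}{14000000} = 2.142857$$

$$p_1(0) = \frac{1^0}{0!} e^{-1} = 0.368$$

$$p_2(0) = \frac{1.428571^0}{0!} e^{-1.428571} = 0.239$$

$$p_3(0) = \frac{2.142857^0}{0!} e^{-2.142857} = 0.117$$

The events are independent:

$$p_{1,2,3} = 0.368 \cdot 0.239 \cdot 0.117 = 0.01$$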
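The lottery calculation can be sketched as follows. Unlike the slide, this version uses the exact number of 6-out-of-49 combinations instead of the rounded 14 000 000, so the intermediate values differ slightly; the variable names are illustrative.

```python
from math import comb, exp, prod

combinations = comb(49, 6)       # number of possible 6-out-of-49 tickets
p_win = 1 / combinations

bets = (14_000_000, 20_000_000, 30_000_000)

# Zero term of the Poisson distribution: P(no winner) = exp(-lambda), lambda = n * p
p_no_winner = [exp(-n * p_win) for n in bets]

# The three draws are independent, so the probabilities multiply
p_three_cumulations = prod(p_no_winner)
print(combinations, p_no_winner, p_three_cumulations)
```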

The zero term of the Poisson distribution gives the probability of no event.

The probability of at least one event:

$$p(k \ge 1) = 1 - e^{-\lambda}$$

[Figure: two aligned DNA sequences with marked sites illustrating the four types of substitution listed below]

Single substitution

Parallel substitution

Back substitution

Multiple substitution

Probabilities of DNA substitution

We assume equal substitution probabilities. Let the probability of each specific substitution be p.

The probability that A mutates to T, C, or G is

$$P_{\neg A} = p + p + p = 3p$$

The probability of no mutation is

$$p_A = 1 - 3p$$

Independent events

$$p(A \wedge B) = p(A) \, p(B)$$

The probability that A mutates to T and C to G is

$$P_{AC} = p \times p$$

p(A→T)+p(A→C)+p(A→G)+p(A→A) =1

The construction of evolutionary trees from DNA sequence data

The probability matrix


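$$P = \begin{pmatrix} 1-3p & p & p & p \\ p & 1-3p & p & p \\ p & p & 1-3p & p \\ p & p & p & 1-3p \end{pmatrix}$$

Rows and columns are ordered A, T, C, G.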

What is the probability that after 5 generations A did not change?

$$p_5 = (1 - 3p)^5$$
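A small numerical illustration may help here. The sketch below assumes a hypothetical per-generation substitution probability p = 0.01 (not a value from the lecture) and compares (1 - 3p)^5, the probability that A never changed, with the diagonal entry of the fifth power of the probability matrix, which also counts paths that left A and came back (back substitution).

```python
def jc_step_matrix(p):
    """One-generation probability matrix: 1 - 3p on the diagonal, p elsewhere."""
    return [[1 - 3 * p if i == j else p for j in range(4)] for i in range(4)]

def mat_mul(a, b):
    """Plain 4 x 4 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

p = 0.01            # hypothetical substitution probability per generation
generations = 5

never_changed = (1 - 3 * p) ** generations

step = jc_step_matrix(p)
m = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # identity
for _ in range(generations):
    m = mat_mul(m, step)

is_a_after_5 = m[0][0]   # includes back substitutions A -> X -> A
print(never_changed, is_a_after_5)
```

The second number is slightly larger than the first because a site can return to A after having changed.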

The Jukes-Cantor model (JC69) assumes that all substitution probabilities are equal.

Arrhenius model

The Jukes Cantor model assumes equal substitution probabilities within these 4 nucleotides.

Substitution probability after time t

Transition matrix

$$P(t) = P(0)^t$$

$$\frac{dP(t)}{dt} = -\lambda P(t) \quad \Rightarrow \quad P(t) = P(0) \, e^{-\lambda t}$$

Substitution matrix

[Figure: nucleotides A, T, G, C changing to A over time t]

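The probability that nothing changes is the zero term of the Poisson distribution

$$P(\text{no substitution}) = e^{-\lambda} = e^{-4pt}$$

The probability of at least one substitution is

$$P(\text{at least one substitution}) = 1 - e^{-\lambda} = 1 - e^{-4pt}$$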

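The probability to reach a nucleotide from any other is

$$P(A, T, G, C \to A) = \frac{1}{4} \left( 1 - e^{-4pt} \right)$$

The probability that a nucleotide doesn't change after time t is

$$P(A \to A) = 1 - 3 \cdot \frac{1}{4} \left( 1 - e^{-4pt} \right) = \frac{1}{4} + \frac{3}{4} e^{-4pt}$$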

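Probability for a single difference

$$P(A \to T, C, G) = 3 \cdot \frac{1}{4} \left( 1 - e^{-4pt} \right) = \frac{3}{4} \left( 1 - e^{-4pt} \right)$$

What is the probability of x differences in a sequence of n nucleotides after time t? We use the principle of maximum likelihood and the Bernoulli distribution.

$$p(x, t) = \binom{n}{x} \left[ \frac{3}{4} \left( 1 - e^{-4pt} \right) \right]^x \left[ \frac{1}{4} + \frac{3}{4} e^{-4pt} \right]^{n-x}$$

$$\ln p(x, t) = \ln \binom{n}{x} + x \ln \frac{3}{4} + x \ln \left( 1 - e^{-4pt} \right) + (n - x) \ln \left( \frac{1}{4} + \frac{3}{4} e^{-4pt} \right)$$

Setting the derivative of ln p(x, t) with respect to t to zero gives

$$t = -\frac{1}{4p} \ln \left( 1 - \frac{4x}{3n} \right)$$

This is the mean time to get x different sites from a sequence of n nucleotides. It is also a measure of distance that depends only on the number of substitutions.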
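The maximum likelihood estimate of t under the Jukes-Cantor model can be sketched in code. The function names and the example values (x = 10 differences in n = 100 sites, p = 0.01) are hypothetical; the formulas are the standard JC69 expressions, with the probability of a difference at one site equal to 3/4 (1 - exp(-4pt)).

```python
from math import exp, log

def p_difference(p, t):
    """JC69 probability that a site differs after time t."""
    return 0.75 * (1 - exp(-4 * p * t))

def ml_time(x, n, p):
    """Time t that maximizes the binomial likelihood of x differences in n sites."""
    fraction = x / n
    if not 0 <= fraction < 0.75:
        raise ValueError("x / n must lie in [0, 0.75) under JC69")
    return -log(1 - 4 * fraction / 3) / (4 * p)

# Hypothetical example: 10 differences in 100 sites, p = 0.01 per unit time
t_hat = ml_time(10, 100, 0.01)
print(t_hat, p_difference(0.01, t_hat))
```

Plugging the estimated time back into `p_difference` returns the observed fraction x / n, which is what maximizing a binomial likelihood requires.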

[Figure: Bernoulli distribution for n = 10 and p = 0.2, $p(k) = \binom{10}{k} 0.2^k \cdot 0.8^{10-k}$; x-axis k from 0 to 10]

[Figure: phylogenetic tree of Gorilla, Pan paniscus, Pan troglodytes, Homo sapiens, and Homo neanderthalensis; axis: divergence (number of substitutions)]

Phylogenetic trees are the basis of any systematic classification.

A pile model to generate the binomial. If the number of steps is very, very large the binomial becomes smooth.

The normal distribution is the continuous equivalent to the discrete Bernoulli distribution.

Abraham de Moivre (1667-1754)

$$f(x) = C e^{-x^2}$$

If we have a series of random variates Xn, a new random variate Yn that is the sum of all Xn will for n→∞ be a variate that is asymptotically normally distributed.
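The statement can be illustrated with a small simulation. The sketch below sums 30 uniform random variates (an arbitrary choice of distribution and count), standardizes the sums with their theoretical mean n/2 and variance n/12, and checks what share falls within one and two standard deviations. The seed and sample size are arbitrary.

```python
import random

random.seed(1)      # fixed seed so the run is repeatable

def sum_of_uniforms(n_terms):
    """Y = X1 + ... + Xn with each Xi uniform on [0, 1)."""
    return sum(random.random() for _ in range(n_terms))

n_terms = 30
sample = [sum_of_uniforms(n_terms) for _ in range(20000)]

# Theory for uniform variates: mean n/2, variance n/12
mu = n_terms / 2
sigma = (n_terms / 12) ** 0.5
z = [(y - mu) / sigma for y in sample]

share_within_1sd = sum(abs(v) < 1 for v in z) / len(z)
share_within_2sd = sum(abs(v) < 2 for v in z) / len(z)
print(share_within_1sd, share_within_2sd)
```

For a normal distribution these shares are about 0.68 and 0.95, and the simulated sums come close to both.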

[Figure: three frequency histograms of the variate X on the range -2 to 2]

The central limit theorem

[Figure: four panels of frequency distributions plotted against X]

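The normal or Gaussian distribution

$$F(x) = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{(v - \mu)^2}{2\sigma^2}} \, dv$$

Mean: $\mu$
Variance: $\sigma^2$

Important features of the normal distribution
• The function is defined for every real x.
• The frequency at x = μ is given by $f(\mu) = \frac{1}{\sigma \sqrt{2\pi}} \approx \frac{0.4}{\sigma}$.
• The distribution is symmetrical around μ.
• The points of inflection are given by the second derivative. Setting this to zero gives $x = \mu \pm \sigma$.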

[Figure: normal curves plotted against X; the area between μ - σ and μ + σ is 0.68, the area between μ - 2σ and μ + 2σ is 0.95, and the cumulative probability up to μ + 2σ is about 0.975]

Many statistical tests compare observed values with those of the standard normal distribution and assign the respective probabilities to H1.

$$F(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{v^2}{2}} \, dv$$

The Z-transform

$$Z = \frac{X - \mu}{\sigma}$$

The variate Z has a mean of 0 and a variance of 1.

A Z-transform standardizes any statistical distribution to mean 0 and variance 1. Tables of statistical distributions are usually given as Z-transforms.
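A minimal sketch of the Z-transform on a small data set follows. The measurement values are hypothetical, and the population standard deviation (`pstdev`) is used so that the transformed values have a variance of exactly 1.

```python
from statistics import mean, pstdev

values = [4.1, 5.0, 5.6, 6.2, 7.1]   # hypothetical measurements

mu = mean(values)
sigma = pstdev(values)               # population standard deviation

# Z = (X - mu) / sigma
z_values = [(v - mu) / sigma for v in values]

print(z_values)
print(mean(z_values), pstdev(z_values))
```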

The standard normal

The 95% confidence limit

P(μ - σ < X < μ + σ) = 68%
P(μ - 1.65σ < X < μ + 1.65σ) = 90%
P(μ - 1.96σ < X < μ + 1.96σ) = 95%
P(μ - 2.58σ < X < μ + 2.58σ) = 99%
P(μ - 3.29σ < X < μ + 3.29σ) = 99.9%
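These limits can be reproduced with the standard normal distribution from the Python standard library. The helper name `coverage` is an illustrative choice.

```python
from statistics import NormalDist

std_normal = NormalDist()            # mean 0, standard deviation 1

def coverage(z):
    """P(mu - z*sigma < X < mu + z*sigma) for a normal variate."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

for z in (1, 1.65, 1.96, 2.58, 3.29):
    print(z, round(coverage(z), 3))
```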

The Fisherian significance levels

[Figure: normal curve with the areas within ±σ (0.68) and ±2σ (0.95) marked]

The Z-transformed (standardized) normal distribution

Why is the normal distribution so important?

The normal distribution is often at least approximately found in nature. Many additive or multiplicative processes generate distributions of patterns that are normal. Examples are body sizes, intelligence, abundances, phylogenetic branching patterns, metabolism rates of individuals, plant and animal organ sizes, or egg numbers. Indeed, following the Belgian statistician Adolphe Quetelet (1796-1874), the normal distribution was long held to be a natural law. However, newer studies showed that the normal distribution is most often only an approximation and that real distributions frequently follow more complicated asymmetrical distributions, for instance skewed normals.

The normal distribution follows from the binomial. Hence, if we take samples out of a large population of discrete events, we expect the distribution of events (their frequency) to be normally distributed.

The central limit theorem holds that means of additive variables should be normally distributed. This is a generalization of the second argument. In other words, the normal is the expectation when dealing with a large number of influencing variables.

Gauß derived the normal distribution from the distribution of errors within his treatment of measurement errors. If we measure the same thing many times, our measurements will not always give the same value. Because many factors might influence our measurement errors, the central limit theorem points again to a normal distribution of errors around the mean.

In the next lecture we will see that the normal distribution can be approximated by a number of other important distributions that form the basis of important statistical tests.

[Figure: several samples drawn from one population, each with its own mean x̄ and standard deviation s]

The estimation of the population mean from a series of samples

The n samples come from an additive random variate.

Z is asymptotically normally distributed.

Confidence limit of the estimate of a mean from a series of samples.

α is the desired probability level.
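The slide's formulas did not survive extraction, so the following is only a sketch of the usual normal-approximation confidence limits for a mean, x̄ ± z · s / √n, where s / √n is the standard error. It assumes that the sample standard deviation s may stand in for σ, which is reasonable for large n only. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_limits(sample, level=0.95):
    """Normal-approximation confidence limits for the population mean."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)     # s / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for level = 0.95
    centre = mean(sample)
    return centre - z * standard_error, centre + z * standard_error

sample = [9.8, 10.4, 10.1, 9.6, 10.9, 10.2, 9.9, 10.5, 10.0, 10.3]  # hypothetical
lower, upper = mean_confidence_limits(sample)
print(lower, upper)
```

For a sample as small as this one, a t-distribution would normally replace the normal quantile.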


Standard error

How to apply the normal distribution

Intelligence is approximately normally distributed with a mean of 100 (by definition) and a standard deviation of 16 (in North America). For an intelligence study we need 100 persons with an IQ above 130. How many persons do we have to test to find this number if we take random samples (and do not test university students only)?

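$$P(X > 130) = 1 - F(130) = 1 - \frac{1}{16 \sqrt{2\pi}} \int_{-\infty}^{130} e^{-\frac{(v - 100)^2}{2 \cdot 16^2}} \, dv$$

[Figure: normal curve of IQ from 40 to 160, divided into the regions IQ < 130 and IQ > 130]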
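A short numerical sketch of this example follows, using the mean of 100 and standard deviation of 16 given in the problem. The variable names are illustrative, and the number of persons is simply 100 divided by the upper-tail probability, rounded up.

```python
from math import ceil
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=16)

# Upper tail: share of the population with an IQ above 130
p_above_130 = 1 - iq.cdf(130)

# Expected number of persons to test in order to find 100 such persons
persons_to_test = ceil(100 / p_above_130)
print(p_above_130, persons_to_test)
```

About 3% of the population lies above 130, so roughly 3300 persons have to be tested.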

One and two sided tests

We measure blood sugar concentrations and know that our method estimates the concentration with an error of about 3%. What is the probability that our measurement deviates from the real value by more than 5%?
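One way to work this example is sketched below. It assumes that the 3% error is the standard deviation of a normally distributed measurement error with mean 0; that reading is an assumption, not something the slide states. A deviation of more than 5% in either direction is a two-sided question, while a deviation in one direction only is one-sided.

```python
from statistics import NormalDist

# Assumption: measurement error in percent, normal with mean 0 and sd 3
error = NormalDist(mu=0, sigma=3)

p_one_sided = 1 - error.cdf(5)       # more than 5% too high
p_two_sided = 2 * p_one_sided        # more than 5% off in either direction
print(p_one_sided, p_two_sided)
```

Under this assumption the two-sided probability is a little under 10%.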

Albinos are rare in human populations. Assume their frequency is 1 per 100000 persons. What is the probability to find 15 albinos among 1000000 persons?

$$p(15) = \binom{1000000}{15} (0.00001)^{15} (0.99999)^{999985}$$

=KOMBINACJE(1000000,15)*0.00001^15*(1-0.00001)^999985 = 0.0347

(KOMBINACJE is the Polish-language Excel name of the COMBIN function.)

$$\mu = np, \qquad \sigma^2 = npq$$
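The spreadsheet formula can be reproduced in Python, and the Poisson distribution with λ = np = 10 gives nearly the same value. This is a sketch with illustrative variable names.

```python
from math import comb, exp, factorial

n, p, k = 1_000_000, 0.00001, 15

# Exact binomial probability of 15 albinos among 1000000 persons
binomial = comb(n, k) * p**k * (1 - p)**(n - k)

# Poisson approximation with lambda = np
lam = n * p
poisson = lam**k / factorial(k) * exp(-lam)

print(round(binomial, 4), round(poisson, 4))  # 0.0347 0.0347
```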

Homework and literature

Refresh:

• Bernoulli distribution
• Poisson distribution
• Normal distribution
• Central limit theorem
• Confidence limits
• One and two sided tests
• Z-transform

Prepare for the next lecture:

• χ² test
• Mendel rules
• t-test
• F-test
• Contingency table
• G-test

Literature:

Łomnicki: Statystyka dla biologów
Mendel: http://en.wikipedia.org/wiki/Mendelian_inheritance
Pearson Chi2 test: http://en.wikipedia.org/wiki/Pearson's_chi-square_test
