
3 Dec 2012 COMP80131-SEEDSM12_6 1

Scientific Methods 1

Barry & Goran

‘Scientific evaluation, experimental design

& statistical methods’

COMP80131

Lecture 6: Statistical Methods-Significance

www.cs.man.ac.uk/~barry/mydocs/myCOMP80131

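Continuous random processes
• Characterised by probability density functions (pdf).
• Prob of the random variable x lying between a and b is:
$$\Pr(a \le x \le b) = \int_a^b \mathrm{pdf}(x)\,dx$$
• Uniform pdf: the prob of x lying between a and b is given by the same integral.
[Figure: uniform pdf(x) of height 1 up to x = 1, with a and b marked on the x axis]
• Gaussian (Normal) pdf with mean m & std dev σ:
$$\mathrm{pdf}(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-m)^2}{2\sigma^2}\right)$$
[Figure: Gaussian pdf(x) centred on m, with m−σ, m+σ, a and b marked on the x axis]
$$\Pr(a \le x \le b) = \int_a^b \mathrm{pdf}(x)\,dx$$
• About 68% of the probability lies within m ± σ, 95.5% within m ± 2σ, and 99.7% within m ± 3σ.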
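The integral above can be checked numerically. The sketch below is in Python (standard library only) rather than MATLAB, and the helper names `gauss_pdf` and `integrate_pdf` are illustrative, not from the lecture. It assumes the trapezoidal rule with 10000 steps is accurate enough for these smooth pdfs, and it recovers the 68% / 95.5% / 99.7% figures for an arbitrary choice of m and σ.

```python
import math


def gauss_pdf(x, m=0.0, sigma=1.0):
    """Gaussian (normal) pdf with mean m and standard deviation sigma."""
    k = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return k * math.exp(-((x - m) ** 2) / (2.0 * sigma ** 2))


def integrate_pdf(pdf, a, b, n=10_000):
    """Approximate Prob(a <= x <= b), the integral of pdf from a to b, by the trapezoidal rule."""
    h = (b - a) / n
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, n):
        total += pdf(a + i * h)
    return total * h


m, sigma = 2.0, 0.5  # arbitrary illustrative values
for k in (1, 2, 3):
    p = integrate_pdf(lambda x: gauss_pdf(x, m, sigma), m - k * sigma, m + k * sigma)
    print(f"Prob within m +/- {k} sigma: {p:.4f}")
```

The same `integrate_pdf` works for the uniform pdf: passing `lambda x: 1.0` gives b − a for any 0 ≤ a ≤ b ≤ 1.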

pdf & Histograms
Ru = rand(10000,1);    % 10000 uniform samples
hist(Ru,20);
Rg = randn(10000,1);   % Gaussian with m=0, std=1
hist(Rg,20);
[Figures: 20-bin histogram of Ru over x from 0 to 1; 20-bin histogram of Rg over x from −4 to 5]

Convert histogram to estimate of pdf
• Divide each column by the number of samples.
• Then divide by the width of the bins.
• For a finer approximation, increase the number of bins (narrower bins), together with the number of samples so that each bin still holds enough counts.
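The two division steps above can be sketched in Python (standard library only). The function `pdf_estimate` is a hypothetical helper, not from the lecture, and it assumes equal-width bins covering [lo, hi), a slight simplification of MATLAB's `hist` with bins centred on the elements of X. The estimate is compared with the true standard normal pdf at the bin centres.

```python
import math
import random


def pdf_estimate(samples, lo, hi, n_bins):
    """Histogram-based pdf estimate on n_bins equal-width bins covering [lo, hi).

    Each count is divided by the number of samples, then by the bin width.
    Samples outside [lo, hi) are counted in the total but fall in no bin.
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in samples:
        if lo <= x < hi:
            # min() guards against rounding placing x just below hi into bin n_bins
            counts[min(int((x - lo) / width), n_bins - 1)] += 1
    n = len(samples)
    centres = [lo + (i + 0.5) * width for i in range(n_bins)]
    return centres, [c / n / width for c in counts]


random.seed(1)
samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]
centres, est = pdf_estimate(samples, -4.0, 4.0, 40)
true_pdf = [math.exp(-c * c / 2.0) / math.sqrt(2.0 * math.pi) for c in centres]
max_err = max(abs(e - t) for e, t in zip(est, true_pdf))
print(f"largest |estimate - true pdf| over the bins: {max_err:.4f}")
```

Because of the division by bin width, the bar heights times the width sum to the fraction of samples inside [lo, hi), which is close to 1 here.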

MATLAB illustration
N = 100000;
Rg = randn(N,1);          % N = 100000 Gaussian samples with m=0, std=1
widthBin = 0.2;
X = -4 : widthBin : 4;
H = hist(Rg,X);           % histogram with bins centred on elements of X
figure(2); bar(X, (H/N)/widthBin); ylabel('pdf estimate');

Histogram as pdf estimate.
[Figure: bar chart of the pdf estimate over x from −5 to 5, peaking near 0.4]

Gaussian (normal) pdf
• Measurements {xi} of many naturally occurring phenomena tend to be normally distributed with some mean µ & stdev σ.
• Let zi = (xi − µ)/σ.
• Then {zi} has the standard normal pdf with mean = 0 & std = 1.
• Conversely, if you generate a set of pseudo-random numbers {zi} with mean = 0 & std = 1, let xi = σ zi + µ to scale the mean & std as required.
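Both transformations can be sketched in Python (standard library only); `standardise` and `rescale` are illustrative names, and µ = 10, σ = 3 are arbitrary example values. Standard normal numbers are scaled to the required mean and stdev, then standardised back.

```python
import random
import statistics


def standardise(xs, mu, sigma):
    """z_i = (x_i - mu) / sigma: maps N(mu, sigma) data to mean 0, std 1."""
    return [(x - mu) / sigma for x in xs]


def rescale(zs, mu, sigma):
    """x_i = sigma * z_i + mu: maps standard normal numbers to mean mu, std sigma."""
    return [sigma * z + mu for z in zs]


random.seed(0)
z = [random.gauss(0.0, 1.0) for _ in range(50_000)]  # mean 0, std 1
x = rescale(z, mu=10.0, sigma=3.0)                   # illustrative mu = 10, sigma = 3
print(statistics.fmean(x), statistics.stdev(x))
z_back = standardise(x, 10.0, 3.0)                   # undoes the scaling
```

The printed sample mean and stdev should be close to 10 and 3, and `z_back` reproduces `z` up to rounding.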

Plot true standard normal pdf
Mean = 0; Std = 1;
widthBin = 0.2;                          % x spacing, as on the previous slide
K = 1/( Std*sqrt(2*pi) );
X = -4*Std : widthBin : 4*Std;
G = K * exp(-(X-Mean).^2 / (2*Std^2));   % vectorised: evaluates the pdf at every X
figure(4); plot(X,G); ylabel('pdf');

[Figure: Gaussian pdf for x from −4 to 4, peaking at about 0.4]
$$\mathrm{pdf}(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-m)^2}{2\sigma^2}\right)$$

Plot Gaussian cdf
X = -4:0.1:4;
C = normcdf(X,0,1);
figure(1); plot(X,C); grid on;
xlabel('x'); ylabel('prob that var < x');

Cumulative distribution function (cdf)

Probability of a Gaussian variable (m=0, std=1) being < x.

There is no closed-form formula for this in elementary functions.

Use the MATLAB function: normcdf(X,m,std)

[Figure: Gaussian cdf for x from −4 to 4; y axis "prob that random variable < x", from 0 to 1]
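Although there is no elementary formula, the Gaussian cdf can be written with the error function, Φ(x) = ½[1 + erf((x − m)/(σ√2))]. The sketch below is in Python (standard library only) as a stand-in for `normcdf`; `gauss_cdf` is an illustrative name, and `statistics.NormalDist.cdf` is Python's built-in equivalent, used here as a cross-check.

```python
import math
from statistics import NormalDist


def gauss_cdf(x, m=0.0, sigma=1.0):
    """Prob that a Gaussian variable (mean m, std sigma) is < x, written with erf."""
    return 0.5 * (1.0 + math.erf((x - m) / (sigma * math.sqrt(2.0))))


std_normal = NormalDist(mu=0.0, sigma=1.0)
for x in (-2.0, 0.0, 1.0, 1.96):
    print(x, gauss_cdf(x), std_normal.cdf(x))
```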

Complementary Gaussian cdf

This is just 1 − normcdf(x,m,σ).

It is the prob of a Gaussian random variable (mean = m, std = σ) being > x.

[Figure: complementary Gaussian cdf for x from −4 to 4; y axis "prob that var > x", from 0 to 1]

Complementary error function
• Some call the complementary Gaussian cdf (m = 0, σ = 1) the ‘complementary error function’ Q(z).
• But ‘erfc’ is also called this.
• $Q(z) = \text{comp-Gaussian cdf} = 0.5\,\mathrm{erfc}(z/\sqrt{2})$.
• We used to rely on tables & graphs of Q(z).
• When m ≠ 0 or σ ≠ 1, use Q((z − m)/σ).
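A minimal Python sketch (standard library only) of Q(z) through `math.erfc`, plus the shift-and-scale rule for m ≠ 0 or σ ≠ 1. The names `Q` and `tail_prob` are illustrative. Computing Q with erfc rather than as 1 − cdf avoids cancellation when z is large and Q(z) is tiny.

```python
import math
from statistics import NormalDist


def Q(z):
    """Complementary Gaussian cdf for m = 0, sigma = 1: Prob(Z > z) = 0.5 * erfc(z / sqrt(2))."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def tail_prob(x, m, sigma):
    """Prob that a Gaussian variable with mean m and std sigma exceeds x, i.e. Q((x - m) / sigma)."""
    return Q((x - m) / sigma)


print(Q(0.0), Q(1.0), Q(5.0))
print(1.0 - NormalDist().cdf(1.0))  # same value as Q(1.0), computed as 1 - cdf
```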


Use of ‘normcdf’ function

Prob of random var being between D & E is:
$$\int_D^E \mathrm{pdf}(x)\,dx = \mathrm{normcdf}(E, m, \sigma) - \mathrm{normcdf}(D, m, \sigma)$$

[Figure: Gaussian pdf for x from −4 to 4, with the area between D and E marked]
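The cdf-difference rule can be sketched in Python, with `statistics.NormalDist(mu, sigma).cdf` standing in for MATLAB's `normcdf(x, m, sigma)`. The helper `prob_between` and the values of D, E, m and σ in the second call are illustrative.

```python
from statistics import NormalDist


def prob_between(D, E, m, sigma):
    """Prob(D < x < E) for a Gaussian variable: normcdf(E, m, sigma) - normcdf(D, m, sigma)."""
    dist = NormalDist(mu=m, sigma=sigma)
    return dist.cdf(E) - dist.cdf(D)


print(prob_between(-1.0, 1.0, 0.0, 1.0))  # within one std of the mean
print(prob_between(0.5, 2.0, 1.0, 0.5))   # illustrative D, E, m, sigma
```

The first call reproduces the 68% figure for m ± σ.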

Tail of distribution

Prob of random variable being greater than D is:
$$\int_D^{\infty} \mathrm{pdf}(x)\,dx = 1 - \mathrm{normcdf}(D, m, \sigma)$$

[Figure: Gaussian pdf for x from −4 to 4, with the tail to the right of D marked]

An Engineering Question
• Rectangular 1 V & 0 V pulses are used to transmit a binary signal.
• They are affected by additive white Gaussian noise (AWGN).
• Mean of noise = 0 & power (variance) σ² = 0.01.
• Estimate the bit-error probability.
• A bit-error may occur if the noise adds a voltage > 0.5 V to 0 V, or < −0.5 V to 1 V.
• Assume the same number of 1’s & 0’s.

[Figure: voltage against time t, with levels +1 and +1/2 marked]

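Solution

With σ = √0.01 = 0.1:
$$\begin{aligned}
\Pr(\text{error}) &= \Pr(\text{noise} > 0.5 \mid \text{bit}=0)\,\Pr(\text{bit}=0) + \Pr(\text{noise} < -0.5 \mid \text{bit}=1)\,\Pr(\text{bit}=1) \\
&= 0.5\,\Pr(\text{noise} > 0.5) + 0.5\,\Pr(\text{noise} < -0.5) \\
&= \Pr(\text{noise} > 0.5) \quad \text{(by symmetry)} \\
&= 1 - \mathrm{normcdf}(0.5, 0, 0.1) = 2.9 \times 10^{-7}
\end{aligned}$$

Or, using the graph of Q(z/σ) on the next slide, prob(error) = Q(0.5/σ) = Q(0.5/0.1) = Q(5) ≈ 3×10⁻⁷.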
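The solution can be reproduced in Python (standard library only), with `NormalDist.cdf` in place of `normcdf` and `math.erfc` for Q. The variable names are illustrative. Both error terms are computed explicitly, so the symmetry step can be seen rather than assumed; the result should agree with the slide's 2.9×10⁻⁷.

```python
import math
from statistics import NormalDist

variance = 0.01              # noise power sigma^2 from the question
sigma = math.sqrt(variance)  # = 0.1 V
threshold = 0.5              # decision threshold half-way between 0 V and 1 V

noise = NormalDist(mu=0.0, sigma=sigma)
p_err_0 = 1.0 - noise.cdf(threshold)   # a 0 is sent and noise pushes it above 0.5 V
p_err_1 = noise.cdf(-threshold)        # a 1 is sent and noise pushes it below 0.5 V
p_err = 0.5 * p_err_0 + 0.5 * p_err_1  # equal numbers of 1s and 0s

q5 = 0.5 * math.erfc((threshold / sigma) / math.sqrt(2.0))  # Q(0.5 / sigma) = Q(5)
print(f"{p_err:.3e} {q5:.3e}")
```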

[Figure: graph of Q(z/σ) against z/σ]

Back to sampling
• Assume a population has true mean µ & stdev σ.
• Take a sample of N measurements from it; say N = 50.
• Calculate the sample-mean m1 & stdev s1.
• Cannot expect m1 = µ & s1 = σ exactly.
• Take another sample, & calculate m2 & s2.
• Repeat to obtain m1, m2, …, mM & s1, s2, …, sM.
• Now have distributions for the sample-mean & sample-stdev.
• If the population is Gaussian, the pdf of the sample-means will be Gaussian with mean = µ & stdev = σ/√N.
• Can confirm by increasing M & estimating the mean & stdev of the sample-mean from m1, m2, …, mM.
• What about the mean & stdev of sample-variances? (later)
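The confirmation step in the last bullets can be simulated in Python (standard library only). The helper `sample_means` is hypothetical, and µ = 5, σ = 2, N = 50, M = 2000 are illustrative choices. It draws M samples of size N from a Gaussian population and compares the stdev of the M sample-means with σ/√N.

```python
import random
import statistics


def sample_means(mu, sigma, N, M, seed=0):
    """Draw M samples of size N from a Gaussian population; return the M sample-means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(N)) for _ in range(M)]


mu, sigma, N, M = 5.0, 2.0, 50, 2000  # illustrative values
means = sample_means(mu, sigma, N, M)
print(statistics.fmean(means), statistics.stdev(means), sigma / N ** 0.5)
```

Increasing M makes the two stdev figures agree more closely, as the slide suggests.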

Significance testing
• Assume the pop-mean µ (‘mu’) may change.
• Assume we know the pop-stdev σ & that it will not change.
• Assume we can only take one sample of 50 values.
• Calculate m1 to decide whether µ has changed.
• Null Hypothesis: it has not changed, i.e. the new pop-mean µNew = µ.
• If the Null Hyp is true, the pdf of the sample-mean is as shown on the next slide.

pdf of sample-mean

[Figure: Gaussian pdf of the sample-mean m1, centred on µ, with µ−2s1, µ−s1, µ+s1 and µ+2s1 marked on the x axis]

• Assume the value we got was m1 = µ + 2.5s1, where here s1 = σ/√50 is the stdev of the sample-mean. E.g. if µ = 0 & σ = 1, then m1 = 2.5/√50 ≈ 0.354.
• How unlikely is this if the Null Hypothesis is true?


Concept of a ‘null-hypothesis’

• A null-hypothesis is an assumption that is made and then tested by a set of experiments designed to reveal that it is likely to be false, if it is false.

• Testing is done by considering how probable the results are, assuming the null hypothesis is true.

• If the results appear very improbable the researcher may conclude that the null-hypothesis is likely to be false.

• This is usually the outcome the researcher hopes for when he or she is trying to prove that a new technique is likely to have some value.

p-value
• “Probability of obtaining a test result at least as extreme as the one observed, assuming that the null-hypothesis is true”.
• Reject the null-hypothesis if the p-value is less than some value α (significance level), which is often 0.05 or 0.01.
• When the null-hypothesis is rejected, the result is statistically significant.
• Here, with s1 = σ/√N, the p-value is
$$\begin{aligned}
p &= 1 - \mathrm{normcdf}(m_1, \mu, s_1) = 1 - \mathrm{normcdf}(\mu + 2.5 s_1, \mu, s_1) \\
&= 1 - \mathrm{normcdf}(2.5 s_1, 0, s_1) = 1 - \mathrm{normcdf}(2.5, 0, 1) = 0.0062
\end{aligned}$$
• This is much less than 0.01, so reject the NH at the 1% significance level.
• Conclude that the mean has changed.
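The p-value calculation can be sketched in Python (standard library only), using the slide's example values µ = 0, σ = 1, N = 50 and `NormalDist(mu, s1).cdf` in place of `normcdf(m1, µ, s1)`. As on the slide, the test is one-sided: it asks how likely a sample-mean at least this far above µ would be.

```python
import math
from statistics import NormalDist

mu, sigma, N = 0.0, 1.0, 50                  # the slide's example values
s1 = sigma / math.sqrt(N)                    # stdev of the sample-mean
m1 = mu + 2.5 * s1                           # observed sample-mean, about 0.354
p_value = 1.0 - NormalDist(mu, s1).cdf(m1)   # Prob(sample-mean >= m1 | NH true)
alpha = 0.01                                 # 1% significance level
print(round(m1, 3), round(p_value, 4), "reject NH" if p_value < alpha else "cannot reject NH")
```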

Our two assumptions
• That was easy because we made 2 assumptions: the population is Gaussian & the pop-stdev is known to us.
• Now need to eliminate these 2 assumptions.
• We have some help from the Central Limit Theorem:

Central Limit Theorem
• If samples of size N are ‘randomly’ chosen from a pop with mean µ & (finite) std σ, the pdf of their sample-means m1, m2, … approaches a Normal (Gaussian) pdf with mean µ & std σ/√N as N is made larger & larger.
• Regardless of whether the population is Gaussian or not!
• The previous example can be made to work for a non-Gaussian pop provided N is ‘large enough’.
• More on this next week.
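The theorem can be illustrated by simulation in Python (standard library only). The helper `clt_demo` is hypothetical, and the exponential population (mean 1, std 1) is an assumed example of a strongly skewed, non-Gaussian pop. Its skewness is 2, so the sample-means should have skewness near 2/√N, shrinking towards the Gaussian value 0, while their stdev tracks σ/√N.

```python
import random
import statistics


def clt_demo(N, M=4000, seed=1):
    """Draw M samples of size N from an exponential population (mean 1, std 1).

    Returns the mean, stdev and sample skewness of the M sample-means.
    """
    rng = random.Random(seed)
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(N)) for _ in range(M)]
    centre = statistics.fmean(means)
    spread = statistics.stdev(means)
    # sample skewness: 0 for a symmetric (e.g. Gaussian) pdf
    skew = statistics.fmean(((mbar - centre) / spread) ** 3 for mbar in means)
    return centre, spread, skew


for N in (2, 10, 50):
    print(N, clt_demo(N), 1.0 / N ** 0.5)  # last value: predicted stdev sigma / sqrt(N)
```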

Another example
• Assume we wish to find out if a technique designed to benefit users of a system is likely to have any value.
• Divide users into two groups & offer the proposed technique to one group, and something different to the other group.
• The null-hypothesis would be that the proposed technique offers no measurable advantage over the other techniques.

The testing
• Look for differences between the sets of results obtained for each of the two groups.
• Careful experimental design will try to eliminate differences not caused by the techniques being compared.
• Take a large number of users in each group & randomize the way the users are assigned to groups.
• Once other differences have been eliminated as far as possible, the remaining difference will hopefully be indicative of the effectiveness of the techniques being investigated.
• The vital question is whether the differences are likely to be due to the advantages of the new technique, or to the inevitable random variations that arise from the other factors.
• Are the differences statistically significant?
• Can employ a statistical significance test to find out.

Failure of the experiment
• If the results are not found to look improbable under the null-hypothesis, i.e. if the differences between the two groups are not statistically significant, then no conclusion can be made.
• The null-hypothesis could be true, or it could still be false.
• It is a mistake to conclude that the ‘null-hypothesis’ has been proved likely to be true in this circumstance.
• It is quite possible that the results of the experiment give insufficient evidence to make any conclusions at all.

Question: fair coin test
Checking whether a coin is fair: suppose we obtain heads 14 times out of 20 flips. The p-value for this test result would be the probability of a fair coin landing on heads at least 14 times out of 20 flips. From the binomial distribution formula (Lecture 4), this is:
$$\frac{\binom{20}{14} + \binom{20}{15} + \binom{20}{16} + \binom{20}{17} + \binom{20}{18} + \binom{20}{19} + \binom{20}{20}}{2^{20}} \approx 0.058$$
This is the probability that a fair coin would give a result as extreme as, or more extreme than, 14 heads out of 20.
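The binomial sum can be checked exactly in Python with `math.comb`. The helper `binomial_upper_tail` is an illustrative name, and the test is one-sided (at least 14 heads), as on the slide.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """Prob of at least k successes (heads) in n independent trials with success prob p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


p_value = binomial_upper_tail(14, 20)  # fair coin: 14 or more heads in 20 flips
alpha = 0.05
print(round(p_value, 4), "reject NH" if p_value <= alpha else "cannot reject NH")
```

The numerator of the fraction above is 60460, so the p-value is 60460/1048576, slightly above α = 0.05.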

Significance test for fair coin question
• Reject the null-hypothesis if p-value ≤ α.
• If α = 0.05, rejection of the null-hypothesis is “at the 5% (significance) level”.
• The probability of wrongly rejecting the null-hypothesis (Type 1 error) will then be at most α.
• This is often considered ‘sufficiently low’.
• In our example, p-value = 0.058 > 0.05.
• The observation is consistent with the null-hypothesis & we cannot reject it.
• Cannot conclude that the coin is likely to be unfair.
• But we have NOT proved that the coin is likely to be fair.
• 14 heads out of 20 flips can be ascribed to chance alone.
• It falls within the range of what could happen 95% of the time with a fair coin.

Questions from Lecture 2
• Analyse the fictitious exam results & comment on their features.
• Compute means, stdevs & vars for each subject, & histograms for the distributions.
• Make observations about performance in each subject & overall.
• Do the marks support the hypothesis that people good at Music are also good at Maths?
• Do they support the hypothesis that people good at English are also good at French?
• Do they support the hypothesis that people good at Art are also good at Maths?
• If you have access to only 50 rows of this data, investigate the same hypotheses.
  - What conclusions could you draw, and with what degree of certainty?

Questions from Lecture 4
1. A patient goes to a doctor with a bad cough & a fever. The doctor needs to decide whether he has ‘swine flu’. Let statement S = ‘has bad cough and fever’ & statement F = ‘has swine flu’. The doctor consults his medical books and finds that about 40% of patients with swine-flu have these same symptoms. Assuming that, currently, about 1% of the population is suffering from swine-flu and that currently about 5% have bad cough and fever (due to many possible causes including swine-flu), we can apply Bayes' theorem to estimate the probability of this particular patient having swine-flu.

2. A doctor in another country knows from his textbooks that for 40% of patients with swine-flu, the statement S, ‘has bad cough and fever’, is true. He sees many patients and comes to believe that the probability that a patient with ‘bad cough and fever’ actually has swine-flu is about 0.1 or 10%. If there were reason to believe that, currently, about 1% of the population have a bad cough and fever, what percentage of the population is likely to be suffering from swine-flu?
