Three Frameworks for Statistical Analysis

Upload: erik-sparks

Post on 14-Jan-2016


Page 1: Three Frameworks for Statistical Analysis. Sample Design Forest, N=6 Field, N=4 Count ant nests per quadrat

Three Frameworks for Statistical Analysis


Sample Design

Forest, N=6

Field, N=4

Count ant nests per quadrat

Data

Id #   Habitat   Number of ant nests per quadrat
1      Forest    9
2      Forest    6
3      Forest    4
4      Forest    6
5      Forest    7
6      Forest    10
7      Field     12
8      Field     9
9      Field     12
10     Field     10

Three Frameworks for Statistical Analysis

• Monte Carlo Analysis
• Parametric Analysis
• Bayesian Analysis

Biology
I think we should call this a "Resampling Analysis". As pointed out in the Crowley (1992) paper, Monte Carlo analysis is a specific type of resampling technique that requires assumptions about the underlying distribution, whereas the randomization technique described in the ant example makes no such assumption!

The model

• y_i is a measurement on a “continuous” scale for observation i, which belongs to one type of habitat

• x_i is an indicator (dummy) variable for group membership (0 or 1)

• The model includes three parameters:

• α: the mean of the group coded x = 0

• β: the difference in means between the two groups, and

• σ²: the variance of the normal distribution from which the residuals ε_i are assumed to come.

$y_i = \alpha + \beta x_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathrm{Normal}(0, \sigma^2)$

(For the Parametric and Bayesian analyses)
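As a rough illustration of what the three parameters mean for the ant data, the sketch below fits the dummy-variable model by ordinary least squares. The coding x = 0 for Forest and x = 1 for Field is an assumption (the slides do not say which habitat is coded 0), and the function name `fit_two_group_model` is hypothetical.

```python
from statistics import mean

# Ant nest counts from the data slide; the 0/1 coding below is an assumption.
y = [9, 6, 4, 6, 7, 10, 12, 9, 12, 10]
x = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]  # 0 = Forest, 1 = Field (assumed)


def fit_two_group_model(x, y):
    """Least-squares estimates of alpha, beta and the residual variance."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sxy / sxx                      # difference between group means
    alpha = y_bar - beta * x_bar          # mean of the group coded x = 0
    residuals = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r ** 2 for r in residuals) / (len(y) - 2)  # two fitted means
    return alpha, beta, sigma2


alpha, beta, sigma2 = fit_two_group_model(x, y)
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}, sigma^2 = {sigma2:.3f}")
```

With a 0/1 indicator, alpha is simply the mean of the group coded 0 and beta is the difference between the two group means, which is why the same model serves the two-sample comparison.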

Biology
I'm not sure whether it is necessary to introduce this formal model here. I think the main point is that we are trying to determine whether there is a difference between population means based on sample data, and that there are 3 different major frameworks we're going to consider to do our testing. I think it would be better to save this for when we get to the chapters on linear models...
Monte Carlo Analysis

Involves a number of methods in which data are randomized or reshuffled so that observations are randomly reassigned to different treatment groups. This randomization specifies the null hypothesis under consideration.

Biology
Again, I think we should call this "Resampling Analysis"
Monte Carlo Analysis

1. Specify a test statistic or index to describe the pattern in the data
2. Create a distribution of the test statistic that would be expected under the null hypothesis
3. Decide on a one- or two-tailed test
4. Compare the observed test statistic to a distribution of simulated values and estimate the appropriate P value as a tail probability
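The four steps can be sketched for the ant data as follows. This is a minimal illustration, not the software used for the slides: the function name `randomization_test`, the seed, and the default of 1000 reshuffles are assumptions, and the estimated P value will vary slightly with the seed.

```python
import random

forest = [9, 6, 4, 6, 7, 10]
field = [12, 9, 12, 10]


def average(values):
    return sum(values) / len(values)


def randomization_test(group_a, group_b, n_sim=1000, seed=0):
    """Two-tailed randomization test on the difference between group means."""
    rng = random.Random(seed)
    # Step 1: the test statistic is the absolute difference between means.
    observed = abs(average(group_b) - average(group_a))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_sim):
        # Step 2: reshuffle the observations among the two groups.
        rng.shuffle(pooled)
        # Step 3: taking the absolute value makes the test two-tailed.
        simulated = abs(average(pooled[n_a:]) - average(pooled[:n_a]))
        if simulated >= observed - 1e-12:  # tolerance guards against rounding
            extreme += 1
    # Step 4: the tail probability is the share of simulated values
    # at least as extreme as the observed one.
    return observed, extreme / n_sim


observed, p_value = randomization_test(forest, field)
print(f"observed difference = {observed}, estimated P = {p_value}")
```

Because the reshuffling is random, repeated runs with different seeds give slightly different P values.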

Biology
"Resampling Analysis"
1. Specifying the Test Statistic

$\mathrm{DIF}_{obs} = 10.75 - 7.00 = 3.75$


2. Creating the Null Distribution

Biology
Are you going to explain how the null distribution is created? It is not clear from this slide.
3. Deciding on a One- or Two-Tailed Test

Abs (difference) = 3.750
P = 0.036

[Figure: plot with the threshold marked]

4. Calculating the Tail Probability

Inequality         N
DIFsim > DIFobs    7
DIFsim = DIFobs    29
DIFsim < DIFobs    964

P = (7 + 29)/1000 = 36/1000 = 0.036
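As a cross-check on the simulated counts, the same three inequality classes can be tallied exactly, because only 210 distinct ways exist to assign 4 of the 10 quadrats to the field group. Note that this swaps in exact enumeration for the random reshuffling used on the slide; the function name `tail_counts` is hypothetical.

```python
from itertools import combinations

values = [9, 6, 4, 6, 7, 10, 12, 9, 12, 10]  # forest (6) then field (4)
N_FIELD = 4
observed = abs(sum(values[6:]) / N_FIELD - sum(values[:6]) / 6)


def tail_counts(values, n_field, observed):
    """Count assignments whose |difference| is >, = or < the observed one."""
    total, n_other = sum(values), len(values) - n_field
    counts = {">": 0, "=": 0, "<": 0}
    for idx in combinations(range(len(values)), n_field):
        field_sum = sum(values[i] for i in idx)
        diff = abs(field_sum / n_field - (total - field_sum) / n_other)
        if abs(diff - observed) < 1e-9:
            counts["="] += 1
        elif diff > observed:
            counts[">"] += 1
        else:
            counts["<"] += 1
    return counts


counts = tail_counts(values, N_FIELD, observed)
p_exact = (counts[">"] + counts["="]) / sum(counts.values())
print(counts, f"exact two-tailed P = {p_exact:.4f}")
```

The exact tail probability is the quantity that the 1000 random reshuffles estimate, so the two should agree to within simulation error.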

Differences between means

Difference = 3.7500
P1 = 0.0228

$\mathrm{difference}_{obs} = 10.75 - 7.00 = 3.75$


Assumptions

• The data collected represent random, independent samples

• The test statistic describes the pattern of interest

• The randomization creates an appropriate null distribution for the question


Advantages

• It makes clear and explicit the underlying assumptions and the structure of the null hypothesis

• It does not require the assumption that the data are sampled from a specified probability distribution, such as the normal


Disadvantages

• It is computer intensive and is not included in most traditional statistical packages

• Different analyses of the same data set can yield slightly different answers

• The domain of inference for a Monte Carlo analysis is subtly more restrictive than that for a parametric analysis

Biology
Resampling

Parametric analysis

• Refers to statistical tests built on the assumption that the data being analyzed were sampled from a specified distribution

• Most statistical tests specify the normal distribution


Parametric analysis

1. Specify the test statistic

2. Specify the null distribution

3. Calculate the tail probability

1. Specify the test statistic: t test

$t = \dfrac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$

$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_T^2 \left( \dfrac{1}{N_1} + \dfrac{1}{N_2} \right)}$

$s_T^2 = \dfrac{df_1 \, s_1^2 + df_2 \, s_2^2}{df_1 + df_2}, \qquad df_1 = N_1 - 1, \quad df_2 = N_2 - 1$

Biology
We should point out that this formula would give us a standard normal variable (via the Central Limit Theorem), but that because we are estimating the variance it gives us a t-distribution with (N1+N2-2) degrees of freedom.
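A short sketch of the pooled-variance t statistic applied to the ant data, using only the Python standard library. The function name `pooled_t` is hypothetical; the sign of t depends on the order in which the groups are passed (Forest first here).

```python
import math
from statistics import mean, variance

forest = [9, 6, 4, 6, 7, 10]
field = [12, 9, 12, 10]


def pooled_t(a, b):
    """Two-sample t statistic with a pooled variance, plus its df."""
    df_a, df_b = len(a) - 1, len(b) - 1
    # Pooled variance: the two sample variances weighted by their df.
    pooled_var = (df_a * variance(a) + df_b * variance(b)) / (df_a + df_b)
    # Standard error of the difference between the two means.
    se_diff = math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / se_diff, df_a + df_b


t_stat, df = pooled_t(forest, field)
print(f"t = {t_stat:.5f}, df = {df}")
```

The degrees of freedom are N1 + N2 - 2 because one mean is estimated for each group.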
Specify the test statistic

[Figure: the null hypothesis, with labels Forest and Field]

2. Specify the null distribution

[Figure: null distribution with the critical value marked]

3. Calculate the tail probability: Student’s t table

df\p  0.4       0.25      0.1       0.05      0.025    0.01      0.005     0.0005
1     0.32492   1         3.077684  6.313752  12.7062  31.82052  63.65674  636.6192
2     0.288675  0.816497  1.885618  2.919986  4.30265  6.96456   9.92484   31.5991
3     0.276671  0.764892  1.637744  2.353363  3.18245  4.5407    5.84091   12.924
4     0.270722  0.740697  1.533206  2.131847  2.77645  3.74695   4.60409   8.6103
5     0.267181  0.726687  1.475884  2.015048  2.57058  3.36493   4.03214   6.8688
6     0.264835  0.717558  1.439756  1.94318   2.44691  3.14267   3.70743   5.9588
7     0.263167  0.711142  1.414924  1.894579  2.36462  2.99795   3.49948   5.4079
8     0.261921  0.706387  1.396815  1.859548  2.306    2.89646   3.35539   5.0413

http://www.statsoft.com/textbook/sttable.html#t
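One way to read a P value off the table is to bracket it: find the two neighbouring critical values that enclose the observed t and double their upper-tail probabilities for a two-tailed test. The sketch below does this for the df = 8 row and the observed t of -2.96319; the function name `bracket_two_tailed_p` is hypothetical.

```python
# Row df = 8 of the table: upper-tail probability -> critical value of t
T_TABLE_DF8 = {
    0.4: 0.261921, 0.25: 0.706387, 0.1: 1.396815, 0.05: 1.859548,
    0.025: 2.306, 0.01: 2.89646, 0.005: 3.35539, 0.0005: 5.0413,
}


def bracket_two_tailed_p(t_obs, row):
    """Return (lower, upper) bounds on the two-tailed P value for t_obs."""
    t_abs = abs(t_obs)
    lower, upper = 0.0, 1.0
    for tail, critical in row.items():
        if critical <= t_abs:
            upper = min(upper, 2 * tail)  # t_obs is at least this extreme
        else:
            lower = max(lower, 2 * tail)  # t_obs is less extreme than this
    return lower, upper


lower, upper = bracket_two_tailed_p(-2.96319, T_TABLE_DF8)
print(f"{lower} < P (two-tailed) < {upper}")
```

The bounds only locate the P value between two table columns; statistical software reports the exact tail probability instead.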

Results of t-test

                              Levene's Test for
                              Equality of Variances   t-test for Equality of Means
                              F        Sig.           t          df     Sig. (2-tailed)   Mean Difference
Equal variances assumed       0.4255   0.5324         -2.96319   8      0.018             -3.75
Equal variances not assumed                           -3.21265   7.95   0.012             -3.75

Habitat   N   Mean    Std. Deviation   Std. Error Mean
Forest    6   7       2.19             0.89
Field     4   10.75   1.5              0.75
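The "Equal variances not assumed" row can be approximated with the unequal-variance (Welch) form of the t statistic and the Welch-Satterthwaite degrees of freedom. That the statistics package used this exact formula is an assumption, although the numbers agree with the reported row; the function name `welch_t` is hypothetical.

```python
import math
from statistics import mean, variance

forest = [9, 6, 4, 6, 7, 10]
field = [12, 9, 12, 10]


def welch_t(a, b):
    """t statistic and approximate df without pooling the variances."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t_welch, df_welch = welch_t(forest, field)
print(f"t = {t_welch:.5f}, df = {df_welch:.2f}")
```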


Assumptions

• The data collected represent random, independent samples

• The data were sampled from a specified distribution


Advantages

• It uses a powerful framework based on known probability distributions


Disadvantages

• It may not be as powerful as sophisticated Monte Carlo models that are tailored to particular questions or data

• It rarely incorporates a priori information or results from other experiments

Biology
randomization

What About Non-Parametric Analyses?

• Essentially, these analyses give the P-values that would be obtained by ranking the observations and then performing randomization tests on the ranked data

• Like other resampling methods, non-parametric analyses do not require distributional assumptions.

• However, they have less power than the equivalent parametric tests and can only be used with simple experimental designs.
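The first point can be illustrated by ranking the ant counts and running an exact randomization test on the ranks. This is a sketch under stated assumptions: ties receive average ranks, the test statistic is the rank sum of the field group, and the test is one-tailed (field larger than forest). The helper name `midranks` is hypothetical.

```python
from itertools import combinations

forest = [9, 6, 4, 6, 7, 10]
field = [12, 9, 12, 10]


def midranks(values):
    """Rank the values from 1 upward, giving tied values their average rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        first = ordered.index(v) + 1          # 1-based position of first tie
        rank_of[v] = first + (ordered.count(v) - 1) / 2
    return [rank_of[v] for v in values]


ranks = midranks(forest + field)
observed_sum = sum(ranks[len(forest):])       # rank sum of the field group

# Every way of choosing which 4 of the 10 ranks belong to the field group
all_sums = [sum(c) for c in combinations(ranks, len(field))]
p_one_tailed = sum(s >= observed_sum for s in all_sums) / len(all_sums)
print(f"field rank sum = {observed_sum}, one-tailed P = {p_one_tailed:.4f}")
```

Enumerating all assignments of the ranks is what a tabulated rank-sum test does in advance, which is why such tests need no reshuffling at analysis time.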


Bayesian analysis

• It includes prior information and then uses current data to build on earlier results

• It also allows us to quantify the probability of the observed difference [i.e., P(Ha|data)]


Bayesian analysis

1. Specify the hypothesis

2. Specify parameters as random variables

3. Specify the prior probability distribution

4. Calculate the likelihood

5. Calculate the posterior probability distribution

6. Interpret the results

1. Specify the hypothesis

• The primary goal of a Bayesian analysis is to determine the probability of the hypothesis given the data, P(H | data)

• The hypothesis needs to be quite specific and quantitative:

P(diff>2 | diffobs =3.75)

Biology
Are you going to explain how you came up with this hypothesis? It is not clear from the slide.
P(hypothesis | data)

$P(\text{hypothesis} \mid \text{data}) = \dfrac{P(\text{hypothesis}) \, P(\text{data} \mid \text{hypothesis})}{P(\text{data})}$

The left hand side of the equation is called the posterior probability distribution, and is the quantity of interest

P(hypothesis | data)

$P(\text{hypothesis} \mid \text{data}) = \dfrac{P(\text{hypothesis}) \, P(\text{data} \mid \text{hypothesis})}{P(\text{data})}$

The right hand side of the equation consists of a fraction. In the numerator, the term P(hypothesis) is the prior probability distribution, and is the probability of the hypothesis of interest before you conducted the experiment

P(hypothesis | data)

$P(\text{hypothesis} \mid \text{data}) = \dfrac{P(\text{hypothesis}) \, P(\text{data} \mid \text{hypothesis})}{P(\text{data})}$

The next term in the numerator is referred to as the likelihood of the data; it reflects the probability of observing the data given the hypothesis

P(hypothesis | data)

$P(\text{hypothesis} \mid \text{data}) = \dfrac{P(\text{hypothesis}) \, P(\text{data} \mid \text{hypothesis})}{P(\text{data})}$

The denominator is a normalizing constant that reflects the probability of the data given all possible hypotheses. It scales the posterior probability distribution so that it sums (or integrates) to 1, which keeps every posterior probability in the range [0,1].
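To make the role of the denominator concrete, here is a sketch that assumes a discrete set of mutually exclusive and exhaustive hypotheses H_1, ..., H_k (the index j and the count k are notation introduced here, not taken from the slides):

```latex
P(\text{data}) = \sum_{j=1}^{k} P(H_j)\, P(\text{data} \mid H_j)
\quad\Longrightarrow\quad
\sum_{j=1}^{k} P(H_j \mid \text{data})
  = \frac{\sum_{j=1}^{k} P(H_j)\, P(\text{data} \mid H_j)}{P(\text{data})} = 1
```

For a continuous parameter the sums become integrals over the parameter.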

P(hypothesis | data)

$P(\text{hypothesis} \mid \text{data}) \propto P(\text{hypothesis}) \, P(\text{data} \mid \text{hypothesis})$

We can focus our attention on the numerator

2. Specify the parameters as random variables

$\lambda_{field} \sim N(\mu_{field}, \sigma^2)$

$\lambda_{forest} \sim N(\mu_{forest}, \sigma^2)$

The type of random variable used for each population parameter should reflect biological reality or mathematical convenience


3. Specify the prior probability distribution

• We can combine and re-analyze data from the literature, talk to experts, etc. to come up with reasonable estimates for the density of ant nests in fields and forests

• OR, we can use an “uninformative prior”, for which we initially estimate the density of ant nests to be equal to zero and the variances to be very large

$\lambda_{forest} \sim N(0, 10000)$

$\tau = 1/\sigma^2$

sigma ~ dunif(0,10)

WinBUGS code

model {
  # Priors
  mu1 ~ dnorm(0, 0.001)
  delta ~ dnorm(0, 0.001)
  tau <- 1/(sigma*sigma)
  sigma ~ dunif(0, 10)

  # Likelihood
  for (i in 1:n) {
    y[i] ~ dnorm(mu[i], tau)
    mu[i] <- mu1 + delta*x[i]
    residual[i] <- y[i] - mu[i]
  }

  # Derived quantities
  mu2 <- mu1 + delta
}
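For readers without WinBUGS, the same model can be explored with a small sampler written in plain Python. This is only a sketch: it uses a random-walk Metropolis algorithm rather than WinBUGS's own sampling scheme, the coding x = 0 for Forest and x = 1 for Field is assumed, and the starting values, step sizes, run length and seed are hypothetical tuning choices. The results will therefore differ slightly from the WinBUGS output.

```python
import math
import random

y = [9, 6, 4, 6, 7, 10, 12, 9, 12, 10]
x = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]  # 0 = Forest, 1 = Field (assumed coding)

PRIOR_VAR = 1.0 / 0.001  # dnorm(0, 0.001) takes a precision, so variance = 1000


def log_posterior(mu1, delta, sigma):
    """Unnormalized log posterior of the two-group normal model."""
    if not 0.0 < sigma < 10.0:           # sigma ~ dunif(0, 10)
        return -math.inf
    log_prior = -(mu1 ** 2 + delta ** 2) / (2.0 * PRIOR_VAR)
    sse = sum((yi - (mu1 + delta * xi)) ** 2 for yi, xi in zip(y, x))
    log_lik = -len(y) * math.log(sigma) - sse / (2.0 * sigma ** 2)
    return log_prior + log_lik


def metropolis(n_iter=20000, burn_in=2000, seed=1):
    """Random-walk Metropolis; returns the retained draws of delta."""
    rng = random.Random(seed)
    state = (7.0, 0.0, 2.0)              # starting values (hypothetical)
    current_lp = log_posterior(*state)
    steps = (0.7, 1.0, 0.5)              # proposal sds (hypothetical tuning)
    delta_draws = []
    for i in range(n_iter + burn_in):
        proposal = tuple(v + rng.gauss(0.0, s) for v, s in zip(state, steps))
        proposal_lp = log_posterior(*proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            state, current_lp = proposal, proposal_lp
        if i >= burn_in:
            delta_draws.append(state[1])
    return delta_draws


draws = metropolis()
post_mean = sum(draws) / len(draws)
prob_gt_2 = sum(d > 2 for d in draws) / len(draws)
print(f"posterior mean of delta = {post_mean:.2f}, P(delta > 2) = {prob_gt_2:.2f}")
```

The last two lines show how a posterior probability such as P(diff > 2 | data) is obtained in practice: it is the fraction of retained draws of delta that exceed 2.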

Comparison between approaches

• Parametric
  • Null hypothesis: P(data | H0)
  • P(tobs = 2.96 | t > F theoretical = 1.86)
  • Parameters are fixed

• Bayesian
  • Hypothesis: P(H | data)
  • P(diff > 2 | diffobs = 3.75)
  • Parameters are random variables

4. Calculate the likelihood

The likelihood is a distribution that is proportional to the probability of the observed data given the hypothesis

[Figure: likelihood plots labeled Field and Forest, with the maximum likelihood marked; axis labels: Field mean, Field variance]


5. Calculate the posterior probability distribution

• We multiply the prior by the likelihood, and divide by the normalizing constant

• In contrast to the results of the parametric or Monte Carlo analysis, the result of a Bayesian analysis is a probability distribution, not a single P-value

Biology
randomization
Bayesian output

[Figure: box plot of a[1] and a[2], labeled Field and Forest; posterior density plots for a[1] (sample: 650000), a[2] (sample: 650000), and delta, the difference (chains 1:3, sample: 2997)]

Estimates

                             Estimator
Analysis                     delta (slope)   λForest   λField   σForest   σField
Parametric                   3.75 (1.27)     7.00      10.75    0.98      0.87
Bayesian, uninformed prior   3.75 (1.61)     7.00      10.74    1.01      1.22

6. Interpreting the Results

• Given the Bayesian estimate of mean diff = 3.698, P(diff > 2 | diffobs = 3.75) = 0.87 (2607/2997)

In other words, the analysis indicates a probability of 0.87 that ant nest densities in the two habitats differ by more than 2 nests per quadrat.


Assumptions

• The data collected represent random, independent samples

• The parameters to be estimated are random variables with known distributions


Advantages

• It allows for the explicit incorporation of prior information, and the results from one experiment can be used to inform subsequent experiments

• The results are interpreted in an intuitively straightforward way, and the inferences are conditional on both the observed data and the prior information

Disadvantages

• It is computationally challenging and requires conditioning the hypothesis on the data

• Potential lack of objectivity, because different results will be obtained using different priors
