Analysis of variance and statistical inference

Upload: alka

Post on 25-Feb-2016

Repetitive designs

In medical research we test patients before and after medical treatment to infer the influence of the therapy. We have to divide the total variance (SStotal) into a part that contains the variance between patients (SSbetween) and a part within patients (SSwithin). The latter can be divided into a part that comes from the treatment (SStreat) and the error (SSerror).
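The partition described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis from the slides: the function name `repeated_measures_ss`, the layout of one row per patient, and the five before/after scores are all hypothetical.

```python
from statistics import fmean


def repeated_measures_ss(data):
    """Partition sums of squares for a patients x occasions table.

    data: list of rows, one row per patient and one value per measurement
    occasion (for example before and after treatment).
    """
    n_patients = len(data)
    n_occasions = len(data[0])
    grand_mean = fmean(v for row in data for v in row)
    patient_means = [fmean(row) for row in data]
    occasion_means = [fmean(row[j] for row in data) for j in range(n_occasions)]

    # Total variation around the grand mean.
    ss_total = sum((v - grand_mean) ** 2 for row in data for v in row)
    # Variation between patients.
    ss_between = n_occasions * sum((m - grand_mean) ** 2 for m in patient_means)
    # Variation within patients, split into treatment and error.
    ss_within = sum(
        (v - m) ** 2 for row, m in zip(data, patient_means) for v in row
    )
    ss_treat = n_patients * sum((m - grand_mean) ** 2 for m in occasion_means)
    ss_error = ss_within - ss_treat

    df_treat = n_occasions - 1
    df_error = (n_patients - 1) * (n_occasions - 1)
    f_value = (ss_treat / df_treat) / (ss_error / df_error)
    return {
        "SStotal": ss_total,
        "SSbetween": ss_between,
        "SSwithin": ss_within,
        "SStreat": ss_treat,
        "SSerror": ss_error,
        "F": f_value,
        "df": (df_treat, df_error),
    }


# Hypothetical scores of five patients before and after treatment.
scores = [[120, 110], [130, 125], [125, 115], [140, 128], [135, 130]]
result = repeated_measures_ss(scores)
for name, value in result.items():
    print(name, value)
```

The F ratio compares the treatment mean square with the error mean square, so the variance between patients no longer inflates the error term.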

Medical treatment

Ipsative data

Spiders from two Masurian lake ensembles

Summary statistics

Starting hypotheses

The degree of disturbance (human impact) influences species richness.
Species richness and abundance depend on island area and environmental factors.
Island ensembles differ in species richness and abundance.
Area, abundance, and species richness are non-linearly related.
Latitude and longitude do not influence species richness.

Sorting

Area, abundance, and species richness are non-linearly related.
Latitude and longitude do not influence species richness.
Species richness and abundance depend on island area and environmental factors.
Island ensembles differ in species richness and abundance.
The degree of disturbance (human impact) influences species richness.

The hypotheses are not independent.

Each hypothesis influences how the next one is treated.

Area, abundance, and species richness are non-linearly related.

Species-area and individuals-area relationships
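One common way to describe a non-linear species-area relationship is the power function S = c * A^z, which becomes a straight line after log transformation. The slides do not say which function was fitted, so the power function is an assumption here, and `fit_power_function` and the data are hypothetical. The same sketch applies to an individuals-area relationship by passing abundances instead of species counts.

```python
import math


def fit_power_function(area, richness):
    """Least-squares fit of log(S) = log(c) + z * log(A); returns (c, z)."""
    log_a = [math.log(a) for a in area]
    log_s = [math.log(s) for s in richness]
    mean_a = sum(log_a) / len(log_a)
    mean_s = sum(log_s) / len(log_s)
    sxy = sum((x - mean_a) * (y - mean_s) for x, y in zip(log_a, log_s))
    sxx = sum((x - mean_a) ** 2 for x in log_a)
    z = sxy / sxx                         # slope in log-log space
    c = math.exp(mean_s - z * mean_a)     # back-transformed intercept
    return c, z


# Hypothetical island areas and species counts generated from S = 10 * A^0.25,
# so the fit should recover c close to 10 and z close to 0.25.
areas = [0.1, 0.5, 1.0, 5.0, 10.0, 50.0]
richness = [10 * a ** 0.25 for a in areas]
c, z = fit_power_function(areas, richness)
print(c, z)
```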

Latitude and longitude do not influence species richness.

Is species richness correlated with longitude and latitude? Does the distance between islands influence species richness? Are geographically near islands also similar in species richness irrespective of island area?

R(S-Long) = 0.22 n.s.
R(S-Lat) = 0.28 n.s.

The absence of a significant correlation does not mean that latitude and longitude have no influence on the regression model with environmental variables.

Spatial autocorrelation

Figure: study sites S1 to S6.

In spatial autocorrelation the distance between study sites influences the response (dependent) variable. Spatially adjacent sites are then expected to be more similar with respect to the response variable.

Moran's I as a measure of spatial autocorrelation

Moran's I is similar to a correlation coefficient applied to all pairwise cells of a spatial matrix. It differs by weighting the covariance to account for spatial non-independence of cells with respect to distance.

If cell values were randomly distributed (not spatially autocorrelated), the expected I is E(I) = -1 / (n - 1), where n is the number of cells.

Statistical significance is calculated from a Monte Carlo simulation.

Figure: sites S1 to S6 and all combinations of sites.
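A minimal sketch of Moran's I with a Monte Carlo (permutation) test might look as follows. The inverse-distance weighting, the one-sided test for positive autocorrelation, the function names, and the six sites on a line are assumptions made for illustration; the slides do not specify the weights that were used. Sites are assumed to have distinct coordinates.

```python
import math
import random


def inverse_distance_weights(coords):
    """Weight matrix w[i][j] = 1 / distance(i, j), with zeros on the diagonal."""
    n = len(coords)
    return [
        [0.0 if i == j else 1.0 / math.dist(coords[i], coords[j]) for j in range(n)]
        for i in range(n)
    ]


def morans_i(values, weights):
    """Moran's I: weighted cross-products of deviations over their variance."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / w_sum) * cross / sum(d * d for d in dev)


def permutation_p_value(values, weights, n_perm=999, seed=1):
    """One-sided p-value: share of random reshuffles with I >= observed I."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical example: six sites on a line with a smooth gradient in the response.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
values = [1, 2, 3, 4, 5, 6]
weights = inverse_distance_weights(coords)
observed_i, p_value = permutation_p_value(values, weights)
print(observed_i, p_value)
```

Reshuffling the values over the sites destroys any spatial structure while keeping the values themselves, so the reshuffled I values approximate the distribution expected without spatial autocorrelation.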

Individuals/trap is slightly spatially autocorrelated.

Latitude and longitude slightly influence species richness. Even this weak effect might influence the outcome of a regression analysis.

Errors: too many variables!

Solution: prior factor analysis to reduce the number of dependent variables.

Stepwise variable reduction with the Akaike information criterion (AIC). The lower the AIC, the more appropriate is the model.

Tables: OLS result; spatial autoregression result; log-transformed variables.

Information criteria

What function fits best?

The more free parameters a model has, the higher R² will be. The more parsimonious a model is, the smaller is the bias towards type I errors. We have to find a compromise between goodness of fit and bias!

Figure: bias and explained variance against the number of model parameters (few to many); the optimal number of model parameters.

The Akaike criterion of model choice

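AIC = 2k - 2 ln(L)

k: number of model parameters
L: maximized value of the likelihood function of the model

If the parameter errors are normal and independent we get

AIC = 2k + n ln(RSS / n)

n: number of data points
RSS: residual sum of squares

If we fit using χ²:

AIC = 2k + χ²

If we fit using R²:

AIC = 2k + n ln((1 - R²) / n)

At small sample size we should use the following correction:

AICc = AIC + 2k(k + 1) / (n - k - 1)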

The preferred model is the one with the lowest AIC.
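As a small worked sketch, the least-squares form of the criterion, AIC = 2k + n ln(RSS / n), and the usual small-sample correction, AICc = AIC + 2k(k + 1) / (n - k - 1), can be computed directly. The two candidate models and their residual sums of squares below are hypothetical, as are the function names.

```python
import math


def aic_from_rss(rss, n, k):
    """AIC for a least-squares fit with normal, independent errors."""
    return 2 * k + n * math.log(rss / n)


def aicc(aic, n, k):
    """Small-sample correction; requires n > k + 1."""
    return aic + 2 * k * (k + 1) / (n - k - 1)


# Hypothetical comparison: two models fitted to the same n = 20 data points.
n = 20
models = {
    "A": {"k": 2, "rss": 12.0},   # few parameters, slightly worse fit
    "B": {"k": 5, "rss": 10.5},   # more parameters, slightly better fit
}
scores = {
    name: aicc(aic_from_rss(m["rss"], n, m["k"]), n, m["k"])
    for name, m in models.items()
}
preferred = min(scores, key=scores.get)
print(scores, preferred)
```

In this made-up case the small gain in fit of model B does not pay for its three extra parameters, so the simpler model gets the lower score.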

We get the surprising result that the seemingly worst fitting model appears to be the preferred one. A single outlier makes the difference. The single high residual makes the exponential fit worse.

Significant difference in model fit

Approximately, ΔAIC is statistically significant in favor of the model with the smaller AIC at the 5% error benchmark if |ΔAIC| > 2.

The last model is significantly (5% level) the best.

Stepwise variable elimination

Standardized coefficients (β-values) are equivalents of correlation coefficients. They should not have absolute values above 1. Such values point to too high correlation between the predictor variables (collinearity). Collinearity disturbs any regression model and has to be eliminated prior to analysis.

Highly correlated variables essentially contain the same information. Correlations of less than 0.7 can be tolerated. Hence check first the matrix of correlation coefficients. Eliminate variables that do not add information.
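Checking the correlation matrix before fitting can be sketched as below. The 0.7 threshold comes from the text; the predictor names, the values, and the helper functions are hypothetical and only illustrate the screening step, not the variable set of the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(predictors, threshold=0.7):
    """Return (name, name, r) for every predictor pair with |r| >= threshold.

    predictors: dict mapping a variable name to its list of values.
    """
    names = list(predictors)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson_r(predictors[first], predictors[second])
            if abs(r) >= threshold:
                flagged.append((first, second, r))
    return flagged


# Hypothetical predictors: perimeter is almost a linear function of area.
predictors = {
    "area": [1, 2, 3, 4, 5, 6, 7, 8],
    "perimeter": [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8],
    "humidity": [4, 8, 1, 6, 3, 7, 2, 5],
}
flagged = collinear_pairs(predictors)
for first, second, r in flagged:
    print(first, second, round(r, 3))
```

For each flagged pair, one of the two variables is a candidate for elimination because it adds little information beyond the other.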

The final model

Simple test-wise probability levels. We still have to correct for multiple testing.

Bonferroni correction

To get an experiment-wise error rate of 0.05, our test-wise error rates have to be less than 0.05/n, where n is the number of tests.

The best model is not always the one with the lowest AIC or the highest R².

Species richness is positively correlated with island area and negatively with soil humidity.

Island ensembles differ in species richness and abundance.

Analysis of covariance (ANCOVA)

Species richness depends on environmental factors that may differ between island ensembles. A simple ANOVA does not detect any difference.

ANCOVA is the combination of multiple regression and analysis of variance. First we perform a regression analysis and use the residuals of the full model as entries in the ANOVA. ANCOVA is the ANOVA on regression residuals.

We use the regression residuals for further analysis. The metrically scaled variables serve as covariates.

Sites with very high positive residuals are particularly species rich even after controlling for environmental factors. These are ecological hot spots. Regression analysis serves to identify such hot spots.
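The residual-based procedure described above can be sketched with a single covariate. This is a simplified illustration, assuming one metric covariate instead of the full multiple regression; the data, the group labels "A" and "B", and the function names are hypothetical. The sketch reports the F ratio and its degrees of freedom only.

```python
from statistics import fmean


def regression_residuals(x, y):
    """Residuals of the simple linear regression y = a + b * x."""
    mx, my = fmean(x), fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def one_way_anova_f(values, groups):
    """F ratio and degrees of freedom of a one-way ANOVA."""
    labels = sorted(set(groups))
    samples = [[v for v, g in zip(values, groups) if g == label] for label in labels]
    grand_mean = fmean(values)
    ss_between = sum(len(s) * (fmean(s) - grand_mean) ** 2 for s in samples)
    ss_within = sum((v - fmean(s)) ** 2 for s in samples for v in s)
    df_between = len(samples) - 1
    df_within = len(values) - len(samples)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical data: richness rises with the covariate, and ensemble "B"
# holds about one species more than ensemble "A" at the same covariate value.
covariate = [float(i) for i in range(12)]
ensemble = ["A", "B"] * 6
noise = [0.05, -0.1, 0.1, 0.0, -0.05, 0.1, -0.1, 0.05, 0.0, -0.05, 0.1, -0.1]
richness = [
    2.0 + 0.5 * x + (1.0 if g == "B" else 0.0) + e
    for x, g, e in zip(covariate, ensemble, noise)
]

residuals = regression_residuals(covariate, richness)
f_value, df_between, df_within = one_way_anova_f(residuals, ensemble)
print(f_value, df_between, df_within)

# Sites with the largest positive residuals are hot spot candidates.
hot_spot = max(range(len(residuals)), key=residuals.__getitem__)
print(hot_spot, residuals[hot_spot])
```

Note that this two-step version does not reduce the error degrees of freedom for the fitted covariate, so the F ratio is slightly more liberal than in a model that estimates the covariate and the group effect together.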

ANCOVA

Species richness does not differ between island ensembles.

The degree of disturbance (human impact) influences species richness.

Species richness of spiders on lake islands appears to be independent of the degree of disturbance.

How does abundance depend on environmental factors?

The full model and stepwise variable elimination

All coefficients are highly significant! All standardized coefficients are above 1. This points to too high collinearity.

We further eliminate uninformative variables.

Abundance does not significantly depend on environmental variables.

How does abundance depend on the degree of disturbance?

Abundance of spiders on lake islands appears to be independent of the degree of disturbance.