step three: statistical analyses to test biological hypotheses general protocol continued


Upload: jalyn-whitenton

Post on 30-Mar-2015


Page 1: Step three: statistical analyses to test biological hypotheses General protocol continued

Step three: statistical analyses to test biological hypotheses

General protocol continued

Page 2

Biological hypotheses and statistical tests

Hypotheses driven by biology

Statistics depend on data and hypotheses

NO NEW STATISTICAL TOOLS ARE NEEDED FOR MORPHOMETRICS!!

Exploratory hypotheses: relative position of specimens in data space; relationships among specimens in data space

Confirmatory hypotheses: compare groups, associate shape with other variables, etc.

Page 3

Some hypotheses (shape related)

How do populations and species differ?

Does the observed variation generate a predictable pattern?

Are there additional factors (ecological, evolutionary) correlated with variation?

How does shared evolutionary history affect the observed patterns?

Page 4

Hypotheses as statistical tests

Do populations differ? → MANOVA, CVA

Is there a predictable pattern? → PCA, UPGMA

Correlated factors? → Regression, 2B-PLS

Effect of phylogeny? → Comparative Method

Page 5

Exploratory data analysis

Investigate data using only Y-matrix of shape variables (PWScores + U1,U2)

Specimens are points in high-dimensional data space

Look for patterns and distributions of points

Generate summary plot of data space (ordination)

Look for relationships of points (clustering)

Page 6

Ordination and dimension reduction

Visualize high dimensional data space as succinctly as possible

Describe variation in original data with new set of variables (typically orthogonal vectors)

Order new variables by variation explained (most to least)

Plot first few dimensions to summarize data

Principal Components Analysis (PCA) one approach (others include: PCoA, MDS, CA, etc.)

Page 7

PCA: what does it do?

Rotates data so that main axis of variation (PC1) is horizontal

Subsequent PC axes are orthogonal to PC1, and are ordered to explain sequentially less variation

The goal is to explain more variation in fewer dimensions

Page 8

PCA: interpretations

Eigenvectors are linear combinations of original variables (interpreted by PC loadings of each variable)

PCA PRESERVES EUCLIDEAN DISTANCES among objects

PCA does NOTHING to the data, except rotate it to axes expressing the most variation; it loses NO INFORMATION (if all PC vectors retained)

If the original variables are uncorrelated, PCA not helpful in reducing dimensionality of data

PCA does not find a particular factor (e.g., group differences, allometry): it identifies the direction of most variation, which may be interpretable as a ‘factor’ (but may not)
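
The rotation described on the PCA slides can be sketched in a few lines of NumPy. This is a minimal illustration rather than the procedure used in the course: the function names `pca` and `pairwise_distances` and the simulated data are assumptions made for the example. It computes PC scores from the SVD of the centered data and checks that inter-specimen Euclidean distances are unchanged when all PC axes are kept.

```python
import numpy as np

def pca(Y):
    """Rotate a data matrix Y (specimens x variables) onto its principal axes."""
    Y = np.asarray(Y, dtype=float)
    Yc = Y - Y.mean(axis=0)                      # center each variable
    # SVD of the centered data; rows of Vt are the eigenvectors (PC loadings)
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    scores = Yc @ Vt.T                           # specimen coordinates on the PC axes
    explained = s**2 / np.sum(s**2)              # proportion of variance per PC, in descending order
    return scores, Vt.T, explained

def pairwise_distances(A):
    """Euclidean distances between all pairs of rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))

# Hypothetical data: 20 specimens measured on 4 correlated variables
rng = np.random.default_rng(0)
Y = rng.normal(size=(20, 4)) @ rng.normal(size=(4, 4))

scores, loadings, explained = pca(Y)
print("proportion of variance per PC:", np.round(explained, 3))
print("distances preserved:",
      np.allclose(pairwise_distances(Y), pairwise_distances(scores)))
```

Plotting the first two columns of `scores` gives the usual ordination plot; dropping the later columns is where information starts to be lost.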

Page 9

Example: leatherside chub

Page 10

Clustering

Data are dots in a high-dimensional space (Y-matrix)

Can we connect the dots into groupings, where clusters represent groups of similar specimens?

Cluster methods generate ‘1-dimensional view’ of relationships, based on some criterion

Clustering requires distance (or similarity) between points

MANY different criteria

Clustering is algorithmic, not algebraic (i.e., it is a procedure, or set of rules for connecting data)

Page 11

Clustering: UPGMA

[Figure: UPGMA (group average) dendrogram of samples, data set "UC RW LOC PRED PROCRUST NOTAIL 40"; vertical axis: distance (0 to 0.15); resemblance: D1 Euclidean distance]
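
To make the "group average" rule concrete, here is a small, deliberately naive UPGMA sketch. The function name `upgma` and the five one-dimensional example points are hypothetical and are not the data behind the dendrogram; the code only illustrates the algorithmic nature of clustering: repeatedly merge the two clusters whose average between-cluster distance is smallest.

```python
import numpy as np

def upgma(D):
    """Group-average (UPGMA) clustering of a symmetric distance matrix D.

    Returns a list of merges: (members_a, members_b, merge_distance).
    """
    D = np.asarray(D, dtype=float)
    clusters = {i: [i] for i in range(len(D))}   # cluster id -> specimen indices
    merges = []
    while len(clusters) > 1:
        ids = list(clusters)
        best = None
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # UPGMA criterion: mean of all pairwise distances between the two clusters
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), float(d)))
        clusters[a] = clusters[a] + clusters.pop(b)   # fuse cluster b into cluster a
    return merges

# Hypothetical example: five specimens placed along a single axis
points = np.array([0.0, 0.1, 1.0, 1.2, 5.0])
D = np.abs(points[:, None] - points[None, :])     # Euclidean distance matrix

merges = upgma(D)
for members_a, members_b, dist in merges:
    print(members_a, "+", members_b, "at distance", round(dist, 3))
```

The merge distances are the heights at which branches join in the dendrogram. For real data sets a library routine would normally be used instead of this quadratic search.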

Page 12

Conclusions: exploratory methods

Useful tools for summarizing shape variation

Help you understand your data through visualizing variation (both ordination plots and cluster diagrams)

Help describe relationships among specimens in terms of overall similarity

Page 13

Confirmatory data analysis

Investigate data using shape variables (Y-matrix) and other (independent) variables (X-matrix)

Test for patterns of shape variation

Independent variables determine type of statistical test

Page 14

Types of independent variables

Categorical: variables delineating groups of specimens (e.g., male/female, species, etc.)

Continuous: variables on a continuous scale (e.g., size, moisture, age, etc.)

Different statistical methods for each

Page 15

Some statistical tests

Categorical: shape differences among groups → MANOVA

Continuous: relationship of variables and shape → Mult. Regression

Continuous: association of variables and shape → 2B-PLS (2-Block Partial Least Squares)

MANOVA and multivariate regression are both GLM statistics (General Linear Models)

Page 16

Group differences: MANOVA

Is there a difference in shape between groups?

Multivariate generalization of ANOVA

Compares variation within groups to variation between groups

Significant MANOVA: Group means are different in shape
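
As a rough illustration of "variation within groups versus variation between groups", the sketch below computes Wilks' Lambda as det(W) / det(T), where W is the pooled within-group sums-of-squares-and-cross-products matrix and T = W + B is the total. The function name `wilks_lambda` and the simulated data are assumptions for the example; converting Lambda to an F statistic and a p-value is left out.

```python
import numpy as np

def wilks_lambda(Y, groups):
    """Wilks' Lambda = det(W) / det(T) for a one-way MANOVA design."""
    Y = np.asarray(Y, dtype=float)
    groups = np.asarray(groups)
    p = Y.shape[1]
    W = np.zeros((p, p))                          # within-group SSCP matrix
    for g in np.unique(groups):
        dev = Y[groups == g] - Y[groups == g].mean(axis=0)
        W += dev.T @ dev
    dev_total = Y - Y.mean(axis=0)
    T = dev_total.T @ dev_total                   # total SSCP matrix (W + B)
    return np.linalg.det(W) / np.linalg.det(T)

# Hypothetical data: 2 groups of 30 specimens, 3 shape variables
rng = np.random.default_rng(1)
groups = np.repeat([0, 1], 30)
Y_null = rng.normal(size=(60, 3))                 # no true group difference
Y_shift = Y_null + 5.0 * groups[:, None]          # group 1 displaced on every variable

lam_null = wilks_lambda(Y_null, groups)
lam_shift = wilks_lambda(Y_shift, groups)
print("Lambda, no difference:   ", round(lam_null, 4))
print("Lambda, large difference:", round(lam_shift, 4))
```

Values near 1 mean that almost all variation is within groups; values near 0 mean that group means account for most of it.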

Page 17

MANOVA: RW1-RW30, Utah chub

Sources: Sex, Loc, Sex X loc, IL/SL, Size

Statistic        Value        Fs     df1   df2      Prob
Wilks' Lambda    0.61907356   1.83   30    89       0.0159
Wilks' Lambda    0.75516916   0.96   30    89       0.5318
Wilks' Lambda    0.10138762   1.40   180   533.33   0.0020
Wilks' Lambda    0.00308619   3.26   240   706.35   <.0001
Wilks' Lambda    0.38888016   4.66   30    89       <.0001

Page 18

MANOVA: post hoc tests

Pairwise comparisons using Generalized Mahalanobis Distance (D² or D)

Convert D² → T² → F to test

For experiment-wise error rate, adjust using Bonferroni:

α_exp = α / (number of comparisons)
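
The D² → T² → F chain can be written out for the simplest case. The sketch below assumes the textbook two-sample Hotelling form, with the covariance matrix pooled from only the two groups being compared; a post hoc test inside a larger MANOVA would normally pool across all groups and use different degrees of freedom. The names `mahalanobis_to_F` and `bonferroni_alpha` and the simulated data are hypothetical.

```python
import numpy as np

def mahalanobis_to_F(Y1, Y2):
    """Two-sample D^2 -> Hotelling's T^2 -> F (assumes at least 2 variables)."""
    Y1, Y2 = np.asarray(Y1, dtype=float), np.asarray(Y2, dtype=float)
    n1, n2, p = len(Y1), len(Y2), Y1.shape[1]
    diff = Y1.mean(axis=0) - Y2.mean(axis=0)
    # pooled within-group covariance matrix of the two samples
    S = ((n1 - 1) * np.cov(Y1, rowvar=False)
         + (n2 - 1) * np.cov(Y2, rowvar=False)) / (n1 + n2 - 2)
    D2 = float(diff @ np.linalg.solve(S, diff))   # squared Mahalanobis distance
    T2 = n1 * n2 / (n1 + n2) * D2                 # Hotelling's T^2
    F = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * T2
    return D2, T2, F, (p, n1 + n2 - p - 1)        # F with (df1, df2)

def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under a Bonferroni adjustment."""
    return alpha / n_comparisons

# Hypothetical data: two groups, 2 variables
rng = np.random.default_rng(3)
Y1 = rng.normal(size=(12, 2))
Y2 = rng.normal(size=(15, 2)) + np.array([1.0, 0.5])
D2, T2, F, df = mahalanobis_to_F(Y1, Y2)

n_groups = 4                                      # assumed number of groups
n_comparisons = n_groups * (n_groups - 1) // 2    # all pairwise comparisons
alpha_adj = bonferroni_alpha(0.05, n_comparisons)
print("D2, T2, F, df:", round(D2, 3), round(T2, 3), round(F, 3), df)
print("comparisons:", n_comparisons, "adjusted alpha:", round(alpha_adj, 5))
```

Each pairwise F would then be compared against the adjusted threshold rather than the nominal 0.05.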

Page 19

Discriminant analysis: CVA & DFA

‘Combination’ of MANOVA and PCA

Tests for group differences (MANOVA)

PCA of among-group variation relative to within-group variation

Suggests which groups differ on which variables

Can ‘classify’ specimens to groups

Special case: 2 groups = discriminant function analysis (DFA)
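
One common way to express "PCA of among-group variation relative to within-group variation" is as an eigenanalysis of W⁻¹B, where W and B are the within-group and between-group sums-of-squares-and-cross-products matrices. The sketch below follows that formulation; the function name `cva`, the simulated groups, and the choice to keep min(p, g − 1) axes are assumptions for illustration, and the eigenvectors are left unscaled.

```python
import numpy as np

def cva(Y, groups):
    """Canonical variates from the eigenanalysis of W^-1 B."""
    Y = np.asarray(Y, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    p = Y.shape[1]
    grand_mean = Y.mean(axis=0)
    W = np.zeros((p, p))                          # within-group SSCP
    B = np.zeros((p, p))                          # between-group SSCP
    for g in labels:
        Yg = Y[groups == g]
        dev = Yg - Yg.mean(axis=0)
        W += dev.T @ dev
        m = (Yg.mean(axis=0) - grand_mean)[:, None]
        B += len(Yg) * (m @ m.T)
    evals, evecs = np.linalg.eig(np.linalg.solve(W, B))
    evals, evecs = evals.real, evecs.real         # discard numerical imaginary parts
    order = np.argsort(evals)[::-1]
    k = min(p, len(labels) - 1)                   # number of informative canonical axes
    evals, evecs = evals[order][:k], evecs[:, order][:, :k]
    scores = (Y - grand_mean) @ evecs             # specimens projected on the canonical axes
    return evals, evecs, scores

# Hypothetical data: 3 groups of 15 specimens, 4 variables
rng = np.random.default_rng(2)
groups = np.repeat([0, 1, 2], 15)
means = np.array([[0.0, 0.0, 0.0, 0.0],
                  [3.0, 0.0, 0.0, 0.0],
                  [0.0, 3.0, 0.0, 0.0]])
Y = rng.normal(size=(45, 4)) + means[groups]

evals, axes, scores = cva(Y, groups)
print("canonical eigenvalues:", np.round(evals, 3))
```

With three groups there are at most two canonical axes, so a single two-dimensional plot of `scores` shows all of the among-group separation.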

Page 20

DFA/CVA: post-hoc tests

For DFA/CVA, compare difference among groups using Generalized Mahalanobis Distance (D²)

Mahalanobis D² is logical choice because CVA/DFA is MANOVA, and the PCA is relative to within-group variability (i.e., VCV ‘standardized’)

Convert D² → T² → F to perform statistical test

Experiment-wise error rate adjusted as before (i.e., adjusted α)

Page 21

Continuous variation: regression

Is there a relationship between shape and some other variable?

Multivariate regression of shape on continuous variable

Significant regression implies shape changes as a function of other variable (e.g., size)
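
A multivariate regression of shape on a single continuous variable can be sketched with ordinary least squares applied to all shape variables at once. The function name `multivariate_regression`, the simulated "size" values, and the slopes are hypothetical; the sketch reports coefficients and the proportion of total shape variance explained, not a significance test.

```python
import numpy as np

def multivariate_regression(Y, x):
    """Least-squares regression of every column of Y on one predictor x."""
    Y = np.asarray(Y, dtype=float)
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones(len(x)), x])     # design matrix: intercept + predictor
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)     # row 0: intercepts, row 1: slopes
    resid = Y - X @ B
    ss_total = ((Y - Y.mean(axis=0)) ** 2).sum()
    ss_resid = (resid ** 2).sum()
    return B, 1.0 - ss_resid / ss_total           # coefficients, proportion of variance explained

# Hypothetical data: 40 specimens, 3 shape variables changing linearly with size
rng = np.random.default_rng(4)
size = rng.uniform(1.0, 5.0, 40)
true_slopes = np.array([0.5, -0.2, 0.1])
Y = 1.0 + size[:, None] * true_slopes + rng.normal(scale=0.01, size=(40, 3))

B, r2 = multivariate_regression(Y, size)
print("estimated slopes:", np.round(B[1], 3))
print("proportion of shape variance explained:", round(r2, 4))
```

The slope row `B[1]` is the vector of shape change per unit of the predictor, which is what gets visualized as a deformation.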

Page 22

Example of shape on size in mountain sucker

Multivariate tests of significance:

Statistic                  Value        Fs       df1   df2     Prob
Wilks' Lambda:             0.34356565   22.822   36    430.0   3.580E-078
Pillai's trace:            0.65643435   22.822   36    430.0   3.580E-078
Hotelling-Lawley trace:    1.91065190   22.822   36    430.0   3.580E-078
Roy's maximum root:        1.91065190   22.822   36    430.0   3.580E-078

Test that kth root and those that follow are zero:

k   U            Fs       df1   df2     Prob
1   0.34356565   22.822   36    430.0   3.580E-078

Page 23

Continuous variation: association 2B-PLS

Is there an association between shape and some other set of variables (not causal)?

Find pairs of linear combinations for X & Y that maximize the covariation between data sets

Linear combinations are constrained to be orthogonal within each set (like PC axes) but NOT between data sets

Calculations less complicated for 2B-PLS (because fewer mathematical constraints)

Analogous to ‘multivariate correlation’

2B-PLS is called SINGULAR WARPS when shape is one or more of the data sets (Bookstein et al., 2003, J. of Hum. Evol.)
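
A compact way to obtain the paired linear combinations is a singular value decomposition of the cross-covariance matrix between the two blocks. The sketch below assumes that formulation; the function name `two_block_pls` and the simulated blocks sharing one latent variable are hypothetical.

```python
import numpy as np

def two_block_pls(X, Y):
    """2B-PLS via SVD of the cross-covariance matrix between blocks X and Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    R = Xc.T @ Yc / (len(X) - 1)                  # cross-covariance block (X vars x Y vars)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    x_scores = Xc @ U                             # specimens on the X-block axes
    y_scores = Yc @ Vt.T                          # specimens on the Y-block axes
    return U, Vt.T, s, x_scores, y_scores

# Hypothetical data: two blocks that share one latent variable
rng = np.random.default_rng(5)
latent = rng.normal(size=(50, 1))
X = latent @ rng.normal(size=(1, 3)) + 0.5 * rng.normal(size=(50, 3))
Y = latent @ rng.normal(size=(1, 4)) + 0.5 * rng.normal(size=(50, 4))

x_axes, y_axes, s, x_scores, y_scores = two_block_pls(X, Y)
print("covariance explained by each pair of axes:", np.round(s, 3))
print("correlation of first pair of scores:",
      round(np.corrcoef(x_scores[:, 0], y_scores[:, 0])[0, 1], 3))
```

Each singular value equals the covariance between the corresponding pair of score vectors, and the axes are orthonormal within each block, matching the constraint stated on the slide.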

Page 24

Resampling methods

Methods that take many samples from original data set in some specified way and evaluate the significance of the original based on these samples

Resampling approaches are nonparametric, because they do not depend on theoretical distributions for significance testing (they generate a distribution from the data)

Are very flexible, and can allow for complicated designs

Very useful in morphometrics, and can be used for:

• Testing standard designs

• Testing non-standard designs

• Testing when sample sizes small relative to # of variables

Page 25

Randomization (permutation)

Proposed by Fisher (1935) for assessing significance of 2-sample comparison (Fisher’s exact test)

Fisher’s exact test: a total enumeration of possible pairings of data

Randomization can be used to assess the significance of almost any test statistic

Protocol

• Calculate observed statistic (e.g., T-statistic): E_obs

• Reorder data set (i.e., randomly shuffle data) and recalculate statistic: E_rand

• Repeat many times to generate distribution of statistic

• Percentage of E_rand more extreme than E_obs is significance level
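
The protocol above can be sketched for a two-group comparison. The function name `permutation_test`, the choice of the absolute difference in group means as the statistic, and the toy data are assumptions for illustration. Counting the observed value as one member of the random distribution (the `+ 1` terms) is a common convention, not something stated on the slide.

```python
import numpy as np

def permutation_test(y, groups, n_perm=999, seed=0):
    """Two-group permutation test on the absolute difference in means."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)                    # assumes exactly two group labels

    def statistic(g):
        return abs(y[g == labels[0]].mean() - y[g == labels[1]].mean())

    e_obs = statistic(groups)                     # step 1: observed statistic
    # steps 2-3: shuffle group labels and recompute, many times
    e_rand = np.array([statistic(rng.permutation(groups)) for _ in range(n_perm)])
    # step 4: share of random statistics at least as extreme as the observed one
    p_value = (np.sum(e_rand >= e_obs) + 1) / (n_perm + 1)
    return e_obs, p_value

# Hypothetical data: two clearly separated groups of five specimens
y = [1, 2, 3, 4, 5, 11, 12, 13, 14, 15]
groups = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

e_obs, p_value = permutation_test(y, groups)
print("observed difference in means:", e_obs)
print("permutation p-value:", p_value)
```

Swapping in a different `statistic` function (a multivariate distance between group mean shapes, for instance) leaves the rest of the procedure unchanged, which is the flexibility the slides emphasize.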

Page 26

Randomization: comments

Randomization EXTREMELY useful and flexible technique

How and what to resample depends upon data and hypothesis

• Regression and correlation: shuffle Y vs. X

• Group comparison (e.g., ANOVA): shuffle Y on groups

• Some tests (e.g., t-test) may depend on direction (1-tailed vs. 2-tailed)

Also useful when no theoretical distribution exists for statistic, or when design is ‘non-standard’

This is frequently the case in E&E studies

Page 27

Step four: Graphical depiction of results

Strength of landmark-based TPS approach

Can view deformation of TPS grid among groups or with continuous variable

Page 28

Superimposition

[Figure: superimposed landmark configuration, landmarks numbered 1-24]

Page 29

Effect of relative intestinal length: measure of trophic level

[Figure: two landmark configurations (landmarks 1-24): long IL/SL (3.0) and short IL/SL (0.72)]

Page 30

Effect of gradient on shape in mountain sucker

[Figure: two panels, low gradient and high gradient]

Page 31

[Figure: scatter plot of RW 1 (40%) against RW 2 (20%); legend: loc 1 non, loc 8 pred, loc 4 non, loc 7 pred, loc 3 non, loc 6 pred, loc 2 non, loc 6 pred; with two landmark configurations (landmarks 1-24)]

Page 32

[Figure: scatter plot of RW1 (40%) against RW2 (20%); legend: nonpred, pred; with three landmark configurations (landmarks 1-24)]