characters co-vary covariation is biologically important...


Page 1: Characters co-vary Covariation is biologically important ...docencia.med.uchile.cl/smg/pdf/workshop2.pdf · Multidimensional Palaeobiology. Pergamon Press: Oxford. Manly, 1986. A

Outline for Morphometrics Workshop #2: Mathematics of Morphometric Methods

Why use multivariate methods?
    Characters co-vary
    Covariation is biologically important
    How to characterize covariation
Orthogonality
Transformations
    Translation
    Rotation
    Scalar Multiplication
    Shear
Eigenanalysis
    Eigenvectors
    Eigenvalues
Ordination
    Principal components analysis
    Euclidean Distance
Discrimination and Classification
    Discriminant analysis
    Mahalanobis Distance
Outline analysis
    Radial Functions
    Tangent Angle Functions
    Fourier Series
    Harmonic Distance
How to measure closed shapes
    Pixel Descriptions
    Moments
Description and transformation
    Thin-plate splines
Methods for Distance matrices
    Mantel tests
    Cluster Analysis
    Multi-Dimensional Scaling

Red Book - Bookstein, F. L., B. Chernoff, R. Elder, J. Humphries, G. Smith, and R. Strauss. 1985. Morphometrics in evolutionary biology: The geometry of size and shape change, with examples from fishes. Academy of Natural Sciences of Philadelphia Special Publication 15.

Blue Book - Rohlf, F. J. and F. L. Bookstein (eds.). 1990. Proceedings of the Michigan Morphometrics Workshop. Special Publication No. 2, University of Michigan Museum of Zoology: Ann Arbor.

Orange Book - Bookstein, F. L. 1991. Morphometric Tools for Landmark Data. Geometry and Biology. Cambridge University Press: New York.

Reyment's Black Book - Reyment, R. A. 1991. Multidimensional Palaeobiology. Pergamon Press: Oxford.

Manly, 1986. A primer for multivariate statistics.


Lestrel, Pete E. 1997. Fourier descriptors and their applications in biology. New York: Cambridge University Press.


Why use multivariate methods?

Biological characters co-vary

Variance-Covariance matrix

        LV2    LV4    LACV   LPCV
LV2     1.15   1.05   1.25   0.66
LV4     1.05   1.00   1.15   0.63
LACV    1.25   1.15   2.49   0.64
LPCV    0.66   0.63   0.64   0.75

----Covariation is biologically important
----Univariate analyses discard valuable information

Populations or species may overlap in univariate characters, yet be quite distinct when more characters are considered.

How to characterize covariation

Variance-Covariance matrix
Covariance--Units important, all characters measured in same units

        LV2    LV4    LACV   LPCV
LV2     1.15   1.05   1.25   0.66
LV4     1.05   1.00   1.15   0.63
LACV    1.25   1.15   2.49   0.64
LPCV    0.66   0.63   0.64   0.75

Correlation matrix
Correlation--No units, all characters rescaled so variance is 1.0

        LV2    LV4    LACV   LPCV
LV2     1.00   0.98   0.74   0.71
LV4     0.98   1.00   0.73   0.73
LACV    0.74   0.73   1.00   0.47
LPCV    0.71   0.73   0.47   1.00

For morphometric analysis, it is usually best to preserve information about covariances. It is therefore important to measure all characters in the same units so that the covariance matrix can be calculated.
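As a small illustration of how the two matrices are related, the sketch below rescales the variance-covariance matrix for LV2, LV4, LACV and LPCV into a correlation matrix with NumPy. The helper name `cov_to_corr` is ours and is not part of the workshop material.

```python
import numpy as np

# Variance-covariance matrix from the workshop table (order: LV2, LV4, LACV, LPCV).
S = np.array([
    [1.15, 1.05, 1.25, 0.66],
    [1.05, 1.00, 1.15, 0.63],
    [1.25, 1.15, 2.49, 0.64],
    [0.66, 0.63, 0.64, 0.75],
])

def cov_to_corr(cov):
    """Divide each covariance by the product of the two standard deviations."""
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

R = cov_to_corr(S)
print(np.round(R, 2))
```

The rescaling removes the units, which is why the correlation matrix no longer carries the information about absolute variances that the covariance matrix does.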


Some general mathematical ideas.

Distance: a positive measure or metric that obeys the triangle inequality, d(x + y) <= d(x) + d(y), with d(x) >= 0 and d(ax) = |a| d(x)
--Euclidean distance
--Mahalanobis distance

Metric space: a space in which some sort of distance exists.

Transformation: a function which maps one metric space into another.

Matrix:

    x  y  z
    x  y  z
    x  y  z

Scalar: a single number, a

Linear Combination:

    Y = a1X1 + a2X2 + a3X3 + . . . + akXk

Decomposition into Canonical Forms
---Take something complex and decompose it into a sum of simpler terms.
---The decomposition should be unique.
---It is even better if the simple terms are orthogonal.

orthogonal - at right angles, or completely independent. Orthogonal axes in multivariate space are defined at right angles to each other - in two-dimensional space they would be described as being perpendicular to each other. -E. Michel

---statistical independence vs orthogonality
---unique representation and orthogonality


Some Simple Transformations

Consider a cloud of data in an n-dimensional morphospace:

    Translation
    Rotation
    Scalar Multiplication (shrinking or expanding the entire space)
    Reflection

    Shrinking or expanding in one dimension only
    Shear
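The transformations listed above can be tried on a small point cloud. The following is only a sketch in two dimensions. The angle, scale factors, shift vector and the helper `keeps_relative_distances` are arbitrary choices of ours. It reports which transformations leave the ratios between pairwise distances unchanged.

```python
import numpy as np

theta = np.deg2rad(30.0)                      # arbitrary angle, for illustration only
transforms = {
    "rotation": np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]]),
    "scalar multiplication": 2.0 * np.eye(2),             # expand the entire space
    "reflection": np.array([[1.0, 0.0], [0.0, -1.0]]),    # flip the second axis
    "one-dimension stretch": np.array([[3.0, 0.0], [0.0, 1.0]]),
    "shear": np.array([[1.0, 0.5], [0.0, 1.0]]),          # slide x in proportion to y
}
shift = np.array([5.0, -2.0])                 # translation vector

def pairwise_distances(points):
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def keeps_relative_distances(before, after):
    """True if every pairwise distance changes by the same factor."""
    pairs = np.triu_indices(len(before), k=1)
    ratio = pairwise_distances(after)[pairs] / pairwise_distances(before)[pairs]
    return bool(np.allclose(ratio, ratio[0]))

rng = np.random.default_rng(0)
cloud = rng.normal(size=(20, 2))              # rows are specimens

results = {"translation": keeps_relative_distances(cloud, cloud + shift)}
for name, matrix in transforms.items():
    results[name] = keeps_relative_distances(cloud, cloud @ matrix.T)

for name, kept in results.items():
    print(f"{name:24s} keeps relative distances: {kept}")
```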


Ordination

To look at a mess of points in multivariate space, we would like an ordination procedure or transformation that has "nice" characteristics. What are "nice" characteristics?

Produces orthogonal axes                             Axes not orthogonal
Preserves relative distances                         Distorts distances
Somehow reduces number of informative dimensions     Keeps number of dimensions the same

Principal components analysis does the things in the first column.


Principal components analysis (PCA) - Most generally described as an ordination technique for describing the variation in a multivariate data set. The first axis (the first principal component, or PC) describes the maximum variation in the whole data set, the second describes the maximum variance remaining, and so forth, with each axis orthogonal to the preceding axis. A principal component is an eigenvector of a covariance or correlation matrix. -Ellinor Michel

See Manly for a good (and short!) description of these analyses.

PCA is equivalent to a solid rotation of your data (plus a bit of translation and scalar multiplication if needed), where the new axes are the principal components, which are linear combinations of the old axes. PCA finds the unique transformation which will accomplish this:

    X * T = P

When PCA is applied to a matrix with this variance-covariance matrix:

        LV2    LV4    LACV   LPCV
LV2     1.15   1.05   1.25   0.66
LV4     1.05   1.00   1.15   0.63
LACV    1.25   1.15   2.49   0.64
LPCV    0.66   0.63   0.64   0.75

the variance-covariance matrix after PCA is:

PC1     4.345  0.000  0.000  0.000
PC2     0.000  0.767  0.000  0.000
PC3     0.000  0.000  0.250  0.000
PC4     0.000  0.000  0.000  0.019

Eigenvalues and the percentage of the total for each eigenvalue:

Eigenvalue   Percentage
4.345        0.807
0.767        0.142
0.250        0.046
0.019        0.003
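The link between the covariance matrix, the transformation T and the eigenvalues can be sketched with NumPy's symmetric eigensolver. This is an illustration, not the software used for the workshop. Because the printed covariance matrix is rounded to two decimals, the computed eigenvalues should be expected to approximate, not exactly reproduce, the listed values.

```python
import numpy as np

# Rounded variance-covariance matrix from the table (order: LV2, LV4, LACV, LPCV).
S = np.array([
    [1.15, 1.05, 1.25, 0.66],
    [1.05, 1.00, 1.15, 0.63],
    [1.25, 1.15, 2.49, 0.64],
    [0.66, 0.63, 0.64, 0.75],
])

# eigh handles symmetric matrices and returns eigenvalues in ascending order.
evals, evecs = np.linalg.eigh(S)
order = np.argsort(evals)[::-1]          # largest variance first
evals, T = evals[order], evecs[:, order]

# T plays the role of the transformation in X * T = P; its columns are the PCs.
S_pc = T.T @ S @ T                       # covariance matrix of the PC scores
proportion = evals / evals.sum()         # share of total variance per PC

print(np.round(evals, 3))
print(np.round(proportion, 3))
```

The off-diagonal entries of `S_pc` vanish (up to rounding error), which is the sense in which the principal components are uncorrelated axes.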


Example: Principal Components Analysis

3 populations
3 characters
30 specimens per population
Multivariate normal distributions

Variance-Covariance Matrix

        EYE    LEG    WING
EYE     6.83   1.77   2.32
LEG     1.77   2.53   3.78
WING    2.32   3.78   8.29

Vector Correlations

        pc1          pc2
EYE     0.6573308    0.7529939
LEG     0.8788818    -0.185262
WING    0.9129473    -0.392646

Eigenvalues

EVAL     PERCENT
11.81    0.669
5.23     0.296
0.59     0.033

Euclidean Distance

Euclidean distance is preserved during PCA as long as the scores are not standardized. Thus the distance between centroids in PC space is the same as in the original space.
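This claim can be checked numerically. The sketch below uses made-up random data (30 specimens, 3 characters, not the workshop data set). It compares pairwise distances in the original space, in PC space, and in PC space after each component has been standardized to unit variance.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 30 specimens, 3 characters with unequal variances.
X = rng.normal(size=(30, 3)) * np.array([3.0, 1.5, 0.5])

Xc = X - X.mean(axis=0)                          # translation to the centroid
evals, T = np.linalg.eigh(np.cov(Xc, rowvar=False))
scores = Xc @ T                                  # rotation onto the principal components
standardized = scores / np.sqrt(evals)           # each PC rescaled to variance 1

def pairwise_distances(points):
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

d_raw = pairwise_distances(X)
d_pc = pairwise_distances(scores)
d_std = pairwise_distances(standardized)

print("PC scores keep distances:          ", np.allclose(d_raw, d_pc))
print("standardized scores keep distances:", np.allclose(d_raw, d_std))
```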


Discriminant Analysis

Discriminant analysis (or Canonical Discriminant analysis, CDA) - any procedure for constructing a linear combination of variables that tends to have different values from organisms in different groups. Group membership is specified in advance, and discriminant analysis then determines the axes that explain the greatest variation between the groups (optimizes variance among groups divided by variance within groups). -Ellinor Michel

Terminology is not standard: Discriminant / Canonical Discriminant / Canonical Variates Analysis

Oddly enough, different names are given to what are essentially the same analyses. The name depends on the characteristics of the study. In the usual terminology, "Discriminant Analysis" is applied to studies that have the characteristics in the first row of the table below, while "Canonical Variates Analysis" is applied to studies that have the characteristics in the second row; most authors deplore the term "Canonical Discriminant Analysis". See the Blue Book for more details on terminology. See Manly for a good (and short!) description of these analyses.

Purpose           Nature          Number of groups
Classification    Confirmatory    2 groups
Ordination        Exploratory     more than 2 groups

Steps in a Canonical Variates Analysis a la Albrecht

[Figure: raw data for groups A, B and C carried through three steps (Rotate, Rescale, Rotate) onto canonical axes CAN1 and 2.]

1) Rotate to maximize within-group variation
2) Rescale to standardize within-group variation
3) Rotate to maximize among-group variation


Thus canonical variates analysis finds the linear combination of characters which maximizes the among-group variation compared to the within-group variation, or

$$F = \frac{\mathrm{var}_{among}}{\mathrm{var}_{within}}$$

Notice that this F is the same statistic used in an ANOVA for one variable and a MANOVA for more than one variable.

Assumptions:
---Same (or very similar) covariance structure in all groups
---Multivariate normal distributions

[Figure: Case 1 and Case 2.]
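The rotate, rescale, rotate sequence can be sketched with two eigen-decompositions. The function `canonical_variates`, the simulated groups and their centers are our own illustrative assumptions, not the workshop's data or software. The returned ratios are the among-group to within-group variance along each canonical axis.

```python
import numpy as np

def canonical_variates(X, groups):
    """Rotate, rescale, rotate. X is specimens x characters; groups holds labels."""
    labels = np.unique(groups)
    grand_mean = X.mean(axis=0)
    means = np.array([X[groups == g].mean(axis=0) for g in labels])
    counts = np.array([(groups == g).sum() for g in labels])

    # Pooled within-group covariance matrix.
    resid = np.vstack([X[groups == g] - X[groups == g].mean(axis=0) for g in labels])
    W = resid.T @ resid / (len(X) - len(labels))

    # Steps 1 and 2: rotate to the principal axes of W, rescale each to unit variance.
    w_evals, w_evecs = np.linalg.eigh(W)
    whiten = w_evecs / np.sqrt(w_evals)

    # Step 3: rotate to the principal axes of the group means in the rescaled space.
    centered = (means - grand_mean) @ whiten
    B = (centered * counts[:, None]).T @ centered / (len(labels) - 1)
    b_evals, b_evecs = np.linalg.eigh(B)
    order = np.argsort(b_evals)[::-1]
    coefficients = whiten @ b_evecs[:, order]
    return (X - grand_mean) @ coefficients, b_evals[order], coefficients

# Hypothetical example: 3 groups, 30 specimens each, 3 characters.
rng = np.random.default_rng(2)
centers = np.array([[0.0, 0.0, 0.0], [3.0, 1.0, 0.0], [0.0, 2.0, 4.0]])
groups = np.repeat(np.array(["A", "B", "C"]), 30)
X = np.vstack([rng.normal(loc=c, size=(30, 3)) for c in centers])

scores, ratios, coefficients = canonical_variates(X, groups)
print(np.round(ratios, 3))
```

With three groups only two canonical axes can separate the group means, so the third ratio is zero up to rounding error.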


[Figure: raw data space transformed to canonical variates space by the sequence Rotate, Rescale, Rotate; axes labeled Can1 and 2.]


Example: Canonical Variates Analysis

Cross-validation analysis ---gives the classification rates for each group

PROB              classified as
taken from     A       B       C
A              0.87    0.13    0.00
B              0.07    0.93    0.00
C              0.00    0.00    1.00

Vector correlations

        CAN1     CAN2
EYE     0.851    0.427
LEG     0.055    0.189
WING    -0.221   0.645

Eigenvalues

        EVAL    PERCENT
CAN1    10.8    0.874
CAN2    1.5     0.125

Mahalanobis distance: The Euclidean distance in Canonical Variates Space, or the distance in the raw data space modified by the pooled within-group variance structure. (See Manly).
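A minimal sketch of the second reading (distance in the raw data space modified by the pooled within-group covariance) follows. The covariance matrix and the points are invented for illustration. Two points at the same Euclidean distance from the origin end up at different Mahalanobis distances.

```python
import numpy as np

def mahalanobis(x, y, pooled_cov):
    """Distance between x and y scaled by the pooled within-group covariance."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ np.linalg.solve(pooled_cov, diff)))

# Hypothetical pooled within-group covariance: more spread along the first axis.
W = np.array([[4.0, 0.0],
              [0.0, 1.0]])
origin = [0.0, 0.0]
a = [2.0, 0.0]     # displaced along the high-variance axis
b = [0.0, 2.0]     # displaced along the low-variance axis

for name, point in [("a", a), ("b", b)]:
    euclid = float(np.linalg.norm(np.subtract(point, origin)))
    print(name, "Euclidean:", euclid, "Mahalanobis:", mahalanobis(point, origin, W))
```

A displacement along a direction in which the groups vary a lot counts for less than the same displacement along a direction in which they vary little.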

[Figure: Euclidean distance compared with Mahalanobis distance for two cases, A and B; panel labels "B > A" and "A > B".]


Outline analysis

Radial Functions

[Figure: radial function, radius r.]

Tangent Angle Functions


Fourier Series

$$r = F(\theta) = a_0 + \sum_{n=1}^{k} \left[ a_n \cos(n\theta) + b_n \sin(n\theta) \right]$$

Any reasonably well-behaved periodic function can be written as the sum of cos and sin terms, or Fourier series.

Fourier Series (Phase angle representation)

Phase angle:

$$\Phi_n = \arctan\frac{b_n}{a_n}$$

Amplitude:

$$C_n = \sqrt{a_n^2 + b_n^2}$$
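The coefficients, amplitudes and phase angles can be estimated from a sampled radial function. The sketch below assumes the radius is sampled at equally spaced angles starting at theta = 0. The function `radial_fourier` and the synthetic outline (a circle plus a 2nd and a 6th harmonic) are our own illustrative choices.

```python
import numpy as np

def radial_fourier(r, k):
    """Return a0 and the first k cosine (a) and sine (b) coefficients of r(theta).

    Assumes r holds N radii sampled at equally spaced angles starting at 0.
    """
    n_samples = len(r)
    theta = 2.0 * np.pi * np.arange(n_samples) / n_samples
    n = np.arange(1, k + 1)[:, None]             # harmonic numbers as a column
    a = 2.0 / n_samples * (r * np.cos(n * theta)).sum(axis=1)
    b = 2.0 / n_samples * (r * np.sin(n * theta)).sum(axis=1)
    return r.mean(), a, b

# Synthetic outline: mean radius 10 plus harmonics 2 and 6.
theta = 2.0 * np.pi * np.arange(256) / 256
r = 10.0 + 2.0 * np.cos(2 * theta) + 1.0 * np.sin(6 * theta)

a0, a, b = radial_fourier(r, k=9)
amplitude = np.sqrt(a ** 2 + b ** 2)     # C_n
phase = np.arctan2(b, a)                 # Phi_n, a quadrant-aware arctan(b_n / a_n)
power = amplitude ** 2                   # power spectrum, C_n squared

for n_harm, (c, p) in enumerate(zip(amplitude, power), start=1):
    print(n_harm, round(float(c), 3), round(float(p), 3))
```

Using `arctan2` instead of a plain arctangent is a design choice that keeps the phase angle in the correct quadrant when a_n is negative or zero.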


[Figure: outlines built from harmonics 2 and 6 (2+6) and from harmonics 2, 13 and 15 (2+13+15), each with its power spectrum plotted against harmonic number.]

[Figure: an outline reconstructed from cumulative sets of harmonics (1, 1..2, 1..3, up to 1...9), with the power spectra for the 2, 6 and the 2, 13, 15 examples.]


Power = C_n^2: how much of the variation is explained by the nth harmonic.

Form:

$$r = f(\theta) = C_0 + \sum_{n=1}^{k} C_n \cos(n\theta - \Phi_n)$$

[Figure: amplitude plotted against harmonic number (n = 1 to 8) for Species A and Species B.]

Fourier Series Representation is an Orthogonal Decomposition, so terms are orthogonal to each other. For n harmonics, a form would be represented by n values of the amplitude and the corresponding n values of the phase angle (see Lestrel).

Harmonic Distance

Measures the distance using the amplitudes and the phase angles:

$$dh_{ij} = \frac{1}{k} \sum_{n=1}^{k} \left[ (c_{ni}\cos\Phi_{ni} - c_{nj}\cos\Phi_{nj})^2 + (c_{ni}\sin\Phi_{ni} - c_{nj}\sin\Phi_{nj})^2 \right]$$
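The harmonic distance can be sketched as a short function. This is written under one assumption that should be checked against Lestrel: the formula is read as a mean over the k harmonics with no square root, as it appears in the transcript. If the original carries a square root, wrap the result in `np.sqrt`. The function name and the test values are ours.

```python
import numpy as np

def harmonic_distance(c_i, phi_i, c_j, phi_j):
    """Mean over harmonics of the squared difference between two forms.

    Each form is given by its amplitudes c and phase angles phi (same length k).
    Assumption: no square root is applied, following the formula as transcribed.
    """
    c_i, phi_i, c_j, phi_j = (np.asarray(v, dtype=float)
                              for v in (c_i, phi_i, c_j, phi_j))
    cos_term = c_i * np.cos(phi_i) - c_j * np.cos(phi_j)
    sin_term = c_i * np.sin(phi_i) - c_j * np.sin(phi_j)
    return float(np.mean(cos_term ** 2 + sin_term ** 2))

# Two hypothetical forms described by three harmonics each.
form_a = ([2.0, 1.0, 0.5], [0.0, 0.3, 1.0])
form_b = ([2.0, 0.8, 0.5], [0.0, 0.3, 1.2])
print(harmonic_distance(*form_a, *form_b))
```

Each harmonic contributes the squared length of the difference between two vectors with lengths c and angles phi, so forms that differ only in phase still register a nonzero distance.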


How to measure closed shapes

Pixel Descriptions

Moments

$$m_{pq} = \iint x^p y^q f(x,y)\, dx\, dy$$

Example: For each pixel,
    (X, Y) = location of pixel
    f(x,y) = 0 if white, 1 if black
So, just sum across all pixels.

Configuration

Distance between pixel configurations can be as simple as overlaying images and bit mapping. See Lestrel for more details.
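For a bitmap, the double integral becomes a sum over pixels. The sketch below uses a tiny made-up image. The helper `moment` is ours, and the pixel-mismatch count at the end is only one possible reading of "overlaying images and bit mapping", not a method confirmed by the source.

```python
import numpy as np

def moment(image, p, q):
    """m_pq = sum over pixels of x**p * y**q * f(x, y), with f = 1 for black pixels."""
    y, x = np.indices(image.shape)      # row index is y, column index is x
    return float((x ** p * y ** q * image).sum())

# Hypothetical 4 x 5 bitmap: 1 = black, 0 = white.
shape_a = np.array([
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
])

area = moment(shape_a, 0, 0)                                  # m00: number of black pixels
centroid = (moment(shape_a, 1, 0) / area, moment(shape_a, 0, 1) / area)

# Assumed overlay distance: count pixels that differ between two bitmaps.
shape_b = np.roll(shape_a, 1, axis=1)                         # same shape, one pixel right
mismatch = int(np.logical_xor(shape_a, shape_b).sum())

print(area, centroid, mismatch)
```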


Transformations

Goal: Characterize the transformation from one landmark configuration to another.

[Figure: two landmark configurations with an unknown mapping (?) between them.]

Question: How many transformations are there between any two landmark configurations?
Answer: An infinite number.

Question: Which of these have "nice" properties?
Question: Which of these have biologically relevant properties?


Properties of mappings: Which are "nice"? Which are biologically relevant?

Preserve points                             Create points                       Destroy points
Lines to lines                              Lines to points                     Lines to planes
Parallel lines to non-intersecting lines    Parallel lines to parallel lines    Parallel lines to intersecting lines
Line to curve                               Line to line with kink

Question: Which of these are easy to compute?

Thin-plate splines have the properties in the first column.


Landmark configurations

How does this mapping deform space?


How does any mapping work?

Interpolate the change in space between landmarks.

Thin-Plate Splines--Minimize Bending Energy
------Imagine you have a thin sheet of metal; the Bending Energy is the amount of energy it takes to bend this sheet. See Orange Book.


Canonical Decomposition of a transformation into Principal Warps:
------The eigenvalues give the bending energy of each warp.
------The warp (or eigenvector) represents how much each point in the plane is bent or warped.

[Figure: principal warp 1, bending energy .390; principal warp 2, bending energy .160.]

Distance: Total Bending Energy
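The decomposition can be sketched with the usual thin-plate spline construction, using the kernel U(r) = r^2 log r^2 and a bending-energy matrix taken from the inverse of the bordered kernel matrix. The landmark coordinates below are invented, the function names are ours, and the energy is computed only up to a constant factor. See the Orange Book for the full treatment.

```python
import numpy as np

def bending_energy_matrix(landmarks):
    """Bending-energy matrix for k reference landmarks (k x 2 array)."""
    k = len(landmarks)
    diff = landmarks[:, None, :] - landmarks[None, :, :]
    r2 = (diff ** 2).sum(axis=-1)
    K = r2 * np.log(np.where(r2 > 0, r2, 1.0))     # U(r) = r^2 log r^2, U(0) = 0
    P = np.hstack([np.ones((k, 1)), landmarks])    # affine part: 1, x, y
    L = np.zeros((k + 3, k + 3))
    L[:k, :k] = K
    L[:k, k:] = P
    L[k:, :k] = P.T
    return np.linalg.inv(L)[:k, :k]

def bending_energy(reference, target):
    """Bending energy (up to a constant factor) of the spline from reference to target."""
    Be = bending_energy_matrix(reference)
    return float(np.trace(target.T @ Be @ target))

# Hypothetical reference configuration: four corners of a square plus its center.
reference = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.5]])
Be = bending_energy_matrix(reference)

# Principal warps are eigenvectors of Be; eigenvalues are their bending energies.
energies, warps = np.linalg.eigh((Be + Be.T) / 2)  # ascending order
print(np.round(energies, 4))

affine = reference @ np.array([[2.0, 0.3], [0.1, 1.5]]) + np.array([1.0, -2.0])
bent = reference.copy()
bent[4] += [0.2, 0.0]                              # push the center landmark sideways
print(bending_energy(reference, affine), bending_energy(reference, bent))
```

Three eigenvalues are zero because translation, rotation, scaling and shear (the affine part) bend nothing; with five landmarks that leaves two principal warps.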


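Methods for Distance matrices
(See Manly for more details on all these analyses)

Cluster Analysis
----Makes diagram which represents distances

$$D = \begin{pmatrix} 0 & d_{12} & d_{13} \\ d_{12} & 0 & d_{23} \\ d_{13} & d_{23} & 0 \end{pmatrix}$$

[Figure: cluster diagram of objects 1, 2 and 3.]

Multi-Dimensional Scaling
-----Constructs the best configuration of points given a distance matrix

$$D = \begin{pmatrix} 0 & d_{12} & d_{13} \\ d_{12} & 0 & d_{23} \\ d_{13} & d_{23} & 0 \end{pmatrix}$$

Fits distances together minimizing "Strain" function.

[Figure: configuration of points 1, 2 and 3.]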
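One concrete way to build a configuration from a distance matrix is classical (Torgerson) scaling, which minimizes a strain criterion. Whether this is the exact variant the workshop had in mind is an assumption. The function name and the test points are ours.

```python
import numpy as np

def classical_mds(D, n_dims=2):
    """Coordinates whose Euclidean distances approximate the distance matrix D."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared distances
    evals, evecs = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1][:n_dims]     # keep the largest eigenvalues
    return evecs[:, order] * np.sqrt(np.clip(evals[order], 0.0, None))

def pairwise_distances(points):
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical example: distances among six points in a plane.
rng = np.random.default_rng(3)
points = rng.normal(size=(6, 2))
D = pairwise_distances(points)

config = classical_mds(D, n_dims=2)
print(np.allclose(pairwise_distances(config), D))
```

The recovered configuration matches the original only up to translation, rotation and reflection, since a distance matrix carries no information about those.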

Mantel test --- Compares two distance matrices directly


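$$D_1 = \begin{pmatrix} 0 & d_{12} & d_{13} \\ d_{12} & 0 & d_{23} \\ d_{13} & d_{23} & 0 \end{pmatrix} \qquad D_2 = \begin{pmatrix} 0 & d_{12} & d_{13} \\ d_{12} & 0 & d_{23} \\ d_{13} & d_{23} & 0 \end{pmatrix}$$

Distribution of Statistic constructed by randomizing rows and columns of one matrix

[Figure: distribution of the randomized statistic, with p and the value of the statistic from the data marked.]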
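The randomization can be sketched as follows. The statistic used here is the Pearson correlation between the off-diagonal entries of the two matrices, which is a common choice but an assumption on our part, as are the function name, the number of permutations and the one-sided p-value.

```python
import numpy as np

def mantel_test(D1, D2, n_perm=999, seed=0):
    """Correlation between two distance matrices and a one-sided permutation p-value."""
    rng = np.random.default_rng(seed)
    upper = np.triu_indices(len(D1), k=1)        # count each pair of objects once
    observed = np.corrcoef(D1[upper], D2[upper])[0, 1]
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(D1))
        shuffled = D1[np.ix_(perm, perm)]        # same shuffle for rows and columns
        if np.corrcoef(shuffled[upper], D2[upper])[0, 1] >= observed - 1e-12:
            hits += 1
    # The observed arrangement counts as one of the possible permutations.
    return observed, (hits + 1) / (n_perm + 1)

def pairwise_distances(points):
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical check: a matrix compared with itself should match perfectly.
rng = np.random.default_rng(4)
D = pairwise_distances(rng.normal(size=(8, 2)))
r, p = mantel_test(D, D)
print(r, p)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations; shuffling the object labels keeps each matrix a valid distance matrix.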