![Page 1: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/1.jpg)
Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices
BIOL4062/5062
Hal Whitehead
![Page 2: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/2.jpg)
• Association matrices
• Principal Coordinates Analysis (PCO)
• Correspondence Analysis (COA)
• Multidimensional Scaling (MDS)
![Page 3: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/3.jpg)
The Association Matrix
[Figure: square association matrix with units A, B, C, D, E, F, G, … labelling both the rows and the columns]
![Page 4: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/4.jpg)
Association matrices
• Social structure
– association between individuals
• Community ecology
– similarity between species, sites
– dissimilarities between species, sites
• Genetic distances
• Correlation matrices
• Covariance matrices
• Distance matrices
– Euclidean, Penrose, Mahalanobis
Similarity
Dissimilarity
![Page 5: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/5.jpg)
Association matrices: Symmetric/Asymmetric
Symmetric: genetic relatedness among bottlenose dolphins (Krutzen et al. 2003)
GRI -0.24
VAX  0.02  0.08
KRI  0.02 -0.04 -0.19
MYR -0.27  0.44 -0.03 -0.11
WOW  0.22  0.11  0.32 -0.10  0.10
HOB -0.04  0.11 -0.17 -0.13 -0.08 -0.12
WBE  0.15  0.07 -0.08  0.08 -0.08  0.23  0.13
HOR -0.08  0.21 -0.14 -0.23  0.18  0.12  0.11  0.26
AJA -0.24  0.23 -0.04 -0.16 -0.01 -0.16  0.07  0.25  0.32
PIK -0.11  0.35 -0.07  0.04  0.02 -0.05  0.09  0.60  0.21  0.27
ANV -0.05 -0.23 -0.39 -0.39 -0.21 -0.13 -0.41  0.11  0.11  0.02 -0.06
VEE  0.14  0.02  0.15 -0.11 -0.08  0.00 -0.09 -0.05  0.06  0.01 -0.17 -0.17
      LAT   GRI   VAX   KRI   MYR   WOW   HOB   WBE   HOR   AJA   PIK   ANV
Asymmetric: grooming rates of capuchin monkeys (Perry 1996)
Recipient
Actor A S N D W T
A - 5.8 3.5 2.1 2.3 0.04
S 41.6 - 28.6 18.1 9.0 7.4
N 10.3 25.5 - 9.6 9.9 4.3
D 23.3 9.3 10.5 - 13.4 6.9
W 21.2 15.2 14.6 25.1 - 10.4
T 2.5 2.9 3.7 3.6 5.3 -
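A minimal sketch, assuming NumPy, of how one might check whether an association matrix is symmetric, using the grooming rates above. The diagonal is undefined in the table, so it is filled with 0 here purely as a placeholder.

```python
import numpy as np

# Grooming rates from the table above (rows = actor, columns = recipient).
# The diagonal is not defined in the source; 0.0 is only a filler value.
labels = ["A", "S", "N", "D", "W", "T"]
groom = np.array([
    [0.0, 5.8, 3.5, 2.1, 2.3, 0.04],
    [41.6, 0.0, 28.6, 18.1, 9.0, 7.4],
    [10.3, 25.5, 0.0, 9.6, 9.9, 4.3],
    [23.3, 9.3, 10.5, 0.0, 13.4, 6.9],
    [21.2, 15.2, 14.6, 25.1, 0.0, 10.4],
    [2.5, 2.9, 3.7, 3.6, 5.3, 0.0],
])

# A symmetric matrix equals its transpose.
is_symmetric = np.allclose(groom, groom.T)
# Largest gap between the rate i -> j and the rate j -> i.
max_asymmetry = np.abs(groom - groom.T).max()
print(is_symmetric, max_asymmetry)
```

A lower-triangular table such as the relatedness matrix stores each pair once, so it is symmetric by construction.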
![Page 6: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/6.jpg)
Principal Coordinates Analysis
• Consider a symmetric dissimilarity matrix
B  5
C  3  7
D  5  4  4
   A  B  C
• As a distance matrix
• And then plot it
![Page 7: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/7.jpg)
Principal Coordinates Analysis
B  5
C  3  7
D  5  4  4
   A  B  C
[Figure: points A, B, C and D plotted with the inter-point distances from the matrix]
Can represent:
– distances between 2 points in 1 dimension
– distances between 3 points in 2 dimensions
– distances between 4 points in 3 dimensions
– …
– distances between k points in k-1 dimensions
![Page 8: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/8.jpg)
Principal Coordinates Analysis
HOWEVER!
B  5
C  3  7
D  5  4  4
   A  B  C
Triangle inequality violated if:
AB + AC < BC
No representation possible
[Figure: A and B plotted 5 apart; with a distance of 10, C cannot be placed (“??”)]
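The triangle-inequality check can be sketched as below, assuming NumPy. The function name `triangle_violations` is hypothetical, and reading the "10" in the figure as the B to C distance raised from 7 to 10 is an assumption.

```python
import numpy as np
from itertools import permutations

def triangle_violations(d, labels):
    """Return label triples (i, j, k) for which d[i,j] + d[i,k] < d[j,k]."""
    bad = []
    for i, j, k in permutations(range(len(labels)), 3):
        # j < k so that each pair (j, k) is tested once per third point i.
        if j < k and d[i, j] + d[i, k] < d[j, k]:
            bad.append((labels[i], labels[j], labels[k]))
    return bad

labels = ["A", "B", "C", "D"]
d = np.array([
    [0, 5, 3, 5],
    [5, 0, 7, 4],
    [3, 7, 0, 4],
    [5, 4, 4, 0],
], dtype=float)
print(triangle_violations(d, labels))

# Assumed reading of the figure: BC raised from 7 to 10.
d_bad = d.copy()
d_bad[1, 2] = d_bad[2, 1] = 10.0
print(triangle_violations(d_bad, labels))
```

With BC = 10, both AB + AC = 8 and DB + DC = 8 fall short of 10, so C cannot be placed relative to B.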
![Page 9: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/9.jpg)
Principal Coordinates Analysis
• Take distance (dissimilarity) matrix with k units
• Represent as k points in k-1 dimensional space
– if triangle inequality holds throughout
• Find direction of greatest variability
– 1st Principal Coordinate
• Find direction of next greatest variability (orthogonal)
– 2nd Principal Coordinate
• …
• k-1 Principal Coordinates
Reduces dimensionality of representation
![Page 10: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/10.jpg)
Principal Coordinates Analysis
• Eigenvectors of the (double-centred, squared) distance matrix give principal coordinates
• Eigenvalues give proportion of variance accounted for
• An exact representation (for which the triangle inequality is necessary, though not sufficient) is equivalent to:
– centred matrix is positive semi-definite
– no unreal (imaginary) coordinates
– no negative eigenvalues
• Analysis probably OK if there are only a few small negative eigenvalues
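A minimal sketch of the eigen-analysis, assuming NumPy and the classical (Gower) formulation in which the squared distances are double-centred before the eigenvectors are taken. The function name `pco` is hypothetical, and this is not necessarily the exact procedure used for the course examples.

```python
import numpy as np

def pco(d):
    """Classical principal coordinates from a symmetric distance matrix."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n       # centring matrix
    b = -0.5 * j @ (d ** 2) @ j               # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(b)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-10                    # only real-valued axes
    coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals, coords

# The four-unit dissimilarity matrix (A, B, C, D) from the earlier slides.
d = np.array([
    [0, 5, 3, 5],
    [5, 0, 7, 4],
    [3, 7, 0, 4],
    [5, 4, 4, 0],
], dtype=float)
eigvals, coords = pco(d)
print(eigvals)
print(coords)
```

For this four-unit matrix the smallest eigenvalue is negative: the triangle inequality holds for every triple, yet no exact Euclidean configuration exists, so only the axes with positive eigenvalues are returned.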
![Page 11: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/11.jpg)
Principal Coordinates Analysis (PCO) & Principal Components Analysis (PCA)
• PCO is equivalent to PCA on covariance matrix of transposed data matrix if distance matrix is Euclidean
• PCO is equivalent to PCA on correlation matrix of transposed data matrix if distance matrix is Penrose
• PCO only gives information on units or variables, not both
• Axes (principal coordinates) rarely interpretable in PCO
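The Euclidean case of this equivalence can be illustrated numerically. The sketch below assumes NumPy and uses hypothetical random data (8 units, 3 variables); it compares eigenvalues only, not the scores themselves.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))          # hypothetical data: 8 units, 3 variables
n = x.shape[0]

# PCA: eigenvalues of the covariance matrix of the variables.
pca_eig = np.sort(np.linalg.eigvalsh(np.cov(x, rowvar=False)))[::-1]

# PCO: eigenvalues of the double-centred squared Euclidean distances
# between the units.
diff = x[:, None, :] - x[None, :, :]
d2 = (diff ** 2).sum(axis=2)
j = np.eye(n) - np.ones((n, n)) / n
pco_eig = np.sort(np.linalg.eigvalsh(-0.5 * j @ d2 @ j))[::-1]

# The leading PCO eigenvalues equal (n - 1) times the PCA eigenvalues;
# the remaining PCO eigenvalues are zero up to rounding error.
print(pco_eig[:3] / (n - 1))
print(pca_eig)
```

The proportions of variance explained are therefore identical in the two analyses when the distances are Euclidean.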
![Page 12: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/12.jpg)
Principal Coordinates Analysis
Proportion of time chickadees seen together at feeder
SCAO 1.00
AOPR 0.18 1.00
ARPO 0.07 0.27 1.00
YOSA 0.26 0.12 0.12 1.00
ROAY 0.21 0.19 0.18 0.31 1.00
SORA 0.06 0.02 0.03 0.15 0.04 1.00
BJAO 0.19 0.17 0.09 0.16 0.21 0.28 1.00
     SCAO AOPR ARPO YOSA ROAY SORA BJAO
Ficken et al. Behav. Ecol. Sociobiol. 1981
![Page 13: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/13.jpg)
Principal Coordinates Analysis
Proportion of time chickadees seen together at feeder
Transformed to distance matrix (√(1-X))
SCAO 0.00
AOPR 0.91 0.00
ARPO 0.96 0.85 0.00
YOSA 0.86 0.94 0.94 0.00
ROAY 0.89 0.90 0.91 0.83 0.00
SORA 0.97 0.99 0.98 0.92 0.98 0.00
BJAO 0.90 0.91 0.95 0.92 0.89 0.85 0.00
     SCAO AOPR ARPO YOSA ROAY SORA BJAO
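The tabulated distances are consistent with taking the square root of 1 - X for each association X (for example, √(1 - 0.18) ≈ 0.91). A short sketch of that transformation, assuming NumPy:

```python
import numpy as np

labels = ["SCAO", "AOPR", "ARPO", "YOSA", "ROAY", "SORA", "BJAO"]
# Lower triangle of the association matrix, as tabulated.
lower = [
    [1.00],
    [0.18, 1.00],
    [0.07, 0.27, 1.00],
    [0.26, 0.12, 0.12, 1.00],
    [0.21, 0.19, 0.18, 0.31, 1.00],
    [0.06, 0.02, 0.03, 0.15, 0.04, 1.00],
    [0.19, 0.17, 0.09, 0.16, 0.21, 0.28, 1.00],
]
n = len(labels)
x = np.zeros((n, n))
for i, row in enumerate(lower):
    x[i, : len(row)] = row
x = x + x.T - np.diag(np.diag(x))     # mirror into the upper triangle

d = np.sqrt(1.0 - x)                  # similarity -> distance
print(np.round(d, 2))
```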
![Page 14: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/14.jpg)
Principal Coordinates Analysis: Chickadees at Feeder
[Figure: SCAO, AOPR, ARPO, YOSA, ROAY, SORA and BJAO plotted on the 1st principal coordinate (x-axis) against the 2nd principal coordinate (y-axis)]
Prin Coord   % explained   Cumulative   Eigenvalue
1            22.77         22.77        0.575
2            20.05         42.82        0.507
3            16.63         59.45        0.420
4            15.17         74.62        0.383
5            13.37         87.98        0.338
6            12.02         100.00       0.304
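A sketch of how such an eigenvalue table might be computed from the chickadee associations, assuming NumPy and assuming that the squared distances are 1 - X. Under that assumption the eigenvalues sum to about 2.53, the same as the total of the Eigenvalue column, but the individual values printed here are not guaranteed to match the table exactly.

```python
import numpy as np

labels = ["SCAO", "AOPR", "ARPO", "YOSA", "ROAY", "SORA", "BJAO"]
lower = [
    [1.00],
    [0.18, 1.00],
    [0.07, 0.27, 1.00],
    [0.26, 0.12, 0.12, 1.00],
    [0.21, 0.19, 0.18, 0.31, 1.00],
    [0.06, 0.02, 0.03, 0.15, 0.04, 1.00],
    [0.19, 0.17, 0.09, 0.16, 0.21, 0.28, 1.00],
]
n = len(labels)
x = np.zeros((n, n))
for i, row in enumerate(lower):
    x[i, : len(row)] = row
x = x + x.T - np.diag(np.diag(x))      # full symmetric association matrix

d2 = 1.0 - x                           # assumed squared distances
j = np.eye(n) - np.ones((n, n)) / n    # centring matrix
b = -0.5 * j @ d2 @ j
eigvals = np.sort(np.linalg.eigvalsh(b))[::-1]

pos = eigvals[eigvals > 1e-10]         # axes with positive eigenvalues
pct = 100 * pos / pos.sum()
for k, (ev, p, cum) in enumerate(zip(pos, pct, np.cumsum(pct)), start=1):
    print(f"{k}  {p:6.2f}  {cum:6.2f}  {ev:.3f}")
```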
![Page 15: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/15.jpg)
Correspondence Analysis
• Uses incidence matrix
– counts indexed by two factors
– e.g., Archaeology: tombs X artifacts
– e.g., Community ecology: sites X species
• Data matrix with counts and many zeros
![Page 16: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/16.jpg)
Correspondence Analysis
• Distance between two species, i and j, over sites k=1,…,p is (“Chi-squared” measure):
$$D_{ij} = \sqrt{\sum_{k=1}^{p} \frac{\left(x_{ik}/r_i - x_{jk}/r_j\right)^2}{c_k}}$$
$r_i$ species totals
$c_k$ site totals
• {Difference in proportions of each species at each site}
Then do Principal Coordinates Analysis
![Page 17: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/17.jpg)
Correspondence Analysis
• Distance between two species, i and j, over sites k=1,…,p is (“Chi-squared” measure):
$$D_{ij} = \sqrt{\sum_{k=1}^{p} \frac{\left(x_{ik}/r_i - x_{jk}/r_j\right)^2}{c_k}}$$
• Distance between two sites, k and l, over species i=1,…,n is:
$$D_{kl} = \sqrt{\sum_{i=1}^{n} \frac{\left(x_{ik}/c_k - x_{il}/c_l\right)^2}{r_i}}$$
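Both distances can be computed with one function applied to the table and to its transpose. The sketch below assumes NumPy, takes the distance as the square root of the weighted sum of squared profile differences (an assumption about the exact form), and uses hypothetical counts; `chi_squared_distances` is an illustrative name.

```python
import numpy as np

def chi_squared_distances(x):
    """Chi-squared distances between the rows of a count table."""
    x = np.asarray(x, dtype=float)
    r = x.sum(axis=1, keepdims=True)     # row totals (r_i)
    c = x.sum(axis=0)                    # column totals (c_k)
    profiles = x / r                     # x_ik / r_i
    diff = profiles[:, None, :] - profiles[None, :, :]
    return np.sqrt((diff ** 2 / c).sum(axis=2))

# Hypothetical counts: 3 species (rows) at 4 sites (columns).
counts = np.array([
    [10, 0, 5, 5],
    [2, 8, 0, 10],
    [20, 0, 10, 10],
])
d_species = chi_squared_distances(counts)      # D_ij between species
d_sites = chi_squared_distances(counts.T)      # D_kl between sites
print(np.round(d_species, 3))
print(np.round(d_sites, 3))
```

Species 1 and 3 in this made-up table have the same proportions at every site, so their distance is zero even though their totals differ.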
![Page 18: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/18.jpg)
Correspondence Analysis Example: Sperm Whale Haplotypes by Clan
mtDNA haplotype   Reg   Short   4-plus
#1                48    28      2
#2                8     27      11
#3                9     26      0
#4                0     0       3
#5                1     2       1
#6                1     0       5
#7                4     0       0
#8                0     4       1
#9                0     2       0
#11               3     0       0
#12               0     1       0
#13               4     1       0
#14               1     0       0
#15               1     0       0
[Figure: Correspondence Plot of haplotypes (numbered) and clans (Reg, Short, 4-plus); x-axis Dim(1), eigenvalue 0.394; y-axis Dim(2), eigenvalue 0.205]
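A sketch of a correspondence analysis of the haplotype-by-clan counts, assuming NumPy and the common formulation via a singular value decomposition of the standardised residuals. Whether this reproduces the plotted axes and eigenvalues exactly depends on the software and scaling used for the slide, which are not stated.

```python
import numpy as np

haplotypes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15]
clans = ["Reg", "Short", "4-plus"]
counts = np.array([
    [48, 28, 2], [8, 27, 11], [9, 26, 0], [0, 0, 3], [1, 2, 1],
    [1, 0, 5], [4, 0, 0], [0, 4, 1], [0, 2, 0], [3, 0, 0],
    [0, 1, 0], [4, 1, 0], [1, 0, 0], [1, 0, 0],
], dtype=float)

n = counts.sum()
p = counts / n                         # correspondence matrix
r = p.sum(axis=1)                      # row (haplotype) masses
c = p.sum(axis=0)                      # column (clan) masses

# Standardised residuals from the independence model r_i * c_k.
s = (p - np.outer(r, c)) / np.sqrt(np.outer(r, c))
u, sv, vt = np.linalg.svd(s, full_matrices=False)

inertia = sv ** 2                      # eigenvalue of each dimension
row_coords = (u * sv) / np.sqrt(r)[:, None]     # haplotype coordinates
col_coords = (vt.T * sv) / np.sqrt(c)[:, None]  # clan coordinates
print(np.round(inertia, 3))
```

With three clans there are at most two non-trivial dimensions, so a two-dimensional plot shows all of the structure in the table.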
![Page 19: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/19.jpg)
Multidimensional Scaling
• “Non-parametric version of principal coordinates analysis”
• Given an association matrix between units:
– tries to find a representation of the units in a given number of dimensions
– preserving the pattern/ordering in the association matrix
![Page 20: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/20.jpg)
Multidimensional Scaling
How it works:
1 Provide association matrix (similarity/dissimilarity)
2 Provide number of dimensions
3 Produce initial plot, perhaps using Principal Coordinates
4 Order distances on plot, compare them with ordering of association matrix
5 Compute STRESS
6 Juggle points to reduce STRESS
7 Go to 4, until STRESS has stabilized
8 Output plot, STRESS
9 Perhaps repeat with new starting conditions
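The loop structure of these steps can be sketched as follows, assuming NumPy. Note the substitution: instead of the rank-order comparison of step 4, this sketch matches plot distances to the dissimilarities directly by least squares (metric scaling with the SMACOF/Guttman update), and it starts from random positions rather than Principal Coordinates. It illustrates the iterate-until-stable loop and the restarts, not the non-metric algorithm itself; all names and default values are illustrative.

```python
import numpy as np

def pairwise(x):
    """Euclidean distances between the rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def stress(delta, x):
    """Normalised misfit between dissimilarities and plot distances."""
    iu = np.triu_indices_from(delta, k=1)
    d = pairwise(x)
    return np.sqrt(((delta[iu] - d[iu]) ** 2).sum() / (delta[iu] ** 2).sum())

def mds(delta, n_dims=2, n_starts=5, max_iter=500, tol=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    n = delta.shape[0]
    best_x, best_s = None, np.inf
    for _ in range(n_starts):                  # step 9: new starting conditions
        x = rng.normal(size=(n, n_dims))       # step 3: initial plot (random here)
        s_old = stress(delta, x)               # step 5
        for _ in range(max_iter):
            d = pairwise(x)
            np.fill_diagonal(d, 1.0)           # avoid dividing by zero
            b = -delta / d
            np.fill_diagonal(b, 0.0)
            np.fill_diagonal(b, -b.sum(axis=1))
            x = b @ x / n                      # step 6: move the points
            s_new = stress(delta, x)
            if s_old - s_new < tol:            # step 7: STRESS has stabilized
                break
            s_old = s_new
        if s_new < best_s:                     # keep the best of the restarts
            best_x, best_s = x, s_new
    return best_x, best_s                      # step 8: plot coordinates, STRESS

# Hypothetical example: dissimilarities taken from five points in a plane.
pts = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 2.0]], dtype=float)
delta = pairwise(pts)
config, s = mds(delta)
print(f"STRESS = {s:.4f}")
```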
![Page 21: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/21.jpg)
Multidimensional Scaling
• STRESS:
$$\text{STRESS} = \sqrt{\frac{\sum_{i,j}\left(d_{ij} - x_{ij}\right)^2}{\sum_{i,j} d_{ij}^2}}$$
• $d_{ij}$: association between i and j
• $x_{ij}$: association between i and j predicted using distances on plot (by regression)
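A sketch of the STRESS calculation, assuming NumPy, the square-root form of the ratio, and a straight-line regression for the predicted associations (the metric case; non-metric scaling would use a monotone regression instead). The data are hypothetical.

```python
import numpy as np

def stress(assoc, plot_dist):
    """STRESS between associations d_ij and values x_ij predicted from the plot."""
    iu = np.triu_indices_from(assoc, k=1)       # each pair i < j once
    d = assoc[iu]
    dist = plot_dist[iu]
    slope, intercept = np.polyfit(dist, d, 1)   # regress associations on distances
    x = slope * dist + intercept                # predicted associations x_ij
    return np.sqrt(((d - x) ** 2).sum() / (d ** 2).sum())

# Hypothetical plot: four points at the corners of a 3 x 4 rectangle.
pts = np.array([[0, 0], [3, 0], [0, 4], [3, 4]], dtype=float)
diff = pts[:, None, :] - pts[None, :, :]
plot_dist = np.sqrt((diff ** 2).sum(axis=2))

# Associations that are an exact linear function of the plot distances.
assoc = 1.0 + 2.0 * plot_dist
np.fill_diagonal(assoc, 0.0)
print(stress(assoc, plot_dist))

# Changing one association breaks the perfect fit and raises STRESS.
assoc2 = assoc.copy()
assoc2[0, 1] = assoc2[1, 0] = 10.0
print(stress(assoc2, plot_dist))
```

Summing over pairs i < j rather than over all i, j leaves the ratio unchanged for a symmetric matrix.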
![Page 22: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/22.jpg)
Multidimensional Scaling
• Iterative
– No unique solution
– Try with different starting positions
• Different possible definitions of STRESS
![Page 23: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/23.jpg)
Multidimensional Scaling: Shepard Diagrams
• Metric Scaling: similar plots to Principal Coordinates; Stress 23%
• Non-metric Scaling: easier to fit; Stress 16%
[Figure: two Shepard diagrams (metric and non-metric scaling), each plotting Distances (y-axis) against Data, the association values (x-axis)]
![Page 24: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/24.jpg)
Genetic distances between sperm whale groups
[Figure: Metric MDS configuration, Dimension-1 against Dimension-2; Stress 23%]
[Figure: Non-Metric 2-D MDS configuration, Dimension-1 against Dimension-2; Stress 16%]
[Figure: Non-Metric 3-D MDS configuration, Dimension-1, Dimension-2 and Dimension-3; Stress 8%]
Points in each configuration are sperm whale groups 3, 21, 23, 24, 37, 39, 40, 41, 43, 44, 45, 46, 48 and 62.
Principal coordinates: 13/14 eigenvalues negative, not a good representation
![Page 25: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/25.jpg)
Multidimensional Scaling
• How many dimensions?
– STRESS <10% is “good representation”
– Scree diagram
– two (or three) dimensions for visual ease
• Metric or non-metric?
– Metric has few advantages over Principal Coordinates Analysis (unless many negative eigenvalues)
– Non-metric does better with fewer dimensions
![Page 26: Principal Coordinate Analysis, Correspondence Analysis and Multidimensional Scaling: Multivariate Analysis of Association Matrices BIOL4062/5062 Hal Whitehead](https://reader035.vdocument.in/reader035/viewer/2022081501/56649e9e5503460f94b9f757/html5/thumbnails/26.jpg)
Non-metric Multidimensional Scaling vs. Principal Coordinates Analysis
                            Principal Coordinates   MDSCAL
Scaling:                    Metric                  Non-metric
Input:                      Distance matrix         Association matrix
Matrix:                     Pos. Semi-Def.          -
Solution:                   Unique                  Iterative
Max. Units:                 100's                   25-100
Dimensions:                 More                    Less
Choose no. of dimensions:   Afterwards              Before