Post on 27-Jan-2015
Some Multivariate Methods Used by Ecologists
Peter Chapman Wokingham U3A
4 October 2012
Contents
• Assumptions
• Introduction
• Notation
• Refresher on matrices
• Eigen vectors and eigen values
• Principal component analysis
• Principal coordinate analysis
• Correspondence analysis
• Redundancy analysis and canonical correspondence analysis
• R Software
Assumptions
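I am assuming that you are familiar with the following:
A sum of several numbers: $\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \dots + x_n$, $i = 1$ to $n$
The mean of several numbers: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Variance: $\mathrm{Var}(x) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$, or $\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Covariance: $\mathrm{Cov}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$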
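The assumed quantities can be sketched in a few lines of Python. This is only an illustration: the function names and the small data vectors are made up for the example and are not taken from the talk.

```python
def mean(values):
    """Arithmetic mean: (1/n) * sum of x_i."""
    return sum(values) / len(values)


def variance(values, sample=True):
    """Variance with divisor n - 1 (sample=True) or n (sample=False)."""
    m = mean(values)
    sum_sq = sum((v - m) ** 2 for v in values)
    divisor = len(values) - 1 if sample else len(values)
    return sum_sq / divisor


def covariance(xs, ys):
    """Sample covariance with divisor n - 1."""
    mx, my = mean(xs), mean(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cross / (len(xs) - 1)


# Illustrative data only (not from the slides).
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]
print(mean(x), variance(x), variance(x, sample=False), covariance(x, y))
```

Both divisors are shown because the slide lists both forms of the variance; the later principal component example uses the n - 1 form.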
Introduction
Before 1980 multivariate research tended to be theoretical. Often this involved working on distribution theory for hypothetical but unrealistic situations. Very rarely did anyone carry out a multivariate analysis. Since 1980 or so, with the growth of computing power, people have started using multivariate methods. This has led to further development of the old methods, plus the introduction of a lot of new methods. Starting in the mid-1980s, a lot of biologically trained people who were also computer literate started appearing in the workplace. These people, who rarely had any formal training in mathematics or statistics, were able to run multivariate software and get results. This led to a lot of nonsense, including some published nonsense.
[Figure: Typical Example of Ecology Data. Two tables share the same rows (sites 1 to 77). The Species table holds counts (columns 1 to 7). The Environmental table holds real numbers (columns 1 to 7, e.g. soil, climate, etc.).]
In pesticide research “site” could refer to different chemicals, or different rates of the same chemical
Example data sets: Although this is a talk about methods used in ecology, I shall be using data from other sources. This is because it is easier to understand what is going on if the data are more familiar, and also because during my search of the web I found it quite difficult to find suitable ecological examples. I have also tended to use data with small numbers of dimensions.
Notation
$\mathbf{Y}_{n \times p}$ is a matrix. I will call it an n by p matrix, which means n rows and p columns.
It is always in blue, and always a capital if p > 1, using the matrix "style" in MathType, with dimensions in subscripts.
$[y_{ij}]$ also represents the matrix, showing the generic cell member, with subscripts but not dimensions (not used very much).
\[ \begin{bmatrix} y_{11} & y_{12} & y_{13} & \cdots & y_{1p} \\ y_{21} & y_{22} & y_{23} & \cdots & y_{2p} \\ y_{31} & y_{32} & y_{33} & \cdots & y_{3p} \\ \vdots & \vdots & \vdots & & \vdots \\ y_{n1} & y_{n2} & y_{n3} & \cdots & y_{np} \end{bmatrix} \]
is another way of representing a matrix.
If a matrix has only one row or only one column I will use lower case blue, e.g. $\mathbf{u}_{n \times 1}$.
Refresher on Matrices
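Generic Matrix
\[ \mathbf{Y}_{n \times p} = [y_{ij}] = \begin{bmatrix} y_{11} & y_{12} & y_{13} & \cdots & y_{1p} \\ y_{21} & y_{22} & y_{23} & \cdots & y_{2p} \\ y_{31} & y_{32} & y_{33} & \cdots & y_{3p} \\ \vdots & \vdots & \vdots & & \vdots \\ y_{n1} & y_{n2} & y_{n3} & \cdots & y_{np} \end{bmatrix} \]
Square Matrix
\[ \mathbf{Y}_{3 \times 3} = [y_{ij}] = \begin{bmatrix} y_{11} & y_{12} & y_{13} \\ y_{21} & y_{22} & y_{23} \\ y_{31} & y_{32} & y_{33} \end{bmatrix} \]
Row Matrix
\[ \mathbf{Y}_{1 \times p} = \begin{bmatrix} y_{11} & y_{12} & y_{13} & \cdots & y_{1p} \end{bmatrix} \]
Column Matrix (Vector)
\[ \mathbf{Y}_{n \times 1} = \begin{bmatrix} y_{11} \\ y_{21} \\ y_{31} \\ \vdots \\ y_{n1} \end{bmatrix} \]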
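Matrix Multiplication
\[ \mathbf{Q}_{3 \times 3} = \mathbf{B}_{3 \times 2}\,\mathbf{C}_{2 \times 3} \]
\[ \begin{bmatrix} q_{11} & q_{12} & q_{13} \\ q_{21} & q_{22} & q_{23} \\ q_{31} & q_{32} & q_{33} \end{bmatrix} = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix} \begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \end{bmatrix} \]
$q_{11} = b_{11}c_{11} + b_{12}c_{21}$
$q_{12} = b_{11}c_{12} + b_{12}c_{22}$
$q_{13} = b_{11}c_{13} + b_{12}c_{23}$
...
$q_{33} = b_{31}c_{13} + b_{32}c_{23}$
Matrix Addition
\[ \mathbf{Q}_{3 \times 2} = \mathbf{B}_{3 \times 2} + \mathbf{C}_{3 \times 2} \]
\[ \begin{bmatrix} q_{11} & q_{12} \\ q_{21} & q_{22} \\ q_{31} & q_{32} \end{bmatrix} = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix} + \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \\ c_{31} & c_{32} \end{bmatrix} = \begin{bmatrix} b_{11}+c_{11} & b_{12}+c_{12} \\ b_{21}+c_{21} & b_{22}+c_{22} \\ b_{31}+c_{31} & b_{32}+c_{32} \end{bmatrix} \]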
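Diagonal (Square) Matrix
\[ \mathbf{B}_{5 \times 5} = [b_{ij}] = \begin{bmatrix} a_{11} & 0 & 0 & 0 & 0 \\ 0 & a_{22} & 0 & 0 & 0 \\ 0 & 0 & a_{33} & 0 & 0 \\ 0 & 0 & 0 & a_{44} & 0 \\ 0 & 0 & 0 & 0 & a_{55} \end{bmatrix} \]
Identity Matrix
\[ \mathbf{I}_{5 \times 5} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} \]
\[ \mathbf{I}_{n \times n}\mathbf{C}_{n \times n} = \mathbf{C}_{n \times n}\mathbf{I}_{n \times n} \]
(both products equal $\mathbf{C}_{n \times n}$)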
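Norm: Normalisation
\[ \mathbf{b}_{n \times 1} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{bmatrix}, \qquad \text{e.g. } \mathbf{b} = \begin{bmatrix} 3 \\ 4 \end{bmatrix} \]
Length or Norm: $\|\mathbf{b}\| = \sqrt{b_1^2 + b_2^2 + \dots + b_n^2}$; for the example, $\|\mathbf{b}\| = \sqrt{3^2 + 4^2} = 5$
Normalised: $\mathbf{b} / \|\mathbf{b}\|$; for the example, $\mathbf{b}/5 = (3/5,\ 4/5)'$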
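The length and normalisation of the example vector (3, 4) can be checked with a short sketch. The helper names `norm` and `normalise` are chosen for this illustration only.

```python
import math


def norm(b):
    """Length of a vector: square root of the sum of squared elements."""
    return math.sqrt(sum(v * v for v in b))


def normalise(b):
    """Divide every element by the length, giving a vector of length 1."""
    length = norm(b)
    return [v / length for v in b]


b = [3.0, 4.0]
unit = normalise(b)
print(norm(b), unit, norm(unit))
```

A normalised vector has length 1, which is the property required of each eigenvector later in the talk.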
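Transpose of a Matrix
\[ \mathbf{Y}_{5 \times 4} = \begin{bmatrix} y_{11} & y_{12} & y_{13} & y_{14} \\ y_{21} & y_{22} & y_{23} & y_{24} \\ y_{31} & y_{32} & y_{33} & y_{34} \\ y_{41} & y_{42} & y_{43} & y_{44} \\ y_{51} & y_{52} & y_{53} & y_{54} \end{bmatrix} \]
The transpose of $\mathbf{Y}_{5 \times 4}$ is
\[ \mathbf{Y}'_{4 \times 5} = \begin{bmatrix} y_{11} & y_{21} & y_{31} & y_{41} & y_{51} \\ y_{12} & y_{22} & y_{32} & y_{42} & y_{52} \\ y_{13} & y_{23} & y_{33} & y_{43} & y_{53} \\ y_{14} & y_{24} & y_{34} & y_{44} & y_{54} \end{bmatrix} \]
\[ (\mathbf{A}_{n \times p}\mathbf{B}_{p \times q})' = \mathbf{B}'_{q \times p}\mathbf{A}'_{p \times n} \]
\[ (\mathbf{A}_{n \times p}\mathbf{B}_{p \times q}\mathbf{C}_{q \times m})' = \mathbf{C}'_{m \times q}\mathbf{B}'_{q \times p}\mathbf{A}'_{p \times n} \]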
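Scalar Product
\[ \mathbf{b}'_{1 \times n}\mathbf{c}_{n \times 1} = \begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ \vdots \\ c_n \end{bmatrix} = b_1c_1 + b_2c_2 + b_3c_3 + \dots + b_nc_n \]
\[ \mathbf{b}'\mathbf{c} = (\text{length of } \mathbf{b}) \times (\text{length of } \mathbf{c}) \times \cos\theta \]
If b and c are orthogonal then $\cos\theta = \cos 90^\circ = 0$, so $\mathbf{b}'_{1 \times n}\mathbf{c}_{n \times 1} = 0$.
\[ \mathbf{b}'_{1 \times n}\mathbf{b}_{n \times 1} = b_1b_1 + b_2b_2 + b_3b_3 + \dots + b_nb_n = (\text{length of } \mathbf{b}) \times (\text{length of } \mathbf{b}) \times \cos 0 = (\text{length of } \mathbf{b})^2 \]
This equals 1 if b is normalised.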
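Determinant
\[ |\mathbf{B}| = \begin{vmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{vmatrix} = b_{11}b_{22} - b_{12}b_{21} \]
\[ |\mathbf{B}| = \begin{vmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{vmatrix} = (-1)^{1+1} b_{11} \begin{vmatrix} b_{22} & b_{23} \\ b_{32} & b_{33} \end{vmatrix} + (-1)^{1+2} b_{12} \begin{vmatrix} b_{21} & b_{23} \\ b_{31} & b_{33} \end{vmatrix} + (-1)^{1+3} b_{13} \begin{vmatrix} b_{21} & b_{22} \\ b_{31} & b_{32} \end{vmatrix} \]
The determinant is a scalar.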
Rank of a Square Matrix
\[ \begin{bmatrix} -1 & -1 & 1 \\ 3 & 0 & -2 \\ 4 & 1 & -3 \end{bmatrix} \]
(-2*col 1) = col 2 + (3*col 3) and row 1 = row 2 – row 3. Only two linearly independent rows, so rank = 2.
\[ \begin{bmatrix} -2 & 1 & 4 \\ -2 & 1 & 4 \\ -2 & 1 & 4 \end{bmatrix} \]
(-2*col 1) = (4*col 2) = col 3 and row 1 = row 2 = row 3. Rank = 1.
The order of a square matrix is its number of rows/columns.
The rank of a square matrix is the number of linearly independent rows/columns.
A square matrix whose rank is less than its order has a determinant of zero.
If a square matrix has a non-zero determinant it has full rank = number of rows or columns.
A full rank square matrix is called non-singular.
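The link between rank and determinant can be checked numerically. The sketch below expands a 3 by 3 determinant along its first row and applies it to the two rank-deficient examples. The minus signs in the example matrices are an assumption: they are chosen so that the row and column relations stated on the slide hold. The function names are illustrative.

```python
def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along row 1."""
    total = 0.0
    for j in range(3):
        # Minor: delete row 1 and column j + 1.
        minor = [[m[i][k] for k in range(3) if k != j] for i in (1, 2)]
        total += (-1) ** j * m[0][j] * det2(minor)
    return total


# Signs assumed so that row 1 = row 2 - row 3 (rank 2 example).
rank_two = [[-1, -1, 1], [3, 0, -2], [4, 1, -3]]
# Signs assumed so that (-2 * col 1) = (4 * col 2) = col 3 (rank 1 example).
rank_one = [[-2, 1, 4], [-2, 1, 4], [-2, 1, 4]]
full_rank = [[2, 0, 0], [0, 3, 0], [0, 0, 4]]

print(det3(rank_two), det3(rank_one), det3(full_rank))
```

Both rank-deficient matrices give a determinant of zero, while the diagonal matrix, which has full rank, does not.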
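Inverse of a Square Matrix
If $\mathbf{B}$ is non-singular then $\mathbf{B}^{-1}\mathbf{B} = \mathbf{B}\mathbf{B}^{-1} = \mathbf{I}$.
$\mathbf{B}^{-1}$ is called the inverse of $\mathbf{B}$.
A matrix that is not square can have a one-sided inverse:
\[ \mathbf{B}_{3 \times 2} = \begin{bmatrix} 1 & 1 \\ -1 & 0 \\ 3 & -1 \end{bmatrix} \]
\[ \mathbf{C}_{2 \times 3} = \begin{bmatrix} 1 & 3 & 1 \\ 2 & 5 & 1 \end{bmatrix}: \quad \mathbf{C}_{2 \times 3}\mathbf{B}_{3 \times 2} = \mathbf{I}_{2 \times 2}, \quad \mathbf{B}_{3 \times 2}\mathbf{C}_{2 \times 3} \neq \mathbf{I}_{3 \times 3} \]
\[ \mathbf{C}_{2 \times 3} = \begin{bmatrix} 4 & 15 & 4 \\ 7 & 25 & 6 \end{bmatrix}: \quad \mathbf{C}_{2 \times 3}\mathbf{B}_{3 \times 2} = \mathbf{I}_{2 \times 2}, \quad \mathbf{B}_{3 \times 2}\mathbf{C}_{2 \times 3} \neq \mathbf{I}_{3 \times 3} \]
C is not unique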
http://www.mathwords.com/i/inverse_of_a_matrix.htm http://mathworld.wolfram.com/MatrixInverse.html http://www.purplemath.com/modules/mtrxinvr.htm
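Association Matrices
Association Matrices Q-Mode
\[ \mathbf{Y}_{n \times p} = \begin{bmatrix} y_{11} & \cdots & y_{1p} \\ \vdots & & \vdots \\ y_{n1} & \cdots & y_{np} \end{bmatrix} \]
Q-mode compares pairs of rows (objects):
Row 1: $[\,y_{11}\ y_{12}\ y_{13}\ \dots\ y_{1p}\,]$
Row 2: $[\,y_{21}\ y_{22}\ y_{23}\ \dots\ y_{2p}\,]$
Euclidean Distance:
\[ a_{12} = a_{21} = \sqrt{(y_{11} - y_{21})^2 + (y_{12} - y_{22})^2 + (y_{13} - y_{23})^2 + \dots + (y_{1p} - y_{2p})^2} \]
Correlation:
\[ a_{12} = a_{21} = \frac{\sum_{j=1}^{p}(y_{1j} - \bar{y}_{1.})(y_{2j} - \bar{y}_{2.})}{\sqrt{\sum_{j=1}^{p}(y_{1j} - \bar{y}_{1.})^2 \sum_{j=1}^{p}(y_{2j} - \bar{y}_{2.})^2}} \]
The result is the n by n association matrix
\[ \mathbf{A}_{n \times n} = [a_{ij}] = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix} \]
Association Matrices R-Mode
R-mode compares pairs of columns (descriptors) of $\mathbf{Y}_{n \times p}$:
Column 1: $[\,y_{11}\ y_{21}\ y_{31}\ \dots\ y_{n1}\,]'$
Column 2: $[\,y_{12}\ y_{22}\ y_{32}\ \dots\ y_{n2}\,]'$
Compute $a_{12} = a_{21}$ for every pair of columns. The result is the p by p association matrix
\[ \mathbf{A}_{p \times p} = [a_{ij}] = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1p} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2p} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3p} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{p1} & a_{p2} & a_{p3} & \cdots & a_{pp} \end{bmatrix} \]
Summary: Q and R Mode
Original data matrix: $\mathbf{Y}_{n \times p}$, with rows as objects (e.g. sites) and columns as descriptors (e.g. species)
Q mode association matrix: $\mathbf{A}_{n \times n}$ (between objects)
R mode association matrix: $\mathbf{A}_{p \times p}$ (between descriptors)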
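The difference between the two modes is easy to see in code. The sketch below builds a Q-mode matrix of Euclidean distances between rows and an R-mode variance-covariance matrix between columns. The four rows are the first four cities of the crime data used later in the talk, and the function names are illustrative.

```python
import math

# Rows are objects (cities), columns are descriptors (Burglary, Violence, Robbery).
Y = [
    [3.0, 5.7, 0.4],
    [7.2, 11.6, 4.1],
    [4.5, 7.6, 1.5],
    [9.8, 10.4, 2.9],
]


def q_mode_distances(data):
    """n x n matrix of Euclidean distances between rows (objects)."""
    return [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2))) for r2 in data]
        for r1 in data
    ]


def r_mode_covariances(data):
    """p x p variance-covariance matrix between columns (descriptors)."""
    n = len(data)
    columns = [list(col) for col in zip(*data)]
    means = [sum(col) / n for col in columns]
    centred = [[v - m for v in col] for col, m in zip(columns, means)]
    return [
        [sum(a * b for a, b in zip(c1, c2)) / (n - 1) for c2 in centred]
        for c1 in centred
    ]


Q = q_mode_distances(Y)    # 4 x 4, one row and column per city
R = r_mode_covariances(Y)  # 3 x 3, one row and column per descriptor
print(len(Q), len(Q[0]), len(R), len(R[0]))
```

Both matrices are symmetric, which is what makes the eigen analysis in the next section applicable to them.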
Eigen Values and Eigen Vectors
Ecological data sets usually include a large number of variables that are associated to one
another (e.g. linearly correlated). The basic idea underlying several methods of data analysis is
to reduce this large number of inter-correlated variables to a smaller number of composite, but
linearly independent variables, each explaining a different fraction of the observed variation.
One of the main goals of numerical data analysis is to generate a small number of variables,
each explaining a large proportion of the variation, and to ascertain that these new variables
explain different aspects of the phenomena under study.
Eigen analysis is a key tool in helping us achieve this aim.
Eigen Values and Eigenvectors
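For any symmetric square matrix $\mathbf{A}_{n \times n}$ (such as an association matrix) the following relationship always exists:
\[ \mathbf{A}_{n \times n} = \mathbf{U}_{n \times n}\boldsymbol{\Lambda}_{n \times n}\mathbf{U}^{-1}_{n \times n} = \mathbf{U}_{n \times n}\boldsymbol{\Lambda}_{n \times n}\mathbf{U}'_{n \times n} \]
where the columns of $\mathbf{U}_{n \times n}$ are orthonormal.
\[ \mathbf{A}_{n \times n} = [a_{ij}] = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots \\ a_{21} & a_{22} & a_{23} & \cdots \\ a_{31} & a_{32} & a_{33} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \]
The square matrix being analysed.
\[ \boldsymbol{\Lambda}_{n \times n} = \begin{bmatrix} \lambda_1 & 0 & 0 & \cdots \\ 0 & \lambda_2 & 0 & \cdots \\ 0 & 0 & \lambda_3 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \]
Eigenvalues. Some may be zero. Some may be equal. They arise as Lagrange multipliers.
\[ \mathbf{U}_{n \times n} = [u_{ij}] = \begin{bmatrix} u_{11} & u_{12} & u_{13} & \cdots & u_{1n} \\ u_{21} & u_{22} & u_{23} & \cdots & u_{2n} \\ u_{31} & u_{32} & u_{33} & \cdots & u_{3n} \\ \vdots & \vdots & \vdots & & \vdots \\ u_{n1} & u_{n2} & u_{n3} & \cdots & u_{nn} \end{bmatrix} \]
Columns are eigenvectors. Columns are orthonormal.
Matrix $\boldsymbol{\Lambda}_{n \times n}$ is known as the canonical form of $\mathbf{A}_{n \times n}$.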
Eigen Values and Eigenvectors
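We compute the eigenvalues $\lambda_i$ of a square matrix $\mathbf{A}_{n \times n}$ by solving n equations:
\[ \mathbf{A}\mathbf{u}_i = \lambda_i\mathbf{u}_i, \qquad i = 1 \text{ to } n \]
\[ \mathbf{A}\mathbf{u}_i - \lambda_i\mathbf{u}_i = \mathbf{0} \]
\[ (\mathbf{A} - \lambda_i\mathbf{I})\,\mathbf{u}_i = \mathbf{0} \]
Here $(\mathbf{A} - \lambda_i\mathbf{I})$ is an n*n matrix, $\mathbf{u}_i$ is an n*1 vector and $\mathbf{0}$ is an n*1 vector.
The $\lambda_i$ are found by solving $|\mathbf{A} - \lambda_i\mathbf{I}| = 0$, a polynomial of degree n (the characteristic equation).
The $\mathbf{u}_i$ are then easily found.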
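For a 2 by 2 symmetric matrix the characteristic equation is a quadratic, so the whole procedure fits in a few lines. The sketch below is an illustration under that restriction, not a general eigen solver; the function name `eigen_sym_2x2` and the test matrix are made up for the example.

```python
import math


def eigen_sym_2x2(a):
    """Eigenvalues (largest first) and unit eigenvectors of a symmetric 2x2 matrix.

    Solves |A - lambda*I| = lambda^2 - trace*lambda + det = 0 directly.
    """
    (a11, a12), (a21, a22) = a
    if abs(a12 - a21) > 1e-12:
        raise ValueError("matrix must be symmetric")
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    root = math.sqrt(trace * trace - 4.0 * det)
    eigenvalues = [(trace + root) / 2.0, (trace - root) / 2.0]
    if abs(a12) < 1e-12:
        # Already diagonal: the eigenvectors are the coordinate axes.
        if a11 >= a22:
            return eigenvalues, [[1.0, 0.0], [0.0, 1.0]]
        return eigenvalues, [[0.0, 1.0], [1.0, 0.0]]
    vectors = []
    for lam in eigenvalues:
        # First row of (A - lam*I) u = 0 is satisfied by u = (a12, lam - a11).
        u = [a12, lam - a11]
        length = math.hypot(u[0], u[1])
        vectors.append([u[0] / length, u[1] / length])
    return eigenvalues, vectors


A = [[2.0, 1.0], [1.0, 2.0]]
eigenvalues, vectors = eigen_sym_2x2(A)
for lam, u in zip(eigenvalues, vectors):
    Au = [A[0][0] * u[0] + A[0][1] * u[1], A[1][0] * u[0] + A[1][1] * u[1]]
    print(lam, u, Au)
```

Each printed `Au` equals the eigenvalue times `u`, and the two eigenvectors are orthogonal. For larger matrices a library routine would normally be used instead of solving the polynomial by hand.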
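Singular Value Decomposition (SVD)
Any matrix $\mathbf{Y}_{n \times p}$ can be factorised as follows:
\[ \mathbf{Y}_{n \times p} = \mathbf{V}_{n \times p}\mathbf{W}_{p \times p}\mathbf{U}'_{p \times p} \]
or
\[ \mathbf{Y}_{n \times p} = \mathbf{V}_{n \times n}\mathbf{W}_{n \times p}\mathbf{U}'_{p \times p} \]
(Lack of consistency in literature)
Columns of $\mathbf{V}_{n \times n}$ are the left singular vectors of $\mathbf{Y}_{n \times p}$.
$\mathbf{W}$ is a diagonal matrix containing the (non-negative) singular values.
Columns of $\mathbf{U}_{p \times p}$ are the right singular vectors of $\mathbf{Y}_{n \times p}$.
SVD can be applied to any m × n matrix.
Eigenvalue decomposition can only be applied to certain classes of square matrices.
Nevertheless, the two decompositions are related.
Given an SVD of $\mathbf{Y}$, as described above, the following holds (transpose of SVD times SVD):
\[ \mathbf{Y}'_{p \times n}\mathbf{Y}_{n \times p} = (\mathbf{U}_{p \times p}\mathbf{W}'_{p \times n}\mathbf{V}'_{n \times n})(\mathbf{V}_{n \times n}\mathbf{W}_{n \times p}\mathbf{U}'_{p \times p}) = \mathbf{U}_{p \times p}\mathbf{W}'_{p \times n}\mathbf{W}_{n \times p}\mathbf{U}'_{p \times p} = \mathbf{U}_{p \times p}\boldsymbol{\Lambda}_{p \times p}\mathbf{U}'_{p \times p} \]
also
\[ \mathbf{Y}_{n \times p}\mathbf{Y}'_{p \times n} = \mathbf{V}_{n \times n}(\mathbf{W}_{n \times p}\mathbf{W}'_{p \times n})\mathbf{V}'_{n \times n} = \mathbf{V}_{n \times n}\boldsymbol{\Lambda}_{n \times n}\mathbf{V}'_{n \times n} \]
In both cases the outer matrices hold the eigenvectors and $\boldsymbol{\Lambda}$ holds the eigenvalues.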
Principal Component Analysis
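Karl Pearson, 1901; Harold Hotelling, 1933, 1936
We have a matrix $\mathbf{Y}_{n \times p}$ with variance-covariance matrix $\mathbf{S}_{p \times p}$.
We transform $\mathbf{Y}_{n \times p}$ as follows:
\[ \mathbf{Y}_{n \times p} \rightarrow \mathbf{Z}_{n \times p} = \mathbf{Y}_{n \times p}\mathbf{U}_{p \times p} \]
Now consider the first column of $\mathbf{Z}_{n \times p}$:
\[ \mathbf{z}_{n \times 1} = \mathbf{Y}_{n \times p}\mathbf{u}_{p \times 1} \]
or
\[ \begin{bmatrix} z_{11} \\ z_{21} \\ \vdots \\ z_{n1} \end{bmatrix} = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1p} \\ y_{21} & y_{22} & \cdots & y_{2p} \\ \vdots & \vdots & & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{np} \end{bmatrix} \begin{bmatrix} u_{11} \\ u_{21} \\ \vdots \\ u_{p1} \end{bmatrix} = \begin{bmatrix} y_{11}u_{11} + y_{12}u_{21} + \dots + y_{1p}u_{p1} \\ y_{21}u_{11} + y_{22}u_{21} + \dots + y_{2p}u_{p1} \\ \vdots \\ y_{n1}u_{11} + y_{n2}u_{21} + \dots + y_{np}u_{p1} \end{bmatrix} \]
\[ \mathrm{Var}(\mathbf{z}_{n \times 1}) = \frac{1}{n-1}\sum_{i=1}^{n}(z_{i1} - \bar{z}_{.1})^2 \]
\[ = \frac{1}{n-1}\sum_{i=1}^{n}\left(y_{i1}u_{11} + y_{i2}u_{21} + \dots + y_{ip}u_{p1} - (\bar{y}_{.1}u_{11} + \bar{y}_{.2}u_{21} + \dots + \bar{y}_{.p}u_{p1})\right)^2 \]
\[ = \frac{1}{n-1}\sum_{i=1}^{n}\left((y_{i1} - \bar{y}_{.1})u_{11} + (y_{i2} - \bar{y}_{.2})u_{21} + \dots + (y_{ip} - \bar{y}_{.p})u_{p1}\right)^2 \]
\[ = \frac{1}{n-1}\sum_{i=1}^{n}\left((y_{i1} - \bar{y}_{.1})^2u_{11}^2 + 2(y_{i1} - \bar{y}_{.1})(y_{i2} - \bar{y}_{.2})u_{11}u_{21} + \dots\right) \]
and so on, giving
\[ \mathrm{Var}(\mathbf{z}_{n \times 1}) = \mathbf{u}'_{1 \times p}\mathbf{S}_{p \times p}\mathbf{u}_{p \times 1} \]
Now we need to find $\mathbf{u}_{p \times 1}$ that maximises $\mathrm{var}[\mathbf{z}]$ subject to $\mathbf{u}'_{1 \times p}\mathbf{u}_{p \times 1} = 1$.
Let $\lambda_1$ be a Lagrange multiplier.
Then maximise
\[ \mathbf{u}'_{1 \times p}\mathbf{S}_{p \times p}\mathbf{u}_{p \times 1} - \lambda_1(\mathbf{u}'_{1 \times p}\mathbf{u}_{p \times 1} - 1) \]
Setting the derivative with respect to $\mathbf{u}$ to zero gives
\[ \mathbf{S}_{p \times p}\mathbf{u}_{p \times 1} - \lambda_1\mathbf{u}_{p \times 1} = \mathbf{0} \qquad (1) \]
\[ (\mathbf{S}_{p \times p} - \lambda_1\mathbf{I}_{p \times p})\,\mathbf{u}_{p \times 1} = \mathbf{0} \qquad (2) \]
We find $\lambda_1$ first by solving the polynomial $|\mathbf{S}_{p \times p} - \lambda_1\mathbf{I}_{p \times p}| = 0$ (degree p).
We then substitute $\lambda_1$ into (2) to find $\mathbf{u}_{p \times 1}$.
From (1) we can show that $\mathrm{var}[\mathbf{z}] = \mathbf{u}'_{1 \times p}\mathbf{S}_{p \times p}\mathbf{u}_{p \times 1} = \lambda_1$, subject to $\mathbf{u}'_{1 \times p}\mathbf{u}_{p \times 1} = 1$.
$\lambda_1$ is the first eigenvalue of $\mathbf{S}_{p \times p}$ and $\mathbf{u}_{p \times 1}$ is the first eigenvector.
Computing the 2nd, 3rd, and higher eigenvalues and eigenvectors is an identical process.
Additional constraints are needed.
Notably $\mathbf{u}'_{1 \times p}\mathbf{v}_{p \times 1} = 0$, equivalent to $\mathrm{Cov}(\mathbf{u}_{p \times 1}, \mathbf{v}_{p \times 1}) = 0$,
where $\mathbf{u}_{p \times 1}$ and $\mathbf{v}_{p \times 1}$ are different eigenvectors.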
Burglary Violence Robbery
Bath 3 5.7 0.4
Birmingham 7.2 11.6 4.1
Brighton 4.5 7.6 1.5
Bristol 9.8 10.4 2.9
Cambridge 7.6 7.6 1.6
Canterbury 2.7 6.3 0.5
Cardiff 6 8.6 1.1
Coventry 7.4 11.2 2.2
Lancaster 2.5 7.7 0.4
Leeds 12 7.5 1.7
Leicester 8.6 11.7 2.6
Liverpool 9.1 7.7 2.3
Manchester 13.2 11.5 5.1
Newcastle 5.2 8.6 0.9
Nottingham 11.7 15.1 4.3
Oxford 5.2 8.3 1.7
Sheffield 6.9 7.5 1.2
Southampton 4.9 14.6 1.8
Swansea 4.7 7.3 0.4
York 4.4 6.7 0.5
Bournemouth 5.2 12.9 0.9
Plymouth 4 10.7 0.7
Norwich 3.9 8.9 1.1
Lincoln 6 9.8 0.9
Rows are objects (cities); columns are descriptors (Burglary, Violence, Robbery).
Source: http://www.thecompleteuniversityguide.co.uk/preparing-to-go/staying-safe-and-secure/how-safe-is-your-city/
[Figure: scatter-plot matrix of Burglary, Violence and Robbery]
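Let’s start with two variables: Burglary and Violence.
Centre the data by subtracting the column means (6.5 and 9.4):
\[ \mathbf{Y}_{24 \times 2} - \bar{\mathbf{Y}}_{24 \times 2} = \begin{bmatrix} 3.0 & 5.7 \\ 7.2 & 11.6 \\ 4.5 & 7.6 \\ \vdots & \vdots \\ 6.0 & 9.8 \end{bmatrix} - \begin{bmatrix} 6.5 & 9.4 \\ 6.5 & 9.4 \\ 6.5 & 9.4 \\ \vdots & \vdots \\ 6.5 & 9.4 \end{bmatrix} = \begin{bmatrix} -3.49 & -3.70 \\ 0.71 & 2.20 \\ -1.99 & -1.80 \\ \vdots & \vdots \\ -0.49 & 0.40 \end{bmatrix} \]
R-Mode: Variance-covariance matrix
\[ \mathbf{S}_{2 \times 2} = \frac{1}{n-1}(\mathbf{Y}_{24 \times 2} - \bar{\mathbf{Y}}_{24 \times 2})'(\mathbf{Y}_{24 \times 2} - \bar{\mathbf{Y}}_{24 \times 2}) = \begin{bmatrix} 8.799 & 3.036 \\ 3.036 & 6.505 \end{bmatrix} \]
\[ |\mathbf{S}_{2 \times 2} - \lambda_k\mathbf{I}_{2 \times 2}| = \begin{vmatrix} 8.799 - \lambda_k & 3.036 \\ 3.036 & 6.505 - \lambda_k \end{vmatrix} = 0, \qquad k = 1 \text{ or } 2 \]
Eigen Values: $\lambda_1 = 10.898$, $\lambda_2 = 4.407$
Eigen Vectors:
\[ \mathbf{U}_{2 \times 2} = \begin{bmatrix} -0.8226264 & 0.5685823 \\ -0.5685823 & -0.8226264 \end{bmatrix} \]
\[ \mathbf{Z}_{24 \times 2} = (\mathbf{Y}_{24 \times 2} - \bar{\mathbf{Y}}_{24 \times 2})\,\mathbf{U}_{2 \times 2} = \begin{bmatrix} -3.49 & -3.70 \\ 0.71 & 2.20 \\ -1.99 & -1.80 \\ \vdots & \vdots \\ -0.49 & 0.40 \end{bmatrix} \begin{bmatrix} -0.8226264 & 0.5685823 \\ -0.5685823 & -0.8226264 \end{bmatrix} \]
$z_1 = -0.8226264 \times \text{Burglary} - 0.5685823 \times \text{Violence}$ (centred variables)
Accounting for 10.898*100/(10.898+4.407) = 71.2 % of variance
$z_2 = 0.5685823 \times \text{Burglary} - 0.8226264 \times \text{Violence}$ (centred variables)
Accounting for 28.8% of Variance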
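The two-variable calculation can be reproduced from the Burglary and Violence columns of the crime table. The sketch below is a minimal illustration in plain Python: it centres the data, forms the variance-covariance matrix with divisor n - 1, and solves the 2 by 2 characteristic equation directly. The variable and function names are made up for the example. An eigenvector is only defined up to an overall sign, so the signs printed here may be the reverse of those on the slide.

```python
import math

burglary = [3.0, 7.2, 4.5, 9.8, 7.6, 2.7, 6.0, 7.4, 2.5, 12.0, 8.6, 9.1,
            13.2, 5.2, 11.7, 5.2, 6.9, 4.9, 4.7, 4.4, 5.2, 4.0, 3.9, 6.0]
violence = [5.7, 11.6, 7.6, 10.4, 7.6, 6.3, 8.6, 11.2, 7.7, 7.5, 11.7, 7.7,
            11.5, 8.6, 15.1, 8.3, 7.5, 14.6, 7.3, 6.7, 12.9, 10.7, 8.9, 9.8]


def centre(values):
    """Subtract the mean from every value."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def cov_centred(a, b):
    """Covariance of two already-centred variables (divisor n - 1)."""
    return sum(x * y for x, y in zip(a, b)) / (len(a) - 1)


b_c, v_c = centre(burglary), centre(violence)
s11 = cov_centred(b_c, b_c)
s12 = cov_centred(b_c, v_c)
s22 = cov_centred(v_c, v_c)

# Characteristic equation of the 2x2 matrix S: lambda^2 - trace*lambda + det = 0.
trace = s11 + s22
det = s11 * s22 - s12 ** 2
root = math.sqrt(trace ** 2 - 4.0 * det)
lam1, lam2 = (trace + root) / 2.0, (trace - root) / 2.0

# First eigenvector from (S - lam1*I) u = 0, scaled to length 1.
u1 = [s12, lam1 - s11]
length = math.hypot(u1[0], u1[1])
u1 = [u1[0] / length, u1[1] / length]

# Scores on the first component and its share of the total variance.
z1 = [u1[0] * b + u1[1] * v for b, v in zip(b_c, v_c)]
share1 = 100.0 * lam1 / (lam1 + lam2)

print(round(s11, 3), round(s12, 3), round(s22, 3))
print(round(lam1, 3), round(lam2, 3), round(share1, 1))
print(u1, cov_centred(z1, z1))
```

The variance of the scores `z1` equals the first eigenvalue, which is the property derived in the Lagrange multiplier argument.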
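[Figure: Plots of 1st and 2nd Components. Violence plotted against Robbery, with the rotated axes $Z_1$ and $Z_2$ overlaid.]
For centred variables the rotation preserves the total sum of squares:
\[ \sum_{i=1}^{n}\text{Robbery}_i^2 + \sum_{i=1}^{n}\text{Violence}_i^2 = \sum_{i=1}^{n}Z_{1i}^2 + \sum_{i=1}^{n}Z_{2i}^2 \]
\[ \frac{1}{n-1}\sum_{i=1}^{n}\text{Robbery}_i^2 + \frac{1}{n-1}\sum_{i=1}^{n}\text{Violence}_i^2 = \frac{1}{n-1}\sum_{i=1}^{n}Z_{1i}^2 + \frac{1}{n-1}\sum_{i=1}^{n}Z_{2i}^2 \]
\[ \text{variance}(\text{Robbery}) + \text{variance}(\text{Violence}) = \text{variance}(Z_1) + \text{variance}(Z_2) \]
The PCA rotation maximises the variance of $Z_1$ relative to $Z_2$.
It also maximises $\lambda_1$ relative to $\lambda_2$.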
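Now three variables: Burglary, Violence and Robbery.
Centre the data by subtracting the column means (6.5, 9.4 and 1.7):
\[ \mathbf{Y}_{24 \times 3} - \bar{\mathbf{Y}}_{24 \times 3} = \begin{bmatrix} 3.0 & 5.7 & 0.4 \\ 7.2 & 11.6 & 4.1 \\ 4.5 & 7.6 & 1.5 \\ \vdots & \vdots & \vdots \\ 6.0 & 9.8 & 0.9 \end{bmatrix} - \begin{bmatrix} 6.5 & 9.4 & 1.7 \\ 6.5 & 9.4 & 1.7 \\ 6.5 & 9.4 & 1.7 \\ \vdots & \vdots & \vdots \\ 6.5 & 9.4 & 1.7 \end{bmatrix} = \begin{bmatrix} -3.49 & -3.70 & -1.3 \\ 0.71 & 2.20 & 2.4 \\ -1.99 & -1.80 & -0.2 \\ \vdots & \vdots & \vdots \\ -0.49 & 0.40 & -0.8 \end{bmatrix} \]
\[ \mathbf{S}_{3 \times 3} = \frac{1}{n-1}(\mathbf{Y}_{24 \times 3} - \bar{\mathbf{Y}}_{24 \times 3})'(\mathbf{Y}_{24 \times 3} - \bar{\mathbf{Y}}_{24 \times 3}) = \begin{bmatrix} 8.799 & 3.036 & 3.121 \\ 3.036 & 6.505 & 2.005 \\ 3.121 & 2.005 & 1.689 \end{bmatrix} \]
\[ |\mathbf{S}_{3 \times 3} - \lambda_k\mathbf{I}_{3 \times 3}| = \begin{vmatrix} 8.799 - \lambda_k & 3.036 & 3.121 \\ 3.036 & 6.505 - \lambda_k & 2.005 \\ 3.121 & 2.005 & 1.689 - \lambda_k \end{vmatrix} = 0, \qquad k = 1, 2, 3 \]
Eigen Values: $\lambda_1 = 12.205$, $\lambda_2 = 4.411$, $\lambda_3 = 0.378$
Eigen Vectors (each column is defined only up to an overall sign; the signs shown make the columns mutually orthogonal):
\[ \mathbf{U}_{3 \times 3} = \begin{bmatrix} 0.7788 & 0.5562 & 0.2900 \\ 0.5318 & -0.8307 & 0.1648 \\ 0.3325 & 0.0259 & -0.9427 \end{bmatrix} \]
\[ \mathbf{Z}_{24 \times 3} = (\mathbf{Y}_{24 \times 3} - \bar{\mathbf{Y}}_{24 \times 3})\,\mathbf{U}_{3 \times 3} \]
$z_1 = 0.7788 \times \text{Burglary} + 0.5318 \times \text{Violence} + 0.3325 \times \text{Robbery}$
Accounting for 12.205*100/(12.205+4.411+0.378) = 71.8 % of variance
$z_2 = 0.5562 \times \text{Burglary} - 0.8307 \times \text{Violence} + 0.0259 \times \text{Robbery}$
Accounting for 26.0% of Variance
$z_3 = 0.2900 \times \text{Burglary} + 0.1648 \times \text{Violence} - 0.9427 \times \text{Robbery}$
Accounting for 2.2% of Variance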
[Figure: ordination plot of the cities on $Z_1$ (horizontal axis) and $Z_2$ (vertical axis). Annotated regions: High Crime; Low Crime; High Burglary / Lower Violence; High Violence / Lower Burglary. Labelled cities: Bath, Canterbury, Lancaster, Leeds, Liverpool, Nottingham, Manchester, Bournemouth, Southampton.]
Bi-Plot
[Figure: the original 3 dimensions; the projection of 3 dimensions into 2 dimensions; the resulting ordination in 2 dimensions]
Both the direction and length of the vectors can be interpreted. So, for these data, where the vectors represent judges, and the points cars, a group of vectors pointing in the same direction correspond to a group of judges who have the same preference opinions about the automobiles
In a biplot, the length of the lines approximates the variances of the variables. The longer the line, the higher is the variance. The angle between the lines, or, to be more precise, the cosine of the angle between the lines, approximates the correlation between the variables they represent. The closer the angle is to 90, or 270 degrees, the smaller the correlation. An angle of 0 or 180 degrees reflects a correlation of 1 or −1, respectively.
Taken from The Stata Journal (2005) 5, Number 2, pp. 208–223 Data inspection using biplots. Ulrich Kohler, Wissenschaftszentrum Berlin kohler@wz-berlin.de,Magdalena Luniak, Wissenschaftszentrum Berlin luniak@wz-berlin.de
http://onlinelibrary.wiley.com/doi/10.1002/9780470238004.app2/pdf
Hardcover: 476 pages Publisher: Wiley-Blackwell (24 Dec 2010) Language: English ISBN-10: 0470012552 ISBN-13: 978-0470012550
US Agricultural Exports: 1990 to 2010
Country Value (million dollars)
1990 1995 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Canada 4,214 5,796 7,060 7,643 8,124 8,662 9,315 9,733 10,618 11,951 14,062 16,253 15,725 16,856
Mexico 2,560 3,522 5,625 6,410 7,407 7,238 7,891 8,510 9,429 10,881 12,692 15,508 12,932 14,575
Caribbean 1,015 1,281 1,493 1,408 1,400 1,518 1,590 1,850 1,913 2,114 2,575 3,592 3,082 3,192
Central America 483 871 1,100 1,121 1,234 1,252 1,338 1,429 1,589 1,832 2,363 3,106 2,553 2,923
Brazil 177 522 212 264 221 329 384 279 228 287 411 677 386 575
Colombia 120 464 441 417 453 521 513 594 680 868 1,223 1,675 907 832
Japan 8,142 11,149 8,893 9,292 8,884 8,384 8,906 8,147 7,931 8,390 10,159 13,223 11,072 11,819
Korea, South 2,650 3,742 2,449 2,546 2,588 2,673 2,886 2,489 2,233 2,851 3,528 5,561 3,917 5,308
Taiwan \3 1,663 2,591 1,945 1,996 2,010 1,966 2,025 2,065 2,301 2,477 3,097 3,419 2,988 3,190
China \3 \4 818 2,633 854 1,716 1,939 2,068 5,017 5,542 5,233 6,711 8,314 12,115 13,109 17,522
Hong Kong 702 1,503 1,209 1,264 1,228 1,091 1,114 913 872 977 1,168 1,715 2,008 2,808
India 108 194 145 210 353 274 317 257 295 365 475 489 691 755
Indonesia 275 816 531 668 907 810 996 925 958 1,102 1,542 2,195 1,796 2,246
Philippines 381 765 783 901 794 776 626 695 798 888 1,112 1,734 1,294 1,634
Thailand 275 590 409 493 570 611 684 686 675 703 885 1,063 1,046 1,152
Australia 226 339 320 317 292 339 412 410 463 520 662 826 840 928
European Union \5 7,474 8,789 6,858 6,515 6,676 6,398 6,736 6,953 7,052 7,408 8,754 10,080 7,445 8,894
USSR 2,248 1,233 839 670 1,060 695 740 1,112 1,227 1,025 1,665 2,304 1,736 1,454
Saudi Arabia 482 517 447 477 429 343 332 364 350 426 710 890 694 840
Turkey 226 516 502 658 571 675 921 944 1,062 1,030 1,496 1,696 1,499 2,112
Egypt 687 1,309 966 1,050 1,022 863 967 935 819 1,022 1,801 2,050 1,354 2,092
South Africa 81 267 173 134 99 148 149 169 146 126 291 393 162 292
Oceania 343 506 486 490 473 512 621 601 742 760 963 1,189 1,282 1,394
Eigenvalues: (1) 55418724 (2) 26490533 (3) 13560738 (4) 8083174 (5) 6362753 (6) 3097378.91 (7) 1717442 (8) 1214768 (9) 344449 (10) 137884 (11) 67213 (12) 49836 (13) 11222 (14) 6340
Principal Coordinate Analysis
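Gower, 1966
Start with a matrix $\mathbf{Y}_{n \times p}$ with n rows (objects) and p columns (descriptors).
Compute the Euclidean distance (between rows) matrix $\mathbf{D}_{n \times n} = [d_{hi}]$.
Transform $\mathbf{D}_{n \times n}$ into a new matrix $\mathbf{A}_{n \times n} = [a_{hi}]$, such that:
\[ a_{hi} = -\frac{1}{2}d_{hi}^2 \]
Then compute the centred matrix $\boldsymbol{\Delta}_{n \times n} = [\delta_{hi}]$:
\[ \delta_{hi} = a_{hi} - \bar{a}_{h.} - \bar{a}_{.i} + \bar{a}_{..} \]
Finally, scale the eigenvectors $\mathbf{u}_k$ of $\boldsymbol{\Delta}_{n \times n}$ so that $\mathbf{u}'_k\mathbf{u}_k = \lambda_k$.
If the eigenvectors are arranged as columns, the rows of the resulting table are the coordinates of the objects in the space of principal coordinates.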
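The transformation and centring steps can be sketched in plain Python using the table of road distances between six British cities from this talk. The function name `gower_centre` is made up for the illustration, and the eigen decomposition itself is left out because it needs a numerical library.

```python
cities = ["Oxford", "Newcastle", "Plymouth", "Dover", "Glasgow", "Bristol"]
D = [
    [0, 258, 187, 144, 363, 72],
    [258, 0, 408, 368, 145, 297],
    [187, 408, 0, 287, 484, 118],
    [144, 368, 287, 0, 498, 194],
    [363, 145, 484, 498, 0, 372],
    [72, 297, 118, 194, 372, 0],
]


def gower_centre(d):
    """Return delta_hi = a_hi - mean(row h) - mean(col i) + grand mean,
    where a_hi = -0.5 * d_hi ** 2."""
    n = len(d)
    a = [[-0.5 * d[h][i] ** 2 for i in range(n)] for h in range(n)]
    row_means = [sum(row) / n for row in a]
    col_means = [sum(a[h][i] for h in range(n)) / n for i in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [a[h][i] - row_means[h] - col_means[i] + grand_mean for i in range(n)]
        for h in range(n)
    ]


delta = gower_centre(D)
row_sums = [sum(row) for row in delta]
trace = sum(delta[i][i] for i in range(len(delta)))
print(row_sums)
print(trace)
```

Every row of the centred matrix sums to zero, which is why one eigenvalue is always zero. The trace is the sum of all the eigenvalues, so it gives a quick check on any eigenvalues computed from the same distances.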
Oxford Newcastle Plymouth Dover Glasgow Bristol
Oxford 0 258 187 144 363 72
Newcastle 258 0 408 368 145 297
Plymouth 187 408 0 287 484 118
Dover 144 368 287 0 498 194
Glasgow 363 145 484 498 0 372
Bristol 72 297 118 194 372 0
Distances Between British Cities
Q-Mode
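\[ \mathbf{D}_{6 \times 6} = \begin{bmatrix} 0 & 258 & 187 & 144 & 363 & 72 \\ 258 & 0 & 408 & 368 & 145 & 297 \\ 187 & 408 & 0 & 287 & 484 & 118 \\ 144 & 368 & 287 & 0 & 498 & 194 \\ 363 & 145 & 484 & 498 & 0 & 372 \\ 72 & 297 & 118 & 194 & 372 & 0 \end{bmatrix} \]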
Distances computed from Google maps. Road distances are not straight, so the ordination will have 3 or more dimensions but should be dominated by 2 dimensions.
Eigenvalues:
(1) 1.931479e+05 = 193147.9
(2) 4.733022e+04 = 47330.22
(3) 3.820062e+03 = 3820.062
(4) 4.820322e-11
(5) -4.952561e+02
(6) -6.316731e+03
The last three (zero and negative) can be ignored.
Eigen vectors:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.1473475 0.1128741 0.04538458 0.4082483 0.81339727 0.3677045
[2,] -0.4335791 0.2304177 0.65857972 0.4082483 -0.30873244 0.2514103
[3,] 0.3970476 -0.6344915 0.35761362 0.4082483 -0.03729046 -0.3792480
[4,] 0.3754683 0.6761005 -0.15553511 0.4082483 -0.16408006 -0.4291055
[5,] -0.6752523 -0.1782828 -0.42405981 0.4082483 0.13883744 -0.3827275
[6,] 0.1889680 -0.2066181 -0.48198299 0.4082483 -0.44213175 0.5719662
Ordination Plot: First Two Eigen Vectors
City Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Angouleme 4.2 4.9 7.9 10.4 13.6 17.0 18.7 18.4 16.1 11.7 7.6 4.9
Angers 4.6 5.4 8.9 11.3 14.5 17.2 19.5 19.4 16.9 12.5 8.1 5.3
Besancon 1.1 2.2 6.4 9.7 13.6 16.9 18.7 18.3 15.5 10.4 5.7 2.0
Biarritz 7.6 8.0 10.8 12.0 14.7 17.8 19.7 19.9 18.5 14.8 10.9 8.2
Bordeaux 5.6 6.6 10.3 12.8 15.8 19.3 20.9 21.0 18.6 13.8 9.1 6.2
Brest 6.1 5.8 7.8 9.2 11.6 14.4 15.6 16.0 14.7 12.0 9.0 7.0
Cler-Ferr 2.6 3.7 7.5 10.3 13.8 17.3 19.4 19.1 16.2 11.2 6.6 3.6
Dijon 1.3 2.6 6.9 10.4 14.3 17.7 19.6 19.0 15.9 10.5 5.7 2.1
Grenoble 1.5 3.2 7.7 10.6 14.5 17.8 20.1 19.5 16.7 11.4 6.5 2.3
Lille 2.4 2.9 6.0 8.9 12.4 15.3 17.1 17.1 14.7 10.4 6.1 3.5
Limoges 3.1 3.9 7.4 9.9 13.3 16.8 18.4 17.8 15.3 10.7 6.7 3.8
Lyon 2.1 3.3 7.7 10.9 14.9 18.5 20.7 20.1 16.9 11.4 6.7 3.1
Marseille 5.5 6.6 10.0 13.0 16.8 20.8 23.3 22.8 19.9 15.0 10.2 6.9
Montpellier 5.6 6.7 9.9 12.8 16.2 20.1 22.7 22.3 19.3 14.6 10.0 6.5
Nancy 0.8 1.6 5.5 9.2 13.3 16.5 18.3 17.7 14.7 9.4 5.2 1.8
Nantes 5.0 5.3 8.4 10.8 13.9 17.2 18.8 18.6 16.4 12.2 8.2 5.5
Nice 7.5 8.5 10.8 13.3 16.7 20.1 22.7 22.5 20.3 16.0 11.5 8.2
Nimes 5.7 6.8 10.1 13.0 16.6 20.8 23.6 22.9 19.7 14.6 9.8 6.5
Orleans 2.7 3.6 6.9 9.8 13.4 16.6 18.4 18.2 15.6 10.9 6.6 3.6
Paris 3.4 4.1 7.6 10.7 14.3 17.5 19.1 18.7 16.0 11.4 7.1 4.3
Perpignan 7.5 8.4 11.3 13.9 17.1 21.1 23.8 23.3 20.5 15.9 11.5 8.6
Reims 1.9 2.8 6.2 9.4 13.3 16.4 18.3 17.9 15.1 10.3 6.1 3.0
Rennes 4.8 5.3 7.9 10.1 13.1 16.2 17.9 17.8 15.7 11.6 7.8 5.4
Rouen 3.4 3.9 6.8 9.5 12.9 15.7 17.6 17.2 15.0 11.0 6.8 4.3
St-Quent 2.0 2.9 6.3 9.2 12.7 15.6 17.4 17.4 15.0 10.5 6.1 3.1
Strasbourg 0.4 1.5 5.6 9.8 14.0 17.2 19.0 18.3 15.1 9.5 4.9 1.3
Toulon 8.6 9.1 11.2 13.4 16.6 20.2 22.6 22.4 20.5 16.5 12.6 9.7
Toulouse 4.7 5.6 9.2 11.6 14.9 18.7 20.9 20.9 18.3 13.3 8.6 5.5
Tours 3.5 4.4 7.7 10.6 13.9 17.4 19.1 18.7 16.2 11.7 7.2 4.3
Vichy 2.4 3.4 7.1 9.9 13.6 17.1 19.3 18.8 16.0 11.0 6.6 3.4
Monthly Average Temperatures: 30 French Cities
Eigen values
1113.8 170.4 4.6 2.7 1.2 0.6 0.5 0.3 0.2 0.1 0.1 0
In practice only 2 dimensions in these data
30 in total
[Figure: ordination plots of the 30 cities, with groups labelled Mediterranean Coast; Brittany Coast; Southwest; South and West of Paris; South East of Paris; North and East of Paris – Belgian Border]
Correspondence Analysis
Many people stretching back to 1933
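Matrix of observed counts:
\[ \begin{bmatrix} O_{11} & O_{12} & O_{13} \\ O_{21} & O_{22} & O_{23} \\ O_{31} & O_{32} & O_{33} \\ O_{41} & O_{42} & O_{43} \end{bmatrix} \]
Matrix of expectations:
\[ \begin{bmatrix} E_{11} & E_{12} & E_{13} \\ E_{21} & E_{22} & E_{23} \\ E_{31} & E_{32} & E_{33} \\ E_{41} & E_{42} & E_{43} \end{bmatrix} \]
Under the hypothesis that row and column are independent:
\[ E_{ij} = \frac{\left(\sum_{j} O_{ij}\right)\left(\sum_{i} O_{ij}\right)}{\sum_{i}\sum_{j} O_{ij}} \]
and
\[ \chi^2 = \sum_{i}\sum_{j}\frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]
follows a chi-squared distribution with $(n-1)(p-1) = 3 \times 2$ degrees of freedom.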
Observed $O_{ij}$:
10 50 90
20 60 100
30 70 110
40 80 120
Expected $E_{ij}$:
19.23 50.00 80.77
23.08 60.00 96.92
26.92 70.00 113.08
30.77 80.00 129.23
X-squared = 9.8576, df = 3*2 = 6, p-value = 0.1308
Non-significant: rows and columns independent
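The expected counts and the chi-squared statistic for the first table can be recomputed in a few lines. This is an illustrative sketch; the function names are invented for the example, and the p-value is not computed because that needs the chi-squared distribution function from a statistics library.

```python
observed = [
    [10, 50, 90],
    [20, 60, 100],
    [30, 70, 110],
    [40, 80, 120],
]


def expected_counts(o):
    """E_ij = (row total i) * (column total j) / grand total."""
    row_totals = [sum(row) for row in o]
    col_totals = [sum(col) for col in zip(*o)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared(o):
    """Sum over all cells of (O - E)^2 / E."""
    e = expected_counts(o)
    return sum(
        (o_ij - e_ij) ** 2 / e_ij
        for o_row, e_row in zip(o, e)
        for o_ij, e_ij in zip(o_row, e_row)
    )


expected = expected_counts(observed)
x_squared = chi_squared(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print([round(v, 2) for v in expected[0]], round(x_squared, 4), df)
```

The first row of expected counts and the statistic agree with the values quoted for this table.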
Observed $O_{ij}$:
10 50 120
20 60 110
30 70 100
40 80 90
Expected $E_{ij}$:
12.82 33.33 53.85
14.10 36.67 59.23
15.38 40.00 64.62
16.67 43.33 70.00
X-squared = 36.7282, df = 3*2 = 6, p-value = 1.989e-06
Significant result: rows and columns not independent
Species: counts in Marsh Swamp, Lotus Swamp, Open Water
Purple swamphen 798 78 25
Yellow-vented bulbul 690 101 129
Pink-necked green pigeon 614 150 90
Peaceful dove 462 101 84
Spotted dove 386 56 67
Pacific swallow 208 39 85
White-breasted waterhen 200 38 25
Baya weaver 173 7 52
Common myna 166 17 51
Purple heron 164 52 22
Yellow bittern 162 42 11
Jungle myna 154 15 117
White-throated kingfisher 128 51 42
Scaly-breasted munia 125 36 49
Relative abundance of bird species recorded at three habitats of Paya Indah Wetland Reserve, Peninsular Malaysia.
Chi-squared = 505.9142, df = 13*2=26, p-value < 2.2e-16
International Journal of Zoology Volume 2011, Article ID 758573, 17 pages doi:10.1155/2011/758573
Bird Species Abundance and Their Correlationship with Microclimate and Habitat Variables at Natural Wetland Reserve, Peninsular Malaysia
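Earlier we saw that
\[ \chi^2 = \sum_{i}\sum_{j}\frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]
with $(n-1)(p-1)$ degrees of freedom.
Now we start with $\mathbf{Q}_{r \times c} = [q_{ij}]$, where
\[ q_{ij} = \frac{1}{\sqrt{\sum_{i}\sum_{j} O_{ij}}} \cdot \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}} \]
Apply SVD to $\mathbf{Q}_{r \times c}$:
\[ \mathbf{Q}_{r \times c} = \mathbf{V}_{r \times r}\mathbf{W}_{r \times c}\mathbf{U}'_{c \times c} \]
where $\mathbf{V}_{r \times r}$ is orthonormal, $\mathbf{W}_{r \times c}$ is diagonal and $\mathbf{U}_{c \times c}$ is orthonormal.
We know from an earlier discussion that
the columns of $\mathbf{U}_{c \times c}$ are the eigenvectors of $\mathbf{Q}'_{c \times r}\mathbf{Q}_{r \times c}$
the columns of $\mathbf{V}_{r \times r}$ are the eigenvectors of $\mathbf{Q}_{r \times c}\mathbf{Q}'_{c \times r}$
Diagonal elements of W are square roots of eigenvalues, $w_{ii} = \sqrt{\lambda_i}$
We will also need $\mathbf{P}_{r \times c} = [p_{ij}]$, where
\[ p_{ij} = \frac{O_{ij}}{\sum_{i}\sum_{j} O_{ij}} \]
Matrices $\mathbf{U}_{c \times c}$ and $\mathbf{V}_{r \times r}$ can be used to plot the positions of the row and column vectors in two separate scatter diagrams.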
Eigen Values (1) 2.506575e-01 (2) 1.436226e-01 (3) 2.032566e-17
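For joint plots a number of different scalings have been proposed.
For example:
\[ \mathbf{X}_{c \times c} = \mathbf{D}^{-1/2}_{c \times c}\mathbf{U}_{c \times c} \]
where $\mathbf{D}^{-1/2}_{c \times c}$ is a diagonal matrix in which the diagonals are the reciprocals of the square roots of the column totals of $\mathbf{P}_{r \times c}$.
And
\[ \mathbf{F}_{r \times c} = \mathbf{D}^{-1}_{r \times r}\mathbf{P}_{r \times c}\mathbf{X}_{c \times c} \]
where $\mathbf{D}^{-1}_{r \times r}$ is a diagonal matrix in which the diagonals are the reciprocals of the row totals of $\mathbf{P}_{r \times c}$.
And
\[ \mathbf{G}_{c \times c} = \mathbf{X}_{c \times c}\mathbf{W}_{c \times c} \]
Finally, plot column 1 of $\mathbf{F}_{r \times c}$ against column 2 of $\mathbf{F}_{r \times c}$; on the same graph plot column 1 of $\mathbf{G}_{c \times c}$ against column 2 of $\mathbf{G}_{c \times c}$.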
Pos Team H+ H- A+ A- "+" "-" GD
1 Manchester City 55 12 38 17 93 29 64
2 Manchester United 52 19 37 14 89 33 56
3 Arsenal 39 17 35 32 74 49 25
4 Tottenham Hotspur 39 17 27 24 66 41 25
5 Newcastle United 29 17 27 34 56 51 5
6 Chelsea 41 24 24 22 65 46 19
7 Everton 28 15 22 25 50 40 10
8 Liverpool 24 16 23 24 47 40 7
9 Fulham 36 26 12 25 48 51 -3
10 West Bromwich Albion 21 22 24 30 45 52 -7
11 Swansea City 27 18 17 33 44 51 -7
12 Norwich City 28 30 24 36 52 66 -14
13 Sunderland 26 17 19 29 45 46 -1
14 Stoke City 25 20 11 33 36 53 -17
15 Wigan Athletic 22 27 20 35 42 62 -20
16 Aston Villa 20 25 17 28 37 53 -16
17 Queens Park Rangers 24 25 19 41 43 66 -23
18 Bolton Wanderers 23 39 23 38 46 77 -31
19 Blackburn Rovers 26 33 22 45 48 78 -30
20 Wolverhampton Wndrs 19 43 21 39 40 82 -42
Premier League Final Table 2011-2012
Data used
Principal inertias (eigenvalues):
Dimension: 1 2 3
Value: 0.053538 0.008691 0.00678
Percentage: 77.58% 12.59% 9.82%
Redundancy Analysis and Canonical Correspondence Analysis
In this context canonical means that we have two matrices or, alternatively, two sets of descriptors for one set of objects
Rao, 1964 Ter Braak, 1986
Indirect Comparison: The matrix of explanatory variables, $\mathbf{X}_{n \times m}$, does not intervene in the calculation that produces the ordination of $\mathbf{Y}_{n \times p}$. Correlation or regression of the ordination vectors on $\mathbf{X}_{n \times m}$ is carried out afterwards.
Direct Comparison: The matrix X intervenes in the calculation, forcing the ordination vectors of $\mathbf{Y}_{n \times p}$ to be maximally related to combinations of the columns of $\mathbf{X}_{n \times m}$. The regression on $\mathbf{X}_{n \times m}$ is carried out first and the ordination is carried out on a modified $\mathbf{Y}_{n \times p}$.
In mathematics more generally, a canonical form is the simplest and most comprehensive form to which certain functions, relations, or expressions can be reduced without loss of generality. For example, the canonical form of a covariance matrix is its matrix of eigenvalues.
Simple ordination of $\mathbf{Y}_{n \times p}$: Principal Components, Correspondence Analysis
Ordination of $\mathbf{y}_{n \times 1}$ under constraint of $\mathbf{X}_{n \times m}$: Multiple regression
Ordination of $\mathbf{Y}_{n \times p}$ under constraint of $\mathbf{X}_{n \times m}$: Redundancy Analysis, Canonical Correspondence Analysis
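The variance-covariance matrix of the combined descriptors is partitioned into blocks:
\[ \mathbf{S}_{Y+X} = \begin{bmatrix} s_{y_1,y_1} & \cdots & s_{y_1,y_p} & s_{y_1,x_1} & \cdots & s_{y_1,x_m} \\ \vdots & & \vdots & \vdots & & \vdots \\ s_{y_p,y_1} & \cdots & s_{y_p,y_p} & s_{y_p,x_1} & \cdots & s_{y_p,x_m} \\ s_{x_1,y_1} & \cdots & s_{x_1,y_p} & s_{x_1,x_1} & \cdots & s_{x_1,x_m} \\ \vdots & & \vdots & \vdots & & \vdots \\ s_{x_m,y_1} & \cdots & s_{x_m,y_p} & s_{x_m,x_1} & \cdots & s_{x_m,x_m} \end{bmatrix} = \begin{bmatrix} \mathbf{S}_{YY} & \mathbf{S}_{YX} \\ \mathbf{S}_{XY} & \mathbf{S}_{XX} \end{bmatrix} = \begin{bmatrix} \mathbf{S}_{YY} & \mathbf{S}_{YX} \\ \mathbf{S}'_{YX} & \mathbf{S}_{XX} \end{bmatrix} \]
$\mathbf{S}_{YY}$: variance-covariance matrix of $\mathbf{Y}_{n \times p}$
$\mathbf{S}_{XX}$: variance-covariance matrix of $\mathbf{X}_{n \times m}$
$\mathbf{S}_{YX}$ and $\mathbf{S}_{XY} = \mathbf{S}'_{YX}$: covariances amongst descriptors of X and Y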
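In principal component analysis the eigen analysis equation was:
\[ (\mathbf{S}_{YY} - \lambda_k\mathbf{I})\,\mathbf{u}_k = \mathbf{0} \]
In redundancy analysis it is:
\[ (\mathbf{S}_{YX}\mathbf{S}^{-1}_{XX}\mathbf{S}'_{YX} - \lambda_k\mathbf{I})\,\mathbf{u}_k = \mathbf{0} \]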
An Aside on Multiple Linear Regression
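\[ y = b_0 + b_1x_1 + b_2x_2 + \dots + e, \qquad e \sim N(0, \sigma^2) \]
\[ y_1 = b_0 + b_1x_{11} + b_2x_{12} + \dots \]
\[ y_2 = b_0 + b_1x_{21} + b_2x_{22} + \dots \]
\[ y_3 = b_0 + b_1x_{31} + b_2x_{32} + \dots \]
\[ \vdots \]
\[ y_n = b_0 + b_1x_{n1} + b_2x_{n2} + \dots \]
In matrix form, with the data in $\mathbf{y}$ and $\mathbf{X}$ and the coefficients to be estimated in $\mathbf{b}$:
\[ \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots \\ 1 & x_{21} & x_{22} & \cdots \\ 1 & x_{31} & x_{32} & \cdots \\ \vdots & \vdots & \vdots & \\ 1 & x_{n1} & x_{n2} & \cdots \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ \vdots \end{bmatrix} \]
\[ \mathbf{y} = \mathbf{X}\hat{\mathbf{b}} \]
\[ \mathbf{X}'\mathbf{y} = (\mathbf{X}'\mathbf{X})\hat{\mathbf{b}} \]
\[ (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = (\mathbf{X}'\mathbf{X})^{-1}(\mathbf{X}'\mathbf{X})\hat{\mathbf{b}} \]
\[ (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \mathbf{I}\hat{\mathbf{b}} \]
\[ (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \hat{\mathbf{b}} \qquad \text{(least squares solution)} \]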
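The least squares solution can be illustrated without any libraries by forming X'X and X'y and solving the resulting linear system. In this sketch the data are invented so that the true coefficients are known (1, 2 and -0.5), and the helper names are made up for the example. Solving the system by elimination is used in place of an explicit matrix inverse, which gives the same coefficients.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def least_squares(X, y):
    """Coefficients from the normal equations (X'X) b = X'y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x * v for x, v in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Invented data: a column of ones, then x1 and x2; y = 1 + 2*x1 - 0.5*x2 exactly.
X = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0], [1.0, 2.0, 2.0], [1.0, 3.0, 1.0], [1.0, 4.0, 3.0]]
y = [0.5, 3.0, 4.0, 6.5, 7.5]
b_hat = least_squares(X, y)
print(b_hat)
```

Because the invented y values contain no error term, the estimated coefficients recover the generating values up to rounding.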
The Algebra of Redundancy Analysis
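Centre both the response matrix $\mathbf{Y}$ and the matrix of independent variables $\mathbf{X}$ by subtracting the column means from the column values / elements.
For each column $\mathbf{y}$ in $\mathbf{Y}$ compute $\hat{\mathbf{b}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$, giving $\hat{\mathbf{B}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$.
For each column of $\mathbf{Y}$ compute fitted values $\hat{\mathbf{y}} = \mathbf{X}\hat{\mathbf{b}}$, giving $\hat{\mathbf{Y}} = \mathbf{X}\hat{\mathbf{B}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$.
\[ \mathbf{S}_{XX} = [1/(n-1)]\,\mathbf{X}'\mathbf{X} \]
\[ \mathbf{S}_{YX} = [1/(n-1)]\,\mathbf{Y}'\mathbf{X} \]
\[ \mathbf{S}_{\hat{Y}\hat{Y}} = [1/(n-1)]\,\hat{\mathbf{Y}}'\hat{\mathbf{Y}} \]
\[ = [1/(n-1)]\,\mathbf{Y}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} \]
\[ = [1/(n-1)]\,\mathbf{Y}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} \]
\[ = \mathbf{S}_{YX}\mathbf{S}^{-1}_{XX}\mathbf{S}'_{YX} \]
So, perform redundancy analysis by solving:
\[ (\mathbf{S}_{\hat{Y}\hat{Y}} - \lambda_k\mathbf{I})\,\mathbf{u}_k = (\mathbf{S}_{YX}\mathbf{S}^{-1}_{XX}\mathbf{S}'_{YX} - \lambda_k\mathbf{I})\,\mathbf{u}_k = \mathbf{0} \]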
City Burglary Violence Robbery Pop Density
Bath 3 5.7 0.4 5
Birmingham 7.2 11.6 4.1 39
Brighton 4.5 7.6 1.5 31
Bristol 9.8 10.4 2.9 40
Cambridge 7.6 7.6 1.6 31
Canterbury 2.7 6.3 0.5 5
Cardiff 6 8.6 1.1 24
Coventry 7.4 11.2 2.2 32
Lancaster 2.5 7.7 0.4 2
Leeds 12 7.5 1.7 14
Leicester 8.6 11.7 2.6 42
Liverpool 9.1 7.7 2.3 40
Manchester 13.2 11.5 5.1 43
Newcastle 5.2 8.6 0.9 26
Nottingham 11.7 15.1 4.3 41
Oxford 5.2 8.3 1.7 34
Sheffield 6.9 7.5 1.2 15
Southampton 4.9 14.6 1.8 48
Swansea 4.7 7.3 0.4 6
York 4.4 6.7 0.5 7
Bournemouth 5.2 12.9 0.9 36
Plymouth 4 10.7 0.7 32
Norwich 3.9 8.9 1.1 37
Lincoln 6 9.8 0.9 25
Rows are objects (cities); columns are descriptors.
Source (crime rates): http://www.thecompleteuniversityguide.co.uk/preparing-to-go/staying-safe-and-secure/how-safe-is-your-city/
Source (population density): http://www.guardian.co.uk/news/datablog/interactive/2011/jun/30/uk-population-mapped
[Figure: Burglary, Violence and Robbery each plotted against population density /ha]
Eigen values (1) 6.612048e+00 (2) 2.664535e-15 (3) -1.110223e-16
Regression model fitted, with a single explanatory variable x (population density):
\[ \mathbf{Y} = \begin{bmatrix} y_{11} & y_{12} & y_{13} \\ y_{21} & y_{22} & y_{23} \\ y_{31} & y_{32} & y_{33} \\ \vdots & \vdots & \vdots \\ y_{n1} & y_{n2} & y_{n3} \end{bmatrix} \quad \rightarrow \quad \hat{\mathbf{Y}} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix} \begin{bmatrix} \hat{b}_1 & \hat{b}_2 & \hat{b}_3 \end{bmatrix} = \begin{bmatrix} x_1\hat{b}_1 & x_1\hat{b}_2 & x_1\hat{b}_3 \\ x_2\hat{b}_1 & x_2\hat{b}_2 & x_2\hat{b}_3 \\ x_3\hat{b}_1 & x_3\hat{b}_2 & x_3\hat{b}_3 \\ \vdots & \vdots & \vdots \\ x_n\hat{b}_1 & x_n\hat{b}_2 & x_n\hat{b}_3 \end{bmatrix} \]
$\mathbf{Y}$ is the matrix used in PCA, Rank = 3. $\hat{\mathbf{Y}}$ is the matrix used in Redundancy Analysis, Rank = 1.
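The rank-1 result can be seen directly in a small calculation. The sketch below uses only the first six cities of the crime table with population density as the single explanatory variable, so its numbers are not the ones quoted for the full data. It fits each response by regression through the origin on the centred data, forms the variance-covariance matrix of the fitted values, and checks that every 2 by 2 minor vanishes. All names are illustrative.

```python
# First six cities only: Burglary, Violence, Robbery, and population density.
Y = [
    [3.0, 5.7, 0.4],
    [7.2, 11.6, 4.1],
    [4.5, 7.6, 1.5],
    [9.8, 10.4, 2.9],
    [7.6, 7.6, 1.6],
    [2.7, 6.3, 0.5],
]
x = [5.0, 39.0, 31.0, 40.0, 31.0, 5.0]


def centre(values):
    m = sum(values) / len(values)
    return [v - m for v in values]


n = len(x)
xc = centre(x)
y_cols = [centre([row[j] for row in Y]) for j in range(3)]

# With one centred explanatory variable, b_j = sum(x * y_j) / sum(x * x).
sxx = sum(v * v for v in xc)
b_hat = [sum(xv * yv for xv, yv in zip(xc, col)) / sxx for col in y_cols]

# Fitted values: every column is the same vector x scaled by b_j.
Y_hat = [[xv * b for b in b_hat] for xv in xc]

# Variance-covariance matrix of the fitted values (divisor n - 1).
S_hat = [
    [sum(Y_hat[i][j] * Y_hat[i][k] for i in range(n)) / (n - 1) for k in range(3)]
    for j in range(3)
]

# A rank-1 matrix has all 2x2 minors equal to zero and a single non-zero
# eigenvalue, which therefore equals the trace.
minors = [
    S_hat[j][j] * S_hat[k][k] - S_hat[j][k] ** 2
    for j in range(3) for k in range(j + 1, 3)
]
eigenvalue = sum(S_hat[j][j] for j in range(3))
S_b = [sum(S_hat[j][k] * b_hat[k] for k in range(3)) for j in range(3)]
print(minors, eigenvalue)
```

The vector of regression coefficients is itself the eigenvector: multiplying `S_hat` by `b_hat` returns `b_hat` scaled by the single non-zero eigenvalue. This is why one explanatory variable can only ever give one canonical axis.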
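Second attempt: fitting a quadratic instead of a straight line
Eigen Values: (1) 6.706016e+00 (2) 7.439234e-02 (3) 6.071489e-18
The same idea applies to a designed experiment. For the 1st replicate of a 2 by 2 factorial, the responses, design matrix and parameter estimates are
\[ \mathbf{Y} = \begin{bmatrix} y_{11} & y_{12} & y_{13} \\ y_{21} & y_{22} & y_{23} \\ y_{31} & y_{32} & y_{33} \\ y_{41} & y_{42} & y_{43} \\ \vdots & \vdots & \vdots \end{bmatrix}, \qquad \mathbf{X} = \begin{bmatrix} 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots \end{bmatrix}, \qquad \hat{\mathbf{B}} = \begin{bmatrix} \hat{\mu}_1 & \hat{\mu}_2 & \hat{\mu}_3 \\ \hat{\alpha}_{11} & \hat{\alpha}_{21} & \hat{\alpha}_{31} \\ \hat{\alpha}_{12} & \hat{\alpha}_{22} & \hat{\alpha}_{32} \\ \hat{\beta}_{11} & \hat{\beta}_{21} & \hat{\beta}_{31} \\ \hat{\beta}_{12} & \hat{\beta}_{22} & \hat{\beta}_{32} \end{bmatrix} \]
The matrix of fitted values, used in redundancy analysis, is
\[ \hat{\mathbf{Y}} = \mathbf{X}\hat{\mathbf{B}} = \begin{bmatrix} \hat{\mu}_1 + \hat{\alpha}_{11} + \hat{\beta}_{11} & \hat{\mu}_2 + \hat{\alpha}_{21} + \hat{\beta}_{21} & \hat{\mu}_3 + \hat{\alpha}_{31} + \hat{\beta}_{31} \\ \hat{\mu}_1 + \hat{\alpha}_{11} + \hat{\beta}_{12} & \hat{\mu}_2 + \hat{\alpha}_{21} + \hat{\beta}_{22} & \hat{\mu}_3 + \hat{\alpha}_{31} + \hat{\beta}_{32} \\ \hat{\mu}_1 + \hat{\alpha}_{12} + \hat{\beta}_{11} & \hat{\mu}_2 + \hat{\alpha}_{22} + \hat{\beta}_{21} & \hat{\mu}_3 + \hat{\alpha}_{32} + \hat{\beta}_{31} \\ \hat{\mu}_1 + \hat{\alpha}_{12} + \hat{\beta}_{12} & \hat{\mu}_2 + \hat{\alpha}_{22} + \hat{\beta}_{22} & \hat{\mu}_3 + \hat{\alpha}_{32} + \hat{\beta}_{32} \\ \vdots & \vdots & \vdots \end{bmatrix} \]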
R Software
http://www.r-project.org/index.html