Michael Greenacre
Universitat Pompeu Fabra, Barcelona
Simple correspondence analysis (CA), multiple correspondence analysis (MCA), joint correspondence analysis (JCA), as well as all subset versions of these, using the R package ca.
Oleg Nenadić & Michael Greenacre
University of Göttingen & Universitat Pompeu Fabra
View of Aegean Sea and island of Lesbos. Turkey, August 2010.
Assos
Venue for CARME in ASSOS
ca package
• function ca: (simple) correspondence analysis (CA)
• function mjca: multiple correspondence analysis (MCA), adjusted MCA, joint correspondence analysis (JCA)
• subset versions: subset CA, subset MCA, adjusted subset MCA, subset JCA
Contribution coordinates
[Figure: two maps of the same data. First panel: asymmetric map, map="rowprincipal", with all column points labelled (14:0, 14:1(n-5), ..., 24:1(n-9)). Second panel: contribution coordinates, map="rowgreen", with only 16:1(n-7), 18:0, 18:4(n-3), 20:1(n-9), 20:5(n-3) and 22:1(n-11) labelled.]
See Biplots in Practice (Greenacre 2010), www.multivariatestatistics.org
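As a rough illustration of the two scalings named on this slide, the base R sketch below computes rows in principal coordinates and columns either in standard coordinates (as in map="rowprincipal") or in standard coordinates times the square root of the mass (as in map="rowgreen"). The table N is made-up illustration data, not the data shown in the figure, and the computation is a plain SVD written for this note rather than a call into the ca package.

```r
# Hypothetical 4 x 3 contingency table (illustration only, not the slide's data)
N <- matrix(c(20,  5, 10,
               8, 15,  6,
               4, 12, 18,
              10, 10,  9), nrow = 4, byrow = TRUE)

P      <- N / sum(N)                 # correspondence matrix
r_mass <- rowSums(P)                 # row masses
c_mass <- colSums(P)                 # column masses
S      <- (P - r_mass %o% c_mass) / sqrt(r_mass %o% c_mass)  # standardized residuals

K   <- min(dim(N)) - 1               # number of non-trivial dimensions
dec <- svd(S, nu = K, nv = K)
sv  <- dec$d[1:K]                    # singular values

row_std       <- dec$u / sqrt(r_mass)          # row standard coordinates
row_principal <- sweep(row_std, 2, sv, "*")    # row principal coordinates
col_std       <- dec$v / sqrt(c_mass)          # columns for an asymmetric map
col_green     <- col_std * sqrt(c_mass)        # columns in contribution coordinates

# Squared contribution coordinates are the column contributions to each axis,
# so they sum to 1 within every dimension.
colSums(col_green^2)
```

With this scaling the length of a column vector along an axis reflects how much that column contributes to the axis, which is why only the high contributors stand out in the second panel.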
Problem of variance explained
> summary(mjca(wg93[,1:4], lambda="indicator"))
Principal inertias (eigenvalues):
 dim    value      %   cum%   scree plot
 1      0.457379  11.4  11.4  *************************
 2      0.430966  10.8  22.2  ***********************
 3      0.321926   8.0  30.3  ***************
 :      :          :     :
> summary(mjca(wg93[,1:4], lambda="Burt"))
Principal inertias (eigenvalues):
 dim    value      %   cum%   scree plot
 1      0.209196  18.6  18.6  *************************
 2      0.185732  16.5  35.0  **********************
 3      0.103636   9.2  44.2  ***********
 :      :          :     :
> summary(mjca(wg93[,1:4], lambda="adjusted"))   # DEFAULT
Principal inertias (eigenvalues):
 dim    value      %   cum%   scree plot
 1      0.076455  44.9  44.9  *************************
 2      0.058220  34.2  79.1  *******************
 3      0.009197   5.4  84.5  ***
 :      :          :     :
> summary(mjca(wg93[,1:4], lambda="JCA"))
Percentage explained by JCA in 2 dimensions: 85.7%
(Eigenvalues are not nested)
[Iterations in JCA: 44 , epsilon = 9.91e-05]
Increasing inertia explained in two dimensions: 22.2% (indicator), 35.0% (Burt), 79.1% (adjusted), 85.7% (JCA).
Same problem for individual points
> summary(mjca(wg93[,1:4], lambda="Burt"))
Principal inertias (eigenvalues):
 dim    value      %   cum%   scree plot
 1      0.209196  18.6  18.6  *************************
 2      0.185732  16.5  35.0  **********************
 3      0.103636   9.2  44.2  ***********
 :      :          :     :
Columns:
     name   mass  qlt  inr     k=1  cor ctr     k=2  cor ctr
 1 |  A1 |   34  445   55 |   -840  391  53 |   -314   54   8 |
 2 |  A2 |   92  169   38 |   -250  136  13 |    123   33   3 |
 3 |  A3 |   59  344   47 |    204   47   5 |    517  298  36 |
 4 |  A4 |   51  350   50 |    533  258  32 |   -318   92  12 |
 5 |  A5 |   14  401   60 |    913  170  25 |  -1064  231  36 |
 6 |  B1 |   20  621   62 |  -1338  519  80 |   -590  101  16 |
 7 |  B2 |   50  158   47 |   -293   80   9 |    287   77  10 |
 8 |  B3 |   59  227   45 |   -158   29   3 |    415  198  24 |
 9 |  B4 |   81  210   41 |    327  185  19 |    121   25   3 |
10 |  B5 |   40  722   60 |    619  229  34 |   -908  493  77 |
11 |  C1 |   44  732   60 |   -987  632  93 |   -392  100  16 |
12 |  C2 |   91  164   38 |   -113   27   3 |    255  137  14 |
13 |  C3 |   57  296   48 |    283   84  10 |    450  212  27 |
14 |  C4 |   44  345   52 |    617  289  37 |   -274   57   8 |
15 |  C5 |   15  471   60 |    671   99  15 |  -1300  372  59 |
16 |  D1 |   17  251   56 |   -551   83  11 |   -785  168  25 |
17 |  D2 |   67   14   42 |    101   14   1 |      3    0   0 |
18 |  D3 |   58  303   48 |    176   33   4 |    499  269  34 |
19 |  D4 |   65   25   43 |    101   14   1 |     91   11   1 |
20 |  D5 |   43  272   50 |   -324   81  10 |   -496  191  25 |
> mjca(wg93[,1:4])$Burt
    A1  A2  A3  A4  A5  B1  B2  B3  B4  B5  C1  C2  C3  C4  C5  D1  D2  D3  D4  D5
A1 119   0   0   0   0  27  28  30  22  12  49  40  18   7   5  15  25  17  34  28
A2   0 322   0   0   0  38  74  84  96  30  67 142  60  41  12  22 102  76  68  54
A3   0   0 204   0   0   3  48  63  73  17  18  75  70  34   7  10  44  68  58  24
A4   0   0   0 178   0   3  21  23  79  52  16  50  40  56  16   9  52  28  54  35
A5   0   0   0   0  48   0   3   5  11  29   2   9   9  16  12   4   9  13  12  10
B1  27  38   3   3   0  71   0   0   0   0  43  19   4   3   2   9  17  10  10  25
B2  28  74  48  21   3   0 174   0   0   0  36  88  34  15   1  16  51  42  45  20
B3  30  84  63  23   5   0   0 205   0   0  37  90  57  19   2  10  53  63  51  28
B4  22  96  73  79  11   0   0   0 281   0  27  88  75  74  17   6  66  70  92  47
B5  12  30  17  52  29   0   0   0   0 140   9  31  27  43  30  19  45  17  28  31
C1  49  67  18  16   2  43  36  37  27   9 152   0   0   0   0  25  24  15  38  50
C2  40 142  75  50   9  19  88  90  88  31   0 316   0   0   0  15  97  67  89  48
C3  18  60  70  40   9   4  34  57  75  27   0   0 197   0   0   5  51  83  41  17
C4   7  41  34  56  16   3  15  19  74  43   0   0   0 154   0   6  44  30  51  23
C5   5  12   7  16  12   2   1   2  17  30   0   0   0   0  52   9  16   7   7  13
D1  15  22  10   9   4   9  16  10   6  19  25  15   5   6   9  60   0   0   0   0
D2  25 102  44  52   9  17  51  53  66  45  24  97  51  44  16   0 232   0   0   0
D3  17  76  68  28  13  10  42  63  70  17  15  67  83  30   7   0   0 202   0   0
D4  34  68  58  54  12  10  45  51  92  28  38  89  41  51   7   0   0   0 226   0
D5  28  54  24  35  10  25  20  28  47  31  50  48  17  23  13   0   0   0   0 151
Joint correspondence analysis
> mjca(wg93[,1:4], lambda="JCA")$Burt.upd
    A1  A2  A3  A4  A5  B1  B2  B3  B4  B5  C1  C2  C3  C4  C5  D1  D2  D3  D4  D5
A1  31  53  19  14   3  27  28  30  22  12  49  40  18   7   5  15  25  17  34  28
A2  53 131  77  52  10  38  74  84  96  30  67 142  60  41  12  22 102  76  68  54
A3  19  77  63  39   7   3  48  63  73  17  18  75  70  34   7  10  44  68  58  24
A4  14  52  39  54  20   3  21  23  79  52  16  50  40  56  16   9  52  28  54  35
A5   3  10   7  20   9   0   3   5  11  29   2   9   9  16  12   4   9  13  12  10
B1  27  38   3   3   0  21  20  18   8   3  43  19   4   3   2   9  17  10  10  25
B2  28  74  48  21   3  20  46  54  50   4  36  88  34  15   1  16  51  42  45  20
B3  30  84  63  23   5  18  54  65  64   4  37  90  57  19   2  10  53  63  51  28
B4  22  96  73  79  11   8  50  64 104  55  27  88  75  74  17   6  66  70  92  47
B5  12  30  17  52  29   3   4   4  55  74   9  31  27  43  30  19  45  17  28  31
C1  49  67  18  16   2  43  36  37  27   9  82  55   4   3   7  25  24  15  38  50
C2  40 142  75  50   9  19  88  90  88  31  55 126  79  46   9  15  97  67  89  48
C3  18  60  70  40   9   4  34  57  75  27   4  79  66  41   6   5  51  83  41  17
C4   7  41  34  56  16   3  15  19  74  43   3  46  41  45  18   6  44  30  51  23
C5   5  12   7  16  12   2   1   2  17  30   7   9   6  18  11   9  16   7   7  13
D1  15  22  10   9   4   9  16  10   6  19  25  15   5   6   9   9  15   5  13  18
D2  25 102  44  52   9  17  51  53  66  45  24  97  51  44  16  15  62  56  61  38
D3  17  76  68  28  13  10  42  63  70  17  15  67  83  30   7   5  56  64  56  21
D4  34  68  58  54  12  10  45  51  92  28  38  89  41  51   7  13  61  56  60  36
D5  28  54  24  35  10  25  20  28  47  31  50  48  17  23  13  18  38  21  36  38
• default: two-dimensional solution
• at convergence the diagonal blocks are perfectly fitted
Joint correspondence analysis
> summary(mjca(wg93[,1:4], lambda="JCA"))
Principal inertias (eigenvalues):
 dim    value
 1      0.099091
 2      0.065033
 :      :
        --------
 Total: 0.182425
Diagonal inertia discounted from eigenvalues: 0.0547405
Percentage explained by JCA in 2 dimensions: 85.7%
(Eigenvalues are not nested)
[Iterations in JCA: 44 , epsilon = 9.91e-05]
$$\frac{(0.099091 + 0.065033) - 0.0547405}{0.182425 - 0.0547405} = 0.857$$
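The percentage reported for JCA can be checked by hand from the values printed in the summary. The short R sketch below only redoes that arithmetic; the variable names are chosen for this note and are not part of the ca package.

```r
# Values copied from the JCA summary output on this slide
eig        <- c(0.099091, 0.065033)  # first two principal inertias
total      <- 0.182425               # total inertia
diag_inert <- 0.0547405              # diagonal inertia discounted from eigenvalues

# Discount the diagonal inertia from both the explained and the total inertia
pct <- 100 * (sum(eig) - diag_inert) / (total - diag_inert)
round(pct, 1)
```

The result rounds to the 85.7% shown in the output.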
Subset version of JCA available in new version: i.e., a subset of the categories is specified, and the analysis fits these optimally, using the original margins of the Burt matrix, omitting the (subsets of) categories in the diagonal blocks.
Adjusted MCA
[Figure: two maps of the categories A1 to D5. First panel: MCA of the Burt matrix. Second panel: adjusted MCA.]
Burt: λ1, λ2, … 35% explained
Adjusted: λ1*, λ2*, … 79% explained
With λ_i the i-th principal inertia of the Burt matrix and Q the number of variables (here Q = 4), the adjusted inertias are
$$\lambda_i^{*} = \left(\frac{Q}{Q-1}\right)^{2}\left(\sqrt{\lambda_i} - \frac{1}{Q}\right)^{2}, \qquad \sqrt{\lambda_i} > \frac{1}{Q}$$
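The adjustment can be verified against the eigenvalues printed earlier in the talk. The R sketch below assumes the adjustment takes the usual form (Q/(Q-1))^2 (sqrt(lambda) - 1/Q)^2 applied to the Burt principal inertias with Q = 4 variables; the variable names are illustrative only.

```r
Q    <- 4                                  # number of variables in wg93[, 1:4]
burt <- c(0.209196, 0.185732, 0.103636)    # Burt principal inertias from the summary

indicator <- sqrt(burt)                    # principal inertias of the indicator matrix
adjusted  <- (Q / (Q - 1))^2 * (indicator - 1 / Q)^2   # assumed adjustment formula

# Compare with the summaries: indicator 0.457379, 0.430966, 0.321926
# and adjusted 0.076455, 0.058220, 0.009197 (up to rounding of the inputs)
round(cbind(burt, indicator, adjusted), 4)
```

Both columns agree with the lambda="indicator" and lambda="adjusted" outputs to the precision of the printed Burt eigenvalues.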
Adjusted MCA – nullified Burt matrix
Starting from the Burt matrix mjca(wg93[,1:4])$Burt, the four diagonal blocks are set to zero:
B0 =
    A1  A2  A3  A4  A5  B1  B2  B3  B4  B5  C1  C2  C3  C4  C5  D1  D2  D3  D4  D5
A1   0   0   0   0   0  27  28  30  22  12  49  40  18   7   5  15  25  17  34  28
A2   0   0   0   0   0  38  74  84  96  30  67 142  60  41  12  22 102  76  68  54
A3   0   0   0   0   0   3  48  63  73  17  18  75  70  34   7  10  44  68  58  24
A4   0   0   0   0   0   3  21  23  79  52  16  50  40  56  16   9  52  28  54  35
A5   0   0   0   0   0   0   3   5  11  29   2   9   9  16  12   4   9  13  12  10
B1  27  38   3   3   0   0   0   0   0   0  43  19   4   3   2   9  17  10  10  25
B2  28  74  48  21   3   0   0   0   0   0  36  88  34  15   1  16  51  42  45  20
B3  30  84  63  23   5   0   0   0   0   0  37  90  57  19   2  10  53  63  51  28
B4  22  96  73  79  11   0   0   0   0   0  27  88  75  74  17   6  66  70  92  47
B5  12  30  17  52  29   0   0   0   0   0   9  31  27  43  30  19  45  17  28  31
C1  49  67  18  16   2  43  36  37  27   9   0   0   0   0   0  25  24  15  38  50
C2  40 142  75  50   9  19  88  90  88  31   0   0   0   0   0  15  97  67  89  48
C3  18  60  70  40   9   4  34  57  75  27   0   0   0   0   0   5  51  83  41  17
C4   7  41  34  56  16   3  15  19  74  43   0   0   0   0   0   6  44  30  51  23
C5   5  12   7  16  12   2   1   2  17  30   0   0   0   0   0   9  16   7   7  13
D1  15  22  10   9   4   9  16  10   6  19  25  15   5   6   9   0   0   0   0   0
D2  25 102  44  52   9  17  51  53  66  45  24  97  51  44  16   0   0   0   0   0
D3  17  76  68  28  13  10  42  63  70  17  15  67  83  30   7   0   0   0   0   0
D4  34  68  58  54  12  10  45  51  92  28  38  89  41  51   7   0   0   0   0   0
D5  28  54  24  35  10  25  20  28  47  31  50  48  17  23  13   0   0   0   0   0
• Perform eigendecomposition on B0 (suitably centred & normalized, as in MCA)
• The POSITIVE eigenvalues are exactly the adjusted inertias
• Adjustments for each category obtained in same way
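The link between the nullified Burt matrix and the adjusted inertias can be checked numerically. The base R sketch below uses simulated responses (not wg93) and one particular reading of "suitably centred & normalized": the matrix is turned into CA standardized residuals, and the principal inertias are taken as the squares of the positive eigenvalues of that symmetric matrix. Under these assumptions, which are ours and not confirmed by the slide, the values coincide with the adjustment formula applied to the Burt inertias.

```r
set.seed(1)
n   <- 300
gen <- function() factor(sample(c("a", "b", "c"), n, replace = TRUE))
dat <- data.frame(A = gen(), B = gen(), C = gen(), D = gen())  # simulated data
Q   <- ncol(dat)

# Indicator matrix Z and Burt matrix B = Z'Z
Z <- do.call(cbind, lapply(dat, function(f)
  outer(as.integer(f), seq_len(nlevels(f)), "==") * 1))
B <- crossprod(Z)

# Nullify the diagonal blocks (categories belonging to the same variable)
grp <- rep(seq_len(Q), times = sapply(dat, nlevels))
B0  <- B
B0[outer(grp, grp, "==")] <- 0

# CA-style centring and normalization of a symmetric table
std_resid <- function(M) {
  P <- M / sum(M)
  m <- rowSums(P)
  (P - m %o% m) / sqrt(m %o% m)
}

ev_burt <- eigen(std_resid(B),  symmetric = TRUE)$values
ev_b0   <- eigen(std_resid(B0), symmetric = TRUE)$values

mu          <- abs(ev_burt)                 # indicator inertias (Burt inertias are mu^2)
keep        <- mu - 1 / Q > 1e-10
adj_formula <- (Q / (Q - 1))^2 * (mu[keep] - 1 / Q)^2
adj_b0      <- ev_b0[ev_b0 > 1e-10]^2       # from the positive eigenvalues of B0

all.equal(adj_b0, adj_formula)
```

Eigenvalues of the nullified matrix that are zero or negative correspond to dimensions whose indicator inertia does not exceed 1/Q and are discarded.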
Results for new version (default is "adjusted")
> summary(mjca(wg93[,1:4]))
Principal inertias (eigenvalues):
 dim    value      %   cum%   scree plot
 1      0.076455  44.9  44.9  *************************
 2      0.058220  34.2  79.1  *******************
 3      0.009197   5.4  84.5  ***
 :      :          :     :
Columns:
     name   mass  qlt  inr     k=1  cor ctr     k=2  cor ctr
 1 |  A1 |   34  963   55 |    508  860 115 |   -176  103  18 |
 2 |  A2 |   92  659   38 |    151  546  28 |     69  113   7 |
 3 |  A3 |   59  929   47 |   -124  143  12 |    289  786  84 |
 4 |  A4 |   51  798   50 |   -322  612  69 |   -178  186  28 |
 5 |  A5 |   14  799   60 |   -552  369  55 |   -596  430  84 |
 6 |  B1 |   20  911   62 |    809  781 174 |   -331  131  38 |
 7 |  B2 |   50  631   47 |    177  346  21 |    161  285  22 |
 8 |  B3 |   59  806   45 |     96  117   7 |    233  690  55 |
 9 |  B4 |   81  620   41 |   -197  555  41 |     68   65   6 |
10 |  B5 |   40  810   60 |   -374  285  74 |   -509  526 179 |
11 |  C1 |   44  847   60 |    597  746 203 |   -219  101  36 |
12 |  C2 |   91  545   38 |     68  101   6 |    143  444  32 |
13 |  C3 |   57  691   48 |   -171  218  22 |    252  473  62 |
14 |  C4 |   44  788   52 |   -373  674  80 |   -153  114  18 |
15 |  C5 |   15  852   60 |   -406  202  32 |   -728  650 136 |
16 |  D1 |   17  782   56 |    333  285  25 |   -440  497  57 |
17 |  D2 |   67  126   42 |    -61  126   3 |      2    0   0 |
18 |  D3 |   58  688   48 |   -106   87   9 |    280  601  78 |
19 |  D4 |   65  174   43 |    -61  103   3 |     51   71   3 |
20 |  D5 |   43  869   50 |    196  288  22 |   -278  581  57 |
Subset version also available, using nullified Burt matrix as before
Packages with CA
• ca
• FactoMineR
• vegan
• ade4
• MASS
• caGUI
• BiplotGUI
• …
Correspondence analysis with ca
Tutorial presented at CARME 2011 in Rennes, France, February 8, 2011
M. Greenacre, O. Nenadić
Introduction
In the practical part of this tutorial we demonstrate how to apply the ca package for simple, multiple and joint correspondence analysis in R.
R is a freely available statistical software environment. Since its introduction by R. Ihaka and R. Gentleman (1996) it has gained much popularity in the statistical community.
One advantage of R is the extension system, which allows R's capabilities to be extended by so-called packages.
Further information on R is available at the official R website: http://www.R-project.org
The ca package, an overview
The ca package offers functions for the computation and visualization of correspondence analysis.
The core computations are done by the functions ca() (simple correspondence analysis) and mjca() (multiple and joint correspondence analysis).
Each function has its corresponding print, summary and plot methods, which are used for presenting numerical results of the analysis and for the graphical display.
Additional functions include auxiliary functions that are usually not called directly by the user (e.g. iterate.mjca(), which is used in a joint correspondence analysis).
The ca package, an overview
The core functions in ca and their methods:
                       simple correspondence      multiple and joint
                       analysis                   correspondence analysis
- Computation:         ca()                       mjca()
- Numerical output:    print.ca()                 print.mjca()
                       summary.ca()               summary.mjca()
- Graphical display:   plot.ca()                  plot.mjca()
                       plot3d.ca()                (plot3d.mjca())
Where applicable, the functions for simple and for multiple / joint correspondence analysis share the same structure of arguments.
Simple correspondence analysis
Simple correspondence analysis is performed with the function ca():
> ca(smoke)
Principal inertias (eigenvalues):
           1        2        3
Value      0.074759 0.010017 0.000414
Percentage 87.76%   11.76%   0.49%
Rows:
               SM        JM        SE       JE        SC
Mass     0.056995  0.093264  0.264249 0.455959  0.129534
ChiDist  0.216559  0.356921  0.380779 0.240025  0.216169
Inertia  0.002673  0.011881  0.038314 0.026269  0.006053
Dim. 1  -0.240539  0.947105 -1.391973 0.851989 -0.735456
Dim. 2  -1.935708 -2.430958 -0.106508 0.576944  0.788435
Columns:
             none    light   medium     heavy
Mass     0.316062 0.233161 0.321244  0.129534
ChiDist  0.394490 0.173996 0.198127  0.355109
Inertia  0.049186 0.007059 0.012610  0.016335
Dim. 1  -1.438471 0.363746 0.718017  1.074445
Dim. 2  -0.304659 1.409433 0.073528 -1.975960
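To see where these numbers come from, the principal inertias can be recomputed in base R without the package. The sketch below assumes that the smoke data are the 5 x 4 table of staff groups by smoking categories typed in below (its margins reproduce the masses printed above); the SVD route is a minimal illustration, not a description of how ca() is implemented internally.

```r
# Assumed cell counts of the smoke table (rows: staff groups, columns: smoking)
smoke_tab <- matrix(c( 4,  2,  3,  2,
                       4,  3,  7,  4,
                      25, 10, 12,  4,
                      18, 24, 33, 13,
                      10,  6,  7,  2),
                    nrow = 5, byrow = TRUE,
                    dimnames = list(c("SM", "JM", "SE", "JE", "SC"),
                                    c("none", "light", "medium", "heavy")))

P      <- smoke_tab / sum(smoke_tab)   # correspondence matrix
r_mass <- rowSums(P)                   # row masses
c_mass <- colSums(P)                   # column masses
S      <- (P - r_mass %o% c_mass) / sqrt(r_mass %o% c_mass)  # standardized residuals

sv      <- svd(S)$d[1:3]               # non-trivial singular values
inertia <- sv^2                        # principal inertias (eigenvalues)

round(r_mass, 6)                       # compare with the "Mass" row for Rows
round(inertia, 6)                      # compare with "Value"
round(100 * inertia / sum(inertia), 2) # compare with "Percentage"
```

The standard coordinates in the Dim. 1 and Dim. 2 rows are the singular vectors divided by the square roots of the masses; their signs are arbitrary and may be flipped relative to the package output.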
Simple correspondence analysis
Additional details are given with the summary method:
> summary(ca(smoke))
Principal inertias (eigenvalues):
 dim    value      %   cum%   scree plot
 1      0.074759  87.8  87.8  *************************
 2      0.010017  11.8  99.5  ***
 3      0.000414   0.5 100.0
        -------- -----
 Total: 0.085190 100.0
Rows:
    name   mass  qlt  inr    k=1 cor ctr    k=2 cor ctr
1 |   SM |   57  893   31 |  -66  92   3 | -194 800 214 |
2 |   JM |   93  991  139 |  259 526  84 | -243 465 551 |
3 |   SE |  264 1000  450 | -381 999 512 |  -11   1   3 |
4 |   JE |  456 1000  308 |  233 942 331 |   58  58 152 |
5 |   SC |  130  999   71 | -201 865  70 |   79 133  81 |
Columns:
    name   mass  qlt  inr    k=1 cor ctr    k=2 cor ctr
1 | none |  316 1000  577 | -393 994 654 |  -30   6  29 |
2 | lght |  233  984   83 |   99 327  31 |  141 657 463 |
3 | medm |  321  983  148 |  196 982 166 |    7   1   2 |
4 | hevy |  130  995  192 |  294 684 150 | -198 310 506 |
Simple correspondence analysis
Extensions to simple correspondence analysis include supplementary rows and/or columns as well as a subset analysis.
These extensions are handled by the optional arguments supcol / suprow and subsetcol / subsetrow:
# Considering the first column (non-smokers) as supplementary:
> ca(smoke, supcol = 1)
# Considering the subset of smokers (i.e. columns 2, 3 and 4):
> ca(smoke, subsetcol = 2:4)
# Adding a supplementary column to a subset analysis:
> ca(smoke, subsetcol = 2:4, supcol = 1)
Simple correspondence analysis
The visualization of simple correspondence analysis is done with the corresponding plot method:
> plot(ca(smoke, supcol = 1))
Simple correspondence analysis
As with the core function, additional options are provided by optional arguments. For example, different map scaling options are available with the option map:
option           description
"symmetric"      Rows and columns in principal coordinates (default)
"rowprincipal"   Rows in principal and columns in standard coordinates
"colprincipal"   Rows in standard and columns in principal coordinates
"symbiplot"      Row and column coordinates are scaled to have variances equal to the singular values
"rowgab"         Rows in principal coordinates and columns in standard coordinates times mass
"colgab"         Columns in principal coordinates and rows in standard coordinates times mass
                 (according to a proposal by Gabriel and Odoroff, 1990)
"rowgreen"       Rows in principal coordinates and columns in standard coordinates times the square root of the mass
"colgreen"       Columns in principal coordinates and rows in standard coordinates times the square root of the mass
                 (according to a proposal by Greenacre, 2006)
Simple correspondence analysis
In addition, three-dimensional maps can be displayed using the rgl package (D. Murdoch, D. Adler):
> plot3d(ca(smoke))
Multiple and joint correspondence analysis
Multiple and joint correspondence analysis is computed with the function mjca().
The approach to MCA is determined by the option lambda:
lambda="indicator"   Multiple correspondence analysis based on the indicator matrix
lambda="Burt"        Multiple correspondence analysis based on the Burt matrix
lambda="adjusted"    Adjusted multiple correspondence analysis
lambda="JCA"         Joint correspondence analysis
By default, an adjusted MCA is performed, i.e. lambda="adjusted".
Multiple and joint correspondence analysis
The input data for mjca() is a data frame comprising factors as the columns (response pattern matrix).
Internally, computations are performed on the Burt matrix (B), which is obtained from the indicator matrix (Z).
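The relation between the response pattern matrix, the indicator matrix Z and the Burt matrix B = Z'Z can be sketched in base R as follows. The tiny data frame resp is invented for illustration, and the construction is a generic one rather than the package's internal code.

```r
# Hypothetical response pattern matrix: two factors, five respondents
resp <- data.frame(A = factor(c("1", "2", "2", "3", "1")),
                   B = factor(c("y", "n", "y", "y", "n")))

# Indicator matrix Z: one dummy column per category of each variable
Z <- do.call(cbind, lapply(names(resp), function(v) {
  f   <- resp[[v]]
  ind <- outer(as.integer(f), seq_len(nlevels(f)), "==") * 1
  colnames(ind) <- paste0(v, levels(f))
  ind
}))

# Burt matrix: all two-way cross-tabulations, including each variable with itself
B <- t(Z) %*% Z
B
```

The diagonal blocks of B are diagonal matrices of category frequencies, and each off-diagonal block equals the cross-tabulation of two variables, e.g. table(resp$A, resp$B).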
Multiple and joint correspondence analysis
An example: a multiple correspondence analysis on the wg93 dataset (i.e. four questions on attitude towards science with responses on a five-point scale):
> mjca(wg93[,1:4])
Eigenvalues:
           1        2       3        4       5        6
Value      0.076455 0.05822 0.009197 0.00567 0.001172 7e-06
Percentage 44.91%   34.2%   5.4%     3.33%   0.69%    0%
Columns:
               A1       A2        A3        A4        A5        B1       B2       B3
Mass     0.034156 0.092423  0.058553  0.051091  0.013777  0.020379 0.049943 0.058840
ChiDist  1.343394 0.676433  0.947274  1.049164  2.214898  1.856041 1.034203 0.933288
Inertia  0.061642 0.042289  0.052542  0.056238  0.067588  0.070203 0.053417 0.051252
Dim. 1   1.836627 0.546240 -0.446797 -1.165903 -1.995217  2.924321 0.641516 0.346050
Dim. 2  -0.727459 0.284443  1.199439 -0.736782 -2.470026 -1.370078 0.666938 0.963918
               B4        B5        C1       C2        C3        C4        C5        D1
Mass     0.080654  0.040184  0.043628 0.090700  0.056544  0.044202  0.014925  0.017222
ChiDist  0.760011  1.294006  1.241063 0.688137  0.977789  1.148345  2.132827  1.915937
Inertia  0.046587  0.067286  0.067197 0.042950  0.054060  0.058289  0.067895  0.063217
Dim. 1  -0.714126 -1.353725  2.157782 0.246828 -0.618996 -1.348858 -1.467582  1.203782
Dim. 2   0.280071 -2.107677 -0.908553 0.591611  1.044412 -0.634647 -3.016588 -1.821975
...
Multiple and joint correspondence analysis
As in simple CA, a more detailed output is given with the summary method:
> summary(mjca(wg93[,1:4]))
Principal inertias (eigenvalues):
 dim    value      %   cum%   scree plot
 1      0.076455  44.9  44.9  *************************
 2      0.058220  34.2  79.1  *******************
 3      0.009197   5.4  84.5  ***
 4      0.005670   3.3  87.8  **
 5      0.001172   0.7  88.5
 6      7e-06000   0.0  88.5
        -------- -----
 Total: 0.170246
Columns:
    name   mass  qlt  inr    k=1 cor ctr    k=2 cor ctr
1 |   A1 |   34  963   55 |  508 860 115 | -176 103  18 |
2 |   A2 |   92  659   38 |  151 546  28 |   69 113   7 |
3 |   A3 |   59  929   47 | -124 143  12 |  289 786  84 |
4 |   A4 |   51  798   50 | -322 612  69 | -178 186  28 |
5 |   A5 |   14  799   60 | -552 369  55 | -596 430  84 |
6 |   B1 |   20  911   62 |  809 781 174 | -331 131  38 |
...
Multiple and joint correspondence analysis
The different approaches to MCA are specified with the optional argument lambda:
# MCA based on the indicator matrix:
> mjca(wg93[,1:4], lambda = "indicator")
# MCA based on the Burt matrix:
> mjca(wg93[,1:4], lambda = "Burt")
# MCA based on the adjusted approach:
> mjca(wg93[,1:4], lambda = "adjusted")
# lambda = "adjusted" is the default, hence the following
# gives the same result:
> mjca(wg93[,1:4])
# Joint correspondence analysis:
> mjca(wg93[,1:4], lambda = "JCA")
Multiple and joint correspondence analysis
As with simple CA, supplementary variables are specified with the option supcol. In mjca() only supplementary variables (i.e. columns) are considered.
Columns 5 to 7 of the wg93 dataset contain additional demographic information (sex, age and education). These are included as supplementary variables as follows:
> mjca(wg93, supcol = 5:7)
Multiple and joint correspondence analysis
The option subsetcol in mjca() refers to the column indexes of the subset categories (i.e. the levels of the variables).
For example, excluding the middle categories in the analysis of the wg93 dataset is done as follows:
> si <- (1:20)[-seq(3,18,5)]
> si
[1] 1 2 4 5 6 7 9 10 11 12 14 15 16 17 19 20
> mjca(wg93[,1:4], subsetcol = si)
Multiple and joint correspondence analysis
Both options, subsetcol and supcol, can be combined, i.e. supplementary variables can be included in a subset analysis:
> mjca(wg93, subsetcol = si, supcol = 5:7)
Eigenvalues:
           1        2        3        4        5
Value      0.070422 0.034998 0.007176 0.000875 0.00044
Percentage 53.96%   26.81%   5.5%     0.67%    0.34%
Columns:
              A1        A2        A4        A5       B1        B2        B4        B5
Mass    0.034156  0.092423  0.051091  0.013777 0.020379  0.049943  0.080654  0.040184
ChiDist 1.343394  0.676433  1.049164  2.214898 1.856041  1.034203  0.760011  1.294006
Inertia 0.061642  0.042289  0.056238  0.067588 0.070203  0.053417  0.046587  0.067286
Dim. 1  1.706316  0.544095 -1.307329 -2.435074 2.759360  0.850833 -0.569441 -1.710689
Dim. 2  1.275991 -0.343625  0.201719  2.794810 2.003836 -0.658112 -0.533170  2.100918
...
          sex1(*)  sex2(*)   age1(*)   age2(*)   age3(*)   age4(*)   age5(*)   age6(*)
Mass           NA       NA        NA        NA        NA        NA        NA        NA
ChiDist        NA       NA        NA        NA        NA        NA        NA        NA
Inertia        NA       NA        NA        NA        NA        NA        NA        NA
Dim. 1  -0.341876 0.328786 -0.405213 -0.243592 -0.033779 -0.030832  0.025808  0.666671
Dim. 2  -0.130770 0.125763 -0.319599  0.305108  0.075773 -0.016810 -0.190774 -0.146837
...
Multiple and joint correspondence analysis
The plotting method gives the graphical representation of the result as a map:
> plot(mjca(wg93[,1:4]))
Summary
The computation is done with two functions, ca() for simple CA and mjca() for multiple and joint CA.
The input data is a table of frequencies for simple CA and a response pattern matrix (i.e. a data frame with factors) for multiple and joint CA.
In mjca() the type of analysis is controlled by the option lambda.
Subsets and supplementary variables are specified with subsetcol and supcol (in simple CA also subsetrow and suprow).
Output (numerical and graphical) is managed by the corresponding methods (print, summary and plot).
All available options are listed in the manual / help files.
The End
The package is available from the CARME-N website (Correspondence Analysis and Related Methods Network): http://www.carme-n.org
The package is currently at version 0.50. This version includes a major revision of the mjca part, in which all computations have been rewritten to follow a unified approach.
The next update will focus on the graphical output.
Feedback and suggestions are highly welcome: