Page 1: Euclidean representations of a set of hierarchies using Multiple …carme2011.agrocampus-ouest.fr/slides/Cadoret_Le_Pages.pdf · 2014-05-16 · IntroductionData codingStatistical

Euclidean representations of a set of hierarchies using Multiple Factor Analysis

Cadoret M.*, Lê S.* and Pagès J.*

* Applied mathematics department, Agrocampus Ouest, France

9 February 2011

Correspondence Analysis and Related Methods 2011

Laboratoire de Mathématiques Appliquées Agrocampus

Introduction Data coding Statistical analysis Application Conclusion References

Outline

1 Introduction

2 Data coding

3 Statistical analysis

4 Application

5 Conclusion

Introduction

Interested in:
  Set of non-indexed hierarchies
  Synthetic graphical representations

At least 2 possible graphical representations:
  As a hierarchy consensus (Adams, 1972)
    Same form as the data
    Consensus difficult to obtain when the number of hierarchies increases
  As a Euclidean representation of the hierarchies: representation of the terminal nodes, etc.

Data coding (1)

[Figure: a hierarchy on 16 objects (A to P), cut at three successive levels L1, L2 and L3. Each level is a partition of the objects and becomes one column of the table below.]

     L1   L2   L3
A    G1   G1   G1
B    G1   G1   G1
C    G1   G2   G2
D    G1   G2   G2
E    G1   G2   G3
F    G1   G2   G3
G    G1   G2   G3
H    G2   G3   G4
I    G2   G3   G4
J    G2   G3   G4
K    G2   G3   G4
L    G2   G3   G4
M    G2   G4   G5
N    G2   G4   G5
O    G2   G4   G5
P    G2   G4   G5

Data coding (2)

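[Figure: global data table. Rows: the objects 1 to I. Columns: the levels of each hierarchy, grouped by hierarchy, e.g. Hierarchy 1 (L1, L2, L3), ..., Hierarchy j (L1, L2), ..., Hierarchy J (L1, L2, L3, L4).]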
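The coding of one hierarchy into one qualitative variable per level can be sketched in Python. This is a minimal sketch, not the authors' program: the nested-list input format and the names `flatten`, `is_terminal`, `level_partitions` and `code_levels` are assumptions made for illustration. A terminal group is a flat list of object labels; a group that is not subdivided at a given level is carried over unchanged to the following levels, as the group of A and B is in the example of "Data coding (1)". Groups mixing labels and sub-groups are not handled, and groups are numbered in order of appearance within each level.

```python
def flatten(node):
    """Return all object labels found under a node of the nested-list tree."""
    if isinstance(node, str):
        return [node]
    labels = []
    for child in node:
        labels.extend(flatten(child))
    return labels


def is_terminal(group):
    """A terminal group is a flat list of object labels."""
    return all(isinstance(item, str) for item in group)


def level_partitions(tree):
    """Cut the hierarchy level by level, starting from the children of the root."""
    partitions = []
    current = list(tree)
    while True:
        partitions.append([flatten(group) for group in current])
        if all(is_terminal(group) for group in current):
            return partitions
        following = []
        for group in current:
            if is_terminal(group):
                following.append(group)   # not subdivided: carried over as is
            else:
                following.extend(group)   # replaced by its sub-groups
        current = following


def code_levels(tree):
    """Map each object to its group label (G1, G2, ...) at every level."""
    table = {}
    for groups in level_partitions(tree):
        for number, members in enumerate(groups, start=1):
            for obj in members:
                table.setdefault(obj, []).append(f"G{number}")
    return table


# Hierarchy matching the three-level example of the "Data coding (1)" slide.
tree = [
    [["A", "B"], [["C", "D"], ["E", "F", "G"]]],
    [["H", "I", "J", "K", "L"], ["M", "N", "O", "P"]],
]
coded = code_levels(tree)
for obj in sorted(coded):
    print(obj, *coded[obj])
```

Coding each of the J hierarchies this way and placing the resulting columns side by side gives the global table, with one group of columns per hierarchy.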

Check on data coding and analysis of 1 hierarchy

Data table with qualitative variables
Multiple Correspondence Analysis + Ascending Hierarchical Classification on the dimensions

[Figure: the initial hierarchy on objects A to P, next to the dendrogram obtained from the MCA dimensions (leaves A to G and H to P, height axis from 0.0 to 1.0).]

⇒ We recover the initial hierarchy

Objectives

From a data table with a group structure on the variables, we want to perform a global factorial analysis such that:
  it provides graphical representations of objects, hierarchies and levels of hierarchy
  the influence of each hierarchy is balanced

⇒ Multiple Factor Analysis (MFA; Escofier and Pagès, 1982) in which 1 hierarchy corresponds to 1 group of variables

Multiple Correspondence Analysis (MCA)

MCA looks for dimensions z_s that maximize:

\[ \frac{1}{Q} \sum_{q=1}^{Q} \eta^2(z_s, L_q), \]

with:
  Q the number of qualitative variables
  z_s the axis s
  L_q the qualitative variable q
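The slides do not spell out η²; it is read here as the squared correlation ratio, i.e. the share of the variance of an axis explained by the groups of a qualitative variable (between-group sum of squares over total sum of squares). Under that reading, the MCA criterion can be evaluated for a given candidate axis as follows. The function names and the toy data are hypothetical, and the sketch only evaluates the criterion; it does not search for the axis that maximizes it.

```python
def correlation_ratio(z, labels):
    """Squared correlation ratio: between-group over total sum of squares."""
    mean = sum(z) / len(z)
    total = sum((value - mean) ** 2 for value in z)
    if total == 0:
        return 0.0  # constant axis: nothing to explain
    groups = {}
    for value, label in zip(z, labels):
        groups.setdefault(label, []).append(value)
    between = sum(
        len(values) * (sum(values) / len(values) - mean) ** 2
        for values in groups.values()
    )
    return between / total


def mca_criterion(z, variables):
    """Mean of the squared correlation ratios over the Q qualitative variables."""
    return sum(correlation_ratio(z, labels) for labels in variables) / len(variables)


# Toy example: 4 objects, 2 qualitative variables, one candidate axis z.
z = [-1.0, -1.0, 1.0, 1.0]
variables = [
    ["G1", "G1", "G2", "G2"],  # groups perfectly separated by z
    ["G1", "G2", "G1", "G2"],  # groups with identical means on z
]
print(mca_criterion(z, variables))
```

In this toy case the first variable has a ratio of 1 and the second a ratio of 0, so the criterion is their mean.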

Multiple Factor Analysis (MFA)

MFA looks for dimensions z_s that maximize the following criterion:

\[ \sum_{j=1}^{J} \frac{1}{Q_j} \sum_{q=1}^{Q_j} \eta^2(z_s, L_q), \]

with:
  J the number of hierarchies
  Q_j the number of levels of hierarchy j
  z_s the axis s
  L_q the level q of the hierarchy j

⇒ In this particular case: criterion maximized by MFA ⇔ sum, over the hierarchies, of the criteria maximized by MCA

Disjunctive data table associated with one hierarchy j

[Figure: disjunctive data table for hierarchy j. Rows: objects 1 to I. Columns: the dummy variables k = 1, ..., K_j, grouped by level L_1, ..., L_q, ..., L_{Q_j}. Cell y_ik is 1 or 0; the column margins give I_k, ..., I_{K_j}.]

Each level (associated with a hierarchy) is represented by a set of dummy variables
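The disjunctive coding of one hierarchy can be sketched as follows. This is an illustrative sketch with hypothetical names and toy data: each level contributes one 0/1 column per group, each row therefore sums to the number of levels, and the column sums are the group sizes I_k.

```python
def disjunctive_table(levels):
    """One-hot code the levels of one hierarchy.

    levels: list of Q_j lists, each giving the group label of every object.
    Returns the column names, the 0/1 rows (one per object) and the counts I_k.
    """
    n_objects = len(levels[0])
    columns = []
    for q, labels in enumerate(levels, start=1):
        for group in sorted(set(labels)):
            columns.append((q, group))  # one dummy variable per (level, group)
    rows = [
        [1 if levels[q - 1][i] == group else 0 for q, group in columns]
        for i in range(n_objects)
    ]
    counts = [sum(row[k] for row in rows) for k in range(len(columns))]
    return columns, rows, counts


# Toy hierarchy with 2 levels on 4 objects (hypothetical data).
levels = [
    ["G1", "G1", "G2", "G2"],  # level L1
    ["G1", "G2", "G3", "G3"],  # level L2
]
columns, rows, counts = disjunctive_table(levels)
print(columns)
for row in rows:
    print(row)
print(counts)
```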

Object representation

Distance between 2 objects:

\[ d^2(i, l) = \sum_{j} \frac{1}{Q_j} \sum_{k \in K_j} \frac{I}{I_k} (y_{ik} - y_{lk})^2 = \sum_{j} d^2_{MCA_j}(i, l), \]

with:
  Q_j the number of levels of hierarchy j
  K_j the set of groups (dummy variables) of hierarchy j
  I the number of objects
  I_k the number of objects in group k
  y_ik the element of the disjunctive data table, equal to 1 if object i belongs to group k and 0 otherwise

In this particular case: sum of the usual MCA distances

⇒ 2 objects are all the closer as they belong to the same group in many hierarchies
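The distance can be computed directly from the group labels, without building the dummy variables. For one level, (y_ik − y_lk)² is non-zero only for the two groups containing i and l when these differ, so the level contributes I/I_k(i) + I/I_k(l) if the two objects are separated and 0 otherwise. A minimal sketch under that observation, with hypothetical names and toy data:

```python
from collections import Counter


def squared_distance(hierarchies, i, l):
    """Squared distance between objects i and l over all hierarchies.

    hierarchies: list of hierarchies; each hierarchy is a list of levels;
    each level is the list of group labels of the I objects.
    """
    total = 0.0
    for levels in hierarchies:
        n_levels = len(levels)          # Q_j
        for labels in levels:
            n_objects = len(labels)     # I
            sizes = Counter(labels)     # I_k for each group k of this level
            if labels[i] != labels[l]:
                # Only the two groups containing i and l contribute.
                total += (n_objects / sizes[labels[i]]
                          + n_objects / sizes[labels[l]]) / n_levels
    return total


# Two toy hierarchies on 4 objects (hypothetical data).
hierarchies = [
    [["G1", "G1", "G2", "G2"], ["G1", "G2", "G3", "G3"]],  # 2 levels
    [["G1", "G2", "G2", "G2"]],                            # 1 level
]
print(squared_distance(hierarchies, 0, 1))
print(squared_distance(hierarchies, 2, 3))
```

Objects 2 and 3 are never separated, so their distance is zero; separating two objects by small groups costs more than separating them by large ones, because of the I/I_k weights.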

Global hierarchy representation

[Figure: hierarchies H1, H2 and H3 plotted in the unit square (both axes from 0 to 1); the coordinates of H2 on axes 1 and 2 are marked Lg(z_1, H2) and Lg(z_2, H2).]

Coordinate of hierarchy j on axis s:

\[ \frac{1}{Q_j} \sum_{q=1}^{Q_j} \eta^2(z_s, L_q), \]

with:
  Q_j the number of levels of hierarchy j
  z_s the axis s
  L_q the level q of the hierarchy j

Level representation

[Figure: hierarchy H3 and its levels H3L1, H3L2 and H3L3 plotted in the unit square (both axes from 0 to 1).]

Coordinate of level q on axis s:

\[ \eta^2(z_s, L_q), \]

with:
  z_s the axis s
  L_q the level q

2 consequences:
  Levels ordered along each axis
  Hierarchy = barycenter of its levels
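Both consequences can be checked on a small example. The levels of a hierarchy are nested partitions, and refining a partition can only increase the between-group sum of squares, so the squared correlation ratio cannot decrease from one level to the next. The sketch below assumes η² is the squared correlation ratio; the function names, the axis z and the hierarchy are hypothetical.

```python
def correlation_ratio(z, labels):
    """Squared correlation ratio of axis z with respect to a partition."""
    mean = sum(z) / len(z)
    total = sum((v - mean) ** 2 for v in z)
    groups = {}
    for v, g in zip(z, labels):
        groups.setdefault(g, []).append(v)
    between = sum(len(vs) * (sum(vs) / len(vs) - mean) ** 2 for vs in groups.values())
    return between / total if total else 0.0


def level_coordinates(z, levels):
    """Coordinate of each level of one hierarchy on the axis z."""
    return [correlation_ratio(z, labels) for labels in levels]


def hierarchy_coordinate(z, levels):
    """Coordinate of the hierarchy: mean (barycenter) of its level coordinates."""
    coords = level_coordinates(z, levels)
    return sum(coords) / len(coords)


# Hypothetical axis and one hierarchy with 3 nested levels on 6 objects.
z = [-2.0, -1.0, -1.0, 1.0, 1.0, 2.0]
levels = [
    ["G1", "G1", "G1", "G2", "G2", "G2"],  # L1: 2 groups
    ["G1", "G2", "G2", "G3", "G3", "G3"],  # L2: first group of L1 split
    ["G1", "G2", "G2", "G3", "G3", "G4"],  # L3: last group of L2 split
]
coords = level_coordinates(z, levels)
print(coords)                           # non-decreasing from L1 to L3
print(hierarchy_coordinate(z, levels))  # lies between the smallest and largest level
```

Summing `hierarchy_coordinate` over the J hierarchies gives the value of the MFA criterion for the axis z.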

Data

16 advertisements concerning an orange juice
Advertisements built according to a 2^{5-1} fractional factorial design
22 subjects
Hierarchical sorting
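A 2^{5-1} design keeps 16 of the 32 combinations of five two-level factors, which matches the 16 advertisements. The slides give neither the five factors nor the generator, so the sketch below relies on an assumption: the fifth factor is set to the product of the other four, a common choice for a half fraction. The function name is hypothetical and the runs are not matched to the advertisements A to P.

```python
from itertools import product


def half_fraction(n_factors=5):
    """Runs of a 2^(n-1) design; the last factor is the product of the others."""
    runs = []
    for base in product((-1, 1), repeat=n_factors - 1):
        last = 1
        for level in base:
            last *= level
        runs.append(base + (last,))
    return runs


runs = half_fraction()
print(len(runs))  # 16 runs
for run in runs:
    print(run)
```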

Example of hierarchical sorting: subject number 3

[Figure: hierarchical sorting of the 16 advertisements by subject 3; groups as extracted: ABC, DE, FGH, IJK, LM, N, OP.]

Example of hierarchical sorting: subject number 5

[Figure: hierarchical sorting of the 16 advertisements (A to P) by subject 5.]

Advertisement representation

[Figure: map of the 16 advertisements A to P. Axes: Dim 1 (15.62 %) and Dim 2 (14.18 %); λ1 = 16.55. Annotations added on the successive overlays: "Background color" and "Figurative".]

Hierarchy representation

[Figure: representation of the 22 hierarchies (subjects 1 to 22) in the unit square. Axes: Dim 1 (15.62 %) and Dim 2 (14.18 %), both from 0.0 to 1.0.]

Hierarchy representation: subject number 3

[Figure: the hierarchy representation (subjects 1 to 22, unit square) shown together with the advertisement map, on which levels L1, L2 and L3 of subject 3 are marked. Axes in both panels: Dim 1 (15.62 %) and Dim 2 (14.18 %).]

Level representation: subject number 3

[Figure: levels 3.L1, 3.L2 and 3.L3 of subject 3 and the hierarchy point 3 in the unit square. Axes: Dim 1 (15.62 %) and Dim 2 (14.18 %), both from 0.0 to 1.0.]

Level representation: trajectories

[Figure: trajectories of the levels of the 22 hierarchies in the unit square, labelled subject.level (e.g. 4.L1, 4.L2, 4.L3). Axes: Dim 1 (15.62 %) and Dim 2 (14.18 %), both from 0.0 to 1.0. The final overlay adds the annotations 18 %, 63 % and 18 %.]

Conclusion

Methodology providing:
  Representation of objects, hierarchies, levels
  Representations related to each other
  Representations interpretable according to simple rules

In the example, suggests groups of hierarchies
Allows hierarchies and partitions to be taken into account simultaneously in the same analysis
Program available in the SensoMineR package

References

Adams, E. I. (1972). Consensus techniques and the comparison of taxonomic trees. Systematic Zoology, 21:390–397.

Escofier, B. and Pagès, J. (1982). Comparaison de groupes de variables définies sur le même ensemble d’individus. Rapport de recherche INRIA, 149.