Confidence ellipses for holistic approaches
Marine Cadoret & François Husson, Agrocampus
Agrostat 2012, Paris
29 February 2012
Outline
1 What do we mean by "holistic approaches" and "confidence ellipse"?
2 Construction of confidence ellipses
3 Why doesn't partial bootstrap work?
4 Validity of total bootstrap
5 Conclusion and perspectives
Holistic approaches
From ὅλος (holos), a Greek word meaning all, entire, total
Products evaluated in their entirety
Among holistic approaches:
  Napping
  Sorting
  Sorted napping
  Hierarchical sorting
  Flash profile and Free choice profiling (between holistic and analytic approaches)
Napping data: an example with 10 wines and 11 judges
[Figure: positions of the 10 wines on the tablecloth of Judge 1; horizontal axis from 0 to 60, vertical axis from 0 to 40]
Product               X1   Y1   ..   X11  Y11
1 T Michaud           43   29   ..   48   15
2 T Renaudie          36   28   ..   45   14
3 T Trotignon         53   37   ..    8   23
4 T Buisse Domaine    18   20   ..   31    9
5 T Buisse Cristal    17   22   ..   34   31
6 V Aub. Silex         8   14   ..   20   35
7 V Aub. Marigny      10   32   ..   47   28
8 V Font. Domaine     56    3   ..    4    5
9 V Font. Brules      42    4   ..    8    6
10 V Font Coteaux      1   38   ..   54   36
Napping data: an example with 10 wines and 11 judges
[Figure: confidence ellipses for the napping configuration of the 10 wines; Dim 1 (39.39%), Dim 2 (26.68%)]
2 Construction of confidence ellipses
Bootstrap technique
[Diagram: the real jury is a table with the products P1 to P10 in rows and, for the judges J1, J2, …, the coordinate columns X1 Y1, X2 Y2, …, X11 Y11. A virtual jury is built by resampling the judges, for example J1, J1, …, J10 with the columns X1 Y1, X1 Y1, …, X10 Y10. Several virtual juries are generated in this way.]
2 ways to use bootstrapped virtual juries:
  by projection (partial bootstrap)
  by Procrustean rotation (total bootstrap)
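The resampling of judges described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names draw_virtual_jury and build_virtual_table, the toy coordinates and the seed are hypothetical, and the sketch assumes that judges are drawn with replacement and that a virtual jury has the same size as the real one.

```python
import random

def draw_virtual_jury(n_judges, rng):
    """Draw judge indices with replacement (assumed: same size as the real jury)."""
    return [rng.randrange(n_judges) for _ in range(n_judges)]

def build_virtual_table(real_table, jury):
    """real_table[p][j] is the (x, y) pair given by judge j to product p.
    Returns the table of the virtual jury, one column per drawn judge."""
    return [[row[j] for j in jury] for row in real_table]

rng = random.Random(2012)  # hypothetical seed, for reproducibility only

# Hypothetical toy data: 3 products (rows) seen by 4 judges (columns).
real_table = [
    [(43, 29), (10, 5), (30, 12), (48, 15)],
    [(36, 28), (12, 9), (25, 20), (45, 14)],
    [(53, 37), (40, 30), (5, 33), (8, 23)],
]

jury = draw_virtual_jury(4, rng)
virtual_table = build_virtual_table(real_table, jury)
```

A judge drawn twice contributes the same pair of columns twice, which is what the repeated J1 in the diagram suggests.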
Partial bootstrap
Multiple Factor Analysis (MFA)
Projection to get the products according to each judge (of the real jury): partial representation ⇒ barycentric property
Creation of a virtual jury and calculation of the new barycenter
Building confidence ellipses containing 95% of the points
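The barycenter step can be illustrated with a short Python sketch. It assumes that the partial points of one product (its coordinates on the first two MFA dimensions according to each judge) are already available; the values in partial_p4, the function names and the number of virtual juries are hypothetical.

```python
import random

def barycenter(points):
    """Mean point of a list of (x, y) pairs."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def partial_bootstrap(partial_points, n_boot, rng):
    """partial_points[j] is the (F1, F2) position of one product according to judge j.
    Returns n_boot barycenters, one per virtual jury drawn with replacement."""
    n_judges = len(partial_points)
    cloud = []
    for _ in range(n_boot):
        drawn = [partial_points[rng.randrange(n_judges)] for _ in range(n_judges)]
        cloud.append(barycenter(drawn))
    return cloud

rng = random.Random(0)  # hypothetical seed
# Hypothetical partial points of product P4 for 5 judges.
partial_p4 = [(1.2, 0.4), (0.8, -0.1), (1.5, 0.9), (0.6, 0.2), (1.1, 0.5)]
boot_points = partial_bootstrap(partial_p4, 500, rng)
```

In this scheme the axes come from the single MFA on the real jury and are never recomputed; only the barycenters move from one virtual jury to the next.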
Total bootstrap
1. MFA on the real jury
2. MFA on each virtual jury (virtual jury 1, virtual jury 2, virtual jury 3, …, virtual jury B)
3. Procrustean rotation of the virtual configurations onto the configuration of the real jury (translation, rotation, dilation)
4. Confidence ellipses containing 95% of the points
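Step 3 can be sketched for the two-dimensional case. The function procrustes_2d below is a hypothetical helper, not the SensoMineR implementation: it treats each point as a complex number, so that a rotation combined with a dilation is a multiplication by one complex coefficient, and it assumes that reflections are not allowed and that only two dimensions are used for the rotation.

```python
import cmath

def procrustes_2d(target, config):
    """Fit config onto target with a translation, a rotation and a dilation.
    Both arguments are lists of (x, y) pairs with the products in the same order.
    Assumption: reflections are not considered."""
    z = [complex(x, y) for x, y in target]
    w = [complex(x, y) for x, y in config]
    n = len(z)
    z_mean = sum(z) / n
    w_mean = sum(w) / n
    zc = [v - z_mean for v in z]  # centered target
    wc = [v - w_mean for v in w]  # centered configuration (translation step)
    # Least-squares complex coefficient: its modulus is the dilation,
    # its argument is the rotation angle.
    c = sum(a * b.conjugate() for a, b in zip(zc, wc)) / sum(abs(b) ** 2 for b in wc)
    fitted = [c * b + z_mean for b in wc]
    return [(v.real, v.imag) for v in fitted]

# Hypothetical check: a configuration that is a moved copy of the target
# should be brought back onto it.
target = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 3.0)]
turn = cmath.rect(1.7, 0.6)  # dilation 1.7, rotation of 0.6 radian
shift = complex(5, -2)
moved = [turn * complex(x, y) + shift for x, y in target]
config = [(v.real, v.imag) for v in moved]
aligned = procrustes_2d(target, config)
```

With more than two dimensions for the rotation, the usual route is a singular value decomposition rather than this complex-number shortcut.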
Comparison of partial and total bootstrap
A completely random dataset with 100 judges
[Figure, two panels: confidence ellipses for the 10 products on Dim 1 (14.51%) and Dim 2 (14.33%), one panel for the partial bootstrap and one for the total bootstrap]
3 Why doesn't partial bootstrap work?
Partial bootstrap: increased number of judges
Completely random dataset
[Figure, three panels of confidence ellipses for the 10 products: 10 judges, Dim 1 (30.79%), Dim 2 (22.32%); 100 judges, Dim 1 (14.23%), Dim 2 (13.12%); 500 judges, Dim 1 (12.53%), Dim 2 (12.33%)]
Dimensionality problem (few products in too large a space)?
Inference problem (barycenter calculated with too many points)?
⇒ Modify the dimensionality of the dataset independently of the number of judges
Random datasets: fixed number of judges
[Figure, three configurations, each shown in two plots of the 10 products: 10 judges, 2 descriptors per judge, Dim 1 (23.3%), Dim 2 (18.16%); 10 judges, 20 descriptors per judge, Dim 1 (15%), Dim 2 (13.65%); 10 judges, 200 descriptors per judge, Dim 1 (12.59%), Dim 2 (12.01%)]
Products are better separated when the number of dimensions increases (same problem with Generalized Procrustes Analysis, GPA)
Random datasets: fixed size of dataset
[Figure, three panels of ellipses for the 10 products: 5 judges, 200 descriptors per judge, Dim 1 (13.25%), Dim 2 (12.46%); 50 judges, 20 descriptors per judge, Dim 1 (12.8%), Dim 2 (12.52%); 500 judges, 2 descriptors per judge, Dim 1 (12.52%), Dim 2 (12.25%)]
The sizes of the ellipses don't depend on the number of judges but only on the dimensionality of the dataset
4 Validity of total bootstrap
The case of completely random data
Dimensionality problem with completely random data?
[Figure, three panels of confidence ellipses for the 10 products: 10 judges, Dim 1 (21.2%), Dim 2 (20.35%); 100 judges, Dim 1 (14.46%), Dim 2 (12.91%); 500 judges, Dim 1 (12.8%), Dim 2 (12.07%)]
Data simulation procedure
Pure tablecloth
Duplicated J times
Noise simulated according to a uniform distribution
Real dataset
Ellipses according to real data
[Figure: ellipses for the products a to j]
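The simulation scheme can be sketched as follows. This is an illustration under assumptions that the slides do not state: the noise is added independently to each coordinate, it is uniform on a symmetric interval whose half-width half_width is a hypothetical parameter, and the pure configuration below is invented. How half_width maps to the Noise/Signal ratio is not specified in the source and is left open.

```python
import random

def simulate_jury(pure, n_judges, half_width, rng):
    """Duplicate the pure tablecloth once per judge and add uniform noise
    in [-half_width, half_width] to every coordinate (assumed noise model).
    Returns one list of (x, y) pairs per judge."""
    return [
        [(x + rng.uniform(-half_width, half_width),
          y + rng.uniform(-half_width, half_width)) for x, y in pure]
        for _ in range(n_judges)
    ]

rng = random.Random(1)  # hypothetical seed
# Hypothetical pure tablecloth with 3 products.
pure = [(10.0, 5.0), (30.0, 20.0), (50.0, 35.0)]
jury = simulate_jury(pure, n_judges=30, half_width=4.0, rng=rng)
```

Each element of jury plays the role of one judge's tablecloth in the simulated "real dataset".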
Results of the simulations
30 judges
Noise/Signal   Frequency
10%            91.12%
20%            91.58%
40%            91.83%
100%           91.17%
200%           91%
400%           91.08%
Noise/Signal = 20%
Nb judges   Frequency
30          91.58%
50          92.87%
100         93.37%
200         93.37%
500         93.42%
⇒ Small underestimation of the confidence level (frequencies between 91% and 93.42%, against the 95% used to build the ellipses)
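One way to read these frequencies is as coverage rates. The sketch below is a guess at the kind of computation involved, not the authors' code: it assumes that "Frequency" is the share of simulations in which the pure position of a product falls inside its ellipse, and that an ellipse takes its shape from the covariance of the bootstrap points and its size from the empirical 95% quantile. All function names and data are hypothetical.

```python
import math
import random

def mahalanobis2(center, inv, p):
    """Squared Mahalanobis distance of p; inv holds (a, b, c) of the inverse covariance."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    a, b, c = inv
    return a * dx * dx + 2 * b * dx * dy + c * dy * dy

def fit_ellipse(points, level=0.95):
    """Center, inverse covariance and squared radius of an ellipse that contains
    at least a proportion `level` of the points (empirical quantile)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    inv = (syy / det, -sxy / det, sxx / det)
    d2 = sorted(mahalanobis2((mx, my), inv, p) for p in points)
    k = min(n - 1, max(0, math.ceil(level * n) - 1))
    return (mx, my), inv, d2[k]

def contains(ellipse, p):
    """True if p lies inside or on the ellipse."""
    center, inv, radius2 = ellipse
    return mahalanobis2(center, inv, p) <= radius2

def coverage(ellipses, true_point):
    """Share of ellipses that contain the true point."""
    return sum(contains(e, true_point) for e in ellipses) / len(ellipses)

rng = random.Random(3)  # hypothetical seed
cloud = [(rng.gauss(0, 2), rng.gauss(0, 1)) for _ in range(200)]  # hypothetical bootstrap points
ellipse = fit_ellipse(cloud)
inside = sum(contains(ellipse, p) for p in cloud)
freq = coverage([ellipse], (0.0, 0.0))
```

In a full study, coverage would be applied to one ellipse per simulated dataset, with the pure position of the product as true_point.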
5 Conclusion and perspectives
Conclusion and perspectives
Dimensionality problem highlighted: confidence ellipses are essential (but should be built with the total bootstrap)
Total bootstrap can be applied to all holistic approaches: napping, sorting, sorted napping, hierarchical sorting, free choice profiling
Available in the R package SensoMineR through the boot function
One parameter must be chosen: the number of dimensions for the Procrustean rotations
Conclusion and perspectives
Choice of the number of dimensions for the rotation
[Figure, two panels of confidence ellipses for the products a to l on Dim 1 (12.36%) and Dim 2 (11.48%): 2 dimensions for the rotation; 10 dimensions for the rotation]
When the number of dimensions used for the Procrustean rotation increases:
  The size of the ellipses decreases
  The confidence level decreases