Aggregation methods: optimality and fast rates.
Guillaume Lecué
Université Paris 6
18 May 2007
General Framework. Aggregation Procedures. Optimality in classification. Applications.
Motivation.
M prior estimators (’weak’ estimators) : f1, . . . , fM
n observations : Dn
Aim
Construction of a new estimator which is approximately as good as the best 'weak' estimator :
Aggregation method or Aggregate
Examples.
Adaptation :
  Observations : D_{m+n}
  Estimation : D_m → non-adaptive estimators f_1, ..., f_M.
  Learning : D_(n) → aggregate f_n (adaptive).
Estimation :
  ε-net : f_1, ..., f_M (functions)
  Learning : D_n → aggregate f_n.
Model.
(Z, T) a measurable space,
P the set of all probability measures on (Z, T),
F : P → F (example : F(π) = dπ/dμ),
Z : a random variable with values in Z,
π : the probability distribution of Z,
D_n = (Z_1, ..., Z_n) : n i.i.d. observations of Z.
Problem : estimation of F(π) from D_n.
loss : Q : Z × F → R,
risk : A(f) = E[Q(Z, f)],
A* := inf_{f∈F} A(f) = A(f*), with f* = F(π),
excess risk : A(f) − A* ≥ 0.
Empirical risk :
A_n(f) = (1/n) Σ_{i=1}^{n} Q(Z_i, f).
Regression
loss : Q((x, y), f) = (y − f(x))²,
excess risk : A(f) − A* = ||f* − f||²_{L²(P_X)}, where f*(x) = E[Y | X = x].
Density estimation (f* = dπ/dμ)
KL loss : Q(z, f) = − log f(z),
excess risk : A(f) − A* = K(f* | f).
L²-loss : Q(z, f) = ∫_Z f² dμ − 2 f(z),
excess risk : A(f) − A* = ||f* − f||²_{L²(μ)}.
Classification
loss : φ : R → R, Q((x, y), f) = φ(y f(x)), y ∈ {−1, 1} ;
φ-risk : A^φ(f) = E[Q((X, Y), f)] = E[φ(Y f(X))] ;
f* = f^{φ*} s.t. A^φ(f^{φ*}) = min_{f : X → R} A^φ(f).
Selectors.
F0 = {f1, . . . , fM} ⊂ F a dictionary.
Empirical Risk Minimization (ERM) (Vapnik, Chervonenkis, ...) :
f̂^ERM_n ∈ Argmin_{f∈F_0} A_n(f).
Penalized Empirical Risk Minimization (pERM) :
f̂^pERM_n ∈ Argmin_{f∈F_0} [A_n(f) + pen(f)],
where pen is a penalty function (Barron, Bartlett, Birgé, Boucheron, Koltchinskii, Lugosi, Massart, ...).
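These selectors can be illustrated numerically. A minimal sketch (not from the talk), assuming squared loss and a hypothetical dictionary of plain Python callables :

```python
import numpy as np

def empirical_risk(f, X, y):
    # A_n(f) = (1/n) * sum_i Q(Z_i, f) with squared loss Q((x, y), f) = (y - f(x))^2
    return float(np.mean((y - f(X)) ** 2))

def erm_select(dictionary, X, y, pen=None):
    # ERM picks the dictionary element minimizing A_n(f);
    # pERM additionally adds a penalty pen(f) to each empirical risk.
    risks = [empirical_risk(f, X, y) + (pen(f) if pen else 0.0) for f in dictionary]
    return int(np.argmin(risks))

# Toy check with data generated close to f2(x) = 2x (hypothetical dictionary):
rng = np.random.default_rng(0)
X = rng.uniform(size=200)
y = 2 * X + 0.01 * rng.normal(size=200)
f1, f2 = (lambda x: x), (lambda x: 2 * x)
best = erm_select([f1, f2], X, y)
```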
Aggregation methods with exponential weights.
F0 = {f1, . . . , fM} ⊂ F a dictionary.
Aggregate with Exponential Weights (AEW) :
f̃^AEW_{n,T} = Σ_{f∈F_0} w^{(n)}_T(f) f, where w^{(n)}_T(f) = exp(−nT A_n(f)) / Σ_{g∈F_0} exp(−nT A_n(g)),
T^{−1} : temperature parameter.
Cumulative Aggregate with Exponential Weights (CAEW) (Catoni, Yang, ...) :
f̃^CAEW_{n,T} = (1/n) Σ_{k=1}^{n} f̃^AEW_{k,T}.
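A minimal numerical sketch of these two aggregates (not from the talk; the empirical risks are assumed precomputed, and the max-subtraction is only floating-point stabilization) :

```python
import numpy as np

def aew_weights(risks, n, T):
    # w_T^(n)(f) = exp(-n*T*A_n(f)) / sum_g exp(-n*T*A_n(g))
    a = -n * T * np.asarray(risks, dtype=float)
    a -= a.max()                      # stabilize before exponentiating
    w = np.exp(a)
    return w / w.sum()

def aew_predict(x, dictionary, risks, n, T):
    # AEW aggregate: convex combination of the dictionary predictions at x.
    w = aew_weights(risks, n, T)
    return sum(wi * f(x) for wi, f in zip(w, dictionary))

def caew_predict(x, dictionary, risks_by_k, T):
    # CAEW aggregate: average of the AEW aggregates built on the first
    # k observations, k = 1, ..., n (risks_by_k[k-1] = risks on first k points).
    n = len(risks_by_k)
    return sum(aew_predict(x, dictionary, rk, k + 1, T)
               for k, rk in enumerate(risks_by_k)) / n

w = aew_weights([0.30, 0.10, 0.50], n=50, T=1.0)
```

Note that the weights form a probability distribution over the dictionary, with more mass on elements of smaller empirical risk.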
Aim of Aggregation (1) : Optimal rate of aggregation.
Definition
∀ F_0 = {f_1, ..., f_M} ⊆ F, ∃ f̃_n such that ∀π ∈ P, ∀n ≥ 1,
E[A(f̃_n) − A*] ≤ min_{f∈F_0} (A(f) − A*) + C_0 γ(n, M).
∃ F_0 = {f_1, ..., f_M} such that for any aggregate f̄_n, ∃π ∈ P, ∀n ≥ 1,
E[A(f̄_n) − A*] ≥ min_{f∈F_0} (A(f) − A*) + C_1 γ(n, M).
γ(n, M) is an optimal rate of aggregation and f̃_n is an optimal aggregation procedure.
Aim of Aggregation (2) : Adaptation.
Definition (Oracle Inequality)
∀ F_0 = {f_1, ..., f_M} ⊆ F, ∃ f̃_n such that ∀π ∈ P, ∀n ≥ 1,
E[A(f̃_n) − A*] ≤ C min_{f∈F_0} (A(f) − A*) + C_0 γ(n, M),
where C ≥ 1.
Continuous scale of loss functions.
Classification problem : A^φ(f) = E[φ(Y f(X))], Y ∈ {−1, 1}, X ∈ X.
For all x ∈ R,
φ(x) = φ_h(x) = (1 − h) φ_0(x) + h φ_1(x) if 0 ≤ h ≤ 1, and φ_h(x) = (h − 1) x² − x + 1 if h > 1,
where φ_0(z) = 1_{(z ≤ 0)} is the 0-1 loss and φ_1(z) = max(0, 1 − z) is the hinge loss.
(Figure : plots of φ_h for h = 0, 1/3, 2/3, 1, 2, 3.)
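The family can be transcribed directly from the definition; a small sketch (the plotted curves in the figure correspond to these formulas) :

```python
def phi_h(x, h):
    # phi_0 = 0-1 loss, phi_1 = hinge loss; a convex combination for
    # 0 <= h <= 1 and a strictly convex quadratic for h > 1.
    phi0 = 1.0 if x <= 0 else 0.0
    phi1 = max(0.0, 1.0 - x)
    if 0 <= h <= 1:
        return (1 - h) * phi0 + h * phi1
    return (h - 1) * x ** 2 - x + 1.0
```

For every h, φ_h(0) = 1, and the loss is convex exactly when h ≥ 1 : this is where the family switches regimes.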
ORA in classification
Loss function φ_h :                   0 ≤ h < 1       h = 1             h > 1
Optimal rate of aggregation (ORA) :   √(log M / n)    √(log M / n)      (log M)/n
Optimal aggregation procedure :       ERM             ERM, AEW, CAEW    CAEW
2 Questions.
Question 1 : Why is there such a breakdown just after the hinge loss ?
√(log M / n) for 0 ≤ h ≤ 1 −→ (log M)/n for h > 1,
ERM −→ CAEW.
Question 2 : Do we really need aggregation procedures with exponential weights to achieve the optimal rates of aggregation ?
Question 1. Why is there a breakdown at h = 1 ?
Margin assumption for the loss function φ :
The probability measure π satisfies the φ-margin assumption φ-MA(κ), with margin parameter κ ≥ 1, if
E[(φ(Y f(X)) − φ(Y f^{φ*}(X)))²] ≤ c_φ (A^φ(f) − A^{φ*})^{1/κ}
for any f : X → R.
cf. Mammen and Tsybakov (1999) (discriminant analysis) and Tsybakov (2004) (classification).
φ_0-MA(κ) ⇐⇒ P[|2η(X) − 1| ≤ t] ≤ t^α, ∀ 0 < t < 1, with α = 1/(κ − 1),
where η(x) = P[Y = 1 | X = x].
(κ = 1 ⇐⇒ ∃h > 0, |2η(X) − 1| ≥ h.)
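A quick Monte Carlo illustration (hypothetical example, not from the talk) : take X uniform on [0, 1] and η(x) = x; then P[|2η(X) − 1| ≤ t] = t, so the φ_0-margin assumption holds with α = 1, i.e. κ = 2 :

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(size=200_000)
eta = X                               # hypothetical regression function eta(x) = x
margin = np.abs(2 * eta - 1)
# Empirical check of P[|2*eta(X) - 1| <= t] <= t^alpha with alpha = 1:
ps = [float(np.mean(margin <= t)) for t in (0.1, 0.5, 0.9)]
```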
Answer : κ = +∞ for any 0 ≤ h ≤ 1 ;
κ = 1 for any h > 1.
Question 2 : Do we really need aggregation with exponential weights ?
Theorem (suboptimality of selectors)
For any M ≥ 2 and any φ : R → R s.t. φ(−1) ≠ φ(1), ∃ f_1, ..., f_M : X → {−1, 1} s.t. for any selector f̂_n, ∃π s.t.
E[A^φ(f̂_n) − A^{φ*}] ≥ min_{j=1,...,M} (A^φ(f_j) − A^{φ*}) + C √(log M / n).
Theorem (suboptimality of selectors under the margin assumption)
For any M ≥ 2, κ ≥ 1 and any φ : R → R s.t. φ(−1) ≠ φ(1), ∃ f_1, ..., f_M : X → {−1, 1} s.t. for any selector f̂_n, ∃π satisfying φ_0-MA(κ) s.t.
E[A^φ(f̂_n) − A^{φ*}] ≥ min_{j=1,...,M} (A^φ(f_j) − A^{φ*}) + C ((log M)/n)^{κ/(2κ−1)}.
Note that √(log M / n) >> ((log M)/n)^{κ/(2κ−1)} >> (log M)/n for 1 < κ < ∞.
Suboptimality of Penalized ERM.
For any M ≥ 2, κ > 1 and any φ : R → R s.t. φ(−1) ≠ φ(1), ∃ f_1, ..., f_M : X → {−1, 1} and ∃π satisfying φ_0-MA(κ) s.t. the pERM aggregate
f̂^pERM_n ∈ Argmin_{j=1,...,M} (A^φ_n(f_j) + pen(f_j)),
where |pen(f)| < 16 √(log M / n), satisfies
E[A^φ(f̂^pERM_n) − A^{φ*}] ≥ min_{j=1,...,M} (A^φ(f_j) − A^{φ*}) + C √(log M / n)
if √(M log M) ≤ √n / (132 e³), for any integer n ≥ 1.
Theorem (exact oracle inequalities for the general framework)
Let F_0 = {f_1, ..., f_M} ⊆ F. Assume that π satisfies MA(κ, c, F_0) and that |Q(Z, f) − Q(Z, f*)| ≤ K, ∀f ∈ F_0. Then
E[A(f̂^ERM_n) − A*] ≤ min_{f∈F_0} (A(f) − A*) + 4 γ(n, M, κ, B),
where the residual γ(n, M, κ, B) equals
(B^{1/κ} log M / (β_1 n))^{1/2} if B ≥ (log M / (β_1 n))^{κ/(2κ−1)},
(log M / (β_1 n))^{κ/(2κ−1)} otherwise,
with B = min_{f∈F_0} (A(f) − A*) and β_1 > 0.
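The two regimes of the residual can be computed directly. A sketch (β_1 is the unspecified positive constant from the theorem, set to 1 here purely for illustration) :

```python
import math

def residual(n, M, kappa, B, beta1=1.0):
    # gamma(n, M, kappa, B): the slow term sqrt(B^{1/kappa} * log(M)/(beta1*n))
    # when the oracle risk B exceeds the threshold
    # (log(M)/(beta1*n))^{kappa/(2*kappa-1)}, and the fast threshold rate otherwise.
    thresh = (math.log(M) / (beta1 * n)) ** (kappa / (2 * kappa - 1))
    if B >= thresh:
        return math.sqrt(B ** (1 / kappa) * math.log(M) / (beta1 * n))
    return thresh
```

For κ = 1 and B = 0 this gives the fast rate (log M)/n, while for B bounded away from 0 it gives a slow rate of order √(log M / n), matching the two rates seen earlier.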
Theorem (exact oracle inequalities, continued)
Under the same assumptions (π satisfies MA(κ, c, F_0) and |Q(Z, f) − Q(Z, f*)| ≤ K, ∀f ∈ F_0), if moreover Q(z, ·) is convex for any z ∈ Z, then
E[A(f̃^AEW_n) − A*] ≤ min_{f∈F_0} (A(f) − A*) + 4 γ(n, M, κ, B),
with the same residual γ(n, M, κ, B), B = min_{f∈F_0} (A(f) − A*) and β_1 > 0.
Oracle Inequality in density estimation
Corollary (density estimation ; margin parameter κ = 1)
Let f_1, ..., f_M : X → [0, B] and assume that f* is bounded by B. For any ε > 0,
E[||f* − f̃^AEW_n||²_{L²(μ)}] ≤ (1 + ε) min_{j=1,...,M} ||f* − f_j||²_{L²(μ)} + (C/ε) (log M)/n.
Aggregation of wavelet thresholded estimators
⇒ adaptive minimax procedure over all Besov balls B^s_{p,∞} for s > 1/p.
(Chesneau, L. (2007))
Oracle Inequality in the regression framework
Corollary (bounded regression ; margin parameter κ = 1)
Let f_1, ..., f_M : X → [0, 1]. For any ε > 0,
E[||f* − f̃^AEW_n||²_{L²(P_X)}] ≤ (1 + ε) min_{j=1,...,M} ||f* − f_j||²_{L²(P_X)} + (C/ε) (log M)/n.
Aggregation of wavelet thresholded estimators
⇒ adaptive minimax procedure over all Besov balls B^s_{p,∞} for s > 1/p.
(Chesneau, L. (2007))
Oracle Inequality in classification
Corollary (classification under the margin assumption)
Let f_1, ..., f_M : X → [−1, 1] and κ ≥ 1. For any π satisfying the margin assumption φ_0-MA(κ) and any ε > 0,
E[A_1(f̃^AEW_n) − A*_1] ≤ (1 + ε) min_{j=1,...,M} (A_1(f_j) − A*_1) + C ((log M)/(ε n))^{κ/(2κ−1)}.
Aggregation of SVM classifiers (or plug-in classifiers)
⇒ procedure adaptive simultaneously to the complexity and to the margin parameters.
(Gaïffas, L. (2007))
Single-index model
(X, Y) ∈ R^d × R, Y = g(X) + σ(X) ε,
where ∃ϑ ∈ S^{d−1}_+ = {v ∈ R^d : ||v||_2 = 1 and v_d ≥ 0} (the index) and a function f : R → R (the link function) s.t.
g(x) = f(ϑ^⊤ x).
ε ⊥ X and ε ∼ N(0, 1) ;
σ_0 ≤ σ(·) ≤ σ_1 (σ_1 known).
Aim : estimation of g from the n observations D_n := [(X_i, Y_i) ; 1 ≤ i ≤ n].
Reduction of dimension
Without the single-index assumption, if g ∈ H_d(s) (a d-dimensional Hölder class), then
n^{−s/(2s+d)} is the minimax rate of convergence.
"Open" question 2 of Stone (1982) :
Is n^{−s/(2s+1)} the minimax rate of estimation of the regression function in the single-index model when the link function f belongs to an s-Hölder class ?
Assumption on the design P_X
Assumption (D)
P_X is compactly supported and P_X << λ_d.
For any v ∈ S^{d−1}_+, writing μ := dP_{v^⊤X}/dLeb :
  μ is continuous ;
  Card{z ∈ Supp μ s.t. μ(z) = 0} < ∞ ;
  if μ(z) = 0, then μ is U-shaped around z ;
∃β ≥ 0, γ > 0 s.t. ∀v ∈ S^{d−1}_+ and for any interval I ⊂ Supp P_{v^⊤X},
P_{v^⊤X}[I] ≥ γ |I|^{β+1}.
Hölder balls
1-dimensional Hölder balls H(s, L) (regularity of the link function) :
H(s, L) consists of all ⌊s⌋-times differentiable functions f : R → R satisfying, for all z_1, z_2 ∈ R,
|f^{(⌊s⌋)}(z_1) − f^{(⌊s⌋)}(z_2)| ≤ L |z_1 − z_2|^{s−⌊s⌋}.
For Q > 0, define
H_Q(s, L) := H(s, L) ∩ {f : ||f||_∞ := sup_x |f(x)| ≤ Q}.
Upper bound
Theorem
If P_X satisfies assumption (D), we can construct an estimator ĝ satisfying, for any s ∈ [s_min, s_max] :
sup_{ϑ∈S^{d−1}_+} sup_{f∈H_Q(s,L)} E^n ||ĝ − g||²_{L²(P_X)} ≤ C n^{−2s/(2s+1)},
where g(·) = f(ϑ^⊤ ·). The constant C > 0 depends on σ_1, L, s_min, s_max and P_X.
ĝ adapts both to the index and to the regularity.
Lower bound
Theorem
Let s, L, Q > 0 and let P_X satisfy assumption (D). For any ϑ ∈ S^{d−1}_+ :
inf_{ĝ} sup_{f∈H(s,L)} E^n ||ĝ − g||²_{L²(P_X)} ≥ C′ n^{−2s/(2s+1)},
where inf_{ĝ} denotes the infimum over all estimators ĝ constructed on D_n.
Hence n^{−2s/(2s+1)} is the minimax rate of convergence in the single-index model, as conjectured by Stone (1982).
Construction of the estimator
We split the sample into two subsamples :
Training sample : D_m := [(X_i, Y_i) ; 1 ≤ i ≤ m] (for instance m = 3n/4),
Learning sample : D_(m) := [(X_i, Y_i) ; m + 1 ≤ i ≤ n].
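The split can be sketched as follows (frac = 0.75 mirrors the m = 3n/4 example; a hypothetical helper, not from the talk) :

```python
def split_sample(X, Y, frac=0.75):
    # D_m: training sample, used to fit the weak estimators;
    # D_(m): learning sample, used to compute the aggregation weights.
    m = int(frac * len(Y))
    return (X[:m], Y[:m]), (X[m:], Y[m:])

(train_X, train_Y), (learn_X, learn_Y) = split_sample(list(range(8)), list(range(8)))
```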
Weak estimators : local polynomial estimators
Construction of the weak estimators. We fix a parameter
λ = (v, s) ∈ Λ_n = S^{d−1}_+(∆_n) × {s_min, s_min + (log n)^{−1}, ..., s_max},
where ∆_n = (n log n)^{−1/(2 s_min)}.
We work with the data projected onto the direction v :
D_m(v) = [(v^⊤ X_i, Y_i) ; 1 ≤ i ≤ m] ;
we construct a 1-dimensional local polynomial estimator f̂^{(λ)}(·) from the data D_m(v) for the regularity s, and set
ĝ^{(λ)}(x) := τ_Q(f̂^{(λ)}(v^⊤ x)), where τ_Q(g) := max(−Q, min(Q, g)).
We do this for every λ ∈ Λ_n.
Aggregation methods: optimality and fast rates. Universite Paris 6
General Framework. Aggregations Procedures. Optimality in classification. Applications.
weak estimators : local polynomial estimators
Construction of the weak estimators : We fixe a parameter
λ = (v , s) ∈ Λn = Sd−1+ (∆n)× [smin, smin + (log n)−1, . . . , smax ],
where ∆n = (n log n)−1/(2smin).
We work with the data projected in the direction v :
Dm(v) = [(v>Xi ,Yi ); 1 ≤ i ≤ m];
Construction of a 1-dimensional polynomial estimator f (λ)(·) withthe data Dm(v) for the regularity s LPE .
g (λ)(x) := τQ
(f (λ)(v>x)
)where τQ(g) := max(−Q,min(Q, g)).
We do this for any λ ∈ Λn ! ReCo
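To make the grid of parameters concrete, here is a minimal Python sketch of Λn for d = 2; the uniform angular discretization of S^1_+ and all function names are assumptions of this illustration, not part of the slides:

```python
import math
import itertools

def regularity_grid(s_min, s_max, n):
    """The grid {s_min, s_min + (log n)^{-1}, ..., s_max}."""
    step = 1.0 / math.log(n)
    grid, s = [], s_min
    while s <= s_max + 1e-12:
        grid.append(s)
        s += step
    return grid

def half_circle_grid(delta):
    """Directions on the half-circle S^1_+ with angular resolution ~ delta."""
    n_pts = max(2, int(math.pi / delta))
    return [(math.cos(k * math.pi / n_pts), math.sin(k * math.pi / n_pts))
            for k in range(n_pts + 1)]

def build_lambda_grid(n, s_min=0.5, s_max=2.0):
    """Λn = S^1_+(∆n) × {s_min, ..., s_max} with ∆n = (n log n)^{-1/(2 s_min)}."""
    delta_n = (n * math.log(n)) ** (-1.0 / (2.0 * s_min))
    return list(itertools.product(half_circle_grid(delta_n),
                                  regularity_grid(s_min, s_max, n)))
```

One weak estimator g^(λ) is then built for each tuple (v, s) in this list.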
Aggregation of the weak estimators
Adaptation to the regularity and to the index by aggregation, once we have the dictionary {g^(λ) ; λ ∈ Λn} of weak estimators.
Empirical risk of g :
A^(m)(g) := Σ_{i=m+1}^{n} (Yi − g(Xi))². (1)
For a temperature T^{-1} > 0, we put a Gibbs measure on {g^(λ) ; λ ∈ Λn} :
w(g) := exp(−T A^(m)(g)) / Σ_{λ∈Λn} exp(−T A^(m)(g^(λ))). (2)
The final estimator is the aggregate with exponential weights (AEW) of the local polynomial estimators :
g := Σ_{λ∈Λn} w(g^(λ)) g^(λ).
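The aggregation step (Gibbs weights on the empirical risks, then a convex combination) can be sketched as follows; this is an illustration assuming the weak estimators' predictions at a point and their empirical risks A^(m)(g^(λ)) have already been computed as plain lists:

```python
import math

def exponential_weights(emp_risks, T):
    """Gibbs weights w(g) ∝ exp(-T * A^(m)(g)) over the dictionary."""
    m = min(emp_risks)  # subtract the minimum risk for numerical stability
    unnorm = [math.exp(-T * (a - m)) for a in emp_risks]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def aew_aggregate(predictions, emp_risks, T):
    """AEW aggregate at one point x: sum over λ of w(g^(λ)) * g^(λ)(x)."""
    w = exponential_weights(emp_risks, T)
    return sum(wi * p for wi, p in zip(w, predictions))
```

With equal empirical risks the weights are uniform, so the aggregate is the plain average of the weak predictions.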
Remarks on the temperature parameter
If T^{-1} is large ⇒ the exponential weights are close to the uniform weights.
If T^{-1} is small ⇒ all the weights equal zero except one : g is the ERM.
T is a trade-off parameter between an aggregate with uniform weights and the ERM.
The quality of the aggregate depends on the choice of T.
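A small numerical check of these two limiting regimes, with hypothetical empirical-risk values:

```python
import math

def gibbs_weights(risks, T):
    """Weights proportional to exp(-T * risk), normalized to sum to one."""
    m = min(risks)
    u = [math.exp(-T * (r - m)) for r in risks]
    s = sum(u)
    return [x / s for x in u]

risks = [0.10, 0.12, 0.30]  # hypothetical empirical risks of three weak estimators

# T small (T^{-1} large): the weights flatten toward the uniform distribution
w_flat = gibbs_weights(risks, T=1e-6)

# T large (T^{-1} small): nearly all mass sits on the empirical risk minimizer
w_peak = gibbs_weights(risks, T=1e6)
```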
Aggregation phenomenon depending on the temperature
Weights {w(g^(λ)) : λ ∈ Λn} on S^1_+(∆n) for ϑ = (1/√2, 1/√2) and T = 0.05, 0.2, 0.5, 10.
[Figure : weight profiles on the half-circle, one panel per temperature.]
Aggregation phenomenon depending on the temperature
Weights {w(g^(λ)) : λ ∈ Λ} on S^2_+(∆n) for ϑ = (0, 0, 1) and T = 0.05, 0.3, 0.5, 10.
[Figure : weight profiles on the half-sphere, one panel per temperature.]
For 100 simulations
MISE against T for f = hardsine, ϑ = (2/√14, 1/√14, 3/√14) and n = 200.
[Figure : MISE as a function of the temperature T.]
For 100 simulations
MISE against T for f = hardsine, ϑ = (2/√14, 1/√14, 3/√14) and n = 400.
[Figure : MISE as a function of the temperature T.]
For 100 simulations
MISE against T (f = hardsine, ϑ = (2/√14, 1/√14, 3/√14)) :

n \ T    0.1           0.5           0.7           1.0           1.5           2.0           ERM           aggCVT
100      0.029 (.011)  0.021 (.008)  0.019 (.008)  0.018 (.007)  0.017 (.008)  0.018 (.009)  0.037 (.022)  0.020 (.008)
200      0.016 (.005)  0.010 (.003)  0.010 (.003)  0.009 (.002)  0.009 (.002)  0.010 (.003)  0.026 (.008)  0.010 (.003)
400      0.007 (.002)  0.006 (.001)  0.005 (.001)  0.005 (.001)  0.006 (.001)  0.007 (.002)  0.017 (.003)  0.006 (.001)

MISE against T (f = hardsine, ϑ = (1/√21, −2/√21, 0, 4/√21)) :

n \ T    0.1           0.5           0.7           1.0           1.5           2.0           ERM           aggCVT
100      0.038 (.016)  0.027 (.010)  0.021 (.009)  0.019 (.008)  0.017 (.007)  0.017 (.007)  0.038 (.025)  0.020 (.010)
200      0.019 (.014)  0.013 (.009)  0.012 (.010)  0.012 (.011)  0.013 (.012)  0.014 (.012)  0.031 (.016)  0.013 (.010)
400      0.009 (.002)  0.006 (.001)  0.005 (.001)  0.005 (.001)  0.006 (.001)  0.007 (.002)  0.017 (.004)  0.006 (.001)

(Standard deviations over the 100 simulations in parentheses.)
Some perspectives.
To introduce a margin parameter in the problem of prediction of individual sequences.
To construct aggregation methods in a framework without empirical risk (pointwise estimation, Lp-risk for p ≠ 2, ...).
To explore the quality of some randomized aggregates for non-convex losses :
f^Rand_n = f_j , with probability w_T(f_j).
To explore the quality of aggregation of the ERM over the convex hull of the dictionary :
f_n ∈ Argmin_{f ∈ C} A_n(f), where C = {Σ_{j=1}^{M} λ_j f_j ; λ_j ≥ 0, Σ_j λ_j = 1}.
To construct sparse aggregates for model selection.
To find the optimal temperature parameter.
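The randomized aggregate f^Rand_n mentioned above can be sketched as follows; the slides leave the construction open, so this is only an illustration of drawing one dictionary element with probability w_T(f_j):

```python
import math
import random

def gibbs_weights(risks, T):
    """Weights w_T(f_j) proportional to exp(-T * risk of f_j)."""
    m = min(risks)
    u = [math.exp(-T * (r - m)) for r in risks]
    s = sum(u)
    return [x / s for x in u]

def randomized_aggregate(estimators, risks, T, rng=random):
    """Draw f_j with probability w_T(f_j) instead of averaging
    (the randomized variant suggested for non-convex losses)."""
    return rng.choices(estimators, weights=gibbs_weights(risks, T), k=1)[0]
```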
Margin assumption in the general framework
Margin Assumption :
The probability measure π satisfies the margin assumption MA(κ, c, F0), where κ ≥ 1, c > 0 and F0 ⊂ F, if
E[(Q(Z, f) − Q(Z, f*))²] ≤ c (A(f) − A*)^{1/κ}
for any function f ∈ F0.
Where does this Gibbs measure come from ?
The weights w = (w_λ)_{λ∈Λ} := (w(g^(λ)))_{λ∈Λ} are the solution of
min( R^(m)(θ) + (1/T) Σ_{λ∈Λ} θ_λ log θ_λ | (θ_λ) ∈ C ),
where R^(m)(θ) := Σ_{λ∈Λ} θ_λ R^(m)(g^(λ)) and
C := { (θ_λ)_{λ∈Λ} s.t. θ_λ ≥ 0 and Σ_{λ∈Λ} θ_λ = 1 }.
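One can check numerically that the Gibbs weights minimize this entropy-penalized linearized risk over the simplex; the comparison against random simplex points below is an illustration with hypothetical risk values, not a proof:

```python
import math
import random

def objective(theta, risks, T):
    """R^(m)(θ) + (1/T) Σ_λ θ_λ log θ_λ, the quantity minimized over the simplex."""
    lin = sum(t * r for t, r in zip(theta, risks))
    ent = sum(t * math.log(t) for t in theta if t > 0.0)
    return lin + ent / T

def gibbs_weights(risks, T):
    m = min(risks)
    u = [math.exp(-T * (r - m)) for r in risks]
    s = sum(u)
    return [x / s for x in u]

def random_simplex_point(k, rng):
    u = [rng.random() for _ in range(k)]
    s = sum(u)
    return [x / s for x in u]

risks = [0.3, 0.1, 0.7, 0.4]  # hypothetical values of R^(m)(g^(λ))
T = 2.0
w = gibbs_weights(risks, T)
best = objective(w, risks, T)
rng = random.Random(0)
others = [objective(random_simplex_point(len(risks), rng), risks, T)
          for _ in range(1000)]
```

No random point of the simplex should attain a smaller objective value than the Gibbs weights.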
Reduction of complexity : "preselection" of the weak estimators.
Iterative construction of the dictionary : steps 1, 2, 3, 4 (ϑ = (2/√14, 1/√14, 3/√14)).
[Figure : directions retained on the half-sphere after each preselection step.]
weak estimators : local polynomial estimators
Construction of the weak estimators : the parameter λ = (v, s) ∈ Λn = S^{d-1}_+(∆n) × {smin, smin + (log n)^{-1}, . . . , smax} is fixed.
We work with the data projected in the direction v :
Dm(v) = ((Zi, Yi) ; 1 ≤ i ≤ m), where Zi := vᵀXi.
If h > 0 (window), define P^(z,h) ∈ Pol_r minimizing
Σ_{i=1}^{m} (Yi − P(Zi − z))² 1_{Zi ∈ [z−h, z+h]},
with the window at the point z given by
Hm(z) := argmin_{h>0} { L h^s ≥ σ1 / (m P_Z[z − h, z + h])^{1/2} },
where P_Z(A) := (1/m) Σ_{i=1}^{m} 1_{Zi ∈ A}.
Set f^(λ)(z) := P^(z, Hm(z))(z), and for Q > 0 fixed and all x ∈ R^d,
g^(λ)(x) := τ_Q(f^(λ)(vᵀx)), where τ_Q(g) := max(−Q, min(Q, g)).
We do this for any λ ∈ Λn !
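A simplified sketch of this weak estimator; using degree r = 1 and a fixed window h instead of the adaptive rule Hm(z) is a simplifying assumption of the illustration:

```python
def tau(g, Q):
    """Truncation operator τ_Q(g) = max(-Q, min(Q, g))."""
    return max(-Q, min(Q, g))

def local_poly_fit(zs, ys, z, h):
    """Least-squares line on the window [z-h, z+h]; returns its value at z."""
    pts = [(zi - z, yi) for zi, yi in zip(zs, ys) if abs(zi - z) <= h]
    if not pts:
        return 0.0
    n = len(pts)
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    denom = n * sxx - sx * sx
    if denom == 0:  # degenerate window: fall back to the local mean
        return sy / n
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept  # value of the fitted line at offset 0, i.e. at z

def weak_estimator(xs, ys, v, z_eval, h, Q):
    """g^(λ)(x) = τ_Q(f^(λ)(vᵀx)) with the data projected on direction v."""
    zs = [sum(vi * xi for vi, xi in zip(v, x)) for x in xs]
    return tau(local_poly_fit(zs, ys, z_eval, h), Q)
```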