algorithmic aspects of large scale optimal experimental...
TRANSCRIPT
Algorithmic aspects of large scale optimal experimental design
Guillaume SAGNOL
Zuse Institut Berlin (ZIB)
February 8, 2012
Outline
1 Optimal Design of Experiments
2 Application to Network Monitoring
3 Algorithms for Optimal Experimental Design
  Classic algorithms
  Recent developments
  New search directions
Optimal Design of Experiments

We want to estimate the vector of parameters $\theta = [\theta_1, \dots, \theta_m]^T$.
We have s experiments available:

$y_1 = A_1\theta + \varepsilon_1, \quad y_2 = A_2\theta + \varepsilon_2, \quad \dots, \quad y_s = A_s\theta + \varepsilon_s.$

GOAL:
1 Choose how many times to perform each experiment.
2 Continuous relaxation on the unit s-simplex: distribute the experimental effort $w_i$ (with $\sum_{i=1}^s w_i = 1$).
minimizing the variance of the best estimator

Under classical assumptions (e.g. the noises have unit variance and are independent):

Theorem [Gauss-Markov]
For every linear unbiased estimator $\hat\theta$ of the unknown parameter $\theta$, we have
$\operatorname{Var}[\hat\theta] \succeq \left(\sum_{i=1}^s w_i A_i^T A_i\right)^{-1}.$
Moreover, this lower bound is attained by the least squares estimator.
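To make the bound concrete, here is a minimal numpy sketch (the helper name is mine) that assembles the information matrix $M(w) = \sum_i w_i A_i^T A_i$ for a given weight vector $w$ and returns the Gauss-Markov lower bound $M(w)^{-1}$.

```python
import numpy as np

def gauss_markov_bound(A_list, w):
    """Information matrix M(w) = sum_i w_i A_i^T A_i and the Gauss-Markov
    lower bound M(w)^{-1} on the covariance of any linear unbiased estimator."""
    M = sum(wi * Ai.T @ Ai for wi, Ai in zip(w, A_list))
    return M, np.linalg.inv(M)
```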
Two common criteria

Definition: Information matrix
$M(w) := \sum_{i=1}^s w_i A_i^T A_i$ is the information matrix of $w$.

Common approach: minimize a scalar function of $M(w)^{-1}$.

Two standard criteria:

c-optimality: given $c \in \mathbb{R}^m$,
$\min \left\{ c^T M(w)^{-1} c \;:\; w \in \mathbb{R}^s_+,\ \sum_i w_i = 1 \right\}$
(relevant when we want to estimate $c^T\theta$).

A-optimality:
$\min \left\{ \operatorname{trace} M(w)^{-1} \;:\; w \in \mathbb{R}^s_+,\ \sum_i w_i = 1 \right\}$
Geometric interpretation

Given an eigenvalue decomposition
$M(w) = \sum_i \lambda_i u_i u_i^T,$
the confidence ellipsoids of $\hat\theta$ are centered at $\theta$ and have semi-axes of length proportional to $1/\sqrt{\lambda_i}$.

[Figure: a confidence ellipsoid centered at $\theta$, with semi-axes $1/\sqrt{\lambda_1}$ along $u_1$ and $1/\sqrt{\lambda_2}$ along $u_2$.]

A-optimal design ($\Phi_A(w)$): $\min\ 1/\lambda_1 + 1/\lambda_2$, i.e. minimize the diagonal of the bounding box.
c-optimal design ($\Phi_c(w)$): minimize the shadow of the ellipsoid along the vector $c$.
D-optimal design ($\Phi_D(w)$): $\max\ \lambda_1 \lambda_2$, i.e. minimize the volume.
E-optimal design ($\Phi_E(w)$): $\max\ \min_i \lambda_i$, i.e. minimize the largest semi-axis.
Summary

Given observation matrices $A_1, \dots, A_s$, solve
$\min_w\ \Phi\!\left(\sum_{i=1}^s w_i A_i^T A_i\right) \quad \text{s.t.}\quad w \ge 0,\ \sum_{i=1}^s w_i = 1.$

$\Phi$ is a function mapping $X \in \mathbb{S}^+_m$ to $\mathbb{R}$ that is nonincreasing with respect to the Löwner ordering $\succeq$, e.g.
$\Phi(X) = \operatorname{trace} X^{-1}$ (A-optimality)
$\Phi(X) = c^T X^{-1} c$ (c-optimality)
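As a sanity check, both criteria of this summary can be evaluated directly for any feasible $w$; the short numpy sketch below (function names are mine) does exactly that.

```python
import numpy as np

def information_matrix(A_list, w):
    """M(w) = sum_i w_i A_i^T A_i."""
    return sum(wi * Ai.T @ Ai for wi, Ai in zip(w, A_list))

def phi_A(A_list, w):
    """A-optimality criterion: trace M(w)^{-1}."""
    return np.trace(np.linalg.inv(information_matrix(A_list, w)))

def phi_c(A_list, w, c):
    """c-optimality criterion: c^T M(w)^{-1} c."""
    return float(c @ np.linalg.solve(information_matrix(A_list, w), c))
```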
Optimal Network Monitoring

We want to monitor certain quantities on a network, e.g.
the volume of the flows
a performance indicator
the nature of the traffic

General Problem
How many monitors should we place on each Node/Edge/OD pair of the network?
example: OD-flows monitoring

We place a monitor on the blue edge ⟹ we collect information about 4 OD flows:
1→4, 1→5, 2→4, 2→5
example: OD-flows monitoring (continued)

With columns indexed by the OD pairs (1→2, 1→4, 1→5, ..., 2→4, 2→5, ...), the monitor yields one row per observed flow:

$A_{2\to 4} = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 & 0 & \cdots \\ 0 & 0 & 1 & \cdots & 0 & 0 & \cdots \\ 0 & 0 & 0 & \cdots & 1 & 0 & \cdots \\ 0 & 0 & 0 & \cdots & 0 & 1 & \cdots \end{pmatrix}$

In some applications, monitors can't determine the source of the flows, so flows sharing a destination are aggregated into a single row:

$A_{2\to 4} = \begin{pmatrix} 0 & 1 & 0 & \cdots & 1 & 0 & \cdots \\ 0 & 0 & 1 & \cdots & 0 & 1 & \cdots \end{pmatrix}$
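A small sketch of how such observation matrices could be assembled (the function and argument names are hypothetical, not from the talk): one 0/1 row per OD flow routed through the monitored edge, or one aggregated row per destination when the monitor cannot determine the source.

```python
import numpy as np

def edge_observation_matrix(od_pairs, observed_flows, source_blind=False):
    """Build the observation matrix of a monitor placed on one edge (sketch).
    od_pairs: ordered list of all OD pairs (the columns).
    observed_flows: OD pairs routed through the monitored edge.
    source_blind: if True, flows sharing a destination are summed into one row."""
    if not source_blind:
        rows = [[1.0 if od == f else 0.0 for od in od_pairs] for f in observed_flows]
    else:
        dests = sorted({d for (_, d) in observed_flows})
        rows = [[1.0 if od in observed_flows and od[1] == d else 0.0
                 for od in od_pairs] for d in dests]
    return np.array(rows)

# Example with the OD pairs of the slide:
# edge_observation_matrix([(1, 2), (1, 4), (1, 5), (2, 4), (2, 5)],
#                         [(1, 4), (1, 5), (2, 4), (2, 5)], source_blind=True)
```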
BAG Project: Truck Toll in Germany

A Huge System
Introduced in 2005.
Tax based on distance and truck category (average price: 0.17 Euros/km).
Total revenue around 3 billion Euros per year.

Control Forces
Automatic: 300 toll checker gantries (Kontrollbrücken).
Manual: 300 vehicles and 540 officers from BAG (Federal Office of Freight).

Complexity of BAG's management
Schedule of each officer.
Choose where and when to control.

TODO: show NRW and Berlin, CPU time; what about whole Germany?
Wynn-Fedorov exchange algorithm

$\min \left\{ \operatorname{trace} M(w)^{-1} \;:\; w \in \mathbb{R}^s_+,\ \sum_i w_i = 1,\ M(w) = \sum_i w_i A_i^T A_i \right\}$

Algorithm (Wynn 70, Fedorov 72)
This is a feasible descent algorithm:
At step $k$, find the atomic design $e_{i_k}$ for which the directional derivative is maximal:
$i_k \leftarrow \operatorname{argmax}_{i \in [s]} \|M(w^{(k)})^{-1} A_i^T\|_F$
For a step of length $\alpha_k$, move in the direction of $e_{i_k}$:
$w^{(k+1)} \leftarrow (1 - \alpha_k)\, w^{(k)} + \alpha_k\, e_{i_k}.$
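A minimal Python sketch of the exchange algorithm for A-optimality (the uniform starting point, the step sizes $\alpha_k = 1/(k+2)$ and the fixed iteration count are my assumptions, not part of the talk):

```python
import numpy as np

def wynn_fedorov(A_list, n_iter=200):
    """Exchange algorithm for A-optimal design (sketch)."""
    s = len(A_list)
    w = np.full(s, 1.0 / s)                        # start from the uniform design
    for k in range(n_iter):
        M = sum(w[i] * A_list[i].T @ A_list[i] for i in range(s))
        Minv = np.linalg.inv(M)
        # directional derivatives of the criterion toward each atomic design
        d = np.array([np.linalg.norm(Ai @ Minv, 'fro') for Ai in A_list])
        ik = int(np.argmax(d))
        alpha = 1.0 / (k + 2)                      # diminishing step length
        w = (1.0 - alpha) * w
        w[ik] += alpha                             # move toward the vertex e_{ik}
    return w
```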
Titterington multiplicative algorithm

$\min \left\{ \operatorname{trace} M(w)^{-1} \;:\; w \in \mathbb{R}^s_+,\ \sum_i w_i = 1,\ M(w) = \sum_i w_i A_i^T A_i \right\}$

Algorithm (Titterington 76)
At step $k$:
For all $i \in [s]$, compute the directional derivative
$d_i \leftarrow \|M(w^{(k)})^{-1} A_i^T\|_F$
For a parameter $\lambda$, make the multiplicative update
$w_i^{(k+1)} \leftarrow \frac{w_i^{(k)}\, d_i^{\lambda}}{\Gamma},$
where $\Gamma$ is a normalizing constant.
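And a matching sketch of the multiplicative algorithm (the exponent $\lambda$, the initialization and the iteration count are assumptions on my side):

```python
import numpy as np

def titterington(A_list, lam=1.0, n_iter=200):
    """Multiplicative algorithm for A-optimal design (sketch)."""
    s = len(A_list)
    w = np.full(s, 1.0 / s)
    for _ in range(n_iter):
        M = sum(w[i] * A_list[i].T @ A_list[i] for i in range(s))
        Minv = np.linalg.inv(M)
        d = np.array([np.linalg.norm(Ai @ Minv, 'fro') for Ai in A_list])
        w = w * d ** lam
        w /= w.sum()                               # Gamma, the normalizing constant
    return w
```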
Semidefinite Programming

$\min \left\{ \operatorname{trace} M(w)^{-1} \;:\; w \in \mathbb{R}^s_+,\ \sum_i w_i = 1,\ M(w) = \sum_i w_i A_i^T A_i \right\}$

A-optimality SDP (Vandenberghe, Boyd, Wu 98)
$\min_{w,\ Y \in \mathbb{S}^m}\ \operatorname{trace} Y$
$\text{s.t.}\quad \begin{pmatrix} M(w) & I \\ I & Y \end{pmatrix} \succeq 0, \quad \sum_{i=1}^s w_i = 1, \quad w_i \ge 0 \ \ \forall i \in [s].$
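This SDP can be handed to any off-the-shelf conic modeling tool. The sketch below uses cvxpy and its matrix_frac atom, which models $\operatorname{trace} M(w)^{-1}$ through exactly this kind of Schur complement; the choice of tool and atom is mine, not the slide's.

```python
import cvxpy as cp
import numpy as np

def a_optimal_design(A_list):
    """A-optimal design weights via the Schur-complement SDP (sketch)."""
    s, m = len(A_list), A_list[0].shape[1]
    w = cp.Variable(s, nonneg=True)
    M = sum(w[i] * (A_list[i].T @ A_list[i]) for i in range(s))  # information matrix
    objective = cp.Minimize(cp.matrix_frac(np.eye(m), M))        # trace M(w)^{-1}
    prob = cp.Problem(objective, [cp.sum(w) == 1])
    prob.solve()
    return w.value
```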
Elfving's Theorem

In the presence of scalar observations ($A_i = a_i^T$), we have:

Theorem [Elfving, 53]
The design $w$ is c-optimal iff there exists $t > 0$ such that
$tc = \sum_k w_k a_k \in \partial\big(\operatorname{conv}\{\pm a_i,\ i = 1, \dots, s\}\big).$

Equivalently,
$\max_{\lambda,\, t}\ t \quad \text{s.t.}\quad tc = \sum_k \lambda_k a_k, \quad \sum_k \underbrace{|\lambda_k|}_{w_k} \le 1.$

[Figure: the Elfving set $\operatorname{conv}\{\pm a_1, \dots, \pm a_4\}$ in the $(\theta_1, \theta_2)$ plane; the ray along $c$ exits it at $t^*c = \tfrac{3}{4} a_3 + \tfrac{1}{4}(-a_4)$.]
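Elfving's characterization can itself be solved as a small convex program; here is a hedged sketch in cvxpy for the scalar-observation case (variable names and modeling choices are mine):

```python
import cvxpy as cp
import numpy as np

def elfving_c_optimal(a_list, c):
    """c-optimal design for scalar observations a_i via Elfving's theorem (sketch)."""
    A = np.column_stack(a_list)              # m x s matrix whose columns are the a_i
    s = A.shape[1]
    lam = cp.Variable(s)
    t = cp.Variable()
    prob = cp.Problem(cp.Maximize(t),
                      [A @ lam == t * np.asarray(c), cp.norm1(lam) <= 1])
    prob.solve()
    w = np.abs(lam.value)                    # w_k = |lambda_k| at the optimum
    return w / w.sum(), t.value
```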
General case

Theorem [DHL09], [Sag09,10]
The design $w$ is c-optimal iff there exist $t > 0$ and unit vectors $\varepsilon_k$ such that
$tc = \sum_k w_k A_k^T \varepsilon_k \in \partial\big(\operatorname{conv}\{A_i^T \varepsilon,\ i = 1, \dots, s,\ \|\varepsilon\| \le 1\}\big).$

[Figure: the generalized Elfving set, built from the ellipsoids $E_1, \dots, E_4$ spanned by the rows $a_{i1}, a_{i2}, \dots$ of the observation matrices, with the point $t^*c$ on its boundary in the direction of $c$.]
An illustration in 3D

With columns indexed by $(x, y, z)$:
$A_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 0 & 0.5 & 2 \end{pmatrix}, \quad A_3 = \begin{pmatrix} 0 & -0.5 & 2 \end{pmatrix}.$

[Figure: the uppermost half of the generalized Elfving set, with the points $a_1(1)$, $a_1(2)$, $a_2$ and $a_3$.]

If we want to estimate $(x + 2z)$: barycenter of $a_1(1)$, $a_2$ and $a_3$.
If we want to estimate $(x + y + z)$: barycenter of $x_1$ and $a_2$.
SOCP formulations

Corollary: SOCP for c-optimality [Sag09,10]
If $\mu^*$ is the optimal dual variable associated to the constraints of
$\max_{z \in \mathbb{R}^m}\ c^T z \quad \text{s.t.}\quad \|A_i z\| \le 1 \ \ \forall i \in [s],$
then $w := \mu^* / \sum_{k=1}^s \mu^*_k$ is a c-optimal design.

SOCP for A-optimality [Sag10]
$\max_{U \in \mathbb{R}^{m \times m}}\ \operatorname{trace} U \quad \text{s.t.}\quad \|A_i U\|_F \le 1 \ \ \forall i \in [s].$
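A sketch of the c-optimality SOCP in cvxpy, recovering the design from the dual variables of the norm constraints as in the corollary (the modeling tool, and reading the duals via dual_value, are my choices; the exact dual scaling conventions of the solver are worth checking):

```python
import cvxpy as cp
import numpy as np

def c_optimal_design_socp(A_list, c):
    """c-optimal design from the dual variables of the SOCP (sketch)."""
    m = A_list[0].shape[1]
    z = cp.Variable(m)
    constraints = [cp.norm(Ai @ z, 2) <= 1 for Ai in A_list]
    prob = cp.Problem(cp.Maximize(np.asarray(c) @ z), constraints)
    prob.solve()
    mu = np.array([con.dual_value for con in constraints])  # one multiplier per experiment
    return mu / mu.sum()
```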
The End
Thank you for your attention