algorithmic aspects of large scale optimal experimental...

41
Algorithmic aspects of large scale optimal experimental design Guillaume SAGNOL Zuse Institut Berlin (ZIB) February 8, 2012

Upload: others

Post on 17-Mar-2021

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

Algorithmic aspects of large scaleoptimal experimental design

Guillaume SAGNOL

Zuse Institut Berlin (ZIB)

February 8, 2012

Page 2: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

Outline

1 Optimal Design of Experiments

2 Application to Network Monitoring

3 Algorithms for Optimal Experimental DesignClassic algorithmsRecent developmentsNew search directions

Page 3: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

Outline

1 Optimal Design of Experiments

2 Application to Network Monitoring

3 Algorithms for Optimal Experimental DesignClassic algorithmsRecent developmentsNew search directions

Page 4: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

Optimal Design of Experiments

We want to estimate the vector of parametersθ = [θ1, . . . , θm]T .

We have s experiments available:

y1 = A1θ + ε1 y4 = A4θ + ε4

y2 = A2θ + ε2 ...

y3 = A3θ + ε3 ys = Asθ + εs

GOAL:1 Choose how many times to perform each experiment.2 Continuous relaxation on the unit s-simplex: distribute the

experimental effort wi (with∑s

i=1 wi = 1).

Page 5: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

Optimal Design of Experiments

We want to estimate the vector of parametersθ = [θ1, . . . , θm]T .We have s experiments available:

y1 = A1θ + ε1 y4 = A4θ + ε4

y2 = A2θ + ε2 ...

y3 = A3θ + ε3 ys = Asθ + εs

GOAL:1 Choose how many times to perform each experiment.2 Continuous relaxation on the unit s-simplex: distribute the

experimental effort wi (with∑s

i=1 wi = 1).

Page 6: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

Optimal Design of Experiments

We want to estimate the vector of parametersθ = [θ1, . . . , θm]T .We have s experiments available:

y1 = A1θ + ε1 y4 = A4θ + ε4

y2 = A2θ + ε2 ...

y3 = A3θ + ε3 ys = Asθ + εs

GOAL:1 Choose how many times to perform each experiment.2 Continuous relaxation on the unit s-simplex: distribute the

experimental effort wi (with∑s

i=1 wi = 1).

Page 7: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

minimizing the variance of the best estimator

Under classical assumptions(e.g. noises have unit variance and are independent),

Theorem [Gauss-Markov]

For every linear unbiased estimator θ̂ of the unknownparameter θ, we have:

Var[θ̂] �

(s∑

i=1

wiATi Ai

)−1

.

Moreover this lower bound is attained by the estimatorfrom least square theory.

Page 8: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

Two common criterions

Definition: Information matrixM(w) :=

∑si=1 wiAT

i Ai is the information matrix of w .

Common approach: Minimize a scalar function ofM(w)−1.

Two standard criterionsc−optimality: given c ∈ Rm,

min

{cT M(w)−1c : w ∈ Rs

+,∑

i

wi = 1

}

Page 9: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

Two common criterions

Definition: Information matrixM(w) :=

∑si=1 wiAT

i Ai is the information matrix of w .

Common approach: Minimize a scalar function ofM(w)−1.

Two standard criterionsc−optimality: given c ∈ Rm,

min

{cT M(w)−1c : w ∈ Rs

+,∑

i

wi = 1

}

relevant when we want to estimate cTθ

Page 10: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

Two common criterions

Definition: Information matrixM(w) :=

∑si=1 wiAT

i Ai is the information matrix of w .

Common approach: Minimize a scalar function ofM(w)−1.

Two standard criterionsc−optimality: given c ∈ Rm,

min

{cT M(w)−1c : w ∈ Rs

+,∑

i

wi = 1

}

A-optimality:

min

{trace M(w)−1 : w ∈ Rs

+,∑

i

wi = 1

}

Page 11: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

Geometric interpretation

Given an eigenvalue decomposition

M(w) =∑

λiu iuTi ,

the confidence ellipsoids of θ̂ are centered in θ, and havesemi-axis of length proportional to 1√

λi.

1√λ1

u1

u2

1√λ2

θ

Page 12: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

Geometric interpretation

Given an eigenvalue decomposition

M(w) =∑

λiu iuTi ,

the confidence ellipsoids of θ̂ are centered in θ, and havesemi-axis of length proportional to 1√

λi.

1√λ1

u1

u2

1√λ2

θΦA(w)

A−optimal design:

min1λ1

+1λ2

minimize the diagonalof the bounding box.

Page 13: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

Geometric interpretation

Given an eigenvalue decomposition

M(w) =∑

λiu iuTi ,

the confidence ellipsoids of θ̂ are centered in θ, and havesemi-axis of length proportional to 1√

λi.

1√λ1

u1

u2

1√λ2

θ

c

Φc(w)c-optimal design:

minimize shadowalong vector c.

Page 14: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

Geometric interpretation

Given an eigenvalue decomposition

M(w) =∑

λiu iuTi ,

the confidence ellipsoids of θ̂ are centered in θ, and havesemi-axis of length proportional to 1√

λi.

1√λ1

u1

u2

1√λ2

θ

ΦD(w)

D−optimal design:

maxλ1λ2

minimize the volume.

Page 15: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

Geometric interpretation

Given an eigenvalue decomposition

M(w) =∑

λiu iuTi ,

the confidence ellipsoids of θ̂ are centered in θ, and havesemi-axis of length proportional to 1√

λi.

1√λ1

u1

u2

1√λ2

θ

ΦE (w)

E−optimal design:

maxλ1

minimize the largestsemi-axis.

Page 16: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

Summary

Given observation matrices A1, . . . ,As, solve

minw

Φ

(s∑

i=1

wiATi Ai

)

s. t. w ≥ 0,s∑

i=1

wi = 1.

Φ is a function mapping X ∈ S+m to R, that is

nonincreasing with respect to Löwner ordering �, e.g.

Φ(X ) = trace X−1 (A−optimality)

Φ(X ) = cT X−1c (c−optimality)

Page 17: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

Outline

1 Optimal Design of Experiments

2 Application to Network Monitoring

3 Algorithms for Optimal Experimental DesignClassic algorithmsRecent developmentsNew search directions

Page 18: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

Optimal Network Monitoring

We want to monitor certainquantities on a network, e.g.

volume of the flowsperformance indicatornature of the traffic

General ProblemHow many monitors should we place on eachNode/Edge/OD Pair of the network ?

Page 19: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

example: OD-flows monitoring

We place a monitor on the blueedge:=⇒ we collect information about4 OD flows:

Page 20: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

example: OD-flows monitoring

We place a monitor on the blueedge:=⇒ we collect information about4 OD flows:

1→ 41→ 52→ 42→ 5

Page 21: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

example: OD-flows monitoring (continued)

A2→4 =

1→

2

1→

4

1→

5

· · ·

2→

4

2→

5

· · ·

0 1 0 0 0 0 00 0 1 0 0 0 00 0 0 0 1 0 00 0 0 0 0 1 0

In some applications, monitors can’t determine the sourceof the flows:

A2→4 =1→

2

1→

4

1→

5

· · ·

2→

4

2→

5

· · ·( )0 1 0 0 1 0 00 0 1 0 0 1 0

Page 22: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

BAG Project: Truck Toll in Germany

A Huge SystemIntroduced in 2005Tax based on: distance and truck category(avg price: 0,17 Euros/km)Total revenue around 3 Billion Euros / year

Control ForcesAutomatic: 300 toll checker gantries (Kontrollbrücke)Manual: 300 vehicles – 540 officiers from BAG(Federal ofOice of Freight)

Complexity of BAG’s managementSchedule of each officerChoose where and when to control

Page 23: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

TODO: show NRW and Berlin, CPU time, what aboutwhole Germany ?

Page 24: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

Outline

1 Optimal Design of Experiments

2 Application to Network Monitoring

3 Algorithms for Optimal Experimental DesignClassic algorithmsRecent developmentsNew search directions

Page 25: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

Outline

1 Optimal Design of Experiments

2 Application to Network Monitoring

3 Algorithms for Optimal Experimental DesignClassic algorithmsRecent developmentsNew search directions

Page 26: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

Wynn-Fedorov exchange algorithm

min

{trace M(w)−1 : w ∈ Rs

+,∑

i

wi = 1, M(w) =∑

i

wiATi Ai

}

Algorithm (Wynn 70, Fedorov 72)This is a feasible descent algorithm:

At step k , find the atomic design eik such that thedirectional derivative is maximum.

ik ← argmaxi∈[s] ‖M(w (k))−1Ai‖F

For a step of length αk , move in the direction of eik :

w (k+1) ← (1− αk )w (k) + αkeik .

Page 27: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

Titterington multiplicative algorithm

min

{trace M(w)−1 : w ∈ Rs

+,∑

i

wi = 1, M(w) =∑

i

wiATi Ai

}

Algorithm (Titterington 76)At step k ,

For all i ∈ [s], compute the directional derivative

di ← ‖M(w (k))−1Ai‖F

For a parameter λ, make the multiplicative updates:

w (k+1)i ← w (k)

idλ

i

Γ,

where Γ is a normalizing constant.

Page 28: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

Outline

1 Optimal Design of Experiments

2 Application to Network Monitoring

3 Algorithms for Optimal Experimental DesignClassic algorithmsRecent developmentsNew search directions

Page 29: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

Semidefinite Programming

min

{trace M(w)−1 : w ∈ Rs

+,∑

i

wi = 1, M(w) =∑

i

wiATi Ai

}

A-optimality SDP (Vandenberghe,Boyd,Wu 98)

minw , Y∈Sm

trace Y

s. t.(

M(w) II Y

)� 0,

s∑i=1

wi = 1, ∀ i ∈ [s],wi ≥ 0.

Page 30: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

Elfving’s Theorem

In presence of scalar observations (Ai = aiT ), we have:

Theorem [Elfving,53]The design w is c−optimal iff there exists t > 0 such that

tc =∑

k

wkak ∈ ∂(

conv{±ai , i = 1, . . . , s}).

maxλ,t

t

s.t. tc =∑

k

λiai∑k

|λk |︸︷︷︸wk

≤ 1.

a1

a2

−a1

c

−a2

a4

−a3

θ2

t∗c = 34 a3+14 (−a4)

a3

θ1

−a4

Page 31: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

General case

Theorem [DHL09], [Sag09,10]The design w is c-optimal iff there exists t > 0 and unitvectors εk such that

tc =∑

k

wkATk εk ∈ ∂

(conv{AT

i ε, i = 1..., s, ‖ε‖ ≤ 1}).

a11

a12

a21

a31

a22a32

a33

a41

E3

E2

E1

E4

t∗c

c

x1x3

Page 32: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

An illustration in 3D

x

y

z

a2a3

a1(1)

a1(2)

A1 =

x y z( )1 0 00 2 0

A2 =(

0 0.5 2)

A3 =(

0 -0.5 2)

Page 33: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

An illustration in 3D

x

y

z

a2a3

a1(1)

a1(2)

A1 =

x y z( )1 0 00 2 0

A2 =(

0 0.5 2)

A3 =(

0 -0.5 2)

Page 34: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

An illustration in 3D

x

y

z

a2a3

a1(1)

a1(2)

A1 =

x y z( )1 0 00 2 0

A2 =(

0 0.5 2)

A3 =(

0 -0.5 2)

Page 35: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

An illustration in 3D

x

y

z

a2a3

a1(1)

a1(2)

A1 =

x y z( )1 0 00 2 0

A2 =(

0 0.5 2)

A3 =(

0 -0.5 2)

The uppermost half of the generalized Elfving set

Page 36: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

An illustration in 3D

xy

z

a2

a3

a1(1)

a1(2)

c A1 =

x y z( )1 0 00 2 0

A2 =(

0 0.5 2)

A3 =(

0 -0.5 2)

If we want to estimate (x + 2z) −→ Barycenter of a1(1), a2 and a3

Page 37: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

An illustration in 3D

x

y

z

a2a3

a1(1)

a1(2)

x1

c A1 =

x y z( )1 0 00 2 0

A2 =(

0 0.5 2)

A3 =(

0 -0.5 2)

If we want to estimate (x + y + z) −→ Barycenter of x1 and a2

Page 38: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

SOCP formulations

Corollary: SOCP for c−optimality [Sag09,10]If µ is the optimal dual variable associated to theconstraints of

maxz∈Rm

cT z

∀i ∈ [s], ‖Aiz‖ ≤ 1

Then, w := µ∗∑sk=1 µ

∗k

is a c−optimal design.

SOCP for A−optimality [Sag10]

maxU∈Rm×m

trace U

∀i ∈ [s], ‖AiU‖F ≤ 1

Page 39: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

SOCP formulations

Corollary: SOCP for c−optimality [Sag09,10]If µ is the optimal dual variable associated to theconstraints of

maxz∈Rm

cT z

∀i ∈ [s], ‖Aiz‖ ≤ 1

Then, w := µ∗∑sk=1 µ

∗k

is a c−optimal design.

SOCP for A−optimality [Sag10]

maxU∈Rm×m

trace U

∀i ∈ [s], ‖AiU‖F ≤ 1

Page 40: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

Outline

1 Optimal Design of Experiments

2 Application to Network Monitoring

3 Algorithms for Optimal Experimental DesignClassic algorithmsRecent developmentsNew search directions

Page 41: Algorithmic aspects of large scale optimal experimental designpage.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdfOutline 1 Optimal Design of Experiments 2 Application to Network

The End

Thank you for your attention