Blind online optimization: Gradient descent without a gradient
Abie Flaxman CMU
Adam Tauman Kalai TTI
Brendan McMahan CMU
Standard convex optimization
Convex feasible set S ⊆ ℝ^d
Concave function f : S → ℝ
Goal: find x with f(x) ≥ max_{z ∈ S} f(z) − ε = f(x*) − ε
[Figure: feasible set in ℝ^d with optimum x*]
Steepest ascent
• Move in the direction of steepest ascent
• Compute f′(x) (∇f(x) in higher dimensions)
• Works for convex optimization
(and many other problems)
[Figure: iterates x1, x2, x3, x4 climbing toward the maximum]
Typical application
• Company produces certain numbers of cars per month
• Vector x ∈ ℝ^d (#Corollas, #Camrys, …)
• Profit of company is concave function of production vector
• Maximize total (equivalently, average) profit
PROBLEMS
Problem definition and results
• Sequence of unknown concave functions f1, f2, …
• Period t: pick x_t ∈ S, find out only f_t(x_t)
• S convex
Theorem: expected regret is sublinear in T
Online model
• Holds for arbitrary sequences
• Stronger than the stochastic model:
  – f1, f2, … i.i.d. from D
  – x* = arg max_{x ∈ S} E_D[f(x)]
• Goal: low expected regret
Outline
• Problem definition
• Simple algorithm
• Analysis sketch
• Variations
• Related work & applications
First try
[Figure: profit vs. #Camrys; in each period t play x_t and observe only f_t(x_t), for functions f1, f2, f3, f4 with optimum x*]
Zinkevich ’03: If we could only compute gradients…
Idea: one-point gradient
[Figure: profit vs. #Camrys; evaluation points x − δ, x, x + δ]
With probability ½, estimate = f(x + δ)/δ
With probability ½, estimate = −f(x − δ)/δ
E[estimate] ≈ f′(x)
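This claim is easy to check numerically. A minimal sketch (the quadratic f, the point x = 1, and the step δ = 0.1 below are made up for the demo):

```python
import random

def one_point_estimate(f, x, delta):
    # One function evaluation per call: with prob 1/2 probe above x,
    # with prob 1/2 probe below, scaled so the expectation equals the
    # central difference (f(x+delta) - f(x-delta)) / (2*delta).
    if random.random() < 0.5:
        return f(x + delta) / delta
    return -f(x - delta) / delta

random.seed(0)
f = lambda x: -(x - 3.0) ** 2          # concave; f'(1) = 4
n = 200_000
avg = sum(one_point_estimate(f, 1.0, 0.1) for _ in range(n)) / n
print(avg)  # close to f'(1) = 4
```

Each individual estimate is wildly noisy (here it is either about −36 or +44), but its average converges to the derivative, which is all the gradient-ascent analysis needs.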
d-dimensional online algorithm
[Figure: feasible set S with iterates x1 → x2 → x3 → x4]
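The d-dimensional loop can be sketched as below. This is an illustrative implementation, not the talk's exact algorithm: the feasible set is taken to be a Euclidean ball so projection is trivial, and the constants η (step size) and δ (sampling radius) are made up for the demo:

```python
import math
import random

def random_unit_vector(d):
    # Uniform direction on the unit sphere via normalized Gaussians.
    v = [random.gauss(0.0, 1.0) for _ in range(d)]
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def project_to_ball(x, radius):
    # Projection onto S; here S is the origin-centered ball of given radius.
    n = math.sqrt(sum(c * c for c in x))
    return x if n <= radius else [c * radius / n for c in x]

def bandit_gradient_ascent(fs, d, x0, eta, delta, radius):
    # Each period: play y_t = x_t + delta * u_t, observe ONLY f_t(y_t),
    # and ascend along the one-point estimate g_t = (d/delta) f_t(y_t) u_t.
    x = list(x0)
    for f in fs:
        u = random_unit_vector(d)
        y = [xi + delta * ui for xi, ui in zip(x, u)]
        g = [(d / delta) * f(y) * ui for ui in u]
        x = project_to_ball([xi + eta * gi for xi, gi in zip(x, g)], radius)
    return x

random.seed(1)
f = lambda y: -sum((c - 0.5) ** 2 for c in y)   # same concave f every period
x_final = bandit_gradient_ascent([f] * 20_000, d=2, x0=[-0.5, -0.5],
                                 eta=0.01, delta=0.25, radius=1.0)
```

With a fixed concave f, the iterates drift toward the maximizer (0.5, 0.5) despite seeing only one noisy function value per period.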
Analysis ingredients
• E[1-point estimate] is the gradient of a smoothed function f̂
• f̂ − f is small
• Online gradient ascent analysis [Z03]
• Online expected gradient ascent analysis
• (Hidden complications)
1-pt gradient analysis
[Figure: profit vs. #Camrys; evaluation points x − δ and x + δ]
1-pt gradient analysis (d-dim)
• E[1-point estimate] is the gradient of a smoothed function f̂ (the average of f over a radius-δ ball)
• f̂ − f is small
Online gradient ascent [Z03]
• Update: x_{t+1} = Π_S(x_t + η ∇f_t(x_t))
• Regret ≤ DG√T (D = diameter of S, G = gradient bound)
• Requires each f_t concave with bounded gradient
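For reference, the standard argument behind [Z03], written out with D = diameter of S and G = bound on ‖∇f_t‖ (a sketch, not the slide's exact notation):

```latex
% Concavity: f_t(x^*) - f_t(x_t) \le \nabla f_t(x_t)\cdot(x^* - x_t).
% Projection onto S cannot increase the distance to x^* \in S, so
\|x_{t+1}-x^*\|^2 \le \|x_t-x^*\|^2
  + 2\eta\,\nabla f_t(x_t)\cdot(x_t-x^*) + \eta^2 G^2 .
% Rearranging for \nabla f_t(x_t)\cdot(x^*-x_t) and summing over
% t = 1,\dots,T telescopes the distance terms:
\sum_{t=1}^{T}\bigl(f_t(x^*) - f_t(x_t)\bigr)
  \le \frac{D^2}{2\eta} + \frac{\eta G^2 T}{2}
  = DG\sqrt{T}
  \quad\text{for }\eta = \frac{D}{G\sqrt{T}} .
```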
Expected gradient ascent analysis
• Regular deterministic gradient ascent on g_t (concave, bounded gradient)
![Page 17: Blind online optimization Gradient descent without a gradient Abie Flaxman CMU Adam Tauman Kalai TTI Brendan McMahan CMU](https://reader038.vdocument.in/reader038/viewer/2022110115/55141efd550346ec488b5729/html5/thumbnails/17.jpg)
Hidden complication…
[Figure: the sampling ball around x_t can poke outside S; fix: stay inside a shrunken copy S′ of S]
Thin sets are bad
Hidden complication…
Round sets are good
…reshape into “isotropic position” [LV03]
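What “reshape into isotropic position” means can be illustrated on sample points: apply the affine map that gives the point cloud zero mean and identity covariance. A minimal 2-D sketch, assuming the thin direction is axis-aligned so the covariance is diagonal (the general case whitens with the inverse square root of the full covariance matrix, as in [LV03]):

```python
import random

def sample_thin_ellipse(n):
    # A "thin set": the unit disk stretched 100x horizontally.
    pts = []
    while len(pts) < n:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1.0:
            pts.append((100.0 * x, y))
    return pts

def make_isotropic(points):
    # Shift to zero mean, then rescale each axis to unit variance.
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sx = (sum((p[0] - mx) ** 2 for p in points) / n) ** 0.5
    sy = (sum((p[1] - my) ** 2 for p in points) / n) ** 0.5
    return [((p[0] - mx) / sx, (p[1] - my) / sy) for p in points]

random.seed(3)
round_pts = make_isotropic(sample_thin_ellipse(10_000))
```

After the transform the 100:1 ellipse looks like a round disk, so a sampling ball of a fixed radius fits comfortably inside the (shrunken) set in every direction.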
Variations
• Works against an adaptive adversary
  – Chooses f_t knowing x1, x2, …, x_{t−1}
• Also works if we only get a noisy estimate of f_t(x_t), i.e. E[h_t(x_t) | x_t] = f_t(x_t)
• Bounds depend on the diameter of S and a bound on the gradient
Related convex optimization

|  | Sighted (see entire function(s)) | Blind (evaluations only) |
| --- | --- | --- |
| Regular (single f) | Gradient descent, …, Ellipsoid | Finite difference, Random walk [BV02], Sim. annealing [KV05] |
| Stochastic (dist. over f’s or dist. over errors) | Gradient descent (stoch.) | Finite difference, 1-pt. gradient appx. [G89, S97] |
| Online (f1, f2, f3, …) | Gradient descent (online) [Z03] | Finite difference [Kleinberg04], 1-pt. gradient appx. [BKM04] |
Multi-armed bandit (experts)
[Figure: slot machines with unknown payoffs; each period pick one arm, i.e. a corner such as (1, 0, 0, 0) of the simplex S, and observe only its payoff]
[R52, ACFS95, …]
Driving to work (online routing)
Exponentially many paths… exponentially many slot machines?
Finite dimensions
Exploration/exploitation tradeoff
[TW02, KV02, AK04, BM04]
[Figure: road network with the feasible set S of routes]
Online product design
Conclusions and future work
• Can “learn” to optimize a sequence of unrelated functions from evaluations
• An answer to: “What is the sound of one hand clapping?”
• Applications
  – Cholesterol
  – Paper airplanes
  – Advertising
• Future work
  – Many players using the same algorithm (game theory)