data-driven optimization: contexts,


Data-Driven Optimization: Contexts, Opportunities and Challenges¹

Patrick Jaillet

Department of Electrical Engineering and Computer Science
Laboratory for Information and Decision Systems

Operations Research Center
Massachusetts Institute of Technology

August 30, 2010

¹ Acknowledgements: Research partly funded by NSF, ONR, AFOSR, and/or Singapore. The presentation benefited from input by Prof. Emilio Fraziolli, Dr. Pavithra Harsha, Prof. Daleh Munzer, and Dr. Ketan Savla, MIT.

Jaillet, MIT DDDAS Workshop 2010



Outline

1 Contexts, opportunities, challenges
   Routing, mobility, spatial explorations
   Dynamic resource allocations
   Smart grids

2 Common fundamental aspects
   Technical, methodological, and algorithmic
   Online, real-time, and stochastic considerations
   A proposed framework: Generalized online optimization

3 An illustration
   Online traveling salesman problems

4 Concluding remarks





Routing, mobility, spatial explorations





urban mobility





Dynamic resource allocations - various contexts

Sponsored search auctions and online auctions
Load balancing for content delivery networks
Distributed caching problems
On-demand video/movie requests





Dynamic resource allocations - sponsored search

"AdWords is Google’s flagship advertising product and main source of revenue ($21 billion in 2008). AdWords offers pay-per-click (PPC) advertising, and site-targeted advertising for both text and banner ads ...."





Smarter grids





Smart grid: Challenges





Smart grid: Dynamic data-driven applications





Common fundamental aspects of these data-driven optimization problems

Common vision/opportunity: Technological advances in computing, communication, and multi-purpose sensing capabilities at all scales have the potential to radically transform key existing activities and processes within societal, economic, governmental, and individual domains, and, in some cases, to allow the emergence of new ones.

Some of the key challenges: It is clear that the amount, diversity, and availability of data will only continue to grow. It is, however, much less clear how one can best “harness” such rich data growth and transform it into useful and reliable information flows and decision technologies.





Fundamental methodological/algorithmic questions:

proposing appropriate mathematical frameworks for rigorously evaluating solutions for data-driven problems with:

dynamic, incomplete, and uncertain input streams,
time-sensitive objectives,
short time requirements and capacity constraints for some decisions;

defining key canonical online and real-time optimization problems capturing the essence of these complex data-driven problems;
designing and rigorously analyzing algorithmic solution strategies for these canonical problems.





Online concepts

Online optimization problem = instance incrementally revealed over time; decisions must be made “online”, without knowledge of what comes next.
Online algorithm:

A measure of the “quality” of an online algorithm is its competitive ratio c. For a minimization problem:

$$c = \inf\left\{ r \;\middle|\; \frac{\mathrm{Cost}_{\mathrm{online}}(I)}{\mathrm{Cost}_{\mathrm{offline}}(I)} \le r, \ \forall \text{ instances } I \right\}$$

(we also say that the algorithm is c-competitive.) It is said to be best possible if there is no other such algorithm with a smaller competitive ratio; its ratio then becomes the competitive ratio of the problem itself.
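Because the definition quantifies over all instances, c is the supremum of the ratio Cost_online(I)/Cost_offline(I) (for a minimization problem with positive costs). Evaluating a finite set of test instances can therefore only certify a lower bound on c. A minimal Python sketch; the function name is hypothetical and both cost functions are placeholders supplied by the caller:

```python
from typing import Callable, Iterable, TypeVar

Inst = TypeVar("Inst")

def empirical_ratio_lower_bound(
    instances: Iterable[Inst],
    online_cost: Callable[[Inst], float],
    offline_cost: Callable[[Inst], float],
) -> float:
    """Largest observed Cost_online(I) / Cost_offline(I) over a finite set of instances.

    The competitive ratio c is the supremum of this ratio over *all* instances,
    so any finite sample can only certify a lower bound on c.
    """
    ratios = [online_cost(inst) / offline_cost(inst) for inst in instances]
    if not ratios:
        raise ValueError("need at least one instance")
    return max(ratios)
```

A single bad instance is enough to show that c is at least some value; an upper bound, by contrast, needs a proof covering every instance.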





Real-time considerations

Online algorithms may include real-time considerations: not only do we have to make decisions online, but some of these decisions must be made “quickly”:

the difficulty is to define properly what this means, which depends on problem-specific settings (a few minutes may be fast in some settings but excruciatingly slow in others);
in addition, what counts as real-time will also clearly depend on (i) the complexity of the mathematical model to be solved locally, and (ii) the importance of the resulting decision.

Understanding these trade-offs with proper scalings (linking the “time” and “complexity” of the local mathematical problem to be solved) is part of the challenge.





Stochastic information

Sometimes we have reasons to think that the uncertainty about future input streams can in fact be modeled with the help of probability.
For example, we may be confident from past observations that the input to an online problem comes from a given probability distribution, or from a given family of distributions.
How can this “stochastic information” be properly included in order to design better algorithms? Can this be quantified?





Putting everything together: A framework

Keep the basic online framework and competitive ratio analysis of classical online optimization.
Add restrictions on real-time decision making:

polynomial-time online algorithms;
time bounds on some of the decisions.

Include stochastic information as follows: for a given known probability distribution π on the set of all possible input instances, an online algorithm will be $c_\pi$-competitive if:

$$c_\pi = \inf\left\{ r \;\middle|\; \mathbb{E}_{w \sim \pi}\left[ \frac{\mathrm{Cost}_{\mathrm{online}}(w)}{\mathrm{Cost}_{\mathrm{optimal}}(w)} \right] \le r \right\}$$

$c_\pi$ will be called the π-competitive ratio of the online algorithm. An online algorithm is said to be π-best possible if there does not exist another online algorithm with a strictly smaller π-competitive ratio.
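Since the set of admissible r is just the half-line above the expectation, $c_\pi$ equals $\mathbb{E}_{w \sim \pi}[\mathrm{Cost}_{\mathrm{online}}(w)/\mathrm{Cost}_{\mathrm{optimal}}(w)]$ itself, and when π can be sampled it can be estimated by Monte Carlo. A minimal sketch; the sampler and both cost functions are hypothetical caller-supplied placeholders, and the estimate carries sampling error, so a standard error is returned with it:

```python
import math
import random
from typing import Callable, Tuple, TypeVar

W = TypeVar("W")

def estimate_pi_ratio(
    sample_instance: Callable[[random.Random], W],
    online_cost: Callable[[W], float],
    optimal_cost: Callable[[W], float],
    trials: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Monte Carlo estimate of E_{w~pi}[Cost_online(w) / Cost_optimal(w)].

    Returns (mean ratio, standard error of the mean).
    """
    if trials < 2:
        raise ValueError("need at least two trials for a standard error")
    rng = random.Random(seed)  # fixed seed for reproducibility
    ratios = []
    for _ in range(trials):
        w = sample_instance(rng)  # draw one instance from pi
        ratios.append(online_cost(w) / optimal_cost(w))
    mean = sum(ratios) / trials
    var = sum((x - mean) ** 2 for x in ratios) / (trials - 1)  # unbiased sample variance
    return mean, math.sqrt(var / trials)
```

Note that this averages the per-instance ratio, as in the definition, rather than dividing average costs.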




Online traveling salesman problems

Online TSP

The (classical) Traveling Salesman Problem (TSP)
The TSP with release dates:

Each city needs to be visited after a specified date (its release date).
Objective: find a tour that requires minimum total time.

Online TSP:
The problem instance is not known a priori; rather, new cities are revealed over time (usually at their release dates).
The underlying offline problem is the TSP with release dates.





The Online TSP on Metric Spaces

The number of cities n is not known to the online algorithm.
City locations lie in a metric space M: $l_i \in M$ for $i = 1, \dots, n$.
Cities are revealed at their release dates $r_i \ge 0$; assume $r_1 \le r_2 \le \dots \le r_n$.
The salesman travels at unit speed or is idle.
The problem begins at time 0, and the salesman is initially at the origin.
The salesman's objective is to minimize the time to serve all revealed cities.
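The offline TSP with release dates is the benchmark against which online costs are compared. For a fixed visiting order, the earliest schedule travels straight to each city and waits there if it arrives before the release date, so small instances can be solved exactly by enumerating orders. A minimal sketch under our own assumptions: cities in the Euclidean plane, origin at (0, 0), and a salesman who must return to the origin at the end (the homing variant); the function names are hypothetical, and the O(n!·n) enumeration is only usable for a handful of cities.

```python
import itertools
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
ORIGIN: Point = (0.0, 0.0)  # assumed location of the origin

def dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])

def schedule_time(order: Sequence[int], points: Sequence[Point],
                  releases: Sequence[float]) -> float:
    """Earliest time to visit cities in `order`, each no earlier than its release, then return home."""
    t, pos = 0.0, ORIGIN
    for i in order:
        t = max(t + dist(pos, points[i]), releases[i])  # wait at city i if we arrive early
        pos = points[i]
    return t + dist(pos, ORIGIN)

def offline_opt(points: Sequence[Point], releases: Sequence[float]) -> float:
    """Exact offline optimum by enumerating all visiting orders (small instances only)."""
    return min(schedule_time(p, points, releases)
               for p in itertools.permutations(range(len(points))))
```

Waiting only at cities loses nothing: for a fixed order, arriving as early as possible can never make a later arrival worse.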





The Generalized Plan-At-Home (GPAH) algorithm

Given a ρ-approximation algorithm for the TSP, the GPAH algorithm works as follows:

1 Whenever at the origin o, the salesman starts a new ρ-approximation tour through all currently known and unserved cities.

2 At any time t when a new city appears at location l, the action depends on the salesman's current location p:

If d(l, o) > d(p, o), the salesman stops his current tour, goes back to the origin, and follows 1.
If d(l, o) ≤ d(p, o), the salesman continues the current tour and considers this request only when he next reaches the origin.
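The two rules can be simulated directly. The sketch below is our own illustration, not the author's implementation: it assumes cities in the Euclidean plane with the origin at (0, 0), and it plugs in a brute-force exact TSP solver (so ρ = 1, exponential time, small instances only); any ρ-approximation with the same signature could be passed as `plan` instead. The hypothetical function `gpah` returns the time at which the salesman is back at the origin with every city served.

```python
import itertools
import math
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
ORIGIN: Point = (0.0, 0.0)  # assumed location of the origin

def dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])

def exact_tour(points: Sequence[Point], ids: List[int]) -> List[int]:
    """Shortest closed tour from the origin through the given cities (brute force, rho = 1)."""
    def length(perm):
        stops = [ORIGIN] + [points[i] for i in perm] + [ORIGIN]
        return sum(dist(stops[k], stops[k + 1]) for k in range(len(stops) - 1))
    return list(min(itertools.permutations(ids), key=length))

def gpah(points: Sequence[Point], releases: Sequence[float],
         plan: Callable[[Sequence[Point], List[int]], List[int]] = exact_tour) -> float:
    """Simulate GPAH; return the time when all cities are served and the salesman is home."""
    t, pos = 0.0, ORIGIN
    route: List[Optional[int]] = []  # remaining waypoints: city index, or None = the origin
    unserved = set()                 # revealed cities not yet visited

    def start_tour_if_home() -> None:
        # Rule 1: whenever at the origin, start a new tour through all known unserved cities.
        nonlocal route
        if not route and pos == ORIGIN and unserved:
            route = plan(points, sorted(unserved)) + [None]

    def advance(until: float) -> None:
        # Move along the current route at unit speed until time `until`.
        nonlocal t, pos, route
        while route and t < until:
            target = ORIGIN if route[0] is None else points[route[0]]
            d = dist(pos, target)
            if t + d <= until:                   # reach the next waypoint in time
                t, pos = t + d, target
                stop = route.pop(0)
                if stop is not None:
                    unserved.discard(stop)
                start_tour_if_home()
            else:                                # stop part-way along the segment
                frac = (until - t) / d
                pos = (pos[0] + frac * (target[0] - pos[0]),
                       pos[1] + frac * (target[1] - pos[1]))
                t = until
        if not route and math.isfinite(until):
            t = max(t, until)                    # idle at the origin

    for i in sorted(range(len(points)), key=lambda k: releases[k]):
        advance(releases[i])
        unserved.add(i)
        if not route:                            # idle at the origin: apply rule 1
            start_tour_if_home()
        elif dist(points[i], ORIGIN) > dist(pos, ORIGIN):
            route = [None]                       # rule 2, first case: abandon the tour, go home
        # rule 2, second case: keep going; city i waits for the next tour from the origin
    advance(math.inf)                            # finish all remaining work
    return t
```

For example, with a city at (2, 0) released at time 0 and a city at (0.5, 0) released at time 1, the second request is closer to the origin than the salesman, so it is ignored until the first tour ends at time 4, and the run finishes at time 5.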





GPAH Algorithm: illustration in $\mathbb{R}^2_+$





Competitive Analysis of GPAH

Assume that we are given a ρ-approximation algorithm for the TSP. Then:

Theorem

$$C_{GPAH}(I) \le 2\rho\, C_{OPT}(I), \quad \forall I.$$
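For a concrete reading of the bound (our example; the slide does not name a specific TSP subroutine): an exact TSP solver gives ρ = 1, while Christofides' algorithm, a polynomial-time 3/2-approximation for metric TSP, gives ρ = 3/2:

$$\rho = 1:\quad C_{GPAH}(I) \le 2\,C_{OPT}(I), \qquad \rho = \tfrac{3}{2}:\quad C_{GPAH}(I) \le 2 \cdot \tfrac{3}{2}\,C_{OPT}(I) = 3\,C_{OPT}(I), \qquad \forall I.$$

The second choice keeps each planning step polynomial-time, in line with the real-time restrictions of the proposed framework, at the cost of a weaker worst-case guarantee.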





What about the behavior of the algorithm "on average"?

Assume that locations are $\sim U[0,1]^2$ and that release dates follow a Poisson process with rate λ = 1:

[Figure: simulation results; horizontal axis from 0 to 100, vertical axis from 1 to 2; fitted curve $1 + 3n^{-1} + 0.1\,n^{-1/2}$]
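Instances of the kind described in this experiment are easy to generate: a Poisson process with rate λ has i.i.d. exponential inter-arrival times with mean 1/λ, so release dates are cumulative sums of `random.expovariate(λ)` draws. A minimal sketch; the function name and seeding are our own choices, and we assume the locations are drawn independently of each other and of the release dates:

```python
import random
from typing import List, Tuple

def random_instance(n: int, rate: float = 1.0, seed: int = 0
                    ) -> Tuple[List[Tuple[float, float]], List[float]]:
    """n cities uniform on [0,1]^2, release dates from a Poisson process with the given rate."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    releases, t = [], 0.0
    for _ in range(n):
        t += rng.expovariate(rate)  # i.i.d. exponential inter-arrival times
        releases.append(t)          # already sorted: r_1 <= r_2 <= ... <= r_n
    return points, releases
```

Feeding such instances to an online algorithm and to an exact offline solver, and averaging the cost ratio for each n, gives the kind of curve summarized in the figure.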





Almost Sure Asymptotic Optimality

Theorem
If locations are i.i.d. from a general distribution with compact support in a d-dimensional Euclidean space, and if release dates are generated from a general renewal process with finite inter-arrival rate, then

$$\lim_{n \to \infty} \frac{C_{PAH}(n)}{C_{OPT}(n)} = 1$$

almost surely.





Online Resource Augmentation

Competing against an “almighty” offline adversary may be unfair. One can limit the power of the adversary, or strengthen the online algorithm by increasing its resources (e.g., advance information, speed augmentation).

Example 1: Assume that the online algorithm is given a fixed amount of advance notice for each request, expressed as a scaled factor α ∈ [0, ∞] (α = 0 if there is no advance information); then a suitable version of PAH becomes $\left(2 - \frac{\alpha}{1+\alpha}\right)$-competitive.

Example 2: Assume that the online algorithm has a server with speed γ ≥ 1, and that the offline algorithm has access to a server with unit speed; then PAH becomes $\left(1 + \frac{1}{\gamma}\right)$-competitive.
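Both guarantees interpolate between the unaugmented factor 2 (α = 0, γ = 1) and the ideal factor 1 (α → ∞ or γ → ∞). A small helper that evaluates the two formulas exactly as stated above; the function names are hypothetical:

```python
import math

def advance_notice_ratio(alpha: float) -> float:
    """Competitive ratio 2 - alpha/(1 + alpha) with scaled advance notice alpha in [0, inf]."""
    if alpha < 0:
        raise ValueError("alpha must be nonnegative")
    if math.isinf(alpha):
        return 1.0  # limit as alpha -> infinity (inf/inf would be NaN)
    return 2.0 - alpha / (1.0 + alpha)

def speed_augmentation_ratio(gamma: float) -> float:
    """Competitive ratio 1 + 1/gamma for an online server of speed gamma >= 1."""
    if gamma < 1:
        raise ValueError("gamma must be at least 1")
    return 1.0 + 1.0 / gamma
```

For instance, one unit of scaled advance notice (α = 1) and a server twice as fast (γ = 2) both bring the guarantee down to 1.5.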





Other Variants

Similar results (online analysis, asymptotic results, polynomial-time restrictions, resource augmentation) can be obtained for the following generalizations:

The online m-TSP.
Online routing problems with capacity and precedence constraints.
Other objectives: e.g., Online Traveling Repairman Problems.
Other related problems: Online Scheduling Problems.





Online TSP with Flexibility

Assume we include the possibility of rejecting customers (either due to capacity constraints or for economic reasons, hoping for “better” future customers). Several options are possible for defining an appropriate offline framework:

The prize-collecting salesman problem (PCTSP),
The profitable tour problem (PTP),
The orienteering problem (OP).





The Online TSP with Flexibility on Metric Spaces

A metric space M. A series of n requests $(l_i, r_i, p_i)_{1 \le i \le n}$, where $l_i \in M$ is the location, $r_i \in \mathbb{R}_+$ is the release date, and $p_i \in \mathbb{R}_+$ is the penalty for not serving request i.
The problem begins at time 0; the server is initially at the origin and travels at unit speed (when not idle). The number of cities n is not known to the online server. Cities are revealed at their release dates $r_i \ge 0$ ($r_1 \le r_2 \le \dots \le r_n$).
Objective: minimize the sum of the time to visit all accepted requests plus the sum of all penalties of rejected requests.
Two distinct versions of this problem:

Basic: request i can be accepted/rejected at any time ≥ $r_i$.
Immediate: request i must be accepted/rejected at time $r_i$.
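The offline benchmark is the same for both versions, since an offline solver sees every request in advance and so is not constrained by when accept/reject decisions are made. For small instances it can be computed by brute force over accepted subsets and visiting orders. A minimal sketch under our own assumptions: requests in the Euclidean plane, origin at (0, 0), a server that must return to the origin after its last accepted request, and zero travel time if nothing is accepted; the function name is hypothetical.

```python
import itertools
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
ORIGIN: Point = (0.0, 0.0)  # assumed location of the origin

def dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])

def flexible_offline_opt(points: Sequence[Point], releases: Sequence[float],
                         penalties: Sequence[float]) -> float:
    """Min over accepted subsets and visiting orders of (completion time + rejected penalties)."""
    n = len(points)
    best = math.inf
    for k in range(n + 1):
        for accepted in itertools.combinations(range(n), k):
            rejected_penalty = sum(penalties[i] for i in range(n) if i not in accepted)
            for order in itertools.permutations(accepted):
                t, pos = 0.0, ORIGIN
                for i in order:
                    t = max(t + dist(pos, points[i]), releases[i])  # wait if early
                    pos = points[i]
                best = min(best, t + dist(pos, ORIGIN) + rejected_penalty)
    return best
```

A request is worth rejecting offline exactly when its penalty is smaller than the extra time it would add to the best tour.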





Basic Version - The Algorithm

$P_j$ = set of requests not yet visited by the online server when its state first becomes j.
$S_k$ = the set of accepted requests in an optimal offline solution through the first k requests.

“Wait, Go and Ignore” (WGI) Algorithm:
(0) Initial state is j := 0, $P_0 = \emptyset$.
(1) Server in state j remains at the origin until a time t such that there are k > j revealed requests, $f(k,j) \ne -1$, and $t \ge \max\{C_k, c_{k,j}\}$, where

$$c_{k,j} = 2C_k - l_{f(k,j)} - L(\tau_{k,j}) - \sum_{i > j,\ i \notin S_k} p_i - \sum_{i \in P_j} p_i.$$

Go to 2.

(2) Server fully completes one of two tours:
- If $c_{k,j} < C_k$, it follows route $\tau_k$. Go to 3.
- If $c_{k,j} \ge C_k$, it goes directly to $f(k,j)$ and then follows $\tau_{k,j}$. Go to 3.

(3) Update state to j := k and $P_j$. Go to 1.

[Figure: route $\tau_k$, location $f(k,j)$, and route $\tau_{k,j}$]





Basic Version: A Best Possible Result

Theorem
The WGI algorithm is 2-competitive and thus best possible for this problem.

Corollary
The WGI algorithm is also 2-competitive and best possible for the online prize-collecting TSP.





Immediate Version for Accept/Reject Decisions

This version turns out to be much more demanding for an online strategy. The following lower bounds have been proved:

2.5 on $\mathbb{R}_+$ (instead of 2 for the basic version).
2.64 on $\mathbb{R}$ (instead of 2 for the basic version).
And ... $\Omega(\sqrt{\ln n})$ on general metric spaces M (again instead of 2 for the basic version!).

Online algorithms with the following competitive ratios have been designed:

2.5 on $\mathbb{R}_+$ (so best possible).
3 on $\mathbb{R}$.
Asymptotically $O(\sqrt{\ln n})$-competitive on any metric space M (so best possible).





Related research

Other related research topics:
Mixing online and competitive strategies.

Example: the online competitive m-TSP problem.
Design and analysis of “intelligent” algorithms for problems with:

incomplete and uncertain data;
short time restrictions for making decisions;
extremely large data sets.




Summary

An introduction to data-driven optimization problems with similar characteristics.
A list of key fundamental (methodological/algorithmic) questions for tackling these problems.
A proposed rigorous framework to include online, real-time, and stochastic aspects of data-driven discrete optimization problems.
An illustration of possible results using the case of the Traveling Salesman Problem.

