near-optimal approximate shortest paths and …2. broadcast congested clique model: „1 +...

Near-Optimal Approximate Shortest Paths and

Transshipment in Distributed and Streaming Models∗

Ruben Becker†

Sebastian Forster‡

Andreas Karrenbauer§

Christoph Lenzen¶

Abstract

We present a method for solving the transshipment problem—also known as uncapacitated

minimum cost ow—up to a multiplicative error of 1+Y in undirected graphs with non-negative

edge weights using a tailored gradient descent algorithm. Using 𝑂 (·) to hide polylogarithmic

factors in 𝑛 (the number of nodes in the graph), our gradient descent algorithm takes 𝑂 (Y−2)iterations, and in each iteration it solves an instance of the transshipment problem up to a

multiplicative error of polylog𝑛. In particular, this allows us to perform a single iteration

by computing a solution on a sparse spanner of logarithmic stretch. Using a randomized

rounding scheme, we can further extend the method to nding approximate solutions for the

single-source shortest paths (SSSP) problem. As a consequence, we improve upon prior works

by obtaining the following results:

1. Broadcast CONGEST model: (1 + Y)-approximate SSSP using 𝑂 ((√𝑛 + 𝐷)Y−3) rounds,

where 𝐷 is the (hop) diameter of the network.

2. Broadcast congested clique model: (1 + Y)-approximate transshipment and SSSP using

𝑂 (Y−2) rounds.3. Multipass streaming model: (1+Y)-approximate transshipment and SSSP using𝑂 (𝑛) space

and 𝑂 (Y−2) passes.The previously fastest SSSP algorithms for these models leverage sparse hop sets. We bypass

the hop set construction; computing a spanner is sucient with our method. The above

bounds assume non-negative edge weights that are polynomially bounded in 𝑛; for general non-

negative weights, there is an additional multiplicative overhead equal to the logarithm of the

maximum ratio between non-zero weights. Our algorithms can also handle asymmetric costs

for traversing edges in opposite directions. In this case, we obtain an additional multiplicative

dependence of the maximum ratio between the two costs on some edge.

∗Accepted to SIAM Journal on Computing. A preliminary version of this paper was presented at the 31st International

Symposium on Distributed Computing (DISC 2017).†Gran Sasso Science Institute, L’Aquila, Italy

‡Department of Computer Sciences, University of Salzburg, Austria; this author previously published under the name

Sebastian Krinninger

§Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany

¶Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany

1

arX

iv:1

607.

0512

7v4

[cs

.DS]

2 F

eb 2

021

Contents

1 Introduction 3

2 General Approach for Solving Transshipment and SSSP 72.1 Overview of our Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.2 An Ecient Oracle for 𝑂 (log𝑛)-Approximate Solutions . . . . . . . . . . . . . . . 10

2.3 Potential Function for the Gradient Descent . . . . . . . . . . . . . . . . . . . . . 11

2.4 Generic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

3 Single-Source Shortest Paths 213.1 Sampling a Tree Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.2 Computing an Approximate Shortest-Path Tree . . . . . . . . . . . . . . . . . . . 28

4 Implementation in Distributed and Streaming Models 304.1 Broadcast Congested Clique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4.2 Broadcast CONGEST Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

4.3 Multipass Streaming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

5 Deterministic Spanner Computation in Congested Clique and Multipass Stream-ing Model 40

References 42

2

1 Introduction

Single-source shortest paths (SSSP) is a fundamental and well-studied problem in computer science.

Thanks to sophisticated algorithms and data structures [FT87, Tho99, Han+17], it has been known

for a long time how to obtain (near-)optimal running time in the RAM model. This is not the case in

non-centralized models of computation, which become more and more relevant in a big-data world.

Despite progress for exact SSSP algorithms [Spe97, KS97, Coh97, BTZ98, SS99, Cen+19b, Le 16,

Elk20, GL18, FN18], there remain large gaps to the strongest known lower bounds. Close-to-optimal

running times have so far only been achieved by ecient approximation schemes [Ble+16, Coh00,

MS03, Mil+15, HKN16, EN16]. For instance, in the CONGEST model of distributed computing,

the state of the art is as follows: exact SSSP on weighted graphs can be computed in 𝑂 (√𝑛𝐷)1

or 𝑂 (√𝑛𝐷1/4 + 𝑛3/5 + 𝐷) rounds [FN18], where 𝐷 is the (hop) diameter of the graph, and (1 + Y)-

approximate SSSP can be computed in (√𝑛 + 𝐷) · 2𝑂 (

√log𝑛 log (Y−1 log𝑛))

rounds [HKN16].2Even for

constant Y, the latter exceeds the strongest known lower bound of Ω(√︁𝑛/log𝑛 +𝐷) rounds [Elk06]

by a super-polylogarithmic factor. The techniques developed in the present paper allow us to make

a qualitative algorithmic improvement for (1 + Y)-approximate SSSP in this model: we solve the

problem in 𝑂 ((√𝑛 + 𝐷) · Y−2) rounds. We thus narrow the gap between upper and lower bound

signicantly and additionally improve the dependence on Y. Our new approach achieves its superior

running time by leveraging techniques from continuous optimization.

Our approach is based on tackling a problem that is more general than SSSP. In the transshipmentproblem, we seek to nd a cheapest routing for sending units of a single good from sources to

sinks along the edges of a graph meeting the nodes’ demands. Equivalently, we want to nd the

minimum-cost ow in a graph where edges have unlimited capacity. The special case of SSSP

can be modeled as a transshipment problem by setting the demand of the source to −𝑛 + 1 (thussupplying −𝑛 + 1 units) and the demand of every other node to 1. Unfortunately, this relation breaks

when we consider approximation schemes: A (1 + Y)-approximate solution to the transshipment

problem merely yields (1 + Y)-approximations to the distances on average. In the special case of

SSSP, however, one is interested in obtaining a (1 + Y)-approximation to the distance for each singlenode and we show how to extend our algorithm to provide such a guarantee as well.

Techniques from continuous optimization have been key to recent breakthroughs in the combi-

natorial realm of graph algorithms [DS08, Chr+11, She13, Kel

+14, Mąd13, LS14, Coh

+17]. In this

paper, we apply this paradigm to computing (1 + Y)-approximate primal and dual solutions to the

transshipment problem in undirected graphs with non-negative edge weights. Accordingly, we

perform projected gradient descent for a suitable norm-minimization formulation of the problem,

where we approximate the innity norm by a dierentiable soft-max function. To make this general

approach work in our setting, we need to add signicant problem-specic tweaks. In particular, we

develop a gradient descent algorithm that reduces the problem of computing a (1+Y)-approximation

to the more relaxed problem of computing a crude (e.g. 𝑂 (log𝑛) factor) approximation. We then

exploit that an𝑂 (log𝑛)-approximation can be computed very eciently by solving the problem on

a sparse spanner, and that it is well-known how to compute sparse spanners eciently. To obtain

the aforementioned per-node guarantee in the approximate SSSP problem, we provide an intricate

randomized rounding scheme that yields an approximate shortest-path tree (i.e., a primal solution)

1Throughout this paper we use 𝑂 (·)-notation to hide polylogarithmic factors in 𝑛.

2Note that these running times refer to weighted graphs. In unweighted graphs, the SSSP problem can easily be

solved in 𝑂 (𝐷) rounds by performing a BFS tree computation.

3

in addition to the dual solution (i.e., estimated distances to the source).

Our method is widely applicable among a plurality of non-centralized models of computation in

rather straightforward ways. We obtain the rst non-trivial algorithms for approximate undirected

transshipment in the Broadcast CONGEST, Broadcast Congested Clique, and Multipass Streaming

models. As a further, arguably more important, consequence, we improve upon prior results for

computing approximate SSSP in these models. Our approximate SSSP algorithms are the rst to be

provably optimal up to polylogarithmic factors.

Our Contributions and Results We summarize our technical and conceptual contributions

as follows:

(C1) We give a problem-specic gradient descent algorithm for approximating the transshipment

problem, which requires access to an oracle computing an 𝛼-approximate dual solution for

any given demand vector.3To compute a (1 + Y)-approximation, the algorithm performs

𝑂 (Y−2𝛼2) oracle calls. If the oracle returns primal solutions, so does our algorithm.

(C2) We provide a randomized rounding scheme that, given the output of the gradient descent

algorithm for an SSSP problem, can be used to obtain distance estimates that satisfy a per-node

approximation guarantee.

(C3) We observe that spanners can be used to obtain an ecient transshipment oracle with

approximation guarantee 𝛼 ∈ 𝑂 (log𝑛).

By implementing our method in specic models of computation, we obtain the following algorithmic

results in graphs with non-negative polynomially bounded4edge weights:

(R1) We give faster Las Vegas algorithms5for computing (1 + Y)-approximate SSSP:

(a) Broadcast CONGEST model: We obtain an algorithm for computing (1 + Y)-approximate

SSSP in𝑂 ((√𝑛 +𝐷) · Y−3) rounds. This improves upon the previous best upper bound of

(√𝑛 +𝐷) · 2𝑂 (


rounds [HKN16]. For Y−1 ∈ 𝑂 (polylog𝑛), we match, up

to polylogarithmic factors in 𝑛, the lower bound of Ω̃(√𝑛 + 𝐷) [Elk06, Das+12], which

applies to any (poly𝑛)-approximation of the distance between two xed nodes in a

weighted undirected graph. If a Monte-Carlo guarantee suces, we obtain a slightly

better guarantee of 𝑂 ((√𝑛 + 𝐷) · Y−3/2).

(b) Broadcast Congested Clique model: We obtain an algorithm for computing (1 + Y)-approximate SSSP using 𝑂 (Y−2) rounds. This improves upon the previous best upper

bound of 2𝑂 (√log𝑛 log (Y−1 log𝑛))

rounds [HKN16].

(c) Multipass Streaming model: We obtain an algorithm for computing (1 + Y)-approximate

SSSP using 𝑂 (Y−2) passes and 𝑂 (𝑛 log2 𝑛) space. This improves upon the previous

3Note that dual feasibility is crucial here. In particular, this rules out an oracle based on tree embeddings [Bar98,

Elk+08], as such trees might have stretch Ω(𝑛) on individual edges.

4For general non-negative weights, running times scale by a multiplicative factor of log𝑅, where 𝑅 is the maximum

ratio between non-zero edge weights.

5The conference version of this paper claimed deterministic bounds for computing approximate single-source

distances. However, this claim was supported with inaccurate arguments. The deterministic guarantee can still be

obtained for computing the 𝑠-𝑡 distance between two nodes.

4

best upper bound of (2 + 1/Y)𝑂 (√log𝑛 log log𝑛)

passes and 𝑂 (𝑛 log2 𝑛) space [EN16]. Bysetting Y small enough, we can compute distances up to the value log𝑛 exactly in

integer-weighted graphs using polylog𝑛 passes and 𝑂 (𝑛 log2 𝑛) space. Thus, up to

polylogarithmic factors in 𝑛, our result matches a lower bound of𝑛1+Ω (1/𝑝 )

poly𝑝space for

all algorithms that decide in 𝑝 passes if the distance between two xed nodes in an

unweighted undirected graph is at most 2(𝑝 + 1) for any 𝑝 = 𝑂 (log𝑛/log log𝑛) [GO16].

(R2) We give fast algorithms for computing (1 + Y)-approximate transshipments. No non-trivial

upper bounds were known in each of these models before.

(a) Broadcast CONGEST model: A deterministic algorithm using 𝑂 (𝑛Y−2) rounds.(b) Broadcast Congested Clique model: A deterministic algorithm running in 𝑂 (Y−2) rounds.(c) Multipass Streamingmodel: Adeterministic algorithmusing𝑂 (Y−2) passes and𝑂 (𝑛 log𝑛)

space.

In the case of SSSP, we can compute a (1 + Y)-approximation to the distance of the source to

every node together with a tree such that the length of the path from the source to any node is

within a factor of (1 + Y) of its true distance.In the case of the transshipment problem, we can (deterministically) return (1 + Y)-approximate

primal and dual solutions. We can further extend the results to bidirected graphs, where each

edge can be used in either direction with potentially asymmetric weights. Denoting by _ ≥ 1 the

maximum over all edges of the weight ratio between traversing the edge in dierent directions, our

algorithms give the same guarantees if the number of rounds or passes, respectively, is increased

by a factor of _2 log _.

Related Work on the Transshipment Problem Transshipment is a classic problem in

combinatorial optimization [KV00, Sch03]. The classic algorithms for directed graphs with non-

negative edge weights in the RAM model run in time 𝑂 (𝑛(𝑚 + 𝑛 log𝑛) log𝑛) [Orl93] and 𝑂 ((𝑚 +𝑛 log𝑛)𝐵) [EK72], respectively, where 𝐵 is the sum of the nodes’ demands (when they are given as

integers) and the term𝑚+𝑛 log𝑛 comes from SSSP computations. If the graph contains negative edge

weights, then these algorithms require an additional preprocessing step to compute SSSP in presence

of negative edge weights, for example in time 𝑂 (𝑚𝑛) using the Bellman-Ford algorithm [Bel58,

For56] or in time 𝑂 (𝑚√𝑛 log𝑁 ) using Goldberg’s algorithm [Gol95].

6The weakly polynomial

running time was rst improved to 𝑂 (𝑚3/2polylog𝑅) [DS08] and then to 𝑂 (𝑚

√𝑛 polylog𝑅) in

a recent breakthrough for minimum-cost ow [LS14], where 𝑅 is the maximum ratio between

edge weights. Recently, Sherman [She17] obtained a randomized algorithm for computing a

(1 + Y)-approximate transshipment in weighted undirected graphs in time 𝑂 (Y−2𝑚1+𝑜 (1) ) usinga preconditioning approach. Note that Sherman’s preconditioner gives a condition number of

𝑛𝑜 (1) and thus is not suitable for obtaining a polylogarithmic iteration count as in our approach.

We are not aware of any non-trivial algorithms for computing (approximate) transshipments in

non-centralized models of computation, such as distributed or streaming models.

Comparison to Hop Set Based SSSP Algorithms The state-of-the art SSSP algorithms

in the distributed CONGEST model follow the framework developed in [Nan14], where (1) the

6Goldberg’s running time bound holds for integer-weighted graphs with most negative weight −𝑁 .

5

problem of computing SSSP is reduced to an overlay network of size 𝑁 = 𝑂 (√𝑛) and (2) a sparse

hop set is constructed to speed up computing SSSP on the overlay network. An (ℎ, Y)-hop set

is a set of weighted edges that, when added to the original graph, provides sucient shortcuts

to approximate all pairwise distances using paths with only ℎ edges (“hops”). The algorithm by

Nanongkai et al. [HKN16] achieves an upper bound of (√𝑛 + 𝐷) · 2𝑂 (


on the

number of rounds by constructing an (ℎ, Y)-hop set of size 𝑂 (𝑁𝜌) where ℎ ≤ 2𝑂 (√log𝑛 log (Y−1 log𝑛))

and 𝜌 ≤ 2𝑂 (√log𝑛 log (Y−1 log𝑛))

. Elkin’s algorithm [Elk20], which takes 𝑂 (𝐷1/3(𝑛 log𝑛)2/3) rounds,uses an exact (𝑁 /𝜌, 0)-hop set of size 𝑂 (𝑁𝜌) similar to the one developed by Shi and Spencer for

the PRAM model [Spe97]. Elkin’s main technical contribution lies in showing how to compute

this hop set without constructing the overlay network explicitly. Roughly speaking, in all these

algorithms, both ℎ and 𝜌 enter the running time of the corresponding SSSP algorithms, in addition

to the time needed to construct the hop set.

The concept of hop sets has been introduced by Cohen in the context of PRAM algorithms for

approximate SSSP [Coh00]. The increased interest in hop sets and their applications in the last

years [Ber09, HKN18, Mil+15, HKN16, EN16] has culminated in the construction of (ℎ, Y)-hop sets

of size𝑂 (𝑛1+1

2𝑘+1 −1) for ℎ = 𝑂

(( 𝑘Y)𝑘

)[HP19, EN19]. Recent lower bounds by Abboud et al. [ABP18]

show that this trade-o is essentially tight: for any integer 𝑘, any construction of an (ℎ, Y)-hop set

of size ≤ 𝑛1+1

2𝑘−1−𝛿

(for 𝛿 > 0) must have ℎ = Ω(𝑐𝑘/Y𝑘+1), where 𝑐𝑘 is a constant depending only

on 𝑘 . This implies that the hop set based algorithms, as long as the factor 𝜌 has to be paid in the

running time for construction hop sets of size 𝑛𝜌 , will never be able to achieve a running time

comparable to our SSSP algorithm exclusively by nding better hop sets.

Further Related Work In the following we review further results on (approximate) SSSP in

the non-centralized models considered in this paper.

In the CONGEST model of distributed computing, SSSP on unweighted graphs can be computed

exactly in 𝑂 (𝐷) rounds by distributed breadth-rst search [Pel00]. For weighted graphs, the only

non-trivial algorithm known is the distributed version of Bellman-Ford [Bel58, For56], which uses

𝑂 (𝑛) rounds. In terms of approximation, Elkin [Elk06] showed that computing an 𝛼-approximate

SSSP tree requires Ω((𝑛/𝛼)1/2/log𝑛 + 𝐷) rounds. Together with many other lower bounds, this

was strengthened in [Das+12] by showing that computing a (poly𝑛)-approximation of the distance

between two xed nodes in aweighted undirected graph requiresΩ(√𝑛/log𝑛+𝐷) rounds. The lower

bounds were complemented by two SSSP algorithms: a randomized𝑂 (𝛼 log𝛼)-approximation using

𝑂 (𝑛1/2+1/𝛼 + 𝐷) rounds [LP13] and a randomized (1 + 𝑜 (1))-approximation using 𝑂 (𝑛1/2𝐷1/4 + 𝐷)rounds [Nan14]. Both results were improved upon in [HKN16] by a deterministic algorithm that

computes (1 + 𝑜 (1))-approximate SSSP in 𝑛1/2+𝑜 (1) + 𝐷1+𝑜 (1)rounds. A recent hop set construction

of Elkin and Neiman [EN16] improves the running time for computing shortest paths from multiple

sources.

TheCongested Cliquemodel [Lot+05] has seen increasing interest in the past years as it highlights

the aspect of limited bandwidth in distributed computing, yet excludes the possibility of explicit

bottlenecks (e.g., a bridge that would limit the ow of information between the parts of the graph it

connects to 𝑂 (log𝑛) bits per round in the CONGEST model). For weighted graphs, SSSP can again

be computed exactly in 𝑂 (𝑛) rounds. The rst approximation was given by Nanongkai [Nan14]

with a randomized algorithm for computing (1+𝑜 (1))-approximate SSSP in𝑂 (√𝑛) rounds. All-pairs

shortest paths in the Congested Clique model can be computed deterministically in 𝑂 (𝑛1/3) rounds

6

for an exact result and in 𝑂 (𝑛0.158) rounds for a (1 + 𝑜 (1))-approximation [Cen+19b]. The time for

computing all-pairs shortest paths exactly has subsequently been improved to𝑂 (𝑛0.2096) rounds [Le16]. The hop set construction of [HKN16] gives a deterministic algorithm for computing (1 + 𝑜 (1))-approximate SSSP in 𝑛𝑜 (1) rounds. A recent hop set construction of Elkin and Neiman [EN16]

improves the running time for computing shortest paths from multiple sources. We note that

the approaches of [HKN16] and [EN16] as well as our approach can actually operate in the more

restricted Broadcast Congested Clique and Broadcast CONGEST models, in which in each round,

each node sends the same message to all other nodes or all its neighbors, respectively. After the

preliminary version of our paper appeared, Censor-Hillel et al. [Cen+19a] presented a deterministic

algorithm that computes (1+ Y)-approximate multi-source shortest paths from up to𝑂 (√𝑛) sources

in 𝑂 (log2 𝑛/Y) rounds in the Congested Clique model.

In the Streaming model, two approaches were known for (approximate) SSSP before the algo-

rithm of [HKN16]. First, shortest paths up to distance 𝑑 can be computed using 𝑑 passes and 𝑂 (𝑛)space in unweighted graphs by breadth-rst search. Second, approximate shortest paths can be

computed by rst obtaining a sparse spanner and then computing distances on the spanner without

additional passes [Fei+05, EZ06, Fei

+08, Bas08, Elk11]. This leads, for example, to a randomized

(2𝑘 − 1)-approximate all-pairs shortest paths algorithm using 1 pass and 𝑂 (𝑛1+1/𝑘 ) space for anyinteger 𝑘 ≥ 2 in unweighted undirected graphs. In unweighted undirected graphs, the spanner

construction of [EZ06] can be used to compute (1 + 𝑜 (1))-approximate SSSP using 𝑂 (1) passesand 𝑂 (𝑛1+𝑜 (1) ) space. The hop set construction of [HKN16] gives a deterministic algorithm for

computing (1 + 𝑜 (1))-approximate SSSP in weighted undirected graphs using 𝑛𝑜 (1) passes and𝑛1+𝑜 (1) space. Using randomization, this was improved to 𝑛𝑜 (1) passes and𝑂 (𝑛 log2 𝑛) space by Elkinand Neiman [EN16]. These upper bounds are complemented by a lower bound of 𝑛1+Ω (1/𝑝)/(poly 𝑝)space for all algorithms that decide in 𝑝 passes if the distance between two xed nodes in an

unweighted undirected graph is at most 2(𝑝 + 1) for any 𝑝 = 𝑂 (log𝑛/log log𝑛) [GO16]. Notethat this lower bound in particular applies to all algorithms that provide 1 + Y approximations for

Y < 1/2(𝑝 + 1) in integer-weighted graphs, as this level of precision requires to compute shortest

paths for small enough distances exactly.

2 General Approach for Solving Transshipment and SSSP

We consider a bidirected graph 𝐺 = (𝑉 , 𝐸,𝑤) where 𝑉 is the set of 𝑛 nodes, 𝐸 is the set of directed

arcs, and 𝑤 assigns a positive integer weight 𝑤 (𝑢,𝑣) to every arc (𝑢, 𝑣) ∈ 𝐸. Being bidirected

means that for every arc (𝑢, 𝑣) ∈ 𝐸 there also exists a reverse arc (𝑣,𝑢) ∈ 𝐸. We can thus view

the arcs (𝑢, 𝑣) ∈ 𝐸 and (𝑣,𝑢) ∈ 𝐸 as two orientations of an undirected edge {𝑢, 𝑣} ∈ 𝐸, where𝐸 = {{𝑢, 𝑣} ∈

(𝑉2

)| (𝑢, 𝑣) ∈ 𝐸} and𝑚 := |𝐸 |. For every 𝑒 ∈ 𝐸 we call one of the orientations the

forward arc of weight𝑤+𝑒 and the other orientation the backward arc of weight𝑤−𝑒 . Since the choiceof names is arbitrary, let us, w.l.o.g., assume that𝑤+ ≥ 𝑤−. Dene _ := max

{𝑤+𝑒 /𝑤−𝑒 : 𝑒 ∈ 𝐸

}≥ 1

as the maximum ratio over all edges between the forward and the backward weight. Every weighted

undirected graph can be readily modeled as a weighted bidirected graph with _ = 1. As _ is easily

determined and only a single value, we assume that it is a globally known parameter in the following.

Furthermore, let𝑊 = diag(𝑤+,𝑤−) denote the 2𝑚 × 2𝑚 diagonal matrix containing the forward

and backward weights and let 𝑏 ∈ Z𝑛 be a vector of demands. W.l.o.g., we restrict to feasible and

7

non-trivial instances, i.e., 𝑏𝑇1 = 0 and 𝑏 ≠ 0.7

As we will now briey argue, it suces to formulate our algorithms for positive integer arc

weights from 1 to ‖𝑤 ‖∞. First, arbitrary positive arc weights can be reduced to positive integer

arc weights by replacing each arc weight 𝑤𝑒 by 𝑤 ′𝑒 = d 𝑤𝑒

Y𝑤min

e for 𝑤min = min𝑒∈𝐸{𝑤𝑒 }, whichconceptually corresponds to rounding up each arc weight to the next multiple of Y𝑤min and carrying

out the computation with weights equal to the corresponding multiplicities. This increases the

weight of any path by a factor of at most (1 + Y) and bounds the maximum arc weight ‖𝑤 ′‖∞ by

𝑂 (Y−1𝑅), where 𝑅 is the maximum ratio between non-zero arc weights in the original graph. As

Y−1 may be assumed to be polynomial in 𝑛 – otherwise exact algorithms will be faster than our

approximation algorithms – this bounds log ‖𝑤 ′‖∞ by 𝑂 (log𝑛 + log𝑅). Second, given positive

integer weights, we can deal with arc weight 0 by using weight𝑤 ′′𝑒 = 1+ (1+Y−1)𝑛𝑤 ′𝑒 for every arc 𝑒 ,assuming without loss of generality that Y−1 is integer (see, e.g., [FN18] for details). As this increasesthe maximum arc weight by a factor of 𝑂 (𝑛Y−1), also log ‖𝑤 ′′‖∞ is bounded by 𝑂 (log𝑛 + log𝑅).

A common approach to model the bidirected transshipment problem as a linear program con-

siders the node-arc-incidence matrix 𝐴 ∈ {−1, 0, 1}𝑛×2𝑚 of the bidirected graph.8For notational

convenience, we list the columns of the forward arcs rst, followed by the columns for the backward

arcs in the same order. The asymmetric transshipment problem can then be written as a primal/dual

pair of linear programs:

min{1𝑇𝑊𝑥 : 𝐴𝑥 = 𝑏, 𝑥 ≥ 0} = max{𝑏𝑇𝑦 : (𝑊 −1𝐴𝑇𝑦)max ≤ 1}, (1)

where for 𝑧 ∈ R𝑑 we denote by (𝑧)max := max{𝑧𝑖 : 𝑖 ∈ [𝑑]} the maximum entry of 𝑧. The primal

(left) program asks to “ship” the ow given by 𝑏 from sources (negative demand) to sinks (positive

demand) along the arcs of the graph, minimizing the cost of the ow, i.e.,

∑(𝑢,𝑣) ∈𝐸 𝑤 (𝑢,𝑣)𝑥𝑢𝑣 . Note

that if both 𝑥𝑢𝑣 > 0 and 𝑥𝑣𝑢 > 0, we can reduce both variables by min{𝑥𝑢𝑣, 𝑥𝑣𝑢} without changingwhether 𝐴𝑥 = 𝑏, but reducing the objective. Thus, an optimal solution sends ow only in one

direction over any edge.

The dual (right) program asks for node potentials 𝑦 such that for each arc (𝑢, 𝑣) ∈ 𝐸, 𝑦𝑣 − 𝑦𝑢 ≤𝑤 (𝑢,𝑣) , maximizing 𝑏𝑇𝑦. Note that, because 𝑏𝑇1 = 0, shifting the potential by 𝑟 · 1 for any 𝑟 ∈ Rdoes neither change 𝑏𝑇𝑦 nor 𝑦𝑢 − 𝑦𝑣 for any 𝑢, 𝑣 ∈ 𝑉 . The goal of the dual is thus to maximize the

dierences in potential of sources and sinks (weighted according to 𝑏), subject to the constraint that

the potential of 𝑢 does not exceed the potential of 𝑣 by more than𝑤 (𝑢,𝑣) for any neighbor 𝑣 of 𝑢.

In the special case of SSSP with source 𝑠 ∈ 𝑉 , we have that (i) 𝑏𝑠 = −𝑛+1 and 𝑏𝑣 = 1 for all 𝑣 ≠ 𝑠 ,

(ii) an optimal primal solution 𝑥∗ is given by routing, for each 𝑠 ≠ 𝑣 ∈ 𝑉 , one unit of ow along a

shortest path from 𝑠 to 𝑣 , and (iii) optimal potentials 𝑦∗ are given by setting 𝑦∗𝑣 to the distance from

𝑠 to 𝑣 .

2.1 Overview of our Approach

We will employ gradient descent on a suitable potential function to converge to a near-optimal

solution. If we can ensure large (relative) progress, few iterations of gradient descent suce, and we

can hope for the high parallelism that is crucial for good results in our target models of computation.

Naturally, we will also require the iterations to allow for an ecient implementation. We will use

7Here 1 denotes the all-ones vector and thus 𝑏𝑇1 = 0 simply means that the positive demands equal the negative

demands (i.e., the supplies).

8The incidence matrix 𝐴 of a directed graph contains a row for every node and a column for every arc and 𝐴𝑖 𝑗 is 1 if

the 𝑗-th arc enters the 𝑖-th node, −1 if it leaves the node, and 0 otherwise.

8

a sparse representation of the distance structure of the graph, which can be “kept available” to

determine approximate solutions to intermediate problems at any time.

A generic (unconstrained, adaptive step size, minimizing) gradient descent algorithm operates

as follows:

1. Pick a dierentiable potential function that reects the objective well and does not change

too quickly. A convex potential function is desirable, as it guarantees the existence of an

optimum solution and non-zero gradient at non-optimal solutions.

2. Find a reasonable starting solution (a poor approximation typically suces).

3. Proceed in iterations. In each iteration:

(a) Determine the gradient of the potential function at the current solution.

(b) Determine a direction for the update in which the gradient indicates that the potential

reduces quickly.

(c) Choose a large step size, under the constraint that the gradient does not change too

much along the way, and update the solution according to direction and step size.

(d) Check whether the solution is suciently close to the optimum. If yes, terminate and

return the current solution. Otherwise, proceed with the next iteration.

The key to minimizing the number of iterations is to nd good update directions, in which the

gradient is large and does not change too quickly. In contrast, the key to fast iterations is to

avoid too costly calculations for nding such update directions. The choice of the “right” potential

function is crucial for being able to achieve a good compromise between these conicting goals. It

should also be noted that an ecient test for termination is required. Fortunately, our potential

function will not only be convex, but being able to make signicant progress will be equivalent to

being far from the optimum; thus, the determined step size (which corresponds to the estimated

reduction of the potential) can be used as a simple tool to bound the approximation ratio of the

current solution.

Inconveniently, both the primal and dual program in (1) have constraints. There are various

techniques to handle this issue in gradient descent methods. We will rephrase the dual problem so

that a single equality constraint (dening an (𝑛 − 1)-dimensional ane hyperplane) remains. We

then perform an unconstrained gradient descent on the subspace given by the hyperplane, i.e., we

restrict to updates that do not aect this constraint.

This strategy requires to overcome a number of challenges, which we discuss before proceeding

to presenting the high-level algorithm. First, we show that a sparse spanner gives rise to an ecient

oracle yielding approximate solutions to transshipment problems in Section 2.2. This oracle provides

a suciently good starting solution and, more importantly, is our tool for determining good update

directions for the gradient descent steps. We then proceed to rephrasing the transshipment problem

and dening a suitable potential function in Section 2.3. We also analyze its gradient to establish the

key properties needed for showing that the update steps guarantee fast progress and, at termination,

a close-to-optimal solution. In Section 2.4, we state the pseudocode of the algorithm and prove its

correctness and give a bound on its number of iterations. The algorithm computes a dual solution

only. However, we show how to also obtain a primal solution from an additional call to the oracle,

under the condition that the oracle is capable of providing not only dual, but also primal solutions;

the oracle from Section 2.2 meets this criterion.

9

2.2 An Ecient Oracle for 𝑶 (log 𝒏)-Approximate Solutions

Our approach is based on guiding gradient descent using approximate solutions to intermediate

transshipment problems. To this end, an ecient way of obtaining such solutions is required. We

phrase this in terms of calling an oracle that provides dual solutions to (1). If a primal solution is

required, too, the oracle must also be able to provide primal solutions.

For the sake of concreteness, we specify how to obtain such an oracle already at this point. It

turns out that computing an exact solution on a sparse spanner is a good choice for all applications

of our method demonstrated in this article.

Denition 2.1 (Spanner). Given 𝐺 = (𝑉 , 𝐸,𝑤) and 𝛼 ≥ 1, an 𝛼-spanner of 𝐺 is a subgraph(𝑉 , 𝐸 ′,𝑤 |𝐸′), 𝐸 ′ ⊆ 𝐸, in which distances are at most by factor 𝛼 larger than in 𝐺 .

In other words, a spanner removes edges from 𝐺 while approximately preserving distances.

It is well-known that for every undirected graph we can eciently compute an 𝛼-spanner of size

𝑂 (𝑛 log𝑛) with 𝛼 = 𝑂 (log𝑛). For the sake of completeness, we discuss derandomized implementa-

tions of the Baswana-Sen spanner construction [BS07] suitable for the computational models under

consideration; this is deferred to Section 59.

In all cases, the sparse representation of the approximate distance structure of the graph can

be kept “available,” i.e., in the broadcast congested clique and broadcast congest models, we can

make a spanner globally known, and in the multipass streaming model we may keep a spanner

in memory, cf. Section 4. Note that, as the graph is bidirected with potentially asymmetric arc

weights, an undirected 𝛼-spanner construction cannot be directly applied to the bidirected graph𝐺 .

However, we can instead consider the symmetrized problem

min{1𝑇𝑊−𝑥 : 𝐴𝑥 = 𝑏, 𝑥 ≥ 0} = max{𝑏𝑇𝑦 : (𝑊 −1− 𝐴𝑇𝑦)max ≤ 1}, (2)

where𝑊− = diag(𝑤−,𝑤−).

Observation 2.2. Feasible primal and dual solutions 𝑥 and 𝑦 of (2) are also feasible solutions of (1),where

(i) 1𝑇𝑊−𝑥 ≤ 1𝑇𝑊𝑥 ≤ _1𝑇𝑊−𝑥 , and

(ii) 1

_(𝑊 −1− 𝐴𝑇𝑦)max ≤ (𝑊 −1𝐴𝑇𝑦)max ≤ (𝑊 −1− 𝐴𝑇𝑦)max.

In particular, the objective value of 𝑥 increases by a factor between 1 and _ (while the objective valueof 𝑦 is unaected).

We will not make direct use of this observation beyond obtaining a starting solution, as em-

ploying the symmetric weights𝑊− streamlines our reasoning. However, this observation provides

the intuition that the asymmetry of weights reduces the quality of approximation provided by the

spanner by a factor of _ and therefore should result in replacing 𝛼 by 𝛼_ in time bounds; essentially,

this intuition turns out to be correct.

Given an 𝛼-spanner of the undirected graph 𝐺− = (𝑉 , 𝐸,𝑤−) and a demand vector 𝑏, it is

straightforward to compute an (𝛼_)-approximate primal-dual pair for (1). By this we mean feasible

𝑥 and 𝑦 satisfying that 𝛼_𝑏𝑇𝑦 ≥ 1𝑇𝑊𝑥 . As we will x the vector 𝑏 of the original problem in the

following sections, but need to solve intermediate problems for dierent demand vectors˜𝑏, the

lemma formalizing this statement is phrased accordingly.

9Note that while it is possible to construct spanners with𝑂 (𝑛) edges, the fastest known such algorithms for weighted

graphs are usually less ecient than those for constructing spanners with 𝑂 (𝑛 log𝑛) edges.

10

Lemma 2.3. Given an 𝛼-spanner 𝑆 of 𝐺− = (𝑉 , 𝐸,𝑤−) and any 0 ≠ ˜𝑏 ∈ R𝑛 with ˜𝑏𝑇1 = 0, an𝛼-approximate primal-dual pair of solutions to (2) with demands ˜𝑏 can be computed (without furtherknowledge of 𝐺).

Proof. Let 𝑥 and 𝑦 be optimal solutions to (1) on the spanner. Observe that any feasible primal

solution on 𝑆 , padded with 0-entries for edges not present in 𝑆 , is feasible on 𝐺 . We show that

𝛼−1𝑦, whose objective is by factor 𝛼 smaller than that of 𝑦, is a feasible dual solution on 𝐺 . As the

objectives of 𝑥 and 𝑦 on the spanner and the padded 𝑥 on 𝐺 are identical, this proves the claim.

Let {𝑢, 𝑣} = 𝑒 ∈ 𝐸 be arbitrary. By the denition of a spanner, there must be a path 𝑃𝑢𝑣 of

weight at most 𝛼𝑤−𝑒 from 𝑢 to 𝑣 in 𝑆 . We have

| (𝑊 −1− 𝐴𝑇𝑦) (𝑢,𝑣) | =|𝑦𝑣 − 𝑦𝑢 |𝑤−𝑒

≤∑{𝑢′,𝑣′ }=𝑒′∈𝑃𝑢𝑣 |𝑦𝑣′ − 𝑦𝑢′ |

𝑤−𝑒

=

∑{𝑢′,𝑣′ }=𝑒′∈𝑃𝑢𝑣 𝑤

−𝑒′ | ( (𝑤−𝑒′)−1𝐴𝑇𝑦) (𝑢′,𝑣′) |𝑤−𝑒

≤∑{𝑢′,𝑣′ }=𝑒′∈𝑃𝑢𝑣 𝑤

−𝑒′

𝑤−𝑒≤ 𝛼.

Thus, (𝑊 −1− 𝐴𝑇 (𝛼−1𝑦))max ≤ 1, i.e., 𝛼−1𝑦 is feasible on 𝐺 , completing the proof. �

As mentioned before, we need to restrict our updates to maintain a certain constraint. As we

will see in Section 2.3, this constraint is that update steps must be orthogonal to 𝑏. Thus, instead of

(2), we will consider the program

min{1𝑇𝑊−𝑥 : 𝐴𝑥 + 𝑧𝑏 = ˜𝑏, 𝑥 ≥ 0} = max{ ˜𝑏𝑇𝑦 : (𝑊 −1− 𝐴𝑇𝑦)max ≤ 1 ∧ 𝑏𝑇𝑦 = 0} (3)

where 𝑧 is a one-dimensional variable. The additional constraint of 𝑏𝑇𝑦 = 0 in the dual enforces

said orthogonality. It is reected in the primal program by relaxing the equality constraints to

𝐴𝑥 = ˜𝑏 − 𝑧𝑏, i.e., shifting the demands by an arbitrary multiple of 𝑏. Note that feasible primal

solutions of (3) on a spanner are still feasible on 𝐺 (after padding), and scaling dual solutions does

not aect whether 𝑏𝑇𝑦 = 0 or not. Hence the same arguments as before yield that an 𝛼-approximate

pair can be computed based on an 𝛼-spanner of 𝐺 .

Corollary 2.4. Given an 𝛼-spanner 𝑆 of 𝐺− = (𝑉 , 𝐸,𝑤−) and any 0 ≠ ˜𝑏 ∈ R𝑛 with ˜𝑏𝑇1 = 0, an𝛼-approximate primal-dual pair of solutions to (3) with demands ˜𝑏 can be computed (without furtherknowledge of 𝐺).

2.3 Potential Function for the Gradient Descent

In what follows, we consider 𝐺 and 𝑏 to be xed, and denote by 𝑦∗ an optimal solution of the dual

program, i.e., 𝑏𝑇𝑦∗ = max{𝑏𝑇𝑦 : (𝑊 −1𝐴𝑇𝑦)max ≤ 1}. We relate the dual program to another linear

program that normalizes the objective to 1 and seeks to minimize (𝑊 −1𝐴𝑇𝑦)max instead:

min{(𝑊 −1𝐴𝑇𝜋)max : 𝑏𝑇𝜋 = 1}. (4)

Let us denote by 𝜋∗ an optimal solution to this problem. There is a straightforward correspondence

between feasible solutions of (1) and feasible solutions of (4).

11

Lemma 2.5. Consider the map𝜓 on vertex potentials 𝜋 with (𝑊 −1𝐴𝑇𝜋)max > 0 dened by𝜓 (𝜋) :=𝜋

(𝑊 −1𝐴𝑇 𝜋 )max

and the map 𝜒 on vertex potentials 𝑦 with 𝑏𝑇𝑦 > 0 dened by 𝜒 (𝑦) := 𝑦

𝑏𝑇 𝑦.

1. If 𝜋 is a feasible solution of (4) with (𝑊 −1𝐴𝑇𝜋)max > 0, then𝜓 (𝜋) denes a feasible solution ofthe dual in (1). If 𝑦 is a feasible dual solution of (1) with 𝑏𝑇𝑦 > 0, then 𝜒 (𝑦) denes a feasiblesolution of (4).

2. The map𝜓 (·) preserves the approximation ratio. Namely, for any 𝛾 ≥ 1, if 𝜋 is a solution of (4)

within factor 𝛾 of the optimum, i.e., (𝑊 −1𝐴𝑇𝜋)max ≤ 𝛾 · (𝑊 −1𝐴𝑇𝜋∗)max, then𝜓 (𝜋) is feasiblefor (1) and within factor 𝛾 of the optimum, i.e., 𝑏𝑇𝜓 (𝜋) ≥ 𝛾−1𝑏𝑇𝑦∗.

Proof. 1. Let𝜋 be feasible for (4), i.e.,𝑏𝑇𝜋 = 1, with (𝑊 −1𝐴𝑇𝜋)max > 0. Then (𝑊 −1𝐴𝑇𝜓 (𝜋))max =

(𝑊 −1𝐴𝑇 𝜋

(𝑊 −1𝐴𝑇 𝜋 )max

)max = 1 and thus𝜓 (𝜋) is feasible for the dual in (1). Now, let𝑦 be feasiblefor the dual in (1), i.e., (𝑊 −1𝐴𝑇𝑦)max ≤ 1, and let 𝑦 satisfy 𝑏𝑇𝑦 > 0. Then 𝑏𝑇 𝜒 (𝑦) = 𝑏𝑇 𝑦

𝑏𝑇 𝑦= 1

and thus 𝜒 (𝑦) is feasible for (4).

2. Recall that 𝑏 ≠ 0 and 𝑏𝑇1 = 0. As𝑏

(𝑊 −1𝐴𝑇𝑏)max

is feasible for (1) and 𝑏𝑇𝑏 > 0, we have that

𝑏𝑇𝑦∗ > 0. Thus, 𝑦∗ ≠ 𝑟 · 1 for all 𝑟 ∈ R, i.e., there are 𝑢, 𝑣 ∈ 𝑉 so that 𝑦∗𝑢 ≠ 𝑦∗𝑣 . As 𝐺is connected, this entails that (𝑊 −1𝐴𝑇𝑦∗)max > 0. Hence, 𝜒 (𝑦∗) is feasible for (4) and has

positive objective, i.e., (𝑊 −1𝐴𝑇 𝜒 (𝑦∗))max =(𝑊 −1𝐴𝑇 𝑦∗)max

𝑏𝑇 𝑦∗> 0; in particular, we have that

(𝑊 −1𝐴𝑇𝜋∗)max > 0. Accordingly, if 𝜋 is a 𝛾-approximation of (4), (𝑊 −1𝐴𝑇𝜋)max > 0 and

𝑏𝑇𝜓 (𝜋) = 𝑏𝑇𝜋

(𝑊 −1𝐴𝑇𝜋)max

≥ 1

(𝑊 −1𝐴𝑇𝜋)max

≥ 1

𝛾 (𝑊 −1𝐴𝑇𝜋∗)max

≥ 1

𝛾 (𝑊 −1𝐴𝑇 𝜒 (𝑦∗))max

=𝑏𝑇𝑦∗

𝛾 (𝑊 −1𝐴𝑇𝑦∗)max

≥ 𝑏𝑇𝑦∗

𝛾. �

In other words, it is sucient to determine a (1 + Y)-approximation to (4) in order to obtain a

(1 + Y)-approximation to (1).

We have now translated the dual maximization problem of (1), which has inequality constraints,

into a minimization problem with a single equality constraint 𝑏𝑇𝜋 = 1. This would enable us

to employ projected gradient descent in the 𝑏𝑇𝜋 = 1 plane, if it were not for the fact that the

objective (𝑊 −1𝐴𝑇𝜋)max is not dierentiable. To overcome this issue, we use the standard approach

of “smoothing” the gradient by approximating the objective by a dierentiable function. For the

maximum value of a vector, a suitable candidate is given by the log-sum-exponent (or softmax)

function. For vectors 𝑧 ∈ R𝑑 , it is dened as

lse𝛽 (𝑧) :=1

𝛽ln

©«∑︁𝑖∈[𝑑 ]

𝑒𝛽𝑧𝑖ª®¬ ,

12

where the parameter 𝛽 > 0 determines the trade-o between (a) accuracy of approximation and (b)

“smoothness.” To clarify what we mean by (a), observe that

(𝑧)max ≤ lse𝛽 (𝑧) ≤1

𝛽ln

©«∑︁𝑖∈[𝑑 ]

𝑒𝛽 (𝑧)maxª®¬ =

1

𝛽ln

(𝑑𝑒𝛽 (𝑧)max

)= (𝑧)max +

ln𝑑

𝛽, (5)

because both ln(·) and 𝑒 ( ·) are increasing functions. To precise what we mean by (b), denote for

𝑧 ∈ R𝑑 by ‖𝑧‖1 =∑𝑑

𝑖=1 |𝑧𝑖 | its 1-norm and by ‖𝑧‖∞ = max{|𝑧𝑖 | : 𝑖 ∈ [𝑑]} its ∞-norm. Then the

1-norm of the gradient ∇ lse𝛽 (·) is 𝛽-Lipschitz continuous w.r.t. the∞-norm (see, e.g., [She13]), i.e.,

∀𝑧, 𝑧 ′ ∈ R𝑑 : ‖∇ lse𝛽 (𝑧) − ∇ lse𝛽 (𝑧 ′) ‖1 ≤ 𝛽 ‖𝑧 − 𝑧 ′‖∞. (6)

Recall that our goal is to nd 𝜋 ∈ R𝑛 that is a (1 + Y)-approximate feasible solution to (4).

Accordingly, we dene the potential function

Φ𝛽 (𝜋) := lse𝛽

(𝑊 −1𝐴𝑇𝜋

).

Note that Φ𝛽 (·) is convex for any 𝛽 , as it is constructed by composing lse𝛽 (·), which is convex

for any 𝛽 , with linear functions. We will vary 𝛽 over the course of the algorithm to balance the

requirement of a suciently accurate approximation of (𝑊 −1𝐴𝑇𝜋)max with the need for small 𝛽 ,

i.e., a smooth gradient and thus fast progress.

Concerning the approximation guarantee, our target of a (1 + Y)-approximation entails that the

potential must be a more accurate approximation to the objective. Accordingly, we will ensure that

Φ𝛽 (𝜋) ≤(1 + Y

4

)(𝑊 −1𝐴𝑇𝜋)max. (7)

By (5), 4 ln(2𝑚) ≤ Y𝛽 (𝑊 −1𝐴𝑇𝜋)max implies (7), which is an invariant the algorithm will maintain.

To understand the eect of 𝛽 on the progress of the gradient descent, we examine the gradient

of the potential function. As multiplications with𝑊 −1 and 𝐴𝑇are linear functions,

∇Φ𝛽 (𝜋) = 𝐴𝑊 −1∇ lse𝛽(𝑊 −1𝐴𝑇𝜋

). (8)

Recall that𝑊− = diag(𝑤−,𝑤−). From (6) and Hölder’s inequality,10we can derive a key property of

∇Φ𝛽 (·) that we will use for bounding the change of the potential for a given update step ℎ ∈ R𝑛that is due to the change of the gradient, namely

∀𝜋,ℎ ∈ R𝑛 :��∇Φ𝛽 (𝜋)𝑇ℎ − ∇Φ𝛽 (𝜋 − ℎ)𝑇ℎ

��(8)

=

��𝐴𝑊 −1∇ lse𝛽 (𝑊 −1𝐴𝑇𝜋

)𝑇ℎ −𝐴𝑊 −1∇ lse𝛽

(𝑊 −1𝐴𝑇 (𝜋 − ℎ)

)𝑇ℎ

��=

��(∇ lse𝛽 (𝑊 −1𝐴𝑇𝜋

)− ∇ lse𝛽

(𝑊 −1𝐴𝑇 (𝜋 − ℎ)

))𝑇𝑊 −1𝐴𝑇ℎ

��Hölder

≤ ∇ lse𝛽 (

𝑊 −1𝐴𝑇𝜋

)− ∇ lse𝛽

(𝑊 −1𝐴𝑇 (𝜋 − ℎ)

) 1

𝑊 −1𝐴𝑇ℎ ∞

(6)

≤ 𝛽 ‖𝑊 −1𝐴𝑇ℎ‖2∞= 𝛽 ‖𝑊 −1− 𝐴𝑇ℎ‖2∞= 𝛽 (𝑊 −1− 𝐴𝑇ℎ)2

max, (9)

10It holds that |𝑧𝑇 𝑧′ | ≤ ‖𝑧‖𝑝 · ‖𝑧′‖𝑞 for any 𝑝, 𝑞 ≥ 1 satisfying 𝑝−1 + 𝑞−1 = 1 (where∞−1 = 0).

13

where the last two steps use that (𝐴𝑇ℎ)𝑢𝑣 = −(𝐴𝑇ℎ)𝑣𝑢 for all 𝑢, 𝑣 ∈ 𝑉 and 𝑤+𝑒 ≥ 𝑤−𝑒 for all 𝑒 ∈ 𝐸.Intuitively, we decomposed the gradient into the 𝛽-Lipschitz continuous part given by the lse𝛽 (·)function and the derivative𝐴𝑊 −1 of the “inner” function. This means that the “length” of an update

step ℎ will be measured in terms of (𝑊 −1− 𝐴𝑇ℎ)max. Using this bound, for a given step direction ℎ

we can now determine the step size that maximizes the progress in direction of ℎ as a function of

∇Φ𝛽 (𝜋) (under the worst-case assumption that (9) is tight).

Lemma2.6 (Additive Decrement of𝚽𝜷 ). Suppose𝜋,ℎ ∈ R𝑛 satisfy∇Φ𝛽 (𝜋)𝑇ℎ > 0 and (𝑊 −1− 𝐴𝑇ℎ)max ≤1. Then, for 𝛿 := ∇Φ𝛽 (𝜋)𝑇ℎ it holds that

Φ𝛽

(𝜋 − 𝛿ℎ

2𝛽

)≤ Φ𝛽 (𝜋) −

𝛿2

4𝛽.

Proof. Let us denote ˜ℎ := 𝛿ℎ2𝛽. Recall thatΦ𝛽 (·) is convex and thusΦ𝛽 (𝜋) ≥ Φ𝛽 (𝜋− ˜ℎ)+∇Φ𝛽

(𝜋− ˜ℎ

)𝑇˜ℎ.

This gives

Φ𝛽 (𝜋 − ˜ℎ) − Φ𝛽 (𝜋) ≤ −∇Φ𝛽

(𝜋 − ˜ℎ

)𝑇˜ℎ + ∇Φ𝛽 (𝜋)𝑇 ˜ℎ − ∇Φ𝛽 (𝜋)𝑇 ˜ℎ

(9)

≤ 𝛽 (𝑊 −1− 𝐴𝑇 ˜ℎ)2max− ∇Φ𝛽 (𝜋)𝑇 ˜ℎ

=𝛿2(𝑊 −1− 𝐴𝑇ℎ)2

max

4𝛽− 𝛿

2

2𝛽

≤ 𝛿2

4𝛽− 𝛿

2

2𝛽= − 𝛿

2

4𝛽. �

As this lemma suggests, we will try to ensure large progress by making 𝛿 = ∇Φ𝛽 (𝜋)𝑇ℎ as large

as possible and 𝛽 as small as possible. Now observe that, for a xed 𝛽 , maximizing ∇Φ𝛽 (𝜋)𝑇ℎ under

the constraint (𝑊 −1− 𝐴𝑇ℎ)max ≤ 1 is another instance of the transshipment problem with demand

vector ∇Φ𝛽 (𝜋). Our algorithm will determine its step direction ℎ by computing an 𝛼-approximation

to this transshipment instance. Regarding the incentive of minimizing 𝛽 , note that progress with

respect to Φ𝛽 (·) is meaningless if it does not provide a suciently accurate approximation of the

true objective (𝑊 −1− 𝐴𝑇 (·))max, so we will increase 𝛽 only when it becomes necessary to ensure (7).

Bounding 𝛽 from above turns the additive progress guarantee into a relative one.

Corollary 2.7 (Multiplicative Decrement of 𝚽𝜷 ). Suppose 𝜋,ℎ ∈ R𝑛 satisfy ∇Φ𝛽 (𝜋)𝑇ℎ > 0,(𝑊 −1− 𝐴𝑇ℎ)max ≤ 1, and Y𝛽Φ𝛽 (𝜋) ≤ 10 ln(2𝑚). Then, for 𝛿 := ∇Φ𝛽 (𝜋)𝑇ℎ it holds that

Φ𝛽

(𝜋 − 𝛿ℎ

2𝛽

)≤

(1 − Y𝛿2

40 ln(2𝑚)

)Φ𝛽 (𝜋).

2.4 Generic Algorithm

We now present the pseudocode of the generic gradient descent algorithm for nding a (1 + Y)-approximate solution to (1). As mentioned before, the algorithm actually computes a (1 + Y)-approximate solution 𝜋 to (4) and then returns𝜓 (𝜋), cf. Lemma 2.5. The algorithm is generic, as

the performed computations need to be implemented in model-specic ways. This is discussed in

Section 4.

The code is given in Algorithm 1. The algorithm rst computes a starting solution. This can

be done by calling the oracle on the symmetrized problem (2), which by Observation 2.2 yields

14

Algorithm 1: gradient_transshipment (𝐺,𝑏, Y)1 compute (𝛼_)-approximation 𝜋 to (4) // use oracle, Obs. 2.2, and Lemma 2.52 set Y ′ := 1

3 repeat4 set Y ′← Y′

2

5 set 𝛽 :=8 ln(2𝑚)

Y′ (𝑊 −1𝐴𝑇 𝜋 )max

6 repeat7 compute gradient ∇Φ𝛽 (𝜋)8 compute 𝛼-approximate primal-dual pair ((𝑓 , 𝑧), ℎ) for min{1𝑇𝑊− ˜𝑓 : 𝐴 ˜𝑓 + 𝑧𝑏 =

∇Φ𝛽 (𝜋), ˜𝑓 ≥ 0} = max

{∇Φ𝛽 (𝜋)𝑇 ˜ℎ : (𝑊 −1− 𝐴𝑇 ˜ℎ)max ≤ 1 ∧ 𝑏𝑇 ˜ℎ = 0

}// use oracle with weights𝑤− and demands ˜𝑏 = ∇Φ𝛽 (𝜋) (Cor. 2.4)

9 set 𝛿 := ∇Φ𝛽 (𝜋)𝑇ℎ10 if 𝛿 > Y′

6𝛼_then 𝜋 ← 𝜋 − 𝛿ℎ

2𝛽

11 if 𝛽 <4 ln(2𝑚)


then 𝛽 :=8 ln(2𝑚)


// happens only if Y ′ = 1

2

12 until 𝛿 ≤ Y′

6𝛼_// current 𝜋 is a (1 + Y ′)-approximate solution

13 until Y ′ ≤ Y14 𝑥 :=𝑊 −1∇ lse𝛽


)15

( 𝑓 +𝑓 −

):= 𝑓 with 𝑓 +, 𝑓 − ∈ R𝑚

16 𝑓 ′ :=(𝑓 −𝑓 +

)17 𝑥 :=

�̃�+𝑓 ′𝑧

// Observation 2.918 𝑦 := 𝜓 (𝜋) // Lemma 2.519 return (𝑥,𝑦)

an (𝛼_)-approximation to (1), and then rescaling according to Lemma 2.5. As trying to enforce a

too precise approximation initially slows down progress, the algorithm initializes Y ′ as a constant(the choice of 1 is, neglecting technicalities, arbitrary). Then it starts an outer loop decreasing Y ′

exponentially. It sets 𝛽 to a suitable value and then starts the inner loop, which performs gradient

descent steps until the step size becomes too small to ensure fast progress, i.e., until 𝛿 ≤ Y′

6𝛼_2. As

in the rst iteration of the outer loop, the potential may decrease rapidly (we go from an (𝛼_)-approximation to a

1

2-approximation), Line 11 adjusts 𝛽 if necessary. As we will show, this implies

that at termination of the inner loop, the current solution is a (1 + Y ′)-approximation.

Each gradient descent step consists of the following steps:

1. Compute ∇Φ𝛽 (𝜋).

2. Compute an 𝛼-approximate solution ℎ to

max

{∇Φ𝛽 (𝜋)𝑇 ˜ℎ : (𝑊 −1− 𝐴𝑇 ˜ℎ)max ≤ 1 ∧ 𝑏𝑇 ˜ℎ = 0

},

i.e., of (3) with demands ∇Φ𝛽 (𝜋).11 This is done by calling the oracle (for the constrained

11As ∇Φ𝛽 (𝜋)𝑇1 = 0, this instance is feasible.

15

problem). This optimization problemmaximizes the rate of progress “in units of (𝑊 −1− 𝐴𝑇 ˜ℎ)max,”

which in turn determines the step size, see Corollary 2.7.

3. Determine the step size in accordance with Corollary 2.7 and check whether 𝛿 is small enough

to prove that the current 𝜋 is a (1 + Y ′)-approximation. If yes, the inner loop terminates; if

not, the step is executed.

4. Increase 𝛽 if it becomes too small. (This will only happen when Y ′ = 1

2).

We note that the dual oracle problems occurring for dierent gradients of the potential function in

the second step dier only in terms of the objective function. The constraints however are the same

in all iterations. This allows us to use the same spanner in all of these iterations as, in particular,

the edge weights involved are always the backward weights𝑤− of the input graph.

Returning a primal solution As specied so far, the algorithm relies on a dual oracle only

and also returns only a dual solution. For a primal solution, we require the oracle to provide

an 𝛼-approximate primal-dual pair to the constrained problem (3) (with demands ∇Φ𝛽 (𝜋)).12Consider the nal iteration of Algorithm 1. Denote by 𝜋 the nal normalized node potentials, set

𝑥 =𝑊 −1∇ lse𝛽(𝑊 −1𝐴𝑇𝜋

)(for the nal value of 𝛽) and 𝑦 = 𝜓 (𝜋). We will show how to compute 𝑥

such that (𝑥,𝑦) is a (1 + Y)-approximate primal-dual pair.

Observation 2.8. The vector 𝑥 =𝑊 −1∇ lse𝛽(𝑊 −1𝐴𝑇𝜋

)satises (a) 𝑥 ≥ 0, (b) 𝐴𝑥 = ∇Φ𝛽 (𝜋), and

(c) 1𝑇𝑊𝑥 = 1.

Proof. By (8), 𝐴𝑥 = ∇Φ𝛽 (𝜋) and thus (b) holds. For the other two properties, we set 𝑧 =𝑊 −1𝐴𝑇𝜋

and compute

∀𝑖 ∈ {1, . . . , 2𝑚} : 𝑥𝑖 =1

𝑤𝑖

∇ lse𝛽 (𝑧)𝑖 =1

𝛽𝑤𝑖

𝛽𝑒𝛽𝑧𝑖∑2𝑚𝑗=1 𝑒

𝛽𝑧 𝑗> 0,

hence (a) holds, and

1𝑇𝑊𝑥 = 1𝑇∇ lse𝛽 (𝑧) =1

𝛽

2𝑚∑︁𝑖=1

𝛽𝑒𝛽𝑧𝑖∑2𝑚𝑗=1 𝑒

𝛽𝑧 𝑗= 1,

showing (c). �

Conveniently, this 𝑥 is a byproduct of evaluating ∇Φ𝛽 (𝜋) in the nal iteration of the algorithm,

by storing the intermediate result before the nal multiplication with 𝐴, cf. (8).

Recall that in its nal iteration, the algorithm computes an 𝛼-approximate primal-dual pair

((𝑓 , 𝑧), ℎ) for (3) with demands˜𝑏 = ∇Φ𝛽 (𝜋), i.e., the program

min{1𝑇𝑊− ˜𝑓 : 𝐴 ˜𝑓 + 𝑧𝑏 = ∇Φ𝛽 (𝜋), ˜𝑓 ≥ 0}= max{∇Φ𝛽 (𝜋)𝑇 ˜ℎ : (𝑊 −1− 𝐴𝑇 ˜ℎ)max ≤ 1 ∧ 𝑏𝑇𝑦 = 0}

(where we use˜𝑓 and

˜ℎ to denote the variables for the sake of distinction from (1)). We can use 𝑥 as

dened above to cancel out the demands in the primal program and then rescale according to 𝑧.

12Actually, the primal solution is exclusively required on termination to generate a primal solution to (1).

16

Observation 2.9. For any given 𝛽 ∈ R>0, 𝜋 ∈ R𝑛 with 𝑏𝑇𝜋 = 1, primal solution (𝑓 , 𝑧) ∈ R2𝑚 × Rto (3) with ˜𝑏 = ∇Φ𝛽 (𝜋) satisfying 𝑧 > 0, and 𝑥 ∈ R2𝑚 satisfying 𝐴𝑥 = ∇Φ𝛽 (𝜋), write 𝑓 =

( 𝑓 +𝑓 −

)(with

𝑓 +, 𝑓 − ∈ R𝑚). Then a primal solution to (1) is given by

𝑥 :=𝑥 +

(𝑓 −𝑓 +

)𝑧

∈ R2𝑚 . (10)

Proof. As 𝑥 , 𝑓 , and 𝑧 are all positive, so is 𝑥 . 𝐴𝑥 = 𝑏 follows from observing that 𝐴(𝑓 −𝑓 +

)= −𝐴𝑓 =

−∇Φ𝛽 (𝜋) + 𝑧𝑏, as for each (undirected) edge we have a forward and a backward arc. �

The intuition regarding why this should be a “good” primal solution to (1) is as follows. Any

𝜋 ′ minimizing Φ𝛽 (·) must satisfy that ∇Φ𝛽 (𝜋 ′)𝑇ℎ = 0 for any ℎ with 𝑏𝑇ℎ = 0. Thus, ∇Φ𝛽 (𝜋 ′) isproportional to 𝑏. For a near-optimal solution 𝜋 we have that ∇Φ𝛽 (𝜋) is almost parallel to 𝑏, asthe gradient restricted to the 𝑏𝑇ℎ = 0 plane is small. Accordingly, the “cost” of the correction,

given by 1𝑇𝑊 𝑓 (which relates to 1𝑇𝑊− 𝑓 ), is small. The role of the scaling factor 𝑧 is to undo the

normalization of (1) we performed by transitioning to problem (4).

We remark that it also possible to drop the constraint 𝑏𝑇ℎ = 0 in the update problem at the

expense of an increase in the approximation ratio. To this end, we consider the projector 𝑃 := 𝐼−𝜋𝑏𝑇 ,which yields 𝑏𝑇𝑃ℎ = 0 for any ℎ ∈ R𝑛 . A primal-dual pair 𝑓 ′, ℎ′ of

max{ ˜𝑏𝑇𝑃ℎ : (𝑊 −1− 𝐴𝑇ℎ)max ≤ 1} = min{1𝑇𝑊− 𝑓 : 𝐴𝑓 = 𝑃𝑇 ˜𝑏, 𝑓 ≥ 0}

with 1𝑇𝑊− 𝑓 ′ ≤ 𝛼 ′ · ˜𝑏𝑇𝑃ℎ′ yields a primal-dual pair (𝑓 , 𝑧), ℎ for the more restricted update problem

with 𝑓 = 𝑓 ′, 𝑧 = ˜𝑏𝑇𝜋 , and ℎ = 𝑃ℎ′/(𝑊 −1− 𝐴𝑇𝑃ℎ′)max. Moreover,

1𝑇𝑊− 𝑓 = 1𝑇𝑊− 𝑓′ ≤ 𝛼 ′ · ˜𝑏𝑇𝑃ℎ′ = 𝛼 ′ · (𝑊 −1− 𝐴𝑇𝑃ℎ′)max · ˜𝑏𝑇ℎ.

Furthermore,

(𝑊 −1− 𝐴𝑇𝑃ℎ)max ≤ 1 + |𝑏𝑇ℎ | · (𝑊 −1− 𝐴𝑇𝜋)max ≤ 1 + 𝑏𝑇𝑦∗ · _ · (𝑊 −1𝐴𝑇𝜋)max

= 1 + _ (𝑊−1𝐴𝑇𝜋)max

(𝑊 −1𝐴𝑇𝜋∗)max

≤ 1 + _2𝛼 ′

since we start with an 𝛼 ′_-approximation.13

Thus, we obtain an 𝛼-approximation of the restricted

problem with 𝛼 ∈ 𝑂 (𝛼 ′2_2). However, in the models of computation considered in this paper,

solving the restricted problem is not harder than the unrestricted one, so we do not need to pay the

additional computational cost resulting from the weaker approximationa guarantee.

Proof of correctness We show the approximation guarantee of the algorithm by arguing that

its return value (𝑥,𝑦) is a (1 + Y)-approximate pair, i.e., 1𝑇𝑊𝑥 ≤ (1 + Y)𝑏𝑇𝑦. By weak duality this

in particular implies that both 𝑥 and 𝑦 are (1 + Y)-approximate solutions. We stress, however, that

Algorithm 1 would not have to compute 𝑥 to return the (1 + Y)-approximate dual solution 𝑦; we

could exploit the mere existence of 𝑥 for proving the approximation guarantee of 𝑦. To obtain 𝑦

it would thus be sucient for the oracle to return only an 𝛼-approximate dual solution ℎ for (3).

If the oracle additonally returns an 𝛼-approximate dual solution (𝑓 , 𝑧), then 𝑥 can be determined

explicitly. Note that, by Corollary 2.4, the oracle we use throughout this work has this capability.

We rst prove a helper statement showing that the algorithm maintains suitable values of 𝛽 ,

and how adjusting 𝛽 aects Φ𝛽 (𝜋).13Note that (𝑊 −1𝐴𝑇 𝜋)max is always within a constant factor of Φ𝛽 (𝜋), which decreases in every iteration.

17

Lemma 2.10. Except immediately after updates of Y ′ or 𝜋 (i.e., before adjusting 𝛽 in the subsequentlines), Algorithm gradient_transshipment maintains the invariant that

4 ln(2𝑚) ≤ Y ′𝛽 (𝑊 −1𝐴𝑇𝜋)max ≤ Y ′𝛽Φ𝛽 (𝜋) ≤ 10 ln(2𝑚) .

In particular, Φ𝛽 (𝜋) ≤ (1 + Y′

4) (𝑊 −1𝐴𝑇𝜋)max.

Proof. First, observe that 4 ln(2𝑚) ≤ Y ′𝛽 (𝑊 −1𝐴𝑇𝜋)max follows immediately from Lines 5 and 11.

By (5), this implies (7), i.e., the last statement of the lemma. As by (5) (𝑊 −1𝐴𝑇𝜋)max ≤ lse𝛽


)=

Φ𝛽 (𝜋) for any 𝛽 , it remains to show that Y ′𝛽Φ𝛽 (𝜋) ≤ 10 ln(2𝑚). Note that Y ′𝛽 (𝑊 −1𝐴𝑇𝜋)max ≤8 ln(2𝑚) implies that

Y ′𝛽Φ𝛽 (𝜋) ≤ Y ′𝛽(1 + Y

′

4

)(𝑊 −1𝐴𝑇𝜋)max

Y′≤1≤ 10 ln(2𝑚) .

Thus, the statement always holds after executing Line 5 or adjusting 𝛽 according to Line 11. This

leaves only the possibility that updates of 𝜋 cause a violation of the inequality Y ′𝛽Φ𝛽 (𝜋) ≤ 10 ln(2𝑚)while 𝛽 is not adjusted. However, Lemma 2.6 shows that updates of 𝜋 may only decrease the potential,

completing the proof. �

Lemma 2.11 (Correctness of Algorithm 1). Let 0 < Y ≤ 1, and let (𝑥,𝑦) be the return value ofgradient_transshipment(𝐺,𝑏, Y). Then (𝑥,𝑦) is a (1 + Y)-approximate pair for (1), i.e., it holds that𝐴𝑥 = 𝑏, 𝑥 ≥ 0, (𝑊 −1𝐴𝑇𝑦)max ≤ 1, and 1𝑇𝑊𝑥 ≤ (1 + Y)𝑏𝑇𝑦.

Proof. Let 𝛽 , 𝜋 , 𝑓 , 𝑧, and ℎ denote the values of the respective variables upon termination of the

algorithm. Remember that ((𝑓 , 𝑧), ℎ) is an 𝛼-approximate pair for (3) with˜𝑏 = ∇Φ𝛽 (𝜋). Due to

termination of the algorithm, we have that 𝛿 ≤ Y′

6𝛼_and Y ′ ≤ Y, yielding that

∇Φ𝛽 (𝜋)𝑇ℎ = 𝛿 ≤ Y ′

6𝛼_≤ Y

6𝛼. (11)

Recall that ‖𝑊 −1− 𝐴𝑇𝜋 ‖∞ = (𝑊 −1− 𝐴𝑇𝜋)max and note that, as 𝑓 ≥ 0, ‖𝑊− 𝑓 ‖1 = 1𝑇𝑊− 𝑓 . By Lemma 2.10

we have 4 ln(2𝑚) ≤ Y ′𝛽 (𝑊 −1𝐴𝑇𝜋)max ≤ Y ′𝛽 ≤ Y𝛽 . As Φ𝛽 (·) is convex, its gradient restricted to a

line is non-decreasing. Thus, the rst-order Maclaurin approximation gives

𝜋𝑇∇Φ𝛽 (𝜋) ≥ Φ𝛽 (𝜋) − Φ𝛽 (0) = Φ𝛽 (𝜋) −ln(2𝑚)𝛽

≥(1 − Y

4

)Φ𝛽 (𝜋) . (12)

By multiplying the set of equalities 𝐴𝑓 + 𝑧𝑏 = ∇Φ𝛽 (𝜋) with 𝜋𝑇 from the left, we see that

𝑧 = 𝜋𝑇𝑏𝑧

= 𝜋𝑇∇Φ𝛽 (𝜋) − 𝜋𝑇𝐴𝑓(12)

≥(1 − Y

4

)Φ𝛽 (𝜋) − 𝜋𝑇𝐴𝑊 −1− 𝑊− 𝑓

(5)

≥(1 − Y

4

)(𝑊 −1− 𝐴𝑇𝜋)max − (𝑊 −1− 𝐴𝑇𝜋)𝑇𝑊− 𝑓

Hölder

≥(1 − Y

4

)(𝑊 −1− 𝐴𝑇𝜋)max − ‖𝑊 −1− 𝐴𝑇𝜋 ‖∞‖𝑊− 𝑓 ‖1

=

(1 − Y

4

− 1𝑇𝑊− 𝑓)(𝑊 −1− 𝐴𝑇𝜋)max

18

𝛼-approx. pair

≥(1 − Y

4

− 𝛼∇Φ𝛽 (𝜋)𝑇ℎ)(𝑊 −1− 𝐴𝑇𝜋)max

(11)

≥(1 − 5Y

12

)(𝑊 −1− 𝐴𝑇𝜋)max. (13)

In particular, 𝑧 > 0 and, by Observation 2.9 and Lemma 2.5, 𝑥 and 𝑦 = 𝜓 (𝜋) are indeed feasible

primal and dual solutions of (1), respectively.

It remains to show that 1𝑇𝑊𝑥 ≤ (1 + Y)𝑏𝑇𝑦. As 1𝑇𝑊𝑥 = 1 by Observation 2.8,

1𝑇𝑊𝑥 =1𝑇𝑊𝑥 + 1𝑇𝑊

(𝑓 −𝑓 +

)𝑧

≤1 + _1𝑇𝑊−

(𝑓 −𝑓 +

)𝑧

=1 + _1𝑇𝑊− 𝑓

𝑧

𝛼-approx. pair

≤1 + _𝛼∇Φ𝛽 (𝜋)𝑇ℎ

𝑧(11)

≤1 + Y

6

𝑧(13)

≤1 + Y

6(1 − 5Y

12

)(𝑊 −1− 𝐴𝑇𝜋)max

Y≤1≤ 1 + Y(𝑊 −1− 𝐴𝑇𝜋)max

Obs. 2.2

≤ 1 + Y(𝑊 −1𝐴𝑇𝜋)max

= (1 + Y)𝑏𝑇𝜓 (𝜋)= (1 + Y)𝑏𝑇𝑦. �

Corollary 2.12. Suppose 0 < Y ≤ 1. Whenever the inner loop of Algorithm gradient_transshipment

terminates, the current 𝜋 is a (1 + Y ′)-approximate dual solution to (1).

Proof. If we ran the algorithm for Y = Y ′, it would perform exactly the same computations up to

that point and then terminate. Noting that an optimal primal solution (𝑓 , 𝑧) for the problem posed

to the oracle in the last iteration and, by Observation 2.9, 𝑥 =𝑊 −1∇ lse𝛽(𝑊 −1𝐴𝑇𝜋

)satises the

preconditions of Lemma 2.11, the claim is shown by the lemma. �

Bounding the number of iterations We now examine how many iterations Algorithm 1 re-

quires to terminate. This reduces the task of bounding the overall running time to determining the

cost of implementing a single iteration, which depends on the specic model of computation.

Lemma 2.13 (Number of iterations of Algorithm 1). Suppose that 0 < Y ≤ 1. Then Algorithmgradient_transshipment performs 𝑂 ((Y−2 + log𝛼 + log _)𝛼2_2 log𝑛) iterations of its inner loop.

Proof. Denote by 𝜋 (𝑖) and 𝛽𝑖 , 0 ≤ 𝑖 ≤ 𝑖max := dlog Y−1e, the values of 𝜋 and 𝛽 at the beginning of

the (𝑖 + 1)-th iteration of the outer loop, where 𝜋 (𝑖max)and 𝛽𝑖max

denote the values at termination.

Refer to the 𝑖-th iteration as phase 𝑖 and denote by Y𝑖 = 2−𝑖

the value of Y ′ during this loop iteration.

19

Lemma 2.10 shows that the preconditions of Corollary 2.7 are satised by each update of 𝜋 ,

as 𝛿 ≤ Y𝑖6_𝛼

implies termination of the inner loop. Thus, if the inner loop does not terminate, the

corollary shows that the potential decreases by a factor of at least(1 − Y𝑖𝛿

2

40 ln(2𝑚)

)≤

(1 −

Y3𝑖

1440𝛼2_2 ln(2𝑚)

)=: 𝑞𝑖 .

However, when setting 𝛽 in Line 11, the potential might increase. Recall that, for any 𝛽 , it holds that

Φ𝛽 (𝜋) ≥ (𝑊 −1𝐴𝑇𝜋)max by (5). As after Line 11 we have thatΦ𝛽 (𝜋) ≤ (1+ Y𝑖4) (𝑊 −1𝐴𝑇𝜋)max again by

Lemma 2.10, the increase in potential due to an update of 𝛽 according to Line 11 is multiplicatively

bounded by 1 + Y𝑖4. Observe that 𝛽 is at least doubled by the statement in Line 11 and that this can

happen no more than

ℓ𝑖 := log

((𝑊 −1𝐴𝑇𝜋 (𝑖−1) )max

(𝑊 −1𝐴𝑇𝜋 (𝑖) )max

)≤ log

((𝑊 −1𝐴𝑇𝜋 (𝑖−1) )max

(𝑊 −1𝐴𝑇𝜋∗)max

)times in phase 𝑖 . By Corollary 2.12, we have for all 𝑖 > 1 that

(𝑊 −1𝐴𝑇𝜋 (𝑖−1) )max ≤ (1 + Y𝑖−1) (𝑊 −1𝐴𝑇𝜋∗)max < 2(𝑊 −1𝐴𝑇𝜋∗)max

and thus ℓ𝑖 = 0, and as 𝜋 (0) is an (𝛼_)-approximation, we have that ℓ1 ≤ log(𝛼_).Denote by 𝛽 ′𝑖 the value of 𝛽 when the inner loop terminates at the end of phase 𝑖 and by

𝑘𝑖 the number of iterations in this phase. Using the above bounds on ℓ𝑖 , that Φ𝛽𝑖−1 (𝜋 (𝑖−1) ) ≤(1 + Y𝑖

4) (𝑊 −1𝐴𝑇𝜋 (𝑖−1) )max by Lemma 2.10, and once more the approximation guarantee of the 𝜋 (𝑖) ,

we can bound

(𝑊 −1𝐴𝑇𝜋∗)max ≤ (𝑊 −1𝐴𝑇𝜋 (𝑖) )max

≤ Φ𝛽′𝑖(𝜋 (𝑖) )

≤ 𝑞𝑘𝑖𝑖

(1 + Y𝑖

4

) ℓ𝑖Φ𝛽𝑖−1 (𝜋 (𝑖−1) )


(1 + Y𝑖

4

) ℓ𝑖+1(𝑊 −1𝐴𝑇𝜋 (𝑖−1) )max


(1 + Y𝑖

4

) ℓ𝑖+1(𝑊 −1𝐴𝑇𝜋∗)max ·

{𝛼_ if 𝑖 = 1

(1 + Y𝑖−1) if 𝑖 > 1

≤ 𝑞𝑘𝑖𝑖(𝑊 −1𝐴𝑇𝜋∗)max ·

{(1 + Y1

4

) ℓ1+1 𝛼_ if 𝑖 = 1(1 + Y𝑖

4

)(1 + Y𝑖−1) if 𝑖 > 1

Y𝑖 ≤Y𝑖−1≤1< 𝑞

𝑘𝑖𝑖(𝑊 −1𝐴𝑇𝜋∗)max ·

{𝛼2_2 if 𝑖 = 1

1 + 3Y𝑖−12

if 𝑖 > 1.

Dividing by (𝑊 −1𝐴𝑇𝜋∗)max, taking the logarithm, and rearranging, we infer that

𝑘𝑖 <

log(𝛼2_2)− log𝑞1 if 𝑖 = 1

log

(1+ 3Y𝑖−1

2

)− log𝑞𝑖 if 𝑖 > 1

∈{𝑂

((log𝛼 + log _)𝛼2_2 ln(2𝑚)

)if 𝑖 = 1

𝑂(Y−2𝑖 𝛼

2_2 ln(2𝑚))

if 𝑖 > 1.

20

Summing over all phases 𝑖 , the total number of iterations is bounded by

𝑖max∑︁𝑖=1

𝑘𝑖 ∈ 𝑂((log𝛼 + log _ +

𝑖max∑︁𝑖=2

Y−2𝑖

)𝛼2_2 ln(2𝑚)

)= 𝑂

( (Y−2 + log𝛼 + log _

)𝛼2_2 log(𝑛)

). �

The bound on the number of iterations of the algorithm readily translates to one on the specic

operations the algorithm performs.

Corollary 2.14. Algorithm gradient_transshipment can be executed using a total of 𝑂 ((Y−2 +log𝛼 + log _)𝛼2_2 log𝑛) operations of the following types:

• oracle call,

• computation of (𝑊 −1𝐴𝑇𝜋)max for a given 𝜋 ,

• computation of ∇Φ𝛽 (𝜋) for given 𝛽 and 𝜋 ,

• scalar product ˜𝑏𝑇ℎ for given ˜𝑏 and ℎ,

• comparisons of and multiplications with scalar values.

Proof. Follows by checking the lines of the code, the statements referred to in the comments, and

the bound on the number of iterations given in Lemma 2.13. �

We summarize the results of this section in the following theorem.

Theorem 2.15. Given an oracle that computes 𝛼-approximate dual solutions to (3), Algorithmgradient_transshipment computes, for any 0 < Y ≤ 1, a (1 + Y)-approximate dual solution tothe transshipment problem (1) on bidirected graphs with positive arc weights calling the oracle𝑂 ((Y−2 + log𝛼 + log _)𝛼2_2 log𝑛) times. If, on termination of the algorithm, the oracle providesan 𝛼-approximate primal-dual pair of solutions to (3), a (1 + Y)-approximate primal-dual pair ofsolutions to (1) can be readily determined (see Observation 2.9).

3 Single-Source Shortest Paths

For any instance of the transshipment problem, there is always an optimal primal solution whose

arcs with non-zero ow form a forest. This is a well-known result of the structure of linear

programs [AMO93, Theorem 11.1]; for completeness, we directly prove the statement here for the

problem at hand.

Lemma 3.1. (1) has an optimal primal solution that sends ow only along the arcs of a forest.

Proof. Suppose 𝑥 ∈ R2𝑚 is an optimal primal solution and let 𝐶 be any cycle so that for each edge

𝑒 = {𝑢, 𝑣} ∈ 𝐶 , 𝑥 (𝑢,𝑣) > 0 or 𝑥 (𝑣,𝑢) > 0. Consistently direct𝐶 . Let 𝑓 ∈ R2𝑚 (not necessarily satisfying

𝑓 ≥ 0) send min{𝑥 (𝑢,𝑣) | {𝑢, 𝑣} ∈ 𝐶 ∧ 𝑥 (𝑢,𝑣) ≠ 0} units of ow in positive direction “around” 𝐶 , so

that 𝑓(𝑢,𝑣) ≠ 0 implies 𝑥 (𝑢,𝑣) ≠ 0. By construction, both 𝑥 − 𝑓 ≥ 0 and 𝑥 + 𝑓 ≥ 0 and, as 𝐶 is a cycle,

𝐴𝑓 = 0, i.e., 𝐴(𝑥 − 𝑓 ) = 𝐴(𝑥 + 𝑓 ) = 𝐴𝑥 = 𝑏. Thus, both 𝑥 − 𝑓 and 𝑥 + 𝑓 are feasible.

21

Note that 𝑥 − 𝑓 or 𝑥 + 𝑓 satises that there is one arc less on 𝐶 carrying ow; assume w.l.o.g. it

is 𝑥 − 𝑓 . We claim that 1𝑇𝑊 (𝑥 − 𝑓 ) = 1𝑇𝑊𝑥 . Assuming for contradiction that this is false, either

1𝑇𝑊 (𝑥 − 𝑓 ) < 1𝑇𝑊𝑥 or 1𝑇𝑊 (𝑥 + 𝑓 ) < 1𝑇𝑊𝑥 , i.e., either 𝑥 − 𝑓 or 𝑥 + 𝑓 has smaller objective

than 𝑥 , contradicting its optimality. We conclude that 𝑥 − 𝑓 is also an optimal solution. The claim

now follows by inductively repeating the argument until no cycles remain. �

When considering the special case of the single-source shortest path (SSSP) problem, two

aspects of the solution given in Section 2 are unsatisfactory. First, if a primal solution is computed,

there is no guarantee that it induces a tree, and thus it is not clear which arc one should traverse

from 𝑣 when searching for a short path to 𝑠 . Second, approximating the optimal solution guarantees

only that the computed upper bounds on the distances from the source 𝑠 are a (1+Y)-approximation

on average, i.e., for any specic node 𝑣 it may still be the case that 𝑦𝑣 − 𝑦𝑠 � 𝑑 (𝑠, 𝑣), where 𝑑 (·, ·) isthe distance metric on 𝑉 induced by 𝐺 .

The goal of this section is to resolve these issues. Our solution is based on a simple sampling

procedure using Algorithm gradient_transshipment as a subroutine. It results in an approximate

shortest-path tree rooted at 𝑠 , which guarantees a (1 + Y)-approximation for each distance from 𝑠 to

a 𝑣 ∈ 𝑉 .

3.1 Sampling a Tree Solution

As the rst step towards the randomized solution, we construct a (primal) tree solution that is good

on average. We assume in the following that there is a single source node with negative demand

and the demand on every non-source node is either 0 or 1. The idea, which relies on our specic

choice of a spanner-based oracle, is as follows:

1. Run Algorithm gradient_transshipment for Y ′ := Y6until termination and denote by 𝑥 the

returned primal solution.

2. For each node 𝑣 ∈ 𝑉 , partition its incoming arcs (𝑢, 𝑣) with 𝑥 (𝑢,𝑣) > 0 into classes in which

arc weights dier by factor at most 2 (we will ensure that there are only 𝑂 (log𝑛) classes).Denoting by 𝑓(𝑢,𝑣) the sum of ows of arcs in the class of (𝑢, 𝑣), sample (𝑢, 𝑣) with (roughly)

probability min{𝑓 −1(𝑢,𝑣) (2_𝛼 + 1)𝑥 (𝑢,𝑣) , 1}.

3. We show that an optimal solution using only sampled arcs and spanner edges is a (1 + Y)-approximation with probability at least

2

3, and that we can bound the number of arcs sampled

by the procedure by 𝑂 (𝛼_𝑛 log𝑛) with probability at least2

3.

4. We abort the procedure if more arcs are selected. Otherwise, we compute and return an

optimum tree solution on the selected arc set. By the union bound, we thus obtain a (1 + Y)-approximation using only the spanner edges and 𝑂 (𝛼_𝑛 log𝑛) sampled arcs with probability

at least1

3.

5. To obtain a Las Vegas algorithm, we simply repeat the procedure until a good solution is

obtained; this can be checked by comparing the objective value of the obtained tree solution

to the one of the dual solution returned by Algorithm gradient_transshipment.

The second last step exploits that in the computational models considered in this work, operations

on sparse graphs are considerably cheaper; thus, this strategy is tied to using an oracle based on a

sparse graph.

22

In order to prove the correctness of the above procedure, conceptually decompose 𝑥 =: 𝑥M + 𝑥◦,where the arcs with non-zero ow in 𝑥M form a directed acyclic graph (DAG). Such a decomposition

is always feasible, which again is a standard property whose proof we give for completeness.

Lemma 3.2. Any primal solution to (1) can be decomposed into a feasible solution 𝑥M inducing a DAGsatisfying 𝐴𝑥M = 0 and 𝑥◦ ≥ 0 satisfying 𝐴𝑥◦ = 0.

Proof. The proof is very similar to the one of Lemma 3.1. Suppose 𝑥 ∈ R𝑛 is a primal solution and

let 𝐶 be any consistently oriented cycle so that for each arc (𝑢, 𝑣) ∈ 𝐶 , 𝑥 (𝑢,𝑣) > 0 (if no such 𝐶 exists,

𝑥M := 𝑥 and 𝑥◦ = 0meet the claim of the lemma). Let 𝑓 ∈ R2𝑚 sendmin{𝑥 (𝑢,𝑣) | (𝑢, 𝑣) ∈ 𝐶 ∧𝑥 (𝑢,𝑣) ≠0} units of ow in positive direction “around”𝐶 . By construction both 𝑓 and 𝑥 − 𝑓 are feasible, and𝑥 − 𝑓 has one arc less carrying non-zero ow. The claim now follows by inductively repeating the

argument until no directed cycles remain, where 𝑥◦ is the sum of the ows 𝑓 from the individual

steps. �

We would like to sample from 𝑥M, but have to face the “disturbance” by 𝑥◦. We will exploit

that the ow 𝑥◦ is “cheap” to bound its eect on the sampled solution. In order to formalize this,

we x some notation. In the following, let 𝑥 ∈ R2𝑚 be the primal solution for (1) returned by

Algorithm gradient_transshipment when called with accuracy parameter Y ′ := Y6. Decompose

𝑥 =: 𝑥M + 𝑥◦ according to Lemma 3.2. We dene 𝑓 M𝑣 :=∑{𝑣,𝑢 }∈𝐸 𝑥

M(𝑢,𝑣) .

Lemma 3.3. Suppose that for 𝑣 ∈ 𝑉 \ {𝑠}, 𝑏𝑣 ≥ 0. Let each 𝑣 ∈ 𝑉 with 𝑓 M𝑣 > 0 sample one incomingarc, where the probability to choose arc (𝑢, 𝑣) ∈ 𝐸 is (𝑓 M𝑣 )−1𝑥M(𝑢,𝑣) . Then the resulting arc set is a directedtree 𝑇 rooted at the source 𝑠 spanning all nodes with non-zero demand. Denoting by 𝑡 ∈ R2𝑚 the(unique) ow with 𝐴𝑡 = 𝑏 and 𝑡𝑒 = 0 for any arc 𝑒 ∉ 𝑇 , it holds that E[1𝑇𝑊𝑡] = 1𝑇𝑊𝑥M.

Proof. Recall that 𝐴𝑥M = 𝑏. As 𝑠 is the only node with supply, i.e., 𝑏𝑣 ≥ 0 for all 𝑣 ∈ 𝑉 \ {𝑠}, anysuch 𝑣 that has an outgoing arc (𝑣,𝑢) ∈ 𝐸 carrying non-zero ow or that satises 𝑏𝑣 > 0 must have

an incoming arc (𝑢, 𝑣) ∈ 𝐸 carrying non-zero ow. Thus, we can inductively construct a directed

path ending at any such node by following the incoming arc of the (current) rst node of the path,

until either this rst node is 𝑠 or we close a directed cycle. The latter is not possible, because 𝑥M

induces a DAG by construction (cf. Lemma 3.2). Hence, the sampled graph is a DAG in which each

node that sampled an arc is reachable from 𝑠 . Thus, as each node sampled at most one incoming

arc, the sampled graph must be a tree 𝑇 rooted at 𝑠 . As we observed that each 𝑣 with 𝑏𝑣 > 0 must

have an incoming arc carrying non-zero ow, each such 𝑣 sampled an arc, implying that 𝑇 spans

the nodes of non-zero demand. Accordingly, there is a unique ow 𝑡 on 𝑇 such that 𝐴𝑡 = 𝑏.

It remains to show that E[1𝑇𝑊𝑡] = 1𝑇𝑊𝑥M. We show this by induction on the number of

nodes in the DAG induced by 𝑥M. The base case where the DAG induced by 𝑥M has 0 nodes is trivial.

For the induction step, let 𝑣 be a node so that 𝑥M(𝑣,𝑢) = 0 for all (𝑣,𝑢) ∈ 𝐸; such a node must exist, as

𝑥M induces a DAG. If 𝑏𝑣 = 0, 𝑓 M𝑣 = 𝑏𝑣 = 0 and the claim is trivially true, so suppose 𝑓 M𝑣 = 𝑏𝑣 > 0. We

interpret the random choice of 𝑣 as follows: 𝑣 picks an incoming arc (𝑢, 𝑣) and we route 𝑏𝑣 units of

ow from 𝑢 to 𝑣 , changing demands accordingly by setting 𝑏 ′𝑣 = 0 and 𝑏 ′𝑢 = 𝑏𝑢 + 𝑏𝑣 . In expectation,

this induces cost

𝐾 :=∑︁(𝑢,𝑣) ∈𝐸

𝑤 (𝑢,𝑣)𝑏𝑣 ·𝑥M(𝑢,𝑣)𝑓 M𝑣

=∑︁(𝑢,𝑣) ∈𝐸

𝑤 (𝑢,𝑣)𝑥M(𝑢,𝑣) ,

23

where we used that 𝑓 M𝑣 = 𝑏𝑣 for each 𝑣 . The modied demands𝑏 ′ satisfy for (𝑢, 𝑣) ∈ 𝐸 with 𝑥 (𝑢,𝑣) > 0

that

E[𝑏 ′𝑢] = 𝑏𝑢 +𝑥M(𝑢,𝑣)𝑓 M𝑣· 𝑏𝑣

𝑓 M𝑣 =𝑏𝑣= 𝑏𝑢 + 𝑥M(𝑢,𝑣) ,

that 𝑏 ′𝑣 = 0, and that 𝑏 ′𝑢 = 𝑏𝑢 at all other nodes 𝑢. Now 𝑥 ′ by 𝑥 ′(𝑢,𝑣) = 0 and 𝑥 ′(𝑢′,𝑣′) = 𝑥 (𝑢′,𝑣′) for any

other arc (𝑢 ′, 𝑣 ′) ∈ 𝐸. Observe that 𝑥 ′ can be decomposed into (𝑥 ′)M and (𝑥 ′)◦ (as in Lemma 3.2)

such that the DAG induced by (𝑥 ′)M amounts to the DAG induced by 𝑥M with the node 𝑣 removed.

Let 𝑇 ′ denote the tree sampled from (𝑥 ′)M. Recall that for independent random variables 𝑋 and 𝑌 ,

E[𝑋𝑌 ] = E[𝑋 ] E[𝑌 ]. As the random variable 𝑏 ′ does not depend on the random choices of any

nodes but 𝑣 , it is independent of the tree 𝑇 ′. We denote by 𝑋𝑢 the cost of routing one unit of ow

from 𝑠 to 𝑢 ∈ 𝑇 \ {𝑣, 𝑠} on 𝑇 ′. By linearity of expectation we then obtain

E[1𝑇𝑊𝑡

]= 𝐾 + E

∑︁

𝑢∈𝑇 \{𝑣,𝑠 }𝑏 ′𝑢𝑋𝑢

= 𝐾 +

∑︁𝑢∈𝑇 \{𝑣,𝑠 }

E[𝑏 ′𝑢𝑋𝑢

]= 𝐾 +

∑︁𝑢∈𝑇 \{𝑣,𝑠 }

E[𝑏 ′𝑢

]E [𝑋𝑢] .

Examining the sum in the nal term, observe that when deleting 𝑣 from the graph, the respective

restriction of 𝑥M (namely (𝑥 ′)M routes exactly demands E[𝑏 ′𝑢] ≥ 𝑏𝑢 ≥ 0 from 𝑠 to each 𝑢 ∈ 𝑇 \ {𝑣, 𝑠}.Thus, by the induction hypothesis and, again, linearity of expectation,∑︁

𝑢∈𝑇 \{𝑣,𝑠 }E

[𝑏 ′𝑢

]E [𝑋𝑢] = 1𝑇𝑊𝑥M −

∑︁(𝑢,𝑣) ∈𝐸

𝑤 (𝑢,𝑣)𝑥M(𝑢,𝑣) = 1

𝑇𝑊𝑥M − 𝐾.

We conclude that indeed E[1𝑇𝑊𝑡] = 1𝑇𝑊𝑥M, completing the proof. �

Our goal is now to apply Lemma 3.3 followed by Markov’s bound in order to show that sampling

arcs is sucient for obtaining a good tree solution. Alas, the decomposition 𝑥 = 𝑥M +𝑥◦ is unknown,and Lemma 3.3 critically depends on the fact that 𝑥M is acyclic. A naive solution would be to sample

suciently often according to 𝑥 so that each arc has at least the same probability to be sampled

when sampling from the “correct” distribution given by 𝑥M. Unfortunately, it is possible that an arc

(𝑢, 𝑣) satises 𝑥M(𝑢,𝑣) � 𝑥◦(𝑢,𝑣) , conicting with the requirement of sampling few arcs. We overcome

this obstacle by replacing such “bad” arcs by a corresponding path in the spanner.

In the following, denote𝑤min := min(𝑢,𝑣) ∈𝐸{𝑤 (𝑢,𝑣) } and partition 𝐸 into sets 𝐸𝑣,𝑘 := {(𝑢, 𝑣) ∈ 𝐸 |𝑤 (𝑢,𝑣) ∈ [2𝑘−1𝑤min, 2

𝑘𝑤min) ∧ 𝑥 (𝑢,𝑣) > 0}, where 𝑣 ∈ 𝑉 and 𝑘 ∈ Z>0. The set 𝐸𝑣,𝑘 ≠ ∅ is bad, if2𝛼_

∑𝑒∈𝐸𝑣,𝑘

𝑥M𝑒 ≤∑

𝑒∈𝐸𝑣,𝑘𝑥◦𝑒 . We say that 𝑒 ∈ 𝐸 is bad if the set 𝐸𝑣,𝑘 containing 𝑒 is bad.

Lemma 3.4. Redirecting the ow 𝑥M(𝑢,𝑣) of each bad arc (𝑢, 𝑣) over a shortest𝑢-𝑣 path on an (undirected)𝛼-spanner for the graph with weights𝑤− incurs an additional cost of at most Y ′1𝑇𝑊𝑥∗.

Proof. As the maximum ratio between the weights of an edge in opposing directions is _, an

undirected 𝛼-spanner for weights𝑤− clearly allows, for any (𝑢, 𝑣) ∈ 𝐸, nding a directed path of

24

length at most 𝛼_𝑤 (𝑢,𝑣) from 𝑢 to 𝑣 . The cost for rerouting the ow on bad arcs over the spanner is

thus bounded by ∑︁bad(𝑢,𝑣) ∈𝐸

𝛼_𝑤 (𝑢,𝑣)𝑥M(𝑢,𝑣) =

∑︁bad𝐸𝑢,𝑘

∑︁𝑒∈𝐸𝑢,𝑘

𝛼_𝑤𝑒𝑥M𝑒

<∑︁

bad𝐸𝑢,𝑘

2𝑘𝛼_𝑤min


𝑥M𝑒

≤∑︁

bad𝐸𝑢,𝑘

2𝑘−1𝑤min


𝑥◦𝑒

≤∑︁

bad𝐸𝑢,𝑘


𝑤𝑒𝑥◦𝑒

=∑︁

bad(𝑢,𝑣) ∈𝐸𝑤 (𝑢,𝑣)𝑥

◦(𝑢,𝑣)

≤ 1𝑇𝑊𝑥◦.

Furthermore, by Lemma 2.11 and the fact that 1𝑇𝑊𝑥M ≥ 1𝑇𝑊𝑥∗ by feasibility of 𝑥M and optimality

of 𝑥∗,

1𝑇𝑊𝑥◦ = 1𝑇𝑊𝑥 − 1𝑇𝑊𝑥M ≤ (1 + Y ′)1𝑇𝑊𝑥∗ − 1𝑇𝑊𝑥∗ = Y ′1𝑇𝑊𝑥∗.�

We now are ready to show that sampling suciently many arcs according to 𝑥 and adding the

spanner is, in expectation, almost as good as directly sampling according to 𝑥M.

Lemma 3.5. Suppose that for 𝑣 ∈ 𝑉 \ {𝑠}, 𝑏𝑣 ≥ 0. For each 𝐸𝑣,𝑘 ≠ ∅, sample arc 𝑒 = (𝑢, 𝑣) ∈ 𝐸𝑣,𝑘with independent probability

𝑝𝑒 := min

{(2𝛼_ + 1)𝑥𝑒∑

𝑒′∈𝐸𝑣,𝑘𝑥𝑒′

, 1

},

and repeat this procedure until at least one arc from 𝐸𝑣,𝑘 is selected. Add an arbitrary such arc, andadd all spanner edges. Then the expected cost of an optimal solution on the induced graph is boundedby (1 + 2Y ′)1𝑇𝑊𝑥∗.

Proof. By Lemma 3.3, sampling arc (𝑢, 𝑣) ∈ 𝐸 with probability (𝑓 M𝑢 )−1𝑥M(𝑢,𝑣) at 𝑢 ∈ 𝑉 with 𝑓 M𝑢 > 0

results in a ow 𝑡 on a tree of expected cost at most 1𝑇𝑊𝑥M. Observe that if the probability to

select (𝑢, 𝑣) ∈ 𝐸 is at least as large (and we are guaranteed to select at least one incoming arc for

each 𝑣 ∈ 𝑉 ), the expected cost can only become smaller. Consider an arc 𝑒 ∈ 𝐸𝑣,𝑘 with 𝑝𝑒 < 1 that

is not bad. It satises that

𝑝𝑒 ≥(2𝛼_ + 1)𝑥M𝑒∑

𝑒′∈𝐸𝑣,𝑘𝑥𝑒′

=(2𝛼_ + 1)𝑥M𝑒∑

𝑒′∈𝐸𝑣,𝑘𝑥M𝑒′ +

∑𝑒′∈𝐸𝑣,𝑘

𝑥◦𝑒′

>(2𝛼_ + 1)𝑥M𝑒

(2𝛼_ + 1)∑𝑒′∈𝐸𝑣,𝑘𝑥M𝑒′

(𝑒 is not bad)

=𝑥M𝑒

𝑓 M𝑣,

25

i.e., the probability to select such an arc is suciently large. The same is trivially true if 𝑝𝑒 = 1.

Hence, it remains to address bad arcs.

To this end, we apply Lemma 3.4 to see that we can reroute their ow over the spanner at an

additive additional cost of at most Y ′1𝑇𝑊𝑥∗. Logically, we can reect this by replacing the weight

of such an arc by the weight of a respective shortest path in the spanner and adding all bad arcs to

the sample (i.e., “sampling” each bad arc with probability 1); denoting the new weights by𝑊 ′, weare then guaranteed that the union of the spanner and the actually sampled arcs contains a solution

of expected cost at most

E[1𝑇𝑊 ′𝑡

]= 1𝑇𝑊 ′𝑥 ≤ 1𝑇𝑊𝑥 + Y ′1𝑇𝑊𝑥∗ ≤ (1 + 2Y ′)1𝑇𝑊𝑥∗.�

Corollary 3.6. Performing the sampling procedure of Lemma 3.5 and returning an optimum primaltree solution on the graph induced by the sampled arcs and the spanner yields a (1+ 2

3Y)-approximation

with probability at least 1

2.

Proof. Lemma 3.5 shows that the expected cost of an optimal solution on the stated arc set is at

most (1 + 2Y ′)1𝑇𝑊𝑥∗ = (1 + Y3)1𝑇𝑊𝑥∗. Applying Markov’s bound to the (positive) random variable

that is the amount by which this cost exceeds that of an optimal solution shows that the cost is at

most (1 + 2

3Y)1𝑇𝑊𝑥∗ with probability at least

1

2. Lemma 3.1 shows that there is an optimal solution

that is a forest, and as there is only one source, it is a tree. �

To complete the discussion of the sampling procedure of Lemma 3.5, it remains to show that it

will stop fast enough, i.e., that there is a constant probability of sampling at least one arc. In order

to show this, we use the following general observation.

Lemma 3.7. Let 𝑝 be a nite probability mass function for a sample space of size 𝑡 , i.e., 𝑝𝑖 ∈ [0, 1]for all 1 ≤ 𝑖 ≤ 𝑡 and ∑

1≤𝑖≤𝑡 𝑝𝑖 = 1. Then∏

1≤𝑖≤𝑡 (1 − 𝑝𝑖) ≤ 1

𝑒.

Proof. By the inequality of arithmetic and geometric means we have( ∏1≤𝑖≤𝑡(1 − 𝑝𝑖)

)1/𝑡

≤ 1

𝑡·

∑︁1≤𝑖≤𝑡(1 − 𝑝𝑖) =

1

𝑡·(𝑡 −

∑︁1≤𝑖≤𝑡

𝑝𝑖

)=𝑡 − 1𝑡

= 1 − 1

𝑡.

It now follows that ∏1≤𝑖≤𝑡(1 − 𝑝𝑖) ≤

(1 − 1

𝑡

)𝑡≤ 1

𝑒,

where the latter inequality can be derived from the limit deniton of Euler’s number. �

Lemma 3.8. The probability that no arc of 𝐸𝑣,𝑘 is sampled in the procedure given in Lemma 3.5 is atmost 1

𝑒.

Proof. For every arc 𝑒 ∈ 𝐸𝑣,𝑘 , set 𝑝 ′𝑒 =𝑥𝑒∑

𝑒′∈𝐸𝑣,𝑘 𝑥𝑒′. Clearly, 𝑝 ′𝑒 ≤ 𝑝𝑒 and 𝑝 ′𝑒 ∈ [0, 1] for every arc

𝑒 ∈ 𝐸𝑣,𝑘 and


𝑝 ′𝑒 = 1. With Lemma 3.7 we get that the probability that no arc is sampled is∏𝑒∈𝐸𝑣,𝑘

(1 − 𝑝𝑒) ≤∏

𝑒∈𝐸𝑣,𝑘

(1 − 𝑝 ′𝑒) ≤1

𝑒. �

We put all the steps for constructing a primal tree solution together in the pseudocode given in

Algorithm 2. We summarize the guarantees of this procedure as follows.

26

Algorithm 2: primal_tree (𝐺,𝑏, Y)1 (𝑥,𝑦) := gradient_transshipment(𝐺,𝑏, Y

6) // (1 + Y

6)-approximate pair

2 foreach 𝑣 ∈ 𝑉 and 𝑘 ∈ {1, . . . , dlog ‖𝑤 ‖∞e} do3 𝐸𝑣,𝑘 := {(𝑢, 𝑣) ∈ 𝐸 |𝑤 (𝑢,𝑣) ∈ [2𝑘−1, 2𝑘 ) ∧ 𝑥 (𝑢,𝑣) > 0}4 if 𝐸𝑣,𝑘 ≠ ∅ then5 𝐹𝑣,𝑘 := ∅6 for dlog(4𝑛dlog ‖𝑤 ‖∞e)e repetitions do7 if 𝐹𝑣,𝑘 = ∅ then8 foreach 𝑒 ∈ 𝐸𝑣,𝑘 do

9 Add 𝑒 to 𝐹𝑣,𝑘 with probability 𝑝𝑒 := min

{(2𝛼_+1)𝑥𝑒∑𝑒′∈𝐸𝑣,𝑘 𝑥𝑒′

, 1

}10 if |𝐹𝑣,𝑘 | = 16𝛼_ + 8 then11 𝐹𝑣,𝑘 ← ∅12 break;

13 Construct 𝛼-spanner 𝑆 of 𝐺− = (𝑉 , 𝐸,𝑤−)14 Construct graph 𝐻 consisting of all sampled arcs

⋃𝑣,𝑘 𝐹𝑣,𝑘 and 𝑆

15 Compute optimal primal tree solution 𝑡 and dual solution 𝑦 ′ on 𝐻16 if 1𝑊𝑡 ≤ (1 + Y)𝑏𝑇𝑦 then // (1 + Y)-approximate pair17 return (𝑡, 𝑦) // algorithm succeeded18 else19 return (⊥,⊥) // algorithm failed

Theorem 3.9. If 𝑏𝑣 ∈ {0, 1} for all 𝑣 ∈ 𝑉 \ {𝑠}, then, given any 0 < Y ≤ 1, with probability atleast 1

4, algorithm primal_tree computes a (1 + Y)-approximate primal-dual pair of solutions to (1) in

a bidirected graph with positive integer arc weights, where the primal solution has non-zero ow onlyon the arcs of a tree.

Proof. ByTheorem 2.15, the primal-dual pair (𝑥,𝑦) computed by algorithm gradient_transshipment

is a (1+ Y6)-approximate pair, i.e., 1𝑊𝑥 ≤ (1+ Y

6)𝑏𝑇𝑦. For any 𝑣 ∈ 𝑉 and any 𝑘 ∈ {1, . . . , dlog ‖𝑤 ‖∞e}

we have that for each repetition of the sampling, no arc is sampled, i.e., 𝐹 ′𝑣,𝑘

= ∅, with probability at

most1

𝑒by Lemma 3.8. Furthermore, by linearity of expectation, the expected number of sampled arcs

is 2𝛼_ + 1. By the Markov inequality, this expectation is exceeded by a factor of 8 with probability at

most1

8. The probability that |𝐹𝑣,𝑘 | reaches size 16𝛼_+8 is therefore at most

1

8. Thus, in each repetition

of the sampling, the probability that 𝐹𝑣,𝑘 = ∅ is at most1

𝑒+ 1

8≤ 1

2. If this happens for all repetitions of

the sampling, the algorithm sets 𝐹𝑣,𝑘 = ∅ and might then not be correct. The probability of this event

falls exponentially with the number of repetitions, i.e., is at most1

2dlog(4𝑛dlog ‖𝑤‖∞e)e ≤ 1

4𝑛 dlog ‖𝑤 ‖∞ e . Thus,the probability that 𝐹𝑣,𝑘 = ∅ after the sampling for some 𝑣 ∈ 𝑉 and some 𝑘 ∈ {1, . . . , dlog ‖𝑤 ‖∞e}is, again by the union bound, at most 𝑛dlog ‖𝑤 ‖∞e · 1

4𝑛 dlog ‖𝑤 ‖∞ e = 1

4. Moreover, Corollary 3.6

states that computing an optimal tree solution 𝑡 using only sampled and spanner arcs yields a

(1 + 2

3Y)-approximation, i.e., 1𝑊𝑡 ≤ (1 + 2

3Y)1𝑊𝑥∗, with probability at least

2

3; we compute such a

solution. Applying the union bound, we conclude that with probability at least1

2− 1

4= 1

4, both all

the sets of sampled arcs are non-empty and the resulting solution is of sucient quality. In that

27

case we have

1𝑊𝑡 ≤ (1 + 2

3Y)1𝑊𝑥∗ ≤ (1 + 2

3Y)1𝑊𝑥 ≤ (1 + 2

3Y) (1 + 1

6Y)1𝑏𝑇𝑦 ≤ (1 + Y)𝑏𝑇𝑦 ,

i.e., (𝑡, 𝑦) is a (1 + Y)-approximate primal-dual pair and the algorithm succeeds. �

3.2 Computing an Approximate Shortest-Path Tree

Next, we address the point that an on-average guarantee is typically insucient when considering

the single-source shortest path problem. Rather, we would like an approximate shortest-path tree,i.e., a tree rooted at the source 𝑠 such that, for each 𝑣 ∈ 𝑉 , the distance from 𝑠 to 𝑣 in the tree is

larger by a factor of at most 1 + Y than the distance in 𝐺 .

Our approach here is straightforward. The algorithm (see Algorithm 3 for pseudocode) starts

with solving a “single-source” transshipment instance, in which 𝑠 has demand −𝑛 + 1 and every

other node has demand 1, up to increased precision Y ′ = Y12. In particular, it computes a primal tree

solution providing and “on-average” guarantee and a dual solution. Next, it determines the distances

from the source to every node in this tree. By weak duality, we know that for all nodes whose

distance in the tree exceeds their dual variable by a factor at most 1 + 12Y ′ ≤ 1 + Y we have alreadyfound a suitable approximate shortest path in the current tree. Our algorithm will consider these

nodes “done”, sets their demands to 0 and repeats this process with the modied transshipment

instance. We can argue that the nodes whose distance in the tree exceeds their dual variable by

a factor at most 1 + 12Y ′ make up a constant fraction of the optimal value of the transshipment

instance. Therefore, this algorithm will be nished after logarithmically many repetitions. Now the

desired approximate shortest path tree can be found by taking the union of all the intermediate

trees with “on-average” guarantee and computing a shortest path tree in the respective graph.

Algorithm 3: sssp (𝐺, 𝑠, Y)1 Set 𝐸 ′ := ∅, 𝑏 := 1 − 𝑛1𝑠 , and Y ′ := Y

12.

2 while 𝑏𝑠 < 0 do3 (𝑇,𝑦) := primal_tree(𝐺,𝑏, Y ′)4 if (𝑇,𝑦) ≠ (⊥,⊥) then5 Set 𝐸 ′← 𝐸 ′ ∪𝑇6 foreach 𝑣 ∈ 𝑉 with 𝑏𝑣 = 1 do7 if 𝑑𝑇 (𝑣) ≤ (1 + 12Y ′) (𝑦𝑣 − 𝑦𝑠) then8 Set 𝑏𝑣 ← 0 and 𝑏𝑠 ← 𝑏𝑠 + 1

9 Compute shortest-path tree 𝑇 ′ on (𝑉 , 𝐸 ′,𝑤 |𝐸′) rooted at 𝑠

10 return 𝑇 ′

In the following, assume w.l.o.g. that 𝑦𝑠 = 𝑦∗𝑠 = 0, i.e., for each 𝑣 ∈ 𝑉 , 𝑦∗𝑣 is the distance from 𝑠 to

𝑣 and 𝑦𝑣 ≤ 𝑦∗𝑣 is an approximation thereof. Thus, 𝑏𝑇𝑦∗ is the sum of all distances from 𝑠 , and 𝑏𝑇𝑦 is

close to this sum, where no distance is overestimated. Assigning to each node 𝑣 ∈ 𝑉 weight 𝑦∗𝑣 (i.e.,

its distance to 𝑠), this entails that the sum of weights of nodes satisfying 𝑦𝑣 ≥ 𝑦∗𝑣1+4Y′ is a constant

fraction of the total weights.

Lemma 3.10. Assume that 𝑏𝑣 ≥ 0 for all 𝑣 ∈ 𝑉 \ {𝑠}, 𝑦 ∈ R𝑛 is a (1 + Y ′)-approximate dual solutionto (1), and 𝑌 := {𝑣 ∈ 𝑉 \ {𝑠} : 𝑦𝑣 < 𝑦∗𝑣/(1 + 4Y ′)} for some Y ′ ≤ 1

4. Then,

∑𝑣∈𝑌 𝑏𝑣𝑦

∗𝑣 ≤ 1

2𝑏𝑇𝑦∗.

28

Proof. We have that

3Y ′∑︁𝑣∈𝑌

𝑏𝑣𝑦𝑣 ≤ 3Y ′∑︁𝑣∈𝑌

𝑏𝑣𝑦𝑣 + (1 + Y ′)𝑏𝑇𝑦 − 𝑏𝑇𝑦∗

= (1 + 4Y ′)∑︁𝑣∈𝑌

𝑏𝑣𝑦𝑣 + (1 + Y ′)∑︁

𝑣∈𝑉 \𝑌𝑏𝑣𝑦𝑣 − 𝑏𝑇𝑦∗

<∑︁𝑣∈𝑌

𝑏𝑣𝑦∗𝑣 + (1 + Y ′)

∑︁𝑣∈𝑉 \𝑌

𝑏𝑣𝑦𝑣 − 𝑏𝑇𝑦∗

= (1 + Y ′)∑︁

𝑣∈𝑉 \𝑌𝑏𝑣𝑦𝑣 −

∑︁𝑣∈𝑉 \𝑌

𝑏𝑣𝑦∗𝑣

≤ Y ′∑︁

𝑣∈𝑉 \𝑌𝑏𝑣𝑦𝑣,

where the last inequality follows from the observation that (𝑊 −1𝐴𝑇𝑦)max = 1 guarantees that

𝑦𝑣 ≤ 𝑦∗𝑣 for all 𝑣 ∈ 𝑉 , the fact that 𝑏𝑣 ≥ 0 for all 𝑣 ∈ 𝑉 \ {𝑠}, and the assumption that 𝑦𝑠 = 0. It

follows that

∑𝑣∈𝑉 \𝑌 𝑏𝑣𝑦𝑣 ≥ 3

4𝑏𝑇𝑦. We conclude that∑︁

𝑣∈𝑌𝑏𝑣𝑦∗𝑣 = 𝑏

𝑇𝑦∗ −∑︁

𝑣∈𝑉 \𝑌𝑏𝑣𝑦∗𝑣

≤ (1 + Y ′)𝑏𝑇𝑦 −∑︁

𝑣∈𝑉 \𝑌𝑏𝑣𝑦𝑣

≤(1

4

+ Y ′)𝑏𝑇𝑦

Y′≤ 14≤ 1

2

𝑏𝑇𝑦∗. �

In other words, with respect to node weights 𝑦∗𝑣 , we have a good lower bound on distances for a

large fraction of the total weight. A similar argument applies to a primal solution on a tree, which

yields good upper bounds on distances for a large fraction of the weight.

Lemma 3.11. Assume that Y ′ ≤ 1

4and 𝑏𝑣 ≥ 0 for all 𝑣 ∈ 𝑉 \ {𝑠}. Moreover, suppose that 𝑡 ∈ R2𝑚 is a

(1 + Y ′)-approximate primal solution to (1) satisfying 𝑡𝑒 = 0 for all 𝑒 ∈ 𝐸 \𝑇 , where 𝑇 is a tree rootedat 𝑠 . Denote by 𝑑𝑇 (𝑣) the distances from 𝑠 to 𝑣 ∈ 𝑉 in𝑇 . Then𝑋 := {𝑣 ∈ 𝑉 \ {𝑠} : 𝑑𝑇 (𝑣) > (1+4Y ′)𝑦∗𝑣}satises that

∑𝑣∈𝑋 𝑏𝑣𝑦

∗𝑣 ≤ 1

4𝑏𝑇𝑦∗.

Proof. Recall that 𝑦∗𝑠 = 0, i.e., the (minimal) cost of routing 𝑏𝑣 units of ow from 𝑠 to 𝑣 in 𝐺 is 𝑏𝑣𝑦∗𝑣 ,

𝑏𝑣 times the distance from 𝑣 to 𝑠 . Similarly, routing 𝑏𝑣 units of ow from 𝑠 to 𝑣 in 𝑇 incurs cost

𝑏𝑣𝑑𝑇 (𝑣). Hence,

(1 + Y ′)𝑏𝑇𝑦∗ = (1 + Y ′)1𝑇𝑊𝑥∗ ≥ 1𝑇𝑊𝑡

=∑︁𝑣∈𝑉

𝑏𝑣𝑑𝑇 (𝑣)

=∑︁𝑣∈𝑋

𝑏𝑣𝑑𝑇 (𝑣) +∑︁

𝑣∈𝑉 \𝑋𝑏𝑣𝑑𝑇 (𝑣)

≥∑︁𝑣∈𝑋

𝑏𝑣 (1 + 4Y ′)𝑦∗𝑣 +∑︁

𝑣∈𝑉 \𝑋𝑏𝑣𝑦∗𝑣

29

= 4Y ′∑︁𝑣∈𝑋

𝑏𝑣𝑦∗𝑣 + 𝑏𝑇𝑦∗.

Subtracting 𝑏𝑇𝑦∗ and diving by 4Y ′, the claim follows. �

Together, these lemmas show that a tree solution and a dual solution provide us with the means

to detect nodes for which the distances in 𝑇 are good approximations to their distances in 𝐺 , while

ensuring sucient progress.

Corollary 3.12. Use the notation of Lemmas 3.10 and 3.11 and assume that their preconditions aresatised. Then 𝑍 := {𝑣 ∈ 𝑉 \ {𝑠} : 𝑑𝑇 (𝑣) > (1 + 12Y ′)𝑦𝑣} satises

∑𝑣∈𝑍 𝑏𝑣𝑦

∗𝑣 ≤ 3

4𝑏𝑇𝑦∗.

Proof. Any 𝑣 ∈ 𝑉 \ (𝑋 ∪ 𝑌 ) satises

𝑑𝑇 (𝑣) ≤ (1 + 4Y ′)𝑦∗𝑣 ≤ (1 + 4Y ′)2𝑦𝑣Y′≤ 1

4≤ (1 + 12Y ′)𝑦𝑣 .

Hence, 𝑍 ⊆ 𝑋 ∪ 𝑌 and the claim follows from the bounds provided by the lemmas. �

Theorem 3.13. Given any 0 < Y ≤ 1, algorithm sssp computes a (1 + Y)-approximate single-sourceshortest path tree in bidirected graphs with positive polynomially bounded integer arc weights, i.e.,for each 𝑣 ∈ 𝑉 , the distance from 𝑠 to 𝑣 in the returned tree is at most factor 1 + Y larger than in theinput graph. The arc set 𝐸 ′ for the nal shortest path computation has size 𝑂 (𝑛 log ‖𝑤 ‖∞) and thealgorithm performs 𝑂 (𝑐 log ‖𝑤 ‖∞) iterations with probability at least 1 − 𝑛−𝑐 for every 𝑐 ≥ 1.

Proof. Observe that 𝑏 remains integral and feasible throughout the algorithm and 𝑏𝑣 ∈ {0, 1} forall 𝑣 ∈ 𝑉 \ {𝑠}. Therefore, 𝑏𝑇𝑦∗ < 1 (the minimum arc weight) implies that 𝑏 = 0 and thus that

the algorithm terminates. As 𝑏𝑇𝑦∗ is initially bounded from above by 𝑛2‖𝑤 ‖∞, 𝑛 times the upper

bound of 𝑛‖𝑤 ‖∞ on the length of any shortest path, Corollary 3.12 implies that the outer loop of

the algorithm terminates after 𝑟 := blog4(𝑛2‖𝑤 ‖∞)c + 1 = 𝑂 (log(𝑛‖𝑤 ‖∞)) iterations for which the

call of procedure primal_tree was successful. For each iteration this happens with probability at

least1

4by Theorem 3.9. The algorithm ensures that for each 𝑣 ∈ 𝑉 \ {𝑠}, there is an iteration in

which the tree 𝑇 added to 𝐸 ′ and the dual solution 𝑦 satisfy

𝑑𝑇 (𝑣) ≤ (1 + 12Y ′) (𝑦𝑣 − 𝑦𝑠) = (1 + Y) (𝑦𝑣 − 𝑦𝑠) ≤ (1 + Y) (𝑦∗𝑣 − 𝑦∗𝑠 ),

i.e., the distance from 𝑠 to 𝑣 in 𝑇 is at most factor 1 + Y larger than the distance from 𝑠 to 𝑣 in 𝐺 . It

follows that the returned shortest-path tree on (𝑉 , 𝐸 ′,𝑊 |𝐸′) satises the claimed approximation

guarantee. The probabilistic bound on the number of iterations is immediate from standard Cherno

bounds. As arcs are added to 𝐸 ′ only in the 𝑂 (log(𝑛‖𝑤 ‖∞)) iterations with a successful call of

procedure primal_tree and in each such iteration we add a tree of 𝑂 (𝑛) arcs, the size of 𝐸 ′ is𝑂 (𝑛 log(𝑛‖𝑤 ‖∞)). �

4 Implementation in Distributed and Streaming Models

In the following we explain how to implement the gradient descent algorithm for computing (1+ Y)-approximate transshipments and SSSP in distributed and streaming models. Common to all these

implementations is the use of sparse spanners. We give deterministic algorithms for computing

spanners in distributed and streaming models in Section 5.

30

4.1 Broadcast Congested Clique

Model In the Broadcast Congested Clique model, the system consists of 𝑛 fully connected

nodes labeled by unique 𝑂 (log𝑛)-bit identiers. Computation proceeds in synchronous rounds,

where in each round, nodes may perform arbitrary internal computations, broadcast (send) an

𝑂 (log𝑛)-bit message to all other nodes, and receive the messages from other nodes. The input

is distributed among the nodes. The rst part of the input of every node consists of its incident

arcs (given by their endpoints’ identiers) and their weights. The weights are assumed to be

polynomially bounded in 𝑛, so that they can be encoded using𝑂 (log𝑛) bits. The second part of the

input is problem-specic: for the transshipment problem, every node 𝑣 knows its demand 𝑏𝑣 and

for SSSP 𝑣 knows whether or not it is the source 𝑠 . In both cases, every node knows Y as well. Each

node needs to compute its part of the output. For the transshipment problem, every node needs

to know a (1 + Y) approximation of the optimum value, and for SSSP every node needs to know a

(1 + Y)-approximation of its distance to the source. For a primal solution to transshipment, every

node must know the ow over its incident arcs, and for a dual solution every node must know its

potential. In the case of SSSP, every node needs to know a (1 + Y)-approximation of its distance to

the source. In terms of primal solutions for (approximate) SSSP, only tree solutions are of interest.

Here, each node must know its parent, which is equivalent to knowing the next routing hop on

an approximately shortest path to the source. The complexity of the algorithm is measured in the

worst-case number of rounds until the computation is complete.

Implementing Algorithm gradient_transshipment In the following, we explain how to

implement our gradient descent algorithm for approximating the transshipment problem.

Theorem 4.1. Given any 0 < Y ≤ 1, in the Broadcast Congested Clique model a (1 + Y)-approximateprimal-dual pair of solutions to the transshipment problem in bidirected graphs with positive arcweights can be computed deterministically in 𝑂 (_2Y−2) rounds.

Proof. We will show that each iteration of Algorithm 1 can be implemented in a constant number

of rounds and that the remainder of the algorithm can be implemented in 𝑂 (log2 𝑛) rounds. ByTheorem 2.15, which bounds the number of iterations of Algorithm 1, a round complexity of

𝑂 (_2(Y−2 + log𝑛 + log _) log3 𝑛) follows. For _ ∈ Ω(𝑛) we can solve the problem in𝑂 (𝑛) rounds bymaking all arcs global knowledge. Therefore, this simplies to 𝑂 (_2Y−2).

We will explain how to implement the following lines of the algorithm: Lines 1 and 8, where the

algorithm performs oracle calls, Lines 5 and 11, where the algorithm needs to compute (𝑊 −1𝐴𝑇𝜋)max,

Line 7, where the algorithm computes ∇Φ𝛽 (𝜋), and Line 14, where the algorithm computes 𝑥 =

𝑊 −1∇ lse𝛽(𝑊 −1𝐴𝑇𝜋

). We implement the latter by computing, for every node 𝑣 , 𝑥𝑒 for every incident

arc 𝑒 . Once we have explained how to implement the lines mentioned above, it is clear that the rest

of the algorithm can be implemented by performing only local computation.

We implement the oracle calls in Lines 1 and 8 as follows. We rst make 𝑏 ∈ R𝑛 known to every

node in one round. Then we compute an𝑂 (log𝑛)-spanner of𝐺− = (𝑉 , 𝐸,𝑤−) with𝑂 (𝑛 log𝑛) edgesand make it known to every node. Using the randomized algorithm of Baswana and Sen [BS07],

computing the spanner takes 𝑂 (log2 𝑛) rounds. In Appendix 5, we provide a deterministic version

of this algorithm that performs 𝑂 (log2 𝑛) rounds and computes a spanner of size 𝑂 (𝑛 log𝑛). Forthe spanner computed in this way, each node knows a subset of its incident spanner edges of size

𝑂 (log𝑛) such that the union of all nodes’ subsets equals the whole set of spanner edges (i.e., the

31

arboricity of the spanner is 𝑂 (log𝑛)). Therefore the spanner can be made global knowledge in

𝑂 (log𝑛) rounds. The oracle calls in Lines 1 and 8 require only internal computation.

To explain how Lines 5, 7, 11, and 14 can be implemented in a constant number of rounds

each, we introduce the following notation. For every arc 𝑒 = (𝑢, 𝑣) ∈ 𝐸 dene the stretch under

potentials 𝜋 as

𝑠𝑒 (𝜋) :=𝜋𝑣 − 𝜋𝑢𝑤𝑒

.

Note that𝑊 −1𝐴𝑇𝜋 is the vector containing the stretches of the arcs under potential 𝜋 . For every

node 𝑣 , let 𝔰𝑣 and 𝔪𝑣 be the following quantities over its incoming arcs:

𝔰𝑣 :=∑︁

𝑒=(𝑢,𝑣) ∈𝐸𝑒𝛽𝑠𝑒 (𝜋 )

𝔪𝑣 := max{𝑠𝑒 (𝜋) : 𝑒 = (𝑢, 𝑣) ∈ 𝐸} .

Thus, for every arc 𝑒 ∈ 𝐸,

(𝑊 −1∇ lse𝛽(𝑊 −1𝐴𝑇𝜋

))𝑒 =

𝑒𝛽𝑠𝑒 (𝜋 )

𝑤𝑒 ·∑

𝑒′∈𝐸 𝑒𝛽𝑠𝑒′ (𝜋 )=


𝑤𝑒 ·∑

𝑣∈𝑉 𝔰𝑣,

for every node 𝑣 ,

∇Φ𝛽 (𝜋)𝑣 = (𝐴𝑊 −1∇ lse𝛽(𝑊 −1𝐴𝑇𝜋

))𝑣

=∑︁

𝑒=(𝑢,𝑣) ∈𝐸


𝑤𝑒 ·∑

𝑣∈𝑉 𝔰𝑣−

∑︁𝑒=(𝑣,𝑢) ∈𝐸


𝑤𝑒 ·∑

𝑣∈𝑉 𝔰𝑣,

and

(𝑊 −1𝐴𝑇𝜋)max = max{𝑠𝑒 (𝜋) : 𝑒 = (𝑢, 𝑣) ∈ 𝐸} = max{𝔪𝑣 : 𝑣 ∈ 𝑉 } .

The vector 𝜋 ∈ R𝑛 can be made global knowledge in a single round. As every node 𝑣 knows its

incident arcs and their respective weights, it can then internally compute 𝑠𝑒 (𝜋) for every incident

arc 𝑒 as well as 𝔰𝑣 and𝔪𝑣 . In two rounds of communication, the values 𝔰𝑣 and𝔪𝑣 of all nodes 𝑣 can

be made global knowledge. Once these values are known, each node can locally compute ∇Φ𝛽 (𝜋)𝑣 ,(𝑊 −1𝐴𝑇𝜋)max, and (𝑊 −1∇ lse𝛽


))𝑒 for each incident arc 𝑒 , using the equations above. The

vector ∇Φ𝛽 (𝜋) ∈ R𝑛 can then be made globel knowledge in a single round. �

Implementing Algorithm primal_tree In the following we explain how to implement our

sampling-based algorithm to compute a primal tree solution for the approximate transshipment

problem.

Theorem 4.2. If 𝑏𝑣 ∈ {0, 1} for all 𝑣 ∈ 𝑉 \ {𝑠}, then in the Broadcast Congested Clique model, givenany 0 < Y ≤ 1, with probability at least 1

4, a (1 + Y)-approximate primal-dual pair of solutions to the

transshipment problem in bidirected graphs with positive polynomially bounded integer arc weights,where the primal solution has non-zero ow only on the arcs of a tree, can be computed in 𝑂 (_2Y−2)rounds.

Proof. By Theorem 4.1, the call to Algorithm gradient_transshipment in Line 1 takes 𝑂 (_2Y−2)rounds. Recall that the corresponding primal solution 𝑥 is known locally to the nodes and therefore

32

each node 𝑣 knows 𝑥 (𝑢,𝑣) for each incoming arc (𝑢, 𝑣). After determining dlog ‖𝑤 ‖∞e in a single

round, the sampling in Lines 2 to 12 can thus be performed without additional communication.

As |𝐹𝑣,𝑘 | = 𝑂 (𝛼_) for each 𝑣 ∈ 𝑉 and each 𝑘 ∈ {1, . . . , log ‖𝑤 ‖∞}, the set⋃

𝑣,𝑘 𝐹𝑣,𝑘 can be made

global knowledge in 𝑂 (𝛼_ log ‖𝑤 ‖∞) rounds, which for 𝛼 = 𝑂 (log𝑛) is 𝑂 (_ log𝑛 log ‖𝑤 ‖∞). Since‖𝑤 ‖∞ is polynomially bounded, this term is dominated by𝑂 (_2Y−2). To implement Line 13, we run

the algorithm of Corollary 5.1 that performs 𝑂 (log2 𝑛) rounds to compute a spanner 𝑆 of stretch

𝛼 = 𝑂 (log𝑛) and make it known to all nodes. Once 𝑆 and

⋃𝑣,𝑘 𝐹𝑣,𝑘 are global knowledge, Lines 14

to 19 require no additional communication. �

ImplementingAlgorithm sssp Finally, we explain how to implement our adaptive-demands

algorithm for computing an approximate single-source shortest path tree.

Theorem 4.3. Given any 0 < Y ≤ 1, in the Broadcast Congested Clique model a (1+Y)-approximationto the single-source shortest path problem in bidirected graphs with positive polynomially boundedinteger arc weights can be computed in 𝑂 (𝑐_2Y−2) rounds with probability at least 1 − 𝑛−𝑐 for every𝑐 ≥ 1.

Proof. By Theorem 3.13, the number of iterations of the while loop is𝑂 (𝑐 log ‖𝑤 ‖∞) with probability1 − 𝑛−𝑐 . Observe that Line 3, which calls the algorithm for computing a primal tree solution, is

the only place in the algorithm, where communication between the nodes is necessary (if the

primal tree solution 𝑇 and the dual solution 𝑦 are made global knowledge in 𝑂 (1) rounds aftereach call). Therefore, the round complexity of computing an approximate SSSP tree exceeds the

round complexity of computing a primal tree solution to the transshipment problem (as stated in

Theorem 4.1) by a factor of 𝑂 (𝑐 log ‖𝑤 ‖∞), which is 𝑂 (𝑐 log𝑛) as ‖𝑤 ‖∞ is polynomially bounded.

�

As discussed at the beginning of Section 2 this result extends to graphs with arbitrary non-

negative arc weights at the cost of mild overheads. Note that in the literature the approximate

SSSP problem has also been studied in a relaxation of the Broadcast Congested Clique that does

not require the weights to be polynomially bounded by increaseing the message size to Θ(log𝑛 +log ‖𝑤 ‖∞) [EN16]. By a reduction of Klein and Subramanian [KS97], that has been adapted to

the Broadcast Congested Clique by Elkin and Neiman [EN16], we still may assume without loss

of generality that our approximate SSSP algorithm is called on a graph with largest arc weight

‖𝑤 ‖∞ = 𝑂 (𝑛Y−1). Furthermore, we may assume that Y−1 = 𝑂 (𝑛) as our running time anyway

depends on Y−1 and thus in the regime Y−1 = Ω(𝑛) we might as well compute a solution with the

Bellman-Ford algorithm, which takes 𝑂 (𝑛) rounds. In this way we arrive at polynomially bounded

arc weights again.

4.2 Broadcast CONGEST Model

Model The broadcast CONGEST model diers from the broadcast congested clique in that

communication is restricted to edges that are present in the input graph. That is, node 𝑣 receives

the messages sent by node 𝑢 if and only if {𝑢, 𝑣} ∈ 𝐸. All other aspects of the model are identical

to the broadcast congested clique. We stress that this restriction has signicant impact, however:

Denoting the hop diameter14

of the input graph by 𝐷 , it is straightforward to show that Ω(𝐷)14That is, the diameter of the unweighted graph 𝐺 = (𝑉 , 𝐸).

33

rounds are necessary to solve the transshipment problem. Moreover, it has been established that

Ω(√𝑛/log𝑛) rounds are required even on graphs with 𝐷 ∈ 𝑂 (log𝑛) [Das+12]. Both of these bounds

apply to randomized approximation algorithms (unless the approximation ratio is not polynomially

bounded in 𝑛).

Solving the Transshipment Problem A straightforward implementation of our algorithm

in this model simply simulates the broadcast congested clique. A folklore method to simulate

(global) broadcast is to use “pipelining” on a breadth-rst-search (BFS) tree.

Lemma 4.4 (cf. [Pel00]). Suppose each 𝑣 ∈ 𝑉 holds𝑚𝑣 ∈ Z≥0 messages of 𝑂 (log𝑛) bits each, fora total of 𝑀 =

∑𝑣∈𝑉 𝑚𝑣 strings. Then all nodes in the graph can receive these 𝑀 messages within

𝑂 (𝑀 + 𝐷) rounds.

Proof Sketch. It is easy to construct a BFS tree in𝑂 (𝐷) rounds (rooted at, e.g., the node with smallest

identier) and obtain an upper bound 𝑑 ∈ [𝐷, 2𝐷] on the hop diameter by determining the depth

of the tree and multiplying it by 2. By a convergecast, we can determine |𝑀 |, where each node

in the tree determines the total number of messages in its subtree. We dene a total order on the

messages via lexicographical order on node identier/message pairs.15

Then nodes ood their

messages through the tree, where a ooding operation for a message may be delayed by those for

other messages which are smaller w.r.t. the total order on the messages. On each path, a ooding

operation may be delayed once for each ooding operation for a smaller message. Hence, this

operation completes within 𝑂 (𝐷 + |𝑀 |) rounds, and using the knowledge on 𝑑 and𝑀 , nodes can

safely terminate within 𝑂 (𝑑 + |𝑀 |) = 𝑂 (𝐷 + |𝑀 |) rounds. �

We obtain the following corollary to Theorem 4.1.

Corollary 4.5. Given any 0 < Y ≤ 1, in the Broadcast CONGEST model a (1 + Y)-approximateprimal-dual pair of solutions to the transshipment problem in bidirected graphs with positive arcweights can be computed deterministically in 𝑂 (_2𝑛Y−2) rounds.

Proof. Simulate each round on the broadcast congested clique using Lemma 4.4, i.e., with parameters

|𝑀 | = 𝑛 and 𝐷 ≤ 𝑛. Applying Theorem 4.1, the claim follows. �

Special Case: Single-Source Shortest Paths The near-linear running time bound of Corol-

lary 4.5 is far from the Ω̃(√𝑛 + 𝐷) lower bound, which also applies to the single-source shortest

path problem. However, for this problem there is an ecient reduction to a smaller skeleton graph,implying that we can match the lower bound up to polylogarithmic factors. The skeleton graph

is given as an overlay network on a subset 𝑉 ′ ⊆ 𝑉 of the nodes, where each node in 𝑉 ′ learns itsincident arc and their weights.

Theorem 4.6 ([HKN16]). Given any 0 < Y ≤ 1 and a source node 𝑠 , in the Broadcast CONGEST modelan overlay network𝐺 ′ = (𝑉 ′, 𝐸 ′) of an undirected graph𝐺 with positive integer edge weights that hasedge weights𝑤 ′ : 𝐸 ′→ {1, . . . ,𝑂 (𝑛‖𝑤 ‖∞)} together with some additional information for every nodesatisfying the following properties with high probability can be computed in𝑂 (

√𝑛Y−1 log2 ‖𝑤 ‖∞ +𝐷)

rounds:

• |𝑉 ′ | = 𝑂 (√𝑛Y−1 log ‖𝑤 ‖∞) and 𝑠 ∈ 𝑉 ′.

15W.l.o.g., we assume identier/message pairs to be dierent.

34

• For every node 𝑢 ∈ 𝑉 , as soon as 𝑢 receives, for every 𝑣 ∈ 𝑉 ′, a (1 + Y3)-approximation of the

distance from 𝑠 to 𝑣 in 𝐺 ′, it can infer a (1 + Y)-approximation of the distance from 𝑠 to 𝑢 in 𝐺without any additional communication.

To see why this reduction to the skeleton graph also works in bidirected graphs with asymmetric

arc weights at the expense of a factor of _, let us briey review the construction of Henzinger et

al. [HKN16]. In the rst step, a set of nodes 𝑉 ′ is computed such that every path 𝜋 consisting of√𝑛 edges contains a node 𝑢 such that the ℎ∗-hop distance from 𝑢 to 𝑉 ′ is at most

Y10

times the

weight of 𝜋 ; the ℎ∗-hop distance is the length of the shortest path with at most ℎ∗ edges for some

ℎ∗ = 𝑂 (√𝑛) chosen by the algorithm. In the second step, the (1 + Y

10)-approximate 𝑘-hop distance

from each node of 𝑉 ′ to each node of 𝑉 is computed where 𝑘 = 𝑂 (√𝑛). The edge weights of the

overlay network are set to the approximate 𝑘-hop distances between nodes in 𝑉 ′. The remaining

approximate distances computed in this step are also stored and used in the end to determine a

(1+Y)-approximation of the distance from 𝑠 for each node. Furthermore, the construction guarantees

that an implicit (1 + Y)-approximate shortest path tree can be obtained if every node 𝑣 chooses the

neighbor 𝑢 as its parent that minimizes the sum of the approximate distance from 𝑠 to 𝑢 and the

weight of the arc from 𝑢 to 𝑣 .

In bidirected graphs, we simply run the rst step of the algorithm with Y ′ = Y_10

on 𝐺+, thegraph in which every bidirected edge 𝑒 with edge weights𝑤+𝑒 and𝑤−𝑒 for the forward and backward

arc, respectively, is replaced by an undirected edge of weight𝑤+𝑒 (recall that𝑤+𝑒 ≥ 𝑤−𝑒 ). Observethat for every path, the total weight in𝐺+ exceeds its total weight in𝐺 by a factor of at most _. Let

𝜋 be a path consisting of

√𝑛 arcs. The overlay network construction guarantees that 𝜋 contains a

node 𝑢 such that the ℎ-hop distance from 𝑢 to 𝑉 ′ in 𝐺+ is at most Y ′ times the weight of 𝜋 in 𝐺+.As the ℎ-hop distance from 𝑢 to 𝑉 ′ in 𝐺 is at most the ℎ-hop distance from 𝑢 to 𝑉 ′ in 𝐺+ and the

weight of 𝜋 in 𝐺+ exceeds its weight in 𝐺 by a factor of at most _, we get the following: the ℎ-hop

distance from𝑢 to𝑉 ′ in𝐺+ is at most Y ′_ = Y10

times the weight of 𝜋 in𝐺 . We can now proceed with

the second step of the overlay network construction, which is based on a distance approximation

technique that pipelines multiple (single-source) Moore-Bellman-Ford instances that also works in

directed graphs and thus does not need to be adapted. In this way, we can construct an overlay

network with 𝑂 (_√𝑛Y−1 log ‖𝑤 ‖∞) nodes in 𝑂 (_

√𝑛Y−1 log2 ‖𝑤 ‖∞ + 𝐷) rounds. Observe that the

derived overlay network is guaranteed to have ratio at most (1 + Y)_ = 𝑂 (_) between forward and

backward arcs. We can thus apply the previous simulation approach as follows.

Corollary 4.7. Given any 0 < Y ≤ 1, in the Broadcast CONGEST model a (1 + Y)-approximationto the single-source shortest path problem in bidirected graphs with positive polynomially boundedinteger arc weights can be computed in 𝑂 (𝑐 (

√𝑛 + 𝐷) log ‖𝑤 ‖∞_2Y−3) = 𝑂 (𝑐 (

√𝑛 + 𝐷)_2Y−3) rounds

with probability at least 1 − 𝑛−𝑐 for every 𝑐 ≥ 1.

Proof. We apply Theorem 4.6. Subsequently, we use Lemma 4.4 to simulate rounds of the Broadcast

Congested Clique on the overlay network, taking𝑂 ( |𝑉 ′ | +𝐷) = 𝑂 (√𝑛Y−1 log ‖𝑤 ‖∞ +𝐷) rounds per

simulated round. Using Theroem 4.3 for Y ′ = Y3, we obtain a (1 + Y ′)-approximation to the distances

from 𝑠 to each 𝑣 ∈ 𝑉 ′ in the overlay network. After broadcasting these distances using Lemma 4.4

again, all nodes can locally compute a (1 + Y)-approximation of their distance from 𝑠 . The total

running time is𝑂 ((√𝑛Y−1 log ‖𝑤 ‖∞+𝐷)𝑐_2Y−2+

√𝑛Y−1 log2 ‖𝑤 ‖∞) = 𝑂 (𝑐 (

√𝑛+𝐷)_2Y−3 log2 ‖𝑤 ‖∞),

which is 𝑂 (𝑐 (√𝑛 + 𝐷)_2Y−3) since ‖𝑤 ‖∞ is polynomially bounded. �


negative arc weights at the cost of mild overheads.

35

If a Monte-Carlo guarantee suces, then we can get a better dependence on Y by using the

overlay network construction of Nanongkai [Nan14].

Theorem 4.8 ([Nan14]). Given any 0 < Y ≤ 1 and any 1 ≤ ℎ ≤ 𝑛, in the Broadcast CONGEST modelan overlay network 𝐺 ′ = (𝑉 ′, 𝐸 ′) of a bidirected graph 𝐺 with positive integer arc weights that hasarc weights𝑤 ′ : 𝐸 ′→ {1, . . . ,𝑂 (𝑛‖𝑤 ‖∞)} together with some additional information for every nodesatisfying the following properties with high probability can be computed in𝑂 ((ℎ

Y+ 𝑛

ℎ) log ‖𝑤 ‖∞ +𝐷)

rounds:

• |𝑉 ′ | = 𝑂 (𝑛ℎ) and 𝑠 ∈ 𝑉 ′.

• For every node 𝑢 ∈ 𝑉 , as soon as 𝑢 receives, for every 𝑣 ∈ 𝑉 ′, a (1 + Y3)-approximation of the

distance from 𝑠 to 𝑣 in 𝐺 ′, it can infer a (1 + Y)-approximation of the distance from 𝑠 to 𝑢 in𝐺 without any additional communication. This approximate distances are correct with highprobability.

Note that this overlay network construction readily works on directed graphs. By setting

ℎ = _√𝑛Y−1/2, we obtain an overlay network with |𝑉 ′ | = 𝑂 (

√Y𝑛_−1) that can be constructed in

𝑂 (_√𝑛Y−3/2 log ‖𝑤 ‖∞ + 𝐷) rounds. Using the same simulation approach as before, we get the

following result.

Corollary 4.9. Given any 0 < Y ≤ 1, in the Broadcast CONGEST model a (1 + Y)-approximation tothe single-source shortest path problem in bidirected graphs with positive polynomially bounded integerarc weights that is correct with high probability can be computed in 𝑂 ((

√𝑛 + 𝐷)_Y−3/2 log ‖𝑤 ‖∞) =

𝑂 ((√𝑛 + 𝐷)_Y−3/2) rounds.

If approximating the distance between two node suces, then both of these nodes can be made

part of the overlay network and we only need to compute the approximate distance between these

two nodes on the overlay network.

Theorem 4.10 (Implicit in [HKN16]). Given any 0 < Y ≤ 1 and two nodes 𝑠 and 𝑡 , in the BroadcastCONGEST model an overlay network 𝐺 ′ = (𝑉 ′, 𝐸 ′) of an undirected graph 𝐺 with positive integeredge weights that has edge weights𝑤 ′ : 𝐸 ′→ {1, . . . ,𝑂 (𝑛‖𝑤 ‖∞)} satisfying the following propertiescan be computed deterministically in 𝑂 (

√𝑛Y−1 log2 ‖𝑤 ‖∞ + 𝐷) rounds:

• |𝑉 ′ | = 𝑂 (√𝑛Y−1 log ‖𝑤 ‖∞) and {𝑠, 𝑡} ⊆ 𝑉 ′.

• A (1 + Y3)-approximation of the distance from 𝑠 to 𝑡 in 𝐺 ′ is a (1 + Y)-approximation of the

distance from 𝑠 to 𝑡 in 𝐺

As we can compute the 𝑠-𝑡 distance on the overlay network by running our transshipment

algorithm with demand −1 on 𝑠 and demand 1 on 𝑡 (and demand 0 on all other nodes), we obtain a

deterministic algorithm for approximating the 𝑠-𝑡 distance.

Corollary 4.11. Given any 0 < Y ≤ 1 and two nodes 𝑠 and 𝑡 , in the Broadcast CONGEST model a(1 + Y)-approximation to the distance from 𝑠 to 𝑡 in a bidirected graph with positive polynomiallybounded integer arc weights can be computed deterministically in 𝑂 ((

√𝑛 + 𝐷) log ‖𝑤 ‖∞_2Y−3) =

𝑂 ((√𝑛 + 𝐷)_2Y−3) rounds.

36

4.3 Multipass Streaming

Model In the Streaming model the input graph is presented to the algorithm edge by edge as a

“stream” without repetitions and the goal is to design algorithms that use as little space as possible.

Space is measured in memory words, where we assume that each word consists of 𝑂 (log𝑛) bits.The weights are assumed to be polynomially bounded in 𝑛, so that they can be encoded using

𝑂 (log𝑛) bits. In the Multipass Streaming model, the algorithm is allowed to make several such

passes over the input stream and the goal is to design algorithms that need only a small number of

passes (and again little space). For graph algorithms, the usual assumption is that the edges of the

input graph are presented to the algorithm in arbitrary order. For bidirected graphs, we assume

that both arcs of each edge are presented consecutively in the stream.

Implementing Algorithm gradient_transshipment In the following we explain how to

implement our gradient descent algorithm for approximating the shortest transshipment.

Theorem 4.12. Given any 0 < Y ≤ 1, in the Multipass Streaming model a (1 + Y)-approximate dualsolution to the transshipment problem in bidirected graphs with positive arc weights can be computeddeterministically in 𝑂 (_2Y−2) passes with a space of 𝑂 (𝑛 log𝑛) words.

Note that we will not explicitly compute a primal solution 𝑥 ∈ R2𝑚 because storing this solution

might take too much space, i.e., we will not implement Lines 14 and 17 of Algorithm 1.

Proof of Theorem 4.12. We will use 𝑂 (𝑛 log𝑛) space to store the following information.

• An 𝑛-dimensional vector 𝑏 for the input demands.

• An 𝑂 (log𝑛)-spanner 𝑆 of 𝐺− = (𝑉 , 𝐸,𝑤−) of size 𝑂 (𝑛 log𝑛).

• An 𝑛-dimensional vector 𝜋 for the dual solution maintained by the gradient descent algorithm.

• An 𝑂 (𝑛 log𝑛)-dimensional vector 𝑓 for the primal solution on the spanner.

• An 𝑛-dimensional vector ℎ for the dual solution on the spanner.

• Scalars𝑚, 𝑛, Y, _, 𝛼 , Y ′, 𝛽 , and 𝛿 .

Additionally, we use “temporary space” of size 𝑂 (𝑛 log𝑛) for implementing individual lines of the

algorithm. At any time, the total space will be 𝑂 (𝑛 log𝑛).We will show that each iteration of Algorithm 1 can be implemented in a single pass and

that the remainder of the algorithm can be implemented in 𝑂 (log𝑛) passes. By Theorem 2.15,

which bounds the number of iterations of Algorithm 1, the bound on the number of passes is then

𝑂 (_2(Y−2 + log𝑛 + log _) log3 𝑛). If _ is as large as polynomial in 𝑛 we can solve the problem in a

polynomial number of passes, i.e., 𝑂 (_) passes, by using any of the polynomial-time algorithms.

Therefore, this simplies to 𝑂 (_2Y−2).At the beginning of the algorithm, we rst compute and store 𝑛,𝑚, and _ in a single pass. We

then construct and store an 𝛼-spanner 𝑆 of𝐺− = (𝑉 , 𝐸,𝑤−), where 𝛼 = 2dlog𝑛e −1, with𝑂 (𝑛 log𝑛)edges in𝑂 (log𝑛) passes with𝑂 (𝑛 log𝑛) temporary space using the algorithm of Corollary 5.2. The

spanner can be used to implement the oracle calls in Lines 1 and 8 using only internal computation

as on the spanner the 𝑂 (𝑛 log𝑛)-dimensional primal solution and the 𝑛-dimensional dual solution

can be determined in 𝑂 (𝑛 log𝑛) space by enumerating all potential solutions and checking for

37

optimality.16

It remains to describe how to implement Lines 5 and 11, where the algorithm needs to

compute (𝑊 −1𝐴𝑇𝜋)max, and Line 7, where the algorithm computes ∇Φ𝛽 (𝜋). It is clear that the restof the algorithm can be implemented by performing only local computation.

To implement Lines 5, 7, and 11, rst observe that, using the denition 𝑠𝑒 (𝜋) := 𝜋𝑣−𝜋𝑢𝑤𝑒

for every

arc 𝑒 ∈ 𝐸, both ∑𝑒∈𝐸 𝑒

𝛽𝑠𝑒 (𝜋 )and (𝑊 −1𝐴𝑇𝜋)max = max{𝑠𝑒 (𝜋) : 𝑒 = (𝑢, 𝑣) ∈ 𝐸} can be computed

in a single pass with 𝑂 (1) temporary space as summation and maximum are associative and

commutative operators: Before the pass, we create temporary variables 𝔰 and 𝔪, both initialized to

0. During the pass, every time we read an arc 𝑒 of weight𝑤𝑒 from the stream, we rst compute 𝑠𝑒 (𝜋)and then update 𝔰 to 𝔰 + 𝑒𝛽𝑠𝑒 (𝜋 ) and 𝔪 to max{𝔪, 𝑠𝑒 (𝜋)}. After the pass, we have 𝔰 =

∑𝑒∈𝐸 𝑒

𝛽𝑠𝑒 (𝜋 )

and 𝔪 = (𝑊 −1𝐴𝑇𝜋)max. Since each component of ∇Φ𝛽 (𝜋) is given by

∇Φ𝛽 (𝜋)𝑣 = (𝐴𝑊 −1∇ lse𝛽(𝑊 −1𝐴𝑇𝜋

))𝑣

=∑︁

𝑒=(𝑢,𝑣) ∈𝐸


𝑤𝑒 ·∑

𝑒′∈𝐸 𝑒𝛽𝑠𝑒′ (𝜋 )−

∑︁𝑒=(𝑣,𝑢) ∈𝐸


𝑤𝑒 ·∑

𝑒′∈𝐸 𝑒𝛽𝑠𝑒′ (𝜋 ),

the same idea can be used to compute ∇Φ𝛽 (𝜋) in a single pass with 𝑂 (1) temporary space, once

𝔰 =∑

𝑒∈𝐸 𝑒𝛽𝑠𝑒 (𝜋 )

is known. �

Implementing Algorithm primal_tree In the following we explain how to implement our

sampling-based algorithm to compute a primal tree solution for the approximate transshipment

problem.

Theorem 4.13. In the Multipass Streaming model one can, with probability at least 1

4, compute

in 𝑂 (_2Y−2) passes with a space of 𝑂 (_𝑛 log2 𝑛) words a (1 + Y)-approximate primal-dual pair ofsolutions to the transshipment problem given a parameter 0 < Y ≤ 1, a bidirected graph with positivepolynomially bounded integer arc weights, and a demand vector 𝑏 such that 𝑏𝑣 ∈ {0, 1} for all𝑣 ∈ 𝑉 \ {𝑠} and with the additional property that the primal solution has non-zero ow only on thearcs of a tree.

Proof. Again, we do not want to explicitly store the primal solution 𝑥 computed at the end of

Algorithm gradient_transshipment to save space. We will be able to retrieve 𝑥𝑒 for every arc 𝑒

“on the y” upon reading 𝑒 from the stream by modifying Algorithm gradient_transshipment to

additionally return 𝑓 ′, 𝜋 , 𝑧, and 𝛽 .We will use space 𝑂 (𝑛_ log𝑛 log ‖𝑤 ‖∞e), which is 𝑂 (𝑛_ log2 𝑛) since ‖𝑤 ‖∞ is polynomially

bounded, to store the following information:

• An arc set

⋃𝑣,𝑘 𝐹𝑣,𝑘 of size 𝑂 (_ log𝑛) for every 𝑣 ∈ 𝑉 and every 𝑘 ∈ {1, . . . , dlog ‖𝑤 ‖∞e}.

• An 𝑂 (log𝑛)-spanner 𝑆 of 𝐺 of size 𝑂 (𝑛 log𝑛).

• An (𝑛 − 1)-dimensional vector 𝑡 for the primal solution on 𝑆 .

• An 𝑛-dimensional vector 𝑦 ′ for the dual solution on 𝑆 .

• A scalar to store dlog ‖𝑤 ‖∞e16Alternatively, we could internally run one of the classic linear-space algorithms for the transshipment problem.

38

• An𝑂 (𝑛 log𝑛)-dimensional vector 𝑓 ′ as computed by the call to the transshipment algorithm.

• An 𝑛-dimensional vector 𝜋 as computed by the call to the transshipment algorithm.

• Scalars 𝛽 and 𝑧 as computed by the call to the transshipment algorithm.

First, observe that Lines 14 to 19 of Algorithm primal_tree can be performed without perform-

ing additional passes. Furthermore, by Theorem 4.12, the call to Algorithm gradient_transshipment

in Line 1 takes 𝑂 (_2(Y−2 + log𝑛 + log _) log3 𝑛) passes and space 𝑂 (𝑛 log𝑛). As before, we canimplement Line 13 by constructing and storing an 𝛼-spanner 𝑆 , where 𝛼 = 𝑂 (log𝑛), with𝑂 (𝑛 log𝑛)edges in 𝑂 (log𝑛) passes with 𝑂 (𝑛 log𝑛) temporary space using the algorithm of Corollary 5.2.

It remains to explain the implementation of the sampling process in Lines 2 to 12. Before the

sampling starts, we compute dlog ‖𝑤 ‖∞e in a single pass. We rst explain how to compute 𝑥𝑒 (the

primal solution for 𝑒 in the call to Algorithm gradient_transshipment in Line 1) upon reading 𝑒

from the stream. Remember that

𝑥𝑒 = (𝑊 −1∇ lse𝛽(𝑊 −1𝐴𝑇𝜋

))𝑒 =


𝑤𝑒

∑𝑒′∈𝐸 𝑒𝛽𝑠𝑒′ (𝜋 )

,

and 𝑥𝑒 =�̃�𝑒+𝑓 ′𝑒𝑧

. We can therefore rst compute 𝔰 =∑

𝑒∈𝐸 𝑒𝛽𝑠𝑒 (𝜋 )

in an initial pass and later on

instantly compute 𝑥𝑒 using the formulas above.

We will perform the sampling in parallel for each 𝑣 ∈ 𝑉 and for each 𝑘 ∈ {1, . . . dlog ‖𝑤 ‖∞e}in 𝑂 (log(𝑛‖𝑤 ‖∞)) passes, which is 𝑂 (log𝑛) since ‖𝑤 ‖ is polynomially bounded. Every time we

read an arc 𝑒 from the stream, we compute the unique 𝑘 for which𝑤𝑒 ∈ [2𝑘−1, 2𝑘 ). We spend the

rst pass to compute


𝑥𝑒 for each 𝑣 ∈ 𝑉 and each 𝑘 ∈ {1, . . . dlog ‖𝑤 ‖∞e}. We spend the

remaining dlog(4𝑛dlog ‖𝑤 ‖∞e)e passes to repeatedly sample a set 𝐹𝑣,𝑘 for each 𝑣 ∈ 𝑉 and each

𝑘 ∈ {1, . . . dlog ‖𝑤 ‖∞e}. Note that the sampling probability 𝑝𝑒 can be computed instantly upon

reading 𝑒 from the stream. �

ImplementingAlgorithm sssp Finally, we explain how to implement our adaptive-demands

algorithm for computing an approximate single-source shortest path tree.

Theorem 4.14. Given any 0 < Y ≤ 1, in the Multipass Streaming model a (1 + Y)-approximationto the single-source shortest path problem in bidirected graphs with positive polynomially boundedinteger arc weights can be computed in 𝑂 (𝑐_2Y−2) passes with probability 1 − 𝑛−𝑐 for every 𝑐 ≥ 1 andwith a space of 𝑂 (𝑛_ log2 𝑛)).

Proof. To implement Algorithm sssp, we will, in addition to the streaming implementation of Algo-

rithm primal_tree, use space𝑂 (𝑛 log𝑛 log ‖𝑤 ‖∞), which is𝑂 (𝑛 log2 𝑛) since ‖𝑤 ‖∞ is polynomially

bounded, to store the following information:

• An arc set 𝐸 ′ of size 𝑂 (𝑛 log ‖𝑤 ‖∞).

• A tree 𝑇 consisting of 𝑛 − 1 arcs.

• An 𝑛-dimensional vector 𝑦.

• An 𝑛-dimensional vector 𝑏.

39

By Theorem 3.13, the number of iterations of the while-loop is 𝑂 (𝑐 log ‖𝑤 ‖∞) with probability

1 − 𝑛−𝑐 . Observe that Line 3, which calls the algorithm for computing a primal tree solution, is

the only place in the algorithm where reading from the stream is necessary. Thus, the number

of passes for implementing Algorithm sssp exceeds those for computing a primal tree solution

(Theorem 4.13) by a factor of 𝑂 (𝑐 log ‖𝑤 ‖∞) with probability 1 − 𝑛−𝑐 . Furthermore, the additional

space we need is dominated by the size of 𝐸 ′ which is 𝑂 (𝑛 log ‖𝑤 ‖∞) (as in each iteration a tree of

size 𝑛 − 1 is added to 𝐸 ′). �


negative arc weights at the cost of mild overheads. Note that in the literature the approximate

SSSP problem has also been studied in a relaxation of the Multipass Streaming model that does

not require the weights to be polynomially bounded by increasing the word size to Θ(log𝑛 +log ‖𝑤 ‖∞) [EN16]. By a reduction of Klein and Subramanian [KS97], that has been adapted to

the Multipass Streaming model by Elkin and Neiman [EN16], we still may assume without loss

of generality that our approximate SSSP algorithm is called on a graph with largest arc weight

‖𝑤 ‖∞ = 𝑂 (𝑛Y−1). Furthermore, we may assume that Y−1 = 𝑂 (𝑛) as our running time anyway

depends on Y−1 and thus in the regime Y−1 = Ω(𝑛) we might as well compute a solution with

the Bellman-Ford algorithm, which takes 𝑂 (𝑛) passes and 𝑂 (𝑛) space. In this way we arrive at

polynomially bounded arc weights again.

5 Deterministic Spanner Computation in Congested Clique andMultipass Streaming Model

For 𝑘 ∈ Z>0, a simple and elegant randomized algorithm computing a (2𝑘 − 1)-spanner with𝑂 (𝑘𝑛1+1/𝑘 ) edges in expectation was given by Baswana and Sen [BS07]. For the sake of completeness,

we restate it here.

1. Initially, each node is a singleton cluster : 𝑅1 := {{𝑣} | 𝑣 ∈ 𝑉 }.

2. For 𝑖 = 1, . . . , 𝑘 − 1 do:

(a) Each cluster from 𝑅𝑖 is marked independently with probability 𝑛−1/𝑘 . 𝑅𝑖+1 is dened to

be the set of clusters marked in phase 𝑖 .

(b) If 𝑣 is a node in an unmarked cluster:

i. Dene 𝑄𝑣 to be the set of edges that consists of the lightest edge from 𝑣 to each

cluster in 𝑅𝑖 it is adjacent to.

ii. If 𝑣 is not adjacent to any marked cluster, all edges in 𝑄𝑣 are added to the spanner.

iii. Otherwise, let 𝑢 be the closest neighbor of 𝑣 in a marked cluster. In this case, 𝑣 adds

to the spanner the edge {𝑣,𝑢} and all edges {𝑣,𝑤} ∈ 𝑄𝑣 with𝑤 (𝑣,𝑤) < 𝑤 (𝑣,𝑢) (breakties by neighbor identiers). Also, let 𝑋 be the cluster of 𝑢. Then 𝑋 := 𝑋 ∪ {𝑣}, i.e.,𝑣 joins the cluster of 𝑢.

3. Each 𝑣 ∈ 𝑉 adds, for each 𝑋 ∈ 𝑅𝑘 it is adjacent to, the lightest edge connecting it to 𝑋 to the

spanner.

40

It is easy to see that the expected number of edges that the algorithm selects into the spanner is

𝑂 (𝑘𝑛1+1/𝑘 ): In each iteration, each node 𝑣 sorts its incident clusters in order of ascending weight

of the lightest edge to them and elects for each cluster, up to the rst sampled one, the respective

lightest edge into the spanner. Because this order is independent of the randomness used in this

iteration, 𝑣 selects 𝑂 (𝑛1/𝑘 ) edges in expectation and 𝑂 (𝑛1/𝑘 log𝑛) edges with high probability.17

The same bound applies to the nal step, as |𝑅𝑘 | ∈ 𝑂 (𝑛1/𝑘 ) in expectation and |𝑅𝑘 | ∈ 𝑂 (𝑛1/𝑘 log𝑛)with high probability. Moreover, this observation provides a straightforward derandomization of

the algorithm applicable in our model: Instead of picking 𝑅𝑖+1 in iteration 𝑖 randomly, we consider

the union 𝐸𝑖 over all nodes 𝑣 of the lightest 𝑂 (𝑛1/𝑘 log𝑛) edges in 𝑄𝑣 . By a union bound, with

high probability we can select 𝑅𝑖+1 such that (i) |𝑅𝑖+1 | ≤ 𝑛−1/𝑘 |𝑅𝑖 | and (ii) each node selects only

𝑂 (𝑛1/𝑘 log𝑛) edges into the spanner in this iteration. In particular, such a choice must exist, and it

can be computed from 𝑅𝑖 and 𝐸𝑖 alone. With this argument we deterministically obtain a spanner

of size 𝑂 (𝑘𝑛1+1/𝑘 log𝑛).

1. Initially, each node is a singleton cluster : 𝑅1 := {{𝑣} | 𝑣 ∈ 𝑉 }.

2. For 𝑖 = 1, . . . , 𝑘 − 1 do for each node 𝑣 :

(a) Dene 𝑄𝑣 to be the set of edges that consists of the lightest edge from 𝑣 to each cluster

in 𝑅𝑖 it is adjacent to.

(b) Broadcast the set 𝑄 ′𝑣 of the lightest 𝑂 (𝑛1/𝑘 log𝑛) edges in 𝑄𝑣 .

(c) For 𝑤 ∈ 𝑉 , denote by 𝑋𝑤 ∈ 𝑅𝑖 the cluster so that 𝑣 ∈ 𝑋𝑤 . Locally compute 𝑅𝑖+1 ⊆ 𝑅𝑖such that (i) |𝑅𝑖+1 | ≤ 𝑛−1/𝑘 |𝑅𝑖 | and (ii) for each𝑤 ∈ 𝑉 for which 𝑄 ′𝑣 ≠ 𝑄𝑣 , it holds that

𝑋𝑤 ∈ 𝑅𝑖+1 or 𝑄 ′𝑣 contains an edge connecting to some 𝑋 ∈ 𝑅𝑖+1.(d) Update clusters and add edges to the spanner as the original algorithm would, but for

the computed choice of 𝑅𝑖+1.

3. Each 𝑣 ∈ 𝑉 adds, for each 𝑋 ∈ 𝑅𝑘 it is adjacent to, the lightest edge connecting it to 𝑋 .

A slightly stronger bound of𝑂 (𝑘𝑛1+1/𝑘 ) on the size of the spanner, matching the expected value

from above up to constant factors, can be obtained in this framework by using the technique of

deterministically nding early hitting sets developed by Roditty, Thorup, and Zwick [RTZ05].

Note that, as argued above, the selection of 𝑅𝑖+1 in Step 2(c) is always possible and can be done

deterministically, provided that 𝑅𝑖 is known. Because 𝑅1 is simply the set of singletons and each

node computes 𝑅𝑖+1 from 𝑅𝑖 and the 𝑄 ′𝑣 , this holds by induction. We arrive at the following result.

Corollary 5.1 (of [BS07] and [RTZ05]). For 𝑘 ∈ Z>0, in the broadcast congested clique a (2𝑘 − 1)-spanner of size 𝑂 (𝑘𝑛1+1/𝑘 ) can be deterministically computed and made known to all nodes in𝑂 (𝑘𝑛1/𝑘 log𝑛) rounds.

Proof. The bound of 𝛼 = 2𝑘 − 1 follows from the analysis in [BS07], which does not depend on

the choice of the 𝑅𝑖 . For the round complexity, observe that all computations can be performed

locally based on knowing 𝑄 ′𝑣 for all nodes 𝑣 ∈ 𝑉 in each iteration. As |𝑄 ′𝑣 | ∈ 𝑂 (𝑛1/𝑘 log𝑛) for eachiteration and each 𝑣 , the claim follows. �

17That is, for any xed constant choice of 𝑐 > 0, the number of selected edges is bounded by 𝑂 (𝑛1/𝑘 log𝑛) with

probability at least 1 − 1/𝑛𝑐 .

41

By similar arguments, we can also get an algorithm in the multipass streaming algorithm with

comparable guarantees. We remark that, apart from the property of being deterministic, this was

already stated in [Bas08] as a simple consequence of [BS07].

Corollary 5.2 (of [BS07] and [RTZ05]). For 𝑘 ∈ Z>0, in the multipass streaming model a (2𝑘 − 1)-spanner of size 𝑂 (𝑘𝑛1+1/𝑘 ) can be deterministically computed in 𝑘 passes using 𝑂 ((𝑘 + log𝑛)𝑛1+1/𝑘 )space.

References

[ABP18] Amir Abboud, Greg Bodwin, and Seth Pettie. “A Hierarchy of Lower Bounds for

Sublinear Additive Spanners”. In: SIAM Journal on Computing 47.6 (2018), pp. 2203–

2236. doi: 10.1137/16M1105815. arXiv: 1607.07497 (cit. on p. 6).

[AMO93] Ravindra K. Ahuja, Thomas L. Magnanti, and James B. Orlin. Network ows - theory,algorithms and applications. Prentice Hall, 1993. isbn: 978-0-13-617549-0 (cit. on p. 21).

[Bar98] Yair Bartal. “On Approximating Arbitrary Metrices by Tree Metrics”. In: Proc. ofthe ACM Symposium on the Theory of Computing (STOC). 1998, pp. 161–168. doi:10.1145/276698.276725 (cit. on p. 4).

[Bas08] Surender Baswana. “Streaming algorithm for graph spanners - single pass and constant

processing time per edge”. In: Information Processing Letters 106.3 (2008), pp. 110–114.doi: 10.1016/j.ipl.2007.11.001 (cit. on pp. 7, 42).

[Bel58] Richard Bellman. “On a Routing Problem”. In: Quarterly of Applied Mathematics 16.1(1958), pp. 87–90 (cit. on pp. 5, 6).

[Ber09] Aaron Bernstein. “Fully Dynamic (2+Y) Approximate All-Pairs Shortest Paths with Fast

Query and Close to Linear Update Time”. In: Proc. of the IEEE Symposium on Foundationsof Computer Science (FOCS). 2009, pp. 693–702. doi: 10.1109/FOCS.2009.16 (cit. onp. 6).

[Ble+16] Guy E. Blelloch, Yan Gu, Yihan Sun, and Kanat Tangwongsan. “Parallel Shortest Paths

Using Radius Stepping”. In: Proc. of the ACM Symposium on Parallelism in Algorithmsand Architectures (SPAA). 2016, pp. 443–454. doi: 10.1145/2935764.2935765. arXiv:1602.03881 (cit. on p. 3).

[BS07] Surender Baswana and Sandeep Sen. “A Simple and Linear Time Randomized Algo-

rithm for Computing Sparse Spanners in Weighted Graphs”. In: Random Structures &Algorithms 30.4 (2007). Announced at ICALP ’03, pp. 532–563. doi: 10.1002/rsa.20130(cit. on pp. 10, 31, 40–42).

[BTZ98] Gerth Stølting Brodal, Jesper Larsson Trä, and Christos D. Zaroliagis. “A Parallel

Priority Queue with Constant Time Operations”. In: Journal of Parallel and DistributedComputing 49.1 (1998). Announced at IPPS ’97, pp. 4–21. doi: 10.1006/jpdc.1998.1425(cit. on p. 3).

[Cen+19a] Keren Censor-Hillel, Michal Dory, Janne H. Korhonen, and Dean Leitersdorf. “Fast

Approximate Shortest Paths in the Congested Clique”. In: Proc. of the ACM Symposiumon Principles of Distributed Computing (PODC). 2019, pp. 74–83. doi: 10.1145/3293611.3331633. arXiv: 1903.05956 (cit. on p. 7).

42

https://doi.org/10.1137/16M1105815

http://arxiv.org/abs/1607.07497

https://doi.org/10.1145/276698.276725

https://doi.org/10.1016/j.ipl.2007.11.001

https://doi.org/10.1109/FOCS.2009.16

https://doi.org/10.1145/2935764.2935765


https://doi.org/10.1002/rsa.20130

https://doi.org/10.1006/jpdc.1998.1425

https://doi.org/10.1145/3293611.3331633

https://doi.org/10.1145/3293611.3331633


[Cen+19b] Keren Censor-Hillel, Petteri Kaski, Janne H. Korhonen, Christoph Lenzen, Ami Paz, and

Jukka Suomela. “Algebraic methods in the congested clique”. In: Distributed Computing32.6 (2019). Announced at PODC ’15, pp. 461–478. doi: 10.1007/s00446-016-0270-2.

arXiv: 1503.04963 (cit. on pp. 3, 7).

[Chr+11] Paul Christiano, Jonathan A. Kelner, Aleksander Mądry, Daniel A. Spielman, and

Shang-Hua Teng. “Electrical ows, Laplacian systems, and faster approximation of

maximum ow in undirected graphs”. In: Proc. of the ACM Symposium on Theoryof Computing (STOC). 2011, pp. 273–282. doi: 10.1145/1993636.1993674. arXiv:1010.2921 (cit. on p. 3).

[Coh+17] Michael B. Cohen, Aleksander Madry, Piotr Sankowski, and Adrian Vladu. “Negative-

Weight Shortest Paths and Unit Capacity Minimum Cost Flow in �̃� (𝑚10/7log𝑊 )

Time”. In: Proc. of the ACM-SIAM Symposium on Discrete Algorithms (SODA). 2017. doi:10.1137/1.9781611974782.48. arXiv: 1605.01717 (cit. on p. 3).

[Coh00] Edith Cohen. “Polylog-Time and Near-Linear Work Approximation Scheme for Undi-

rected Shortest Paths”. In: Journal of the ACM 47.1 (2000). Announced at STOC ’94,

pp. 132–166. doi: 10.1145/331605.331610 (cit. on pp. 3, 6).

[Coh97] Edith Cohen. “Using Selective Path-Doubling for Parallel Shortest-Path Computations”.

In: Journal of Algorithms 22.1 (1997). Announced at ISTCS ’93, pp. 30–56. doi: 10.

1006/jagm.1996.0813 (cit. on p. 3).

[Das+12] Atish Das Sarma, Stephan Holzer, Liah Kor, Amos Korman, Danupon Nanongkai,

Gopal Pandurangan, David Peleg, and Roger Wattenhofer. “Distributed Verication

and Hardness of Distributed Approximation”. In: SIAM Journal on Computing 41.5

(2012). Announced at STOC ’11, pp. 1235–1265. doi: 10 . 1137 / 11085178X. arXiv:

1011.3049 (cit. on pp. 4, 6, 34).

[DS08] Samuel I. Daitch and Daniel A. Spielman. “Faster approximate lossy generalized ow

via interior point algorithms”. In: Proc. of the ACM Symposium on Theory of Computing(STOC). 2008, pp. 451–460. doi: 10.1145/1374376.1374441. arXiv: 0803.0988 (cit. onpp. 3, 5).

[EK72] Jack Edmonds and Richard M. Karp. “Theoretical Improvements in Algorithmic E-

ciency for Network Flow Problems”. In: Journal of the ACM 19.2 (1972), pp. 248–264.

doi: 10.1145/321694.321699 (cit. on p. 5).

[Elk+08] Michael Elkin, Yuval Emek, Daniel A. Spielman, and Shang-Hua Teng. “Lower-Stretch

Spanning Trees”. In: SIAM Journal on Computing 38.2 (2008). Announced at STOC ’05,

pp. 608–628. doi: 10.1137/050641661. arXiv: cs/0411064 (cit. on p. 4).

[Elk06] Michael Elkin. “An Unconditional Lower Bound on the Time-Approximation Trade-o

for the Distributed Minimum Spanning Tree Problem”. In: SIAM Journal on Computing36.2 (2006). Announced at STOC ’04, pp. 433–456. doi: 10.1137/S0097539704441058

(cit. on pp. 3, 4, 6).

[Elk11] Michael Elkin. “Streaming and Fully Dynamic Centralized Algorithms for Constructing

and Maintaining Sparse Spanners”. In: ACM Transactions on Algorithms 7.2 (2011).

Announced at ICALP ’07, 20:1–20:17. doi: 10.1145/1921659.1921666 (cit. on p. 7).

43

https://doi.org/10.1007/s00446-016-0270-2


https://doi.org/10.1145/1993636.1993674


https://doi.org/10.1137/1.9781611974782.48


https://doi.org/10.1145/331605.331610

https://doi.org/10.1006/jagm.1996.0813


https://doi.org/10.1137/11085178X


https://doi.org/10.1145/1374376.1374441


https://doi.org/10.1145/321694.321699

https://doi.org/10.1137/050641661

http://arxiv.org/abs/cs/0411064

https://doi.org/10.1137/S0097539704441058

https://doi.org/10.1145/1921659.1921666

[Elk20] Michael Elkin. “Distributed Exact Shortest Paths in Sublinear Time”. In: Journal of theACM 67.3 (2020). Announced at STOC ’17, 15:1–15:36. doi: 10.1145/3387161. arXiv:

1703.01939 (cit. on pp. 3, 6).

[EN16] Michael Elkin and Ofer Neiman. “Hopsets with Constant Hopbound, and Applications

to Approximate Shortest Paths”. In: Proc. of the Symposium on Foundations of ComputerScience (FOCS). 2016, pp. 128–137. doi: 10.1109/FOCS.2016.22. arXiv: 1605.04538(cit. on pp. 3, 5–7, 33, 40).

[EN19] Michael Elkin and Ofer Neiman. “Linear-Size Hopsets with Small Hopbound, and

Constant-Hopbound Hopsets in RNC”. In: Proc. of the ACM Symposium on Parallelismin Algorithms and Architectures (SPAA). 2019, pp. 333–341. doi: 10.1145/3323165.3323177 (cit. on p. 6).

[EZ06] Michael Elkin and Jian Zhang. “Ecient algorithms for constructing (1+Y, 𝛽)-spannersin the distributed and streamingmodels”. In:Distributed Computing 18.5 (2006), pp. 375–385. doi: 10.1007/s00446-005-0147-2 (cit. on p. 7).

[Fei+05] Joan Feigenbaum, Sampath Kannan, AndrewMcGregor, Siddharth Suri, and Jian Zhang.

“On graph problems in a semi-streaming model”. In: Theoretical Computer Science 348.2-3 (2005). Announced at ICALP ’04, pp. 207–216. doi: 10.1016/j.tcs.2005.09.013

(cit. on p. 7).

[Fei+08] Joan Feigenbaum, Sampath Kannan, AndrewMcGregor, Siddharth Suri, and Jian Zhang.

“GraphDistances in the Data-StreamModel”. In: SIAM Journal on Computing 38.5 (2008).Announced at SODA ’05, pp. 1709–1727. doi: 10.1137/070683155 (cit. on p. 7).

[FN18] Sebastian Forster and Danupon Nanongkai. “A Faster Distributed Single-Source Short-

est Paths Algorithm”. In: Proc. of the IEEE Symposium on Foundations of ComputerScience (FOCS). 2018, pp. 686–697. doi: 10.1109/FOCS.2018.00071. arXiv: 1711.01364(cit. on pp. 3, 8).

[For56] Lester R. Ford. Network Flow Theory. Tech. rep. P-923. The RAND Corporation, 1956

(cit. on pp. 5, 6).

[FT87] Michael L. Fredman and Robert Endre Tarjan. “Fibonacci heaps and their uses in

improved network optimization algorithms”. In: Journal of the ACM 34.3 (1987). An-

nounced at FOCS ’84, pp. 596–615. doi: 10.1145/28869.28874 (cit. on p. 3).

[GL18] Mohsen Ghaari and Jason Li. “Improved distributed algorithms for exact shortest

paths”. In: Proc. of the ACM SIGACT Symposium on Theory of Computing (STOC). 2018,pp. 431–444. doi: 10.1145/3188745.3188948. arXiv: 1712.09121 (cit. on p. 3).

[GO16] Venkatesan Guruswami and Krzysztof Onak. “Superlinear Lower Bounds for Multipass

Graph Processing”. In: Algorithmica 76.3 (2016). Announced at CCC ’13, pp. 654–683.

doi: 10.1007/s00453-016-0138-7. arXiv: 1212.6925 (cit. on pp. 5, 7).

[Gol95] Andrew V. Goldberg. “Scaling Algorithms for the Shortest Paths Problem”. In: SIAMJournal on Computing 24.3 (1995). Announced at SODA ’93, pp. 494–504. doi: 10.1137/

S0097539792231179 (cit. on p. 5).

[Han+17] Thomas Dueholm Hansen, Haim Kaplan, Robert E. Tarjan, and Uri Zwick. “Hollow

Heaps”. In: ACM Trans. Algorithms 13.3 (2017). Announced at ICALP ’15, 42:1–42:27.

doi: 10.1145/3093240. arXiv: 1510.06535 (cit. on p. 3).

44

https://doi.org/10.1145/3387161




https://doi.org/10.1145/3323165.3323177

https://doi.org/10.1145/3323165.3323177

https://doi.org/10.1007/s00446-005-0147-2

https://doi.org/10.1016/j.tcs.2005.09.013

https://doi.org/10.1137/070683155



https://doi.org/10.1145/28869.28874

https://doi.org/10.1145/3188745.3188948


https://doi.org/10.1007/s00453-016-0138-7


https://doi.org/10.1137/S0097539792231179

https://doi.org/10.1137/S0097539792231179

https://doi.org/10.1145/3093240


[HKN16] Monika Henzinger, Sebastian Krinninger, and Danupon Nanongkai. “A deterministic

almost-tight distributed algorithm for approximating single-source shortest paths”. In:

Proc. of the ACM SIGACT Symposium on Theory of Computing (STOC). 2016, pp. 489–498.doi: 10.1145/2897518.2897638. arXiv: 1504.07056 (cit. on pp. 3, 4, 6, 7, 34–36).

[HKN18] Monika Henzinger, Sebastian Krinninger, and Danupon Nanongkai. “Decremental

Single-Source Shortest Paths on Undirected Graphs in Near-Linear Total Update Time”.

In: Journal of the ACM 65.6 (2018). Announced at FOCS ’14, 36:1–36:40. doi: 10.1145/

3218657. arXiv: 1512.08148 (cit. on p. 6).

[HP19] Shang-En Huang and Seth Pettie. “Thorup-Zwick emulators are universally optimal

hopsets”. In: Information Processing Letters 142 (2019), pp. 9–13. doi: 10.1016/j.ipl.2018.10.001. arXiv: 1705.00327 (cit. on p. 6).

[Kel+14] Jonathan A. Kelner, Yin Tat Lee, Lorenzo Orecchia, and Aaron Sidford. “An Almost-

Linear-Time Algorithm for Approximate Max Flow in Undirected Graphs, and its

Multicommodity Generalizations”. In: Proc. of the ACM-SIAM Symposium on DiscreteAlgorithms (SODA). 2014, pp. 217–226. doi: 10.1137/1.9781611973402.16. arXiv:1304.2338 (cit. on p. 3).

[KS97] Philip N. Klein and Sairam Subramanian. “A Randomized Parallel Algorithm for Single-

Source Shortest Paths”. In: Journal of Algorithms 25.2 (1997). Announced at STOC ’92,

pp. 205–220. doi: 10.1006/jagm.1997.0888 (cit. on pp. 3, 33, 40).

[KV00] Bernhard Korte and Jens Vygen. Combinatorial Optimization. Springer, 2000 (cit. onp. 5).

[Le 16] François Le Gall. “Further Algebraic Algorithms in the Congested Clique Model and

Applications to Graph-Theoretic Problems”. In: Proc. of the International Symposium onDistributed Computing (DISC). 2016, pp. 57–70. doi: 10.1007/978-3-662-53426-7_5.arXiv: 1608.02674 (cit. on pp. 3, 7).

[Lot+05] Zvi Lotker, Boaz Patt-Shamir, Elan Pavlov, and David Peleg. “Minimum-Weight Span-

ning Tree Construction in 𝑂 (log log𝑛) Communication Rounds”. In: SIAM Journalon Computing 35.1 (2005). Announced at SPAA ’03, pp. 120–131. doi: 10 . 1137 /

S0097539704441848 (cit. on p. 6).

[LP13] Christoph Lenzen and Boaz Patt-Shamir. “Fast Routing Table Construction Using Small

Messages”. In: Proc. of the ACM Symposium on Theory of Computing (STOC). 2013,pp. 381–390. doi: 10.1145/2488608.2488656. arXiv: 1210.5774 (cit. on p. 6).

[LS14] Yin Tat Lee and Aaron Sidford. “Path FindingMethods for Linear Programming: Solving

Linear Programs in �̃� (√𝑟𝑎𝑛𝑘) Iterations and Faster Algorithms for Maximum Flow”.

In: Proc. of the IEEE Symposium on Foundations of Computer Science (FOCS). 2014,pp. 424–433. doi: 10.1109/FOCS.2014.52 (cit. on pp. 3, 5).

[Mąd13] Aleksander Mądry. “Navigating Central Path with Electrical Flows: From Flows to

Matchings, and Back”. In: Proc. of the IEEE Symposium on Foundations of ComputerScience (FOCS). 2013, pp. 253–262. doi: 10.1109/FOCS.2013.35. arXiv: 1307.2205(cit. on p. 3).

45

https://doi.org/10.1145/2897518.2897638


https://doi.org/10.1145/3218657

https://doi.org/10.1145/3218657





https://doi.org/10.1137/1.9781611973402.16



https://doi.org/10.1007/978-3-662-53426-7_5


https://doi.org/10.1137/S0097539704441848

https://doi.org/10.1137/S0097539704441848

https://doi.org/10.1145/2488608.2488656





[Mil+15] Gary L. Miller, Richard Peng, Adrian Vladu, and Shen Chen Xu. “Improved Parallel

Algorithms for Spanners and Hopsets”. In: Proc. of the ACM Symposium on Parallelismin Algorithms and Architectures (SPAA). 2015, pp. 192–201. doi: 10.1145/2755573.2755574. arXiv: 1309.3545 (cit. on pp. 3, 6).

[MS03] Ulrich Meyer and Peter Sanders. “Δ-stepping: a parallelizable shortest path algorithm”.

In: Journal of Algorithms 49.1 (2003). Announced at ESA ’98, pp. 114–152. doi: 10.

1016/S0196-6774(03)00076-2 (cit. on p. 3).

[Nan14] Danupon Nanongkai. “Distributed Approximation Algorithms for Weighted Shortest

Paths”. In: Proc. of the ACM Symposium on Theory of Computing (STOC). 2014, pp. 565–573. doi: 10.1145/2591796.2591850. arXiv: 1403.5171 (cit. on pp. 5, 6, 36).

[Orl93] James B. Orlin. “A Faster Strongly Polynomial Minimum Cost Flow Algorithm”. In:

Operations Research 41.2 (1993). Announced at STOC ’88, pp. 338–350. doi: 10.1287/

opre.41.2.338 (cit. on p. 5).

[Pel00] David Peleg. Distributed Computing: A Locality-Sensitive Approach. Philadelphia, PA:SIAM, 2000 (cit. on pp. 6, 34).

[RTZ05] Liam Roditty, Mikkel Thorup, and Uri Zwick. “Deterministic Constructions of Ap-

proximate Distance Oracles and Spanners”. In: Proc. of the International Colloquiumon Automata, Languages and Programming (ICALP). 2005, pp. 261–272. doi: 10.1007/11523468_22 (cit. on pp. 41, 42).

[Sch03] Alexander Schrijver. Combinatorial Optimization. Springer, 2003 (cit. on p. 5).

[She13] Jonah Sherman. “Nearly Maximum Flows in Nearly Linear Time”. In: Proc. of theIEEE Symposium on Foundations of Computer Science (FOCS). 2013, pp. 263–269. doi:10.1109/FOCS.2013.36. arXiv: 1304.2077 (cit. on pp. 3, 13).

[She17] Jonah Sherman. “Generalized Preconditioning and Network Flow Problems”. In: Proc.of the ACM-SIAM Symposium on Discrete Algorithms (SODA). 2017, pp. 772–780. doi:10.1137/1.9781611974782.49. arXiv: 1606.07425 (cit. on p. 5).

[Spe97] Thomas H. Spencer. “Time-work tradeos for parallel algorithms”. In: Journal of theACM 44.5 (1997). Announced at SODA ’91 and SPAA ’91, pp. 742–778. doi: 10.1145/

265910.265923 (cit. on pp. 3, 6).

[SS99] Hanmao Shi and Thomas H. Spencer. “Time-Work Tradeos of the Single-Source

Shortest Paths Problem”. In: Journal of Algorithms 30.1 (1999), pp. 19–32. doi: 10.1006/jagm.1998.0968 (cit. on p. 3).

[Tho99] Mikkel Thorup. “Undirected Single-Source Shortest Paths with Positive IntegerWeights

in Linear Time”. In: Journal of the ACM 46.3 (1999). Announced at FOCS ’97, pp. 362–

394. doi: 10.1145/316542.316548 (cit. on p. 3).

46

https://doi.org/10.1145/2755573.2755574

https://doi.org/10.1145/2755573.2755574


https://doi.org/10.1016/S0196-6774(03)00076-2

https://doi.org/10.1016/S0196-6774(03)00076-2

https://doi.org/10.1145/2591796.2591850


https://doi.org/10.1287/opre.41.2.338

https://doi.org/10.1287/opre.41.2.338

https://doi.org/10.1007/11523468_22

https://doi.org/10.1007/11523468_22



https://doi.org/10.1137/1.9781611974782.49


https://doi.org/10.1145/265910.265923

https://doi.org/10.1145/265910.265923



https://doi.org/10.1145/316542.316548

near-optimal approximate shortest paths and …2. broadcast congested clique model: „1 +...

Documents