estimating primary demand in bike-sharing systemsjaillet/general/ssrn-id3311371.pdfgoh, yan, and...
TRANSCRIPT
Estimating Primary Demand in Bike-sharing Systems
Chong Yang GohOperations Research Center, Massachusetts Institute of Technology
Chiwei YanOperations Research Center, Massachusetts Institute of Technology
Patrick JailletOperations Research Center, Massachusetts Institute of Technology
We consider the problem of estimating the primary (first-choice) demand in a bike-sharing system. Such
estimates are crucial for various operational decisions, such as the allocation and rebalancing of bikes to
meet demand over a planning horizon. A key challenge in estimating the demand is to account for the
choice substitution effect, where a commuter may substitute a trip for close alternatives when the first-choice
location for picking up or returning a bike is not available. In this paper, we propose a method for estimating
the primary demand using a rank-based demand model. The model accounts for choice substitutions by
treating each observed trip as the best available option in a latent ranking over origin-destination (OD)
pairs. To solve the high-dimensional estimation problem that arises from the combinatorial growth of origin-
destination rankings, we propose algorithms that (i) find sparse representations of the model parameters
efficiently, and (ii) restrict station substitutions spatially according to the bike-sharing network. We prove
consistency results of the model and develop efficient, first-order methods based on difference of convex
(DCP) programming. Our approach is tractable on a city scale, which we demonstrate on a bike-sharing
system in the Boston metropolitan area. Experimental evaluations on both simulated and real-world data sets
show that our method can significantly reduce biases in ridership prediction and primary demand estimation,
which results in increased ridership when used as inputs to an operational model for bike allocation.
Key words : bike-sharing, rank-based choice models, demand estimation
History : January 6th, 2019, this version
1. Introduction
Over the past decade, bike-sharing services have seen widespread adoption in cities across the world
in response to urban congestion. As of 2018, there are approximately 1,500 bike-sharing services in
operation globally1, spanning continents from North America, Europe to Asia. Most bike-sharing
services operate by providing a fleet of rentable bikes that are distributed over a fixed network of
1 Based on data compiled by www.bikesharingmap.com, as of February 2018.
1
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems2
stations, each capable of holding a certain number of bikes at a given time. A commuter can pick
up a bike from any station for temporary use and drop it off later by parking it in an available
dock at the destination station.
In operating a bike-sharing system, a fundamental challenge is to match the supply of bikes
or docks with the demand for rides across location and time. Without timely intervention, the
asymmetry between pick-up and drop-off demand can often result in the quick depletion of either
bikes or docks, rendering the stations out of service during peak hours. At best, such temporary
outage is an inconvenience to the commuters who have to find alternatives at nearby stations;
worse, it turns them away and result in ridership losses over time. To mitigate this supply-demand
imbalance, operators often carry out overnight or periodic relocation of the bikes from one station
to another—a costly operation known as bike rebalancing. The success of these operations crucially
depends on the operator’s ability to estimate the ridership demand accurately over the planning
horizon, which motivates the topic of our paper.
In this paper, we focus on the problem of estimating the primary demand in a bike-sharing
system. Here, the primary demand refers to the demand that would be observed if the service
never goes out anywhere—or equivalently, if every pick-up or drop-off request by a potential com-
muter is fulfilled at the station of her first choice. For operational purposes, the demand is usually
quantified by the bike outflow and inflow rates at individual stations, or the flow rates between
origin-destination pairs. Estimating these flow rates by naively aggregating past trip data, however,
can be problematic for two reasons: demand censoring and choice substitution—both can introduce
significant biases in the estimates if not properly corrected for. Demand censoring occurs when
stations are temporarily out of bikes or docks, during which there is apparently zero demand. This
can be corrected, to some degree, by filtering the time periods of outage before aggregating the
trip data, as done in O’Mahony and Shmoys (2015). But it is more difficult to account for choice
substitution, which occurs when commuters substitute their first choices for other alternatives due
to service unavailability. Choice substitution in bike-sharing can happen in two ways: commuters
may decide to use available stations nearby instead of their first choices—in which case, the demand
is recaptured elsewhere; otherwise, commuters may leave the system for external alternatives, such
as other modes of transportation, hence the demand is said to be spilled or lost. Both factors can
contribute to either overestimation (by recaptures) or underestimation (by spills) of the primary
demand.
A standard approach for analyzing substitution behaviors is to use a choice model, which specifies
the likelihood that a customer makes a certain choice when presented with a set of alternatives or
offer set. Analogous to the retail settings in which such models are most commonly applied, we can
view each station in a bike-sharing system as a product to be considered among a set of available
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems3
stations (the offer set). But one unique characteristic of a bike-sharing service is that the products
are not differentiated by their prices or other intrinsic attributes, such as brand and color, because
most operators charge identical rates and provide similar bikes regardless of location; instead, they
are largely differentiated by their relative ease of access to the commuter—for example, earlier
estimates showed that a commuter whose point of origin is 300m away from a station is about
60% less likely to use the service than one who is close by (Kabra et al. 2016). Further, commuters
often have real-time access to information regarding the availability of all stations via websites
or smartphone apps, which facilitates timely discovery of alternatives. These factors suggest that
substitution is most likely to occur in parts of the network where stations are densely positioned,
hence the need to consider its effect in demand modeling.
Most earlier works on bike-sharing demand modeling do not take into account choice substitution.
For example, there is a line of work on predicting the observed ridership with regression models
that include covariates such as weather and demographic features (Rixey 2013, Singhvi et al. 2015),
but without isolating the contributions due to spills and recaptures. Others have focused mainly on
optimizing bike rebalancing operations (O’Mahony and Shmoys 2015, Raviv et al. 2013), wherein
the demand is either assumed to be known or estimated by ad-hoc data aggregation. The work
most relevant to ours are by Kabra et al. (2016) and Pendem (2016), who proposed multinomial
logit models (MNL) that spatially differentiate choices based on the walking distances between
stations. However, our approach differs in that we estimate the primary demand without explicitly
modeling its functional relationship with other variables, such as population density and distances
between stations. While the inclusion of these covariates is useful for analyzing how the demand
varies with the environmental factors, they also impose parametric modeling constraints that are
difficult to justify a priori. In what follows, we instead consider a more general class of choice
models that can flexibly adapt to data with minimal assumptions, while exploiting the network
structure of a bike-sharing service to make its estimation tractable.
We propose a bike-sharing demand model that consists of two parts: (i) a Poisson process that
models commuter arrivals, and (ii) a generic rank-based choice model that defines commuter behav-
iors by their preference rankings of the alternatives. Specifically, we assume that each arriving
commuter has a strict ordering of stations by preference, which is drawn i.i.d. from some probability
distribution over preference rankings, and always chooses the highest ranked alternative from the
offer set (including the option to leave the system). From a modeling perspective, one advantage of
rank-based choice models is that they are “non-parametric”; that is, they are compatible with any
rational choice behaviors and require no prior assumption about the functional form of the choice
probabilities. Furthermore, its complexity can flexibly scale with data as preference rankings are
incrementally added to the model (Farias et al. 2013, van Ryzin and Vulcano 2014). By contrast,
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems4
parametric models such as MNL impose certain structural restrictions on substitutions, such as the
independent irrelevant alternatives (IIA) property, which imply behaviors that may be unrealistic
in practice. Specifically, the IIA property requires that the ratio of ridership shares between any
two available stations stays constant regardless the availability of other choices. In Figure 1 below,
we illustrate why such property is likely to be violated in a bike-sharing system. Suppose there are
three stations A, B and C, and three commuters that are near to each of the stations. We assume
that all commuters rank the stations in increasing order of distance to their respective locations,
with their preference rankings shown in Figure 1. At an aggregate level, the ridership shares of
the three stations A, B and C are (1/3,1/3,1/3), respectively. Now, suppose station A runs out
of bikes. Since station B is closest to A, the commuter near station A will pick up a bike from
station B instead, which results in ridership shares of (0,2/3,1/3). The ratio of shares between
stations B and C is thus not preserved. Although there are more complex parametric models that
relax the IIA property, such as nested logit (Ben-Akiva et al. 1985) and mixed logit (Train 2009)
models, specifying a suitable parametric structure a priori is often difficult in practice. Moreover,
fitting a complex but incorrectly specified parametric model to data can lead to poor generalization
performance due to overfitting (Farias et al. 2013).
Figure 1 An example with 3 stations (labeled A, B, C) and 3 commuters to illustrate the violation of IIA property.
Below, each commuter’s rankings over the choice set {A,B,C} is specified by a 3-tuple in decreasing
order of preference.
BA C
Rank-based demand models of the kind that we proposed have been studied in various domains,
particularly in retail and airline industries (Haensel et al. 2011, van Ryzin and Vulcano 2014, 2017).
But one distinguishing feature of our application is that choice substitutions tend to be spatially
constrained in the bike-sharing network—that is, commuters tend to substitute an unavailable
station for other alternatives that are in close proximity to their points of origin. In this regard,
rank-based models afford us the flexibility to specify exactly how the choices are substitutable by
each another depending on the spatial configuration of the bike-sharing network, without neces-
sarily satisfying the IIA property. In what follows, we summarize the main contributions of this
paper in three areas: modeling, estimation and experimentations.
Modeling. We propose to augment the rank-based choice model with a substitution graph, which
precisely characterizes the set of substitutable alternatives for each station in the bike-sharing
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems5
network through a set of graphical constraints. The introduction of these substitution constraints
serve two purposes: (i) to model commuters’ tendency to explore only alternatives that are in close
proximity to their first-choice stations, and (ii) to reduce the number of parameters of the model by
eliminating preference rankings that imply unrealistic substitution behaviors. Further, we extend
the rank-based choice model to jointly account for pick-up and drop-off substitutions using paired
rankings of OD choices. This extended model allows us to estimate the primary demand for trips
between all origin-destination (OD) station pairs, which are useful for various inventory planning
and rebalancing operations (e.g., Shu et al. (2013), Ghosh et al. (2017)).
Estimation. Our proposed model is characterized by a set of parameters—the Poisson arrival
rate and the probability mass function of preference rankings—that can be estimated by maxi-
mum likelihood estimation (MLE). One issue, however, is that the number of preference rankings
grows factorially with the number of alternatives. Therefore, it is usually not feasible to solve
the fully parametrized MLE problem using off-the-shelf optimization solvers. Instead, large-scale
optimization methods such as constraint sampling (Farias et al. 2013) and column generation (van
Ryzin and Vulcano 2014) have been applied. These methods begin by solving the MLE problem
with a small set of preference rankings, and then iteratively improving the likelihood objective by
adding new rankings to the model. However, van Ryzin and Vulcano (2014) showed that finding
a ranking that improves the likelihood requires solving an integer programming sub-problem that
is NP-hard, making it intractable when there are hundreds of alternatives, as is the case in many
bike-sharing systems. In this paper, we take a different approach by proposing a dimensionality
reduction technique that is based on sparse representations. More specifically, given a sequence of
observed available stations (offer sets) and choices over time, our algorithm will generate a parsimo-
nious set of preference rankings that sufficiently explain the observations, subject to the constraints
imposed by a substitution graph. For our application, we find that this approach can reduce the
problem dimension by orders of magnitude, making it feasible to solve the MLE problem without
resorting to large-scale optimization methods. Our second contribution is in developing a special-
ized estimation algorithm. We show that the MLE problem, while nonconcave in general, can be
reduced to a difference-of-convex (DC) program. Based on this reduction, we apply an iterative DC
programming technique that estimates the parameters by solving a sequence of convex programs,
where each convex program can be solved efficiently using the Frank-Wolfe method. In particular,
the estimation procedure can be described by a series of closed-form updates that uses only the
first derivative of the likelihood function, which makes it scalable to large bike-sharing systems
with hundreds of stations. We show that the proposed procedure enjoys theoretical convergence
guarantees and is very efficient in practice—often achieving vast but quickly diminishing improve-
ments in the likelihood objective after solving only the first few convex programs. Finally, from a
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems6
statistical point of view, we also extend the consistency results of van Ryzin and Vulcano (2014)
by proving the consistency of MLE under a Poisson arrival process and a general non-concave
likelihood function, where one cannot distinguish no arrivals from arrival without pickups (due to
a stockout).
Experimentations. We evaluate our methods on synthetic and actual data sets of Hubway, a
bike-sharing service in the Boston metropolitan area. The purpose of our study is two-fold: (i) to
demonstrate the practicality of our method on a city scale, and (ii) to evaluate the effectiveness of
our primary demand estimation procedure, in terms of both its accuracy and operational impact
on ridership. We show that across a wide range of simulated conditions, our method consistently
outperforms several benchmarks, including the independent demand and MNL models. Specifically,
we achieve error reduction by more than 20% when the average stockout frequency is close to the
actual levels observed in Boston during the evening peak hours. These improvements translate into
ridership increases of up to 3% when bikes are allocated based on the estimated demands, using a
bike deployment model by Shu et al. (2013) with a planning horizon of several hours. When applied
to the actual Hubway data set, our method achieves an overall out-of-sample error of 24.8% in
ridership prediction, which is 10% lower compared to the best-performing benchmark.
The rest of this paper is organized as follows. We first give a literature review in Section 2. We
then formally describe our bike-sharing demand model and derive the MLE problem in Section 3.
Next, we propose an MLE estimation procedure and address the computational issues resulting
from high dimensional parameter space in Section 4. After that, we discuss the statistical properties
of our methods in Section 5. We then extend our model to jointly account for origin and destination
demand using paired preference rankings in Section 6. Finally, in Section 7 and 8, we present the
results of our empirical studies on Hubway bike-share.
2. Related Work
Bike-sharing Demand. There have been numerous works that analyzed bike-sharing usage
by incorporating data from heterogeneous sources (Rixey 2013, Singhvi et al. 2015, El-Assi et al.
2017), among them being demographic characteristics, built environment factors, weather and
usage data of other connecting transportation (see El-Assi et al. (2017) for a summary of the recent
literature and references therein). By comparison, there is relatively scant work on estimating the
primary demand of bike-sharing systems. Pendem (2016) proposed a stochastic demand model that
assumes a Poisson arrival process and a MNL choice model for pick-ups, where the substitution
probability of one station by another is parametrized by a nonlinear function of the walking distance
between them. Kabra et al. (2016) focused primarily on the question of how the accessibility and
the availability of a bike-sharing service impact ridership. To that end, the authors proposed a
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems7
structural demand model in which the commuter arrival rate at a location is assumed to vary
linearly with several covariates, such as the local population density and metro usage. The pick-up
choice model is similar to that of Pendem (2016), except that the choice probability is a piecewise
linear function of distance. Both works differ from ours in two aspects: (i) we adopt a more general
class of nonparametric choice models, and (ii) we do not impose parametric relationships between
the demand and other covariates. Further, our approach can be extended to model the trip demand
for all origin-destination station pairs, whereas earlier works focused only on pick-ups.
Rebalancing Operations and Capacity Allocation. Existing work in bike rebalancing
optimization can generally be classified into two streams: (i) static rebalancing (Raviv et al. 2013,
Chemla et al. 2013), where bikes are repositioned (usually overnight) ahead of major demand spikes,
and (ii) dynamic rebalancing (O’Mahony and Shmoys 2015, Ghosh et al. 2017), which is carried out
continuously during peak usage periods. These operations are complex and multifaceted, involving
problems such as inventory planning (Shu et al. 2013, O’Mahony and Shmoys 2015, Schuijbroek
et al. 2017), vehicle routing (Schuijbroek et al. 2017) and pricing (Singla et al. 2015). Aside from
rebalancing operations, Freund et al. (2017) also address the dock-capacity allocation problem. One
common thread is that all these operations depend on estimated demand for decision support. For
example, O’Mahony and Shmoys (2015) uses the pick-up and drop-off rates at individual stations
as inputs to a dynamic rebalancing model, whereas Shu et al. (2013) uses the OD flow rates between
all station pairs to optimize bike inventory levels over a planning horizon. However, in all of these
works, the demand is either assumed to be known or estimated in an ad-hoc way without taking
into account choice substitution.
Demand Estimation using Choice Models. Discrete choice models have long been applied
to estimate the primary demand of substitutable products (Vulcano et al. 2010, 2012, Haensel
et al. 2011, van Ryzin and Vulcano 2014, 2017). In the literature, our model is closest to that of
Haensel et al. (2011) and van Ryzin and Vulcano (2014). Both considered rank-based choice models
under a customer arrival process that is either Poisson (Haensel et al. 2011) or Bernoulli (van
Ryzin and Vulcano 2014). But one distinguishing feature of our model is that choice substitutions
can be constrained according to the spatial configuration of the bike-sharing network—a notion
that we formalize using a substitution graph later in Section 4.1.1. To handle the high parameter
dimension of rank-based choice models, earlier work has applied constraint sampling (Farias et al.
2013) and column generation (van Ryzin and Vulcano 2014) for estimation. Farias et al. (2013)
considered a different setup where instead of MLE, the goal is to estimate ranking probabilities
that are robust against a revenue function under a fixed assortment, while assuming that a subset
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems8
of choice probabilities are observable and consistent. van Ryzin and Vulcano (2014) solved the high-
dimensional MLE problem by column generation, which incrementally adds rankings to the model
using dual information. But each iteration requires solving an integer program that is NP-hard.
Another related approach was taken by Jagabathula and Rusmevichientong (2016), who identified
certain structures of offer set collections under which the estimation of rank-based models can
be formulated efficiently. Our proposed dimensionality reduction method departs from the ones
by Farias et al. (2013) and van Ryzin and Vulcano (2014) in that it is applied before solving the
estimation problem itself. It also differs from Jagabathula and Rusmevichientong (2016) in that
we do not make any particular assumptions on the structure of the offer sets, while we take into
account of the observed choices and offer sets to identify a parsimonious set of preference rankings
that are sufficient for solving the MLE problem optimally.
To estimate the parameters of rank-based models, previous works have derived specialized MLE
methods based on the expectation-maximization (EM) procedure. van Ryzin and Vulcano (2017)
derived an EM procedure that has closed form solution iterates, but it is tailored to the Bernoulli
arrival assumption and not directly applicable to our setting. Haensel et al. (2011) proposed an
EM formulation that works with Poisson arrivals, but their approach did not fully address the core
issue of data incompleteness—that we can only observe the choices made by purchasing customers
without knowing their full preference rankings—but instead relied on data imputation heuristics
that are not soundly justified. Our proposed DC programming technique differs from the EM
procedure in that it directly exploits convex structures inherent in the likelihood function itself,
rather than evaluating the conditional expectation of the function with respect to an extended data
space (also known as the E-step in EM procedure). The two approaches, however, can be understood
as special cases of the majorization-minimization (MM) algorithm. The interested reader may refer
to (Lipp and Boyd 2016, Le Thi and Dinh 2018) for a discussion of these connections. A similar
MM algorithm for estimating MNL models was recently proposed by Abdallah and Vulcano (2017),
who demonstrated empirically that it converges much faster than an EM method. To the best of
our knowledge, ours is the first work that applied this approach in estimating rank-based choice
models.
3. Demand Model
We first develop a demand model for pick-up requests, which describes the arrival process of
potential commuters and their choices of where to pick up the bikes from. This approach can
be extended to model demand for both pick-up and return requests, which we explicate later in
Section 6, with an essential difference being that the choices consist of origin-destination station
pairs rather than individual stations.
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems9
3.1. Preliminaries
Let us consider a bike-sharing system with N stations, denoted as N , {1, ...,N}. We define the
commuter’s choice set as N ∪ {0}, where 0 denotes the choice of not using the service. To model
the demand, we assume that commuters with pick-up requests arrive to the system according to
a Poisson process with rate λ per unit time. Each arriving commuter is characterized by a type
σ : {0, . . . ,N} 7→ {0, . . . ,N}, a bijection that specifies a strict ordering over the choice set as follows:
for any two choices i, j ∈ {0, . . . ,N}, i is preferred over j if and only if σ(i) < σ(j), i.e., choice i
is ranked higher than choice j. Note that we do not require each commuter to exhaust all of the
N station choices before deciding to leave the system. For example, when N = 3, a commuter of
type σ with σ(1)<σ(2)<σ(0)<σ(3) would consider up to two alternatives, station 1 followed by
station 2, before choosing not to use the service at all. With slight abuse of notation, we sometimes
represent the type in tuple form as σ = (1,2), which lists all stations that are ranked higher than
the option 0 in decreasing order of preference. In general, we denote the length of a type σ as |σ|,which is the number of alternatives that are ranked higher than the option 0. The set of all types is
denoted Σ. For the model to be useful, we assume that Σ does not include any type of length zero
(i.e., with leaving the system as the first choice) because commuters of such types are unobservable
and irrelevant as far as the demand is concerned.
We assume that the type of each arriving commuter is drawn independently from a probability
distribution over the set of all possible types, Σ , {σ1, . . . , σK}. We parametrize the probability
mass function of this distribution with a vector x, (x1, . . . , xK), where xk is the probability that
the commuter is of the type σk. Overall, the demand model is parametrized by the arrival rate
λ and the type probability vector x, both of which are unknown and have to be estimated from
data. For simplicity, we assume that the model is time-homogeneous, i.e., both λ and x do not
vary as a function of time. In practice, to account for time heterogeneity (for example, to model
arrival rates during peak and off-peak periods differently), one simple approach is to model the
demand separately for different time segments of data, each with a different set of parameters. All
estimation methods that we will discuss later can likewise be applied separately in this manner.
3.2. Observation and Type Compatibility
To setup the estimation problem, we first introduce several definitions that relate an observed
choice to the underlying model parameters. Suppose we observe a pick-up at station j while the
set of available stations is S ⊆N (S is called the offer set). We say that a type σ is compatible
with the observed choice j ∈ S given the offer set S if σ(j) corresponds to the highest rank among
the alternatives in S. The set of all such types, Mj(S) is defined as follows,
Mj(S), {k ∈ {1, . . . ,K} : σk(j)<σk(i),∀i∈ S, i 6= j}.
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems10
The definition allows for j = 0, so that M0(S) is the set of types which are compatible with not
using the service given the offer set S.
For a given probability vector x that parametrizes the type distribution, the probability that a
commuter chooses option j when presented with the offer set S is given by
P(j | S ;x),
{∑k∈Mj(S) xk, if j ∈ S ∪{0},
0, otherwise.
The function P(· | · ;x) fully specifies a choice model that predicts the fraction of commuters picking
a bike from any station or not using the service under any service outage pattern. In particular,
P(j | N ;x) and λj , λP(j | N ;x) correspond to the first-choice probability and primary demand
for station j, respectively, while ρ(S;x),∑
j∈S P(j | S;x) = 1− P(0 | S;x) is the probability that
given the offer set S, an arriving commuter will stay and use the service.
In general, x overparametrizes the choice model because its dimension K scales with N ! ∼
N (N+ 12 ), whereas the function P(· | · ;x) can be fully specified by only O(N2N) parameters. There-
fore, for rank-based choice models of this kind, it is well known that x cannot be uniquely identified
based on only observed choices and offer sets (van Ryzin and Vulcano 2014). However, as we will
show in Section 5, it is still possible to identify P(· | · ;x) and recover the primary demand under
conditions that are more relaxed than those assumed in a similar result by van Ryzin and Vulcano
(2014).
3.3. Likelihood Function
Suppose we have a series of observations over time periods indexed by t= 1, . . . , T . Without loss
of generality, we assume that each time period is of unit length, so that the number of arrivals in
the time period has a Poisson distribution with parameter λ. For each time period t, we observe
the offer set St and the arrival vector (mt1, . . . ,mtN), where mtj is the number of bikes picked up
at station j within the time period. To avoid difficult modeling issues involving data truncation,
we assume that the offer set St remains the same throughout the t-th time period. That is, each
station is either available or unavailable for service throughout the entire time period. In practice,
for this assumption to be reasonable, the length of the time period should be chosen to be small
enough (such as one minute or less) such that service availability is unlikely to change in between.
In our data sets, time granularity of station status is one minute.
For each t, let us define Mt ,∑N
j=1mtj to be the total number of pick-ups observed over the
time period. The probability of observing the vector (mt1, . . . ,mtN) under the offer set St is
exp (−λρ(St;x))(λρ(St;x))Mt
Mt!· Mt!
mt1!mt2 ! . . .mtN !
N∏j=1
(P(j | St ;x)
ρ(St;x)
)mtj.
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems11
In the above, recall that ρ(St;x) denotes the probability that given the offer set St, an arriving
commuter will stay and use the service. Thus, λρ(St;x) is the observed rate of arrival to the
system. Taking the logarithm of the probability and disregarding the constant terms, we obtain
the likelihood function `t(λ,x) for the t-th time period as
`t(λ,x) =−λρ(St;x) +Mt log (λρ(St;x)) +N∑j=1
mtj log
(P(j | St ;x)
ρ(St;x)
)
=−λρ(St;x) +Mt logλ+N∑j=1
mtj logP(j | St ;x).
By the independent increment property of the Poisson process, the likelihood function L(x,λ) for
all observations over time periods t= 1, . . . , T can be expressed as
L(λ,x) =T∑t=1
`t(λ,x)
=−λT∑t=1
ρ(St;x) +T∑t=1
Mt logλ+T∑t=1
N∑j=1
mtj logP(j | St ;x).
=−λT∑t=1
1−∑
k∈M0(St)
xk
+M logλ+T∑t=1
∑j∈St
mtj log
∑k∈Mj(St)
xk
, (1)
whereM ,∑T
t=1Mt is the total number of arrivals over all time periods. To estimate the parameters
by MLE, our goal is to maximize x and λ over L(x,λ), subject to the constraints λ≥ 0 and x∈∆K ,
where ∆K , {x ∈ RK :∑K
k=1 xk = 1, x ≥ 0} is the (K − 1)-dimensional simplex. It is helpful to
understand how the difficulty of this optimization problem depends on the service outage patterns.
For example, if the offer sets are St =N ,∀t (i.e., there is no service outage over the observation
horizon), then the simplex constraint on x implies that ρ(St;x) = 1, resulting in a linearly separable
concave optimization problem,
maxλ≥0,x∈∆K
L(λ,x) = maxλ≥0{−λT +M logλ}+ max
x∈∆K
T∑t=1
N∑j=1
mtj log
∑k∈Mj(N )
xk
, (2)
which gives the following solution.
Proposition 1. Under complete offer sets, St =N for all t= 1, . . . , T , any pair (λ, x) that satisfies
λ=M
T,
∑k∈Mj(N )
xk =
∑T
t=1mjt
M∀j = 1, . . . ,N.
is an optimal solution of the MLE problem.
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems12
Intuitively,∑
k∈Mj(N ) xk can be interpreted as the first-choice probability estimate, P(j | N ; x) for
station j, which is given by the fraction of observed pick-ups at j.
In general, when the offer sets are incomplete (St 6=N ), the objective function is non-separable,
non-concave and does not admit a closed form solution such as the above stated by Proposition 1
due to the existence of a bilinear term −λ∑
t(1−ySt,0). Adding to this difficulty is the high dimen-
sion of x, which makes it impractical to solve the problem directly using off-the-shelf optimization
solvers. In the next section, we address both of these issues, first by proposing dimensionality
reduction techniques based on network substitution constraints and sparse representations of the
choice probabilities, and then reformulating the objective function into a form that is amenable to
sequential convex optimization methods.
4. Estimation
4.1. Dimensionality Reduction
We propose two approaches for reducing the dimension of x in the MLE problem. Our first approach
is based on a behavioral assumption that commuters may only consider alternatives within a
neighborhood of their first choices. This assumption is grounded on the belief that commuters are
only willing to expend limited amounts of effort to access available alternatives, which, in the case
of a bike-sharing network, usually requires walking from a point of origin to a station to pick up a
bike. By imposing a reasonable distance limit on the accessible alternatives, we effectively constrain
the set of commuter types Σ based on the spatial configuration of the bike-sharing network. Our
second approach does not rely on any behavioral assumption, but is instead based on finding
sparse representations of the choice probabilities P(· | · ;x). Roughly speaking, the goal is identify a
parsimonious subset of types that could explain the observed data without necessarily enumerating
all of the K =O(N !) types. In what follows, we formalize both approaches and provide real-world
examples to illustrate their effectiveness in reducing the problem dimension.
4.1.1. Network Substitution Constraints. Let us formalize the first approach that con-
strains the set of types based on the spatial configuration of the bike-sharing network. For each
station pair i∈N and j ∈N , we define dij ∈R+ as the distance between the two stations satisfying
two properties: (i) dij = dji (ii) dij = 0 if and only if i= j. For example, dij may represent the actual
walking distance or travel time from station i to j. Based on this distance measure, we define the
substitution graph as G= (N ,E), where the nodes N correspond to the set of stations and the edges
E = {(i, j) : dij ≤D,∀i, j ∈N , i 6= j} consist of all (unordered) station pairs that are within distance
of D units apart from each other. The graph G imposes constraints on substitution behaviors as
follows: if a commuter’s first choice is i, then any substitute j must be in the neighborhood of
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems13
Figure 2 Substitution graphs for Hubway, a Boston-based bike-sharing system, under varying maximum substi-
tution distance D (in unit of km). The nodes correspond to the 171 stations and the edges connect
station pairs that are less than D km apart.
(a) D= 0.6 (b) D= 0.8 (c) D= 1.0
i, i.e., j ∈ δG(i), where δG(i) denotes the neighborhood of node i in G. More formally, the set of
types constrained by G can be expressed as ΣG = {σ ∈Σ : σ−1(n)∈ δG(σ−1(0)),∀n= 1, . . . , |σ|−1},where |σ| denotes the length of the type and σ−1 is the inverse function of σ that maps each ordinal
number (starting from 0) to the corresponding station that occupies that position in the ranking.
By introducing these network constraints, we reduce the number of types from |Σ| = O(N !) to
|ΣG|=∑
i∈N∑deg(i)
k=1
(deg(i)k
)k! =O(N(∆(G))!), where deg(i) denotes the degree of node i and ∆(G)
is the maximum degree of G. That is, the dimension now scales linearly instead of exponentially
with the number of stations N , provided that the maximum degree is held constant as the network
grows.
In practice, many bike-sharing networks are designed such that stations are spread out to max-
imize service coverage, so the corresponding substitution graph is naturally sparse (i.e., having
low node degrees) under a reasonable distance cutoff D. As a result, imposing network constraints
can often decrease the dimension by orders of magnitude. To illustrate this by example, we show
in Figure 2 the substitution graphs corresponding to Hubway, a bike-sharing network in Boston.
By imposing distance limits of D= 0.6 km,D= 0.8 km and D= 1.0 km, we drastically reduce the
number of types from |Σ| ∼ 10311 down to ∼104, ∼107 and ∼1010, respectively. However, even after
this reduction, the numbers of parameters under D = 0.8 and D = 1.0 are still too large for full
instances of the MLE problem to be solvable. This motivates our next approach based on sparse
representations.
4.1.2. Sparse Representations of Choice Probabilities. Given a substitution graph G
and a series of observed offer sets S1, . . . , ST , we now describe an explicit procedure for generating
types that obey the network constraints induced by G, while exploiting data sparsity to further
reduce the problem dimension. From here on, we will assume that the set of all possible types is
ΣG , {σ1, . . . , σK}.
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems14
To motivate our approach, let us denote Pt , {j ∈ St :mtj > 0} as the subset of stations in which
at least one pick-up is observed during time period t. By representing each choice probability
P(j | St ;x) with an auxiliary variable ySt,j for all t and all j ∈ Pt ∪ {0}, we can write the MLE
problem in (1) with respect to the constrained set of types ΣG as:
maximizex,y,λ
−λT∑t=1
(1− ySt,0) +M logλ+T∑t=1
∑j∈Pt
mtj log ySt,j
subject to Ax= y
x∈∆K , λ≥ 0,
(3)
where the simplex ∆K characterizes the set of possible distributions over ΣG, y is the vector of
choice probabilities and A∈ {0,1}(∑Tt=1 |Pt|+T )×K is the compatibility matrix that specifies the linear
relationships between y and the ranking probabilities x, with ySt,j =∑
k∈Mj(St)xk holding for each
t and j ∈ Pt ∪{0}. With this reformulation, our proposed dimensional reduction technique can be
understood as finding an efficient representation of the feasible polytope Y = {y : ∃x∈∆K s.t. y=
Ax} in (3), which is the convex hull of the columns of A. Recall that A comprises K columns, each
corresponding to a type in ΣG. The key idea is to identify a subset of types ΣG ⊆ΣG such that the
following MLE problem (4) has the same optimal value as of the MLE problem (3) where types
are subject to network substitution constraints, ΣG.
maximizex,y,λ
−λT∑t=1
(1− ySt,0) +M logλ+T∑t=1
∑j∈Pt
mtj log ySt,j
subject to AΣGx= y
x∈∆|ΣG|, λ≥ 0.
(4)
Here AΣG denotes the column submatrix of A that corresponds to the types in ΣG. In other
words, if such a set ΣG can be identified, then we can effectively discard all types not in ΣG from
the original choice model without impacting the maximum likelihood value obtained. Trivially, a
representation in which |ΣG| <K always exists in the case where K > 2NT , i.e., there are more
columns in A (each being a zero-one vector of dimension no more than NT ) than the combinatorial
set of all possible zero-one columns. This occurs, for example, when we have a large network but few
observations. In which case, at least some of the columns in A are redundant and can be removed.
But such redundancies can also arise in more general settings, which we illustrate by an example as
follows. Suppose we have a substitution graph characterized by nodes N = {1,2,3} and edges E =
{(1,2), (2,3)}, and we observe offer sets S1 = {2,3} and S2 = {1,2}, pick-up stations P1 = {2} and
P2 = {1} over two observation periods (T = 2). For this network, there areK = 9 types that obey the
substitution constraints, ΣG , {σ1, . . . , σ9}= {(1), (2), (3), (1,2), (2,3), (2,1), (3,2), (2,1,3), (2,3,1)}
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems15
with corresponding type probabilities x , (x1, . . . , x9). The choice probability vector consists of
four components, y, (yS1,2, yS1,0, yS2,1, yS2,0), which can be written in the form y=Ax as
yS1,2
yS1,0
yS2,1
yS2,0
=
0 1 0 1 1 1 0 1 11 0 0 0 0 0 0 0 01 0 0 1 0 0 0 0 00 0 1 0 0 0 0 0 0
x1
x2
...x9
It is easy to check that there are 6 unique zero-one columns in the compatibility matrix A above.
Without loss of generality, we can choose the set ΣG = {σ1, σ2, σ3, σ4, σ5, σ7}, which corresponds
to the types {(1), (2), (3), (1,2), (2,3), (3,2)}, such that any feasible y can be represented as the
convex combination of these 6 columns. Intuitively, these redundancies exist because we are unable
to distinguish preferences beyond the observable choices in each offer set: for instance, types (2,1)
and (2,1,3) are indistinguishable from each other given the observed pickups over the two time
periods, while they will be distinguishable if we observe another offer set in time period 3, S3 = {3}
and a pickup at station 3, e.g., P3 = {3}, so that only type (2,1,3) is compatible with this new
observation but not type (2,1).
Aside from removing types corresponding to duplicate columns in A, we can further reduce the
size of ΣG by exploiting the property of the optimal solutions of the MLE problem. We begin by
showing the following Lemma which generalizes redundancy definition:
Lemma 1. For any two columns Ai and Aj in the compatibility matrix A, if Ai − Aj ≥ 0, i.e.,
column Ai is greater than or equal to Aj element-wise, then type variable xj corresponding to Aj
is redundant, i.e. any optimal solution x∗ of the MLE problem (3) satisfies x∗j = 0.
Proof. Suppose we have x∗j = ε > 0 for some j. Let us consider another solution x∗∗ where x∗∗j =
0, x∗∗i = x∗i + ε and x∗∗k = x∗k, ∀k 6= i, j. We then have y∗∗ =Ax∗∗ =∑
k 6=i,jAkx∗∗k +Aix
∗∗i +Ajx
∗∗j ≥∑
k 6=i,jAkx∗k +Aix
∗i +Ajx
∗j =Ax∗ = y∗ because of Ai ≥Aj. Since the likelihood function is strictly
increasing in y, we know that x∗∗ has higher objective value than x∗, which contradicts to x∗ being
an optimal solution. �
Lemma 1 implies that types whose corresponding columns are element-wise less than or equal
to some other columns in the compatibility matrix can be effectively dropped without impacting
the optimal value of the MLE problem. For instance, σ5, σ6, σ7, σ8, σ9 in the above example are all
redundant according to this criterion. This leads to a final pick of ΣG = {σ1, σ2, σ3, σ4}.
Based on this intuition, we propose an algorithm below that generates a parsimonious type set
ΣG given a sequence of observed offer sets S1, . . . , ST and pick-up stations P1, . . . , PT .
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems16
4.1.3. Type Enumeration Algorithm. For each offer set St and each observed pick-up
station i ∈ Pt, the basic operation of our proposed algorithm is to enumerate all types in ΣG of
the form (j1, . . . , jk, i) and (j′1, . . . , j′k), where the first k choices consist of only stations that are
unavailable at time t. In what follows, we define ΠS(i, j) as the set of all types with station i as
the first choice and station j as a last choice, while having any permutation of the set S as the
intermediate choices—for example, we have Π∅(1,2) = {(1,2)}, Π{3,4}(1,2) = {(1,3,4,2), (1,4,3,2)},
Π∅(1,ø) = {(1)} and Π{3,4}(1,ø) = {(1,3,4), (1,4,3)}.
Algorithm 1 Parsimonious type enumeration algorithm
input: Substitution graph G, offer sets S = {S1, S2, . . . , ST} and pick-up stations P = {P1, P2, . . . , PT}
output: Set of types ΣG satisfying the constraints induced by G
1: procedure SparseEnumerate(G,S,P)
2: ΣG←∅ . Initialization with empty set.
3: for t= 1, . . . , T do
4: for each i∈ Pt do
5: ΣG← ΣG ∪{(i)} . Add a type comprising only one choice i.
6: for each j ∈ (N \St)∩ δG(i) do
7: for each P ∈ 2(N\St)∩δG(j) do . 2S denotes the power set of S.
8: ΣG← ΣG ∪ΠP (j, i) . Add types comprising unavailable stations followed by choice i.
9: for each i∈N \St do
10: for each P ∈ 2(N\St)∩δG(i) do
11: ΣG← ΣG ∪ΠP (i,ø) . Add types comprising unavailable stations only.
12: return ΣG
The proposed algorithm can be easily modified to impose an additional length constraint L on
the types generated, by adding a condition in line 7 of SparseEnumerate that each subset P
satisfies |P | ≤ L− 2 and line 10 that |P | ≤ L− 1. The inclusion of the length parameter L allows
us to tune the complexity of the choice model to prevent overfitting—for instance, in data-scarce
settings, it may be favorable to choose a small L so that model class is less complex and leads
to better generalization performance. In the following theorem, we establish the correctness of
the algorithm in producing feasible types that satisfy the network constraints and are provably
sufficient to represent the polytope Y.
Theorem 1. Given offer sets S1, . . . , ST and pick-up stations P1, . . . , PT , the set ΣG generated by
Algorithm 1 satisfies ΣG ⊆ ΣG and the MLE problem (4) has the same optimal value as problem
(3).
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems17
Another main question of interest is how much, if any, can we benefit from reducing the full
set of types ΣG to ΣG? We answer this through a simple analysis, followed by a case study with
actual data sets. First, observe that each type generated by the algorithm can be categorized into
two parts: (i) either of the form (j1, . . . , jk, i) for some i ∈ Pt, j1 ∈ (N \St)∩ δG(i) and j2, . . . , jk ∈
(N \St)∩δG(j1), that is, j1 is an unavailable station in the neighborhood of i, whereas j2, . . . , jk are
unavailable stations in the neighborhood of j1; (ii) or of the form (j1, . . . , jk) for some j1 ∈N \St and
j2, . . . , jk ∈ (N \St)∩δG(j1). To derive an upper bound on k for the first category, let us define Qt ,
{j ∈N \St : δG(j)∩St 6= ∅} to be the set of unavailable stations in time step t having at least one
available choice in its neighborhood. Then, we must have k≤U1 ,maxt,j∈Qt |(N \St)∩ δG(j)|+ 1.
Similarly, for the second category, we have k ≤ U2 , maxt,j∈N\St |(N \ St) ∩ δG(j)|+ 1. We thus
obtain an upper bound
|ΣG| ≤U1∑k=1
(U1
k
)k!×N +
U2−1∑k=1
(U2− 1
k
)k!×N =O(NU !)
on the number of possible types that could be generated by the algorithm where U = max{U1,U2−
1}. In particular, if St =N for all t (no service outage is ever observed), then U = 1 and we only
require O(N) types to parsimoniously account for the observations. More generally, we have a
reduction of dimension from |ΣG|=O(N(∆(G))!) to |ΣG|=O(NU !), where U ≤∆(G) by defini-
tion.
To empirically demonstrate the effectiveness of our approach, we apply the type enumeration
algorithm to real-world data sets for a Boston bike-sharing system, which were collected over a
period of one week in the summer of 2016. In our analysis, we use only the data for the evening
rush hours (5pm-7pm) when the frequency of service outage is at its highest, at 11.5%2. Table 1
shows the dimensionality reduction achieved by applying the type enumeration algorithm under
various choices of distance limit D and type length L. As these results indicate, massive reduction
in variable count by up to a factor of over 1,000 are possible in practice. The reason is that even
though the overall service outage frequency is high during the rush hours, the number of stations
that are unavailable simultaneously within any neighborhood of the network tends to be fairly low,
thus contributing to a small U . As we demonstrate in Section 7.2 later, by finding sparse sets of
variables in this manner, we are able to solve full instances of the MLE problem efficiently without
resorting to large-scale optimization methods such as column generation or constraint sampling.
We end this section with several remarks regarding the proposed method. First, while our dis-
cussions thus far have focused on a constrained class of types ΣG, the proposed method also applies
to the unconstrained setting if we define G as a complete graph (D=∞). Second, the output set
2 This is defined as the average percentage of time when stations are out of service over the 5pm-7pm period.
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems18
Table 1 Dimensionality reduction under varying distance threshold D (in km) and maximum type length L.
L= 2 L= 3 L= 4 L= 5 L=∞
D (km) |ΣG| |ΣG| |ΣG| |ΣG| |ΣG| |ΣG| |ΣG| |ΣG| |ΣG| |ΣG|
0.6 326 429 598 925 904 1,705 1,078 2,473 1,078 2,833
0.8 464 661 1,357 2,489 3,837 9,737 9,897 37,337 52,737 3.4×106
1.0 612 945 2,381 5,097 9,069 29,139 32,193 168,771 2.4×106 3.2×109
of types ΣG is not necessarily the sparsest, i.e., two different types σ,σ′ generated by the algorithm
might have identical columns Aσ =Aσ′ . One way to address this issue is to apply post-processing
steps that eliminate any duplicate column in the associated matrix AΣG . This can be done very
efficiently due to the fact that AΣG is a binary matrix. In practice, the rate of redundancy ranges
from 0% to 10% in our computation experiments.
4.2. Difference of Concave Formulation
In this subsection, we show that the MLE problem can be reformulated as an optimization problem
involving only x as the decision variables, for which specialized algorithms can be derived. First,
observe that (1) is equivalent to the following nested optimization problem:
maxx∈∆K
maxλ≥0
−λT∑t=1
1−∑
k∈M0(St)
xk
+M logλ
+T∑t=1
∑j∈St
mtj log
∑k∈Mj(St)
xk
.
For any x, the inner maximization problem, maxλ≥0
{−λ∑T
t=1
(1−
∑k∈M0(St)
xk
)+M logλ
}is
strictly concave in λ and has a unique optimal solution λ(x), given by
λ(x) =M∑T
t=1
(1−
∑k∈M0(St)
xk
) . (5)
By substituting λ(x) into the original objective function in (1), we obtain an equivalent optimization
problem of the form,
maxx∈∆K
f(x)− g(x), (6)
where f and g are two concave functions:
f(x),T∑t=1
∑j∈St
mtj log
∑k∈Mj(St)
xk
, g(x),M log
T∑t=1
1−∑
k∈M0(St)
xk
. (7)
We have thus reduced the MLE problem into a difference of convex (concave) (DC) program,
which is a convex-constrained optimization problem with an objective that can be written as the
difference of two concave functions. A practical implication of this reformulation is that we can
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems19
apply existing DC programming techniques (Le Thi and Dinh 2018) to devise specialized algorithms
for solving the MLE problem. One such technique is the convex-concave procedure (CCCP), an
iterative method for DC programming that solves a sequence of concave programs, each involving
the maximization of a concave surrogate (i.e., a concave lower bound) of the difference of concave
objective function, f(x)− g(x). Starting from an initial solution x(0) in a convex feasible set X ,
CCCP generates a sequence of solution iterates as follows:
x(n) ∈ arg maxx∈X
f(x)− g(x;x(n−1)) ∀n= 1,2,3, . . . , (8)
where g(x;x(n−1)), g(x(n−1)) +∇g(x(n−1))T (x−x(n−1)) is a linearization of the concave function g
around the previous solution iterate x(n−1). It is easy to check that the function f(x)− g(x;x(n−1))
is concave and forms a lower bound of the difference of concave objective, f(x)− g(x) ≥ f(x)−
g(x;x(n−1)), with equality holding at x = x(n−1). Disregarding the constant term in g, we can
simplify (8) into
x(n) ∈ arg maxx∈X
f(x)−∇g(x(n−1))Tx ∀n= 1,2,3, . . . , (9)
In what follows, we will apply the CCCP method to the reformulated MLE problem in (6).
As we demonstrate below, one particular advantage of decomposing the problem into a sequence
of convex programs is that each of them can be solved efficiently by Frank-Wolfe method with
closed-form iterates. Remarkably, the resulting CCCP algorithm can also be interpreted as an
application of block coordinate ascent to the original log likelihood function in (1) and satisfies
global convergence. All these properties do not apply to DC programs in general.
4.2.1. Convex-Concave MLE Procedure. By substituting f and g from (7) into the CCCP
algorithm in (9), we obtain the following procedure for solving the reformulated MLE problem:
starting from an arbitrary point x(0) ∈∆K , solve for
x(n) ∈ arg maxx∈∆K
T∑t=1
∑j∈St
mtj log
∑k∈Mj(St)
xk
+M∑T
t=1
(∑k∈M0(St)
xk
)∑T
t=1
(1−
∑k∈M0(St)
x(n−1)k
) ∀n= 1,2, . . . (10)
Intuitively, we can interpret the constant M/∑T
t=1
(1−
∑k∈M0(St)
x(n−1)k
), λ(x(n−1)) as an esti-
mate of the arrival rate λ based on the previous solution iterate, as was defined in (5). This
motivates an alternative derivation of the same algorithm using block coordinate ascent below.
Proposition 2. Let {(λ(n), x(n))}n≥1 be a sequence generated by the following block coordinate
ascent procedure based on some initialization point x(0) ∈∆K,
λ(n) ∈ arg maxλ≥0L(λ,x(n−1)). (11)
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems20
x(n) ∈ arg maxx∈∆K
L(λ(n), x). (12)
Then, the sequence {x(n)}n≥1 is also optimal for the procedure in (10), provided that it is based on
the same initialization point x(0).
We remark that the equivalence between CCCP and block coordinate descent in the sense defined
in Proposition 2 does not hold for general DC programs, but is applicable here only because of
special structures of the likelihood function.
Note that problem (10) can be solved by Frank-Wolfe (FW) algorithm (Frank and Wolfe 1956)
with simple updates as follows. At each iteration, FW solves the linear program maxx∈∆k cTx where
c is the gradient of the function f(x)−∇g(x(n−1))Tx evaluated at the current iterate. It is easy
to check that an optimal solution of this linear program is a standard unit vector ej for some
j ∈ arg maxi ci. Thus each iteration of FW is simply reduced to a findmax operation. FW has gained
widespread interest recently as an efficient approach in solving large-scale convex optimization
problems (Clarkson 2010, Jaggi 2011, Freund and Grigas 2016). Compared to projection-based
first-order methods, FW fits naturally to our problem for a number of reasons: (i) the linear
program sub-problem under simplex constraint can be solved in O(K) number of operations by
simply finding the maximum component of the gradient; (ii) each FW update naturally inherits
sparsity properties as the algorithm is adjusting the current solution towards an extreme point of
the simplex at each iteration: if we initialize with a vector x with m non-zero entries, the solution
that the method produces at iteration n has at most m+n non-zero entries.
We summarize the complete solution approach for the MLE problem in Algorithm 2 below. To
have better sparsity, one can initialize the solution procedure with a vector with only positive
entries on a small subset of rankings while still possess finite likelihood value. For example, a
uniform distribution over all singleton rankings {(i)i∈N} can be a natural initial solution.
Algorithm 2 MLE Solution Procedure
1: Initialize x for CCCP iterate
2: Initialize x′ for Frank-Wolfe iterate
3: while no convergence on f(x)− g(x) do
4: while no convergence on f(x′)−∇g(x)Tx′ do
5: Compute j ∈ arg maxi∈{1,...,K} (∇f(x′)−∇g(x))i
. solving the FW subproblem
6: α∗ ∈ arg maxα∈[0,1] f(x′+α(ej −x′))−∇g(x)T (x′+α(ej −x′)) . line search the step size
7: x′← x′+α∗(ej −x′) . FW update
8: x← x′ . CCCP update
9: return x,λ=M/∑T
t=1
(1−
∑k∈M0(St)
xk
)
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems21
4.2.2. Convergence Properties. We show that the proposed convex-concave procedure for
solving the MLE problem satisfies the global convergence property. That is, given an arbitrary
feasible starting point x(0), the sequence of solutions generated by (10) converges in objective value
to that of a stationary point of the likelihood function.
Theorem 2. For any initialization point x(0) ∈∆K, the sequence of solutions {x(n)}n≥1 generated
by the procedure in (10) satisfies
limn→∞
f(x(n))− g(x(n))→ f(x∗)− g(x∗),
where x∗ is a stationary point of the likelihood function in (6). Furthermore, any limit point of the
sequence {x(n)}n≥1 is a stationary point of the likelihood function.
This result follows directly from Theorem 4 in Sriperumbudur (2009). CCCP also enjoys linear
convergence for general DC programs (Le Thi and Dinh 2018). We find that in practice, this
procedure converges very quickly in an extensive range of numerical experiments. These findings
will be discussed in Section 7.2.
5. Statistical Properties
In this section, we analyze the asymptotic properties of the MLE estimators (λ, x) as the observation
period T tends to infinity. For our analysis, we will assume that the demand model described
in Section 3 is correct and (λ∗, x∗) are the ground truth parameters. As we discussed earlier in
Section 3, it is generally not possible to uniquely identify x∗ based on only the observed choices
and offer sets due to over-parametrization of the model. However, we show that it is possibly to
consistently estimate λ∗ and the ground truth choice probabilities y∗S,j , P(j | S;x∗) under certain
observability conditions. The key step of our consistency results is to show that the demand model is
partially identifiable asympototically, meaning that (λ∗, y∗) is the unique maximizer of the expected
likelihood function, although it is non-concave and non-separable. We will assume, without loss of
generality, that y∗S,j > 0 for all j ∈ S and S ∈ 2N (we can simply drop station j from S if y∗S,j = 0).
By defining yS,j , P(j | S; x) to be the choice probability estimates based on x, we then formally
describe the following consistency results.
Theorem 3. Assume that every offer set S ∈ 2N is observed infinitely often; that is, every S
satisfies, for some αS > 0, limT→∞(1/T )∑T
t=1 1(St = S)→ αS. Then, with probability one, the MLE
estimators satisfy λ→ λ∗ and yS,j→ y∗S,j for all S ∈ 2N and j ∈ S, as T →∞.
As a corollary to the above theorem, we can also consistently estimate the primary demand for
individual choices. That is, we have λyN j→ λ∗y∗N j for every choice j ∈N . Note that the theorem
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems22
extends the consistency results of van Ryzin and Vulcano (2014), which is based on Bernoulli
approximation of the Poisson arrival process and specialized to the case where the likelihood
function is complete (i.e., time periods with no arrival and arrival with no pickups due to a stockout
are distinguishable) and strictly concave. We also remark that the results are general and do not
rely on any particular network G.
6. Extension to Origin-Destination Demand Model
We have thus far developed a model for estimating the primary pick-up demand only. However,
this model does not account for the two-sided dynamics of a bike-sharing system, wherein a pick-up
at one station is always followed by a drop-off at another, hence it is not directly applicable if
we wish to analyze the origin-destination (OD) trip patterns in the system. Such analysis can be
useful, for instance, in planning operations that rely on estimated OD flow rates to decide how to
best allocate bikes to meet the service demand (e.g., Shu et al. (2013)). In what follows, we discuss
an extension of our model that accounts for both the origin (pick-up) and destination (drop-off)
demand.
6.1. Model
The main distinguishing feature of the extended OD model is that each commuter now chooses
a trip from the set Nod ,N ×N , which consists of all origin-destination station pairs such that
each choice (i, j) ∈ Nod denotes a trip that begins with pick-up at station i and ends with drop-
off at station j. More specifically, we assume as before in Section 3 that commuters arrive to
the system according to a Poisson process with rate λ, but besides having ranked preferences
over the pick-up choices, each commuter also has separately ranked preferences for the drop-off
destinations. We characterize these preferences with a paired type σ = (σo, σd), where σo and σd
specify the commuter’s rankings over pick-up and drop-off choices, respectively. The paired type
fully determines the commuter’s behaviors as follows: upon arrival, the commuter first chooses an
origin station according to σo. If none of the stations in σo is available, then the commuter leaves
the system; otherwise, the commuter initiates a trip and chooses an available destination according
to σd when dropping off later. Effectively, we are assuming that a commuter’s decision of whether
to stay in the system or not depends only on the availability of the origin stations.
Formally, we define the set of all paired types, Σod = Σo×Σd as the Cartesian product of the origin
type set Σo and the destination type set Σd. We parametrize the probability distribution over the
paired types Σod with a vector x, where each doubly indexed element xkl is the probability assigned
to the paired type (σok, σdl ) ∈Σod. Based on this probability distribution, we let P(i, j | S,Q;x) be
the probability of observing trip (i, j) given the pick-up offer set S (i.e., the set of non-empty
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems23
stations at pick-up) and the drop-off offer set Q (i.e., the set of non-full stations at drop-off), which
is defined as
P(i, j | S,Q;x),
{∑k∈Mo
i (S)
∑l∈Md
j (Q) xkl, if i∈ S, j ∈Q,0, otherwise,
whereMoi (S) (respectively,Md
j (Q)) is the set of types in Σo (Σd) that are compatible with observed
pick-up (drop-off) at station i (j) under the offer set S (Q). Given the arrival rate λ and probability
vector x, we can recover the primary OD demand for trip (i, j) as λP(i, j | N ,N ;x), while the
primary pick-up and drop-off demand at each station i correspond to λ∑N
j=1 P(i, j | N ,N ;x) and
λ∑N
j=1 P(j, i | N ,N ;x), respectively.
6.2. Likelihood Function
Based on the extended choice model, we can now rewrite the original likelihood function in (1) in
terms of the observed trips. Suppose over time periods t= 1, . . . , T , we observe a sequence of M
trips, (i1, j1), . . . , (iM , jM), where each trip (ik, jk) begins at time period tok and ends at time period
tdk. Let us define St and Qt as the pick-up and drop-off offer sets at time period t, respectively, and
let ρ(St;x),∑
i∈St
∑N
j=1 P(i, j | St,N ;x) be the probability that given offer set St, a commuter will
stay in the system and begin a trip upon arrival. Again, the second summation is over the set of
all stations N because we assume that a commuter’s decision of using the system or not depends
only on the availability of the origin stations. The OD log likelihood function is given by
Lod(λ,x) =−λT∑t=1
ρ(St;x) +M logλ+M∑k=1
logP(ik, jk | Stok,Qtd
k;x). (13)
Since ρ(St;x) and P(ik, jk | Stk ,Qtk ;x) are both linear in x, the MLE problem is reduced to an
optimization problem that is solvable with the same methods described in Section 4. One major
computational issue, however, is that the dimension of x is quadratically larger than in the original
model due to the fact that |Σ|= |Σo||Σd|=O((N !)2). To address this complication, we can adapt
the dimensionality reduction techniques from Section 4.1 as follows.
6.3. Dimensionality Reduction
Given a substitution graph G, let us denote ΣGo and ΣG
d as the constrained sets of origin and
destination types, respectively, that are induced by G. We first apply Algorithm 1 to generate a
subset of types ΣGo ⊆ ΣG
o that parsimoniously accounts for the observed pick-ups given the offer
sets S1, . . . , ST . Separately, we apply the same algorithm to generate another set of types ΣGd ⊆ΣG
d
that explains the observed drop-offs given offer sets Q1, . . . ,QT . We then apply Algorithm 3 below
to generate a parsimonious set of paired rankings, ΣGod by taking the sets ΣG
o and ΣGd as inputs.
To facilitate our presentation below, we define Moi (S) (respectively, Md
j (Q)) as the set of types in
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems24
Algorithm 3 Parsimonious paired type enumeration algorithm
input: pickup offer sets S = {S1, S2, . . . , ST}, dropoff offer sets Q = {Q1,Q2, . . . ,QT}, trips T =
{(i1, j1), (i2, j2), . . . , (iM , jM)}, parsimonious set of origin types ΣGo and destination types ΣG
d
output: Parsimonious sets of paired types ΣGod
1: procedure SparseEnumeratePaired(G,S,Q,T , ΣGo , Σ
Gd )
2: ΣGod←∅ . Initialization with empty set.
3: for k= 1, . . . ,M do
4: for each (σo, σd)∈ Moik
(Stok)×Md
jk(Qtd
k) do
5: ΣGod← ΣG
od ∪{(σo, σd)} . Add a paired type compatible with trip (ik, jk).
6: for t= 1, . . . , T do
7: for each (σo, σd)∈ Moik
(Stok)× ΣG
d do
8: ΣGod← ΣG
od ∪{(σo, σd)} . Add a paired type compatible with not using the service at time t.
9: return ΣGod
the parsimonious set ΣGo (ΣG
d ) that are compatible with observed pick-up (drop-off) at station i
(j) under the offer set S (Q).
Building on top of Theorem 1, it can be shown that the set ΣGod is sufficient to account for all
the observations; that is, we can fix xkl = 0,∀(σok, σdl )∈ (ΣGo ×ΣG
d ) \ ΣGod without incurring any loss
in optimality when solving the MLE problem. We formalize this in the following corollary.
Corollary 1. The MLE problems maxλ≥0,x∈∆K Lod(λ,x) with types ΣGo × ΣG
d and ΣGod =
SparseEnumeratePaired(G,S,Q,T , ΣGo , Σ
Gd ) of Algorithm 3 have the same optimal value.
As a distinct feature of our OD estimation problem, we can further show that, for destination
ranking, any type that is nested in some other types can be proven to be redundant. Here we define
a type σ is nested in another type σ′, σ⊂ σ′, if |σ|< |σ′| and σ−1(i) = σ′−1(i),∀i∈ {0, . . . , |σ| − 1}.
This corollary mainly follows by the assumption that the availability of destination stations doesn’t
impact commuters’ decisions of using the system or not.
Corollary 2. Define ΣGd = {σ ∈ ΣG
d : ∃σ′ ∈ ΣGd s.t. σ ⊂ σ′} as the set of types that are nested in
some other types. Then the MLE problems maxλ≥0,x∈∆K Lod(λ,x) with types ΣGo ×ΣG
d and ΣGod =
SparseEnumeratePaired(G,S,Q,T , ΣGo , Σ
Gd \ ΣG
d ) of Algorithm 3 have the same optimal value.
7. Simulation Studies
In this section, we present our results for an extensive range of numerical experiments based on
simulations of a bike-sharing network in Boston. These experiments are designed with two goals
in mind. First, we evaluate the accuracy of demand estimation under a range of sample sizes and
service outage scenarios. Second, we quantify the operational impact of the demand estimation
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems25
procedure when it is used as the basis for allocating bikes in a system. Specifically, we use the
estimated demand as inputs to an optimization model for allocating bikes with the objective of
maximizing ridership over a planning horizon. We then evaluate the actual ridership with ground-
truth demand profile as a measure of how well the estimation method helps inform operational
decision-making. Throughout this section, we will refer to our proposed rank-based estimation
method using the abbreviation RANK. For comparison, we run the same set of experiments with
the following benchmark methods:
1. Sample Average (SA). We estimate the OD demand λij for each station pair (i, j) by averaging
the number of observed trips over all time periods, λij := 1T
∑T
t=1mtij, where mtij denotes the
number of trips from station i to j that begins at time step t. This corresponds to the MLE
estimate under the assumption that the station is never out of service. Based on the estimated OD
demand, we obtain the pick-up and drop-off primary demand for each station i as λoi =∑N
j=1 λij
and λdi =∑N
j=1 λji, respectively.
2. Filtered Sample Average (FSA). To take into account of service availability, we modify the SA
estimator by considering only the time periods when both origin station i and destination station
j are in service, denoted as Ti , {t : i ∈ St}, so that λij := 1|Ti|
∑T
t=1mtij. This is similar to the
approach taken by O’Mahony and Shmoys (2015) to estimate the pick-up and drop-off demand as
inputs for bike rebalancing.
3. OD Multinomial Logit Model (MNL). For this model, we define the choice probability of trip
(i, j) as
P(i, j | S,Q;u, v) =ui∑
k∈S uk +u0
· vij∑k∈Q vik
,
where u and v are nonnegative utilities associated with pick-up and drop-off choices, respectively.
Specifically, the first term, ui/(∑
k∈St uk +u0) can be understood as the probability of observing a
pick-up at station i given offer sets (S,Q), where u1, . . . , uN are the utilities assigned to the stations
and u0 is the utility associated with leaving. The second term is the conditional probability of
observing a drop-off at station j given that the trip originated from station i, where the utilities
assigned to the drop-off destinations are ui1, . . . , uiN . Since we assume that all bikes ever picked up
will be dropped off, there is zero utility associated with the option to abstain from returning the
bike.
7.1. Setup
7.1.1. Simulation Platform. To generate synthetic data for our experiments, we built a
simulation platform based on Hubway, a bike-sharing network based in Boston that comprises 171
stations based on its summer 2016 data. As a preprocessing step, we first estimate the primary
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems26
pick-up demand at individual stations by applying the FSA methd on a real Hubway data set
collected over summer 2016, wherein the timestamps are discretized into units of one minute each.
We then estimate an OD matrix from the data based on the empirical trip frequency of each origin-
destination pair, which determines the probability that a commuter decides to drop off a bike at
a station given the pick-up origin. These estimates will serve as the ground truth parameters on
which the simulation is based, as we detail below.
Arrival Process. Let us denote {λ∗j}j∈N as the ground truth arrival rates (per minute) at individ-
ual stations obtained in the preprocessing step above. We simulate arrivals to the system minute
by minute according to a Poisson process, such that the arrival rate at station j is equal to λ∗j . For
each arriving commuter, we choose the intended drop-off destination probabilistically according to
the ground truth OD matrix. Once a bike is picked up by a commuter, it will only be dropped off
after the duration of the trip time, which is determined by the origin-destination distance divided
by a fixed biking speed of 0.258km per minute.
Substitution Behaviors. In the event that the first-choice station is unavailable at the time of
pick-up or drop-off, a commuter may search for an alternative as follows. To determine the pick-up
substitution, we first assign to the commuter a pair of “exact origin” coordinates, which correspond
to the exact reference point from which the commuter determines the distance to every station.
Examples of such reference points could be the commuter’s home address or workplace. These
coordinates are sampled uniformly in an area surrounding the commuter’s first-choice pick-up sta-
tion. Given that the first choice is unavailable, the commuter will consider all available alternatives
within distance D from his exact origin, where D is sampled from a uniform distribution over [0,1].
If any such alternatives exist, then the commuter will pick the one that has the shortest distance
to the destination station: the distance between the exact origin and the alternative origin station
plus the distance between the origin and destination station is minimum; otherwise, the commuter
will leave the system. Likewise, when dropping off the bike, the commuter will consider any alterna-
tives using another random pair of “exact destination” coordinates as his reference point. However,
unlike in the case of pick-up, we assume that the commuter always drops off the bike to the closest
available alternative regarding its distance from his reference point. By randomizing the reference
point coordinates and maximum search distance D (for pick-up) in this manner, we can generate
a variety of possible commuter types and simulate a complex range of substitution behaviors.
Initial Inventory. We adopt an approach in O’Mahony and Shmoys (2015) to decide the initial
allocation of bikes based on the net balance of pick-up and drop-off demand at individual stations.
Specifically, each station is classified as either consumer, producer or self-balancer, according to
whether it has net inflow, net outflow or approximately equal in and out flow of bikes. The self-
balancers are further divided into high-usage and low-usage stations depending on their combined
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems27
Table 2 Mean Absolute Relative Errors (MAREs) of pick-up and drop-off primary demand estimation under
varying rebalancing intervals (Tr).
Outage Frequency (%) Pick-up Errors (%) Drop-off Errors (%)
Tr (min) Pick-up Drop-off RANK SA FSA MNL RANK SA FSA MNL
90 4.0 0.5 12.0 14.4 13.8 13.0 10.7 11.3 12.8 17.8
120 6.6 1.1 12.6 16.6 16.3 15.6 11.0 12.9 16.1 18.7
150 8.8 1.8 12.4 18.3 18.0 15.6 11.3 14.6 18.4 17.7
180 12.7 2.9 14.2 23.4 22.8 19.6 13.7 19.8 24.9 20.5
210 13.3 3.4 13.2 22.5 21.8 18.4 16.1 22.4 27.2 19.8
240 17.1 4.8 14.3 26.6 26.7 22.3 25.2 34.0 42.8 31.4
pick-up and drop-off demand. Then, at the beginning of each simulation, we stock the stations at
the following levels of their capacities: 10% for producers, 90% for consumers, 30% for low-usage
self balancers and 50% for high-usage self-balancers.
Rebalancing Operations. To modulate the service outage frequencies of our simulated system, we
replenish the stations at regular intervals according to the target inventory levels specified earlier.
This is done by greedily transferring bikes from the most overstocked stations (i.e., with inventory
exceeding the target levels) to the most understocked stations, up to a maximum number of 100
moves in each interval. After every 12 hours or 720 time steps, we simulate a daily reset by restoring
the inventory back to their initial levels. By varying the rebalancing intervals, we can generate a
wide range of service outage conditions that could subsequently impact demand estimation.
7.1.2. MLE Estimation. We solve the MLE problem with the convex-concave procedure in
Algorithm 2. For MNL models, we solve the MLE problem using IPOPT (Wachter and Biegler
2006), a nonlinear optimization solver. To reduce the dimension of the MLE problem, we use
Algorithm 1 to generate a sparse set of types before solving each instance. The substitution distance
D and type length L are selected by cross validation based on minimum mean squared error of
station-wise ridership prediction on the training set over values ranging from 0.5km to 1.0km and
1 to 5, respectively.
7.2. Results
To quantify the errors of demand estimation, we define the Relative Error for a given station as
Relative Error =Estimated Primary Demand−True Primary Demand
True Primary Demand.
The Mean Absolute Relative Error (MARE) is then calculated by averaging the absolute values of
the relative errors over all stations.
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems28
7.2.1. Demand Estimation. Our first set of results pertain to estimation of the pick-up and
drop-off demands based on 60 hours (T = 3600) of simulated data, with service outage frequencies
that vary between 0.5% to 17%. We summarize these results with three key observations:
1. Across the entire range of simulated outage frequencies, the RANK method consistently out-
performs SA, FSA and MNL in terms of MARE. As shown in Table 2, improvements of RANK over
the best-performing benchmarks are most significant when outage frequencies are high, with about
36% and 20% reduction of pick-up and drop-off demand estimation errors, respectively, when the
rebalancing interval is set to 240 minutes. For comparison with the actual Hubway data sets (on
which our simulations are based), the actual stockout frequency of Hubway is 11.5% during the
evening rush hours, which tracks closely with our simulated stockout frequency of 12.7% when the
rebalancing interval is set to 180 minutes.
2. Breaking down the estimation errors by individual stations in Figure 3, we observe that
the stations with the highest demands tend to benefit most from the RANK method in terms of
error reduction over the benchmarks—these are also the stations that are most frequently out of
service but account for disproportionately large shares of the overall ridership. Therefore, improved
estimates at these high-demand stations tend to have the largest impact on operational decision-
making, as we discuss later in the next section.
3. The proposed CCCP procedure converges very fast in practice, with quickly diminishing
objective value improvements after the first few iterations (here, each iteration involves solving
a concave subproblem derived from the main MLE problem using FW method, as described in
Algorithm 2). In Figure 4, we plot the log likelihood objective values as a function of CCCP itera-
tion, which are averaged over 10 independent simulations with identical setup. After only the first
3 iterations, each subsequent iteration increases the objective value by less than 0.001%, which
corresponds to negligible changes in the MLE estimates thereafter. The average times for solv-
ing the MLE instances in our experiments under Tr = 90,120,150,180,210,240 are approximately
3,4,19,24,15,24 minutes, respectively.
7.2.2. Operational Impact. Using the estimated demand from the earlier methods, we apply
a bike allocation model by Shu et al. (2013) to measure the extent to which more accurate demand
estimates can improve operational decisions. This optimization model allows us to determine,
on the basis of estimated OD demand of all station pairs, an initial allocation of bikes in the
network that maximizes ridership over a planning horizon. However, the model operates under
the assumption that commuters will leave the system immediately when their first choices are
unavailable, thus it does not take into account of choice substitution in the allocation. Despite its
limitations, we adopt this optimization model because it is tractable and can be solved optimally
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems29
Figure 3 Comparison of estimated and true pick-up demand for 171 Hubway stations. The results are averaged
over 10 independent simulations with a rebalancing interval of Tr = 180 min.
(a) RANK (b) SA
(c) FSA (d) MNL
by linear programming. Furthermore, Shu et al. (2013) has shown that the optimal value of the
linear program closely matches the expected ridership obtained by simulations with a Poisson
arrival model. After configuring the initial inventory according to this model, we evaluate the actual
ridership in terms of the total number of trips taken over the planning horizon, averaged over 10
independent simulations. Figure 4 shows the results based on different methods of estimating OD
demand under Tr = 180. We observe that allocation decisions that are made on the basis of the
RANK method lead to more trips taken compared to the benchmarks, with 3% increase for the
longest planning horizon.
8. Empirical Studies: The Boston Hubway Data Set
We next present our results of applying the four methods introduced earlier, RANK, SA, FSA
and MNL, to the actual Boston Hubway data set. The data comprises two sets of records: (i) the
individual trips taken by commuters, each defined by an origin, a destination and the associated
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems30
Figure 4 Left: Convergence of the MLE objective value (in log likelihood) as a function of the iteration of the
CCCP procedure. Right: Total ridership achievable when a fleet of 1,000 bikes are allocated according
to the demand estimated by each method.
4 5 6 7Planning Horizon (Hour)
0
500
1000
1500
2000
2500
Tota
l Trip
s
RANKMNLFSASA
pick-up and drop-off times, and (ii) the inventory levels of individual stations, which are updated
every minute. As in the simulation studies, we focus on the peak usage hours between 5pm and
7pm on the weekdays of summer 2016, with a total of 69 days. Since there is no “true primary
demand” with which to compare the results, we instead evaluate the methods by their accuracy
in predicting the ridership at individual stations. More specifically, we first fit a demand model to
the data collected in the first 54 days, and then use it to predict the total number of bike pick-ups
and drop-offs at every station over the same 54 days (in-sample evaluation), as well as also over
the subsequent 15 days (out-of-sample evaluation). For RANK and MNL methods, the predicted
number of pickups (or drop-offs) at a station given an offer set is defined to be its sum of outflow
(inflow) OD rates over the available destinations (origins). For SA and FSA methods, the prediction
is either equal to the estimated pick-up (drop-off) rate if the station is available, or zero otherwise.
To measure the predictive accuracy, we calculate the relative error for each station as
Relative Error =Predicted Ridership−Actual Ridership
Actual Ridership,
where ridership is quantified by either the total number of pick-ups or drop-offs at the station
through the evaluation period. Averaging the absolute values of these relative errors, we obtain
the Mean Absolute Relative Error (MARE) of prediction.
8.1. Results
Our main results are summarized in Table 3, which shows the in-sample and out-of-sample MAREs
of ridership prediction for each method. While the RANK method has a higher in-sample error
than MNL when it comes to pick-up prediction, it generalizes better than MNL out-of-sample
and is nearly on par with the best-performing method, FSA. On the other hand, in terms of
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems31
Table 3 Mean Absolute Relative Errors (MAREs) of in-sample and out-of-sample
ridership predictions for the Hubway data set.
In-Sample Errors (%) Out-of-Sample Errors (%)
RANK SA FSA MNL RANK SA FSA MNL
Pick-up 11.6 22.7 11.5 0.3 24.6 32.6 24.5 25.5
Drop-off 2.6 3.9 24.5 16.4 24.9 25.7 38.6 30.0
Overall 7.1 13.3 18.0 8.4 24.8 29.2 31.6 27.8
drop-off prediction, the RANK method outperforms the rest by significant margins both in-sample
and out-of-sample. In Figure 5, we observe that the RANK method has a drop-off error distribu-
tion that is more concentrated around zero compared to the rest, as indicated by its quartiles,
(−0.141,−0.009,0.119). Meanwhile, the poor drop-off predictive accuracy of FSA and MNL can
be attributed to overestimation biases, as evidenced in their large positive error medians, 0.224
and 0.120, respectively. However, when it comes to pick-up prediction, both methods exhibit much
smaller biases, with quartiles of (−0.243,−0.097,0.041) and (−0.146,−0.003,0.152), respectively;
whereas, RANK and SA have larger underestimation biases, with quartiles of (−0.257,−0.098,0.033)
and (−0.391,−0.227,−0.036), respectively.
Overall, the relatively poor out-of-sample performance of SA and FSA is hardly surprising given
that both methods ignore the effects of choice substitution altogether. As Table 3 shows, taking
into account of these effects in modeling by MNL and RANK can reduce the overall out-of-sample
errors by 5% and 15%, respectively, when compared with the SA method. Furthermore, the 11%
error improvement of RANK over MNL suggests that our proposed rank-based model is better able
to capture the choice behaviors of Hubway commuters overall.
Finally, we complement our empirical studies in this paper with an interactive web-based visual-
ization, which is accessible online at http://web.mit.edu/cygoh/www/hubway2017. A computer
screenshot of this visualization is shown in Figure 6. On the web page, the user can explore the
historical averages of stockout rates and ridership statistics that are derived from our proposed
rank-based model. In particular, we retrospectively estimate the number of rides that were dis-
placed (i.e., substituted by other choices) or lost over the 69-day period. These insights could help
bike-sharing operators better understand the experience of commuters on the ground and highlight
specific stations where the bikes were not adequately stocked.
9. Conclusions and Future Work
We presented a new approach for estimating the primary demand for a bike-sharing service. Our
approach departs from earlier work in that we adopted a rank-based demand model whose complex-
ity can flexibly scale with data, allowing us to infer commuter choice behaviors without imposing
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems32
Figure 5 Out-of-sample relative errors of pick-up (left) and drop-off (right) ridership predictions at individual
stations for the Hubway data set.
0.00.060.120.18 RANK
0.00.060.120.18 SA
0.00.060.120.18 FSA
2.0 1.5 1.0 0.5 0.0 0.5 1.0 1.5 2.00.0
0.060.120.18 MNL
Relative Error
Rela
tive
Freq
uenc
y
0.00.060.120.18 RANK
0.00.060.120.18 SA
0.00.060.120.18 FSA
2.0 1.5 1.0 0.5 0.0 0.5 1.0 1.5 2.00.0
0.060.120.18 MNL
Relative ErrorRe
lativ
e Fr
eque
ncy
Note. The quartiles of pick-up relative errors (left) for RANK, SA, FSA and MNL are (−0.257,−0.098,0.033),
(−0.391,−0.227,−0.036), (−0.243,−0.097,0.041) and (−0.146,−0.003,0.152), respectively. For drop-off prediction
(right), the corresponding error quartiles are (−0.141,−0.009,0.119), (−0.212,−0.047,0.064), (0.034,0.224,0.381) and
(−0.011,0.120,0.287).
parametric assumptions a priori. The proposed model comprises a Poisson process describing the
arrivals of commuters, whose choice behaviors are fully determined their preference rankings over
the stations. We then jointly estimate the arrival rate and probability distribution over preference
rankings by maximum likelihood estimation. To make the estimation problem tractable, we intro-
duced dimensionality reduction techniques that are based on network substitution constraints and
sparsity. In practice, these techniques are able to reduce the parameter dimension by orders of
magnitude, allowing us to scale our model to a city-sized network. We also derived a DC program-
ming method for solving the MLE problem, which reduces it to a sequence of convex optimization
problems with provable convergence guarantees and each convex problem can be efficiently solved
by FW method. Finally, we demonstrated the scalability and effectiveness of our approach through
a simulation study. Our results showed that the proposed methods are able to overcome the estima-
tion biases due to demand censoring and choice substitution, leading to increased ridership when
used as the basis for inventory planning.
Although the focus of this paper is on bike-sharing, our proposed demand model and estimation
methods are applicable more generally. For example, the OD demand model is appropriate for
other systems that share similar characteristics as a bike-sharing service, such as one-way car rental
services that allow pick-ups and drop-offs at designated parking lots. In other domains such as
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems33
Figure 6 An interactive visualization of the station stockout distribution of Hubway and its estimated impact on
ridership. Available on http://web.mit.edu/cygoh/www/hubway2017.
retail, our proposed substitution graph is also applicable if products that differ in certain quality-
defining attributes are very unlikely to be substitutable by each other. Last but not least, we tackled
the core computational issues in estimating rank-based choice models—following the work of Farias
et al. (2013) and van Ryzin and Vulcano (2014)—by proposing a new iterative MLE procedure
and a sparsity-based type enumeration algorithm that achieved good empirical performance.
There are several issues we did not address in this paper that merit consideration in future work.
First, our model assumed a static commuter arrival rate without taking into account of covariates
such as weather and seasonality factors. Combining our approach with other streams of work in
structural demand modeling could yield a model that adapts better to changing scenarios. Second,
we focused on the demand of dock-based bike-sharing service in this paper. With the increasing
adoption of dockless bike-sharing systems, which allow commuters to pick up and drop off bikes
anywhere, it is worth exploring an extension of our approach to this setting. Finally, developing
tailored operation models for capacity planning and rebalancing that takes into account of choice
substitutions could be a natural next step on top of this work.
Acknowledgments
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems34
The authors would like to thank Boston Hubway Bikes (now Blue Bikes Boston) for providing all the data
sets used in this study and for their clarifications of the data specifications. We also express our gratitude
to Weijun Ding, who was affiliated with the Georgia Institute of Technology at the time, for contributing to
early discussions of this research project.
References
Abdallah, Tarek, Gustavo Vulcano. 2017. Demand estimation under the multinomial logit model from sales
transaction data. Working paper, Columbia University, New York City, NY.
Ben-Akiva, Moshe E, Steven R Lerman, Steven R Lerman. 1985. Discrete Choice Analysis: Theory and
Application to Travel Demand , vol. 9. MIT Press.
Chemla, Daniel, Frederic Meunier, Roberto Wolfler Calvo. 2013. Bike Sharing Systems: Solving the Static
Rebalancing Problem. Discrete Optimization 10(2) 120–146.
Clarkson, Kenneth L. 2010. Coresets, sparse greedy approximation, and the frank-wolfe algorithm. ACM
Transactions on Algorithms (TALG) 6(4) 63.
El-Assi, Wafic, Mohamed Salah Mahmoud, Khandker Nurul Habib. 2017. Effects of Built Environment And
Weather on Bike Sharing Demand: A Station Level Analysis of Commercial Bike Sharing In Toronto.
Transportation 44(3) 589–613.
Farias, Vivek F, Srikanth Jagabathula, Devavrat Shah. 2013. A Nonparametric Approach to Modeling Choice
with Limited Data. Management Science 59(2) 305–322.
Frank, Marguerite, Philip Wolfe. 1956. An algorithm for quadratic programming. Naval research logistics
quarterly 3(1-2) 95–110.
Freund, Daniel, Shane G Henderson, David B Shmoys. 2017. Minimizing multimodular functions and
allocating capacity in bike-sharing systems. International Conference on Integer Programming and
Combinatorial Optimization. Springer, 186–198.
Freund, Robert M, Paul Grigas. 2016. New analysis and results for the frank–wolfe method. Mathematical
Programming 155(1-2) 199–230.
Ghosh, Supriyo, Pradeep Varakantham, Yossiri Adulyasak, Patrick Jaillet. 2017. Dynamic repositioning to
reduce lost demand in bike sharing systems. Journal of Artificial Intelligence Research 58 387–430.
Haensel, Alwin, Ger Koole, Jeroen Erdman. 2011. Estimating Unconstrained Customer Choice Set Demand:
A Case Study on Airline Reservation Data. Journal of Choice Modelling 4(3) 75–87.
Jagabathula, Srikanth, Paat Rusmevichientong. 2016. The limit of rationality in choice modeling: Formula-
tion, computation, and implications. Management Science .
Jaggi, Martin. 2011. Sparse convex optimization methods for machine learning. Ph.D. thesis, ETH Zurich.
Kabra, Ashish, Elena Belavina, Karan Girotra. 2016. Bike-share Systems: Accessibility and Availability.
Working paper, INSEAD, Fontainebleau, France.
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems35
Le Thi, Hoai An, Tao Pham Dinh. 2018. DC programming and DCA: thirty years of developments. Mathe-
matical Programming 1–64.
Lipp, Thomas, Stephen Boyd. 2016. Variations and Extension of the Convex–concave Procedure. Optimiza-
tion and Engineering 17(2) 263–287.
O’Mahony, Eoin, David B Shmoys. 2015. Data Analysis and Optimization for (Citi) Bike Sharing. Proceedings
of the Twenty-Ninth AAAI Conference on Artificial Intelligence. 687–694.
Pendem, V. Deshpande, P. 2016. Maximizing Ridership in Bike-sharing Systems Using Empirical Data and
Stochastic Models. Working paper, University of North Carolina, Chapel Hill, NC.
Raviv, Tal, Michal Tzur, Iris A Forma. 2013. Static Repositioning in a Bike-sharing System: Models and
Solution Approaches. EURO Journal on Transportation and Logistics 2(3) 187–229.
Rixey, R. 2013. Station-level forecasting of bike sharing ridership: Station network effects in three u.s.
systems. Transportation Research Record: Journal of the Transportation Research Board (2387) 46–55.
Schuijbroek, Jasper, Robert C Hampshire, W-J Van Hoeve. 2017. Inventory rebalancing and vehicle routing
in bike sharing systems. European Journal of Operational Research 257(3) 992–1004.
Shapiro, Alexander, Darinka Dentcheva, Andrzej Ruszczynski. 2009. Lectures On Stochastic Programming:
Modeling and Theory . SIAM.
Shu, Jia, Mabel C Chou, Qizhang Liu, Chung-Piaw Teo, I-Lin Wang. 2013. Models for Effective Deployment
and Redistribution of Bicycles Within Public Bicycle-Sharing Systems. Operations Research 61(6)
1346–1359.
Singhvi, Divya, Somya Singhvi, Peter I Frazier, Shane G Henderson, Eoin O’Mahony, David B Shmoys,
Dawn B Woodard. 2015. Predicting Bike Usage for New York City’s Bike Sharing System. AAAI
Workshop On Computational Sustainability .
Singla, Adish, Marco Santoni, Gabor Bartok, Pratik Mukerji, Moritz Meenen, Andreas Krause. 2015. Incen-
tivizing Users for Balancing Bike Sharing Systems. Proceedings of the Twenty-Ninth AAAI Conference
on Artificial Intelligence. 723–729.
Sriperumbudur, Bharath K. 2009. On the Convergence of the Concave-Convex Procedure. Advances in
Neural Information Processing Systems 1759–1767.
Train, Kenneth E. 2009. Discrete Choice Methods with Simulation. Cambridge University Press.
van Ryzin, Garrett, Gustavo Vulcano. 2014. A Market Discovery Algorithm to Estimate a General Class of
Nonparametric Choice Models. Management Science 61(2) 281–300.
van Ryzin, Garrett, Gustavo Vulcano. 2017. Techical Note–An Expectation-Maximization Method to Esti-
mate a Rank-Based Choice Model of Demand. Operations Research 65(2) 396–407.
Vulcano, Gustavo, Garrett Van Ryzin, Wassim Chaar. 2010. Om practicechoice-based revenue management:
An empirical study of estimation and optimization. Manufacturing & Service Operations Management
12(3) 371–392.
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems36
Vulcano, Gustavo, Garrett Van Ryzin, Richard Ratliff. 2012. Estimating Primary Demand for Substitutable
Products from Sales Transaction Data. Operations Research 60(2) 313–334.
Wachter, Andreas, Lorenz T Biegler. 2006. On the implementation of an interior-point filter line-search
algorithm for large-scale nonlinear programming. Mathematical Programming 106(1) 25–57.
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems37
10. Appendices
10.1. Appendix A: Proofs
10.1.1. Proof of Proposition 1
Proof. The optimality of λ follows directly from strict concavity and first-order optimality
conditions. To derive the optimality conditions for x, let us denote yj ,∑
k∈Mj(N ) xk for all j =
1, . . . ,N . Since∑N
j=1 yj =∑N
j=1 P(j | N ) = 1, we can rewrite the second maximization problem in
(2) in terms of y, as follows
maxy∈∆N
{M
N∑j=1
∑T
t=1mtj
Mlog yj
}.
By Gibbs’ inequality, the optimal solution of the above problem is given by yj =∑Tt=1mjtM
for all j.
Further, a feasible x that corresponds to y can be constructed as follows: for all k = 1, . . . ,K, let
xk = yj if σk = (j) (a type comprising only one choice j), and zero otherwise. �
10.1.2. Proof of Proposition 2
Proof. For a fixed x(n−1), the optimal solution for (11) is given by
λ(n) =M∑T
t=1
(1−
∑k∈M0(St)
xk
) =M∑T
t=1 ρ(St;x(n−1)),
as derived in (5). To solve for x(n) in (12), we substitute λ(n) into the likelihood function in (1) and
then obtain
arg maxx∈∆K
−M∑T
t=1 ρ(St;x)∑T
t=1 ρ(St;x(n−1))+M log
M∑T
t=1 ρ(St;x(n−1))+
T∑t=1
∑j∈St
mtj log
∑k∈Mj(St)
xk
=arg max
x∈∆K−M∑T
t=1 ρ(St;x)∑T
t=1 ρ(St;x(n−1))+
T∑t=1
∑j∈St
mtj log
∑k∈Mj(St)
xk
,
where the equality is due to the fact that the term M log(M/∑T
t=1 ρ(St;x(n−1))) does not depend
on x. Since this maximization problem is identical to the CCCP sub-problem in (10), we conclude
by induction that any sequence {xn}n≥1 generated by the above procedure is optimal for (10) given
a common initialization point x(0). �
10.1.3. Proof of Theorem 1
Proof. The relation ΣG ⊆ΣG clearly holds because every type generated by Algorithm 1 satisfies
the network substitution constraints. Denote L∗ as the optimal value of problem (3), and L∗ as
the optimal value of problem (4). We first show L∗ ≥ L∗. Denote x∗, λ∗ as an optimal solution of
problem (4). We construct another solution λ∗ = λ∗ and x∗ = [x∗,0] where x∗i = x∗i ,∀i ∈ ΣG and
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems38
x∗i = 0,∀i∈Σ\ ΣG. Clearly (x∗, λ∗) is a feasible solution for problem (3) and has objective value of
L∗. Thus the optimal value of (3), L∗, should be at least L∗.We then show L∗ ≤ L∗. To prove that, we provide a method to convert any optimal solution
(x∗, λ∗) of problem (3) to a feasible solution of problem (4) with the same objective value. Let’s
consider a type σ1 = (j1, . . . , jk)∈ΣG that is not in ΣG. For all t= 1, . . . , T and all i∈ St, the type
σ1 must be compatible with either choice i or not using the service (choice 0) given offer set St.
We consider the following three disjoint cases:
Case 1: If σ1 is compatible with not using the service for any offer set St, t∈ {1, . . . , T}, it must be
generated within steps (9) to (11) in Algorithm 1 at time period t. This contradicts with σ1 /∈ ΣG.
Case 2: For the remaining types, if there exists some s ∈ {1, . . . , k} and a corresponding t ∈{1, . . . , T}, such that js ∈ Pt and the type σ1 is compatible with the choice js given offer set St,
i.e., σ1 ∈ Γjs(St) , {σk : k ∈Mjs(St)}. Let us choose the largest possible integer s ≤ k for which
this condition holds, and construct another type σ2 = (j1, . . . , js). Roughly speaking, our goal is
to show that (i) σ2 ∈ ΣG, and (ii) Aσ1=Aσ2
. This means that for a solution x′ constructed from
x with x′σ1= 0, x′σ2
= xσ2+ xσ1
and x′σ = xσ,∀σ ∈Σ : σ 6= σ1, σ2 has the same objective value as x.
Clearly, σ2 ∈ ΣG because it is enumerated in the t-th iteration of the algorithm, where we have
j1 ∈ (N \ St)∩ δG(js), {j2, . . . , js−1} ⊂ (N \ St)∩ δG(j1) and σ2 ∈Π{j2,...,js−1}(j1, js). We now show
that Aσ1=Aσ2
. Recall that Aσ comprises individual rows that, for every t′ and every i∈ Pt′ ∪{0},takes value 1 if σ ∈ Γi(Pt′), and 0 other. Thus it suffices to prove that σ1 ∈ Γi(St′) ⇐⇒ σ2 ∈ Γi(St′)
for every t′ and every i ∈ Pt′ ∪ {0}. Since we already rule out the possibility of case 1, we have
σ1, σ2 /∈ Γ0(St′). We thus only focus on proving the equivalence for every i ∈ Pt′ and every t′. One
direction of this equivalence is trivial: if σ2 ∈ Γi(St′) for some i ∈ Pt′ , then i must be the highest
ranked available alternative among (j1, . . . , js) given St′ . Since σ1 and σ2 have identical rankings
of the first s choices, we also have σ1 ∈ Γi(S′t). For the other direction, observe that if σ1 ∈ Γi(S
′t),
then i ∈ {j1, . . . , js} holds because of how we have chosen s. Again, since the first s choices of σ1
and σ2 are identically ranked, we also have σ2 ∈ Γi(S′t).
Case 3: For the remaining types, σ1 must be compatible with some unobserved choice i∈ St \Ptgiven offer set St for some t∈ {1, . . . , T}. Since σ1 does not belong to cases 1 and 2, we have Aσ1
= 0.
According to Lemma 1, any optimal solution x∗ of problem (3) has x∗σ1= 0.
These analyses suggest that for any optimal solution (x∗, λ∗) of problem (3) of which x∗σ > 0
for some σ ∈ΣG \ ΣG, σ must belong to case 2. Applying the transformation method we describe
in case 2, we can get a solution x∗∗ of which x∗∗σ = 0,∀σ ∈ ΣG \ ΣG. x∗ and x∗∗ also have the
same objective values. Now consider the sub-vector x∗∗ΣG
by only selecting elements corresponding
to types in ΣG. It is clear that (x∗∗ΣG, λ∗) is a feasible solution of problem (4) and has the same
objective value. This leads to our conclusion that L∗ ≤ L∗.We thus have L∗ = L∗, which completes the proof. �
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems39
10.1.4. Proof of Corollary 1
Proof. We start by considering two compatibility matrices, A∈ {0,1}(T+M)×|ΣGo | for origin types
and B ∈ {0,1}(T+M)×|ΣGd | for destination types where T is the number of time periods, M is the
number of observed trips, |ΣGo | and |ΣG
d | are the number of types considered for origin and destina-
tion station rankings. yo =Au specifies the linear relationship between origin station choice vector
yo and the origin station ranking probabilities u, with yoStok,ik
=∑
l∈Moik
(Stok
) ul for all observed trip
origin stations {ik}k=1,...,M and yoSt,0 =∑
l∈Mo0(St)
ul,∀t= 1, . . . , T for all choices corresponding to not
using the systems. Similarly, yd =Bv specifies the linear relationship between destination station
choice vector yd and the destination station ranking probabilities v, with ydQtdk,jk
=∑
m∈Mdjk
(Qtdk
) vm
for all observed trip destination stations {jk}k=1,...,M . However, for all choices corresponding to not
using the systems, we have ydQt,0 =∑
m∈ΣGdvm,∀t= 1, . . . , T . This is counter-intuitive at first glance.
The idea is that commuter’s leaving decision only depends on the availability of origin stations.
All destination ranking types are thus specified to be compatible with not using the system.
We now consider another compatibility matrix C ∈ {0,1}(T+M)×|Σo||Σd| constructed by the Had-
mard product (element-wise product) of all pairs of columns in A and B:
C = [Al ◦Bm]l∈{1,...,|Σo|},m∈{1,...,|Σd|} =
A1,lB1,m
A2,lB2,m
...AT+M,lBT+M,m
l∈{1,...,|Σo|},m∈{1,...,|Σd|}
where Al and Bm are the lth and mth columns of matrix A and B respectively. We can now check
that the joint origin-destination choice vector yod(Stok,Qtdk
),(ik,jk) =∑
l∈Moik
(Stok
)
∑m∈Md
jk(Qtdk
) xlm and
yod(St,Qt),0=∑
m∈Mdjk
(Qtdk
) xlm can be specified by yod =Cx compactly.
Based on Lemma 1 we know that identical columns and zero columns in matrix C are redundant
for maximizing the likelihood. Further note that the following two facts hold:
• For any two identical columns in A, Ai and Aj, the Hadmard product with any column in B
are identical, i.e. Ai ◦Bk =Aj ◦Bk,∀k ∈ {1, . . . , |Σd|}. Similarly, for any two identical columns in B,
Bi and Bj, the Hadmard product with any column in A are identical, i.e., Bi ◦Ak =Bj ◦Ak,∀k ∈
{1, . . . , |Σo|}.
• For any zero column in A (B), its Hadmard product with any column in B (A) is still a zero
column.
Together, these two bullet points reveal that for any origin type σo ∈ΣGo \ ΣG
o and any destina-
tion type σd ∈ ΣGd , the OD type variable corresponding to the paired type (σo, σd) is redundant.
Similarly, for any destination type σd ∈ΣGd \ ΣG
d and any origin type σo ∈ΣGo , the OD type variable
corresponding to origin type σo and destination type σd is also redundant. This means that we can
restrict ourselves to types in ΣGo × ΣG
d . Note from the description of Algorithm 3, we can see that
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems40
ΣGod = SparseEnumeratePaired(G,S,Q,T , ΣG
o , ΣGd )∈ ΣG
o × ΣGd . What is left here is to show any
(σo, σd)∈ (ΣGo × ΣG
d )\ ΣGod can be effectively dropped. This is true because such (σo, σd) can neither
explain pickups and dropoffs nor not using the service. Thus its corresponding column in matrix
C is a zero column. This completes the proof. �
10.1.5. Proof of Corollary 2
Proof. We first show that for any two destination types σdl ⊂ σdm, we have Bl ≤Bm, where B
is the destination type compatibility matrix. To see that, for row r corresponding to not using
the service under available destination stations Qt, it is by definition that Br,l =Br,m = 1; for row
r corresponding to return bikes at station jk under available destination stations Qtdk, we have
Br,l ≤Br,m because if σdl ∈Mdjk
(Qtdk) then σdm ∈Md
jk(Qtd
k) but not vice versa.
We further know that for any column Ak in the origin compatibility matrix A, Ak ◦Bl ≤Ak ◦Bmgiven Bl ≤ Bm. Applying Lemma 1, we know that all type variables corresponding to Ak ◦Bl is
zero in the optimal solution. Combined with Corollary 1, this completes the proof. �
10.1.6. Proof of Theorem 3
Proof. We first show that under the observability conditions, the parameters λ∗ and y∗ can be
uniquely identified by maximizing the expected log likelihood function under a fixed time period
T . That is, any optimal solution λ and x of the problem
maxλ≥0,x∈∆K
1
TE[LT (λ,x)] (14)
necessarily satisfies λ = λ∗ and P(j | S; x) = y∗S,j,∀S ∈ 2N ,∀j ∈ S. To show this, let us denote
2N = {S1, . . . , SL} and assume that T is sufficiently large such that each offer set has been observed
at least once, i.e., for each l = 1, . . . ,L, we have 1T
∑T
t=1 1(St = Sl) , αSl > 0 (here, αSl is not
necessarily equal to αSl for any finite T ). We have
1
TE[LT (λ,x)]
=− λT
T∑t=1
ρ(St;x) +1
T
T∑t=1
E[Mt] logλ+1
T
T∑t=1
∑j∈St
E[mtj] logP(j | St;x)
=− λT
T∑t=1
ρ(St;x) +1
T
T∑t=1
λ∗∑j∈St
y∗St,j logλ+1
T
T∑t=1
∑j∈St
λ∗y∗St,j logP(j | St;x)
=−L∑l=1
αSlλρ(Sl;x) +L∑l=1
αSlλ∗∑j∈Sl
y∗Sl,j logλ+L∑l=1
αSlλ∗∑j∈Sl
y∗Sl,j logP(j | Sl;x).
(15)
Let us define λ∗Sl, λ∗
∑j∈Sl
y∗Sl,j
to be the true arrival rate to the system given the offer set Sl, and
λSl , λρ(Sl;x) to be its corresponding estimate based on variables λ and x. Also, let us define the
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems41
variables ySl,j , P(j | Sl;x) for every l and j ∈ Sl. We can rewrite (15) as a function of the variables
{λSl}l and {ySl,j}l,j as follows,
1
TE[LT (λ,y)]
=−L∑l=1
αSlλSl +L∑l=1
αSlλ∗Sl
logλSl∑
j′∈SlySl,j′
+L∑l=1
αSlλ∗Sl
∑j∈Sl
y∗Sl,j∑
j′∈Sly∗Sl,j′log ySl,j
=L∑l=1
αSl(−λSl +λ∗Sl logλSl) +L∑l=1
αSlλ∗Sl
∑j∈Sl
y∗Sl,j∑
j′∈Sly∗Sl,j′log
ySl,j∑j′∈Sl
ySl,j′.
In the above equation, we can interpret p∗l,j ,y∗Sl,j∑
j′∈Sly∗Sl,j′
as the conditional probability that an
arriving commuter chooses station l from the offer set Sl given that she stays in the system, and
pl,j ,ySl,j∑
j′∈SlySl,j
′to be the corresponding estimate based on y. Given our assumption that y∗
Sl,j> 0,
we have p∗l,j > 0. Note that for any feasible choice probability vector y, we must have pl,j ≥ 0,∀l, j
and∑
j∈Slpl,j = 1,∀l satisfied. We then consider the optimization problem over {λSl}l and {pl,j}l,j
below:
maximizeL∑l=1
αSl(−λSl +λ∗Sl logλSl,j) +L∑l=1
αSlλ∗Sl
∑j∈Sl
p∗l,j log pl,j
subject to∑j∈Sl
pl,j = 1, ∀l= 1, . . . ,L
pl,j ≥ 0, ∀l= 1, . . . ,L,∀j ∈ Sl
λSl ≥ 0, ∀l= 1, . . . ,L
(16)
By Gibbs’ inequality, it has a unique optimal solution λSl = λ∗Sl,∀l and pl,j = p∗l,j,∀l, j. Now, observe
that the original MLE problem in (14) is equivalent to (16), but with auxiliary variables λ,x and
y that are subject to additional constraints:
λSl = λ∑j∈Sl
ySl,j, ∀l= 1, . . . ,L
pl,j =ySl,j∑
j′∈SlySl,j′
, ∀l= 1, . . . ,L,∀j ∈ Sl
y=Ax
x∈∆K .
(17)
Note that since ySl,j appears for all l ∈ {1, . . . ,L} and j ∈ Sl in this formulation, we can simply
replace ySl,0 with 1−∑
j∈SlySl,j, and matrix A does not need to contain rows corresponding to not
using the system for each offer set. To be specific, A only specifies the linear relationships between
y and the ranking probabilities x, with ySl,j =∑
k∈Mj(Sl)xk holding for each l and j ∈ Sl.
Let us fix λSl = λ∗Sl
and pl,j = p∗l,j on the LHS of (17). Clearly, the above equations are satisfied if
we assign λ= λ∗, ySl,j = y∗Sl,j
,∀l, j and any x∈∆K satisfying Ax= y∗. Since we can find a feasible
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems42
variable assignment (λ∗, y∗) for the above equations that results in the same objective value as the
relaxed problem in (16), it follows that the assignment must be optimal. We next show that (λ∗, y∗)
is the unique assignment that corresponds to the optimal objective value. Given that we observe
every possible offer set, including the complete set N , we obtain from the first set of equations in
(17) that λ∗N = λ∗ = λ∑
j∈N yN ,j = λ, which uniquely determines λ. As a result, the first two set
of equations (with λSl = λ∗Sl
, pl,j = p∗l,j and λ= λ∗ now fixed) in (17) become:
λ∗∑j∈Sl
y∗Sl, = λ∗∑j∈Sl
ySl,j, ∀l= 1, . . . ,L
y∗Sl,j∑
j′∈Sly∗Sl,j′
=ySl,j∑
j′∈SlySl,j′
, ∀l= 1, . . . ,L,∀j ∈ Sl(18)
Since we have assumed that all possible offer sets are observable and y∗ > 0, the above set of
equations are uniquely determined by y= y∗.
Having proven that maximizing the expected log likelihood function in (14) results in an optimal
solution that corresponds to the true parameters λ∗ and y∗, we only have to show that with
probability 1, it is equivalent to maximizing the empirical log likelihood function (in the limit
T →∞),
limT→∞
maxλ,y
1
TLT (λ,y)≡max
λ,ylimT→∞
1
TE[LT (λ,y)],
where the equivalence above holds in the sense that both optimization problems have the same
optimal objective value and the same set of optimal solutions in the limit. Indeed, this equivalence
can be proven to be correct by checking that the conditions for strong uniform convergence holds
(see Theorem 5.3 from Chapter 5 of Shapiro et al. (2009)). The most critical step of the proof
is to find a compact set C where E[LT (λ,x)] is finite valued for x ∈ C and the optimal solution
of E[LT (λ,y)] is contained in C and with probability one, for T large enough, optimal solution
of LT (λ,y) is also contained in C. Note that the difficulty here is that C cannot be the original
feasible region because any y∗Sl,j = 0 makes the likelihood −∞. Here we provide a proof which shows
that we can find a constant ε such that for sufficiently large T , optimal solutions of LT (λ,y) can
be restricted to ySl,j ≥ ε. With a change of variable similar to (6), we have the following sample
likelihood function only in variable y.
LT (y) =
(L∑l=1
∑j∈Sl
mlj log(ySl,j)
)−M log
(L∑l=1
T αSl
∑j∈Sl
ySl,j
),
where mlj =∑T
t=1mtj1(St = Sl).
Now, thinking about a feasible solution y0 = Ax constructed by a uniform probability vector
xi = 1/K,∀i= 1, . . . ,K. Let y∗ be the optimal solution, we then have the following,
LT (y)≥LT (y0)
Goh, Yan, and Jaillet: Estimating Primary Demand in Bike-sharing Systems43
=
(L∑l=1
∑j∈Sl
mlj log
(|Mj(Sl)|
K
))−M log
(L∑l=1
T αSl
∑j∈Sl
|Mj(Sl)|K
)
≥L∑l=1
∑j∈Sl
mlj log
(1
K
)−M log(T ) =M log
(1
K
)−M log(T )
Suppose minl∈{1,...,L},j∈Sl y∗Sl,j
= ε, we have the following,
LT (y∗) =
(L∑l=1
∑j∈Sl
mlj log(ySl,j)
)−M log
(L∑l=1
T αSl
∑j∈Sl
ySl,j
)
≤mlj log(ε)−M log(L∑l=1
T αSl1(Sl =N )∑j∈Sl
ySl,j)
≤mlj log(ε)−M log(T αN∑j∈N
yN,j)
=mlj log(ε)−M log(T αN)
In order to have LT (y∗)≥LT (y),∀y, the following needs to be hold,
mlj log(ε)−M log(T αN )≥M log
(1
K
)−M log(T )
log(ε)≥log( 1
K) + log(αN )mljM
Note that as T →∞, αN → αN and (mlj/M)→ αSly∗Sl,j
. This means for any ε1 > 0 such that
αN − ε1 > 0 and ε2 > 0, there exists sufficiently large T we have
log(ε)≥ log(1/K) + log(αN − ε1)
αSly∗Sl,j
+ ε2
We thus identify such compact set C as exp
(log(1/K)+log(αN−ε1)
αSly∗Sl,j
+ε2
)≤ ySl,j ≤ 1, ∀l ∈ {1, . . . ,L}, j ∈
Sl. To avoid tedious presentation, we omit the details of checking the rest of the regularity conditions
for strong uniform convergence. �