Learning-based framework for transit assignment modeling under information provision
Mohamed Wahba • Amer Shalaby
Published online: 26 November 2013. © Springer Science+Business Media New York 2013
Abstract The modeling of service dynamics has been the focus of recent developments
in the field of transit assignment modeling. The emerging focus on dynamic service
modeling requires a corresponding shift in transit demand modeling to represent appro-
priately the dynamic behaviour of passengers and their responses to Intelligent Trans-
portation Systems technologies. This paper presents the theoretical development of a
departure time and transit path choice model based on the Markovian Decision Process.
This model is the core of the MIcrosimulation Learning-based Approach to TRansit
Assignment. Passengers, while traveling, move to different locations in the transit network
at different points in time (e.g. at stop, on board), representing a stochastic process. This
stochastic process is partly dependent on the transit service performance and partly con-
trolled by the transit rider’s trip choices. This can be analyzed as a Markovian Decision
Process, in which actions are rewarded and hence passengers' optimal policies for maximizing the trip utility can be estimated. The proposed model is classified as a boundedly rational model, with a constant utility term and a stochastic choice rule. The model is appropriate for modeling information provision since it distinguishes between an individual's experience with the service performance and information provided about system dynamics.
Keywords Travel choice · Markovian decision process · Learning · Information provision
Introduction
This paper presents the theoretical development of a departure time and transit path choice
model based on the Markovian Decision Process. This model is the core of MILATRAS—
M. Wahba (corresponding author) · A. Shalaby, Civil Engineering Department, University of Toronto, 35 St. George Street, Toronto, ON M5S 1A4, Canada; e-mail: [email protected]
Transportation (2014) 41:397–417. DOI 10.1007/s11116-013-9510-5
MIcrosimulation Learning-based Approach to TRansit ASsignment (Wahba 2008; Wahba
and Shalaby 2009). The proposed model considers multiple dimensions of the transit path
choice problem; such dimensions were either simplified (e.g. stop choice) or ignored (e.g.
departure time) in most previous approaches. The developed assignment procedure is
capable of modeling the day-to-day and within-day dynamics of the transit service, as well
as passengers’ responses under information provision.
Nuzzolo et al. (2001) presented a dynamic stochastic path choice model which con-
siders variations in transit services and passengers’ learning and adaptation. The model is
called ‘‘doubly dynamic’’ as it accounts for within-day and day-to-day variations. Pas-
sengers are assumed to make a discrete choice of trip decisions based on travel disutility
and the desired arrival time at destination. Nuzzolo et al. (2003) report on the theoretical
foundation of the schedule-based approach to modeling travelers’ behaviour in transit
networks. In the assignment procedure, the path choice model employs the Logit/Probit
formulations to calculate the choice probabilities. The run-choice is varied in both within-
day and day-to-day assignment based on an exponential updating mechanism of run
attributes. This is not considered a learning-driven approach as the learning process does
not follow an explicit learning algorithm, and it is done on an aggregate level using a
common pool of information that is accessed by all travelers.
Recently, traffic assignment procedures implemented learning-based models which
have been shown to result in different and more realistic assignments relative to con-
ventional methods (Nakayama et al. 1999; Arentze and Timmermans 2003; Ettema et al.
2005). With an agent-based representation, individual travelers are explicitly modeled as
cognitive agents. This has led to the shift to an agent-based modeling framework that has
been successfully and widely applied in activity-based models (Timmermans et al.
2003; Salvini and Miller 2003; Roorda and Miller 2006) and in traffic assignment models
with information provision (de Palma and Marchal 2002; Rossetti and Liu 2005; Ettema
et al. 2005). As such, the next generation of dynamic transit assignment algorithms should
consider adopting learning-based and multi-agent microsimulation concepts for consis-
tency with emerging traffic assignment models and for integration with state-of-the-art
activity-based modeling frameworks. In response, the authors developed MILATRAS
(Wahba 2008; Wahba and Shalaby 2009) for the modeling of day-to-day and within-day
dynamics of the transit assignment problem.
The proposed transit path choice model within MILATRAS is inspired by the non-
equilibrium framework for assignment procedures proposed by Cascetta and Cantarella
(1991) and the agent-based, learning-driven path choice model presented by Ettema et al.
(2005). It applies similar concepts to the transit assignment problem and passenger
behavioural modeling, taking into consideration the distinctive features of public transport
networks and transit rider travel behaviour.
Cascetta and Cantarella (1991) addressed the lack of explicit modeling of day-to-day
and within-day variations in supply and demand within static equilibrium procedures
where a fixed-point solution is searched. They argued that the equilibrium framework is not
structurally compatible with observed phenomena, such as habitual behaviour, random
variations in demand and network conditions, and transient states of network conditions
following modifications. They proposed a process model that generates fixed-point or
steady-state arc-flows (approximating the mean of the distribution of the corresponding
stochastic process model and comparable to Stochastic User Equilibrium outputs) with the
explicit representation of day-to-day adjustments in travelers’ decisions and within-day
variations of demand. The evolution of the path-flows in successive days is then modeled
as a Markovian stochastic process. Cantarella and Cascetta (1995) proposed conditions for
the existence and uniqueness of stationary probability distributions for the stochastic
process model. The model, however, did not consider en-route replanning due to variations
in network conditions or information provision. Examples of research efforts in this
direction include Davis and Nihan (1993), Watling (1996), and Hazelton and Watling
(2004).
Ettema et al. (2005) represent learning and experience at the individual level for the
auto-assignment problem. They promote the use of microsimulation methods to model the
system-level phenomena, such as congestion, based on individual-level choices. They used
a microsimulation platform to account for the travel time uncertainties on departure time
choices, thus allowing for the modeling of day-to-day adjustments of travelers’ behaviour.
They used mental models to represent traveler’s experience of network conditions.
Travelers are assumed to be decision-makers who decide on the trip departure time from
origin, for a fixed route. The study, therefore, did not consider other en-route trip choices,
e.g. route choice.
For a more comprehensive exposition of the methodological developments of transit assignment models over the past few decades, see Wahba (2008).
Markovian decision process
Markov chains and the ergodic theory
A Markov Chain is an ordered sequence of system state occurrences from a stochastic
process, whose system variables satisfy the Markov property. The Markov property means
that the conditional probability distribution of future states of the process (i.e. values of the
random variable at future time instances), given the present state and all past states,
depends only upon the present state and not on any past state. The present state of the
system is sufficient and necessary to predict the future state of the process. This is
sometimes referred to as the ‘memoryless property’ of the Markov process.
The Ergodic Theory provides the foundation for studying stochastic processes that have
the Markovian property. It establishes the conditions under which a Markov Chain can be
analysed to determine its transition probabilities and steady state behaviour. The Ergodic
Theory states that if a Markov Chain is ergodic, then a unique steady state distribution
exists and is independent of the initial conditions (Petersen 1990). A Markov Chain is
called ‘ergodic’ if, and only if, the chain is irreducible, positive-recurrent and aperiodic.
The positive-recurrent and aperiodic characteristics guarantee that the steady state distri-
bution exists, while irreducibility guarantees that the steady state distribution is unique and
independent of initial conditions.
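To make the steady-state claim concrete, the following sketch (a hypothetical three-state chain, not from the paper) computes the stationary distribution by power iteration; for an ergodic chain, the result is the same for any starting distribution:

```python
# Power iteration to the steady-state distribution of a small ergodic
# Markov chain. The 3-state transition matrix is a made-up example:
# row i gives the probabilities of moving from state i to each state.
P = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
]

def steady_state(P, tol=1e-12, max_iter=10_000):
    """Iterate pi <- pi P until the distribution stops changing."""
    n = len(P)
    pi = [1.0 / n] * n  # any initial distribution works for an ergodic chain
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi

pi = steady_state(P)
print([round(x, 4) for x in pi])  # sums to 1, independent of the start point
```

The chain above is irreducible (every entry is positive) and aperiodic (self-loops exist), so the iteration converges to the unique stationary distribution.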
Actions, rewards and decision making in Markov chains
In Markov processes, the outcome of the stochastic process is entirely random; it is not controlled but merely observed. Situations where the underlying stochastic process has the
Markov property and where outcomes are partly random and partly under the control of a
decision maker are studied using Markov Decision Processes (MDPs).
An MDP is a discrete-time stochastic process, where a decision-maker partly controls the
transition probabilities through deliberate choices at discrete points in time. In an MDP, in
addition to the system state representation, at each state there are several actions from
which the decision maker must choose. This decision-making behaviour, consequently,
influences the transition probability from one system state to another. Associated with the
decision-making capability is the concept of a reward. Rewards can be earned for each
state visited, or for each state-action pair implemented. The decision-maker in an MDP has
a goal of optimizing some cumulative function of the rewards, implying that the decision-
maker acts rationally. The main difference between Markov Processes and Markov
Decision Processes is the addition of actions (allowing choices) and rewards (giving
motivation).
If the reward and return functions are specified, then the MDP becomes a (stochastic) optimization problem in which the decision variables are the transition probabilities and the objective is the maximization of the expected return. When the underlying
Markov Process is ergodic, a unique steady-state distribution exists. This means that a unique state-transition probability matrix from state i to state j, $[\Pr_{ij}]$, exists, which ensures that the optimization problem formulated above has a unique solution (i.e. a global optimum). A solution, i.e. a feasible transition probability matrix, represents a policy π that the agent can follow, where $\pi \to \Pr(i,j)$. This policy guides the agent as to which actions to choose at any given state, regardless of prior history. The optimal solution represents the optimal policy π* that maximizes the expected return. Reinforcement Learning (RL) represents an
approach to estimate the policy that optimizes the expected return. RL uses a formal
framework defining the interaction between a goal-directed agent and its environment in
terms of states, actions and rewards (Sutton and Barto 1998). It is a computational
approach to model goal-directed learning and decision-making. It uses exploitation of
experience and exploration of available actions to converge to the policy that yields the
maximum utility. The agent will choose actions at each state according to the optimal policy π* returned. The elements of the optimal policy can then be used to study the agent's choices and explain the agent's preferences in various environments.
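As a minimal illustration of how RL estimates such a policy, the sketch below runs tabular Q-learning on an invented two-state MDP; the states, actions, rewards and parameter values are assumptions for demonstration, not the paper's model:

```python
import random

# Toy MDP (hypothetical): two states, two actions each. Q-learning with an
# epsilon-greedy rule converges to the policy maximizing discounted return.
rewards = {('s0', 'a'): 1.0, ('s0', 'b'): 0.0, ('s1', 'a'): 0.0, ('s1', 'b'): 2.0}
transition = {('s0', 'a'): 's1', ('s0', 'b'): 's0', ('s1', 'a'): 's0', ('s1', 'b'): 's1'}

random.seed(0)
alpha, gamma, eps = 0.1, 0.9, 0.1  # learning rate, discount, exploration
Q = {sa: 0.0 for sa in rewards}
state = 's0'
for _ in range(20_000):
    # epsilon-greedy action choice
    if random.random() < eps:
        action = random.choice(['a', 'b'])
    else:
        action = max(['a', 'b'], key=lambda act: Q[(state, act)])
    nxt = transition[(state, action)]
    # temporal-difference update toward reward + discounted best future value
    target = rewards[(state, action)] + gamma * max(Q[(nxt, 'a')], Q[(nxt, 'b')])
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    state = nxt

policy = {s: max(['a', 'b'], key=lambda act: Q[(s, act)]) for s in ('s0', 's1')}
print(policy)  # greedy policy with respect to the learned Q-values
```

Here the learned policy moves from s0 to s1 (immediate reward 1) and then keeps collecting the reward of 2 in s1, which is the return-maximizing behaviour for this toy environment.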
The transit path choice model
Transit path choice problem as a Markovian decision process
Travelers are assumed to be goal-directed intelligent agents. A traveler has a goal of
minimizing the travel cost (e.g. time, money, inconveniences, etc.) and for some trip
purposes, a goal of maximizing the probability of arriving at a desired arrival time. In
order to achieve their goals, travelers follow policies to optimize the utility of their trip.
These policies manifest themselves through the observed travel behaviour (i.e. trip choi-
ces) of individual passengers. Assuming that travelers are rational, it is logical to expect
that they follow their optimal policy. Based on the notion that actions receive rewards (or
penalties), it is also expected that travelers value their choices according to a utility
function. Although this utility function is not directly observed by the modeler, it is known
to the individual traveler. Note that the term ‘‘return’’ or ‘‘value’’, in the context of transit
path choice modeling, refers to an estimate of a random variable and is meant to reflect an
expected value or an expected return.
In the transit assignment context, a transit user is faced with various types of choices
during the trip. A passenger needs to decide on a departure time, an origin stop, a run
associated with a route to board, and possibly a connection to transfer at. In addition, in a
multimodal system, access and egress mode choices to and from the transit service may be
included. For a recurring trip such as a home-based work trip, a passenger settles on the trip
choices by trying (or judging) different options until certain path choices prove to be optimal for this particular trip's objectives. This optimization process is based on the
passenger’s cumulative experience with the transit service performance.
We are interested in the process through which passengers settle on their trip choices.
This is important to be able to model the shifts in trip choices when changes to the transit
service are introduced. A passenger at origin has a choice set of different combinations of
departure time and path choices. Over time, a passenger chooses the combination of
departure time, origin stop, run (or route), and transfer stop that minimizes the perceived
travel cost. In order to find this combination, the passenger must have valued (tried or
judged) all other possible choices and found that the chosen combination outperforms all
other options, with reference to a utility function.
At the origin, the passenger has the objective of minimizing travel cost for the trip. At a
transfer stop, the passenger’s objective is still to minimize the travel cost for the remainder
of the trip, regardless of prior choices. This does not mean that previous choices are unimportant; rather, what matters is the relative impact of prior choices on future decisions, not the prior choices themselves. This influence on future choices is expressed in the value of p (the choice
probability for one alternative). The value of p also depends on the transit service performance. The outcome of the passenger's choice, in turn, affects the transit service performance; for example, boarding a vehicle reduces the available capacity on the chosen run by one.
The change in the location of the transit rider during the trip represents a stochastic
process, with multiple states and transition probabilities between states. Traveling pas-
sengers move to different locations in the transit network at different points in time (e.g. at
stop, on board). The corresponding state of a stochastic process in this case is represented
by the location of the transit rider, which takes a value out of a state space. The state space
represents all possible locations for a transit rider; it consists of possible origin stop
choices, route choices, transfer stop choices and destination stop choices. A passenger at a
transfer stop has made three transitions: from origin-location to origin-stop-location
(associated with an origin stop choice), from origin-stop-location to onboard-route-location
(associated with a route/run choice), and from onboard-route-location to transfer-stop-
location (associated with a transfer stop choice). Each transition depends on the current
state of the passenger (i.e. location of the passenger) and on the transit service perfor-
mance. The passenger would need to make (at least) two more transitions to reach the trip
destination: from transfer-stop-location to onboard-route-location (associated with a route/
run choice) and from onboard-route-location to destination-stop-location and final desti-
nation. Given the present state (i.e. location), the passenger decides on future transitions.
This resembles a stochastic process with the Markov property. It should be noted that location information alone might not be sufficient to decide on future transitions; information related to previous transitions, such as the fare paid, the remaining time to the scheduled arrival time, or real-time information about transit system performance, is also important.
Instead of having to memorize previous transitions, the present state (or the system state)
should summarize all information that is assumed to be available to the passenger at any
time t.
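A minimal sketch of this location-based state representation (the stop and route names are hypothetical) encodes each state with its admissible transitions, so that the next location depends only on the current one:

```python
# A minimal encoding of the passenger-location state space described above.
# Stop and route names are invented; each state lists the transitions
# available from it, mirroring the Markov property: the next location
# depends only on the current location.
transitions = {
    'origin':        ['stop_A', 'stop_B'],          # origin stop choice
    'stop_A':        ['route_1', 'stop_A'],         # board, or wait (self-loop)
    'stop_B':        ['route_2'],
    'route_1':       ['transfer_stop'],             # alight at transfer stop
    'route_2':       ['dest_stop'],
    'transfer_stop': ['route_2', 'transfer_stop'],  # board next run, or wait
    'dest_stop':     ['destination'],
}

def feasible(path):
    """Check that a sequence of locations respects the transition structure."""
    return all(nxt in transitions.get(cur, []) for cur, nxt in zip(path, path[1:]))

trip = ['origin', 'stop_A', 'route_1', 'transfer_stop',
        'route_2', 'dest_stop', 'destination']
print(feasible(trip))  # a valid five-transition trip with one transfer
```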
This Markovian Stochastic Process is partly dependent on the transit service performance and partly controlled by the transit rider; hence it can be analyzed as an MDP. In an MDP, actions are rewarded and hence optimal policies can be estimated. For an origin stop
choice, there is an immediate cost expressed as the value of the travel cost to access the
origin stop (time and money). Also, there is a future value of a specific stop choice
expressed in the expected travel cost of the subsequent available route and transfer
connection choices. For a route choice, there is an immediate cost expressed as the value of
waiting time. A future cost, associated with a route choice, is related to the value of
possible transfer connections and the probability of arriving at the desired arrival time.
Assuming passengers are rational, it is logical to expect passengers to follow their
optimal policy and to optimize their trip return or cost. The effect of such an optimal policy is observed through disaggregate individual choices $[\Pr_{ij}]_{observed}$ and aggregate route loads $L_{observed}$. The value of p (the choice probability for one alternative) represents one cell in the state-transition probability matrix $[\Pr_{ij}]_{observed}$. If the underlying Markov process is ergodic, then there exists a unique optimal policy π* that allows passengers to optimize their return. Associated with π* is a steady-state transition probability matrix $[\Pr_{ij}]^*$. Passengers devise their optimal policies based on a value function for state-action evaluation. While the optimal policy π* and its effect through route loads are observed, the only unknown to the modeler is the value function. By reconstructing the transit path choice problem as an MDP, the value function used by individual passengers can be estimated (Wahba 2008). This is similar to the process of inverse reinforcement learning (IRL) (Russell 1998).
Mental model structure
In order to ensure the uniqueness and existence of the optimal policy π* in the reconstructed MDP, the underlying Markov process needs to be ergodic. To show that a Markov
Chain is ergodic, its state-action transition diagram has to be irreducible, positive-recur-
rent, and aperiodic.
The state-action transition diagram for the transit path choice problem is represented by
the ‘mental model’ of the relevant parts of the transit network. For simplicity of analysis, and without loss of generality, assume that the system state is represented only by the
location of the passenger during the trip. At each state s, there is a set of possible actions
$a_i \in A(s)$, available to the passenger. When a passenger at state s decides to take action a, this is referred to as the state-action pair (s, a). Action a moves the passenger from state s to state s', expressed as $a: s \to s'$. The reward from, or the value of, a state-action pair is measured based on the travel cost associated with choosing (s, a).
The schematic representation of state transitions in the underlying stochastic process is
shown in Fig. 1, where the system state variable represents the location of the passenger during the transit trip. The state space has seven possible states: Origin O, Departure Time T, Origin Stop G, On-board V, Off-Stop F, On-Stop N, Destination Stop D. The underlying
stochastic process has the Markov property, since the present location of the passenger is
necessary and sufficient for predicting the future location of the passenger, regardless of
prior locations.
For any transit trip, the number of states is finite (i.e. possible locations are finite). The
state OGN is a strong state; that is, any state j (where j ≠ OGN) is reachable from state OGN
in a finite number of steps. With the existence of one strong state, a Markov Chain with a
finite number of states is guaranteed to be irreducible. Therefore, the underlying Markov
process for the transit path choice problem is irreducible.
The state OGN is a positive-recurrent state, since after leaving OGN the probability of
returning back to OGN in any number of steps is 1. In a Markov Chain with a finite number
of states, the existence of one positive-recurrent state guarantees that the Markov Chain is
positive-recurrent. Therefore, the underlying Markov process for the transit path choice
problem is positive-recurrent.
The state OGN has a possible self-loop transition, representing the change in departure
time choice from origin. Similarly, the OGS and ONS states have a possible self-loop
transition, representing the no-boarding decision. Such states with a possible self-loop
transition are considered aperiodic. With the irreducibility property and finite number of
states, the existence of one aperiodic state means that the underlying Markov process for
the transit path choice problem is aperiodic.
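These three properties can be verified mechanically on a finite transition diagram. The sketch below uses a simplified, hypothetical subset of the Fig. 1 states (omitting, for instance, the off-stop state) and checks irreducibility by mutual reachability; with finitely many states, irreducibility plus a self-loop then implies positive recurrence and aperiodicity, as argued above:

```python
# Ergodicity check for a finite state-transition diagram. The arcs are a
# simplified, hypothetical rendering of Fig. 1, including the waiting
# self-loops and the return-to-origin transition.
arcs = {
    'OGN': ['OGN', 'OGS'],  # change departure time (self-loop), go to stop
    'OGS': ['OGS', 'RUT'],  # wait / no-boarding (self-loop), board a run
    'RUT': ['ONS', 'DNS'],  # alight at a transfer stop or destination stop
    'ONS': ['ONS', 'RUT'],  # wait at transfer stop (self-loop), board again
    'DNS': ['DST'],
    'DST': ['OGN'],         # return to origin (next day's trip)
}

def reachable(graph, start):
    """All states reachable from `start` via depth-first search."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

states = list(arcs)
irreducible = all(reachable(arcs, s) == set(states) for s in states)
has_self_loop = any(s in arcs[s] for s in states)
# For a finite irreducible chain, one aperiodic (self-loop) state implies
# aperiodicity, and finiteness plus irreducibility gives positive recurrence.
print(irreducible and has_self_loop)  # True -> the chain is ergodic
```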
The proposed framework, using the mental model structure outlined above, thus ensures that the Markov Chain representing the transit path choice problem is ergodic (Wahba
2008). The ergodic property in the transportation context means that each possible com-
bination of path choices can be tried by iterative dynamics (i.e. all path options can be tried
through making the same trip repeatedly over days). It also means that there exists a unique
optimal policy π* that will optimize the return of the transit trip (or minimize the travel cost
[Fig. 1 Schematic representation of the state transitions for the transit path choice problem: states Origin (OGN), Origin Stop (OGS), Route/Run (RUT), Off-Stop (OFS), On-Stop (ONS), Destination Stop (DNS) and Destination (DST), linked by departure time, stop, route, board/wait, alight/stay, off-stop, on-stop and destination stop choices, with a return-to-origin transition.]
in this regard) given a value function. Alternatively, given an observed optimal policy π*, a value function can be estimated so as to regenerate the observed optimal policy (or observed choices). If the form of the value function is assumed, then its parameters can be calibrated to maximize the likelihood of reproducing the observed choices. When the objective is to find the probability of choosing action a at state s, $P_s(a)$, Reinforcement Learning (RL) techniques provide a systematic procedure for estimating these probabilities. RL algorithms attempt to find a policy $\pi^*: S \to A$ for the agent to follow, giving the probabilities $P_s(a)^*$. In particular, the temporal-difference (TD) Q-Learning algorithm (Watkins 1989) is adopted in this framework.
In RL terminology, passengers need to follow a policy that defines the passenger’s
behaviour (i.e. trip choices) at a given time. Passengers need a reward function which
signals the immediate reward of a specific state-action pair. For instance, the immediate
reward of boarding a bus i from a stop j is represented by the experienced waiting time for
bus i at stop j. A value function calculates the accumulated reward over time of a specific
state-action pair. For example, the value of boarding a bus i from a stop j is calculated as
the expected travel time (out-of-vehicle and in-vehicle time) to the destination, starting
from this stop and boarding this bus. Figure 2 depicts value function components for
choosing to access a stop from origin; the immediate return associated with travel cost to
access the stop (e.g. access time, access fare, parking inconvenience, etc.) and future
reward associated with travel cost to complete the trip (e.g. waiting time, in-vehicle time,
fare, number of transfers, deviation from desired arrival time) after arriving at the stop. The
value function ensures that state-action pairs with short-term high reward but with a long-
term low value are not preferred.
In the transit assignment context, a Q-value represents the state-action utility (called hereafter the Generalized Cost, GC). The generalized cost for a passenger is given in Eqs. 1 and 2.
$$GC(s,a) = \sum_{p=1}^{n} \hat{\beta}_p \cdot X_p = \sum_{g=1}^{k} \hat{\beta}_g \cdot X_g + \gamma \min_{\forall a' \in A\{s'\}} GC(s',a') \quad (1)$$

$$GC(s,a) \leftarrow [1-\alpha] \cdot GC(s,a) + \alpha \cdot \left[ \sum_{g=1}^{k} \hat{\beta}_g \cdot X_g + \gamma \min_{\forall a' \in A\{s'\}} GC(s',a') \right] \quad (2)$$
where $X_g$ is a passenger-experienced, alternative-specific (state-action) attribute; $\sum_{g=1}^{k} \hat{\beta}_g \cdot X_g$ represents the immediate reward (or cost) for the state-action pair; $\gamma \min_{\forall a' \in A\{s'\}} GC(s',a')$ represents the long-term expected reward for choosing this state-action pair; $\gamma$ is a discount factor, which determines the importance of future rewards; $\alpha$ is the learning rate, which determines to what extent newly acquired experience overwrites old experience; and $A\{s\}$ represents the set of attractive actions at state s.
Note that $X_g$ (e.g. $X_{WT}$: waiting time) is a random variable, and its realization at time step t depends on the transit network conditions and the choices of other passengers. Therefore, the GC value of a specific state-action pair is a random variable, and the GC function acts as a replacement for the unobserved value function.
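A direct transcription of the Eq. 2 update might look like the following sketch; the attribute names, coefficient values and state-action pairs are illustrative assumptions, not calibrated values:

```python
# Sketch of the generalized-cost update of Eq. 2. GC maps state-action
# pairs to learned costs; beta holds illustrative coefficients; x holds
# the attribute values experienced for the pair (s, a) on this trip.
def update_gc(GC, s, a, x, beta, successors, alpha=0.3, gamma=0.9):
    """GC(s,a) <- (1-alpha)*GC(s,a) + alpha*[immediate + gamma*min future]."""
    immediate = sum(b * xi for b, xi in zip(beta, x))
    # minimum learned cost over the attractive actions at the successor state
    future = min(GC[pair] for pair in successors) if successors else 0.0
    GC[(s, a)] = (1 - alpha) * GC[(s, a)] + alpha * (immediate + gamma * future)
    return GC[(s, a)]

GC = {('stop', 'board_r1'): 10.0,          # prior experience (hypothetical)
      ('onboard', 'alight_x'): 4.0,
      ('onboard', 'stay'): 6.0}
beta = [1.0, 0.5]   # weights for waiting time and fare (assumed)
x = [3.0, 2.0]      # experienced waiting time = 3, fare = 2 (assumed units)
new = update_gc(GC, 'stop', 'board_r1', x, beta,
                successors=[('onboard', 'alight_x'), ('onboard', 'stay')])
print(round(new, 2))  # 0.7*10 + 0.3*(4 + 0.9*4) = 9.28
```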
The choice rule follows a mixed {ε-greedy, SoftMax} action-choice model, with a (1 − ε) probability of exploitation, an (ε) probability of exploration, and a SoftMax model (Sutton and Barto 1998), such that Eq. 3 is for exploiting and Eq. 4 is for exploring.
[Fig. 2 Value function components for an origin stop choice: the immediate return (given a departure time choice) and the future reward at the OGS decision point, shaped by within-day service dynamics, existing and predicted conditions, conditions experienced by the passenger, performance expected by the passenger, and information provision through ATIS information dissemination channels.]
Exploiting, with probability $(1-\varepsilon)$:

$$P_s(a) = \begin{cases} 1 & \text{if } GC(s,a) = \min_{\forall a'} GC(s,a') \\ 0 & \text{otherwise} \end{cases} \quad (3)$$

Exploring, with probability $(\varepsilon)$:

$$P_s(a) = \frac{V(s,a)}{\sum_{\forall a' \in A\{s\}} V(s,a')} \quad (4)$$

where $V(s,a) = g(GC(s,a))$. The mixed {ε-greedy, SoftMax} method means that agents behave greedily most of the time (i.e. they exploit current knowledge to maximize immediate reward) and every once in a while, with probability (ε), they select an action at random, using the SoftMax rule. The SoftMax method converts action values into action probabilities, varying the action probabilities as a weighted function of the estimated values. This means that the greedy action is given the highest selection probability, but all other actions are ranked and weighted according to their value estimates.
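A sketch of this mixed rule is below. Since GC is a cost (lower is better), the SoftMax transform g must weight cheaper actions more heavily; exp(−GC), used here, is one assumed choice of g, and the action names and costs are invented:

```python
import math, random

def choose_action(gc_values, eps=0.1, rng=random):
    """Mixed {epsilon-greedy, SoftMax} rule over generalized costs.

    gc_values: dict mapping action -> learned GC (a cost; lower is better).
    With probability 1-eps, exploit: pick the minimum-cost action (Eq. 3).
    With probability eps, explore: sample via SoftMax on V = exp(-GC) (Eq. 4).
    """
    actions = list(gc_values)
    if rng.random() >= eps:                 # exploit current knowledge
        return min(actions, key=gc_values.get)
    v = [math.exp(-gc_values[a]) for a in actions]  # one assumed g(GC)
    total = sum(v)
    r, acc = rng.random() * total, 0.0      # roulette-wheel sampling
    for a, w in zip(actions, v):
        acc += w
        if acc >= r:
            return a
    return actions[-1]

random.seed(1)
gc = {'board_run_1': 7.5, 'wait': 9.0, 'board_run_2': 8.2}  # hypothetical
picks = [choose_action(gc) for _ in range(1000)]
print(picks.count('board_run_1'))  # the low-cost action dominates
```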
Learning-based departure time and path choice model
The underlying hypothesis is that individual passengers are expected to adjust their
behaviour (i.e. trip choices) according to their experience with the transit system perfor-
mance. Individual passengers base their daily travel decisions on the accumulated expe-
rience gathered from repetitively traveling through the transit network on consecutive days.
Travelers’ behaviour, therefore, is modeled as a dynamic process of repetitively making
decisions and updating perceptions, according to a learning process. This decision-making
process is based on a mental model of the transit network conditions. The learning and
decision-making processes of passengers are assumed in the proposed model to follow RL
principles for experience updating and choice techniques.
The mental model's tree-like structure is an efficient representation of the state-action table traditionally developed for Q-learning problems (see Fig. 3). The probability of
deciding on action a at state s is based on the accumulated experience, captured by the GC
value. The GC(s,a) is updated every time state s is visited and action a is chosen. The
GC(s,a) has two components: an immediate reward for choosing action a at state s, and an
estimated cumulative future reward for this specific state-action pair. This GC value
represents the passenger’s experience with the transit network conditions.
Previous studies refer to the path choice problem as the decision passengers make to
board a vehicle departing from a stop, at a specific point in time, which is in a set of
attractive paths serving the origin and destination for that passenger—see Hickman and
Bernstein (1997) for definitions of the static and dynamic path choice problems. Why a
passenger is at this stop (i.e. origin stop choice) at this point in time (i.e. departure time
choice) is not addressed adequately in the literature as part of the path choice problem.
Often, the travel time on paths including a transfer depends on when the passenger begins
the trip, as paths which are attractive at one time may be less so later if there is a smaller
probability of making particular connections. The proposed learning-based choice model
considers the departure time choice, the stop choice and the run (or sequence of runs)
choice.
Pre-trip choice modeling
The departure time and origin stop choices are assumed to be at-home choices (i.e. pre-
trip), in which passenger agents consider available information obtained from previous
trips, in addition to pre-trip information provision (if any). Once a passenger agent arrives
at a stop, the bus run choice is considered an adaptive choice, in which, besides previous
information, the passenger considers developments that occur during the trip and additional
en-route information.
At the origin, a passenger develops an initial (tentative) travel plan, based on his updated
mental model (historical experience and pre-trip information, if provided). This travel plan
includes a departure time, an origin stop (which are fixed) and a (tentative) route (or
sequence of routes). The initial plan reflects the passenger’s preferences and expectations,
and he would follow it if reality (en-route dynamics) matches to a great extent his expec-
tations (which may be different from published static schedule information).
Departure time choice
At origin (state O), on iteration d, the departure time choice is modeled according to the mixed {ε-greedy, SoftMax} action-choice rule outlined in the previous section. Being at state O, a specific departure time choice signals an immediate reward and a future return for a passenger agent Z, which can be written as, on iteration d:

$$GC_Z(O,t)^{(d)} \leftarrow [1-\alpha] \cdot GC_Z(O,t)^{(d-1)} + \alpha \cdot \left[ {C_Z^t}^{(d)} + \gamma \cdot \min_{\forall g \in A_Z(T)} GC_Z(T,g)^{(d-1)} \right] \quad (5)$$
[Fig. 3 The mental model structure with different types of choices, distinguishing pre-trip (at-origin) choices from en-route (adaptive) choices.]
where $t: O \to T$ and $C_Z^{t,d} = H_Z^{t}$. Here $H_Z^{t}$ is a variable representing the utility of leaving
the origin at time t (e.g. auto availability, activity-schedule fitting value). This is an input to the
transit path choice model from activity-based models, if available.
The term $C_Z^{t,d} + \gamma\cdot \min_{\forall g\in A_Z(T)} GC_Z^{d}(T,g)$ is the actual travel cost, which is not known on day d
before the trip starts.
After completing the trip with departure time t, the GC value for agent Z at iteration
d - 1 is updated to the GC value at iteration d, reflecting the passenger agent's experience
with this state-action pair. It is important to note that the choice at iteration d is based
partly on the most updated experience from iteration d - 1. Without information provision,
passengers have no means of knowing the actual travel cost of their trip on iteration
d before starting the trip; however, they use what they have learned to provide an estimate
of travel cost. If the estimated GC value at iteration d proves to be close to the actual
travel cost, then the associated choice is reinforced in future iterations.
When a passenger agent is at state $t: O \to T$ (i.e. after deciding on a departure time,
state T), the origin stop choice follows a similar procedure to the mixed {ε-greedy,
SoftMax} action choice model, where Eq. 6 is for exploring alternatives and Eq. 7 is for
exploiting past experience.
$$
P_T^{d}(g) = \frac{f\big(GC^{d-1}(T,g)\big)}{\displaystyle\sum_{\forall i\in A(T)} f\big(GC^{d-1}(T,i)\big)} \qquad (6)
$$

$$
P_T^{d}(g) =
\begin{cases}
1 & \text{if } GC^{d-1}(T,g) = \min_{\forall i\in A(T)} GC^{d-1}(T,i) \\
0 & \text{otherwise}
\end{cases}
\qquad (7)
$$
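The mixed rule of Eqs. 6 and 7 can be sketched as follows. The transform $f(GC) = \exp(-GC/\theta)$ and the parameter values are assumptions for illustration, since the paper leaves $f(\cdot)$ general; lower generalized cost means higher selection probability:

```python
import math
import random

# Illustrative sketch of the mixed {epsilon-greedy, SoftMax} rule (Eqs. 6-7).
# f(GC) = exp(-GC/theta) is an assumed decreasing transform of cost.

def choose_action(gc_prev, epsilon=0.1, theta=5.0, rng=random):
    """gc_prev maps each alternative to its GC^{d-1} value."""
    if rng.random() < epsilon:                      # explore via SoftMax (Eq. 6)
        weights = {a: math.exp(-c / theta) for a, c in gc_prev.items()}
        pick = rng.random() * sum(weights.values())
        acc = 0.0
        for a, w in weights.items():
            acc += w
            if pick <= acc:
                return a
    return min(gc_prev, key=gc_prev.get)            # exploit minimum GC (Eq. 7)

choice = choose_action({"g1": 25.0, "g2": 30.0}, epsilon=0.0)  # exploits: "g1"
```

Setting ε = 0 recovers the purely greedy rule of Eq. 7; ε = 1 recovers pure SoftMax exploration.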
Origin stop choice
The origin stop choice has an immediate cost (e.g. access time) and a future value represented
by the travel cost associated with the attractive route choice set from the origin stop.
The generalized cost for an origin stop (state G), for a passenger agent on iteration d,
is updated as follows:
$$
GC^{d}(T,g) \leftarrow (1-\alpha)\cdot GC^{d-1}(T,g) + \alpha\cdot\Big[\,C_g^{d} + \gamma\cdot \min_{\forall r\in A(S):\,s=g} GC^{d-1}(S,r)\Big] \qquad (8)
$$
where $g: T \to S$ and $C_g^{d} = \sum_{r=1}^{k} \beta_r \cdot X_r$. Here $X_r$ represents an immediate travel cost
associated with stop g; for example, the access time and access cost to travel to stop g. The
modeling of access mode choice may be introduced at this level by varying $X_r$ as a function
of the access mode. $\beta_r$ can be an alternative-specific parameter; for example, access time
for train stations is perceived differently from that for bus stops. $\beta_r$ may also be a general
parameter for all alternatives, such as the fare parameter that reflects the perceived monetary
cost associated with a specific choice g.
It is worth mentioning that the estimated travel cost of choosing stop g is associated
with a specific departure time t. This permits the representation of different transit network
conditions at different times during the modeling period.
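The weighted-sum immediate cost of Eq. 8 is simple to compute; the sketch below is hypothetical (attribute names and β values are invented for illustration, and β may be alternative-specific, e.g. rail access time weighted differently from bus):

```python
# Hypothetical sketch of the immediate-cost term C_g = sum_r beta_r * X_r
# in Eq. 8. Attribute names and beta values are invented for illustration.

def immediate_stop_cost(attrs, betas):
    """attrs: observed components X_r for stop g; betas: perception weights."""
    return sum(betas[k] * x for k, x in attrs.items())

walk_access = immediate_stop_cost(
    attrs={"access_time_min": 6.0, "fare_increment": 0.0},
    betas={"access_time_min": 1.5, "fare_increment": 10.0},
)
# Varying access_time_min by access mode (walk vs. drive) is one way an
# access mode choice could enter the model at this level.
```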
En-route choice modeling
Route/run choice
When a passenger arrives at a stop $s \in \{g, n\}$ (g: origin stop, n: transfer stop), an initial
route choice is made. This route choice follows the mixed {ε-greedy, SoftMax} action
choice model. The value of choosing a run from route r at stop s is updated as follows:
$$
GC^{d}(S,r) \leftarrow (1-\alpha)\cdot GC^{d-1}(S,r) + \alpha\cdot\Big[\,C_r^{d} + \gamma\cdot \min_{\forall f\in A(V)} GC^{d-1}(V,f)\Big] \qquad (9)
$$
where $r: S \to V$ and $C_r^{d} = \sum_{j=1}^{k} \beta_j \cdot X_j$. Here $X_j$ represents an immediate travel cost
associated with a run from route r; for example, the waiting time for a run of route r, the
crowding level of the run, etc. $\beta_j$ can be an alternative-specific parameter; for example, waiting
time for trains is perceived differently from waiting time for buses. The term
$C_r^{d} + \gamma\cdot \min_{\forall f\in A(V)} GC^{d}(V,f)$
is the service performance of route r on iteration d, unknown before making a route/run
choice.
This decision-making and experience updating mechanism reflects previous experiences
at iteration d - 1 of the chosen route and does not take into consideration the within-day
dynamics of the transit service on iteration d. In the proposed framework, the choice rule is
therefore used in conjunction with a number of SWITCH choice rules: 'SWITCH AND
WAIT', 'SWITCH AND BOARD', 'SWITCH AND STAY', and 'SWITCH AND
ALIGHT'. The first two rules capture the adaptive behaviour of passengers waiting at
stops, while the latter two model the adaptive behaviour of passengers on board a transit
vehicle.
For a passenger waiting at stop $s \in \{g, n\}$, there is an initial route choice r. When a
transit vehicle from the initially chosen route r arrives at stop s, the passenger's decision-making
process follows the 'SWITCH AND WAIT' choice rule. On iteration d, if the
service of route r, considering within-day dynamics, performs in accordance with or
better than the passenger's expected travel cost for boarding route r from stop s, then the
passenger will board the initially chosen route-run r and this choice will be reinforced.
Without provision of information, a passenger waiting at stop s when a vehicle arrives
from the initially chosen route r has no knowledge of the actual performance of the other
attractive routes $r' \in A(s)$ on iteration d except through previous experiences, summarized
up to iteration d - 1. Therefore, when the service performs according to (or better than)
expectations, there is no need to re-evaluate the initial choice, as the experiences for all
attractive routes were considered in making the initial route choice.
When a transit vehicle arrives at stop s from the passenger's initially chosen route r,
and within-day dynamics result in route r performing not in accordance
with (i.e. worse than) the passenger's expectations, a reconsideration of the initial route choice
is warranted. In this case, the passenger's expectation of the service performance of route r
is updated and the route choice mechanism is applied with
$C_r^{d} + \gamma\cdot \min_{\forall f\in A(V)} GC^{d-1}(V,f)$
as a replacement for $GC^{d-1}(S,r)$ for route r. Without information provision, the passenger's
expectations of the service performance of all other attractive routes remain a function of
previously accumulated experiences, $GC^{d-1}(S,i),\ \forall i \neq r$. If the initial route choice still outperforms
the other attractive routes (based on experience summarized up to iteration d - 1 for
the other route options), the 'SWITCH AND WAIT' choice rule results in the passenger
choosing the initial route. A change in route choice under the 'SWITCH AND
WAIT' rule is observed if the within-day dynamics of route r have resulted in another
attractive route $r'$ outperforming route r. The passenger then waits (hence the name of
the routine) for a run from route $r'$ to arrive. Regardless of the output of the 'SWITCH
AND WAIT' choice rule, the passenger's accumulated experience with route r at stop s,
$GC^{d}(S,r)$, is updated based on Eq. 9, since $C_r^{d}$ has already been experienced.
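A simplified reading of the 'SWITCH AND WAIT' logic can be sketched as follows (illustrative names and values, not the authors' implementation):

```python
# Simplified sketch of the 'SWITCH AND WAIT' rule: the realized day-d cost of
# the arriving route r replaces GC^{d-1}(S, r); the other attractive routes
# keep their remembered (day d-1) costs. Names and values are illustrative.

def switch_and_wait(initial, realized_cost, gc_prev):
    """initial: initially chosen route r; realized_cost: C_r^d plus the
    discounted future term; gc_prev: GC^{d-1}(S, .) for attractive routes."""
    if realized_cost <= gc_prev[initial]:
        return initial                        # as expected (or better): board r
    updated = dict(gc_prev)
    updated[initial] = realized_cost          # within-day experience for r only
    return min(updated, key=updated.get)      # may mean waiting for a run of r'

switch_and_wait("r", 12.0, {"r": 15.0, "r2": 20.0})   # boards r
switch_and_wait("r", 25.0, {"r": 15.0, "r2": 20.0})   # waits for r2
```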
For a passenger waiting at stop $s \in \{g, n\}$ with an initial route choice $r \in A(s)$, the
'SWITCH AND BOARD' choice rule is applied when a transit vehicle arrives from a route
$r' \in A(s)$ that is not the initially chosen route r. The updated travel cost for route $r'$,
$C_{r'}^{d} + \gamma\cdot \min_{\forall f\in A(V)} GC^{d-1}(V,f)$,
is calculated. If this updated travel cost matches the
expected value, $GC^{d-1}(S,r')$, then there is no need to reconsider the initial route choice r. If
the updated travel cost for route $r'$ is improved, then the initial route choice r needs
reconsideration. In this situation, the updated travel cost for route $r'$
is compared only to $GC^{d-1}(S,r)$, without considering other
attractive routes, since without information provision the passenger's expectations
regarding their performance ($GC^{d-1}(S,i),\ \forall i \notin \{r, r'\}$) remain the same, and the performance
of other routes was estimated to be inferior to the initial route choice r. If the updated travel
cost of route $r'$ becomes more attractive, then the 'SWITCH AND BOARD' choice rule
results in the passenger choosing to board route $r'$. Otherwise, the passenger continues to
wait for the arrival of route r (the initial route choice).
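The complementary 'SWITCH AND BOARD' logic admits a similarly simple sketch (illustrative names and values; a simplified reading, not the authors' implementation):

```python
# Simplified sketch of the 'SWITCH AND BOARD' rule: a vehicle of a non-chosen
# route r' arrives, and its realized day-d cost is compared only with the
# expectation for the initial route r; all other expectations are unchanged
# without information provision. Names and values are illustrative.

def switch_and_board(initial, arriving, realized_cost, gc_prev):
    if realized_cost >= gc_prev[arriving]:
        return initial            # r' performs as expected or worse: wait for r
    if realized_cost < gc_prev[initial]:
        return arriving           # r' now outperforms r: board it
    return initial                # improved, but r still preferred: keep waiting

switch_and_board("r", "r2", 22.0, {"r": 15.0, "r2": 20.0})   # keeps waiting
switch_and_board("r", "r2", 10.0, {"r": 15.0, "r2": 20.0})   # boards r2
```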
Transfer choice
When a passenger is on board a transit vehicle V of route r, there are two possibilities.
First, route r reaches the trip destination and no more travel choices are required
(except the egress mode choice, if any). In this case, the choice set A(V) has only one
element, {des}, representing the destination stop associated with route r. Based on the
mental model structure, there can be only one destination stop associated with each route.
The immediate travel cost of deciding to alight at a destination stop can be thought of
as, for example, the in-vehicle time and fare to reach the destination stop. There is also a
future travel cost associated with this decision; this cost includes the egress time and egress
monetary cost. When the passenger arrives at the destination, the schedule delay is calculated,
representing the deviation from the scheduled arrival time.
Since no more trip decisions are needed at the destination, the destination location represents
the terminating state of the underlying stochastic process. At this state, the recursive
calculations terminate and are propagated backwards to update expectations for the
departure time choice made at the beginning of the trip. $GC(V,des)$ is updated based
on the following equation:
$$
GC^{d}(V,des) \leftarrow (1-\alpha)\cdot GC^{d-1}(V,des) + \alpha\cdot\Big[\,C_{des}^{d} + \gamma\cdot X_{des}^{d}\Big] \qquad (10)
$$
where $des: V \to D$, $C_{des}^{d} = \sum_{j=1}^{k} \beta_j \cdot X_j$, and $X_{des}^{d} = \sum_{a=1}^{k} \beta_a \cdot X_a$. Here $X_j$ represents an immediate
travel cost associated with the destination stop; for example, the in-vehicle time and
the fare to reach the destination stop. $\beta_j$ can be an alternative-specific
parameter; for example, in-vehicle time for subways is perceived differently from in-vehicle
time for buses. $X_a$ represents a future travel cost associated with the destination stop;
for example, the egress time and egress monetary cost to reach the trip destination from the
destination stop. It also represents the schedule delay associated with trip choices. $\beta_a$
reflects the various perceptions with regard to future travel cost components, for instance
the early/late schedule delay values.
It is worth mentioning that the destination stop choice {des} is linked to a route r. The
route choice is associated with a stop choice s, which is in turn tied to a departure time t. This means
that the experience gained for path choices is relevant to the departure time choice. It is not
uncommon that passengers make different trip decisions for different origin departure time
choices. The transit service performance is observed to be time-dependent, and passengers
hold time-dependent expectations about path choices and hence about the departure time choice. In
this formulation, the schedule delay depends on both the departure time and the trip choices.
This can be noted from Eq. 5: the schedule delay does not directly appear in the travel cost
calculation for the departure time choice and is implicitly included, through recursive
computations, in the term $\min_{\forall g\in A_Z(T)} GC_Z^{d-1}(T,g)$.
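This recursive inclusion can be made explicit by stacking the update equations. The chain below is a condensed restatement of Eqs. 10, 9, 8 and 5 (with the non-recursive terms abbreviated as $\cdots$), showing how the schedule delay inside $X_{des}^{d}$ propagates back to the departure-time value at the origin:

```latex
% For a destination-reaching route, A(V) = {des}, so the min in Eq. 9 picks up
% GC(V, des), which carries the schedule delay from X_des in Eq. 10:
\begin{align*}
GC^{d}(V,des) &\leftarrow (1-\alpha)\, GC^{d-1}(V,des)
   + \alpha \big[ C^{d}_{des} + \gamma\, X^{d}_{des} \big] \\
GC^{d}(S,r)   &\leftarrow \cdots + \alpha\,\gamma \min_{\forall f \in A(V)} GC^{d-1}(V,f) \\
GC^{d}(T,g)   &\leftarrow \cdots + \alpha\,\gamma \min_{\forall r \in A(S):\, s=g} GC^{d-1}(S,r) \\
GC_Z^{d}(O,t) &\leftarrow \cdots + \alpha\,\gamma \min_{\forall g \in A_Z(T)} GC_Z^{d-1}(T,g)
\end{align*}
```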
For on-board passengers, a transfer connection may be needed to reach the trip destination.
In such situations, a passenger decides on an initial off-stop choice upon boarding
the transit vehicle V of route r, following the same mixed {ε-greedy, SoftMax} action
choice model. The GC value for the off-stop choice is updated as follows:
$$
GC^{d}(V,f) \leftarrow (1-\alpha)\cdot GC^{d-1}(V,f) + \alpha\cdot\Big[\,C_f^{d} + \gamma\cdot \min_{\forall n\in A(F)} GC^{d-1}(F,n)\Big] \qquad (11)
$$
where $f: V \to F$ and $C_f^{d} = \sum_{j=1}^{k} \beta_j \cdot X_j$. Here $X_j$ represents an immediate travel cost associated
with off-stop f; for example, the in-vehicle time to reach off-stop f, the number of transfers to reach the
destination from off-stop f, etc. $\beta_j$ can be an alternative-specific parameter (in-vehicle time
for subways is perceived differently from in-vehicle time for buses) or a general parameter
(transfer penalty).
This initial off-stop choice is based on previous experiences’ generalized cost, and does
not reflect within-day dynamics of the transit service. The ‘SWITCH AND STAY’ and
‘SWITCH AND ALIGHT’ choice rules capture on-board passengers’ adaptive behaviour
in response to service dynamics.
When the transit vehicle arrives at the initially chosen off-stop f, and the current
experience for that off-stop matches expectations, the passenger chooses to alight
at the initially chosen off-stop f according to the 'SWITCH AND STAY' choice rule. In
other situations, passengers may experience an unexpected travel cost associated with the
initial off-stop choice f (e.g. an unexpected delay). This warrants re-evaluation of the
initial off-stop choice, since it may no longer be the optimal choice when compared with
other attractive off-stop options.
When the transit vehicle arrives at a feasible off-stop $f' \in A(V),\ f' \neq f$ (where f is the
passenger's initial off-stop choice), the 'SWITCH AND ALIGHT' choice rule is applied
and the passenger adjusts en-route transfer choices according to service dynamics (either by
alighting at the current stop $f'$ or staying on board until the arrival at the initially chosen off-stop f).

Regardless of the decisions made according to the 'SWITCH AND STAY' ('SWITCH
AND ALIGHT') choice rule, the passenger's accumulated experience with off-stop f (off-stop $f'$),
$GC^{d}(V,f)$ ($GC^{d}(V,f')$), is updated based on Eq. 11, since $C_f^{d}$ has already been
experienced.
When a passenger alights at an off-stop $f \in A(V)$, a decision concerning the transfer on-stop,
$n \in A(F)$, is made. This decision is based on the expected travel cost associated with
each possible transfer on-stop. These expectations reflect previous experiences, and choices
are made following a decision-making behaviour similar to that outlined for the previous choices.
When a passenger arrives at a transfer on-stop n, an initial route choice is made. The
boarding decision is dependent on the within-day dynamics of the transit service, as
explained before.
There could be an immediate penalty associated with the choice of a transfer on-stop n,
such as inconvenience due to walking or crossing multiple intersections to reach on-stop n.
There is also a future travel cost related to the transfer on-stop choice, expressed by
the expected utility of the attractive routes $r \in A(S),\ s = n$. The travel cost associated with the
choice of a transfer on-stop n is updated as:
$$
GC^{d}(F,n) \leftarrow (1-\alpha)\cdot GC^{d-1}(F,n) + \alpha\cdot\Big[\,C_n^{d} + \gamma\cdot \min_{\forall r\in A(S):\,s=n} GC^{d-1}(S,r)\Big] \qquad (12)
$$
where $n: F \to S$ and $C_n^{d} = \sum_{j=1}^{k} \beta_j \cdot X_j$. Here $X_j$ represents an immediate travel cost
associated with an on-stop choice n; for example, transfer walk time. $\beta_j$ reflects the passenger's
perception of the travel cost associated with on-stop choices, e.g. a transfer penalty.
In summary, the proposed model treats individual passengers as decision makers
who each day choose a departure time, an origin stop, a destination stop and a route (or a sequence
of routes) between a given origin-destination pair. This decision-making
behaviour, as explained, consists of a two-step process: making choices and
updating perceptions. When information on service performance (via a real-time information
provision system or information sharing with other users) is not provided, the step of
making choices precedes perception updating. Perception updating occurs only when the
state-action pair is visited. Pre-trip decisions are treated as fixed choices, while en-route
choices are adaptive.
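The two-step process can be summarized in a small day-to-day loop. The sketch below is illustrative (all structures, callbacks and values are assumptions) and shows that only visited state-action pairs are updated:

```python
# Illustrative day-to-day loop for the two-step process above: choices on day d
# are made from the day d-1 mental model, and only the visited state-action
# pairs are updated afterwards. Structures and values are assumptions.

def run_day(gc, choose, experience, alpha=0.3):
    """choose(gc) -> visited (state, action) pairs for the day's trip;
    experience(state, action) -> realized generalized cost on day d."""
    visited = choose(gc)                          # step 1: make choices
    for state, action in visited:                 # step 2: update perceptions
        realized = experience(state, action)
        gc[(state, action)] = (1 - alpha) * gc[(state, action)] + alpha * realized
    return gc

gc = {("O", "07:30"): 40.0, ("T", "g1"): 25.0}
run_day(gc, choose=lambda g: [("O", "07:30")], experience=lambda s, a: 30.0)
# ("T", "g1") keeps its old value: that state-action pair was not visited today.
```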
Notes on the proposed model
A note on the modeling of information provision
The benefits of Advanced Traveler Information Systems (ATIS) applications can be
assessed by comparing the path choice behaviour and time savings of informed passengers
versus non-informed ones. However, conventional transit assignment models (e.g. strategy-based
models) assume that passengers have full information about network conditions
and infinite information processing capabilities; this is referred to as the ''perfect
knowledge of network'' assumption. These models are not appropriate for evaluating
information provision policies, since information regarding network conditions would be
assumed to be available to all passengers. The emergence and increased deployment of
ATIS make it practically important to relax the assumption of perfect information or
perfect knowledge in transit assignment studies.
The system performance on iteration d depends on the choices of other passengers;
these choices are not known to any passenger a priori. In the proposed model, with no
provision of real-time information, passengers do not have perfect knowledge about the
system performance on iteration d. Instead, passengers form their expectations about the
performance of the transit service based on accumulated experiences summarized up to
iteration d - 1 (see Fig. 2).
Under information provision, perception updating precedes decision making (see Fig. 2).
That is, the 'mental model' content is updated before making choices: $C_a^{d}$ can now be
obtained for all actions $a \in A(S)$, and $P_S(a)$ is computed as $f\big(GC^{d}(S,a)\big)$, compared to
$P_S(a) = f\big(GC^{d-1}(S,a)\big)$ when real-time information is not provided. This is also reflected in
the SWITCH choice rule calculations. For example, the 'SWITCH AND WAIT' choice
rule compares $C_r^{d} + \gamma\cdot \min_{\forall f\in A(V)} GC^{d-1}(V,f)$ and $GC^{d-1}(S,i),\ \forall i \neq r$, when information is
not provided. $GC^{d-1}(S,i),\ \forall i \neq r$ represents passengers' expectations for the unknown performance
of the other attractive routes in iteration d. With information provision, $C_i^{d},\ \forall i \neq r$
is now obtainable (see Fig. 4).
Based on the above, information provision will become effective when:
• great variability in service performance exists and average GC values are not
representative of state-action pair values,
• non-recurrent congestion occurs (due, for example, to traffic incidents), where
$GC^{d-1}(S,i),\ \forall i \neq r$ are not representative of the within-day network dynamics in iteration
d, and
• non-familiar travelers' expectations are not consistent with the network performance.
The proposed framework allows for the evaluation of the impact of pre-trip and en-route
information provision by comparing passengers’ choice profiles (trip choices and usage of
SWITCH choice rules) in the case of absence of information and the case of information
provision.
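The behavioural difference under information provision reduces to an ordering change that can be sketched directly; names and values below are illustrative:

```python
# Illustrative sketch of the ordering change under information provision:
# provided day-d costs overwrite the corresponding expectations *before* the
# choice is made, instead of choosing from GC^{d-1} alone. Names are invented.

def choose_route(gc_prev, info=None):
    """gc_prev: GC^{d-1}(S, .) for attractive routes; info: day-d costs from
    ATIS for a (possibly partial) set of routes, or None without provision."""
    expectations = dict(gc_prev)
    if info is not None:
        expectations.update(info)     # perception update precedes the choice
    return min(expectations, key=expectations.get)

choose_route({"r": 15.0, "r2": 20.0})                      # experience only: r
choose_route({"r": 15.0, "r2": 20.0}, info={"r2": 12.0})   # ATIS reveals r2 is faster
```

Comparing the two calls illustrates how the same passenger can switch routes when provided information contradicts accumulated experience.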
A note on the classification of the proposed choice model
The decision rule (or choice rule) in the proposed model is stochastic (the mixed {ε-greedy,
SoftMax} action choice model), allowing both exploitation and exploration of the individual's
experience. The utility (or generalized travel cost) term is formed by the passenger, and it
has no random error component (see Eq. 2). Therefore, the proposed model can be classified
as a bounded rational model, with a constant utility term and a stochastic decision
rule. Random utility models, by contrast, are characterised by a deterministic decision
rule and a stochastic utility term.
When the decision-making process of transit travelers (see Fig. 3) is modeled as a
nested logit model (or a cross-nested logit model), parameter estimation in the
decision-making tree proceeds from the bottom nest all the way up to the root. At the
run choice nest, for example, the estimation procedure forgets (or does not
consider) upper-level choice nests (e.g. the stop choice nest) and takes into consideration
further nests, assuming that the information needed to make the run choice is preserved at the
run choice-nest level. This is in many ways analogous to the memoryless property of the
MDP.
Framework applications
Wahba and Shalaby (2011) report on a full-scale implementation of MILATRAS,
including the departure time and path choice model proposed in this paper, using the
Toronto Transit Commission (TTC) system as a case study. It presents a large-scale real-world
application: the TTC AM network has 500 branches with more than 10,000 stops
and about 332,000 passenger-agents. The choice model parameters were calibrated such
that the entropy of the simulated route loads was optimized with reference to the observed
route loads, and validated with individual choices. The modeled route loads, based on the
calibrated parameters, closely approximate the distribution underlying the observed loads.
In this application, transit passengers were assumed to plan their transit trips based on their
experience with the transportation network, with no prior (or perfect) knowledge of service
performance.
Wang et al. (2010) compared the proposed framework, MILATRAS, with two conventional
transit assignment approaches, namely the EMME/2 Transit Assignment Module and
MADITUC. MILATRAS was shown to perform comparably and to produce base-case
steady-state run loads and stop loading profiles. MILATRAS, however, presents a policy-sensitive
platform for modeling and evaluating transit-ITS deployments, a real challenge for
conventional tools.

Fig. 4 Modeling of traveler behaviour (a) without information provision and (b) with information provision
The proposed framework was used to investigate the impact of various information provision
scenarios on transit riders' departure time and path choices, and on network
performance. The results show that, for a medium-size transit network with low- to medium-frequency
services, the stop and departure time choices appear to be more important than the
run choice, as they significantly affect the trip time.
Another area of application is emissions modeling. The proposed framework, MILATRAS,
was integrated within a multi-modal framework to evaluate transit emissions at a
link-based mesoscopic level for the TTC network, using time-dependent network loading
profiles (Lau et al. 2010).
Another study (Kucirek 2012) investigated fare-based assignment for inter-municipal,
cross-regional trips spanning three different operators (with different fares), using the proposed
framework and comparing assignment results to surveyed trip-record data and to results from
a fare-based travel demand forecasting model implemented in EMME2. The results show
that the conventional approach over-predicts the number of transfers between operators, while
MILATRAS is capable of accurately estimating transfers between operators and capturing
fare interactions between local (e.g. Mississauga Transit) and premium (e.g. GO Transit)
transit services.
Future research
A mode choice component will be integrated into the overall modeling framework, using
the learning-based approach. Future efforts will be directed to the modeling of passengers'
travel choices in a multimodal network, incorporating the access and egress mode choices.
Also, the growing trend in using smart cards (e.g. the Presto card in the Greater Toronto Area)
provides a rich source of data on transit departure time and path choices in particular, and
multi-modal trips in general. Such data sources will present great opportunities for calibrating
and validating dynamic transit path choice models.
Acknowledgments This research was supported by the Natural Sciences and Engineering Research Council (NSERC) and the Ontario Graduate Scholarship (OGS) Program.
References
Arentze, T.A., Timmermans, H.J.P.: Modeling learning and adaptation processes in activity-travel choice: a framework and numerical experiments. Transportation 30, 37–62 (2003). doi:10.1023/A:1021290725727
Cantarella, G.E., Cascetta, E.: Dynamic processes and equilibrium in transportation networks: towards a unifying theory. Transp. Sci. 29, 305–329 (1995). doi:10.1287/trsc.29.4.305
Cascetta, E., Cantarella, G.E.: A day-to-day and within-day dynamic stochastic assignment model. Transp. Res. A 25, 277–291 (1991). doi:10.1016/0191-2607(91)90144-F
Davis, G.A., Nihan, N.L.: Large population approximations of a general stochastic traffic assignment model. Oper. Res. 41, 169–178 (1993). doi:10.1287/opre.41.1.169
de Palma, A., Marchal, F.: Real cases applications of the fully dynamic METROPOLIS tool-box: an advocacy for large-scale mesoscopic transportation systems. Netw. Spat. Econ. 2, 347–369 (2002). doi:10.1023/A:1020847511499
Ettema, D., Tamminga, G., Timmermans, H.J.P., Arentze, T.: A micro-simulation model system of departure time using a perception updating model under travel time uncertainty. Transp. Res. A 39, 325–344 (2005). doi:10.1016/j.tra.2004.12.002
Hazelton, M., Watling, D.: Computation of equilibrium distributions of Markov traffic assignment models. Transp. Sci. 38, 331–342 (2004). doi:10.1287/trsc.1030.0052
Hickman, M., Bernstein, D.: Transit service and path choice models in stochastic and time-dependent networks. Transp. Sci. 31, 129–146 (1997). doi:10.1287/trsc.31.2.129
Kucirek, P., Wahba, M., Miller, E.J.: Fare-based transit assignment models: comparing the accuracy of strategy-based aggregate model EMME, against microsimulation model MILATRAS, using trips crossing the Mississauga-Toronto boundary. University of Toronto DMG Research Reports. http://www.dmg.utoronto.ca/pdf/reports/research/kucirekp_gta_transitfares.pdf (2012). Accessed March 2012
Lau, J., Hatzopoulou, M., Wahba, M., Miller, E.J.: An integrated multi-model evaluation of transit bus emissions in Toronto. Transp. Res. Rec. 2216, 1–9 (2010). doi:10.3141/2216-01
Nakayama, S., Kitamura, R., Fujii, S.: Drivers' learning and network behaviour: a dynamic analysis of the driver-network system as a complex system. Transp. Res. Rec. 1676, 30–36 (1999). doi:10.3141/1676-04
Nuzzolo, A., Russo, F., Crisalli, U.: A doubly dynamic schedule-based assignment model for transit networks. Transp. Sci. 35(3), 268–285 (2001). doi:10.1287/trsc.35.3.268.10149
Nuzzolo, A., Russo, F., Crisalli, U.: Transit Network Modelling: The Schedule-Based Dynamic Approach. FrancoAngeli, Milan (2003)
Petersen, K.: Ergodic Theory, 1st edn. Cambridge University Press, Cambridge (1990)
Roorda, M.J., Miller, E.J.: Assessing transportation policy using an activity-based microsimulation model of travel demand. ITE J. 76, 16–21 (2006)
Rossetti, R.J.F., Liu, R.: A multi-agent approach to assess drivers' responses to pre-trip information systems. J. Intell. Transp. Syst. 9, 1–10 (2005). doi:10.1080/15472450590912529
Russell, S.: Learning agents for uncertain environments (extended abstract). In: COLT '98: Proceedings of the Eleventh Annual Conference on Computational Learning Theory (1998). doi:10.1145/279943.279964
Salvini, P., Miller, E.J.: ILUTE: an operational prototype of a comprehensive microsimulation model of urban systems. Netw. Spat. Econ. 5, 217–234 (2005). doi:10.1007/s11067-005-2630-5
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. The MIT Press, Cambridge (1998)
Timmermans, H.J.P., Arentze, T., Ettema, D.: Learning and adaptation behaviour: empirical evidence and modeling issues. In: Proceedings of the ITS Conference, Eindhoven (2003)
Wahba, M.: MILATRAS: MIcrosimulation Learning-based Approach to TRansit ASsignment. Dissertation, University of Toronto (2008)
Wahba, M., Shalaby, A.: MILATRAS: a new modeling framework for the transit assignment problem. In: Wilson, N.H.M., Nuzzolo, A. (eds.) Schedule-Based Modeling of Transportation Networks: Theory and Applications, pp. 171–194. Springer, New York (2009)
Wahba, M., Shalaby, A.: Large-scale application of MILATRAS: case study of the Toronto transit network. Transportation 38, 889–908 (2011). doi:10.1007/s11116-011-9358-5
Wang, J., Wahba, M., Miller, E.J.: Comparison of agent-based transit assignment procedure with conventional approaches. Transp. Res. Rec. 2175, 47–56 (2010). doi:10.3141/2175-06
Watkins, C.J.C.H.: Learning from delayed rewards. Dissertation, University of Cambridge (1989)
Watling, D.: Asymmetric problems and stochastic process models of traffic assignment. Transp. Res. B 30, 339–357 (1996). doi:10.1016/0191-2615(96)00006-9
Author Biographies
Mohamed Wahba is an Associate at IBI Group and an Adjunct Professor at the University of Toronto. Dr. Wahba's research activities include the modelling of multimodal networks; optimizing the utilization of transport multimodal infrastructure in emergency evacuation situations; estimation of transit network emissions using microsimulation; agent-based travel behaviour models; and the modelling of transit-ITS deployments and their influence on travellers' behaviour. Mohamed has a Master of Philosophy degree from Cambridge University, UK, in Management Sciences, and Master of Applied Science and PhD degrees in Civil Engineering (Transportation) from the University of Toronto, Canada.
Amer Shalaby is a Professor of Transportation Engineering and Planning at the University of Toronto. He specializes in transit planning and ITS applications to transit. Dr. Shalaby is an appointed member of the US Transportation Research Board committees ''Emerging and Innovative Public Transport and Technologies'', ''Bus Transit Systems'' and ''Light Rail Transit''. He obtained his M.A.Sc. and Ph.D. degrees from the University of Toronto.