Learning-based framework for transit assignment modeling under information provision
Mohamed Wahba • Amer Shalaby
Published online: 26 November 2013. © Springer Science+Business Media New York 2013
Abstract The modeling of service dynamics has been the focus of recent developments
in the field of transit assignment modeling. The emerging focus on dynamic service
modeling requires a corresponding shift in transit demand modeling to represent appro-
priately the dynamic behaviour of passengers and their responses to Intelligent Trans-
portation Systems technologies. This paper presents the theoretical development of a
departure time and transit path choice model based on the Markovian Decision Process.
This model is the core of the MIcrosimulation Learning-based Approach to TRansit
Assignment. Passengers, while traveling, move to different locations in the transit network
at different points in time (e.g. at stop, on board), representing a stochastic process. This
stochastic process is partly dependent on the transit service performance and partly con-
trolled by the transit rider’s trip choices. This can be analyzed as a Markovian Decision
Process, in which actions are rewarded and hence passengers' optimal policies for maximizing the trip utility can be estimated. The proposed model is classified as a boundedly rational model, with a constant utility term and a stochastic choice rule. The model is appropriate for modeling information provision since it distinguishes between an individual's experience with the service performance and information provided about system dynamics.
Keywords Travel choice · Markovian decision process · Learning · Information provision
Introduction
This paper presents the theoretical development of a departure time and transit path choice
model based on the Markovian Decision Process. This model is the core of MILATRAS—
M. Wahba (corresponding author) · A. Shalaby, Civil Engineering Department, University of Toronto, 35 St. George Street, Toronto, ON M5S 1A4, Canada; e-mail: [email protected]
Transportation (2014) 41:397–417. DOI 10.1007/s11116-013-9510-5
MIcrosimulation Learning-based Approach to TRansit ASsignment (Wahba 2008; Wahba
and Shalaby 2009). The proposed model considers multiple dimensions of the transit path
choice problem; such dimensions were either simplified (e.g. stop choice) or ignored (e.g.
departure time) in most previous approaches. The developed assignment procedure is
capable of modeling the day-to-day and within-day dynamics of the transit service, as well
as passengers’ responses under information provision.
Nuzzolo et al. (2001) presented a dynamic stochastic path choice model which con-
siders variations in transit services and passengers’ learning and adaptation. The model is
called ‘‘doubly dynamic’’ as it accounts for within-day and day-to-day variations. Pas-
sengers are assumed to make a discrete choice of trip decisions based on travel disutility
and the desired arrival time at destination. Nuzzolo et al. (2003) report on the theoretical
foundation of the schedule-based approach to modeling travelers’ behaviour in transit
networks. In the assignment procedure, the path choice model employs the Logit/Probit
formulations to calculate the choice probabilities. The run-choice is varied in both within-
day and day-to-day assignment based on an exponential updating mechanism of run
attributes. This is not considered a learning-driven approach as the learning process does
not follow an explicit learning algorithm, and it is done on an aggregate level using a
common pool of information that is accessed by all travelers.
Recently, traffic assignment procedures implemented learning-based models which
have been shown to result in different and more realistic assignments relative to con-
ventional methods (Nakayama et al. 1999; Arentze and Timmermans 2003; Ettema et al.
2005). With an agent-based representation, individual travelers are explicitly modeled as
cognitive agents. This has led to the shift to an agent-based modeling framework that has
been successfully and widely applied in activity-based models (Timmermans et al.
2003; Salvini and Miller 2003; Roorda and Miller 2006) and in traffic assignment models
with information provision (de Palma and Marchal 2002; Rossetti and Liu 2005; Ettema
et al. 2005). As such, the next generation of dynamic transit assignment algorithms should
consider adopting learning-based and multi-agent microsimulation concepts for consis-
tency with emerging traffic assignment models and for integration with state-of-the-art
activity-based modeling frameworks. In response, the authors developed MILATRAS
(Wahba 2008; Wahba and Shalaby 2009) for the modeling of day-to-day and within-day
dynamics of the transit assignment problem.
The proposed transit path choice model within MILATRAS is inspired by the non-
equilibrium framework for assignment procedures proposed by Cascetta and Cantarella
(1991) and the agent-based, learning-driven path choice model presented by Ettema et al.
(2005). It applies similar concepts to the transit assignment problem and passenger
behavioural modeling, taking into consideration the distinctive features of public transport
networks and transit rider travel behaviour.
Cascetta and Cantarella (1991) addressed the lack of explicit modeling of day-to-day
and within-day variations in supply and demand within static equilibrium procedures
where a fixed-point solution is searched. They argued that the equilibrium framework is not
structurally compatible with observed phenomena, such as habitual behaviour, random
variations in demand and network conditions, and transient states of network conditions
following modifications. They proposed a process model that generates fixed-point or
steady-state arc-flows (approximating the mean of the distribution of the corresponding
stochastic process model and comparable to Stochastic User Equilibrium outputs) with the
explicit representation of day-to-day adjustments in travelers’ decisions and within-day
variations of demand. The evolution of the path-flows in successive days is then modeled
as a Markovian stochastic process. Cantarella and Cascetta (1995) proposed conditions for
the existence and uniqueness of stationary probability distributions for the stochastic
process model. The model, however, did not consider en-route replanning due to variations
in network conditions or information provision. Examples of research efforts in this
direction include Davis and Nihan (1993), Watling (1996), and Hazelton and Watling
(2004).
Ettema et al. (2005) represent learning and experience at the individual level for the
auto-assignment problem. They promote the use of microsimulation methods to model the
system-level phenomena, such as congestion, based on individual-level choices. They used
a microsimulation platform to account for the travel time uncertainties on departure time
choices, thus allowing for the modeling of day-to-day adjustments of travelers’ behaviour.
They used mental models to represent traveler’s experience of network conditions.
Travelers are assumed to be decision-makers who decide on the trip departure time from
origin, for a fixed route. The study, therefore, did not consider other en-route trip choices,
e.g. route choice.
For a more comprehensive exposition of the methodological developments of transit assignment models over the past few decades, see Wahba (2008).
Markovian decision process
Markov chains and the ergodic theory
A Markov Chain is an ordered sequence of system state occurrences from a stochastic
process, whose system variables satisfy the Markov property. The Markov property means
that the conditional probability distribution of future states of the process (i.e. values of the
random variable at future time instances), given the present state and all past states,
depends only upon the present state and not on any past state. The present state of the
system is sufficient and necessary to predict the future state of the process. This is
sometimes referred to as the ‘memoryless property’ of the Markov process.
The Ergodic Theory provides the foundation for studying stochastic processes that have
the Markovian property. It establishes the conditions under which a Markov Chain can be
analysed to determine its transition probabilities and steady state behaviour. The Ergodic
Theory states that if a Markov Chain is ergodic, then a unique steady state distribution
exists and is independent of the initial conditions (Petersen 1990). A Markov Chain is
called ‘ergodic’ if, and only if, the chain is irreducible, positive-recurrent and aperiodic.
The positive-recurrent and aperiodic characteristics guarantee that the steady state distri-
bution exists, while irreducibility guarantees that the steady state distribution is unique and
independent of initial conditions.
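To make the steady-state claim concrete, the following sketch (a hypothetical three-state chain, not from the paper) computes the stationary distribution by power iteration; for an ergodic chain, the result is the same for any starting distribution:

```python
# Power iteration to the steady-state distribution of a small ergodic
# Markov chain. The 3-state transition matrix is a made-up example:
# row i gives the probabilities of moving from state i to each state.
P = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
]

def steady_state(P, tol=1e-12, max_iter=10_000):
    """Iterate pi <- pi P until the distribution stops changing."""
    n = len(P)
    pi = [1.0 / n] * n  # any initial distribution works for an ergodic chain
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi

pi = steady_state(P)
print([round(x, 4) for x in pi])  # sums to 1, independent of the start point
```

The chain above is irreducible (every entry is positive) and aperiodic (self-loops exist), so the iteration converges to the unique stationary distribution.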
Actions, rewards and decision making in Markov chains
In Markov processes, the outcome of the stochastic process is entirely random; it is not controlled but merely observed. Situations where the underlying stochastic process has the
Markov property and where outcomes are partly random and partly under the control of a
decision maker are studied using Markov Decision Processes (MDPs).
An MDP is a discrete-time stochastic process, where a decision-maker partly controls the
transition probabilities through deliberate choices at discrete points in time. In an MDP, in
addition to the system state representation, at each state there are several actions from
which the decision maker must choose. This decision-making behaviour, consequently,
influences the transition probability from one system state to another. Associated with the
decision-making capability is the concept of a reward. Rewards can be earned for each
state visited, or for each state-action pair implemented. The decision-maker in an MDP has
a goal of optimizing some cumulative function of the rewards, implying that the decision-
maker acts rationally. The main difference between Markov Processes and Markov
Decision Processes is the addition of actions (allowing choices) and rewards (giving
motivation).
If the reward and return functions are specified, then the MDP becomes a (stochastic) optimization problem in which the decision variables are the transition probabilities and the objective is the maximization of the expected return. When the underlying
Markov Process is ergodic, a unique steady-state distribution exists. This means that a unique state-transition probability matrix from state i to state j, $[\Pr_{ij}]$, exists, which ensures that the optimization problem formulated above has a unique solution (i.e. a global optimum). A solution, i.e. a feasible transition probability matrix, represents a policy π that the agent can follow, where $\pi \to \Pr(i,j)$. This policy guides the agent as to which actions to choose at any given state, regardless of prior history. The optimal solution represents the optimal policy π* that maximizes the expected return. Reinforcement Learning (RL) represents an
approach to estimate the policy that optimizes the expected return. RL uses a formal
framework defining the interaction between a goal-directed agent and its environment in
terms of states, actions and rewards (Sutton and Barto 1998). It is a computational
approach to model goal-directed learning and decision-making. It uses exploitation of
experience and exploration of available actions to converge to the policy that yields the
maximum utility. The agent will choose actions at each state according to the optimal policy π* returned. The elements of the optimal policy can then be used to study the agent's choices and explain the agent's preferences in various environments.
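As a minimal illustration of how RL estimates such a policy, the sketch below runs tabular Q-learning on an invented two-state MDP; the states, actions, rewards and parameter values are assumptions for demonstration, not the paper's model:

```python
import random

# Toy MDP (hypothetical): two states, two actions each. Q-learning with an
# epsilon-greedy rule converges to the policy maximizing discounted return.
rewards = {('s0', 'a'): 1.0, ('s0', 'b'): 0.0, ('s1', 'a'): 0.0, ('s1', 'b'): 2.0}
transition = {('s0', 'a'): 's1', ('s0', 'b'): 's0', ('s1', 'a'): 's0', ('s1', 'b'): 's1'}

random.seed(0)
alpha, gamma, eps = 0.1, 0.9, 0.1  # learning rate, discount, exploration
Q = {sa: 0.0 for sa in rewards}
state = 's0'
for _ in range(20_000):
    # epsilon-greedy action choice
    if random.random() < eps:
        action = random.choice(['a', 'b'])
    else:
        action = max(['a', 'b'], key=lambda act: Q[(state, act)])
    nxt = transition[(state, action)]
    # temporal-difference update toward reward + discounted best future value
    target = rewards[(state, action)] + gamma * max(Q[(nxt, 'a')], Q[(nxt, 'b')])
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    state = nxt

policy = {s: max(['a', 'b'], key=lambda act: Q[(s, act)]) for s in ('s0', 's1')}
print(policy)  # greedy policy with respect to the learned Q-values
```

Here the learned policy moves from s0 to s1 (immediate reward 1) and then keeps collecting the reward of 2 in s1, which is the return-maximizing behaviour for this toy environment.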
The transit path choice model
Transit path choice problem as a Markovian decision process
Travelers are assumed to be goal-directed intelligent agents. A traveler has a goal of
minimizing the travel cost (e.g. time, money, inconveniences, etc.) and for some trip
purposes, a goal of maximizing the probability of arriving at a desired arrival time. In
order to achieve their goals, travelers follow policies to optimize the utility of their trip.
These policies manifest themselves through the observed travel behaviour (i.e. trip choi-
ces) of individual passengers. Assuming that travelers are rational, it is logical to expect
that they follow their optimal policy. Based on the notion that actions receive rewards (or
penalties), it is also expected that travelers value their choices according to a utility
function. Although this utility function is not directly observed by the modeler, it is known
to the individual traveler. Note that the term ‘‘return’’ or ‘‘value’’, in the context of transit
path choice modeling, refers to an estimate of a random variable and is meant to reflect an
expected value or an expected return.
In the transit assignment context, a transit user is faced with various types of choices
during the trip. A passenger needs to decide on a departure time, an origin stop, a run
associated with a route to board, and possibly a connection to transfer at. In addition, in a
multimodal system, access and egress mode choices to and from the transit service may be
included. For a recurring trip such as a home-based work trip, a passenger settles on the trip
choices by trying (or judging) different options until certain path choices prove to be optimal for this particular trip's objectives. This optimization process is based on the
passenger’s cumulative experience with the transit service performance.
We are interested in the process through which passengers settle on their trip choices.
This is important to be able to model the shifts in trip choices when changes to the transit
service are introduced. A passenger at origin has a choice set of different combinations of
departure time and path choices. Over time, a passenger chooses the combination of
departure time, origin stop, run (or route), and transfer stop that minimizes the perceived
travel cost. In order to find this combination, the passenger must have valued (tried or
judged) all other possible choices and found that the chosen combination outperforms all
other options, with reference to a utility function.
At the origin, the passenger has the objective of minimizing travel cost for the trip. At a
transfer stop, the passenger’s objective is still to minimize the travel cost for the remainder
of the trip, regardless of prior choices. This does not mean that previous choices are unimportant; rather, what matters is the relative impact of prior choices on future decisions, not the prior choices themselves. This influence on future choices is expressed in the value of p (the choice
probability for one alternative). The value of p also depends on the transit service performance. The outcome of the passenger's choice, in turn, affects the transit service performance; for example, boarding a vehicle reduces the available capacity on the chosen run by one.
The change in the location of the transit rider during the trip represents a stochastic
process, with multiple states and transition probabilities between states. Traveling pas-
sengers move to different locations in the transit network at different points in time (e.g. at
stop, on board). The corresponding state of a stochastic process in this case is represented
by the location of the transit rider, which takes a value out of a state space. The state space
represents all possible locations for a transit rider; it consists of possible origin stop
choices, route choices, transfer stop choices and destination stop choices. A passenger at a
transfer stop has made three transitions: from origin-location to origin-stop-location
(associated with an origin stop choice), from origin-stop-location to onboard-route-location
(associated with a route/run choice), and from onboard-route-location to transfer-stop-
location (associated with a transfer stop choice). Each transition depends on the current
state of the passenger (i.e. location of the passenger) and on the transit service perfor-
mance. The passenger would need to make (at least) two more transitions to reach the trip
destination: from transfer-stop-location to onboard-route-location (associated with a route/
run choice) and from onboard-route-location to destination-stop-location and final desti-
nation. Given the present state (i.e. location), the passenger decides on future transitions.
This resembles a stochastic process with the Markov property. It should be noted that location information alone might not be sufficient to decide on future transitions; information related to previous transitions, such as the fare paid, the remaining time to the scheduled arrival time, or real-time information about transit system performance, is also important.
Instead of having to memorize previous transitions, the present state (or the system state)
should summarize all information that is assumed to be available to the passenger at any
time t.
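A minimal sketch of this location-based state representation (the stop and route names are hypothetical) encodes each state with its admissible transitions, so that the next location depends only on the current one:

```python
# A minimal encoding of the passenger-location state space described above.
# Stop and route names are invented; each state lists the transitions
# available from it, mirroring the Markov property: the next location
# depends only on the current location.
transitions = {
    'origin':        ['stop_A', 'stop_B'],          # origin stop choice
    'stop_A':        ['route_1', 'stop_A'],         # board, or wait (self-loop)
    'stop_B':        ['route_2'],
    'route_1':       ['transfer_stop'],             # alight at transfer stop
    'route_2':       ['dest_stop'],
    'transfer_stop': ['route_2', 'transfer_stop'],  # board next run, or wait
    'dest_stop':     ['destination'],
}

def feasible(path):
    """Check that a sequence of locations respects the transition structure."""
    return all(nxt in transitions.get(cur, []) for cur, nxt in zip(path, path[1:]))

trip = ['origin', 'stop_A', 'route_1', 'transfer_stop',
        'route_2', 'dest_stop', 'destination']
print(feasible(trip))  # a valid five-transition trip with one transfer
```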
This Markovian Stochastic Process is partly dependent on the transit service performance and partly controlled by the transit rider; hence it can be analyzed as an MDP. In an MDP, actions are rewarded and hence optimal policies can be estimated. For an origin stop
choice, there is an immediate cost expressed as the value of the travel cost to access the
origin stop (time and money). Also, there is a future value of a specific stop choice
expressed in the expected travel cost of the subsequent available route and transfer
connection choices. For a route choice, there is an immediate cost expressed as the value of
waiting time. A future cost, associated with a route choice, is related to the value of
possible transfer connections and the probability of arriving at the desired arrival time.
Assuming passengers are rational, it is logical to expect passengers to follow their
optimal policy and to optimize their trip return or cost. The effect of such an optimal policy is observed through disaggregate individual choices $[\Pr_{ij}]_{observed}$ and aggregate route loads $L_{observed}$. The value of p (the choice probability for one alternative) represents one cell in the state-transition probability matrix $[\Pr_{ij}]_{observed}$. If the underlying Markov process is ergodic, then there exists a unique optimal policy π* that allows passengers to optimize their return. Associated with π* is a steady-state transition probability matrix $[\Pr_{ij}]^*$. Passengers devise their optimal policies based on a value function for state-action evaluation. While the optimal policy π* and its effect through route loads are observed, the only unknown to the modeler is the value function. By reconstructing the transit path choice problem as an MDP, the value function used by individual passengers can be estimated (Wahba 2008). This is similar to the process of inverse reinforcement learning (IRL) (Russell 1998).
Mental model structure
In order to ensure the uniqueness and existence of the optimal policy π* in the reconstructed MDP, the underlying Markov process needs to be ergodic. To show that a Markov
Chain is ergodic, its state-action transition diagram has to be irreducible, positive-recur-
rent, and aperiodic.
The state-action transition diagram for the transit path choice problem is represented by
the ‘mental model’ of the relevant parts of the transit network. For simplicity of analysis, and without loss of generality, assume that the system state is represented only by the
location of the passenger during the trip. At each state s, there is a set of possible actions
$a_i \in A(s)$, available to the passenger. When a passenger at state s decides to take action a, this is referred to as the state-action pair (s, a). Action a moves the passenger from state s to state s', expressed as $a: s \to s'$. The reward from, or the value of, a state-action pair is measured based on the travel cost associated with choosing (s, a).
The schematic representation of state transitions in the underlying stochastic process is
shown in Fig. 1, where the system state variable represents the location of the passenger during the transit trip. The state space has seven possible states: Origin O, Departure Time T, Origin Stop G, On-board V, Off-Stop F, On-Stop N, Destination Stop D. The underlying
stochastic process has the Markov property, since the present location of the passenger is
necessary and sufficient for predicting the future location of the passenger, regardless of
prior locations.
For any transit trip, the number of states is finite (i.e. possible locations are finite). The
state OGN is a strong state; that is, any state j (where j ≠ OGN) is reachable from state OGN
in a finite number of steps. With the existence of one strong state, a Markov Chain with a
finite number of states is guaranteed to be irreducible. Therefore, the underlying Markov
process for the transit path choice problem is irreducible.
The state OGN is a positive-recurrent state, since after leaving OGN the probability of
returning back to OGN in any number of steps is 1. In a Markov Chain with a finite number
of states, the existence of one positive-recurrent state guarantees that the Markov Chain is
positive-recurrent. Therefore, the underlying Markov process for the transit path choice
problem is positive-recurrent.
The state OGN has a possible self-loop transition, representing the change in departure
time choice from origin. Similarly, the OGS and ONS states have a possible self-loop
transition, representing the no-boarding decision. Such states with a possible self-loop
transition are considered aperiodic. With the irreducibility property and finite number of
states, the existence of one aperiodic state means that the underlying Markov process for
the transit path choice problem is aperiodic.
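These three properties can be verified mechanically on a finite transition diagram. The sketch below uses a simplified, hypothetical subset of the Fig. 1 states (omitting, for instance, the off-stop state) and checks irreducibility by mutual reachability; with finitely many states, irreducibility plus a self-loop then implies positive recurrence and aperiodicity, as argued above:

```python
# Ergodicity check for a finite state-transition diagram. The arcs are a
# simplified, hypothetical rendering of Fig. 1, including the waiting
# self-loops and the return-to-origin transition.
arcs = {
    'OGN': ['OGN', 'OGS'],  # change departure time (self-loop), go to stop
    'OGS': ['OGS', 'RUT'],  # wait / no-boarding (self-loop), board a run
    'RUT': ['ONS', 'DNS'],  # alight at a transfer stop or destination stop
    'ONS': ['ONS', 'RUT'],  # wait at transfer stop (self-loop), board again
    'DNS': ['DST'],
    'DST': ['OGN'],         # return to origin (next day's trip)
}

def reachable(graph, start):
    """All states reachable from `start` via depth-first search."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

states = list(arcs)
irreducible = all(reachable(arcs, s) == set(states) for s in states)
has_self_loop = any(s in arcs[s] for s in states)
# For a finite irreducible chain, one aperiodic (self-loop) state implies
# aperiodicity, and finiteness plus irreducibility gives positive recurrence.
print(irreducible and has_self_loop)  # True -> the chain is ergodic
```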
The proposed framework, using the mental model structure outlined above, thus ensures that the Markov Chain representing the transit path choice problem is ergodic (Wahba
2008). The ergodic property in the transportation context means that each possible com-
bination of path choices can be tried by iterative dynamics (i.e. all path options can be tried
through making the same trip repeatedly over days). It also means that there exists a unique
optimal policy π* that will optimize the return of the transit trip (or minimize the travel cost
[Fig. 1 Schematic representation of the state transitions for the transit path choice problem: states Origin (OGN), Origin Stop (OGS), Route/Run (RUT), Off-Stop (OFS), On-Stop (ONS), Destination Stop (DNS) and Destination (DST), linked by departure time, stop, route, board/wait, alight/stay, off-stop, on-stop and destination stop choices, with a return-to-origin transition.]
in this regard) given a value function. Alternatively, given an observed optimal policy π*, a value function can be estimated so as to regenerate the observed optimal policy (or observed choices). If the form of the value function is assumed, then its parameters can be calibrated to maximize the likelihood of reproducing the observed choices. When the objective is to find the probability of choosing action a at state s, $P_s(a)$, Reinforcement Learning (RL) techniques provide a systematic procedure for estimating these probabilities. RL algorithms attempt to find a policy $\pi^*: S \to A$ for the agent to follow, giving the probabilities $P_s(a)^*$. In particular, the temporal-difference (TD) Q-Learning algorithm (Watkins 1989) is adopted in this framework.
In RL terminology, passengers need to follow a policy that defines the passenger’s
behaviour (i.e. trip choices) at a given time. Passengers need a reward function which
signals the immediate reward of a specific state-action pair. For instance, the immediate
reward of boarding a bus i from a stop j is represented by the experienced waiting time for
bus i at stop j. A value function calculates the accumulated reward over time of a specific
state-action pair. For example, the value of boarding a bus i from a stop j is calculated as
the expected travel time (out-of-vehicle and in-vehicle time) to the destination, starting
from this stop and boarding this bus. Figure 2 depicts value function components for
choosing to access a stop from origin; the immediate return associated with travel cost to
access the stop (e.g. access time, access fare, parking inconvenience, etc.) and future
reward associated with travel cost to complete the trip (e.g. waiting time, in-vehicle time,
fare, number of transfers, deviation from desired arrival time) after arriving at the stop. The
value function ensures that state-action pairs with short-term high reward but with a long-
term low value are not preferred.
In the transit assignment context, a Q-value represents the state-action utility (called hereafter the Generalized Cost, GC). The generalized cost for a passenger is given in Eqs. 1 and 2.
$$GC(s,a) = \sum_{p=1}^{n} \hat{\beta}_p \cdot X_p = \sum_{g=1}^{k} \hat{\beta}_g \cdot X_g + \gamma \min_{\forall a' \in A\{s'\}} GC(s',a') \quad (1)$$

$$GC(s,a) \leftarrow [1-\alpha] \cdot GC(s,a) + \alpha \cdot \left[ \sum_{g=1}^{k} \hat{\beta}_g \cdot X_g + \gamma \min_{\forall a' \in A\{s'\}} GC(s',a') \right] \quad (2)$$
where $X_g$ is a passenger-experienced, alternative-specific (state-action) attribute; $\sum_{g=1}^{k} \hat{\beta}_g \cdot X_g$ represents the immediate reward (or cost) for the state-action pair; $\gamma \min_{\forall a' \in A\{s'\}} GC(s',a')$ represents the long-term expected reward for choosing this state-action pair; $\gamma$ is a discount factor, which determines the importance of future rewards; $\alpha$ is the learning rate, which determines to what extent newly acquired experience overwrites old experience; and $A\{s\}$ represents the set of attractive actions at state s.
Note that $X_g$ (e.g. $X_{WT}$: waiting time) is a random variable, and its realization at time step t depends on the transit network conditions and the choices of other passengers. Therefore, the GC value of a specific state-action pair is a random variable, and the GC function acts as a replacement for the unobserved value function.
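A direct transcription of the Eq. 2 update might look like the following sketch; the attribute names, coefficient values and state-action pairs are illustrative assumptions, not calibrated values:

```python
# Sketch of the generalized-cost update of Eq. 2. GC maps state-action
# pairs to learned costs; beta holds illustrative coefficients; x holds
# the attribute values experienced for the pair (s, a) on this trip.
def update_gc(GC, s, a, x, beta, successors, alpha=0.3, gamma=0.9):
    """GC(s,a) <- (1-alpha)*GC(s,a) + alpha*[immediate + gamma*min future]."""
    immediate = sum(b * xi for b, xi in zip(beta, x))
    # minimum learned cost over the attractive actions at the successor state
    future = min(GC[pair] for pair in successors) if successors else 0.0
    GC[(s, a)] = (1 - alpha) * GC[(s, a)] + alpha * (immediate + gamma * future)
    return GC[(s, a)]

GC = {('stop', 'board_r1'): 10.0,          # prior experience (hypothetical)
      ('onboard', 'alight_x'): 4.0,
      ('onboard', 'stay'): 6.0}
beta = [1.0, 0.5]   # weights for waiting time and fare (assumed)
x = [3.0, 2.0]      # experienced waiting time = 3, fare = 2 (assumed units)
new = update_gc(GC, 'stop', 'board_r1', x, beta,
                successors=[('onboard', 'alight_x'), ('onboard', 'stay')])
print(round(new, 2))  # 0.7*10 + 0.3*(4 + 0.9*4) = 9.28
```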
The choice rule follows a mixed {ε-greedy, SoftMax} action-choice model, with a (1 − ε) probability of exploitation, an (ε) probability of exploration, and a SoftMax model (Sutton and Barto 1998), such that Eq. 3 is for exploiting and Eq. 4 is for exploring.
[Fig. 2 Value function components for an origin stop choice: the immediate return (given a departure time choice) and the future reward at the OGS decision point, shaped by within-day service dynamics, existing and predicted conditions, conditions experienced by the passenger, performance expected by the passenger, and information provision through ATIS information dissemination channels.]
Exploiting, with probability $(1-\varepsilon)$:

$$P_s(a) = \begin{cases} 1 & \text{if } GC(s,a) = \min_{\forall a'} GC(s,a') \\ 0 & \text{otherwise} \end{cases} \quad (3)$$

Exploring, with probability $(\varepsilon)$:

$$P_s(a) = \frac{V(s,a)}{\sum_{\forall a' \in A\{s\}} V(s,a')} \quad (4)$$

where $V(s,a) = g(GC(s,a))$. The mixed {ε-greedy, SoftMax} method means that agents behave greedily most of the time (i.e. they exploit current knowledge to maximize immediate reward) and every once in a while, with probability (ε), they select an action at random, using the SoftMax rule. The SoftMax method converts action values into action probabilities, varying the action probabilities as a weighted function of the estimated values. This means that the greedy action is given the highest selection probability, but all other actions are ranked and weighted according to their value estimates.
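A sketch of this mixed rule is below. Since GC is a cost (lower is better), the SoftMax transform g must weight cheaper actions more heavily; exp(−GC), used here, is one assumed choice of g, and the action names and costs are invented:

```python
import math, random

def choose_action(gc_values, eps=0.1, rng=random):
    """Mixed {epsilon-greedy, SoftMax} rule over generalized costs.

    gc_values: dict mapping action -> learned GC (a cost; lower is better).
    With probability 1-eps, exploit: pick the minimum-cost action (Eq. 3).
    With probability eps, explore: sample via SoftMax on V = exp(-GC) (Eq. 4).
    """
    actions = list(gc_values)
    if rng.random() >= eps:                 # exploit current knowledge
        return min(actions, key=gc_values.get)
    v = [math.exp(-gc_values[a]) for a in actions]  # one assumed g(GC)
    total = sum(v)
    r, acc = rng.random() * total, 0.0      # roulette-wheel sampling
    for a, w in zip(actions, v):
        acc += w
        if acc >= r:
            return a
    return actions[-1]

random.seed(1)
gc = {'board_run_1': 7.5, 'wait': 9.0, 'board_run_2': 8.2}  # hypothetical
picks = [choose_action(gc) for _ in range(1000)]
print(picks.count('board_run_1'))  # the low-cost action dominates
```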
Learning-based departure time and path choice model
The underlying hypothesis is that individual passengers are expected to adjust their
behaviour (i.e. trip choices) according to their experience with the transit system perfor-
mance. Individual passengers base their daily travel decisions on the accumulated expe-
rience gathered from repetitively traveling through the transit network on consecutive days.
Travelers’ behaviour, therefore, is modeled as a dynamic process of repetitively making
decisions and updating perceptions, according to a learning process. This decision-making
process is based on a mental model of the transit network conditions. The learning and
decision-making processes of passengers are assumed in the proposed model to follow RL
principles for experience updating and choice techniques.
The mental model's tree-like structure is an efficient representation of the state-action table traditionally developed for Q-learning problems (see Fig. 3). The probability of
deciding on action a at state s is based on the accumulated experience, captured by the GC
value. The GC(s,a) is updated every time state s is visited and action a is chosen. The
GC(s,a) has two components: an immediate reward for choosing action a at state s, and an
estimated cumulative future reward for this specific state-action pair. This GC value
represents the passenger’s experience with the transit network conditions.
Previous studies refer to the path choice problem as the decision passengers make to
board a vehicle departing from a stop, at a specific point in time, which is in a set of
attractive paths serving the origin and destination for that passenger—see Hickman and
Bernstein (1997) for definitions of the static and dynamic path choice problems. Why a
passenger is at this stop (i.e. origin stop choice) at this point in time (i.e. departure time
choice) is not addressed adequately in the literature as part of the path choice problem.
Often, the travel time on paths including a transfer depends on when the passenger begins
the trip, as paths which are attractive at one time may be less so later if there is a smaller
probability of making particular connections. The proposed learning-based choice model
considers the departure time choice, the stop choice and the run (or sequence of runs)
choice.
Pre-trip choice modeling
The departure time and origin stop choices are assumed to be at-home choices (i.e. pre-
trip), in which passenger agents consider available information obtained from previous
trips, in addition to pre-trip information provision (if any). Once a passenger agent arrives
at a stop, the bus run choice is considered an adaptive choice, in which, besides previous
information, the passenger considers developments that occur during the trip and additional
en-route information.
At the origin, a passenger develops an initial (tentative) travel plan, based on his updated
mental model (historical experience and pre-trip information, if provided). This travel plan
includes a departure time, an origin stop (which are fixed) and a (tentative) route (or
sequence of routes). The initial plan reflects the passenger’s preferences and expectations,
and he would follow it if reality (en-route dynamics) matches to a great extent his expec-
tations (which may be different from published static schedule information).
Departure time choice
At origin (state O), on iteration d, the departure time choice is modeled according to the mixed {ε-greedy, SoftMax} action-choice rule outlined in the previous section. Being at state O, a specific departure time choice signals an immediate reward and a future return for a passenger agent Z, which can be written as, on iteration d:

$$GC_Z(O,t)^{(d)} \leftarrow [1-\alpha] \cdot GC_Z(O,t)^{(d-1)} + \alpha \cdot \left[ {C_Z^t}^{(d)} + \gamma \cdot \min_{\forall g \in A_Z(T)} GC_Z(T,g)^{(d-1)} \right] \quad (5)$$
[Fig. 3 The mental model structure with different types of choices, distinguishing pre-trip (at-origin) choices from en-route (adaptive) choices.]
where $t: O \to T$ and $C_Z^{t,d} = H_Z^{t}$. Here $H_Z^{t}$ is a variable representing the utility of leaving
the origin at time t (e.g. auto availability, activity-schedule fitting value). This is an input to the
transit path choice model from activity-based models, if available.
The term $C_Z^{t,d} + \gamma\cdot \min_{\forall g\in A_Z(T)} GC_Z^{d}(T,g)$ is the actual travel cost, which is not known on day d
before the trip starts.
After completing the trip with departure time t, the GC value for agent Z at iteration
d - 1 is updated to the GC value at iteration d, reflecting the passenger agent's experience
with this state-action pair. It is important to note that the choice at iteration d is based
partly on the most updated experience from iteration d - 1. Without information provision,
passengers have no means of knowing the actual travel cost of their trip on iteration
d before starting the trip; however, they use what they have learned to provide an estimate
of travel cost. If the estimated GC value at iteration d proves to be close to the actual
travel cost, then the associated choice is reinforced in future iterations.
When a passenger agent is at state $t: O \to T$ (i.e. after deciding on a departure time,
state T), the origin stop choice follows a similar procedure to the mixed {ε-greedy,
SoftMax} action choice model, where Eq. 6 is for exploring alternatives and Eq. 7 is for
exploiting past experience.
$$
P_T^{d}(g) = \frac{f\big(GC^{d-1}(T,g)\big)}{\displaystyle\sum_{\forall i\in A(T)} f\big(GC^{d-1}(T,i)\big)} \qquad (6)
$$

$$
P_T^{d}(g) =
\begin{cases}
1 & \text{if } GC^{d-1}(T,g) = \min_{\forall i\in A(T)} GC^{d-1}(T,i) \\
0 & \text{otherwise}
\end{cases}
\qquad (7)
$$
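The mixed rule of Eqs. 6 and 7 can be sketched as follows. The transform $f(GC) = \exp(-GC/\theta)$ and the parameter values are assumptions for illustration, since the paper leaves $f(\cdot)$ general; lower generalized cost means higher selection probability:

```python
import math
import random

# Illustrative sketch of the mixed {epsilon-greedy, SoftMax} rule (Eqs. 6-7).
# f(GC) = exp(-GC/theta) is an assumed decreasing transform of cost.

def choose_action(gc_prev, epsilon=0.1, theta=5.0, rng=random):
    """gc_prev maps each alternative to its GC^{d-1} value."""
    if rng.random() < epsilon:                      # explore via SoftMax (Eq. 6)
        weights = {a: math.exp(-c / theta) for a, c in gc_prev.items()}
        pick = rng.random() * sum(weights.values())
        acc = 0.0
        for a, w in weights.items():
            acc += w
            if pick <= acc:
                return a
    return min(gc_prev, key=gc_prev.get)            # exploit minimum GC (Eq. 7)

choice = choose_action({"g1": 25.0, "g2": 30.0}, epsilon=0.0)  # exploits: "g1"
```

Setting ε = 0 recovers the purely greedy rule of Eq. 7; ε = 1 recovers pure SoftMax exploration.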
Origin stop choice
The origin stop choice has an immediate cost (e.g. access time) and a future value represented
by the travel cost associated with the attractive route choice set from the origin stop.
The generalized cost for an origin stop (state G), for a passenger agent on iteration d,
is updated as follows:
$$
GC^{d}(T,g) \leftarrow (1-\alpha)\cdot GC^{d-1}(T,g) + \alpha\cdot\Big[\,C_g^{d} + \gamma\cdot \min_{\forall r\in A(S):\,s=g} GC^{d-1}(S,r)\Big] \qquad (8)
$$
where $g: T \to S$ and $C_g^{d} = \sum_{r=1}^{k} \beta_r \cdot X_r$. Here $X_r$ represents an immediate travel cost
associated with stop g; for example, the access time and access cost to travel to stop g. The
modeling of access mode choice may be introduced at this level by varying $X_r$ as a function
of the access mode. $\beta_r$ can be an alternative-specific parameter; for example, access time
for train stations is perceived differently from that for bus stops. $\beta_r$ may also be a general
parameter for all alternatives, such as the fare parameter that reflects the perceived monetary
cost associated with a specific choice g.
It is worth mentioning that the estimated travel cost of choosing stop g is associated
with a specific departure time t. This permits the representation of different transit network
conditions at different times during the modeling period.
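The weighted-sum immediate cost of Eq. 8 is simple to compute; the sketch below is hypothetical (attribute names and β values are invented for illustration, and β may be alternative-specific, e.g. rail access time weighted differently from bus):

```python
# Hypothetical sketch of the immediate-cost term C_g = sum_r beta_r * X_r
# in Eq. 8. Attribute names and beta values are invented for illustration.

def immediate_stop_cost(attrs, betas):
    """attrs: observed components X_r for stop g; betas: perception weights."""
    return sum(betas[k] * x for k, x in attrs.items())

walk_access = immediate_stop_cost(
    attrs={"access_time_min": 6.0, "fare_increment": 0.0},
    betas={"access_time_min": 1.5, "fare_increment": 10.0},
)
# Varying access_time_min by access mode (walk vs. drive) is one way an
# access mode choice could enter the model at this level.
```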
En-route choice modeling
Route/run choice
When a passenger arrives at a stop $s \in \{g, n\}$ (g: origin stop, n: transfer stop), an initial
route choice is made. This route choice follows the mixed {ε-greedy, SoftMax} action
choice model. The value of choosing a run from route r at stop s is updated as follows:
$$
GC^{d}(S,r) \leftarrow (1-\alpha)\cdot GC^{d-1}(S,r) + \alpha\cdot\Big[\,C_r^{d} + \gamma\cdot \min_{\forall f\in A(V)} GC^{d-1}(V,f)\Big] \qquad (9)
$$
where $r: S \to V$ and $C_r^{d} = \sum_{j=1}^{k} \beta_j \cdot X_j$. Here $X_j$ represents an immediate travel cost
associated with a run from route r; for example, the waiting time for a run of route r, the
crowding level of the run, etc. $\beta_j$ can be an alternative-specific parameter; for example, waiting
time for trains is perceived differently from waiting time for buses. The term
$C_r^{d} + \gamma\cdot \min_{\forall f\in A(V)} GC^{d}(V,f)$
is the service performance of route r on iteration d, unknown before making a route/run
choice.
This decision-making and experience updating mechanism reflects previous experiences
at iteration d - 1 of the chosen route and does not take into consideration the within-day
dynamics of the transit service on iteration d. In the proposed framework, the choice rule is
therefore used in conjunction with a number of SWITCH choice rules: 'SWITCH AND
WAIT', 'SWITCH AND BOARD', 'SWITCH AND STAY', and 'SWITCH AND
ALIGHT'. The first two rules capture the adaptive behaviour of passengers waiting at
stops, while the latter two model the adaptive behaviour of passengers on board a transit
vehicle.
For a passenger waiting at stop $s \in \{g, n\}$, there is an initial route choice r. When a
transit vehicle from the initially chosen route r arrives at stop s, the passenger's decision-making
process follows the 'SWITCH AND WAIT' choice rule. On iteration d, if the
service of route r, considering within-day dynamics, performs in accordance with or
better than the passenger's expected travel cost for boarding route r from stop s, then the
passenger will board the initially chosen route-run r and this choice will be reinforced.
Without provision of information, a passenger waiting at stop s when a vehicle arrives
from the initially chosen route r has no knowledge of the actual performance of the other
attractive routes $r' \in A(s)$ on iteration d except through previous experiences, summarized
up to iteration d - 1. Therefore, when the service performs according to (or better than)
expectations, there is no need to re-evaluate the initial choice, as the experiences for all
attractive routes were considered in making the initial route choice.
When a transit vehicle arrives at stop s from the passenger's initially chosen route r,
and within-day dynamics result in route r performing not in accordance
with (i.e. worse than) the passenger's expectations, a reconsideration of the initial route choice
is warranted. In this case, the passenger's expectation of the service performance of route r
is updated and the route choice mechanism is applied with
$C_r^{d} + \gamma\cdot \min_{\forall f\in A(V)} GC^{d-1}(V,f)$
as a replacement for $GC^{d-1}(S,r)$ for route r. Without information provision, the passenger's
expectations of the service performance of all other attractive routes remain a function of
previously accumulated experiences, $GC^{d-1}(S,i),\ \forall i \neq r$. If the initial route choice still outperforms
the other attractive routes (based on experience summarized up to iteration d - 1 for
the other route options), the 'SWITCH AND WAIT' choice rule results in the passenger
choosing the initial route. A change in route choice under the 'SWITCH AND
WAIT' rule is observed if the within-day dynamics of route r have resulted in another
attractive route $r'$ outperforming route r. The passenger then waits (hence the name of
the routine) for a run from route $r'$ to arrive. Regardless of the output of the 'SWITCH
AND WAIT' choice rule, the passenger's accumulated experience with route r at stop s,
$GC^{d}(S,r)$, is updated based on Eq. 9, since $C_r^{d}$ has already been experienced.
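A simplified reading of the 'SWITCH AND WAIT' logic can be sketched as follows (illustrative names and values, not the authors' implementation):

```python
# Simplified sketch of the 'SWITCH AND WAIT' rule: the realized day-d cost of
# the arriving route r replaces GC^{d-1}(S, r); the other attractive routes
# keep their remembered (day d-1) costs. Names and values are illustrative.

def switch_and_wait(initial, realized_cost, gc_prev):
    """initial: initially chosen route r; realized_cost: C_r^d plus the
    discounted future term; gc_prev: GC^{d-1}(S, .) for attractive routes."""
    if realized_cost <= gc_prev[initial]:
        return initial                        # as expected (or better): board r
    updated = dict(gc_prev)
    updated[initial] = realized_cost          # within-day experience for r only
    return min(updated, key=updated.get)      # may mean waiting for a run of r'

switch_and_wait("r", 12.0, {"r": 15.0, "r2": 20.0})   # boards r
switch_and_wait("r", 25.0, {"r": 15.0, "r2": 20.0})   # waits for r2
```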
For a passenger waiting at stop $s \in \{g, n\}$ with an initial route choice $r \in A(s)$, the
'SWITCH AND BOARD' choice rule is applied when a transit vehicle arrives from a route
$r' \in A(s)$ that is not the initially chosen route r. The updated travel cost for route $r'$,
$C_{r'}^{d} + \gamma\cdot \min_{\forall f\in A(V)} GC^{d-1}(V,f)$,
is calculated. If this updated travel cost matches the
expected value, $GC^{d-1}(S,r')$, then there is no need to reconsider the initial route choice r. If
the updated travel cost for route $r'$ is improved, then the initial route choice r needs
reconsideration. In this situation, the updated travel cost for route $r'$
is compared only to $GC^{d-1}(S,r)$, without considering other
attractive routes, since without information provision the passenger's expectations
regarding their performance ($GC^{d-1}(S,i),\ \forall i \notin \{r, r'\}$) remain the same, and the performance
of other routes was estimated to be inferior to the initial route choice r. If the updated travel
cost of route $r'$ becomes more attractive, then the 'SWITCH AND BOARD' choice rule
results in the passenger choosing to board route $r'$. Otherwise, the passenger continues to
wait for the arrival of route r (the initial route choice).
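The complementary 'SWITCH AND BOARD' logic admits a similarly simple sketch (illustrative names and values; a simplified reading, not the authors' implementation):

```python
# Simplified sketch of the 'SWITCH AND BOARD' rule: a vehicle of a non-chosen
# route r' arrives, and its realized day-d cost is compared only with the
# expectation for the initial route r; all other expectations are unchanged
# without information provision. Names and values are illustrative.

def switch_and_board(initial, arriving, realized_cost, gc_prev):
    if realized_cost >= gc_prev[arriving]:
        return initial            # r' performs as expected or worse: wait for r
    if realized_cost < gc_prev[initial]:
        return arriving           # r' now outperforms r: board it
    return initial                # improved, but r still preferred: keep waiting

switch_and_board("r", "r2", 22.0, {"r": 15.0, "r2": 20.0})   # keeps waiting
switch_and_board("r", "r2", 10.0, {"r": 15.0, "r2": 20.0})   # boards r2
```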
Transfer choice
When a passenger is on board a transit vehicle V of route r, there are two possibilities.
First, route r reaches the trip destination and no more travel choices are required
(except the egress mode choice, if any). In this case, the choice set A(V) has only one
element, {des}, representing the destination stop associated with route r. Based on the
mental model structure, there can be only one destination stop associated with each route.
The immediate travel cost of deciding to alight at a destination stop can be thought of
as, for example, the in-vehicle time and fare to reach the destination stop. There is also a
future travel cost associated with this decision; this cost includes the egress time and egress
monetary cost. When the passenger arrives at the destination, the schedule delay is calculated,
representing the deviation from the scheduled arrival time.
Since no more trip decisions are needed at the destination, the destination location represents
the terminating state of the underlying stochastic process. At this state, the recursive
calculations terminate and are propagated backwards to update expectations for the
departure time choice made at the beginning of the trip. $GC(V,des)$ is updated based
on the following equation:
$$
GC^{d}(V,des) \leftarrow (1-\alpha)\cdot GC^{d-1}(V,des) + \alpha\cdot\Big[\,C_{des}^{d} + \gamma\cdot X_{des}^{d}\Big] \qquad (10)
$$
where $des: V \to D$, $C_{des}^{d} = \sum_{j=1}^{k} \beta_j \cdot X_j$, and $X_{des}^{d} = \sum_{a=1}^{k} \beta_a \cdot X_a$. Here $X_j$ represents an immediate
travel cost associated with the destination stop; for example, the in-vehicle time and
the fare to reach the destination stop. $\beta_j$ can be an alternative-specific
parameter; for example, in-vehicle time for subways is perceived differently from in-vehicle
time for buses. $X_a$ represents a future travel cost associated with the destination stop;
for example, the egress time and egress monetary cost to reach the trip destination from the
destination stop. It also represents the schedule delay associated with trip choices. $\beta_a$
reflects the various perceptions with regard to future travel cost components, for instance
the early/late schedule delay values.
It is worth mentioning that the destination stop choice {des} is linked to a route r. The
route choice is associated with a stop choice s, which is in turn tied to a departure time t. This means
that the experience gained for path choices is relevant to the departure time choice. It is not
uncommon that passengers make different trip decisions for different origin departure time
choices. The transit service performance is observed to be time-dependent, and passengers
hold time-dependent expectations about path choices and hence about the departure time choice. In
this formulation, the schedule delay depends on both the departure time and the trip choices.
This can be noted from Eq. 5: the schedule delay does not directly appear in the travel cost
calculation for the departure time choice and is implicitly included, through recursive
computations, in the term $\min_{\forall g\in A_Z(T)} GC_Z^{d-1}(T,g)$.
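This recursive inclusion can be made explicit by stacking the update equations. The chain below is a condensed restatement of Eqs. 10, 9, 8 and 5 (with the non-recursive terms abbreviated as $\cdots$), showing how the schedule delay inside $X_{des}^{d}$ propagates back to the departure-time value at the origin:

```latex
% For a destination-reaching route, A(V) = {des}, so the min in Eq. 9 picks up
% GC(V, des), which carries the schedule delay from X_des in Eq. 10:
\begin{align*}
GC^{d}(V,des) &\leftarrow (1-\alpha)\, GC^{d-1}(V,des)
   + \alpha \big[ C^{d}_{des} + \gamma\, X^{d}_{des} \big] \\
GC^{d}(S,r)   &\leftarrow \cdots + \alpha\,\gamma \min_{\forall f \in A(V)} GC^{d-1}(V,f) \\
GC^{d}(T,g)   &\leftarrow \cdots + \alpha\,\gamma \min_{\forall r \in A(S):\, s=g} GC^{d-1}(S,r) \\
GC_Z^{d}(O,t) &\leftarrow \cdots + \alpha\,\gamma \min_{\forall g \in A_Z(T)} GC_Z^{d-1}(T,g)
\end{align*}
```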
For on-board passengers, a transfer connection may be needed to reach the trip destination.
In such situations, a passenger decides on an initial off-stop choice upon boarding
the transit vehicle V of route r, following the same mixed {ε-greedy, SoftMax} action
choice model. The GC value for the off-stop choice is updated as follows:
$$
GC^{d}(V,f) \leftarrow (1-\alpha)\cdot GC^{d-1}(V,f) + \alpha\cdot\Big[\,C_f^{d} + \gamma\cdot \min_{\forall n\in A(F)} GC^{d-1}(F,n)\Big] \qquad (11)
$$
where $f: V \to F$ and $C_f^{d} = \sum_{j=1}^{k} \beta_j \cdot X_j$. Here $X_j$ represents an immediate travel cost associated
with off-stop f; for example, the in-vehicle time to reach off-stop f, the number of transfers to reach the
destination from off-stop f, etc. $\beta_j$ can be an alternative-specific parameter (in-vehicle time
for subways is perceived differently from in-vehicle time for buses) or a general parameter
(transfer penalty).
This initial off-stop choice is based on previous experiences’ generalized cost, and does
not reflect within-day dynamics of the transit service. The ‘SWITCH AND STAY’ and
‘SWITCH AND ALIGHT’ choice rules capture on-board passengers’ adaptive behaviour
in response to service dynamics.
When the transit vehicle arrives at the initially chosen off-stop f, and the current
experience for that off-stop matches expectations, the passenger chooses to alight
at the initially chosen off-stop f according to the 'SWITCH AND STAY' choice rule. In
other situations, passengers may experience an unexpected travel cost associated with the
initial off-stop choice f (e.g. an unexpected delay). This warrants re-evaluation of the
initial off-stop choice, since it may no longer be the optimal choice when compared with
other attractive off-stop options.
When the transit vehicle arrives at a feasible off-stop $f' \in A(V),\ f' \neq f$ (where f is the
passenger's initial off-stop choice), the 'SWITCH AND ALIGHT' choice rule is applied
and the passenger adjusts en-route transfer choices according to service dynamics (either by
alighting at the current stop $f'$ or staying on board until the arrival at the initially chosen off-stop f).

Regardless of the decisions made according to the 'SWITCH AND STAY' ('SWITCH
AND ALIGHT') choice rule, the passenger's accumulated experience with off-stop f (off-stop $f'$),
$GC^{d}(V,f)$ ($GC^{d}(V,f')$), is updated based on Eq. 11, since $C_f^{d}$ has already been
experienced.
When a passenger alights at an off-stop $f \in A(V)$, a decision concerning the transfer on-stop,
$n \in A(F)$, is made. This decision is based on the expected travel cost associated with
each possible transfer on-stop. These expectations reflect previous experiences, and choices
are made following a decision-making behaviour similar to that outlined for the previous choices.
When a passenger arrives at a transfer on-stop n, an initial route choice is made. The
boarding decision is dependent on the within-day dynamics of the transit service, as
explained before.
There could be an immediate penalty associated with the choice of a transfer on-stop n,
such as inconvenience due to walking or crossing multiple intersections to reach on-stop n.
There is also a future travel cost related to the transfer on-stop choice, expressed by
the expected utility of the attractive routes $r \in A(S),\ s = n$. The travel cost associated with the
choice of a transfer on-stop n is updated as:
$$
GC^{d}(F,n) \leftarrow (1-\alpha)\cdot GC^{d-1}(F,n) + \alpha\cdot\Big[\,C_n^{d} + \gamma\cdot \min_{\forall r\in A(S):\,s=n} GC^{d-1}(S,r)\Big] \qquad (12)
$$
where $n: F \to S$ and $C_n^{d} = \sum_{j=1}^{k} \beta_j \cdot X_j$. Here $X_j$ represents an immediate travel cost
associated with an on-stop choice n; for example, transfer walk time. $\beta_j$ reflects the passenger's
perception of the travel cost associated with on-stop choices, e.g. a transfer penalty.
In summary, the proposed model treats individual passengers as decision makers
who each day choose a departure time, an origin stop, a destination stop and a route (or a sequence
of routes) between a given origin-destination pair. This decision-making
behaviour, as explained, consists of a two-step process: making choices and
updating perceptions. When information on service performance (via a real-time information
provision system or information sharing with other users) is not provided, the step of
making choices precedes perception updating. Perception updating occurs only when the
state-action pair is visited. Pre-trip decisions are treated as fixed choices, while en-route
choices are adaptive.
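The two-step process can be summarized in a small day-to-day loop. The sketch below is illustrative (all structures, callbacks and values are assumptions) and shows that only visited state-action pairs are updated:

```python
# Illustrative day-to-day loop for the two-step process above: choices on day d
# are made from the day d-1 mental model, and only the visited state-action
# pairs are updated afterwards. Structures and values are assumptions.

def run_day(gc, choose, experience, alpha=0.3):
    """choose(gc) -> visited (state, action) pairs for the day's trip;
    experience(state, action) -> realized generalized cost on day d."""
    visited = choose(gc)                          # step 1: make choices
    for state, action in visited:                 # step 2: update perceptions
        realized = experience(state, action)
        gc[(state, action)] = (1 - alpha) * gc[(state, action)] + alpha * realized
    return gc

gc = {("O", "07:30"): 40.0, ("T", "g1"): 25.0}
run_day(gc, choose=lambda g: [("O", "07:30")], experience=lambda s, a: 30.0)
# ("T", "g1") keeps its old value: that state-action pair was not visited today.
```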
Notes on the proposed model
A note on the modeling of information provision
The benefits of Advanced Traveler Information Systems (ATIS) applications can be
assessed by comparing the path choice behaviour and time savings of informed passengers
versus non-informed ones. However, conventional transit assignment models (e.g. strategy-based
models) assume that passengers have full information about network conditions
and infinite information processing capabilities; this is referred to as the ''perfect
knowledge of network'' assumption. These models are not appropriate for evaluating
information provision policies, since information regarding network conditions would be
assumed to be available to all passengers. The emergence and increased deployment of
ATIS make it practically important to relax the assumption of perfect information or
perfect knowledge in transit assignment studies.
The system performance on iteration d depends on the choices of other passengers;
these choices are not known to any passenger a priori. In the proposed model, with no
provision of real-time information, passengers do not have perfect knowledge about the
system performance on iteration d. Instead, passengers form their expectations about the
performance of the transit service based on accumulated experiences summarized up to
iteration d - 1 (see Fig. 2).
Under information provision, perception updating precedes decision making (see Fig. 2).
That is, the 'mental model' content is updated before making choices: $C_a^{d}$ can now be
obtained for all actions $a \in A(S)$, and $P_S(a)$ is computed as $f\big(GC^{d}(S,a)\big)$, compared to
$P_S(a) = f\big(GC^{d-1}(S,a)\big)$ when real-time information is not provided. This is also reflected in
the SWITCH choice rule calculations. For example, the 'SWITCH AND WAIT' choice
rule compares $C_r^{d} + \gamma\cdot \min_{\forall f\in A(V)} GC^{d-1}(V,f)$ and $GC^{d-1}(S,i),\ \forall i \neq r$, when information is
not provided. $GC^{d-1}(S,i),\ \forall i \neq r$ represents passengers' expectations for the unknown performance
of the other attractive routes in iteration d. With information provision, $C_i^{d},\ \forall i \neq r$
is now obtainable (see Fig. 4).
Based on the above, information provision will become effective when:
• great variability in service performance exists and average GC values are not
representative of state-action pair values,
• non-recurrent congestion occurs (due, for example, to traffic incidents), where
$GC^{d-1}(S,i),\ \forall i \neq r$ are not representative of the within-day network dynamics in iteration
d, and
• non-familiar travelers' expectations are not consistent with the network performance.
The proposed framework allows for the evaluation of the impact of pre-trip and en-route
information provision by comparing passengers’ choice profiles (trip choices and usage of
SWITCH choice rules) in the case of absence of information and the case of information
provision.
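The behavioural difference under information provision reduces to an ordering change that can be sketched directly; names and values below are illustrative:

```python
# Illustrative sketch of the ordering change under information provision:
# provided day-d costs overwrite the corresponding expectations *before* the
# choice is made, instead of choosing from GC^{d-1} alone. Names are invented.

def choose_route(gc_prev, info=None):
    """gc_prev: GC^{d-1}(S, .) for attractive routes; info: day-d costs from
    ATIS for a (possibly partial) set of routes, or None without provision."""
    expectations = dict(gc_prev)
    if info is not None:
        expectations.update(info)     # perception update precedes the choice
    return min(expectations, key=expectations.get)

choose_route({"r": 15.0, "r2": 20.0})                      # experience only: r
choose_route({"r": 15.0, "r2": 20.0}, info={"r2": 12.0})   # ATIS reveals r2 is faster
```

Comparing the two calls illustrates how the same passenger can switch routes when provided information contradicts accumulated experience.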
A note on the classification of the proposed choice model
The decision rule (or choice rule) in the proposed model is stochastic (the mixed {ε-greedy,
SoftMax} action choice model), allowing both exploitation and exploration of the individual's
experience. The utility (or generalized travel cost) term is formed by the passenger, and it
has no random error component (see Eq. 2). Therefore, the proposed model can be classified
as a bounded rational model, with a constant utility term and a stochastic decision
rule. Random utility models, by contrast, are characterised by a deterministic decision
rule and a stochastic utility term.
When the decision-making process of transit travelers (see Fig. 3) is modeled as a
nested logit model (or a cross-nested logit model), parameter estimation in the
decision-making tree proceeds from the bottom nest all the way up to the root. At the
run choice nest, for example, the estimation procedure forgets (or does not
consider) upper-level choice nests (e.g. the stop choice nest) and takes into consideration
further nests, assuming that the information needed to make the run choice is preserved at the
run choice-nest level. This is in many ways analogous to the memoryless property of the
MDP.
Framework applications
Wahba and Shalaby (2011) report on a full-scale implementation of MILATRAS,
including the departure time and path choice model proposed in this paper, using the
Toronto Transit Commission (TTC) system as a case study. It presents a large-scale real-world
application: the TTC AM network has 500 branches with more than 10,000 stops
and about 332,000 passenger-agents. The choice model parameters were calibrated such
that the entropy of the simulated route loads was optimized with reference to the observed
route loads, and validated with individual choices. The modeled route loads, based on the
calibrated parameters, closely approximate the distribution underlying the observed loads.
In this application, transit passengers were assumed to plan their transit trips based on their
experience with the transportation network, with no prior (or perfect) knowledge of service
performance.
Wang et al. (2010) compared the proposed framework, MILATRAS, with two conventional
transit assignment approaches, namely the EMME/2 Transit Assignment Module and
MADITUC. MILATRAS was shown to perform comparably and to produce base-case
steady-state run loads and stop loading profiles. MILATRAS, however, presents a policy-sensitive
platform for modeling and evaluating transit-ITS deployments, a real challenge for
conventional tools.

Fig. 4 Modeling of traveler behaviour (a) without information provision and (b) with information provision
The proposed framework was used to investigate the impact of various information provision
scenarios on transit riders' departure time and path choices, and on network
performance. The results show that, for a medium-size transit network with low- to medium-frequency
services, the stop and departure time choices appear to be more important than the
run choice, as they significantly affect the trip time.
Another area of application is emissions modeling. The proposed framework, MILATRAS,
was integrated within a multi-modal framework to evaluate transit emissions at a
link-based mesoscopic level for the TTC network, using time-dependent network loading
profiles (Lau et al. 2010).
Another study (Kucirek 2012) investigated fare-based assignment for inter-municipal,
cross-regional trips spanning three different operators (with different fares), using the proposed
framework and comparing assignment results to surveyed trip-record data and to results from
a fare-based travel demand forecasting model implemented in EMME2. The results show
that the conventional approach over-predicts the number of transfers between operators, while
MILATRAS is capable of accurately estimating transfers between operators and capturing
fare interactions between local (e.g. Mississauga Transit) and premium (e.g. GO Transit)
transit services.
Future research
A mode choice component will be integrated into the overall modeling framework, using
the learning-based approach. Future efforts will be directed to the modeling of passengers'
travel choices in a multimodal network, incorporating the access and egress mode choices.
Also, the growing trend in using smart cards (e.g. the Presto card in the Greater Toronto Area)
provides a rich source of data on transit departure time and path choices in particular, and
multi-modal trips in general. Such data sources will present great opportunities for calibrating
and validating dynamic transit path choice models.
Acknowledgments This research was supported by the Natural Sciences and Engineering Research Council (NSERC) and the Ontario Graduate Scholarship (OGS) Program.
References
Arentze, T.A., Timmermans, H.J.P.: Modeling learning and adaptation processes in activity-travel choice: a framework and numerical experiments. Transportation 30, 37–62 (2003). doi:10.1023/A:1021290725727
Cantarella, G.E., Cascetta, E.: Dynamic processes and equilibrium in transportation networks: towards a unifying theory. Transp. Sci. 29, 305–329 (1995). doi:10.1287/trsc.29.4.305
Cascetta, E., Cantarella, G.E.: A day-to-day and within-day dynamic stochastic assignment model. Transp. Res. A 25, 277–291 (1991). doi:10.1016/0191-2607(91)90144-F
Davis, G.A., Nihan, N.L.: Large population approximations of a general stochastic traffic assignment model. Oper. Res. 41, 169–178 (1993). doi:10.1287/opre.41.1.169
de Palma, A., Marchal, F.: Real cases applications of the fully dynamic METROPOLIS tool-box: an advocacy for large-scale mesoscopic transportation systems. Netw. Spat. Econ. 2, 347–369 (2002). doi:10.1023/A:1020847511499
Ettema, D., Tamminga, G., Timmermans, H.J.P., Arentze, T.: A micro-simulation model system of departure time using a perception updating model under travel time uncertainty. Transp. Res. A 39, 325–344 (2005). doi:10.1016/j.tra.2004.12.002
Hazelton, M., Watling, D.: Computation of equilibrium distributions of Markov traffic assignment models. Transp. Sci. 38, 331–342 (2004). doi:10.1287/trsc.1030.0052
Hickman, M., Bernstein, D.: Transit service and path choice models in stochastic and time-dependent networks. Transp. Sci. 31, 129–146 (1997). doi:10.1287/trsc.31.2.129
Kucirek, P., Wahba, M., Miller, E.J.: Fare-based transit assignment models: comparing the accuracy of strategy-based aggregate model EMME, against microsimulation model MILATRAS, using trips crossing the Mississauga-Toronto boundary. University of Toronto DMG Research Reports. http://www.dmg.utoronto.ca/pdf/reports/research/kucirekp_gta_transitfares.pdf (2012). Accessed March 2012
Lau, J., Hatzopoulou, M., Wahba, M., Miller, E.J.: An integrated multi-model evaluation of transit bus emissions in Toronto. Transp. Res. Rec. 2216, 1–9 (2010). doi:10.3141/2216-01
Nakayama, S., Kitamura, R., Fujii, S.: Drivers' learning and network behaviour: a dynamic analysis of the driver-network system as a complex system. Transp. Res. Rec. 1676, 30–36 (1999). doi:10.3141/1676-04
Nuzzolo, A., Russo, F., Crisalli, U.: A doubly dynamic schedule-based assignment model for transit networks. Transp. Sci. 35(3), 268–285 (2001). doi:10.1287/trsc.35.3.268.10149
Nuzzolo, A., Russo, F., Crisalli, U.: Transit Network Modelling: The Schedule-Based Dynamic Approach. FrancoAngeli, Milan (2003)
Petersen, K.: Ergodic Theory, 1st edn. Cambridge University Press, Cambridge (1990)
Roorda, M.J., Miller, E.J.: Assessing transportation policy using an activity-based microsimulation model of travel demand. ITE J. 76, 16–21 (2006)
Rossetti, R.J.F., Liu, R.: A multi-agent approach to assess drivers' responses to pre-trip information systems. J. Intell. Transp. Syst. 9, 1–10 (2005). doi:10.1080/15472450590912529
Russell, S.: Learning agents for uncertain environments (extended abstract). In: COLT '98: Proceedings of the Eleventh Annual Conference on Computational Learning Theory (1998). doi:10.1145/279943.279964
Salvini, P., Miller, E.J.: ILUTE: an operational prototype of a comprehensive microsimulation model of urban systems. Netw. Spat. Econ. 5, 217–234 (2005). doi:10.1007/s11067-005-2630-5
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. The MIT Press, Cambridge (1998)
Timmermans, H.J.P., Arentze, T., Ettema, D.: Learning and adaptation behaviour: empirical evidence and modeling issues. In: Proceedings of the ITS Conference, Eindhoven (2003)
Wahba, M.: MILATRAS: MIcrosimulation Learning-based Approach to TRansit ASsignment. Dissertation, University of Toronto (2008)
Wahba, M., Shalaby, A.: MILATRAS: a new modeling framework for the transit assignment problem. In: Wilson, N.H.M., Nuzzolo, A. (eds.) Schedule-Based Modeling of Transportation Networks: Theory and Applications, pp. 171–194. Springer, New York (2009)
Wahba, M., Shalaby, A.: Large-scale application of MILATRAS: case study of the Toronto transit network. Transportation 38, 889–908 (2011). doi:10.1007/s11116-011-9358-5
Wang, J., Wahba, M., Miller, E.J.: Comparison of agent-based transit assignment procedure with conventional approaches. Transp. Res. Rec. 2175, 47–56 (2010). doi:10.3141/2175-06
Watkins, C.J.C.H.: Learning from delayed rewards. Dissertation, University of Cambridge (1989)
Watling, D.: Asymmetric problems and stochastic process models of traffic assignment. Transp. Res. B 30, 339–357 (1996). doi:10.1016/0191-2615(96)00006-9
Author Biographies
Mohamed Wahba is an Associate at IBI Group and an Adjunct Professor at the University of Toronto. Dr. Wahba's research activities include the modelling of multimodal networks; optimizing the utilization of transport multimodal infrastructure in emergency evacuation situations; estimation of transit network emissions using microsimulation; agent-based travel behaviour models; and the modelling of transit-ITS deployments and their influence on travellers' behaviour. Mohamed has a Master of Philosophy degree from Cambridge University, UK, in Management Sciences, and Master of Applied Science and PhD degrees in Civil Engineering (Transportation) from the University of Toronto, Canada.
Amer Shalaby is a Professor of Transportation Engineering and Planning at the University of Toronto. He specializes in transit planning and ITS applications to transit. Dr. Shalaby is an appointed member of the US Transportation Research Board committees ''Emerging and Innovative Public Transport and Technologies'', ''Bus Transit Systems'' and ''Light Rail Transit''. He obtained his M.A.Sc. and Ph.D. degrees from the University of Toronto.