
Analysis of Automated Fare Collection Data from Montevideo, Uruguay for Planning Purposes

by

Catalina Parada Hernandez

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science

Department of Civil & Mineral Engineering University of Toronto

© Copyright by Catalina Parada Hernandez 2018


Analysis of Automated Fare Collection Data from Montevideo, Uruguay for Planning Purposes

Catalina Parada Hernandez

Master of Applied Science

Department of Civil & Mineral Engineering

University of Toronto

2018

Abstract

Automated Fare Collection (AFC) and smartcard systems have been rapidly adopted by transit

systems all over the world. The datasets produced by these systems are extensive and can be

analyzed for planning purposes and system evaluation. This thesis analyzes the AFC data from

the bus transit system in Montevideo, Uruguay and proposes methods to reconstruct itineraries,

identify the alighting locations of transactions, and understand travel behaviour of smartcard

users. The methods used successfully build itineraries for 97% of bus runs, identify 87.7% of

alighting locations, and recreate 67.5% of complete trip chains of smartcard users.

The complete trip chain data is then used to pair smartcards with individuals from the

Montevideo travel survey. As the trip chain and travel survey datasets have different parameters,

spatial and temporal windows were used to enable matching. Only 10 to 15% of individuals from

the survey could be paired with smartcards. These findings have important implications for

incorporating AFC datasets in transportation planning and evaluating transit systems.


Acknowledgements

This thesis is the result of many hours of work, failures followed by successes, and unexpected

challenges. While the scope of the work and the obstacles changed, the people supporting me did

not. One wise woman once said, “behind every successful woman there is a tribe” and I could

not relate more.

The closest members of my tribe are, of course, my family. I would like to thank my parents,

Yenny and Ricardo, and my brothers, German and Sergio, for always being a text or call away

and willing to offer advice or simply listen to me talk about anything from research to social

injustice. You have supported me in so many different ways, and I would not be here or be the

person I am without you, literally.

To my supervisor and mentor Eric J. Miller, my sincere gratitude for trusting me with this pilot

project in Montevideo. You have become a mentor, providing me with advice in several facets of

life, and with unending support to my sometimes-unusual ideas and approaches.

I would also like to thank the funding and data providers, and some collaborators of the project –

Corporación Andina de Fomento (CAF), Diego Hernandez, Antonio Mauttone, and Verónica

Orellano – for their invaluable support in accessing and interpreting the data.

I am also very grateful to the newer members of my tribe, my friends and colleagues, for their

unofficial guidance and advice, and for creating such a welcoming and friendly environment.

Special thanks to my partner Donelle and closest friends Laura, Chris, Alex, Brittany, and

Gozde. Donelle, for knowing me better than anyone else and thus providing me with food

incentives, strategies, and support to always do my best. Laura and Alex, for all the picnics,

weekend adventures, and workouts; Chris, for showing me where to get past exams and free

pizza, and for helping me navigate through the University’s complicated structure. Gozde, for

being my Turkish sister, and Brittany, for organizing the tennis and volleyball games.

Finally, the feline support of rescue cats Ariel and Mia for keeping me company during the

writing process and reminding me of the important things in life, such as petting them.


Table of Contents

Table of Contents ........................................................................................................................... iv

List of Tables ................................................................................................................................. vi

List of Figures .............................................................................................................................. viii

List of Abbreviations .......................................................................................................................x

Introduction .....................................................................................................................1

1.1 Study Objective ....................................................................................................................1

1.2 Study Motivation .................................................................................................................1

1.3 Thesis Structure ...................................................................................................................2

Literature Review ............................................................................................................4

2.1 Analysis of Smartcard Data for Estimating Trip Destinations ............................................4

2.2 Analysis of Smartcard Data for Transit Operations and Performance Measures ................7

2.3 Transit User Regularity ........................................................................................................8

2.4 Integration of Smartcard Data with Travel Surveys ............................................................9

Data ...............................................................................................................................11

3.1 Data Description ................................................................................................................12

3.1.1 Boarding Records......................................................................................................12

3.1.2 Bus lines and branches ..............................................................................................14

3.1.3 MHMS ......................................................................................................................15

3.2 Data Analysis for Monday, August 15 ..............................................................................17

Data Preparation ............................................................................................................21

4.1 Preliminary query (Query #1) ............................................................................................21

4.2 Method for invalid bus runs ...............................................................................................22

4.3 Query for OD estimation and incorporation strategies (Query 2) .....................................24

4.4 Data cleaning MHMS ........................................................................................................26

Acknowledgements ......................................................................................................................... iii


Building Itineraries from Boarding Transactions ..........................................................27

5.1 Method ...............................................................................................................................27

5.2 Results ................................................................................................................................31

Origin and Destination Estimation ................................................................................38

6.1 Method ...............................................................................................................................38

6.1.1 Incorporation of smartcard transactions without alighting location identified and

single riders ............................................................................................................41

6.1.2 Incorporation of no-card users ..................................................................................42

6.2 Results ................................................................................................................................43

6.2.1 Analysis for smartcard users .....................................................................................43

6.2.2 Analysis for no-card users ........................................................................................48

6.2.3 Spatial analysis of travel behaviour ..........................................................................52

Analysis of Travel Survey Riders and Smartcard Users ...............................................63

7.1 Method ...............................................................................................................................63

7.2 Results ................................................................................................................................64

7.2.1 Comparison of MHMS with smartcard data .............................................................64

7.2.2 Pairing MHMS individuals with STM cards ............................................................67

Discussion and Conclusions ..........................................................................................70

Limitations and Future Work ........................................................................................73

References ......................................................................................................................................75

Appendices .....................................................................................................................................80

Appendix A - STM Card Types ................................................................................................80

Appendix B - Results for all days .............................................................................................81

Appendix C - Details of algorithm ............................................................................................86


List of Tables

Table 3-1 Boarding records and descriptive statistics .................................................................. 13

Table 3-2 Boardings per STM card type for Monday, August 15th ............................................. 14

Table 3-3 Bus branches and bus UIDs .......................................................................................... 15

Table 3-4 Trips and legs of trips in the MHMS ............................................................................ 16

Table 4-1 Query criteria for smartcard data .................................................................................. 21

Table 4-2 Smartcard data - Query #1 ............................................................................................ 22

Table 4-3 Bus run classification ................................................................................................... 23

Table 4-4 Query criteria for OD estimation .................................................................................. 25

Table 4-5 Smartcard data - Query #2 ............................................................................................ 25

Table 4-6 No-card data - Query #2 ............................................................................................... 26

Table 4-7 Query criteria for MHMS ............................................................................................. 26

Table 4-8 MHMS - Query results ................................................................................................. 26

Table 5-1 Sample itinerary built from passenger transactions ...................................................... 29

Table 5-2 Itinerary building strategies for special cases ............................................................... 30

Table 5-3 Passenger service time ranges ...................................................................................... 34

Table 5-4 Characterization of stops with service time over 10 seconds ....................................... 34

Table 6-1 Criteria for OD incorporation of Smartcard users ........................................................ 42

Table 6-2 OD estimation results ................................................................................................... 44

Table 6-3 Alighting estimation rate based on time period ............................................................ 45


Table 6-4 Alighting estimation rate based on bus run .................................................................. 45

Table 6-5 Trip chains based on STM card type ............................................................................ 46

Table 6-6 Assignment of alighting location to transactions with missing alighting location ....... 47

Table 6-7 Assignment of alighting location to single riders ......................................................... 47

Table 7-1. Comparison of legs and trips for MHMS and STM data ............................................ 65

Table 7-2. Strategies to compute boarding and alighting times.................................................... 67

Table 7-3 MHMS identification of individuals for different temporal windows ......................... 68

Table 7-4 Weekly analysis of MHMS and STM card pairs.......................................................... 69


List of Figures

Figure 1-1 Organization of the thesis .............................................................................................. 3

Figure 2-1 Trip destinations and legs of trips ............................................................................... 5

Figure 3-1 Census segments served by STM ................................................................................ 11

Figure 3-2 Transit trip data collected in MHMS (Montevideo et al., 2016) ................................. 16

Figure 3-3 Temporal distribution of STM card transactions ........................................................ 17

Figure 3-4 Temporal distribution of transactions without card .................................................... 18

Figure 3-5 Transactions for STM (left) and no-card (right) per time period ................................ 19

Figure 3-6 Transactions for STM cards ........................................................................................ 19

Figure 3-7 Transfers per STM card .............................................................................................. 20

Figure 4-1 Valid bus run assigned to invalid run .......................................................................... 23

Figure 4-2 Frequency of validation of invalid bus runs ................................................................ 24

Figure 5-1 Temporal distribution of dwell times .......................................................................... 32

Figure 5-2 Dwell time (minutes) vs. Passenger boardings ........................................................... 33

Figure 5-3 Stops with high dwell time .......................................................................................... 35

Figure 5-4 Example of itinerary .................................................................................................... 37

Figure 6-1 Schematic example of transactions for a smartcard user ............................................ 39

Figure 6-2 STM cards with similar transactions on the other weekdays for transactions with

unknown alighting location (left) and single riders (right) ........................................................... 48

Figure 6-3 Alighting location assignment to no-card passengers ................................................. 50

Figure 6-4 Bus loading profile for all passengers ......................................................................... 51


Figure 6-5 AM Trip origins .......................................................................................................... 53

Figure 6-6 AM Trip destinations .................................................................................................. 54

Figure 6-7 AM boardings for no-card users ................................................................................. 55

Figure 6-8 AM alightings for no-card users ................................................................................. 56

Figure 6-9 PM Trip origins ........................................................................................................... 57

Figure 6-10 PM Trip destinations ................................................................................................. 58

Figure 6-11 AM Transfers ............................................................................................................ 59

Figure 6-12 PM Transfers ............................................................................................................. 60

Figure 7-1 Histograms of bus trips. Queried data (left) and using all MHMS data retrieved from

CAF et al. (2017) (right) ............................................................................................................... 65

Figure 7-2 Histogram of legs and trips for MHMS and STM data ............................................... 66

Figure 7-3 Comparison of trip frequency for MHMS and STM data ........................................... 67


List of Abbreviations

AFC – Automated Fare Collection

AMMON – Metropolitan Area of Montevideo (Translated from Spanish: Area Metropolitana de

Montevideo)

MHMS – Montevideo Household Mobility Survey

OD – Origin and Destination

STM – Metropolitan System of Transportation (Translated from Spanish: Sistema de Transporte

Metropolitano)


Introduction

1.1 Study Objective

The objective of this study is to analyze the capabilities of smartcard data in transit systems for

planning purposes and system operation metrics. These capabilities are explored from a

methodological perspective for the specific context of Montevideo, Uruguay, using data that has

not been analyzed for planning purposes before. There are three main objectives of this study:

1. Estimate the alighting locations of boarding transactions for smartcard and non-smartcard

passengers.

2. Build itineraries and evaluate operations metrics at bus stops.

3. Analyze the smartcard data and compare it with the data for transit riders from the 2016

Montevideo Household Mobility Survey (MHMS).

1.2 Study Motivation

Automated Fare Collection (AFC) and smartcard systems have been rapidly adopted by transit

systems all over the world. These systems benefit the transit operators and the passengers alike.

Even though these systems were built to facilitate fare collection (Trepanier, Tranchant, &

Chapleau, 2007) and for user convenience, the data stored can be used for multiple purposes.

AFC systems passively and continuously collect details of transit transactions, creating large

datasets of transit trips. The smartcard data is of particular interest as the transactions for each

card can be identified throughout days and longer periods of time. Moreover, the smartcard data

can be useful to transportation planners due to the large sample size of transit riders that use

smartcards (Hickman, 2017). This data has a variety of uses, including serving short- and

long-term planning strategies, and complementing transit system operation, development, and

evaluation strategies (Schmöcker, Kurauchi, & Shimamoto, 2017).

This new method for collecting data has been explored by researchers for several transit systems.

There are challenges in working with this data for planning and operational purposes, such as

inconsistencies in network data (Hemily, 2015), absence of user demographic characteristics and

trip purpose (Schmöcker et al., 2017), and errors in the AFC systems that are reflected in the


quality of the data. In addition, because transit systems differ in the data collected, the

characteristics of the network, and the sources of data available, each system poses particular

challenges for working with and analyzing AFC and smartcard data.

This study acknowledges the general and the particular challenges for the transit system Sistema

de Transporte Metropolitano (STM) in Montevideo. Using exclusively AFC data and the

network characteristics, this study proposes strategies to deal with the data limitations and

methods to process and analyze the AFC data for the STM. The methods proposed create

information for planning purposes and evaluation of the system and transit network.

1.3 Thesis Structure

This thesis is structured into nine chapters; the remaining eight are organized and

described as follows. Chapter 2 summarizes previous work with smartcard data for planning and

system operation purposes, and for integration with household travel surveys. Chapter 3 then

presents a quantitative description of the data available for this study and a thorough analysis of

the data for a single day to understand travel behaviour characteristics.

Chapter 4 describes the data preparation (cleaning and validation) procedures, which consist of

selecting Montevideo Household Mobility Survey (MHMS) transit riders and applying two queries

for the boarding records. This data is used in Chapters 5, 6, and 7. The relationships between the

data and an overall description of Chapter 4 and the subsequent chapters are shown in Figure

1-1. Chapter 5 contains the method to build itineraries, identifying stops with high dwell times

and presenting a sample of the built itineraries. Chapter 6 contains the method to identify

alighting locations for smartcard users based on their daily transactions and the strategies to

estimate the alighting location of particular cases of smartcards (single transactions or with

incomplete trips) and of no-card users. In addition, this chapter presents the results of these

procedures spatially and temporally.

This chapter is followed by Chapter 7, in which the results for smartcard users with complete

trips are compared to those of transit riders in the MHMS. This chapter also contains a method to

pair MHMS individuals with smartcards based on the location and time of their transit trips.

Lastly, Chapter 8 discusses the results and presents the conclusions and Chapter 9 mentions the

limitations of the study and future work.


Figure 1-1 Organization of the thesis


Literature Review

This literature review presents various studies that process smartcard data and serve as guides for

developing and applying methods to the smartcard data in Montevideo. The first part of the

review describes strategies to identify the destinations of public transit users based on their

smartcard transactions. The next part includes works that evaluate the transit system’s

performance and compute operational metrics, highlighting the added value of using smartcard

data. The third presents strategies that have been proposed to quantify and analyze transit

ridership regularity; and the last part describes the efforts to integrate daily transactions of

smartcards with the public transit travel reported in travel surveys.

2.1 Analysis of Smartcard Data for Estimating Trip Destinations

One of the most interesting applications of smartcard data for transportation planning is the

determination of Origin-Destination (OD) matrices for public transit. For public transit systems

where passengers only validate their card while boarding (tap-on systems), researchers have

proposed methods for estimating alighting locations using the subsequent transactions of

passengers for a given day.

The earliest methods were developed by Barry, Newhouser, Rahbee, & Sayeda (2002) and

Trepanier et al. (2007). Several other researchers have adjusted these methods to improve the

alighting estimation, incorporate other sources of data such as AVL (Automated Vehicle

Location), and account for multi-modal transit systems, including Seaborn, Attanucci, & Wilson

(2009), Gordon (2012), and M. A. Munizaga & Palma (2012). There are some common

assumptions in these methods which are outlined by Hickman (2017) as follows:

▪ The destination of the last trip leg of a passenger’s daily trips is the same as the origin

of the first trip leg of the day.

▪ Passengers generally take the most direct walking paths between services, as

measured by time, distance, or some generalized time or cost.

▪ Passengers do not take other modes of transportation between transit trips.

▪ Passengers take the next service available after arriving at a stop.


Trepanier et al. (2007) presented a formal model to estimate the alighting stops of individuals

for a bus system. The model determines the alighting stop for a passenger by identifying the stop

of the route that is closest to the boarding stop on the subsequent route the passenger takes, as

illustrated in Figure 2-1.

Figure 2-1. Trip destinations and legs of trips.
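
The closest-stop rule described above can be sketched in a few lines. This is a minimal illustration, not the formal model of Trepanier et al. (2007): the function name, the planar stop coordinates in metres, and the `max_walk_m` cap are assumptions introduced here. The wrap-around for the last leg of the day follows the common assumption that the last destination is the first origin of the day.

```python
import math

def estimate_alightings(boardings, route_stops, max_walk_m=1000.0):
    """Estimate an alighting stop index for each boarding of one card on one day.

    boardings   : time-ordered list of (route_id, boarding_stop_index)
    route_stops : dict route_id -> ordered list of (x, y) stop coordinates in metres
    max_walk_m  : hypothetical cap on the walk to the next boarding stop
    """
    n = len(boardings)
    if n < 2:
        # A single transaction has no subsequent boarding to anchor the estimate.
        return [None] * n
    estimates = []
    for i, (route, board_idx) in enumerate(boardings):
        # The last leg wraps around to the first boarding of the day.
        next_route, next_idx = boardings[(i + 1) % n]
        target = route_stops[next_route][next_idx]
        stops = route_stops[route]
        # Only stops downstream of the boarding stop are candidates.
        candidates = range(board_idx + 1, len(stops))
        best = min(candidates, key=lambda s: math.dist(stops[s], target), default=None)
        if best is not None and math.dist(stops[best], target) > max_walk_m:
            best = None  # nearest candidate is too far to be a plausible alighting
        estimates.append(best)
    return estimates
```

A card with a single boarding returns no estimate, which is the case that the single-transaction strategies discussed later in this section address.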

This method estimated 66% of alighting locations. It was further developed by M. A. Munizaga &

Palma (2012) to be implemented on multimodal transit systems and create OD matrices. The

major contributions are estimating the alighting location by minimizing the generalized time (on-

board and walking time) instead of distance between alighting stop and next boarding, and

building OD matrices with this data. The matrices can be aggregated at any level as the boarding

and alighting data is on the disaggregate stop level.
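
As a hedged illustration, the generalized-time criterion can be written as below. The notation is an assumption introduced here and is not taken from Munizaga & Palma (2012): S is the set of candidate stops downstream of the boarding stop, t^ride_s is the on-board time to stop s, d(s, b') is the walking distance from s to the next boarding location b', v_w is an assumed walking speed, and theta is an optional weight on walking time (theta = 1 gives the plain sum of on-board and walking time).

```latex
s^{*} = \arg\min_{s \in S} \left[ t^{\mathrm{ride}}_{s} + \theta \, \frac{d(s, b')}{v_{w}} \right]
```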

This method was later validated using three data sources: the smartcard data used in the method,

an OD survey for metro users, and a group of volunteers (M. Munizaga, Devillaine, Navarrete, &

Silva, 2014). This validation revealed that the proposed method correctly estimates 84.2% of

alighting locations and distinguishes 90% of the legs of trips from trips.

Based on these results, Munizaga et al. (2014) propose four improvements to the methodology:

allowing a walking distance greater than 1 kilometre between the alighting location and the next

boarding, considering the start of a day at the time period with the lowest transactions (4:00:00

a.m. for this case) instead of midnight, estimating the alighting location for single day

transactions by using the subsequent day trips, and recognizing separate trips by comparing the


Euclidean distance between the boarding and alighting stops with the on-board distance travelled. This

last proposal is similar to that of Robinson, Narayanan, Toh, & Pereira (2014), who

compute a “directness” ratio between the Euclidean distance and the on-board distance. This ratio

makes it possible to identify trips that were previously treated as legs of the same trip but that are

in fact separate trips.
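
A minimal sketch of such a directness check follows. It assumes planar coordinates in metres and a hypothetical cut-off `min_ratio`; neither the function names nor the threshold value come from Robinson et al. (2014).

```python
import math

def directness_ratio(legs):
    """Ratio of straight-line distance to on-board distance for a chain of legs.

    legs: list of (board_xy, alight_xy, onboard_distance_m) for consecutive legs.
    """
    # Straight-line distance from the first boarding to the last alighting.
    straight = math.dist(legs[0][0], legs[-1][1])
    onboard = sum(leg[2] for leg in legs)
    return straight / onboard if onboard > 0 else None

def is_single_trip(legs, min_ratio=0.5):
    """Treat the chain as one trip only if it is reasonably direct (assumed cut-off)."""
    ratio = directness_ratio(legs)
    return ratio is not None and ratio >= min_ratio
```

An out-and-back chain yields a ratio near zero and would be split into separate trips, while a chain that moves steadily away from its origin keeps a ratio closer to one.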

In addition to these suggestions, other researchers have incorporated into their methods distance

thresholds between potential alighting stops and the next boarding stop, and time thresholds to

identify transfers. Some of these researchers include Gordon (2012); Nassir, Khani, Lee, Noh, &

Hickman (2011); and Seaborn et al. (2009).

While many different assumptions were used, little effort was made to validate them

until A. A. Alsger, Mesbah, Ferreira, & Safi (2015). These researchers tested different transfer

time thresholds, allowable walking distances, and last trip destination assumptions by applying the

OD methods to a dataset with tap-on and tap-off data. A. A. Alsger et al. (2015) found that

increasing the transfer time threshold from 15 to 90 minutes had small impacts on the estimated

alightings, and that more than 90% of passengers walked less than 10 minutes to their transfer

stops but spent most of the transfer time waiting. Also, 88% of the passengers returned to a stop

within 800 metres from the first boarding location.

Further research by A. Alsger, Assemi, Mesbah, & Ferreira (2016) focused on the accuracy of OD

matrices estimated from smartcard data; a 30-minute allowed transfer time provided more accurate OD

matrices, and accuracy did not improve with walking distance thresholds beyond 800

metres. However, they note that the actual destinations do not necessarily match the

estimated ones due to individual passenger behaviours and use of other modes of transportation.
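
To make the role of the transfer time threshold concrete, the sketch below links consecutive legs into trips whenever the gap between an estimated alighting time and the next boarding is within the threshold. The function name and data layout are assumptions; the 30-minute default mirrors the value discussed above and is a parameter, not a recommendation.

```python
from datetime import datetime, timedelta

def link_legs(legs, max_transfer=timedelta(minutes=30)):
    """Group time-ordered legs into trips using a transfer time threshold.

    legs: list of (board_time, alight_time) tuples sorted by board_time.
    Returns a list of trips, each trip being a list of legs.
    """
    trips = []
    for leg in legs:
        # Compare this boarding with the alighting time of the previous leg.
        if trips and leg[0] - trips[-1][-1][1] <= max_transfer:
            trips[-1].append(leg)   # short gap: treat as a transfer
        else:
            trips.append([leg])     # long gap: start a new trip
    return trips
```

Rerunning the same day of legs with different values of `max_transfer` is one simple way to see how sensitive the resulting trip counts are to the threshold.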

While the methods to estimate the alighting location have been validated and improved, a remaining

obstacle is determining the alighting location of passengers who record only one transaction on a

given day. For smartcards with single daily transactions, Trépanier et al. (2007) inspect previous

trips of the card that have similar boarding location and time and for which alighting location can

be identified, to assign the alighting stop to the single trip. Furthermore, He & Trépanier (2015)

propose a kernel density estimation to compute a spatial-temporal probability using historical

boarding and alighting records to assign destinations to unlinked trips.
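
The historical look-up for single daily transactions can be sketched as follows. This is a simplified illustration under stated assumptions: the record layout, the 30-minute time window, and the majority-vote rule are hypothetical choices, and the sketch does not implement the kernel density approach of He & Trépanier (2015).

```python
from collections import Counter

def infer_single_trip_alighting(boarding_stop, boarding_minute, history, window_min=30):
    """Assign an alighting stop to a single transaction from the card's past trips.

    history: list of (boarding_stop, boarding_minute, alighting_stop) from earlier
             days, where alighting_stop is None if it could not be estimated.
    """
    # Keep past trips with the same boarding stop, a similar time, and a known alighting.
    matches = [
        alight for stop, minute, alight in history
        if alight is not None
        and stop == boarding_stop
        and abs(minute - boarding_minute) <= window_min
    ]
    if not matches:
        return None
    # Use the most frequent historical alighting stop.
    return Counter(matches).most_common(1)[0][0]
```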


2.2 Analysis of Smartcard Data for Transit Operations and Performance Measures

Data about passenger boarding and alighting at the stop and route level, obtained from the

estimation procedures previously discussed, can be used for a myriad of operations and

performance measures. Some of these include recreating bus trajectories (Fourie, Erath,

Ordonez, Charikov, & K.W, 2017), creating load profiles of individual buses and bus routes

(Trepanier et al., 2007; Beltrán et al., 2011), analyzing on-route travel times and distances

(Trepanier & Morency, 2017), identifying spatiotemporal demand variations of bus routes, and

recognizing transfer points, volumes, and transfer times for passengers (Jang, 2010). These

measures can be aggregated at any spatial and temporal level to monitor, evaluate, and/or

propose improvements to the transit network.

Fourie et al. (2017) propose using smartcard data to reconstruct bus trajectories, and compute

travel and dwell times. The transactions at each stop are clustered to determine dwell times and

travel times between stops. For stops without transactions Fourie et al. (2017) obtain the time by

interpolating between known stops before and after that stop. Unusual smartcard records due to

glitches in smartcard readers and late tap-ons are disregarded. Furthermore, the on-board travel

times and transfer times derived from the reconstructed bus trajectories and itineraries are

similar to those obtained in MATSim simulations.
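
The idea of deriving dwell times from grouped transactions and interpolating stops without transactions can be sketched as below. This is an illustration only, not the procedure of Fourie et al. (2017): it assumes each transaction already carries a stop index, takes dwell time as the span between the first and last tap at a stop, and interpolates linearly by stop index rather than by distance.

```python
def reconstruct_stop_times(transactions, n_stops):
    """Return (arrival_seconds, dwell_seconds) for each stop of one bus run.

    transactions: list of (stop_index, tap_time_seconds) for the run.
    Stops without transactions get an interpolated arrival and a dwell of None;
    stops before the first or after the last observed stop get (None, None).
    """
    first, last = {}, {}
    for stop, t in transactions:
        first[stop] = min(t, first.get(stop, t))
        last[stop] = max(t, last.get(stop, t))
    known = sorted(first)
    result = []
    for s in range(n_stops):
        if s in first:
            # Observed stop: dwell is the span between first and last tap.
            result.append((first[s], last[s] - first[s]))
            continue
        before = [k for k in known if k < s]
        after = [k for k in known if k > s]
        if not before or not after:
            result.append((None, None))
            continue
        a, b = before[-1], after[0]
        # Linear interpolation between departure from a and arrival at b.
        frac = (s - a) / (b - a)
        result.append((last[a] + frac * (first[b] - last[a]), None))
    return result
```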

Trepanier et al. (2007) and Beltrán et al. (2011) show examples of load profiles built from the

alighting stop estimation results, which can be useful for transit operators. In addition to load

profiles, Trepanier & Morency (2017) compute Key Performance Indicators (KPI) using

smartcard data. These KPI include bus speeds, average trip time and duration, passenger-

kilometres and passenger-hours, schedule adherence, and others. Trepanier & Morency (2017)

highlight that these KPI from smartcard data provide advantageous measurements because they

come from the empirical demand and can be computed for every transit vehicle and the different

smartcard users.

Analysis of smartcard data can also help identify passenger travel times, Level of Service (LOS),

and locations with high transfer volumes and times (Jang, 2010). This valuable information can

reduce the need for other data collection efforts and help to identify the routes, locations, and


areas that need improvements. Additionally, the travel times can be used as inputs for mode

choice models (Jang, 2010).

The integration of smartcard data with other sources of data, such as AVL (Automated Vehicle

Location), can also provide valuable operational metrics. Using smartcard data with scheduling

and AVL data, it is possible to compute commercial speeds (Beltrán et al., 2011; Trépanier,

Morency, & Agard, 2009), identify headway variation (Beltrán et al., 2011), and measure schedule

adherence (Trépanier et al., 2009).

Note that these measures can be computed with confidence for transactions using smartcards, but

cannot be extended to passengers without cards unless their travel behaviour is understood first.

Smartcard users could have very different travel behaviours than no-card users depending on the

fare structure and incentives available to smartcard users. As the incentives differ among

transportation systems (Schmöcker et al., 2017), the travel behaviour for smartcard and non-card

passengers should be compared or studied independently to prevent obtaining biased results

(Park, Kim, & Lim, 2008).

2.3 Transit User Regularity

Transit users might make regular trips that can be analyzed over long periods of time. The travel

behaviour and regularity is of interest to identify regular users, propose incentives to regular

passengers, and detect frequent places and times of travel. Smartcard data presents an

unparalleled data source to understand transit regularity as it is continuously collected for all

days and all smartcard users (Hickman, 2017).

The first study to use smartcard data for user regularity was Trepanier et al. (2007); this study

measured regularity for each user using monthly transactions and identifying similar transactions

(on the same route and around a similar time). A measure of distance and time is used to

determine the regularity of users across the month of transactions. Following this study, another

one measured transit user regularity for different card users for a 10-month period (Morency,

Trépanier, & Agard, 2007).

To measure user regularity, researchers use either data mining algorithms or spatial-temporal

windows. For data mining, there are supervised and unsupervised algorithms that identify spatial


and temporal clusters for different card types (Morency et al., 2007) and for individual travel

patterns (Kieu, Bhaskar, & Chung, 2015; Ma, Liu, Wen, Wang, & Wu, 2017). Characterizing

individual travel patterns allows researchers to classify passengers into different user categories

based on the regularity of their travel (Kieu et al., 2015) and to identify their residence and

workplace based on spatial and temporal considerations (Ma et al., 2017).

The temporal windows of unsupervised algorithms are not controlled and vary depending on the

cluster. On the other hand, supervised ones have predefined temporal windows of 1 hour

(Morency et al., 2007) and 30 minutes (Ma et al., 2017). The spatial windows of these studies

have been handled with unsupervised algorithms and specified by considering neighbouring

stops or transactions that occur on the same transit route.

The research works that have discretely specified spatial windows have done so to identify trip

attractors and locations of residence, work, and study (Chu & Chapleau, 2010; Zou, Yao, Zhao,

Wei, & Ren, 2016). Chu & Chapleau (2010) defined a spatial window of 500 metres to identify

residence and study locations and Zou et al. (2016) a window of 1,000 metres to detect home

location and trip purpose. These studies also consider other travel behaviour factors such as the

time of travel and duration of activities to identify the trip purpose.
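A spatial-temporal window of the kind used in these studies can be sketched as a simple predicate. The snippet below is only an illustration and not the implementation of any cited study: the function names, the haversine great-circle distance, and the tuple layout are assumptions, while the default thresholds of 500 metres and 30 minutes are the values reported above for Chu & Chapleau (2010) and Ma et al. (2017).

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def within_window(a, b, max_metres=500, max_minutes=30):
    """True when two boardings share a spatial and a time-of-day window.

    Each boarding is a hypothetical (lat, lon, minutes_after_midnight) tuple,
    so boardings made on different days can be compared.
    """
    close_in_space = haversine_m(a[0], a[1], b[0], b[1]) <= max_metres
    close_in_time = abs(a[2] - b[2]) <= max_minutes
    return close_in_space and close_in_time
```

Comparing times of day rather than full timestamps is a design choice made here so that repeated trips on different days can fall inside the same window.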

2.4 Integration of Smartcard Data with Travel Surveys

Household travel surveys and travel diaries can be integrated with the smartcard data to identify

the travel behaviour of individuals and extract demographic characteristics and trip purpose of

smartcard users. Hickman (2017) highlights the need for integrating smartcard data with

household surveys as only a few authors have integrated smartcard data with surveys and travel

diaries.

These two data sources are inherently different as smartcard data is passively collected but

contains transaction details for all transactions in public transit, while household surveys contain

the trips and trip purposes reported by individuals from a sample of households. One would

assume that the reported public transit trips can be identified in the smartcard dataset using

common attributes, such as boarding time, location, and service taken. Some researchers have

evaluated the information provided in surveys about public transit usage.


Spurr, Chu, Chapleau, & Piché (2015) proposed matching smartcard data with household travel

survey data using spatiotemporal windows applied to the boarding times and locations of daily

transactions, as well as line numbers and subway stations. The dimensions of the windows are not

clearly defined and are variable, as they are adjusted to find a match between a survey respondent

and a smartcard. With this approach and a sample of survey responses, the daily journeys of

50% of survey respondents that declared using public transit could be paired with at least one

smartcard. The 50% of paired journeys comprise three matching scenarios: exact matches, partial

matches with underreporting of trips, and matches with typical daily travel patterns instead of the

day asked about in the survey.

These results are fairly similar to those obtained by Riegel (2013). The difference is that

Riegel (2013) obtained the smartcard IDs of volunteer survey respondents

and could pair exact survey responses to the transactions of a specific smartcard. In that study,

there were only 44% exact matches between reported daily trips and the smartcard data for the

card IDs.

Another application of integrating smartcard data with travel surveys was explored by Kusakabe

& Asakura (2017). These researchers estimate trip purposes for rail smartcard data by combining

this data with survey data using a Naïve Bayes classifier. The integration of data sets was based

on behavioural attributes, which include boarding and alighting times and locations. Even though

the datasets have different spatial and temporal accuracy, the differences are handled by approximating to

the closest hour. This method correctly identified over 80% of the commuting and home trips

but only over 20% of leisure trips; this is expected as leisure trips are less common and often

underreported in surveys.


Data

The data was provided and facilitated by the Smart Cities Technology group and the Intendencia

de Montevideo, the governmental agency that monitors, coordinates, and integrates the public

transportation system in the Metropolitan Area of Montevideo (AMMON), Uruguay. The

integrated transportation system STM serves Montevideo and the surrounding urban areas

shown in blue in Figure 3-1.

Figure 3-1 Census segments served by STM


The system is composed of buses from four different operators: Coetc, Comesa, Cutcsa and

Ucot. It has 144 bus lines with 107 different destinations, and 4,835 stops (Montevideo, 2018).

There are four main components of the data:

1. Boarding records (tap-ons): Seven consecutive days of passenger boarding records,

including the five weekdays and a weekend from August 15th to August 21st, 2016, were

provided for this analysis. These records belong to smartcard (STM card) and no-card

passengers recorded by the system.¹

2. Bus lines and branches: Information about bus routes including the direction and order of

stops. Each bus run, or trajectory in one direction, is labeled with a unique identification

number that can be paired with this data to obtain the run’s line and branch.

3. Stops: Number, coordinates, and description of the closest intersection to the stop.

A fifth additional source of data is the 2016 Montevideo Home Mobility Survey (MHMS). This

is a household survey that collects trips by individuals from a sample of households in the

AMMON. The trips by bus are of interest in this study and the survey results can be used to

evaluate the OD method results.

3.1 Data Description

This section provides qualitative and quantitative descriptions of the five data sources used in

this study. It begins with an overall description of the boarding transactions for the seven days

and an explanation of the differences between trips made with STM cards and without them. It then

presents a description of the bus lines and branches and the MHMS dataset. The section closes

with an in-depth analysis of the data for Monday, August 15 processed for overall understanding

of travel patterns and temporal distribution of transactions.

3.1.1 Boarding Records

The passenger boarding records correspond to smartcard and non-smartcard users during a

complete week (Monday-Sunday). The total boarding records for smartcards is 5,077,674 and for

no cards is 2,371,815, representing a 68% to 32% split.

1 The term smartcard is used interchangeably with STM when referring to boarding transactions and passengers.


Table 3-1 shows the volumes and some descriptive statistics for the boarding records. The day

start is considered here at 3 A.M. as the lowest volume of transactions occurs at this time, as will

be shown in section 3.2. For smartcards, the weekday average is 868,811 with Thursday having

the highest volume of 872,844 records and Friday having a significantly lower volume with

863,231. The weekend has low volumes with 454,576 records on Saturday and 279,043 on

Sunday. For records with no cards, the weekday average is 395,697 with Monday having the

highest volume of 404,217 records. The weekend has significantly lower volumes with 239,686

on Saturday and 153,644 on Sunday.

Table 3-1 Boarding records and descriptive statistics

Boarding Records
Weekdays             Smartcard    No-card
Monday               869,898      404,217
Tuesday              870,437      392,990
Wednesday            867,645      388,127
Thursday             872,844      395,517
Friday               863,231      397,634
Weekday total        4,344,055    1,978,485
Average              868,811      395,697
Standard deviation   3,243        5,310

Boarding Records
Weekend              Smartcard    No-card
Saturday             454,576      239,686
Sunday               279,043      153,644
Weekend total        733,619      393,330
Average              366,810      196,665
Standard deviation   87,766       43,021

Week total           5,077,674    2,371,815
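The totals, averages, and standard deviations in Table 3-1 can be recomputed from the daily counts. The sketch below uses only the Python standard library; the variable and function names are illustrative. The tabulated standard deviations are consistent, to within rounding, with the population formula (division by n) rather than the sample formula, which is why `pstdev` is used here.

```python
from statistics import mean, pstdev

# Daily boarding records transcribed from Table 3-1.
weekday_smartcard = [869_898, 870_437, 867_645, 872_844, 863_231]
weekday_no_card = [404_217, 392_990, 388_127, 395_517, 397_634]
weekend_smartcard = [454_576, 279_043]
weekend_no_card = [239_686, 153_644]


def summarize(daily_counts):
    """Return (total, average, population standard deviation)."""
    return sum(daily_counts), mean(daily_counts), pstdev(daily_counts)


total, average, spread = summarize(weekday_smartcard)
print(total, round(average), round(spread))  # 4344055 868811 3243
```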

There are several differences for passengers that use a smartcard and those who do not.

Smartcard users benefit from being able to transfer between buses within 1 or 2 hours, depending

on the trip type they choose, and they also pay reduced fares. Smartcard users can use their card

for people they travel with and benefit from fares and transfers between buses, as long as they

travel together. This is a unique characteristic of the Montevideo system, as most transportation

systems with smartcards permit only one card per person. On the other hand, passengers without

cards cannot make transfers and pay higher fares than the users that have smartcards.

The passengers that do not have cards pay the fare as they board the bus and the system records

the time of boarding, ticket number, boarding stop, bus run unique identification number and bus

destination, fare details, and number of passengers. The users that have smartcards tap their STM

card on readers that are mounted on the buses and the system records the number of the card,


time of boarding, boarding stop, bus run unique identification number and bus destination, fare

details, card type and fare discount if applicable, ordinal of trip, and whether the tap is

considered a transfer (ordinal of trip>1) or a new trip (ordinal of trip=1).

Furthermore, the system records the transactions considered as part of the same trip (trip with 2

or more trip legs) and assigns them a common trip ID. This information is essential to understand

the method proposed in section 6.1.

For smartcard users the fare discounts are associated with the different card types. These types

distinguish ordinary users from other user groups that benefit from reduced or subsidized trip

fares (see Appendix A).

Table 3-2 shows the boarding records for each smartcard type on Monday August 15. Note the

high percentage of boarding records made by students (Student A and Student Free).

Table 3-2 Boardings per STM card type for Monday, August 15th

STM card type Boardings Percentage

Standard 397,034 45.8%

Student A 170,134 19.6%

Student B 21,448 2.5%

Student Free 142,712 16.5%

Retired A 44,317 5.1%

Retired B 16,235 1.9%

Social Work 29,330 3.4%

Prepaid 23,608 2.7%

Others 21,651 2.5%

3.1.2 Bus lines and branches

This data can be paired to the bus runs of the 7-day period for which there are passenger records.

Each bus run has a unique identification number (UID) and this UID is attached to the passenger

transactions when they board the bus. In addition to the UID, bus runs have the code of the line

and branch they are serving. In theory, the branches for the UIDs can be paired with the

branches in this dataset; however, the dataset is missing branches. Table 3-3 shows the share of


bus branches and UIDs that are in the dataset and those that are not, considered in this study as

valid and invalid runs, respectively.

Table 3-3 Bus branches and bus UIDs

Condition Bus branches UIDs

Valid 527 90,780

Invalid 640 27,084

Total 1,167 117,864

Note that even though 55% of the branches are missing, 77% of the bus runs (UIDs) operate on

valid branches. For the valid branches, the sequence of stops and characteristics of the branch are

known. The remaining branches are missing and their characteristics could not be obtained from

the data provider. This data issue occurred due to outdated data and errors in the digitalization of

the bus branches.

The branches that do not appear on the database are problematic as their sequences of stops are

unknown. However, because the boarding records of passengers on the UIDs running these

bus routes are available, a method is proposed in section 4.2 to validate the invalid runs by matching them

with valid runs.

3.1.3 MHMS

The data was collected during the period of August-October 2016 in the Metropolitan Area of

Montevideo (AMMON) (CAF et al., 2017). The size of this survey represents a 0.34% sample of

the households in the AMMON with 2,230 households interviewed. For detailed information

about the survey please refer to Montevideo et al. (2016), CAF et al. (2017), and Miller, Parada

Hernandez, & Habib (2017).

This study uses the trips of the MHMS made by bus. Of the total trips, 3,166 (2,136 in

Montevideo) representing 25.2% (28%) of trips in the AMMON (Montevideo) are made by bus

and they correspond mainly to home, work, and school trips. There are 3,844 (2,599 in

Montevideo) legs of trips corresponding to these trips. The data collected for the trips and legs of

trips is summarized in Figure 3-2.


Figure 3-2 Transit trip data collected in MHMS (Montevideo et al., 2016)

It is important to note that currently the STM does not serve the entire AMMON; it serves

Montevideo and some surrounding areas coloured in blue in Figure 3-1. This map shows the

level of data aggregation used by the MHMS. These units are called census segments, which are

groups of blocks (INE, 2009). The average census segment contains 12 blocks and the segments

in urban areas, such as the ones in urban Montevideo in the south, are smaller, containing an

average of 6 blocks. The average area of the census segments served by the STM is 551,532

square metres.

Figure 3-1 shows the 1,133 census segments served by the STM, of which 1,063 are in

Montevideo. Furthermore, Table 3-4 describes the bus trips and legs of trips collected by the

MHMS which occur in the census segments served by STM. Around 76% of the bus transactions

occur in the census segments served by the STM.

Table 3-4 Trips and legs of trips in the MHMS

                                          Occurrences in STM             Occurrences in AMMON
Locations                                 Census segments  Individuals   Census segments  Individuals
Legs of trip (Boardings and alightings)   640              6,441         835              7,661
Trips (Origins and destinations)          665              5,108         871              6,296

[Figure 3-2 content: each trip i records origin and destination, departure and arrival time, travel time, purpose, and weekly frequency; each trip leg i,j of trip i records boarding and alighting location, walking distance to/from stop, wait time at stop, and bus line.]

3.2 Data Analysis for Monday, August 15

For the subsequent parts of this study, the data corresponding to Monday, August 15 is used to

show the implementation of the methods and results. The results and statistics are similar across

all weekdays; therefore, presenting these metrics for one day is sufficient. Appendix B contains

some of the most important results for all days, and the relevant sections of this study refer to this

appendix for the results for the other days.

This section provides a thorough analysis of data for Monday August 15. This is done with the

aim of providing an in-depth description and validation of daily data, testing procedures and

assumptions, and developing methods that can be used for any other day. For this selected day,

there are 1,267,798 records with a split of 68% to 32%, corresponding to 869,898 and 404,217

STM card and no-card records respectively. Moreover, the smartcard records correspond to

302,516 STM cards with an average of 2.86 transactions per card.

Data is processed for overall understanding of travel patterns and temporal distribution of trips.

Smartcard and no-card data is processed separately to identify differences in travel patterns;

moreover, the smartcard users can be analyzed according to the card type. The temporal

distributions in Figure 3-3 and Figure 3-4 aggregated by 30-minute intervals reveal interesting

and different travel patterns for smartcard and no-card transactions.

There are three evident peak times for STM cards between 7 a.m. and 8 a.m., 1 p.m. and 2 p.m.,

and 5:30 p.m. and 6:30 p.m. Interestingly, the midday peak exceeds the morning and evening

peak volumes and the volumes after this peak are similar to or higher than morning volumes until 7

p.m.

Figure 3-3 Temporal distribution of STM card transactions

[Plot: Transactions (0 to 40,000) against Time of the day (30 minute intervals, 0:30 to 23:30)]

On the other hand, for no-card transactions there are two evident peaks between 8 a.m. and 9

a.m., and 5:30 p.m. and 6:30 p.m. There is no noticeable midday peak; instead, there are high

transaction volumes starting at 12:30 p.m. until the evening peak. The volumes at midday and

evening times are relatively higher than morning ones.

Figure 3-4 Temporal distribution of transactions without card

These distributions are compared for statistical similarity using the Kolmogorov-Smirnov test.

Using a 90% confidence level, the hypothesis that the distributions are similar can be

rejected ($D_n = 0.74$).
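The mechanics of a two-sample Kolmogorov-Smirnov comparison on binned counts can be sketched as follows. This is a generic illustration under stated assumptions and does not reproduce the statistic reported above: the counts are hypothetical, the function names are illustrative, the critical value uses the standard large-sample approximation, and applying the test to 30-minute bins rather than raw timestamps is itself an approximation.

```python
from itertools import accumulate
from math import log, sqrt


def ks_statistic(counts_a, counts_b):
    """Largest absolute gap between the two empirical CDFs built from binned counts."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    cdf_a = [c / total_a for c in accumulate(counts_a)]
    cdf_b = [c / total_b for c in accumulate(counts_b)]
    return max(abs(a - b) for a, b in zip(cdf_a, cdf_b))


def ks_critical(n, m, alpha=0.10):
    """Large-sample critical value c(alpha) * sqrt((n + m) / (n * m))."""
    c_alpha = sqrt(-0.5 * log(alpha / 2))  # about 1.22 for alpha = 0.10
    return c_alpha * sqrt((n + m) / (n * m))


# Hypothetical counts per time bin for two groups of 100 transactions each.
group_a = [10, 30, 40, 20]
group_b = [30, 30, 20, 20]
d = ks_statistic(group_a, group_b)
reject = d > ks_critical(sum(group_a), sum(group_b))  # True at the 90% level
```

Both count lists must use the same bins in the same order; the hypothesis of equal distributions is rejected when the statistic exceeds the critical value.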

In addition to the temporal travel pattern analysis, for STM card users the daily transactions and

transfers per card can be identified. Figure 3-6 shows the transactions per card. Just above half

of the cards (53.7%) have one or two transactions per day and 99.6% of the cards make 9 or fewer

transactions on this day.

From the previous analysis, the transactions are aggregated into four time periods that

differentiate volumes between the peaks: AM from 4 a.m. to 11 a.m., Midday from 11 a.m. to

3:30 p.m., PM from 3:30 p.m. to 10 p.m., and Overnight from 10 p.m. to 4 a.m. The midday

period is short (4.5 hours) compared to the other three, to prevent including typical morning

home-to-work and evening work-to-home trips. Even though it is short, Figure 3-5 illustrates

that almost a third of daily transactions occur during Midday.
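The four time periods can be expressed as a small lookup function. The boundaries below are the ones given in the text; treating each period as inclusive of its start and exclusive of its end is an assumption, as is the function name.

```python
from datetime import time


def time_period(t):
    """Map a boarding time to AM, Midday, PM, or Overnight."""
    if time(4, 0) <= t < time(11, 0):
        return "AM"
    if time(11, 0) <= t < time(15, 30):
        return "Midday"
    if time(15, 30) <= t < time(22, 0):
        return "PM"
    return "Overnight"  # 10 p.m. to 4 a.m., wrapping past midnight
```

Counting transactions per period is then a matter of applying `time_period` to each boarding time and tallying the labels, for example with `collections.Counter`.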

The total number of passengers boarding buses with STM cards is 884,018 and these are shown

per time period in Figure 3-5, with the highest volume occurring during the PM period, followed

by the Midday. Note that the number of passengers exceeds the number of STM card

boarding records by 17,549. As previously discussed, this occurs because smartcard users can use their cards for

the trips of other individuals they are traveling with.

Figure 3-5 Transactions for STM (left) and no-card (right) per time period

The total passenger transactions without STM cards is 411,156 and the shares among the time

periods are shown in Figure 3-5. The highest volume occurs during the PM hours followed by

AM volumes.

Figure 3-6 displays the smartcards with different number of transactions per day. Most

smartcards have two transactions and 85% of cards have more than one transaction. For the cards with

more than one transaction, the transfers per card are shown in Figure 3-7. 94.3% of the users

transfer one or two times per day and 99.6% of the cards make four transfers or fewer.

Figure 3-6 Transactions for STM cards

[Plot: STM cards in thousands (0 to 120) against Transactions (1 to >=13)]

Figure 3-7 Transfers per STM card

[Plot: STM cards in thousands (0 to 160) against Transfers per Trip (1 to 11)]

Data Preparation

The boarding records, bus lines and branches, and MHMS data need to undergo a process that

removes invalid records and prepares the data for further analysis. The sections in this chapter

explain the query processes for the boarding records, the recovery of data from bus branches, and

the process of selecting the MHMS trips that occur within the study area.

4.1 Preliminary query (Query #1)

The purpose of this query is to keep the passengers who have normal travel behaviour and

remove the null transactions. Due to the differences between no-card and smartcard records, the

cleaning process differs. For non-smartcard records, the only records that can be removed are

those that are null. These account for 0.4% of the records, leaving 2,362,786 boarding records.

The smartcard data is queried based on the travel patterns identified and described in section 3.2.

The query criteria are included in Table 4-1 and the queried transactions for each day in Table

4-2. The retained transactions account for over 98% of the smartcard transactions and over 99% of

no-card transactions.

Table 4-1 Query criteria for smartcard data

Query Criteria Value

Void No

Transfers per passenger <5

Passenger number <5

Transactions per day <8
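The criteria in Table 4-1 can be read as a record filter. The sketch below is one possible reading, not the query actually used: the dictionary keys are hypothetical field names, the transfer and passenger counts are assumed to be available on each record, and the transactions-per-day limit is assumed to be evaluated per card on the raw daily records, dropping every record of a card that exceeds it.

```python
from collections import Counter


def query_1(records, max_transfers=5, max_passengers=5, max_daily=8):
    """Keep one day's smartcard records that satisfy the Table 4-1 criteria.

    `records` is a list of dicts with hypothetical keys:
    card_id, void (bool), transfers (int), passengers (int).
    """
    per_card = Counter(r["card_id"] for r in records)  # transactions per card that day
    return [
        r for r in records
        if not r["void"]
        and r["transfers"] < max_transfers
        and r["passengers"] < max_passengers
        and per_card[r["card_id"]] < max_daily
    ]
```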


Table 4-2 Smartcard data - Query #1

Date (Day start 3 am)   Smartcard   No-card
Monday Aug 15           859,721     402,712
Tuesday Aug 16          860,653     391,580
Wednesday Aug 17        858,106     386,752
Thursday Aug 18         864,528     394,020
Friday Aug 19           852,441     396,131
Saturday Aug 20         450,810     238,652
Sunday Aug 21           277,264     152,941

4.2 Method for invalid bus runs

As indicated in section 3.1.2, the bus lines and branches dataset does not contain all of the

branches. This is one of the challenges of working with transit data as datasets with network

characteristics, routes, and stops are not updated on an ongoing basis (Hemily, 2015).

Over 50% of the branches covered by buses are missing. A method was developed to validate the

invalid runs by matching them with valid runs. The goal of this method is to determine if the

invalid bus runs can be matched with any valid run that contains all the stops in the invalid run.

This is done by identifying the stops where passengers board for each invalid run and

determining which of the valid runs contain all the stops. Figure 4-1 shows the stops associated

with an invalid bus run and the valid run that can be assigned to it. As there is one run that

contains all the stops from the invalid run, the characteristics of the valid run (line, branch, and

stop sequence) are assigned to the invalid one.


Figure 4-1 Valid bus run assigned to invalid run

Requiring that an invalid run pair with exactly one valid run identifies the route most likely

travelled by the bus, and discards the runs for which passengers board at only a few stops that

are common to many valid runs.
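The matching rule described above amounts to a set-inclusion test with a uniqueness requirement. The following is a minimal sketch under assumptions: the function name and the data layout (a dictionary from branch identifier to its ordered stop list) are illustrative, and the order in which stops are visited is not checked.

```python
def match_invalid_run(boarded_stops, valid_runs):
    """Return the single valid branch that contains every boarded stop.

    boarded_stops: iterable of stop IDs with boardings on the invalid run.
    valid_runs: dict mapping a branch ID to its ordered list of stop IDs.
    Returns None when no branch, or more than one branch, qualifies.
    """
    boarded = set(boarded_stops)
    candidates = [
        branch for branch, stops in valid_runs.items()
        if boarded <= set(stops)  # every boarded stop lies on this branch
    ]
    return candidates[0] if len(candidates) == 1 else None
```

For example, with branches `{"A": [1, 2, 3, 4], "B": [3, 4, 5, 6]}`, boardings at stops 1 and 3 match only branch A, while boardings at stops 3 and 4 are common to both branches and the run stays unmatched.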

This method is applied to 640 invalid bus runs and for 243 (38%) of them a valid bus run could

be identified as shown in Table 4-3. The remaining runs could not be identified due to either few

boarding records common to many runs, or boarding records at stops that do not coincide with

any valid run.

Table 4-3 Bus run classification

Classification         Bus runs
Valid                  527
Invalid   Validated    243
          Not fixed    397
Total                  1,167


An analysis of the validation procedure is shown in Figure 4-2. This reveals that most invalid

runs can be validated on only one day: on that day the stops match those of a single

other run, whereas on the other days the stops of the invalid run coincide with many more runs. The

matched runs were checked to confirm that they have more than 10 stops and belong to the same bus line as the

run they were matched with.

Without having any other data about the bus network, this was considered the most efficient

approach to recover data and be able to use transactions that occur on invalid runs. The

implications of including validated runs and passengers on these runs are explained in further

chapters, by comparing the results between valid and validated runs.

Figure 4-2 Frequency of validation of invalid bus runs

4.3 Query for OD estimation and incorporation strategies (Query 2)

After applying the method for invalid bus runs, a more rigorous query is needed to account for

the available network and bus data. This second query is applied to smartcard and no-card

transactions; the criteria are presented in Table 4-4.

Note that the only query criterion applied exclusively to smartcard records is the one that

requires multiple daily transactions; this is because the OD algorithm in section 1.1.6 needs the

subsequent daily transactions of a user to estimate the destination of the previous transactions.

[Figure 4-2 axes: Frequency of Validation (0 to 90) and Percentage (0% to 35%) against Number of Days (1 to 7)]

The single riders (either smartcard or no-card) are later incorporated using the results from the

OD method. These procedures are explained in detail in section 1.1.6.

Table 4-4 Query criteria for OD estimation

Query Criteria                 Value             Applied to…
Transactions per day           >1                Smartcard data for OD algorithm
Transactions on bus runs       Valid/Validated   Smartcard (OD algorithm and single riders) and no-card
Boarding on recognized stops   Yes               Smartcard (OD algorithm and single riders) and no-card

Table 4-5 shows the query results for the smartcard transactions on Monday, August 15 and

Table 4-6 the results for the no-card transactions of the whole week.

Table 4-5 Smartcard data - Query #2

Condition                          No. of Cards      No. of Transactions
Initial                            303,917           869,898
Initial query results              298,993 (98.4%)   859,721 (98.8%)
Single ride per day                44,780 (14.7%)    44,780 (5.15%)
Valid bus runs                     281,894 (92.8%)   661,522 (76.1%)
Validated bus runs                 58,665 (19.3%)    69,001 (7.9%)
Invalid bus runs (Not validated)   104,054 (34.2%)   138,917 (16.0%)
Invalid boarding stops             8,762 (2.9%)      9,114 (1.0%)
Total of records at valid boarding stops on valid and validated runs
Single rider                       37,925 (12.4%)    37,925 (4.4%)
Cards for OD algorithm             155,694 (51.2%)   441,752 (50.8%)

Table 4-5 shows that the issue with the invalid bus runs unfortunately removes a

significant number of users from the OD estimation algorithm. Interestingly, only 16% of

transactions are on invalid runs but 34% of users have at least one transaction on an invalid run

and are removed. Smartcard users with a single transaction per day also account for a large

share of the users removed from the OD estimation, but represent only 4.4% of the valid and validated

transactions. In contrast, because no-card passengers cannot make transfers, 84.7% of

the transactions can be used as shown in Table 4-6.


The results for the query of smartcard data for the other days are included in Appendix B and

show very similar percentages for the number of cards and transactions that meet each condition.

Table 4-6 No-card data - Query #2

Condition                          No. of Transactions
Initial                            2,362,786
Valid bus runs                     1,865,586 (78.9%)
Validated bus runs                 157,392 (6.6%)
Invalid bus runs (Not validated)   339,808 (14.4%)
Invalid boarding stops             25,618 (1.1%)
Total of records with valid and validated runs
Records                            2,001,653 (84.7%)

4.4 Data cleaning MHMS

There are 1,333 census segments served by the STM and the bus trips of the MHMS used in this

study are those that occur within these segments. The process of data cleaning for the bus trips

and legs of trips consists of the criteria outlined in Table 4-7.

Table 4-7 Query criteria for MHMS

Query Condition                      Value
Trip origins and destinations        Both served by STM
Board and alight for legs of trips   Both served by STM
Line number                          Valid (Bus lines and branches dataset) or validated

These criteria are applied to the total 3,166 trips and 3,844 legs of trips made by transit. The

results are included in Table 4-8 and include 84% of legs and 81% of trips from the survey.

Table 4-8 MHMS - Query results

Condition                                        Occurrences
Trip location served by STM                      2,266
Trip location served by STM                      2,803 corresponding to 2,294 individuals
Correct bus line number for legs                 2,624
Individuals for which all trips and legs have    1,007 individuals corresponding to 2,572
valid location and line number                   legs and 2,150 trips


Building Itineraries from Boarding Transactions

Schedules are used to determine the time buses arrive at a certain location, which in turn

can be used to estimate the alighting times for passengers. In the absence of schedule data,

the itineraries can be created using the data available: the passenger boarding records and

the characteristics of the bus routes (lines and branches). This chapter contains the method

used to combine these data to create itineraries, identifying outliers on transaction records

and stops with high dwell times, and then presents the findings of the method and a sample

itinerary.

The method consists of a sequence of scripts (Python 3.6) that output the results and

the itineraries for each day in text and CSV files.

5.1 Method

Each bus run has a unique identification number (UID) that is attached to the passenger

transactions when they board the bus. Both smartcard and no-card records that pass the

preliminary query criteria (Query #1) have a UID and the boarding location and time,

regardless of run validity. The records are grouped by UID and stop number to obtain the

dwell times² and average boarding times, considered as the arrival times in the itineraries.

The dwell times are analyzed before using the average boarding times to create the

itineraries.
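The grouping step can be sketched with the standard library alone. This is an illustration, not the thesis scripts: the record layout (UID, stop number, boarding time in seconds after midnight) and the output keys are assumptions. The dwell time is taken as the gap between the first and last boarding at a stop, and the arrival time as the average boarding time, as described above.

```python
from collections import defaultdict


def summarize_stops(records):
    """Group boardings by (uid, stop) and return dwell and mean boarding time.

    `records` is an iterable of (uid, stop_id, seconds_after_midnight) tuples.
    """
    groups = defaultdict(list)
    for uid, stop_id, t in records:
        groups[(uid, stop_id)].append(t)
    return {
        key: {
            "records": len(times),
            "dwell_s": max(times) - min(times),    # first to last boarding
            "arrival_s": sum(times) / len(times),  # average boarding time
        }
        for key, times in groups.items()
    }
```

A stop with a single boarding record has a dwell time of zero under this definition, and its arrival time is simply that boarding time.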

It is necessary to compare the passenger flow time with an acceptable service time to

identify bus stops with erratic dwelling times and prevent them from creating inaccurate

itineraries. Robinson et al. (2014) highlight that late tap-ons can cause severe impacts on

smartcard analysis. Usually the smartcard systems only allow tap-ons when the transit unit

is close to a stop, and malfunctions of the system are one of the main causes of erroneous

smartcard data. For the Montevideo system, the passengers get a receipt with the time,

location, and trip type they purchase. It is believed that the driver verifies and may correct

the location but there could be errors in the transaction times.

2 The term dwell time is used to refer to the passenger boarding flow time, disregarding the doors opening

and closing times. The flow time is computed as the time between the first and last passenger of all the

passengers boarding at a stop (tapping the STM card or paying the fare with cash).

To detect these errors, outliers are identified among the clustered transactions at each stop

using the Interquartile Range (IQR), which measures statistical dispersion as the difference between the first

and third quartiles. The boarding transactions that fall out of range³ are treated as outliers and are disregarded when

computing the bus arrival time at a stop.
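
The outlier rule can be illustrated with a minimal sketch. It assumes that boarding times are expressed in seconds since midnight, that quartiles are computed with the inclusive method of Python's statistics module (available from Python 3.8), and that stops with fewer than four transactions are left unfiltered; the function name iqr_filter and the sample times are hypothetical and are not taken from the scripts used in this thesis.

```python
from statistics import quantiles


def iqr_filter(times_s, k=1.5):
    """Split boarding times (seconds since midnight) into kept values and outliers."""
    if len(times_s) < 4:
        # Assumption: too few taps to estimate quartiles, so nothing is removed.
        return list(times_s), []
    q1, _, q3 = quantiles(times_s, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Open interval (Q1 - 1.5 IQR, Q3 + 1.5 IQR): values inside are kept.
    kept = [t for t in times_s if low < t < high]
    outliers = [t for t in times_s if not (low < t < high)]
    return kept, outliers


# Hypothetical stop: five taps within 20 seconds and one tap ten minutes later.
taps = [58406, 58410, 58412, 58415, 58425, 59010]
kept, outliers = iqr_filter(taps)
arrival_time = sum(kept) / len(kept)  # average boarding time of the kept taps
```

In this example the late tap is excluded, so the arrival time is the mean of the five clustered taps rather than a value pulled towards the late transaction.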

The next step to identify unusually high dwell times is to compute the passenger service

time for stops that are geocoded and are part of valid or validated bus runs. The service

time is the time per passenger boarding and is used to obtain typical dwell times (Transit

Capacity and Quality of Service Manual, 2003). The stops with high dwell times are

those with service times that exceed a critical service time, as explored in section 5.2, and

they are disregarded when computing the bus arrival times. This dwell time analysis excludes

terminals and the first and last stop of each route, as the time spent there is terminal time

and is established by the operator.
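
As a sketch of this check, the service time of an occurrence can be computed from its boarding times. The example assumes that the service time is the flow time (last tap minus first tap) divided by the number of boardings, so that an occurrence with a single passenger has a service time of zero; the exact divisor is an assumption, as are the function names. The default critical value of 10 seconds is the threshold adopted in section 5.2.

```python
def service_time_s(boarding_times_s):
    """Flow time (last tap minus first tap) divided by the number of boardings."""
    flow_time = max(boarding_times_s) - min(boarding_times_s)
    return flow_time / len(boarding_times_s)


def has_high_dwell(boarding_times_s, critical_s=10.0):
    """True when the service time per passenger exceeds the critical value."""
    return service_time_s(boarding_times_s) > critical_s


# Hypothetical occurrences (seconds since the start of the run).
normal_stop = [100, 104, 112]   # 12 s of flow time for 3 passengers
slow_stop = [100, 160]          # 60 s of flow time for 2 passengers
single_rider = [100]            # a single tap has no flow time
flags = [has_high_dwell(o) for o in (normal_stop, slow_stop, single_rider)]
```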

Note that only boarding transactions are recorded; therefore, the time of arrival at a stop is

used as the alighting time. Moreover, as passengers do not board at every stop on a route,

the itineraries created from passenger records need to incorporate all the stops on a bus

route. The stop arrival times for each UID are concatenated with the sequence of stops on

the bus route, organized by the stop ordinal. Table 5-1 shows an example of the built

itinerary for a bus route; the stops that do not have arrival times (shown with an arrival time of 0) are highlighted in blue.

3 Range for identifying outliers: (Q1 − 1.5 × IQR, Q3 + 1.5 × IQR), where Q1 and Q3 are the first and third quartiles.

Table 5-1 Sample itinerary built from passenger transactions

UID Branch Stop ID Arrival time Interpolated time Stop Ordinal

1.653E+10 1763 2521 4:13:26 PM 4:13:26 PM 1

1.653E+10 1763 6153 0 4:14:22 PM 2

1.653E+10 1763 2522 4:15:17 PM 4:15:17 PM 3

1.653E+10 1763 2523 4:15:34 PM 4:15:34 PM 4

1.653E+10 1763 2524 4:16:40 PM 4:16:40 PM 5

1.653E+10 1763 2022 4:17:25 PM 4:17:25 PM 6

1.653E+10 1763 2023 4:18:29 PM 4:18:29 PM 7

1.653E+10 1763 2525 4:19:39 PM 4:19:39 PM 8

1.653E+10 1763 2526 0 4:20:32 PM 9

1.653E+10 1763 2527 4:21:24 PM 4:21:24 PM 10

1.653E+10 1763 2528 0 4:21:55 PM 11

1.653E+10 1763 2529 0 4:22:25 PM 12

1.653E+10 1763 2530 0 4:22:56 PM 13

1.653E+10 1763 2531 0 4:23:26 PM 14

1.653E+10 1763 2532 0 4:23:57 PM 15

1.653E+10 1763 2533 0 4:24:28 PM 16

1.653E+10 1763 2534 0 4:24:58 PM 17

1.653E+10 1763 2535 4:25:29 PM 4:25:29 PM 18

The arrival times for these stops are calculated using simple interpolation between the

previous and subsequent stops that have arrival times, as shown in the column “Interpolated

time” in Table 5-1. This is a common approach used in the literature to reconstruct

bus trajectories and obtain arrival times (Fourie et al., 2017).
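
The interpolated values in Table 5-1 are consistent with a linear interpolation over the stop ordinal, which can be sketched as follows. The function name and the use of None for stops without boardings are assumptions made for illustration; times are in seconds since midnight, and stops before the first or after the last known arrival time are left untouched.

```python
def interpolate_arrivals(arrivals):
    """Fill gaps between known arrival times, linearly over the stop ordinal.

    arrivals: list ordered by stop ordinal, with seconds since midnight or None.
    """
    result = list(arrivals)
    known = [i for i, t in enumerate(arrivals) if t is not None]
    for left, right in zip(known, known[1:]):
        step = (arrivals[right] - arrivals[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = arrivals[left] + step * (i - left)
    return result


# First three stops of Table 5-1: 4:13:26 PM, no boardings, 4:15:17 PM.
first_stops = [58406, None, 58517]
filled = interpolate_arrivals(first_stops)
# filled[1] is 58461.5 s, which Table 5-1 reports rounded as 4:14:22 PM.
```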

Interpolation is only applied between the first and last stops for which an arrival time is

available from the boarding records. This interpolation technique is adapted from the

technique used by Fourie et al. (2017) to reconstruct bus trajectories using smartcard and

GPS data. Moreover, Table 5-2 outlines the strategies to incorporate stops with specific

characteristics.

Table 5-2 Itinerary building strategies for special cases

Condition | Strategy

Stops with unknown stop ordinal | Cannot be used to interpolate the schedule. They are incorporated into the finalized schedule based on the arrival time.

Intermediate stops of a bus route with high dwell time | Disregard the arrival time from boarding records; estimate the arrival time using interpolation.

Stops at terminals or first/last stop of bus route | Intermediate terminal: consider the first and last transaction times as different arrival and departure times. First stop: the last transaction time is considered as the departure time (last passenger that boards). Last stop: if known⁴, the first transaction time is considered as the arrival time (first passenger that boards).

Computing itinerary for stops after last boarding record

Passengers do not usually board at the last stops of a route; however, they are likely to

alight there. Therefore, it is key to be able to forecast the bus arrival times for the stops after the

last stop for which passenger records are available.

The arrival times are interpolated with confidence for intermediate stops for

which the previous and subsequent stop arrival times are known. After the last stop with boarding records, the arrival

times are forecasted using the temporal step size of the immediately preceding

interpolation on the same bus route. This is done to reflect the vehicle speed and

driving conditions of the previous stops.

4 This occurs when passengers board a bus before the driver signals the start of a new bus run, departing from

the last stop of the previous bus run.

This method relies on three main assumptions:

▪ The time and speed between stops after the last stop with boarding records are

assumed to be the same as the time and speed between the previous stops.

▪ Traffic conditions do not change.

▪ The distance between stops is relatively similar.
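
A minimal sketch of this forecast is given below. It assumes that the time step is the average time per stop between the last two stops with arrival times from boarding records, and that stops without a time are marked with None; the function name and sample values are hypothetical.

```python
def forecast_after_last(arrivals):
    """Extend arrival times past the last known stop with the preceding time step.

    arrivals: list ordered by stop ordinal, with seconds since midnight or None.
    """
    result = list(arrivals)
    known = [i for i, t in enumerate(arrivals) if t is not None]
    if len(known) < 2:
        return result  # no preceding step is available to reuse
    prev, last = known[-2], known[-1]
    step = (arrivals[last] - arrivals[prev]) / (last - prev)
    for i in range(last + 1, len(arrivals)):
        result[i] = arrivals[last] + step * (i - last)
    return result


# Hypothetical run: boardings at the first and third stops only, five stops in total.
run = [100, None, 160, None, None]
forecasted = forecast_after_last(run)  # the step is 30 s per stop
```

The gap at the second stop is left for the interpolation step; only the stops after the last boarding record are filled here.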

5.2 Results

The queried boarding records for smartcard and no-card passengers from the preliminary

query (Query #1) are used to build the itineraries. The boardings are grouped for each bus

run at each stop to compute arrival times; but before doing so, these occurrences⁵ are

analyzed to identify outliers and compute dwell and service times.

The outliers are identified using the IQR; they represent 1% of the transactions for

each day and are removed. The service times are analyzed using the Transit Capacity and

Quality of Service Manual (2003); the recommended passenger service time is “3.5

seconds per passenger with smartcard and 4.0 seconds per passenger that pays with change”

(p. 4-5). Using 3.75 seconds per passenger, which assumes that half of the passengers are smartcard

users, 70% of the occurrences with more than one passenger exceed the allowed time.

This can be partly due to the unusual AFC system in Montevideo, which provides users with a

small receipt after boarding and likely increases the service time.

The recommended passenger service time does not seem appropriate for this transit system,

and there is no identifiable indicator of occurrences with high dwell times, as they occur

on all bus branches, at stops all over the network, and throughout the day as shown in

Figure 5-1.

5 The term “occurrence” refers to each group of boardings at a stop for a bus run.

Figure 5-1 Temporal distribution of dwell times

Plotting dwell times against passenger boardings gives a better indication of the passenger

service time for this system. The graph in Figure 5-2 indicates that increasing passenger

volumes lead to longer dwell times, as expected, and the slope quantifies this

relationship as the service time per passenger. The intercept of the trendline is set to zero, as

the passenger boarding volume is the only explanatory variable and the dwell time for zero

boardings should be zero.

Figure 5-2 Dwell time (minutes) vs. Passenger boardings

The R² value of the trendline indicates a weak fit, but without other

alternatives to determine an acceptable service time, the slope is used. The value is 0.1853

minutes (11.1 seconds) per passenger, which is approximated to 10 seconds per passenger.
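
For reference, the slope of a least-squares trendline forced through the origin is the sum of the products of boardings and dwell times divided by the sum of squared boardings. The sketch below shows this calculation on hypothetical data points (they are not the data behind Figure 5-2) and converts the reported slope from minutes to seconds.

```python
def slope_through_origin(boardings, dwell_min):
    """Least-squares slope of dwell time on boardings with the intercept fixed at zero."""
    sum_xy = sum(x * y for x, y in zip(boardings, dwell_min))
    sum_xx = sum(x * x for x in boardings)
    return sum_xy / sum_xx


# Hypothetical occurrences: passenger boardings and dwell times in minutes.
boardings = [1, 2, 3, 4]
dwell_min = [0.2, 0.35, 0.6, 0.7]
slope = slope_through_origin(boardings, dwell_min)  # minutes per passenger

# Unit conversion of the slope reported for Figure 5-2.
reported_slope_s = 0.1853 * 60  # about 11.1 seconds per passenger
```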

Table 5-3 shows the number of passengers and occurrences in different service time ranges,

followed by the percentages for valid and validated runs from the 14,647 and 1,523 runs,

respectively. There are minor differences between valid and validated runs in the shares of

passengers for each range, and the shares of occurrences are rather similar.

For all bus runs, 89.8% of passengers and 93.9% of occurrences are within the 10 second

threshold and are used to build the itineraries. The arrival time for these occurrences is the

average boarding time from the transactions.

Table 5-3 Passenger service time ranges

Passenger service time (seconds) | Passengers | Occurrences

0 | 152,300 (14.31%) | 152,297 (40.33%)

0 to 5 | 468,845 (44.06%) | 131,534 (34.83%)

5 to 10 | 333,969 (31.38%) | 70,728 (18.73%)

10 to 15 | 52,067 (4.89%) | 12,261 (3.25%)

15 to 20 | 19,073 (1.79%) | 4,383 (1.16%)

20 to 25 | 11,619 (1.09%) | 2,229 (0.59%)

More than 25 | 26,307 (2.47%) | 4,220 (1.12%)

Passenger service time (seconds) | Passengers, valid runs | Passengers, validated runs | Occurrences, valid runs | Occurrences, validated runs

0 | 14.35% | 13.89% | 40.33% | 40.32%

0 to 5 | 43.82% | 46.46% | 34.72% | 35.96%

5 to 10 | 31.61% | 28.35% | 18.83% | 17.68%

10 to 15 | 4.88% | 5.05% | 3.26% | 3.12%

15 to 20 | 1.80% | 1.72% | 1.16% | 1.19%

20 to 25 | 1.11% | 0.95% | 0.59% | 0.62%

More than 25 | 2.36% | 3.58% | 1.12% | 1.10%

Of the 23,093 occurrences with passenger service time over 10 seconds, Table 5-4 shows

that 85.49% (5.21% of all occurrences) occur at stops that are neither terminals nor the first

or last stops of bus routes. These stops and the stops with unknown

ordinal are considered as stops with unusually high dwell time, and their arrival times are disregarded. Refer to

Table 5-2 for details about the estimation of arrival times for the occurrences that fall under

each category.

Table 5-4 Characterization of stops with service time over 10 seconds

Stop category | Occurrences | Passengers

Stop with unknown ordinal | 618 (2.68%) | 3,327 (3.05%)

Stop neither first nor last of a route nor a terminal | 19,743 (85.49%) | 84,132 (77.14%)

Stop at intermediate terminals | 273 (1.18%) | 21,607 (19.81%), combined with the row below

First or last stop of bus route, including terminals | 2,459 (10.65%) | combined with the row above

The stops with unusually high dwell time are analyzed spatially to identify the locations

where they occur. These correspond to 2,260 stops (48.0% of all stops), shown in Figure

5-3, and they occur especially along major corridors and downtown (inset map). There

are also a few stops with high dwell times on the outskirts of and outside Montevideo.

Figure 5-3 Stops with high dwell time

Having identified the stops with high dwell times, itineraries are built for the bus runs. The

itineraries are built for over 97% of the daily bus runs. The remaining 3% cannot be built

because all passengers of the bus run boarded at a single stop or because all stops of the bus run

had high dwell times.

Figure 5-4 shows an example of the itinerary for five buses serving bus line 19, branch

number 205. The times highlighted in blue correspond to stops with no passenger boardings

and those in grey to stops with high dwell times. The arrival times for these cells were

interpolated using the arrival times from the previous and subsequent stops that are not

highlighted. Note that the arrival times at the last stops of runs, highlighted in yellow, are forecasted using the

time step of the interpolation immediately before.

The forecasting of arrival times reduces the share of unknown arrival times at stops after the last stop

with boarding records from 24.1% to 6.0% for

weekdays and from 26.3% to 6.0% for the weekend.

Figure 5-4 Example of itinerary

Bus line 19, Branch 205, Monday, August 15

Start time: Run 1 5:23:15, Run 2 5:56:48, Run 3 6:19:47, Run 4 6:26:06, Run 5 6:42:01

Stop ordinal  Stop  Arrival time (Run 1)  Arrival time (Run 2)  Arrival time (Run 3)  Arrival time (Run 4)  Arrival time (Run 5)

1 3079 - - - - -

2 3017 05:31:44 06:10:43 06:21:35 06:32:48 06:46:28

3 3019 05:33:21 06:11:36 06:22:09 06:33:33 06:47:17

4 3020 05:33:51 06:12:29 06:22:44 06:34:10 06:48:07

5 3021 05:34:22 06:13:15 06:23:19 06:34:53 06:48:43

6 3022 05:35:04 06:14:00 06:23:54 06:35:32 06:49:19

7 3023 05:35:47 06:15:02 06:24:31 06:36:21 06:50:11

8 3024 05:36:52 06:15:44 06:25:10 06:37:45 06:50:52

9 3025 05:37:58 06:16:27 06:25:50 06:38:11 06:51:32

10 3026 05:39:06 06:17:09 06:26:55 06:39:03 06:52:13

… … … … … …

60 3733 06:16:16 06:59:23 07:08:07 07:22:21 07:37:15

61 3734 06:17:00 07:00:10 07:08:52 07:23:21 07:38:06

62 3735 06:17:43 07:00:58 07:09:38 07:24:20 07:39:07

63 563 06:18:27 07:01:46 07:11:06 07:26:00 07:40:59

64 564 06:19:12 07:02:35 07:11:37 07:26:42 07:41:23

65 565 06:19:58 07:03:30 07:12:13 07:27:25 07:42:02

66 4578 06:20:44 07:04:28 07:12:50 07:28:08 07:42:41

67 566 06:21:30 07:05:27 07:13:27 07:28:51 07:43:20

68 3924 06:22:16 07:06:25 07:14:04 07:29:33 07:44:00

69 570 06:23:02 07:07:24 07:14:40 07:30:16 07:45:00

70 3925 06:23:48 07:08:22 07:15:17 07:30:59 07:46:00

71 4580 06:24:34 07:09:21 07:15:54 07:31:42 07:47:00

72 4615 06:25:20 07:10:19 07:16:35 07:32:29 07:48:06

73 4909 06:26:05 07:11:18 07:17:15 07:33:16 07:49:12

74 4040 06:26:51 07:12:16 07:17:56 07:34:03 07:50:19

75 4041 06:27:37 07:13:15 07:18:36 07:34:50 07:51:25

76 4763 06:28:23 07:14:13 07:19:17 07:35:37 07:52:31

77 4764 06:29:09 07:15:12 07:19:57 07:36:24 07:53:37

78 4765 06:29:55 07:16:10 07:20:38 07:37:11 07:54:43

79 5086 06:30:41 07:17:09 07:21:19 07:37:59 07:55:50

80 4766 06:31:27 07:18:07 07:21:59 07:38:46 07:56:56

81 4767 06:32:13 07:19:06 07:22:40 07:39:33 07:58:02

Origin and Destination Estimation

The STM has a tap-on scheme that validates and records the passenger transactions when

boarding. While the data collected contains the times and locations of the boardings, the

alighting locations and times are unknown. This chapter presents a method to identify the

alighting locations of passengers, understand their individual travel behaviour, and observe

the O-D flows of all transit users. The method and results in this chapter distinguish

between smartcard and no-card users, as the data and approaches for each are different.

The methods consist of several Python 3.6 scripts that output the results in text

and CSV files.

6.1 Method

The method has three goals: (1) estimate the alighting locations and times of transit

transactions (from STM and no-card users); (2) identify the origin and destination of trips for

STM users; and (3) compute travel behaviour metrics such as travel times, transfer walking

distance, location, and time for STM users. These are similar to the goals proposed by

Trepanier et al. (2007) and M. A. Munizaga & Palma (2012), except for the incorporation

of no-card users. Therefore, the method for smartcard users is similar to the methods

proposed by these researchers and includes the improvements proposed by A. Alsger et al.

(2016).

The no-card transactions constitute a significant share of records, and this study explores

integrating them into the OD estimation. The following paragraphs describe in detail the

method for smartcard users and are followed by two subsections: the first explains the

incorporation of smartcard transactions to which the method cannot be applied, and the

second explains the incorporation of no-card transactions.

First, some terms are defined to help understand the goals of this method:

▪ A trip is defined as the travel from an origin (e.g. home) to a destination for a

specific purpose (e.g. work).

▪ Trips can have one or multiple legs, identified by the transfers between bus services,

and can have a walking and waiting time portion on the transfers.

▪ The daily trips made by a smartcard user that start and end around the same location

constitute a tour.

The STM card transactions can be either trips or legs of trips. These are differentiated by

the trip ordinal and the trip ID fields assigned by the system. Transactions that are trips

have unique trip IDs that are not shared with any other transactions, while transactions

that are legs of a trip share the trip ID with the other legs of that trip, and their

trip ordinals are assigned chronologically, starting at 1 for the first leg of the trip.

Figure 6-1 shows a schematic example of trips, legs of trips, and a tour for a smartcard,

where the variables and indices refer to:

n = trip number (the first trip is n = 1)

l = leg-of-trip number

O_n = origin of trip n

D_n = destination of trip n

a_(n,l) = alighting location for trip n, leg of trip l

b_(n,l) = boarding location for trip n, leg of trip l

d = walking distance between stops

→ = direction of travel and sequence of stops for a bus run

Figure 6-1 Schematic example of transactions for a smartcard user

From Figure 6-1 one can infer the data needed to estimate the alighting location for

transactions: the boarding location for the transactions; whether they are trips or legs of

trips; the direction and stop sequence for the routes that correspond to the transactions; and the

road and sidewalk network and the geographic location of the stops, to obtain the walking

distance between alighting and boarding stops. Additionally, the time of alighting can be

retrieved from the bus route itineraries. For technical details of the algorithm, refer to

Appendix C.

The method is an algorithm that integrates and organizes these data sources for the

transactions of each STM card. For a card’s transaction, the algorithm analyzes which of

the subsequent stops of the bus route is closest to the next transaction’s boarding stop. The

closest stop is estimated as the alighting stop. For the last transaction of the day, the

algorithm considers the first boarding stop of the day to estimate the alighting stop for this

last transaction. When the alighting stop is estimated the algorithm retrieves the time of

arrival of the bus at this stop from the itinerary.

The algorithm estimates the alighting location based on the following considerations:

▪ The alighting location must be different from the boarding stop and must be a stop

subsequent to the boarding stop.

▪ The maximum walking distance allowed is 1,000 metres. This is calculated using

the network characteristics and the ArcGIS Network Analyst tool.
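
The core of this rule can be sketched as follows. The example is a simplification: it uses the straight-line (haversine) distance as a stand-in for the network walking distance obtained with ArcGIS Network Analyst, and the function names, stop identifiers, and coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))


def estimate_alighting(route_stops, boarding_ordinal, next_boarding, max_walk_m=1000.0):
    """Return (stop_id, distance) of the downstream stop closest to the next boarding.

    route_stops: list of (ordinal, stop_id, (lat, lon)) in travel order.
    Returns None when no downstream stop is within the walking range.
    """
    best = None
    for ordinal, stop_id, location in route_stops:
        if ordinal <= boarding_ordinal:
            continue  # the alighting stop must come after the boarding stop
        distance = haversine_m(location, next_boarding)
        if distance <= max_walk_m and (best is None or distance < best[1]):
            best = (stop_id, distance)
    return best


# Hypothetical route with four stops roughly 450 m apart.
route = [
    (1, "A", (-34.90, -56.200)),
    (2, "B", (-34.90, -56.195)),
    (3, "C", (-34.90, -56.190)),
    (4, "D", (-34.90, -56.185)),
]
estimate = estimate_alighting(route, 1, (-34.90, -56.1895))
```

For the last transaction of the day, the same function would be called with the location of the first boarding stop of the day as the next boarding.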

After all the transactions of an STM card are processed, the algorithm identifies the origins,

destinations, and transfer locations for the trips as well as the travel and transfer times. To

identify the origins, destinations, and transfer locations, the algorithm takes into

consideration the trip ordinal and the trip ID fields of each transaction, but does not rely solely

on these, as passengers can pay one fare and make more than one trip. Many transit

systems allow passengers to pay one fare and use the transit system within a period of time

(Hickman, 2017), and in Montevideo, passengers can choose between a 1-hour and a 2-hour

fare to use the system.

The system does not distinguish between a trip and two legs of a trip if they occur within the

period of the chosen fare type. However, with this algorithm it is possible to capture some of the trips

that are made by a passenger on one fare. A transaction is considered the start of a new trip if either of the following holds:

▪ The passenger transaction is on the same line as the previous transaction. Taking the

same line in the same direction indicates that the passenger had two destinations on

the same path; taking the same line in the opposite direction indicates that the

passenger went to a destination and returned.

▪ The transfer time lasts more than 30 minutes.
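
These two considerations can be sketched as a simple pass over the chronologically ordered transactions paid with one fare. The field names (line, board_min, alight_min) and the function name are hypothetical, times are in minutes since midnight, and the trip ordinal and trip ID fields that the algorithm also uses are left out of this simplified example.

```python
def split_into_trips(legs, max_transfer_min=30):
    """Group chronologically ordered legs into trips.

    A leg starts a new trip when it is on the same line as the previous leg
    or when the transfer time exceeds max_transfer_min.
    """
    trips = []
    for leg in legs:
        if trips:
            previous = trips[-1][-1]
            same_line = leg["line"] == previous["line"]
            long_transfer = leg["board_min"] - previous["alight_min"] > max_transfer_min
            if not (same_line or long_transfer):
                trips[-1].append(leg)  # a regular transfer: same trip
                continue
        trips.append([leg])
    return trips


# Hypothetical transactions of one smartcard.
legs = [
    {"line": "19", "board_min": 480, "alight_min": 500},
    {"line": "21", "board_min": 505, "alight_min": 530},  # 5 min transfer, other line
    {"line": "21", "board_min": 600, "alight_min": 625},  # same line: new trip
    {"line": "40", "board_min": 700, "alight_min": 720},  # 75 min gap: new trip
]
trips = split_into_trips(legs)
legs_per_trip = [len(trip) for trip in trips]
```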

Having identified trips and legs of trips, and therefore the locations of boardings, alightings,

transfers, origins, and destinations, it is possible to compute travel behaviour

metrics. The on-board time can be computed for each transaction and the travel time for every trip;

transfer times are computed for trips with more than one leg.

6.1.1 Incorporation of smartcard transactions without alighting location identified and single riders

The OD estimation for smartcard users requires users to have more than one transaction per

day. For some users, the alighting locations for their transactions could be estimated for one

day but not for other(s). Other smartcard holders make single transactions on certain days

and several transactions on other days, to which the OD algorithm can be applied. These

two types of users are the focus of this section.

The results from the OD method for the entire week can be used to assign alighting

locations to the transactions for which alighting could not be estimated or which were

single daily transactions. This is done by observing the transactions for each smartcard and

identifying similar transactions on other days for which alightings could be estimated. To

assess whether transactions are similar, the criteria in Table 6-1 for spatial-temporal windows are

proposed.

Table 6-1 Criteria for OD incorporation of Smartcard users

Condition Value

Temporal Window 1 hour

Spatial Window 1 kilometre

Bus lines Same line

There is no consensus in the literature for determining temporal and spatial windows for

smartcard regularity, as mentioned in section 2.3. The proposed spatial and temporal

windows are similar to the wider ones used in the literature. The spatial window accounts

for different boarding locations within walkable distance and the temporal window relaxes

the timing requirements, allowing different start times of trips. The condition that a

passenger boards the same line excludes trips on other lines that are similar or that

travel between similar areas in the city, but ensures that the trip direction is the same.

It is important to note that the alighting locations assigned using this method might vary

between days. To account for this, the most likely alighting location is selected as the one with

the highest frequency. Weekdays and the weekend are considered separately due to the

differences in passenger transactions and expected travel behaviour.
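
The matching and the selection of the most frequent alighting location can be sketched as follows. The example assumes that the temporal window is applied as a difference of at most 60 minutes in boarding time and the spatial window as a straight-line distance of at most 1,000 metres between boarding locations in projected coordinates; the record fields and the function name are hypothetical.

```python
from collections import Counter


def assign_alighting(target, history, max_gap_min=60, max_dist_m=1000.0):
    """Return the most frequent alighting stop among similar transactions, or None.

    target: dict with 'line', 'board_min' and 'board_xy' (projected metres).
    history: transactions of the same card on other days, also with 'alight_stop'.
    """
    votes = Counter()
    for record in history:
        if record["line"] != target["line"]:
            continue  # the bus line must be the same
        if abs(record["board_min"] - target["board_min"]) > max_gap_min:
            continue  # outside the temporal window
        dx = record["board_xy"][0] - target["board_xy"][0]
        dy = record["board_xy"][1] - target["board_xy"][1]
        if (dx * dx + dy * dy) ** 0.5 > max_dist_m:
            continue  # outside the spatial window
        votes[record["alight_stop"]] += 1
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical transaction without an alighting location and the card's other days.
target = {"line": "19", "board_min": 480, "board_xy": (0.0, 0.0)}
history = [
    {"line": "19", "board_min": 470, "board_xy": (100.0, 50.0), "alight_stop": 3733},
    {"line": "19", "board_min": 495, "board_xy": (0.0, 200.0), "alight_stop": 3733},
    {"line": "19", "board_min": 485, "board_xy": (50.0, 0.0), "alight_stop": 3734},
    {"line": "21", "board_min": 480, "board_xy": (0.0, 0.0), "alight_stop": 9999},
    {"line": "19", "board_min": 700, "board_xy": (0.0, 0.0), "alight_stop": 8888},
]
assigned = assign_alighting(target, history)
```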

6.1.2 Incorporation of no-card users

The no-card users have not been considered in OD strategies as their transactions cannot

be identified throughout the day or on other days. The transactions made without a card are

individually stored on the system with distinct ticket IDs. As mentioned in section 3.1.1,

these users have different characteristics from smartcard users: they pay higher fares and are unable

to make transfers.

These passengers have not been integrated in ODs in the literature as their behaviour could

be different from that of smartcard users (Schmöcker et al., 2017). However, these

transactions constitute a significant share of records and this study explores the integration

of these users into the OD methods.

The no-card transactions are integrated by using the travel patterns of smartcard users. The

travel patterns of smartcard users are analyzed by branch and time of the day (AM, Midday,

PM, and Overnight) for the weekdays and weekend separately. For each bus branch, the

stops are classified and assigned weights corresponding to the volumes of alightings. The

no-card passengers are assigned an alighting stop based on the weights for each stop.

For instance, for a bus branch in the AM period 21% of smartcard passengers alight at a

given stop in downtown. Thus 21% of the no-card passengers are assigned that stop as their

alighting location. The alighting volumes are rounded to the nearest whole number and then

balanced.
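
As an illustration of this assignment, the sketch below distributes the no-card passengers of a branch and time period in proportion to the smartcard alightings at each stop. The balancing rule is an assumption: volumes are rounded down and the remaining passengers are given to the stops with the largest fractional parts, so that the total is preserved. The function name and the sample shares are hypothetical.

```python
from math import floor


def allocate_nocard(n_nocard, smartcard_alightings):
    """Distribute n_nocard passengers across stops in proportion to smartcard alightings.

    smartcard_alightings: dict of stop_id -> number of smartcard alightings.
    """
    total = sum(smartcard_alightings.values())
    if total == 0:
        return {}  # no smartcard pattern is available for this branch and period
    exact = {stop: n_nocard * count / total for stop, count in smartcard_alightings.items()}
    allocation = {stop: floor(value) for stop, value in exact.items()}
    leftover = n_nocard - sum(allocation.values())
    # Assumed balancing rule: the largest fractional parts receive the leftover passengers.
    by_remainder = sorted(exact, key=lambda stop: exact[stop] - allocation[stop], reverse=True)
    for stop in by_remainder[:leftover]:
        allocation[stop] += 1
    return allocation


# Hypothetical branch: 21%, 46% and 33% of smartcard alightings at three stops.
weights = {"A": 21, "B": 46, "C": 33}
nocard_alightings = allocate_nocard(10, weights)
```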

This approach has a deterministic nature and assumes similar behaviour between smartcard

and no-card passengers; however, it takes into consideration the weekly behaviour of

smartcard users and identifies the stops with high alighting volumes, which are likely to be

trip attractors for all passengers.

6.2 Results

The boardings and alightings of smartcard and no-card transactions and the origins and

destinations (OD) of smartcard users are the focus of this section. Due to the differences in

the methodologies used for the STM and no-card users, the results are presented in two

subsections. However, the results for both user groups are compared spatially in a third subsection.

6.2.1 Analysis for smartcard users

The algorithm to estimate alighting locations and ODs for smartcard users is implemented

for all weekdays and the weekend. The average alighting estimation rate for weekdays is 87.7%,

with a lower rate for the transactions on Friday. The rate for the weekend is significantly

lower, and this can be attributed to the different and irregular travel behaviour expected on

weekends. The results and statistics for each day are included in Table 6-2.

The on-board and travel time are computed using the boarding times of the transactions and

the alighting times extracted from the itinerary. Around 0.5% of alighting times could not

be retrieved due to bus runs with all passengers boarding at a unique stop or bus runs where

all stops had high dwell times. The on-board and travel times are very similar for all

weekdays and weekend and the walking distance between alightings and subsequent

boarding locations is longer for weekdays than weekends.

Table 6-2 OD estimation results

Result indicator | Monday, August 15 | Tuesday, August 16 | Wednesday, August 17 | Thursday, August 18 | Friday, August 19 | Saturday, August 20 | Sunday, August 21

Original transactions | 441,751 | 440,622 | 446,587 | 445,577 | 434,019 | 226,341 | 142,318

Original cards | 155,693 | 155,310 | 157,251 | 156,951 | 152,870 | 83,660 | 53,728

Transactions with alighting location identified | 387,940 (87.8%) | 387,764 (88.0%) | 392,321 (87.8%) | 391,171 (87.8%) | 377,384 (87.0%) | 192,094 (84.9%) | 119,153 (83.7%)

Average walking distance between alighting and next boarding | 176.85 m | 177.81 m | 179.08 m | 179.31 m | 180.92 m | 171.50 m | 171.72 m

Average walking distance disregarding 0 metre distances | 209.86 m | 210.52 m | 211.82 m | 212.21 m | 214.40 m | 201.28 m | 201.71 m

Transactions with complete trip chains | 304,397 (68.9%) | 305,325 (69.3%) | 307,351 (68.8%) | 306,251 (68.7%) | 288,064 (66.4%) | 142,580 (63.0%) | 87,824 (61.7%)

Cards with complete trip chains | 105,655 (67.9%) | 106,065 (68.3%) | 106,673 (67.8%) | 106,220 (67.7%) | 100,375 (65.7%) | 51,875 (62.0%) | 32,171 (59.9%)

Average on-board time for trip chains | 18.7 min | 18.7 min | 18.8 min | 18.8 min | 18.6 min | 17.2 min | 18.4 min

Average trip time for trip chains | 30.4 min | 30.1 min | 30.1 min | 30.0 min | 30.1 min | 29.8 min | 29.9 min

The alighting location estimation results can be analyzed for different times of the day, bus

runs, and card holder types. The results for Monday, August 15 are analyzed based on these

three categories as follows.

First, Table 6-3 shows the estimation rate for the four time periods. The transactions in the PM

and Overnight periods have a lower alighting estimation success rate. This could be due to

some passengers not returning to the origin location of their first trip of the day and to

passengers having unusual travel behaviour in the overnight hours, which

Trepanier et al. (2007) identify as a reason for a low success rate.

Table 6-3 Alighting estimation rate based on time period

Time period | Alighting location estimated | No estimation

AM (4 a.m. to 11 a.m.) | 115,313 (88.94%) | 14,337 (11.06%)

Midday (11 a.m. to 3:30 p.m.) | 123,827 (89.15%) | 15,077 (10.85%)

PM (3:30 p.m. to 10 p.m.) | 136,276 (86.22%) | 21,786 (13.79%)

Overnight (10 p.m. to 4 a.m.) | 12,524 (82.75%) | 2,611 (17.25%)

Second, the transactions are characterized based on the type of bus runs where they occur

and the results are included in Table 6-4. There is a small percentage of transactions that

occur on validated runs, but they have a lower alighting estimation rate. Recall that the validated

runs are assigned a valid run based on where passengers board; however, these runs could

take a different direction or cover additional stops for passengers to alight. Therefore, a

lower alighting estimation rate is reasonable for validated runs.

Table 6-4 Alighting estimation rate based on bus run

Transaction type | Transactions | Alighting location estimated

Normal (no need to validate runs) | 403,637 (91.4%) | 355,385 (88.0%)

Corrected (with validated runs) | 38,114 (8.6%) | 32,555 (85.4%)

Third, the transactions are categorized based on the STM card holder types. Instead of

analyzing the alighting estimation rate, the trip chains are compared to identify the users

with traceable trips. The trip chains are compared to the boardings per card type (Table 3-2)

and shown in Table 6-5.

The share of boardings per card type differs from the share for which the trip chains can be

estimated. The standard users represent 45.8% of boardings, but 39.4% of the

complete trip chains. In contrast, for students (particularly Student A and

Student Free) and for retired cardholders, the trip chain share is 1 to 3 percentage points higher than

their share of boardings. These differences indicate more traceable travel patterns

for students and retired users, who make all legs and trips of their daily travel by transit,

and less traceable patterns for standard users, which suggests that these users are more likely

to use multiple modes (e.g. car, taxi, car-pooling) in their daily travel.

Table 6-5 Trip chains based on STM card type

STM card type Boardings Complete trip chains

Standard 397,034 (45.80%) 120,043 (39.40%)

Student A 170,134 (19.60%) 65,090 (21.40%)

Student B 21,448 (2.50%) 7,786 (2.60%)

Student Free 142,712 (16.5%) 59,041 (19.4%)

Retired A 44,317 (5.10%) 19,355 (6.40%)

Retired B 16,235 (1.90%) 7,305 (2.40%)

Social Work 29,330 (3.40%) 10,263 (3.40%)

Prepaid 23,608 (2.70%) 9,078 (3.00%)

Others 21,651 (2.50%) 6,436 (2.10%)

The next step is to integrate the smartcard users with transactions for which the alighting

location could not be identified and/or who have single transactions on other day(s). This is

done by observing the transactions for each smartcard and identifying similar transactions

on other days for which the alighting stops were estimated.

First, the algorithm analyzes the output of the OD estimation and identifies the smartcard

users with transactions for which the alighting stop could not be estimated. The alighting

location for these transactions is assigned as the alighting location of similar transactions of those

users during the other weekdays.

The alighting locations that could be assigned represent an average of 13.3% of the

transactions with unknown alightings for the weekdays, but they add only 1.64% to the

total transactions (Original transactions in Table 6-2). The incorporation for weekends is

significantly lower and the results are shown in Table 6-6.

Table 6-6 Assignment of alighting location to transactions with missing alighting location

Day | Transactions with assigned alighting location | Percentage from the transactions without estimated alighting | Percentage from the original transactions

Monday | 7,282 | 13.53% | 1.65%

Tuesday | 7,056 | 13.35% | 1.60%

Wednesday | 7,378 | 13.60% | 1.65%

Thursday | 7,366 | 13.54% | 1.65%

Friday | 7,080 | 12.50% | 1.63%

Saturday | 341 | 1.00% | 0.15%

Sunday | 387 | 1.67% | 0.27%

Second, following the same approach, the

algorithm identifies smartcard users that have a single transaction on at least one day. Their

transactions on other days are analyzed and compared in terms of spatial and temporal

similarity. If the transactions are similar, the alighting locations are assigned to the single

transactions.

For the single STM users, Table 6-7 shows the single riders and those with an assigned

alighting location. The percentages of estimated alighting are very similar for all weekdays

and significantly lower for weekends.

Table 6-7 Assignment of alighting location to single riders

Day | Single riders | Estimated alighting

Monday | 37,925 | 7,456 (19.6%)

Tuesday | 36,778 | 7,547 (20.5%)

Wednesday | 37,101 | 7,602 (20.5%)

Thursday | 37,701 | 7,500 (19.9%)

Friday | 38,942 | 7,516 (19.3%)

Saturday | 29,935 | 366 (1.2%)

Sunday | 23,362 | 529 (2.3%)

Figure 6-2 shows the number of STM cards from Monday (with unknown alighting

location) that have similar boardings on the other weekdays. The cards that have transactions with similar boardings on more than one day are assigned the alighting location that occurs most frequently among them.

Figure 6-2 STM cards with similar transactions on the other weekdays for

transactions with unknown alighting location (left) and single riders (right)
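The assignment rule described above can be sketched as follows. This is a minimal illustration rather than the script used in the study: the `Transaction` fields, the projected coordinates, and the default window sizes (`max_dist_m`, `max_gap_min`) are assumptions made for the example. Transactions are treated as similar when they share a bus line and fall within a spatial and a temporal window, and the most frequent alighting stop among the similar transactions on other days is assigned.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Transaction:
    day: str
    bus_line: int
    boarding_xy: Tuple[float, float]  # projected stop coordinates, in metres (assumed)
    boarding_minute: int              # minutes after midnight
    alighting_stop: Optional[int]     # None when no alighting was estimated


def is_similar(a: Transaction, b: Transaction,
               max_dist_m: float = 1000.0, max_gap_min: int = 60) -> bool:
    """Same bus line, boardings close in space and in time (window sizes are hypothetical)."""
    same_line = a.bus_line == b.bus_line
    close = math.dist(a.boarding_xy, b.boarding_xy) <= max_dist_m
    near_in_time = abs(a.boarding_minute - b.boarding_minute) <= max_gap_min
    return same_line and close and near_in_time


def infer_alighting(target: Transaction, history: Iterable[Transaction]) -> Optional[int]:
    """Return the most frequent alighting stop among similar transactions on other days."""
    candidates = [
        t.alighting_stop
        for t in history
        if t.day != target.day and t.alighting_stop is not None and is_similar(target, t)
    ]
    if not candidates:
        return None  # the alighting stays unknown
    return Counter(candidates).most_common(1)[0][0]
```

A card with an unknown alighting on Monday and similar boardings on three other weekdays would receive the stop that appears most often among those three days.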

6.2.2 Analysis for no-card users

The no-card transactions are integrated into this study by using the travel patterns of

smartcard users. The alightings are analyzed for each bus branch during the week for the

four time periods; the percentages of alighting at each stop are computed and assigned to

the no-card transactions.

Figure 6-3 shows an example for branch number 205 during the AM period, with the boardings and alightings of STM and no-card passengers. The alightings for smartcard users are estimated using the OD method and the incorporation methods discussed in section 6.2.1. The alightings for no-card users, on the other hand, are determined using the weekly percentage of alightings for branch 205 in the AM period. Note the residual passengers (STM users) at the stop “Unknown”; their alighting location could not be estimated.
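The proportional assignment for no-card users can be illustrated with a short sketch. It assumes that the share of alightings at each stop is computed from the smartcard alightings with a known location and then applied to the no-card total; the function name and the toy numbers are hypothetical, and the rounding or load constraints used in the study are not reproduced here.

```python
from typing import Dict


def allocate_no_card_alightings(smartcard_alightings: Dict[str, int],
                                no_card_total: int) -> Dict[str, float]:
    """Distribute no-card passengers over stops using the smartcard alighting shares."""
    known_total = sum(smartcard_alightings.values())
    if known_total == 0:
        # No smartcard alightings to learn from: nothing can be allocated.
        return {stop: 0.0 for stop in smartcard_alightings}
    return {
        stop: no_card_total * count / known_total
        for stop, count in smartcard_alightings.items()
    }


# Toy example (not taken from Figure 6-3): stop B receives three quarters of the alightings.
shares = allocate_no_card_alightings({"A": 1, "B": 3}, no_card_total=8)
```

The result is an expected, possibly fractional, number of alightings per stop; converting it to whole passengers would require an additional rounding rule.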


The incorporation of no-card users also makes it possible to compute bus load profiles. Figure 6-4 shows the load profile of one of the morning bus runs for branch number 205. Load profiles can be studied for individual buses, and loads can be analyzed for different bus lines, time periods, and corridors.
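A load profile follows directly from the boardings and alightings at each stop: the load after a stop is the running sum of boardings minus alightings. The sketch below assumes that the two lists are ordered by stop ordinal and cover the same stops; the function name is illustrative.

```python
from itertools import accumulate
from typing import List, Sequence


def load_profile(boardings: Sequence[int], alightings: Sequence[int]) -> List[int]:
    """Passengers on board after each stop, given per-stop boardings and alightings."""
    if len(boardings) != len(alightings):
        raise ValueError("boardings and alightings must cover the same stops")
    net_change = [b - a for b, a in zip(boardings, alightings)]
    return list(accumulate(net_change))


# Three stops: 5 board, then 3 board and 2 alight, then 6 alight.
profile = load_profile([5, 3, 0], [0, 2, 6])
```

Passengers whose alighting stop is unknown never leave the bus in such a profile, which is why the residual “Unknown” passengers need to be handled separately.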


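Figure 6-3 Alighting location assignment to no-card passengers

Bus line 19, Branch 205. Total smartcard AM passengers: 1127. Total no-card AM passengers: 760.

| Stop | Stop ordinal | Smartcard boardings | Smartcard alightings | Smartcard share of alightings | No-card boardings | No-card alightings |
|---|---|---|---|---|---|---|
| 3079 | 1 | 1 | | | 2 | 0 |
| 3017 | 2 | 30 | | | 21 | 0 |
| 3019 | 3 | 8 | 1 | 0.10% | 2 | 0 |
| 3020 | 4 | 21 | | | 15 | 0 |
| 3021 | 5 | 21 | 1 | 0.10% | 4 | 1 |
| 3022 | 6 | 5 | 1 | 0.10% | 4 | 1 |
| 3023 | 7 | 43 | 2 | 0.19% | 55 | 1 |
| 3024 | 8 | 8 | 2 | 0.19% | 6 | 1 |
| 3025 | 9 | 14 | 1 | 0.10% | 15 | 1 |
| 3026 | 10 | 20 | 7 | 0.68% | 12 | 5 |
| 3027 | 11 | 14 | 1 | 0.10% | 7 | 1 |
| 3028 | 12 | 33 | 1 | 0.10% | 6 | 1 |
| 3029 | 13 | 12 | 10 | 0.97% | 7 | 7 |
| 3524 | 14 | 59 | 86 | 8.36% | 36 | 65 |
| 3493 | 15 | 1 | 1 | 0.10% | 2 | 1 |
| … | … | … | … | … | … | … |
| 565 | 65 | 2 | 9 | 0.87% | 3 | 7 |
| 4578 | 66 | | 5 | 0.49% | 0 | 4 |
| 566 | 67 | | 24 | 2.33% | 1 | 18 |
| 3924 | 68 | | 3 | 0.29% | 4 | 2 |
| 570 | 69 | | 1 | 0.10% | 0 | 1 |
| 3925 | 70 | 1 | 9 | 0.87% | 1 | 7 |
| 4580 | 71 | 16 | 21 | 2.04% | 4 | 16 |
| 4615 | 72 | | 10 | 0.97% | 0 | 7 |
| 4909 | 73 | | 18 | 1.75% | 0 | 13 |
| 4040 | 74 | 18 | 13 | 1.26% | 15 | 10 |
| 4041 | 75 | 1 | 57 | 5.54% | 3 | 42 |
| 4763 | 76 | | 4 | 0.39% | 0 | 3 |
| 4764 | 77 | | 18 | 1.75% | 0 | 13 |
| 4765 | 78 | | 2 | 0.19% | 0 | 1 |
| 5086 | 79 | | 10 | 0.97% | 0 | 7 |
| 4767 | 81 | | 15 | 1.46% | 0 | 11 |
| Unknown | 99 | | | | | |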

Figure 6-4 Bus loading profile for all passengers

[Figure: boardings, alightings, and load at each stop of the bus run. Horizontal axis: Stops. Vertical axes: Passengers on-board; Passenger boardings and alightings.]

6.2.3 Spatial analysis of travel behaviour

The origins, destinations, and transfers from STM and no-card transactions can be

visualized at any level of spatiotemporal aggregation. In this study, the transactions are

aggregated per census segment and transfers are analyzed at the disaggregate stop level.

The following maps, produced with ArcMap 10.2.2, depict the smartcard and no-card travel behaviour on Monday. The maps contain an inset map for the downtown area and are followed by a short description of the observed travel behaviour and transfer locations.

▪ Figure 6-5 AM Trip origins

▪ Figure 6-6 AM Trip destinations

▪ Figure 6-7 AM boardings for no-card users

▪ Figure 6-8 AM alightings for no-card users

▪ Figure 6-9 PM Trip origins

▪ Figure 6-10 PM Trip destinations

▪ Figure 6-11 AM Transfers

▪ Figure 6-12 PM Transfers

Figure 6-5 AM Trip origins

Figure 6-6 AM Trip destinations

Figure 6-7 AM boardings for no-card users

Figure 6-8 AM alightings for no-card users

Figure 6-9 PM Trip origins

Figure 6-10 PM Trip destinations

Figure 6-11 AM Transfers

Figure 6-12 PM Transfers

The origins of trips in the AM period are located around the urban periphery and in the highly urbanized areas in the northeast and northwest of Montevideo, as shown in Figure 6-5. Many trips also originate downtown. Figure 6-6 shows that the destinations of these trips are concentrated in a few census segments, particularly in or close to downtown. There are also some clusters of census segments in the east and northeast of the city with moderate volumes of trip destinations.

Note that the trip destinations exceed the trip origins downtown. The destination volumes downtown are high, between 145 and 900 person-trips per census segment, while the trip origin volumes are between 145 and 264 person-trips.

The boardings and alightings of no-card users in the AM, shown in Figure 6-7 and Figure 6-8, are similar to the trip origins and destinations of smartcard users. The boardings occur across the city and suburban areas, while the alightings occur mainly in the downtown area. There are also a few census segments in the northeast with numerous alightings.

The origins of trips in the PM period, shown in Figure 6-9, are located mainly downtown, with several high-volume segments in the west and a single one on the east side of the city. This is interesting because the area in the west is rural. An inspection using Google Maps reveals that there are multiple hotels, industrial parks, sports complexes, and farms in the west segments and that the airport is in the east segment. The location of these places explains the high volumes of trip origins, as employees return home from work. Conversely, Figure 6-10 depicts the destinations of the PM trips, which are distributed all over the city, similarly to the AM trip origins.

The transfers during both the AM and PM periods, shown in Figure 6-11 and Figure 6-12, occur at specific locations: along major roads, in the downtown area, at terminals, and at major stops. The roads with the most transfers run from downtown to the west and to the northeast. As expected, there are many transfers at the terminals, identified on the maps as yellow triangles. Additionally, there are a few stops with high transfer volumes in the periphery.

The AM and PM origins and destinations are similar to those in A. A. Alsger et al. (2015). The morning origins and evening destinations are spread throughout the city, and the morning destinations coincide with the evening origins. The latter pinpoint hot spots in the Central Business District (CBD) and a few hot spots in specific areas of the city.


Analysis of Travel Survey Riders and Smartcard Users

In this chapter, the transit riders from the MHMS are compared with the STM card users. This comparison is done at an aggregate level and then at a disaggregate level to identify the individuals from the survey with STM cards. The sections in this chapter describe the method and present the results.

7.1 Method

The OD estimation results for smartcard users, particularly the trip chains, are of interest because they can be joined with the transit trips in the MHMS. Even though these two datasets are different, they are compared at an aggregate level, based on metrics such as legs and trips per person, and at a disaggregate level, by pairing the survey individuals with smartcard users using the trips’ locations and times.

The comparison at the disaggregate level takes into consideration the differences between the datasets. For the MHMS, the locations are provided at the census segment level, and the boarding and alighting times are not provided but can be calculated. The boarding time is computed as the start time of the trip plus the walking time from the origin of the trip to the bus stop (derived from the reported walking distance) and the wait time. The alighting time is similarly computed by subtracting the walking time from the reported end of the trip. For the estimated smartcard trip chains, the boarding and alighting locations are provided at the stop level and the times to the closest second.

Moreover, the MHMS data is spatially assigned to the closest census segment, and individuals report the start and end times of their trips to the closest 5- or 10-minute mark. In a similar study, Spurr et al. (2015) noted that survey respondents tend to approximate times to the nearest half or quarter hour. This approximation of time and distance induces bias when joining the datasets.

To account for these differences in spatial and temporal precision, this study uses spatial and temporal windows. The spatial window considers neighbouring census segments, as the census segment is the smallest level of aggregation. This window accounts for cases such as a stop located at a corner of one segment that is shared with three other segments, which could be reported in any of them. The temporal window is adjusted based on the matching rate. The matching process verifies that the matching smartcard has the lowest spatial and temporal differences to the MHMS individual, similarly to the method proposed by Spurr et al. (2015). In their study, the precision of the spatial and temporal windows is adjusted if there are many smartcard matches for one individual.

The method is applied for each weekday of the available data (August 15-August 19) as the

MHMS data was collected on days between August and October of 2016, but the specific dates

are unknown.
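The window-based pairing can be sketched as follows. This is a simplified illustration that considers only one boarding per person, whereas the study also uses alightings; the `Boarding` record, the `neighbours` lookup, and the 30-minute default are assumptions made for the example. A card is a candidate when its boarding lies in the same or a neighbouring census segment and within the time window, and the candidate with the lowest spatial and temporal difference is kept.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass(frozen=True)
class Boarding:
    owner_id: str   # survey person ID or smartcard ID
    segment: int    # census segment of the boarding
    minute: int     # boarding time, in minutes after midnight


def match_card(survey: Boarding, cards: Iterable[Boarding],
               neighbours: Dict[int, Set[int]], window_min: int = 30) -> Optional[str]:
    """Return the ID of the best-matching card, or None if no card is inside the windows."""
    best_key, best_id = None, None
    for card in cards:
        if card.segment == survey.segment:
            spatial_rank = 0                      # same census segment
        elif card.segment in neighbours.get(survey.segment, set()):
            spatial_rank = 1                      # neighbouring census segment
        else:
            continue                              # outside the spatial window
        gap = abs(card.minute - survey.minute)
        if gap > window_min:
            continue                              # outside the temporal window
        key = (spatial_rank, gap)                 # smaller is better
        if best_key is None or key < best_key:
            best_key, best_id = key, card.owner_id
    return best_id
```

Ranking the spatial difference before the temporal one is a design choice of this sketch; the source only states that the match should have the lowest spatial and temporal differences.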

7.2 Results

This section has two subsections: the first compares the MHMS with the smartcard data at an aggregate level, and the second describes the results of pairing individuals who have transit trips in the MHMS with smartcards that have complete trip chains (recall the trip data collected by the survey shown in Figure 3-2).

7.2.1 Comparison of MHMS with smartcard data

The report by CAF et al. (2017) provides preliminary analyses of the data collected. While most of the analyses of the STM cover mode share, opinions about the system, and user characteristics, the report also includes a histogram of the travel time distribution.

Figure 7-1 compares the histograms of the travel time distributions of bus trips to verify that the queried data is representative. In addition, the average trip time for bus users in CAF et al. (2017) is 46 minutes, and the average trip time in the queried data is 46.2 minutes.


Figure 7-1 Histograms of bus trips. Queried data (left) and using all MHMS data retrieved

from CAF et al. (2017) (right)

It is also necessary to compare the queried data to the smartcard data. The data is compared in Table 7-1 and graphically in Figure 7-2. For the smartcard data, the bus trips per person are computed using the single cards from Query #2 and the trip chains, and the legs per trip using the trip chains. Note that the trip chains correspond to 67.9% of STM users; thus, 67.9% of the results from Query #2 are used.

Table 7-1 Comparison of legs and trips for MHMS and STM data

| Comparison condition | | MHMS | STM data | Notes |
|---|---|---|---|---|
| Bus trips per person (percentage) | 1 | 15.09 | 21.23 | Using 67.9% of single cards (Query #2) |
| | 2 | 66.53 | 60.51 | |
| | 3 | 10.53 | 11.72 | |
| | 4 | 6.85 | 5.61 | |
| | >4 | 0.99 | 0.92 | |
| | Average | 2.13 | 2.04 | |
| | Total | 2,150 | 248,605 | 222,816 correspond to trip chains |
| Legs per trip (percentage) | 1 | 79.78 | 71.50 | |
| | 2 | 17.73 | 26.85 | |
| | 3 | 2.33 | 1.33 | |
| | 4 | 0.15 | 0.30 | |
| | >4 | 0 | 0.01 | |
| | Average | 1.25 | 1.30 | |
| | Total | 2,572 | 304,397 | |

Figure 7-2 Histogram of legs and trips for MHMS and STM data

The average trips per person and legs per trip are similar for the queried data and the smartcard users. There are notable differences in the one- and two-trip and the one- and two-leg categories: the share of single trips in the smartcard data is higher than in the survey, but the share of two trips is lower. Conversely, the one-legged trips represent a higher percentage in the survey, but the two-legged ones a lower one. These differences range from 6 to 9 percentage points, but interestingly they are not as large as those in Spurr et al. (2015), who compared travel survey responses with smartcard data in Montreal.

An interesting piece of information collected by the MHMS is the trip frequency per week (see footnote 6). The reported frequency is compared in Figure 7-3 with that of the STM cards that have trip chains on at least one weekday. The frequency reported in the survey is considerably different from the frequency observed for STM users. MHMS individuals reported that over half of their trips are made only once per week, but only 39% of STM cards do not have similar trip chains on the other weekdays. Moreover, the reported trips with a 5-day frequency have a higher percentage than those in the STM data. Note that these differences can be attributed to considering only users with complete trip chains and to variations in travel behaviour (e.g. boarding and alighting at locations more than one kilometre apart, taking different bus lines, and traveling at different times of the day).

6 Individuals are asked “How often (in days per week) do you make this same trip?” (p. 33). Translated from Spanish from Montevideo et al. (2016).

Figure 7-3 Comparison of trip frequency for MHMS and STM data

Having compared the travel behaviour of MHMS individuals with the smartcard trip chains, the next step is to identify the MHMS individuals among the smartcards. The MHMS trips are reported for the day on which the individuals are interviewed; hence the comparison to smartcards is done for the five weekdays separately. The 143 individuals who have only a one-legged trip are removed because their transactions cannot be matched with smartcards from the OD methods. The remaining 864 individuals are used for the identification.

7.2.2 Pairing MHMS individuals with STM cards

The identification of individuals in the STM dataset uses spatial and temporal windows. While

the locations of the MHMS trips are known, the boarding and alighting times are not. The start

and end times of the trips are reported with the walking distance to and from the stops, and the

waiting time before boarding. Table 7-2 includes the strategies for computing alighting and

boarding times.

Table 7-2 Strategies to compute boarding and alighting times

| Considerations | Strategy | Notes |
|---|---|---|
| Walking time (MHMS reports walking distance as number of blocks) | Assume walking speed 1.05 m/s (3.5 ft/s) | 3.5 ft/s recommended by the FHWA (2009) |
| Walking distance not reported | Assume 2 blocks | Average distance is 1.3 blocks for board and alight |
| Waiting time not reported | Assume 10 minutes | Average waiting time is 10 minutes |
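The strategies in Table 7-2 can be sketched in a few lines. The walking speed and the defaults for missing values come from the table; the block length `BLOCK_LENGTH_M` is a hypothetical value, since the source does not state the length of a block. The sketch also assumes that boarding happens after the trip start, so the walking and waiting times are added to it, while the walking time is subtracted from the trip end to obtain the alighting time.

```python
from datetime import datetime, timedelta
from typing import Optional

WALK_SPEED_M_PER_S = 1.05   # Table 7-2 (3.5 ft/s)
DEFAULT_BLOCKS = 2          # Table 7-2: used when the walking distance is not reported
DEFAULT_WAIT_MIN = 10       # Table 7-2: used when the waiting time is not reported
BLOCK_LENGTH_M = 100.0      # hypothetical block length, not given in the source


def walking_time(blocks: Optional[float]) -> timedelta:
    """Convert a walking distance in blocks into a walking time."""
    if blocks is None:
        blocks = DEFAULT_BLOCKS
    return timedelta(seconds=blocks * BLOCK_LENGTH_M / WALK_SPEED_M_PER_S)


def boarding_time(trip_start: datetime, access_blocks: Optional[float],
                  wait_min: Optional[float]) -> datetime:
    """Trip start plus the walk to the stop plus the wait at the stop."""
    wait = DEFAULT_WAIT_MIN if wait_min is None else wait_min
    return trip_start + walking_time(access_blocks) + timedelta(minutes=wait)


def alighting_time(trip_end: datetime, egress_blocks: Optional[float]) -> datetime:
    """Trip end minus the walk from the stop to the destination."""
    return trip_end - walking_time(egress_blocks)
```

With the defaults, a trip reported to start at 8:00 would have an estimated boarding a little over 13 minutes later; this figure depends directly on the assumed block length.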


The method is applied for each weekday, and two ways of identifying MHMS individuals are proposed: the first identifies individuals based on their boarding and alighting times and locations, and the second based only on their boarding times and locations. The results are shown in Table 7-3, where the column “Increase rate” shows the percentage increase in identified individuals when only the boardings are matched.

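Table 7-3 MHMS identification of individuals for different temporal windows

| Day | Time window (minutes) | Board and alight | Board only | Increase rate |
|---|---|---|---|---|
| Monday | 20 | 39 | 57 | 46.15% |
| | 30 | 60 | 72 | 20.00% |
| | 40 | 78 | 87 | 11.54% |
| | 60 | 92 | 137 | 48.91% |
| Tuesday | 20 | 28 | 41 | 46.43% |
| | 30 | 59 | 62 | 5.08% |
| | 40 | 78 | 92 | 17.95% |
| | 60 | 94 | 133 | 41.49% |
| Wednesday | 20 | 42 | 61 | 45.24% |
| | 30 | 69 | 71 | 2.90% |
| | 40 | 88 | 96 | 9.09% |
| | 60 | 107 | 146 | 36.45% |
| Thursday | 20 | 31 | 42 | 35.48% |
| | 30 | 62 | 74 | 19.35% |
| | 40 | 89 | 89 | 0.00% |
| | 60 | 106 | 130 | 22.64% |
| Friday | 20 | 30 | 43 | 43.33% |
| | 30 | 66 | 78 | 18.18% |
| | 40 | 82 | 85 | 3.66% |
| | 60 | 97 | 126 | 29.90% |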
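The “Increase rate” column and the match rates can be reproduced from the counts in Table 7-3. The sketch below assumes that the increase rate is the relative difference between the board-only and the board-and-alight counts, which agrees with the tabulated values, and that the match rate is taken over the 864 individuals used for the identification; the function names are illustrative.

```python
MHMS_INDIVIDUALS = 864  # individuals used for the identification


def increase_rate(board_and_alight: int, board_only: int) -> float:
    """Percentage increase in matches when only the boardings are compared."""
    return (board_only - board_and_alight) / board_and_alight * 100.0


def match_rate(matched: int, total: int = MHMS_INDIVIDUALS) -> float:
    """Share of survey individuals paired with a smartcard, in percent."""
    return matched / total * 100.0


# Monday, 20-minute window: 39 matches with board and alight, 57 with board only.
print(round(increase_rate(39, 57), 2))  # 46.15
```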

The number of individuals identified increases as the time window is expanded, but the match rate for the 864 individuals is low: less than 10% with the 20- and 30-minute windows, and around 15% with the 60-minute one.

The days with the most matches are Monday and Wednesday, making these the most likely days on which individuals were interviewed (assuming they were all interviewed on the same day). Another interesting observation is the decline of the increase rate at the 40-minute window for most days.

Using this time window, the MHMS individual and smartcard pairs are observed across the week to identify the number of days on which each MHMS individual is paired with the same STM card. Table 7-4 shows the results of this analysis and includes the pairs for which the reported frequency matches the observed pair frequency.

Table 7-4 Weekly analysis of MHMS and STM card pairs

| Number of days | Pairs (Board and alight) | Pairs (Board only) |
|---|---|---|
| 1 | 103 | 130 |
| 2 | 14 | 72 |
| 3 | 14 | 9 |
| 4 | 13 | 3 |
| 5 | 27 | 1 |
| Total | 171 | 215 |
| Match between reported frequency and pair | 42 | 81 |

The results for the pairs using only the boarding data capture more pairs, but 94% of these pairs are observed on only one or two days. Meanwhile, for the boarding and alighting match, 32% of the pairs are observed on more than two days. This difference is striking and suggests that considering both boarding and alighting information could help identify regular transit riders. Furthermore, the low number of pairs observed on several days when using only the boarding information can be attributed to the matching process, which pairs the STM card with the most similar transactions for each day but does not take into account the matched pairs of the other days.
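The weekly analysis amounts to counting, for each (individual, card) pair, the number of days on which it was matched. The sketch below assumes that the daily matches are available as collections of pairs keyed by day; the data layout and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]  # (MHMS person ID, STM card ID)


def days_per_pair(daily_pairs: Dict[str, Iterable[Pair]]) -> Counter:
    """Count on how many days each (person, card) pair was matched."""
    counts: Counter = Counter()
    for pairs in daily_pairs.values():
        counts.update(set(pairs))  # a pair counts at most once per day
    return counts


def pairs_by_number_of_days(counts: Counter) -> Dict[int, int]:
    """Number of pairs observed on exactly 1, 2, ... days."""
    return dict(Counter(counts.values()))
```

Because each day is matched independently, the same individual can be paired with different cards on different days; a matching step that reuses the pairs found on other days would need additional logic beyond this sketch.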


Discussion and Conclusions

The previous chapters have provided a look into the potential of analysing AFC data for planning purposes, for the analysis of the STM transit system, and for integration with travel survey data. The methods in this study can provide metrics and results for these objectives, as shown in the results and examples, but they have strengths and weaknesses that need further discussion. Moreover, some of these weaknesses arise from the collection methods and data availability and are reflected in all methods and their results. The challenges of using the STM dataset are described in the following paragraphs before diving into each method.

The AFC system of the STM collects high-quality data for passenger transactions, which include the bus runs and the boarding locations and times. The location and bus run are not usually collected by bus AFC systems, and this is an advantage reflected in all the methods, from computing dwell times of each occurrence to identifying the origins and destinations of trips. In contrast, the STM network data (bus and branches dataset) was incomplete: only 66% of the bus runs were valid or validated. This caused the removal of half of the smartcard transactions and cards and 15% of the no-card transactions from the OD procedures.

It is reasonable to believe that applying the method to a complete AFC dataset with updated network data, or having AVL data to reconstruct the trajectories of bus runs, would provide results similar to those in this study. This is because the transactions on missing bus runs are similar to those on valid runs, and the boardings on invalid runs do not have distinguishable spatial or temporal characteristics.

Having explored the challenges of the datasets, the following paragraphs present the strengths and weaknesses of each method and potential reasons for the results obtained. These are introduced by briefly describing the relevant assumptions.

In Chapter 5, the itineraries are built considering a threshold of 10 seconds for acceptable passenger service time. This threshold was determined by considering the boarding transactions only. The alightings, bus characteristics, and bus loads that are usually included in dwell time calculations were disregarded. Including these factors requires validating the OD methods and having additional information about the characteristics of the buses.

Even though the 10-second threshold has limitations, it is rigorous and captures 92% of the occurrences and 95% of passengers. The high dwell times of the remaining occurrences could be attributed to the unaccounted alighting volumes, to boardings that occur during red traffic lights and/or at stops close to schools or popular locations, or to passengers with physical disabilities.

Moreover, the bus runs with built itineraries represent 97% of all the runs. The itineraries of the remaining 3% could not be built because passengers boarded at only one stop or because all boarding occurrences had high dwell times that could not be considered. The bus runs with built itineraries are used to determine the alighting times in the OD method for smartcards, and this was successfully done for 95.5% of the transactions.

In Chapter 6, the method to estimate the alighting locations of transactions for smartcard users has assumptions similar to those of previously proposed methods (refer to section 2.1), but the estimation rates in this study are significantly higher: 88% for weekdays and 84% for weekends. This can be attributed to several reasons: the details of the transactions collected (exact stop location and bus route), passenger behaviour and transit culture, and the characteristics of the urban environment.

The passenger behaviour and transit culture are reflected in the high percentage of passengers with complete trip chains: 67.5% for weekdays and 61% for weekends. These passengers and trip chains are important for future applications that infer trip purposes and analyze user regularity, mentioned in Chapter 9.

User regularity is briefly explored in Chapter 6 to determine the alighting location for smartcards that have single transactions on one or more days and for transactions whose alighting could not be estimated using the OD method. Similar transactions are defined as those that occur on the same bus line and fall within the specified spatial and temporal windows.

A weakness of these considerations is that they do not account for bus lines and branches that travel between the same areas or along parallel streets in Montevideo. This can be a reason for the low success rates of 20% for single transactions and 13.3% for the transactions with missing alighting location. Another reason could be irregular travel behaviour, but this could only be captured by analyzing data for longer periods of time, such as weeks or months.

Chapter 6 also includes no-card riders, which had not been explored before. Researchers recommend not including no-card passengers in smartcard studies, but given that these passengers account for 32% of the transactions, they are included in this study. The strategy proposed to assign the alighting locations of these users’ transactions assumes that their travel behaviour is similar to that of smartcard users. The implications of this assumption fall outside the scope of this study, and ways to study them are included in Chapter 9.

Lastly, Chapter 7 has the most challenges of all chapters, as the STM and MHMS datasets are inherently different and the dates of collection for the MHMS are unknown (households are interviewed door to door on different days). The comparison shows that the travel patterns of STM users and MHMS individuals are similar to a certain extent.

Conversely, the pairing process between MHMS individuals and STM cards has a very low matching rate (10-15%). This low percentage was expected, as similar studies have found only 40-50% of pairs using high-quality data and even the card IDs of the households interviewed.

This study reinforces the potential of smartcard data as a powerful source of data for transit studies. The methods proposed in this study use and incorporate the available data sources, taking into consideration the data limitations. Even though the methods and their assumptions have limitations and weaknesses, the results reveal the usefulness of these methods for processing AFC data for transit planning purposes and for computing and evaluating the system operations and the transit network.

A final consideration for the methods is time efficiency. The Python scripts were run on an Intel Core i7 @ 3.40GHz machine with 16GB of RAM. The script for the OD method of smartcards with subsequent transactions had the longest processing time, 30 minutes per day of transactions. The second longest was the script for pairing the MHMS individuals with STM users, at 5 minutes per day. The rest of the scripts took less than 1 minute to process.


Limitations and Future Work

There are limitations in terms of the amount and quality of the data available, and several

improvements that can be made to enhance the analysis and processing of data, and validate the

results. Overcoming the limitations and implementing the improvements would help to obtain

more accurate and comprehensive results and a better understanding of travel behaviour. These

are mentioned in the following bullet points in order of relevance, beginning with the most

important ones.

▪ The OD method was applied to only 50% of the smartcard transactions available, mainly due to invalid bus runs (refer to sections 3.1.1 and 4.3 for details). While using the method to validate bus runs was the most effective way to incorporate more data, this highlights the need for transit agencies to update their network datasets.

▪ The weekly transactions and travel behaviour of smartcards are used to incorporate smartcards with single transactions and transactions for which the alighting could not be estimated. Using a week of data reveals some level of travel behaviour regularity for weekdays but not for weekends. Access to more data would provide better insights into travel patterns for both weekdays and weekends.

▪ The proposed methods and their results must be validated. The itineraries can be validated using the schedule data (recently standardized and digitalized) and AVL data. These sources of data can also be used to measure on-time performance and adherence to schedules. The OD methods and the incorporation of single riders can be validated using Automated Passenger Counter (APC) systems and by conducting on-board surveys or collecting the transaction receipts when passengers alight.

▪ A significant share of the transit riders do not use STM cards. This study assumed that these riders behave similarly to smartcard users; however, this assumption cannot be verified or rejected until the behaviour of these users is studied empirically. This might not be necessary if all riders are required to use smartcards, which is a recent trend adopted by transit agencies.

▪ The OD method can be improved by incorporating some of the additional recommendations in M. Munizaga et al. (2014), described in section 2.3. This will likely improve the alighting estimation and trip differentiation methods.

▪ The process of building itineraries provided a glimpse into dwell time models. A full model could be developed using the boarding and alighting volumes, such as the one proposed by Sun, Tirachini, Axhausen, Erath, & Lee (2014). These researchers model dwell time, as defined in this study, using the boardings and alightings from smartcard data.

▪ Two aspects could improve the identification of similar transactions by smartcard users. First, transactions are only considered similar if the user rides the same bus line; by considering bus lines that cover comparable paths or areas in the city, more transactions could be regarded as similar. Second, a sensitivity analysis could be done using different spatial-temporal windows.

▪ The low matching rate for identifying MHMS individuals with STM cards could be increased by collecting additional data in the survey: the date on which households are surveyed and the card IDs of individuals.

▪ This study presents some examples of the results and data analysis that can be done. The transit agency could use the results, or use the methods to produce additional results, to analyze specific bus lines, time periods, sectors of the city, and/or the travel behaviour of the different STM card types.

In addition to the future work proposed for the methods and results of this study, there are a myriad of other applications for smartcard data analysis. To mention a few: understanding travel patterns and ridership for different seasons, after changes in the network, or during disruptions; identifying locations or routes with high volumes to implement changes; analyzing transit assignment; and integrating smartcard data into agent-based transportation models. Furthermore, some of the limitations of the passively collected smartcard data can be overcome by inferring trip purpose and collecting demographic data from users, or by actively collecting this information through transit surveys.


References

Alsger, A. A., Mesbah, M., Ferreira, L., & Safi, H. (2015). Use of Smart Card Fare Data to Estimate Public Transport Origin–Destination Matrix. Transportation Research Record: Journal of the Transportation Research Board, 2535, 88–96. https://doi.org/10.3141/2535-10

Alsger, A., Assemi, B., Mesbah, M., & Ferreira, L. (2016). Validating and improving public

transport origin-destination estimation algorithm using smart card fare data. Transportation

Research Part C: Emerging Technologies, 68, 490–506.

https://doi.org/10.1016/j.trc.2016.05.004

Barry, J., Newhouser, R., Rahbee, A., & Sayeda, S. (2002). Origin and Destination Estimation in

New York City with Automated Fare System Data. Transportation Research Record:

Journal of the Transportation Research Board, 1817(2), 183–187.

https://doi.org/10.3141/1817-24

Beltrán, P., Cortés, C. E., Gschwender, A., Ibarra, R., Munizaga, M., Palma, C., … Zúñiga, M.

(2011). Obtención de información valiosa a partir de datos de Transantiago Pablo Beltrán,

Cristián E. Cortés, Antonio Gschwender, Richard Ibarra, Marcela Munizaga, Carolina

Palma, Meisy Ortega, Mauricio Zúñiga.

CAF, Montevideo, I. de, Canelones, I. de, Jose, I. de S., Publicas, M. de T. y O., Republica, U.

de la, & Uruguay, P. (2017). Principales Resultados e Indicadores - Encuesta de Movilidad

en el Área Metropolitana de Montevideo 2016. Montevideo.

Chu, K. K., & Chapleau, R. (2010). Augmenting Transit Trip Characterization and Travel

Behavior Comprehension. Transportation Research Record, 2183, 29–40.

https://doi.org/10.3141/2183-04

FHWA. (2009). Chapter 4F - MUTCD 2009 Edition - FHWA. Retrieved March 1, 2018, from

https://mutcd.fhwa.dot.gov/htm/2009/part4/part4e.htm

Fourie, P., Erath, A., Ordonez, S. A., Charikov, A., & K.W, A. (2017). Using Smartcard Data for Agent-Based Transport Simulation. In F. Kurauchi & J.-D. Schmocker (Eds.), Public Transport Planning with Smart Card Data (pp. 133–160). Boca Raton.

Gordon, J. B. (2012). Intermodal passenger flows on London’s public transport network: automated inference of full passenger journeys using fare-transaction and vehicle-location data. Retrieved from https://dspace.mit.edu/handle/1721.1/78242#files-area

He, L., & Trépanier, M. (2015). Estimating the Destination of Unlinked Trips in Public

Transportation Smart Card Fare Collection Systems. Transportation Research Board 94th

Annual Meeting. https://doi.org/10.3141/2535-11

Hemily, B. (2015). The Use of Transit ITS Data for Planning and Management, and Its Challenges; a Discussion Paper.

Hickman, M. (2017). Transit Origin-Destination Estimation. In F. Kurauchi & J.-D. Schmocker

(Eds.), Public Transport Planning with Smart Card Data (pp. 15–35). Boca Raton: CRC

Press.

INE. (2009). Encuesta Continua de Hogares (Vol. 2). Montevideo. Retrieved from

http://www.marispolymersecuador.com/DOCS/FichasTecnicas/MARISEAL 250° .pdf

Jang, W. (2010). Travel Time and Transfer Analysis Using Transit Smart Card Data.

Transportation Research Record: Journal of the Transportation Research Board, 2144,

142–149. https://doi.org/10.3141/2144-16

Kieu, L. M., Bhaskar, A., & Chung, E. (2015). Passenger segmentation using smart card data.

IEEE Transactions on Intelligent Transportation Systems, 16(3), 1537–1548.

https://doi.org/10.1109/TITS.2014.2368998

Kusakabe, T., & Asakura, Y. (2017). Combination of Smart Card Data with Person Trip Survey

Data. In F. Kurauchi & J.-D. Schmöcker (Eds.), Public Transport Planning with Smart

Card Data (pp. 73–92). Boca Raton: CRC Press.

Ma, X., Liu, C., Wen, H., Wang, Y., & Wu, Y. J. (2017). Understanding commuting patterns

using transit smart card data. Journal of Transport Geography, 58, 135–145.

https://doi.org/10.1016/j.jtrangeo.2016.12.001

Metropolitano, S. de T. (2015). ESPECIFICACIÓN OPERATIVA Y DEL SISTEMA FASE I :

TRANSPORTE COLECTIVO URBANO. Montevideo.

Miller, E. J., Parada Hernandez, C., & Habib, K. M. N. (2017). Report-02 REVIEW OF THE

MONTEVIDEO HOME MOBILITY SURVEY.

Montevideo, I. de. (2018). Sistema de Transporte Metropolitano (STM). Retrieved March 1,

2018, from http://www.montevideo.gub.uy/transito-y-transporte/el-sistema

Montevideo, I. de, Jose, I. de S., Canaria, C., MTOP, Transporte, C. M. de, CAF, & Uruguay, P.

(2016). Encuesta de movilidad metropolitana. Retrieved from

http://www.consorciozaragoza.es/content/encuesta-de-movilidad-metropolitana

Morency, C., Trépanier, M., & Agard, B. (2007). Measuring transit use variability with smart-

card data. Transport Policy, 14(3), 193–203. https://doi.org/10.1016/j.tranpol.2007.01.001

Munizaga, M. A., & Palma, C. (2012). Estimation of a disaggregate multimodal public transport

Origin-Destination matrix from passive smartcard data from Santiago, Chile.

Transportation Research Part C: Emerging Technologies, 24, 9–18.

https://doi.org/10.1016/j.trc.2012.01.007

Munizaga, M., Devillaine, F., Navarrete, C., & Silva, D. (2014). Validating travel behavior

estimated from smartcard data. Transportation Research Part C: Emerging Technologies,

44, 70–79. https://doi.org/10.1016/j.trc.2014.03.008

Nassir, N., Khani, A., Lee, S., Noh, H., & Hickman, M. (2011). Transit Stop-Level Origin-

Destination Estimation Through Use of Transit Schedule and Automated Data Collection

System. Transportation Research Record: Journal of the Transportation Research Board,

2263, 140–150. https://doi.org/10.3141/2263-16

Park, J. Y., Kim, D.-J., & Lim, Y. (2008). Use of Smart Card Data to Define Public Transit Use

in Seoul, South Korea. Transportation Research Record: Journal of the Transportation

Research Board, 2063, 3–9. https://doi.org/10.3141/2063-01

Riegel, L. (2013). Utilizing Automatically Collected Smart Card Data to Enhance Travel

Demand Surveys. MIT theses. Massachusetts Institute of Technology.

Robinson, S., Narayanan, B., Toh, N., & Pereira, F. (2014). Methods for pre-processing

smartcard data to improve data quality. Transportation Research Part C: Emerging

Technologies, 49, 43–58. https://doi.org/10.1016/j.trc.2014.10.006

Schmöcker, J.-D., Kurauchi, F., & Shimamoto, H. (2017). An Overview on Opportunities and

Challenges of Smart Card Data Analysis. In F. Kurauchi & J.-D. Schmöcker (Eds.), Public

Transport Planning with Smart Card Data (pp. 2–12). Boca Raton: CRC Press.

ISBN 978-1-4987-2659-7

Seaborn, C., Attanucci, J., & Wilson, N. H. M. (2009). Using Smart Card Fare Payment Data To

Analyze Multi-Modal Public Transport Journeys in London. Transportation Research

Record: Journal of the Transportation Research Board, 2121, 55–62.

https://doi.org/10.3141/2121-06

Spurr, T., Chu, A., Chapleau, R., & Piché, D. (2015). A smart card transaction “travel diary” to

assess the accuracy of the Montréal household travel survey. Transportation Research

Procedia, 11, 350–364. https://doi.org/10.1016/j.trpro.2015.12.030

Sun, L., Tirachini, A., Axhausen, K. W., Erath, A., & Lee, D. H. (2014). Models of bus boarding

and alighting dynamics. Transportation Research Part A: Policy and Practice, 69, 447–

460. https://doi.org/10.1016/j.tra.2014.09.007

Transit Capacity and Quality of Service Manual. (2003). Washington, D.C.

Trépanier, M., & Morency, C. (2017). Evaluation of Bus Service Key Performance Indicators

Using Smartcard Data. In F. Kurauchi & J.-D. Schmöcker (Eds.), Public Transport Planning

with Smart Card Data (pp. 181–196). Boca Raton: CRC Press.

https://doi.org/10.1007/s11947-009-0181-3

Trépanier, M., Morency, C., & Agard, B. (2009). Calculation of Transit Performance Measures

Using Smartcard Data. Journal of Public Transportation, 12(1), 79–96.

Trépanier, M., Tranchant, N., & Chapleau, R. (2007). Individual Trip Destination Estimation in a

Transit Smart Card Automated Fare Collection System. Journal of Intelligent Transportation

Systems, 11(1), 1–14. https://doi.org/10.1080/15472450601122256

Zou, Q., Yao, X., Zhao, P., Wei, H., & Ren, H. (2016). Detecting home location and trip

purposes for cardholders by mining smart card transaction data in Beijing subway.

Transportation, (3), 1–26. https://doi.org/10.1007/s11116-016-9756-9

Algorithms in this thesis were developed using Python from Python Software Foundation.

Python Language Reference, version 2.7. Available at http://www.python.org

Maps throughout this thesis were created using ArcGIS® software by Esri. ArcGIS® and

ArcMap™ are the intellectual property of Esri and are used herein under license. Copyright ©

Esri. All rights reserved. For more information about Esri® software, please visit www.esri.com.

Appendices

Appendix A - STM Card Types

Card types and descriptions for STM users, retrieved from Metropolitano (2015, p. 25)

Group  Description            User Code  Description
1      Normal                 01/11      Normal
2      Student                21/121     Student A
                              22/122     Student B
                              23/123     Student FREE
3      Retired                31/131     Retired A
                              32/132     Retired B
4      Social Work            41         Special schools
                              42         Social benefits
5      Conventions organisms  51         Entity with quotes
                              52         Employee with quotes
                              53         Entity without quote validation
                              54         Quote without quote validation
                              320/07     Ministry of National Defense (Special characteristics)
6      Prepaid                61         Employee of authorized private companies and public organizations
7      Linked                 71         Employee with quotes
                              72         Retired
                              73         Investor without quotes
                              74         Relative of employee/investor
                              75         Employee of transport system

Appendix B - Results for all days

Results from Query # 2 for OD estimation and incorporation of single riders

▪ Tuesday August 16


                                                          Cards             Transactions
Initial                                                   303,511           870,437
Initial query results                                     298,871           860,653 (98.9%)
Cards that do not meet query criteria                     4,640 (1.5%)      9,574 (1.1%)
Cards with single ride per day                            43,466 (14.3%)
Transactions with valid bus run                                             662,402 (76.1%)
Transactions with validated bus run                                         67,894 (7.8%)
Transactions with invalid bus run (cannot be validated)                     140,140 (16.1%)
Transactions with invalid boarding stops                                    9,156 (1.1%)
Totals
Single rider                                              36,779 (12.11%)   36,779 (4.22%)
Cards for algorithm (estimate alighting)                  155,310 (51.2%)   440,622 (50.6%)

▪ Wednesday August 17


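                                                          Cards             Transactions
Initial                                                   298,681           867,645
Initial query results                                                       858,106 (98.9%)
Cards that do not meet query criteria                     4,625 (1.5%)      9,102 (1.0%)
Transactions with validated bus run                                         68,686 (7.9%)
Transactions with invalid bus run (cannot be validated)                     136,197 (15.7%)
Transactions with invalid boarding stops                                    9,104 (1.0%)
Totals
Single rider                                              37,101 (12.42%)   37,101 (4.28%)
Cards for algorithm (estimate alighting)                  157,251 (52.6%)   446,587 (51.5%)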

▪ Thursday August 18


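                                                          Cards             Transactions
Initial                                                   305,377           872,844
Cards that do not meet query criteria                     4,585 (1.5%)      8,871 (1.0%)
Transactions with validated bus run                                         69,513 (8.0%)
Transactions with invalid bus run (cannot be validated)                     138,468 (15.9%)
Transactions with invalid boarding stops                                    9,349 (1.1%)
Totals
Single rider                                              33,701 (11.04%)   33,701 (3.86%)
Cards for algorithm (estimate alighting)                  156,951 (51.4%)   445,577 (51.0%)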

▪ Friday August 19


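                                                          Cards             Transactions
Initial                                                   301,848           863,231
Cards that do not meet query criteria                     5,156 (1.7%)      10,283 (1.2%)
Cards with single ride per day                            46,037 (15.3%)
Transactions with valid bus run                                             676,882 (78.4%)
Transactions with invalid bus run (cannot be validated)                     138,242 (16.0%)
Transactions with invalid boarding stops                                    8,714 (1.0%)
Totals
Single rider                                              38,942 (12.90%)   38,942 (4.51%)
Cards for algorithm (estimate alighting)                  152,870 (50.6%)   434,019 (50.3%)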

▪ Saturday August 20


                                                          Cards             Transactions
Initial                                                   454,123           454,576
Cards that do not meet query criteria                     1,834             3,313 (0.7%)
Cards with single ride per day                            35,866
Transactions with valid bus run                                             346,345 (76.2%)
Transactions with validated bus run                                         31,392 (6.9%)
Transactions with invalid bus run (cannot be validated)                     76,386 (16.8%)
Transactions with invalid boarding stops                                    4,462 (1.0%)
Totals
Single rider                                              29,935 (6.59%)    29,935 (6.58%)
Cards for algorithm (estimate alighting)                  83,660 (18.42%)   226,341 (49.8%)

▪ Sunday August 21


                                                          Cards             Transactions
Initial                                                   278,781           279,043
Cards that do not meet query criteria                     851               1,517 (0.5%)
Cards with single ride per day                            27,671
Transactions with valid bus run                                             215,790 (77.3%)
Transactions with validated bus run                                         17,406 (6.2%)
Transactions with invalid bus run (cannot be validated)                     45,585 (16.3%)
Transactions with invalid boarding stops                                    2,509 (0.9%)
Totals
Single rider                                              12,622 (4.27%)    12,622 (4.52%)
Cards for algorithm (estimate alighting)                  53,728 (19.27%)   142,318 (51.0%)

Results from itinerary - Comparison and fixing stops with passenger service time over 10

seconds

▪ Tuesday August 16

Condition                                             Occurrences   Passengers
Stop with unknown ordinal                             624           4,031
Stop neither first nor last of a route nor terminal   19,653        82,398
Stop at intermediate terminals                        292           22,123 (combined with the row below)
First or last stop of bus route including terminal    2,406

▪ Wednesday August 17








▪ Thursday August 18








▪ Friday August 19








▪ Saturday August 20








▪ Sunday August 21








Appendix C - Details of algorithm

Recall the definition of variables and indices:

$n$ = trip number (the first trip is $n = 1$)

$l$ = leg of the trip

$O_n$ = origin of trip $n$

$D_n$ = destination of trip $n$

$a_{n,l}$ = alighting location for trip $n$, leg $l$

$b_{n,l}$ = boarding location for trip $n$, leg $l$

$d$ = walking distance between stops

$\rightarrow$ = direction of travel and sequence of stops for a bus run

The transactions are identified by the index 𝑖, with the first transaction labelled 𝑖 = 1. The

algorithm consists of two parts. The steps for each part are outlined as follows:

First part: Estimation of alighting location and time

1. Identify all the transactions for a smartcard and order them chronologically. Label the transactions $i, i + 1, \ldots, k$, starting with $i = 1$ and with $k \le 9$.

2. For transaction $i$, retrieve the bus UID and match it with the corresponding valid bus run to obtain the sequence of stops following the boarding location.

3. Pair the boarding location of the next transaction ($i + 1$) with the stop from step 2 that minimizes the walking distance ($d$) between the two, and label that stop as the alighting stop ($a_{n,l}$) for transaction $i$. This minimizes the walking distance for passengers between the alighting stop and the next boarding.⁷

4. Retrieve the arrival time at the alighting stop ($a_{n,l}$) from the bus UID itinerary.

5. Repeat steps 2 through 4 for transaction $i = i + 1$ until reaching transaction $k$*.

*For transaction $k$, which is the last transaction of the day, use the boarding location of the first transaction of the day ($i = 1$) as the next boarding location in step 3.
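
The first part of the algorithm can be illustrated with a minimal Python sketch. It is not the implementation used in this thesis: the field names (`bus_uid`, `boarding_stop`), the `bus_runs` and `stop_xy` structures, and the use of straight-line distance in projected metres as a stand-in for walking distance are all assumptions made for illustration. The 1,000 m cap is the maximum walking distance given in the footnote to step 3.

```python
import math

MAX_WALK_M = 1000.0  # maximum walking distance from the footnote to step 3


def distance_m(p, q):
    """Straight-line distance in metres (assumed proxy for walking distance)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def estimate_alightings(transactions, bus_runs, stop_xy):
    """Estimate (alighting_stop, arrival_time) for each transaction of one card.

    transactions: chronologically sorted dicts with the hypothetical keys
        'bus_uid' and 'boarding_stop'.
    bus_runs: dict bus_uid -> ordered list of (stop_id, arrival_time).
    stop_xy: dict stop_id -> (x, y) in a metre-based projected system.
    Returns None for a transaction when no downstream stop is within range.
    """
    if len(transactions) < 2:
        # Single-ride cards are handled separately in the thesis.
        return [None] * len(transactions)

    results = []
    for i, tx in enumerate(transactions):
        # For the last transaction, the "next boarding" is the first of the day.
        nxt = transactions[(i + 1) % len(transactions)]
        target = stop_xy[nxt["boarding_stop"]]

        run = bus_runs[tx["bus_uid"]]
        stop_ids = [stop_id for stop_id, _ in run]
        start = stop_ids.index(tx["boarding_stop"])
        downstream = run[start + 1:]  # stops following the boarding location

        best = None
        for stop_id, arrival in downstream:
            d = distance_m(stop_xy[stop_id], target)
            if d <= MAX_WALK_M and (best is None or d < best[0]):
                best = (d, stop_id, arrival)
        results.append(None if best is None else (best[1], best[2]))
    return results
```

The sketch takes the first occurrence of the boarding stop in a bus run, so a loop route that serves the same stop twice would need extra handling.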

Second part: Estimation of trip origin and destination

1. Set variables $n = 1$, $l = 1$, count = 0.

2. Identify the trip IDs of the transactions for a smartcard:
   a. If transaction $i$ has a unique trip ID:
      i. Assign label $n$.
      ii. $O_n$ = boarding stop of transaction $i$.
      iii. $D_n$ = alighting stop of transaction $i$.
   b. If transaction $i$ shares its trip ID with transaction $i + 1$:
      i. Retrieve and count the subsequent transactions with the shared trip ID and assign them label $n$. Assign label $l$ to the first transaction, $l + 1$ to the second, and so on until all transactions are labelled.
      ii. $O_n$ = boarding stop of transaction $i$.
      iii. $a_{n,l}$ = alighting stop of transaction $i$ (note that this alighting is not the trip destination, as this is the first leg of trip $n$).
      iv. If the transaction labelled $l + 1$ is the last transaction with shared $n$:
         1. $b_{n,l+1}$ = boarding stop.
         2. $D_n$ = alighting stop.
      v. If the transaction labelled $l + 1$ is not the last transaction with shared $n$:
         1. $b_{n,l+1}$ = boarding stop (transfer boarding stop for leg $l + 1$).
         2. $a_{n,l+1}$ = alighting stop (transfer alighting stop for leg $l + 1$).
      vi. Repeat steps iv and v for subsequent transactions with shared $n$. Update $l = l + 1$.

3. Set variables $n = n + 1$, $l = l + 1$. Repeat step 2 for transaction $i = i + 1$.

4. Repeat steps 1 and 2 for the next smartcard.
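
The second part can be sketched in a few lines of Python. This is an illustrative reading of the steps above, not the thesis code: the keys `trip_id`, `boarding_stop` and `alighting_stop` are hypothetical, transactions are assumed to be sorted chronologically with alighting stops already estimated, and the leg counter restarts at 1 for every trip, which is a choice made for this sketch.

```python
from itertools import groupby


def build_trips(transactions):
    """Group one card's transactions into trips with an origin and destination.

    Consecutive transactions sharing a trip ID are treated as legs of the
    same trip. The origin is the boarding stop of the first leg and the
    destination is the alighting stop of the last leg.
    """
    trips = []
    grouped = groupby(transactions, key=lambda t: t["trip_id"])
    for n, (_, group) in enumerate(grouped, start=1):
        legs = list(group)
        trips.append({
            "n": n,
            "origin": legs[0]["boarding_stop"],         # O_n
            "destination": legs[-1]["alighting_stop"],  # D_n
            # One (leg number, b_{n,l}, a_{n,l}) tuple per leg.
            "legs": [
                (leg_no, leg["boarding_stop"], leg["alighting_stop"])
                for leg_no, leg in enumerate(legs, start=1)
            ],
        })
    return trips
```

A trip with a single leg has its origin and destination taken from the same transaction, which corresponds to case 2a; trips with several legs keep the intermediate boarding and alighting stops as transfer points, as in case 2b.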

⁷ The pairing can be done by minimizing either the distance between alighting and boarding

stops (Trépanier et al., 2007) or the generalized time (Munizaga & Palma, 2012). The

method proposed here minimizes the walking distance between stops and sets a

maximum walking distance of 1,000 metres.
