synthesizing agents and relationships for land use / transportation modelling

David PritchardCivil Engineering, University of Toronto

September 12, 2008

Synthesizing Agents and Relationships for Land Use / Transportation

Modelling

2

Lecture Outline

● Introduction● Previous Work● Data● New Methods● Results

3

Introduction

● How would land use, transportation patterns and emissions react to...

● High congestion charge?● Greenbelt policy?● “Do nothing” while population grows ● Major transportation projects

● Major extrapolations from current behaviour● Too hard to predict conventionally

4

Introduction

Traditional 4-stage

5

Introduction

Integrated Land Use/Transportation Environment (ILUTE) model

6

Introduction

● We can’t build such a complicated model using conventional methods

● Instead, preferred approach is microsimulation model

● What is microsimulation?

7

Introduction

Conventional Model

Simulation Model

8

Introduction

● Microsimulation = Simulation + Agents● Models the state of agents● Combined behaviour of agents yields

system state● 1. Begin with initial population in start year● 2. Update population, year by year

● age persons, change family structures● change jobs, move homes● use this to predict annual travel patterns

● 3. Obtain travel patterns in forecast year

9

Introduction

● Need an initial population in the start year● List of agents and their attributes - e.g.,

● Number of persons, and their ages● Number of vehicles● Type of dwelling● etc.

● But - complete list is unknown● “Population Synthesis” used instead

● Use known data to create initial agents● Result has known statistical properties● Best estimate from limited data

10

Introduction

My results:Improved method for population synthesis

Allows more attributes for each agentNew method for relationship synthesis

Allows correct set of agents and correct set of relationships

Created a synthetic population for ILUTE Persons, families, households and dwellings Complete 1986 population for GTHA

11

Previous Work

Two representations of set of agents List of agents and their attributes (as categories) Contingency table

One cell for each combination of attributes Cell contains count of number of agents

12

Previous Work

Data Limitations Patchwork of partial data Mostly, we have one-way margins Break down of a single attribute into a few

categoriesExample: look at how we can use one-way margins

13

Previous Work

14

Previous Work

Iterative Proportional Fitting

15

Previous Work

Iterative Proportional Fitting

16

Previous Work

Iterative Proportional Fitting e.g., “Biproportional Updating” of O/D tables

Exactly satisfies target marginsAlso minimizes discrimination information relative to source population

Information theory: maximum entropyResulting PDF satisfies the constraints without assuming any information we do not possess

17

Previous Work

Many options for margins in 3D

18

Previous Work

Beckman, Baggerley & McKay (1996)State-of-the-art application of IPF for census Geography attribute gets special treatment

Due to nature of data in PUMS and census tablesTwo approaches: zone-by-zone, or all zones at once

Treats final table as a PMF Monte Carlo draws used to integerize Hurts fit to target margins

Limited number of attributes

19

Previous Work

Williamson, Birkin and Rees (1998)Not IPF: “Combinatorial Optimisation”List-based, instead of tablesPros:

good fit to target margins may handle more attributes

Cons: no guarantees about relationship with source

sample not entropy maximizing slow

20

Data

Summary Tables Usually one attribute, by zone (2D margin) Contingency table Large sample: 20% or 100% Sometimes 2-3 attributes by zone Used as Target Margins

Public Use Microdata Sample (PUMS) List; almost all attributes, except zones Small sample (1-2%) Canada: defined for each large Census

Metropolitan Area (CMA) Used as Source Sample

21

Data

22

Data

23

Data

24

Data

Canadian Census includes three PUMS Persons Census families Households & Dwellings

Also summary tables related to each

25

New Methods: Sparsity

Beckman et al.’s approach doesn’t work well with many attributes

Computation becomes hard Huge memory requirement Slow

Thirteen attributes on family agent: Beckman Zone-by-Zone needs 1.4 GB memory Beckman Multizone needs 1,036 GB memory

26


Number of cells in multiway table grows exponentially with number of attributes (dimensions)

27


28


Large number of binsMost bins are zeroNumber of bins is larger than sample!

29


Is it meaningful to use many attributes? Tentatively, yes Not a meaningful 13-way distribution But, a link between many statistically valid low-

order distributions (e.g., 3-way) If acceptable, can we do better than standard IPF?

Yes - use a sparse data structure instead of a complete array to represent table

Store only non-zero cells in table

30


Same representation as Williamson’s “Combinatorial Optimisation”

But, uses IPF algorithmMaximum entropy guarantee; fastCan implement either zone-by-zone or multizone IPF using sparse data structure

31

New Methods: Relationships

Land use/transportation models have more types of agents Agents: Persons, families,

households, business establishments

Objects: Vehicles, dwellings

32


Need to synthesize correct relationshipsExamples:

Which persons are married? Opposite sex, similar ages - usually

Which household owns/rents a given dwelling? Number of rooms and number of persons should

be correlatedEarlier methods could guarantee correct PDF for one agent type, but not all simultaneously

33


Family PUMS contains information about persons in family husband/wife ages; child ages

Can synthesize “family” agent Include some “person” attributes in family

34


Then, conditionally synthesize persons on family attributes IPF result is a joint probability mass function

P(AGE, EDU, INCOME, OCCUP, SEX, ...)

Can convert to a conditional PMF

P(EDU, INCOME, OCCUP, ... | AGE, SEX)

Synthesize, repeating for husband, wife, children

35


Guarantees good fit for both agent types Correct Family PDF Correct Person PDF

Simple, data-driven No rules No special data sources, models

Provided that attributes can be aligned between agents

36

Results

37

Results

38

Results

Programmed in R A statistical programming platform Dynamic language, fast prototyping Good support for categorical data, contingency

tablesToronto CMA: 1.1 million households, 1.0 million families, 3.3 million persons

Run time: 2 hours, 7 minutes on older 1.5GHz computer

Repeated for Hamilton and Oshawa CMAs

39

Results

40

Results

Experiment Is there value in using really rich input data? Or does PUMS + 1D tables give enough?

Calculated fit against all available dataSRMSE and G2 information theoretic statistics

41

Results

42

Results

Improvement of result with additional data evident However, no statistical tests possible

Monte Carlo stage causes some errorMy conditional synthesis introduces small amount of additional error

Little difference between zone-by-zone and multizone methods

43

Questions?

synthesizing agents and relationships for land use / transportation modelling

Documents