
Estimation Tools for Advanced Transportation Models

Morgan State University The Pennsylvania State University

University of Maryland University of Virginia

Virginia Polytechnic Institute & State University West Virginia University

The Pennsylvania State University The Thomas D. Larson Pennsylvania Transportation Institute

Transportation Research Building University Park, PA 16802-4710 Phone: 814-865-1891 Fax: 814-863-3707

www.mautc.psu.edu


Estimation Tools for Advanced Transportation Models

FINAL REPORT

UMD-2013-02

Prepared for

Mid-Atlantic Universities Transportation Center

By

Dr. Cinzia Cirillo, Principal Investigator

Nayel Urena Serulle, Graduate Student

Jean-Michel Tremblay, Graduate Student

Department of Civil & Environmental Engineering

1173 Glenn L. Martin Hall, Bldg #088

University of Maryland, College Park, MD 20742

September 2015

The contents of this report reflect the views of the authors, who are responsible for the facts and the accuracy of the data presented herein. The contents do not necessarily reflect the official views or policies of the University of Maryland.


1 Table of Variables and Symbols

Variables for static models

symbol  description
n       number of observations
k       number of alternatives
p       number of parameters
U       utilities of alternatives
V       systematic component of utilities
ε       random error
X       design matrix of V
x       elements of X
β       coefficients of X
C       choice set
Y       chosen alternative
Z       latent variable for ordered models
γ       threshold in ordered models

Variables for dynamic models

symbol  description
T       Number of time periods.
L       Number of look-ahead steps. For example, if L = 2, at time period t the decision maker will consider utilities and payoffs at time period t + 2 but ignore what happens at time period t + 3.
D       Attributes of alternatives for each time period. Suppose that the alternative attributes are "cost" and "time" and there are K = 3 alternatives; then for each time period t, D(t) contains the variables "time.1", "time.2", "time.3", "cost.1", "cost.2" and "cost.3". D refers to the collection of all D(t)'s.
F       Attributes of the held item at the beginning of the process. For our example it would contain a value of "cost" and "time" for each observation.
G       Global variables that do not vary with time or alternatives. Usually they correspond to characteristics of the decision maker (income, age, etc.).
Z       Attributes of the held alternative. Z(t) corresponds to the attributes of held alternatives at time t. The ith row of Z(t+1) is equal to the ith row of Z(t) if decision maker i keeps his item at time period t, and is updated using values in D(t) if he decides to update his item.
τ       Data set that contains all time-sensitive variables. In our example it would contain "ts1.1", "ts1.2", "ts1.3" and "ts1.4", all in one file. τ(t) would correspond to ts1.t.
H       Matrix of one-period payoffs. $H_{it}$ is the one-period payoff of individual i at time period t.
M       Design matrix used to compute H. There are T + L matrices M(t), each of which contains variables in Z(t), τ and G.
α       Coefficients of H; M(t)α gives the tth column of H.
U(t)    Utilities of alternatives. U(t) gives the utility of each alternative at time period t.
X(t)    Design matrix of V. Entries in X(t) contain values in D(t), Z(t), τ(t) and G, and possibly functions of these variables. For example, an entry can be the sum of one variable in D(t) and one variable in Z(t).
ν       Matrix of largest utilities. $\nu_{it} = \max_j V(t)_{ij}$
W       Reservation utilities.


2 The Data

We assume that the data file contains in its first line a list of labels corresponding to the available variables, that each row corresponds to an observation, that each variable has a numerical value and that there are no blanks (missing values should be replaced by a numerical value). Delimiters can be tabs or spaces. Documentation on how to read data files in R can be found in the following references:

http://cran.r-project.org/doc/manuals/r-release/R-data.pdf

http://www.r-tutor.com/r-introduction/data-frame/data-import

http://www.statmethods.net/input/importingdata.html

http://www.cyclismo.org/tutorial/R/input.html

Reference page for read.table:

https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html
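As an illustration of the file layout assumed above, the following base-R sketch writes a tiny file and reads it back with read.table. The column names and values are made up for the example; only the layout (labels on the first line, one observation per row, numeric values, space or tab delimiters) comes from the text.

```r
# Write a small example data file to a temporary location (hypothetical values).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "CHOICE cost_car cost_transit time_car time_transit",
  "1 2.5 1.0 20 35",
  "2 3.0 1.5 25 30",
  "1 2.0 1.0 15 40"
), path)

# header = TRUE: the first line holds the variable labels.
# sep = "" (the default) treats any run of spaces or tabs as a delimiter.
D <- read.table(path, header = TRUE, sep = "")

# Check the assumptions stated above: numeric columns and no missing values.
stopifnot(all(sapply(D, is.numeric)), !any(is.na(D)))
str(D)
```

The final check is a cheap way to catch blanks or text codes before the data reach an estimation routine.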


3 The Model

The specification of utility-based discrete choice models requires the definition of a utility function for each alternative j in the choice set C. In our setting, the utilities U are linear combinations of attributes X and coefficients β:

$$U_j = X_j^T \beta_j \quad \forall\, j = 1, \dots, K \qquad (1)$$

In vector and matrix form, equation 1 becomes:

$$U = X\beta$$

where $U = (U_1, U_2, \dots, U_k)^T$, $\beta = (\beta_1, \beta_2, \dots, \beta_p)^T$ and X is a design matrix.


4 Specification of Utility Functions

4.1 Dependent Variable

This is the name of the variable in the data file that indicates the chosen alternative, or the continuous dependent variable in discrete-continuous models.

Y = CHOICE

or

Y = AVMT (where AVMT = Annual Vehicle Miles Travelled)

4.2 Alternative Specific Constants

Alternative Specific Constants (ASCs) are dummy variables that equal 1 for the alternative for which they are specified and 0 otherwise. The model assumes that a full set of ASCs is specified, with the first alternative being the base.

ASC = TRUE

If a different specification is needed, ASC should be set to FALSE and the ASCs included in the specific list. The design matrix should be modified to account for the dummy variables corresponding to the ASCs in the model.

4.3 Model Coefficients

The specification of discrete choice models in R requires the definition of a list of generic coefficients and a list of specific coefficients. A coefficient is generic if it appears in all the utilities with the same value; a coefficient is alternative-specific if a different coefficient is estimated for each alternative.

The generic list contains as many elements as the number of generic coefficients, and each element is a vector of predictors of size K (the number of alternatives). If a common predictor does not affect a certain utility, this predictor is set to "zero".

The specific list contains K elements, and the kth element is a vector that contains all the predictors that affect the kth utility. The size of each element is 0 or the number of specific predictors in the kth utility.

Consider the example in Table 2; the specification includes two ASCs, ASC_transit and ASC_bike (the car alternative being the base), the variables cost_car, cost_transit and cost_bike with a generic coefficient, and the variables time_car, time_transit and time_bike with specific coefficients.

$$\begin{aligned}
U_{car} &= \beta_{cost} \times \text{cost\_car} + \beta_{tc} \times \text{time\_car} \\
U_{transit} &= \text{ASC\_transit} + \beta_{cost} \times \text{cost\_transit} + \beta_{tt} \times \text{time\_transit} \\
U_{bike} &= \text{ASC\_bike} + \beta_{cost} \times \text{cost\_bike} + \beta_{tb} \times \text{time\_bike}
\end{aligned}$$

Table 2: Example 1

This specification corresponds to the following syntax in R:

    common = list(c("cost_car","cost_transit","cost_bike"))
    specific = list(c("time_car"), c("time_transit"), c("time_bike"))

spec1.R

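In the example in Table 3, the variables cost_car and cost_transit have a generic coefficient, but the cost for the bike alternative is assumed to be "zero".

$$\begin{aligned}
U_{car} &= \beta_{cost} \times \text{cost\_car} + \beta_{tc} \times \text{time\_car} \\
U_{transit} &= \text{ASC\_transit} + \beta_{cost} \times \text{cost\_transit} + \beta_{tt} \times \text{time\_transit} \\
U_{bike} &= \text{ASC\_bike} + \beta_{tb} \times \text{time\_bike}
\end{aligned}$$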

Table 3: Example 2

This specification corresponds to the following syntax in R:

    common = list(c("cost_car","cost_transit","zero"))
    specific = list(c("time_car"), c("time_transit"), c("time_bike"))

spec2.R

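In the example in Table 4, the car alternative has two specific parameters (time_car and toll), transit has one (time_transit) and bike has none, but they all have a generic parameter (cost).

$$\begin{aligned}
U_{car} &= \beta_{cost} \times \text{cost\_car} + \beta_{tc} \times \text{time\_car} + \beta_{toll} \times \text{toll} \\
U_{transit} &= \text{ASC\_transit} + \beta_{cost} \times \text{cost\_transit} + \beta_{tt} \times \text{time\_transit} \\
U_{bike} &= \text{ASC\_bike} + \beta_{cost} \times \text{cost\_bike}
\end{aligned}$$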

Table 4: Example 3

This specification corresponds to the following syntax in R:


    common = list(c("cost_car","cost_transit","cost_bike"))
    specific = list(c("time_car","toll"), c("time_transit"), c())

spec3.R

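The utilities in the last example (Table 5) have no generic parameters, but each of them has one specific parameter (time).

$$\begin{aligned}
U_{car} &= \beta_{tc} \times \text{time\_car} \\
U_{transit} &= \text{ASC\_transit} + \beta_{tt} \times \text{time\_transit} \\
U_{bike} &= \text{ASC\_bike} + \beta_{tb} \times \text{time\_bike}
\end{aligned}$$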

Table 5: Example 4

This specification corresponds to the following syntax in R:

    common = list()
    specific = list(c("time_car"), c("time_transit"), c("time_bike"))

spec4.R

4.4 Design Matrices

We provide below the design matrices (X) corresponding to the model specifications in the previous section. Suppose that for a given observation we have the following values for the predictors:

predictor     value
cost_car      x1
cost_transit  x2
cost_bike     x3
time_car      x4
time_transit  x5
time_bike     x6
toll          x7

For the first example, with $U_1 = (U_{car}, U_{transit}, U_{bike})^T$ and $\beta_1 = (\beta_{cost}, \beta_{tc}, \beta_{tt}, \beta_{tb})^T$, it is easy to verify that $X_1 \beta_1 = U_1$, where the design matrix is:

$$X_1 = \begin{pmatrix} x_1 & x_4 & 0 & 0 \\ x_2 & 0 & x_5 & 0 \\ x_3 & 0 & 0 & x_6 \end{pmatrix}$$


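The other design matrices, for examples 2, 3 and 4, are respectively:

$$X_2 = \begin{pmatrix} x_1 & x_4 & 0 & 0 \\ x_2 & 0 & x_5 & 0 \\ 0 & 0 & 0 & x_6 \end{pmatrix}$$

$$X_3 = \begin{pmatrix} x_1 & x_4 & x_7 & 0 \\ x_2 & 0 & 0 & x_5 \\ x_3 & 0 & 0 & 0 \end{pmatrix}$$

and:

$$X_4 = \begin{pmatrix} 0 & x_4 & 0 & 0 \\ 0 & 0 & x_5 & 0 \\ 0 & 0 & 0 & x_6 \end{pmatrix}$$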
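The mapping from a generic list and a specific list to a design matrix can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the helper build_design is hypothetical and is not part of the package used in this report, the predictor values are made up, and the ASCs are left out, as in the matrices above.

```r
# Hypothetical helper (not a package function): build the design matrix for ONE
# observation from a generic list and a specific list of predictor names.
build_design <- function(obs, generic, specific) {
  K <- length(specific)
  value <- function(name) if (name == "zero") 0 else obs[[name]]
  # One column per generic coefficient: the predictor of each alternative.
  gen_cols <- lapply(generic, function(v) vapply(v, value, numeric(1)))
  # One column per specific coefficient: non-zero only in its own row.
  spec_cols <- list()
  for (k in seq_len(K)) {
    for (name in specific[[k]]) {
      col <- numeric(K)
      col[k] <- value(name)
      spec_cols[[length(spec_cols) + 1]] <- col
    }
  }
  X <- do.call(cbind, c(gen_cols, spec_cols))
  dimnames(X) <- NULL
  X
}

# Made-up predictor values for one observation.
obs <- c(cost_car = 2, cost_transit = 1, cost_bike = 0.5,
         time_car = 20, time_transit = 35, time_bike = 50, toll = 3)

# Example 3: generic cost, specific time_car and toll, time_transit, nothing for bike.
generic3  <- list(c("cost_car", "cost_transit", "cost_bike"))
specific3 <- list(c("time_car", "toll"), c("time_transit"), c())
X3 <- build_design(obs, generic3, specific3)
print(X3)

# Made-up coefficients (cost, time_car, toll, time_transit); U = X beta.
beta3 <- c(-0.5, -0.1, -0.2, -0.05)
U3 <- as.numeric(X3 %*% beta3)
print(U3)
```

The printed X3 has the same zero pattern as the matrix given for example 3, with x1 to x7 replaced by the made-up values.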


5 Multinomial Logit

    library(dcmodels)

    # model spec
    generic = list(c("X1","X2","X3"))
    specific = list(c("X4"), c("X5"), c("X6"))

    # utility coefficients. typically you don't know that but we're generating choices here
    b = c(-1, 1, 1, 1)

    # if you supply your data you don't need this
    nObs = 1000
    nVars = 6
    type = "norm"
    seed = 1234
    D = genData(nObs, nVars, type, seed)
    choiceLogit = genLogit(generic, specific, D, b)
    D$Y = choiceLogit

    # specification for the logit example
    specLogit = list(
      generic = generic,
      specific = specific,
      Y = "Y",
      ASC = TRUE # they should be 0
    )

    # run the logit
    modelLogit = model(logit, specLogit, D)
    #logitElas = modelElasticity(logit, logitSpec, logitData, c("X1","X2"))
    #logitApply = modelAndApply(logit, logitSpec, logitData, 0.8)

../R/examples/exLogit.R


6 Nested Logit

    library(dcmodels)

    n = 1000
    p = 8
    D = data.frame(matrix(rnorm(n*p),n,p))
    D$one = 1
    D$zero = 0

    common = list(c("one","one","zero","zero"),
                  c("zero","zero","one","one"),
                  c("X1","X2","zero","zero"),
                  c("zero","zero","X3","X4"))
    specific = list(c("X5"),c("X6"),c("X7"),c("X8"))

    m = list(common = common, specific = specific)
    b = c(0.5,1,1,-1,2,-1,1,-2)

    # this is a matrix where (U1,U2) are correlated and (U3,U4)
    # are correlated
    S = matrix(c(1,0.5,0,0,0.5,1,0,0,0,0,1,0.5,0,0,0.5,1),4,4)

    # M transforms the error terms into differences against 1st
    # error term
    M = dcmodels::cov2diff(4,1)

    # Variance of differences
    SDiff = M %*% S %*% t(M)

    # generate choices
    D$Y = dcmodels::genChoiceProbit(m,b,D,NULL,SDiff)
    D$YSplit = 1 + 1 * (D$Y >= 3)

    spec = list(
      split = list(
        common = list(),
        specific = list(c(),c())),
      nests = list(
        list(common = list(c("one","one"),c("X1","X2"))
           , specific=list(c("X5"),c("X6"))),
        list(common = list(c("X3","X4"))
           , specific=list(c("X7"),c("X8")))),
      YSplit = "YSplit",
      Y = "Y",
      SD = "hessian")

    #args = dcmodels::nlogit$computeArgs(spec, D)
    #b = dcmodels::nlogit$computeStart(spec, D)
    #sum(log(nlogit$LLVec(b,args)))

    M = model(nlogit, spec, D)

    specProbit = list(
      common = common,
      specific = specific,
      method = "pmvnorm",
      Y = "Y",
      SD = "none"
    )

    #MP = model(probit, specProbit, D)

../R/examples/exNLogit.R


7 Ordered Probit

    library(dcmodels)

    Z = c("X1","X2","X3","X4","X5")

    # if you supply your data you don't need this
    a = c(1,1)
    b = c(1, -1, 0.5, 0.5, 2)
    nObs = 1000
    nVars = 5
    type = "norm"
    seed = 1234
    D = genData(nObs, nVars, type, seed)
    D$Y = genOprobit(Z, b, a, D)

    # there is not much to specify for the ordered probit
    specOProbit = list(
      Y = "Y",
      Z = c("const",Z)
    )

    modelOProbit = model(oprobit, specOProbit, D)

../R/examples/exOProbit.R


8 The Multinomial Probit

In discrete choice modeling, we use the probit model to predict the outcome of a categorical choice. Each alternative j is assumed to have a utility $U_j$ to the decision maker, who picks the alternative with the largest utility:

$$U_j = X_j^T \beta_j + \varepsilon_j, \quad j = 1, \dots, k$$

We define:

$$V_j = X_j^T \beta_j$$

to be the deterministic part of the utilities. The choice is given by:

$$Y = \arg\max_j U_j$$

The error term $\varepsilon = (\varepsilon_1, \dots, \varepsilon_k)$ is assumed to follow a multivariate normal distribution with mean 0 and covariance $\Sigma^*$. This assumption on the error term is what makes the model a probit. If the errors instead followed independent Gumbel distributions, the model would be a logit.

Utilities do not have a scale or a unit; they are simply an abstract measurement of the preference that decision makers have for alternatives.

For example, if someone faces three transportation alternatives to go to work, with a utility of 12 for driving, 15 for biking and -10 for public transportation, he will choose to bike to work.

8.1 Identification

There are two important scale and location properties to this model. First, scale is not relevant. If the person in the previous example had utilities of 120, 150 and -100 (×10) for driving, biking and using public transportation, his choice would not be affected. Second, the location of utilities is not relevant either, so the person would make the same decision if the utilities were 112, 115 and 90 (+100).
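The two invariance properties can be checked directly on the numbers of the example. The short base-R sketch below is only an illustration; the helper name choice is an assumption introduced here.

```r
# Utilities from the example in the text.
V <- c(drive = 12, bike = 15, transit = -10)

# Hypothetical helper: name of the alternative with the largest utility.
choice <- function(u) names(u)[which.max(u)]

print(choice(V))         # original utilities
print(choice(10 * V))    # rescaled by a positive factor: 120, 150, -100
print(choice(V + 100))   # shifted by a constant: 112, 115, 90
```

All three calls select the same alternative, which is why neither the scale nor the location of the utilities can be estimated from choices alone.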

The location is identified by making sure that the same predictor is not used for all alternatives. In practice, this mostly means that one alternative is not allowed to have an intercept. It is customary, but not necessary, to simply set $V_1 = 0$.

Both issues are handled by working with differences in utilities rather than with their absolute values: differencing removes the location, and the scale is then fixed by normalizing one element of the covariance matrix of the differences, as described below. To compute the maximum likelihood estimate, which we define later, we need to express the probability of selecting each alternative. The following calculations show how to express the probability of choosing the first and second alternatives.

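1st alternative:

$$\begin{aligned}
P(Y = 1) &= P(U_1 > U_2,\ U_1 > U_3,\ \dots) \\
&= P(V_1 + \varepsilon_1 > V_2 + \varepsilon_2,\ V_1 + \varepsilon_1 > V_3 + \varepsilon_3,\ \dots) \\
&= P(V_1 - V_2 > \varepsilon_2 - \varepsilon_1,\ V_1 - V_3 > \varepsilon_3 - \varepsilon_1,\ \dots) \\
&= P(\varepsilon_2 - \varepsilon_1 < V_1 - V_2,\ \varepsilon_3 - \varepsilon_1 < V_1 - V_3,\ \dots) \\
&= P(\delta^{(-1)} < d^{(1-)})
\end{aligned}$$

2nd alternative:

$$\begin{aligned}
P(Y = 2) &= P(U_2 > U_1,\ U_2 > U_3,\ \dots) \\
&= P(V_2 + \varepsilon_2 > V_1 + \varepsilon_1,\ V_2 + \varepsilon_2 > V_3 + \varepsilon_3,\ \dots) \\
&= P(V_2 - V_1 > \varepsilon_1 - \varepsilon_2,\ V_2 - V_3 > \varepsilon_3 - \varepsilon_2,\ \dots) \\
&= P(\varepsilon_1 - \varepsilon_2 < V_2 - V_1,\ \varepsilon_3 - \varepsilon_2 < V_2 - V_3,\ \dots) \\
&= P(\delta^{(-2)} < d^{(2-)})
\end{aligned}$$

with

$$\delta^{(-j)} := (\varepsilon_1 - \varepsilon_j, \dots, \varepsilon_k - \varepsilon_j)$$

$$d^{(j-)} := (V_j - V_1, \dots, V_j - V_k)$$

where the jth entry of each vector, which is identically zero, is omitted.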

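We see that choosing the first alternative depends on $\delta^{(-1)}$, choosing the second alternative depends on $\delta^{(-2)}$ and, generally, choosing the jth alternative depends on the differences between the error terms and $\varepsilon_j$. Fortunately, there is a way to transform one set of differences into another. For the sake of illustration, consider differences with respect to $\varepsilon_1$ and a choice set with 4 alternatives. The following matrices generate all sets of differences:

$$\begin{pmatrix} -1 & 0 & 0 \\ -1 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \varepsilon_2 - \varepsilon_1 \\ \varepsilon_3 - \varepsilon_1 \\ \varepsilon_4 - \varepsilon_1 \end{pmatrix}
= \begin{pmatrix} \varepsilon_1 - \varepsilon_2 \\ \varepsilon_3 - \varepsilon_2 \\ \varepsilon_4 - \varepsilon_2 \end{pmatrix}
= \delta^{(-2)}$$

$$\begin{pmatrix} 0 & -1 & 0 \\ 1 & -1 & 0 \\ 0 & -1 & 1 \end{pmatrix}
\begin{pmatrix} \varepsilon_2 - \varepsilon_1 \\ \varepsilon_3 - \varepsilon_1 \\ \varepsilon_4 - \varepsilon_1 \end{pmatrix}
= \begin{pmatrix} \varepsilon_1 - \varepsilon_3 \\ \varepsilon_2 - \varepsilon_3 \\ \varepsilon_4 - \varepsilon_3 \end{pmatrix}
= \delta^{(-3)}$$

$$\begin{pmatrix} 0 & 0 & -1 \\ 1 & 0 & -1 \\ 0 & 1 & -1 \end{pmatrix}
\begin{pmatrix} \varepsilon_2 - \varepsilon_1 \\ \varepsilon_3 - \varepsilon_1 \\ \varepsilon_4 - \varepsilon_1 \end{pmatrix}
= \begin{pmatrix} \varepsilon_1 - \varepsilon_4 \\ \varepsilon_2 - \varepsilon_4 \\ \varepsilon_3 - \varepsilon_4 \end{pmatrix}
= \delta^{(-4)}$$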

The choice probability will involve the covariance of the error differences, so we define:

$$\Sigma^{(-j)} = \mathrm{Cov}(\delta^{(-j)})$$

Considering the full error term structure would over-parametrize the model since we can work with $\delta^{(-1)}$ only; therefore we will only estimate the covariance structure of $\delta^{(-1)}$. To lighten the notation, we define:

$$\Sigma := \Sigma^{(-1)}$$

We use Σ to denote the parameter that we want to estimate, and we specifically write $\Sigma^{(-1)}$ when we want to stress that it originates from taking differences of error terms with respect to $\varepsilon_1$.

From Σ, we need to be able to compute $\Sigma^{(-j)}$ for any j. This is easy since differences are transformed using linear transformations. If $x \sim N(\mu, \Sigma)$, then a linear transformation of x has the following distribution:

$$Mx + y \sim N(M\mu + y,\ M \Sigma M^T)$$
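The change of reference alternative can be checked numerically. The sketch below is a minimal base-R illustration, not code from the package used in this report: the helper diff_transform is hypothetical, the error vector is arbitrary, and the covariance matrix is the one used for the artificial sample later in the report.

```r
# Hypothetical helper: (k-1) x (k-1) matrix M_j such that
# M_j %*% delta(-1) = delta(-j), for a reference alternative j >= 2.
diff_transform <- function(k, j) {
  stopifnot(k >= 2, j >= 2, j <= k)
  others <- setdiff(seq_len(k), j)   # alternatives i != j, in order
  M <- matrix(0, nrow = k - 1, ncol = k - 1)
  for (r in seq_along(others)) {
    i <- others[r]
    # eps_i - eps_j = (eps_i - eps_1) - (eps_j - eps_1);
    # column c of M multiplies eps_(c+1) - eps_1.
    if (i != 1) M[r, i - 1] <- 1
    M[r, j - 1] <- M[r, j - 1] - 1
  }
  M
}

M2 <- diff_transform(4, 2)
print(M2)

# Check against explicit differences of an arbitrary error vector.
eps <- c(0.3, -1.2, 0.7, 2.0)
delta1 <- eps[-1] - eps[1]
delta2 <- eps[-2] - eps[2]
stopifnot(isTRUE(all.equal(drop(M2 %*% delta1), delta2)))

# Covariance of the transformed differences: Sigma(-j) = M Sigma(-1) M^T.
Sigma1 <- matrix(c(2, 1, 1,
                   1, 2, 1,
                   1, 1, 2), nrow = 3, byrow = TRUE)
Sigma2 <- M2 %*% Sigma1 %*% t(M2)
print(Sigma2)
```

For this particular covariance, which is symmetric in the alternatives, the transformed matrix coincides with the original one; a less symmetric Σ would give a different matrix for each reference alternative.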

The last restriction that we impose is that one diagonal element of Σ is fixed. We choose $\Sigma_{1,1} = 2$, but any other choice could have been made. This sets the scale of the difference between the first and the second utility, and therefore of the other differences, since they implicitly depend on this difference via transformations.

To identify the location of the utilities, we impose that not all utilities have an intercept. It is customary to say that $U_1$ does not contain an intercept. For some problems, one alternative serves as a default, or no-action, alternative, and its deterministic utility ($V_i$) is completely set to zero, which also provides a solution to the location identification problem. For example, in marketing, if we want to predict the product that a consumer will buy, the option "do not purchase anything" will have a deterministic utility of zero that other products are compared to.

8.2 Other Issues

The covariance matrix must be positive definite, and this constraint is hard to express as a simple restriction on the parameters. Instead, we optimize the Cholesky factor of $\Sigma^{(-1)}$, i.e. the lower triangular matrix L such that $LL^T = \Sigma^{(-1)}$.
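A minimal sketch of this parametrization, in base R, is given below. The helper make_L and the ordering of the free parameters are assumptions made for the illustration; the only elements taken from the text are the lower triangular factor and the fixed first element, $\sqrt{2}$, which gives $\Sigma_{1,1} = 2$.

```r
# Hypothetical helper: map free parameters to a lower-triangular factor L
# for m = k - 1 error differences, with L[1, 1] fixed so that Sigma[1, 1] = 2.
# theta is ordered column by column: (L21, L31, L22, L32, L33) when m = 3.
make_L <- function(theta, m = 3) {
  stopifnot(length(theta) == m * (m + 1) / 2 - 1)
  L <- matrix(0, m, m)
  L[lower.tri(L, diag = TRUE)] <- c(sqrt(2), theta)
  L
}

theta <- c(1, 1, 1, 1, 1)        # any real values are allowed
L <- make_L(theta)
Sigma <- L %*% t(L)
print(L)
print(Sigma)

# Sigma is symmetric with positive eigenvalues as long as the diagonal
# of L has no zero entry, so the optimizer never needs an explicit constraint.
print(eigen(Sigma, symmetric = TRUE)$values)
```

The design choice is that the optimizer moves freely over theta while every candidate Σ it implies remains a valid covariance matrix.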

Finally, the notation that we use to describe utilities is not flexible enough to deal with utilities that share a common coefficient. Consider this example:

$$\begin{aligned}
V_1 &= X_1 \beta_1 + X_5 \beta_5 \\
V_2 &= X_2 \beta_1 + X_6 \beta_6 \\
V_3 &= X_3 \beta_1 + X_7 \beta_7 \\
V_4 &= X_4 \beta_1 + X_8 \beta_8
\end{aligned}$$

We will replace this statement with:

$$V = X\beta$$

where:

$$V = (V_1, V_2, V_3, V_4)^T$$

$$X = \begin{pmatrix} X_1 & X_5 & 0 & 0 & 0 \\ X_2 & 0 & X_6 & 0 & 0 \\ X_3 & 0 & 0 & X_7 & 0 \\ X_4 & 0 & 0 & 0 & X_8 \end{pmatrix}$$

$$\beta = (\beta_1, \beta_5, \beta_6, \beta_7, \beta_8)^T$$

At this point, we need to calculate the maximum likelihood estimate of (β, Σ).


9 Maximum-Likelihood Estimation

The log-likelihood (LL) function of any model is defined as:

$$LL(\theta \mid x) = \sum_{i=1}^{n} \log P(X = x_i \mid \theta)$$

In probability, the mass (or density; we only discuss probability masses here) is a function of x, the observation, and depends on θ, the parameter of the model. $P(X = x \mid \theta)$ is the probability of observing a realization x of the random variable X if the parameter of the model is θ. The LL function reverses the conditioning and treats θ as the variable, given that we observed a sample of the random variable. In our case we have:

$$LL(\beta, \Sigma \mid X, y) = \sum_{i=1}^{n} \log P(Y = y_i \mid \Sigma, \beta, X_i)$$

The maximum likelihood estimator (MLE) is defined as:

$$(\hat{\beta}, \hat{\Sigma}) = \arg\max_{\beta, \Sigma} LL(\beta, \Sigma \mid X, y)$$

We argue without proof that, for this problem, the challenge of optimizing the log-likelihood function reduces to providing an estimate of the LL function with good properties. In particular, we want the estimate of the LL function to be continuously differentiable, and we want to be able to estimate the Hessian matrix at the maximum. The computation of the Hessian matrix cannot be avoided since it is required to estimate the standard errors of the estimator.
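The role of the Hessian can be illustrated on a much simpler model than the multinomial probit. The base-R sketch below fits a binary probit with one predictor, where the choice probability is available in closed form through pnorm; the data, the true coefficients and the function name negLL are assumptions made for the example, and the optimizer is the general-purpose optim, not the estimation code of this report.

```r
set.seed(1)
n <- 2000
x <- rnorm(n)
beta_true <- c(0.5, -1)   # intercept and slope, made up for the example
y <- as.integer(beta_true[1] + beta_true[2] * x + rnorm(n) > 0)

# Negative log-likelihood of the binary probit: sum of -log P(Y = y_i | beta, x_i).
negLL <- function(beta) {
  eta <- beta[1] + beta[2] * x
  -sum(pnorm(ifelse(y == 1, eta, -eta), log.p = TRUE))
}

# Minimize -LL and ask for a numerical Hessian at the optimum.
fit <- optim(c(0, 0), negLL, method = "BFGS", hessian = TRUE)
beta_hat <- fit$par

# Standard errors: square roots of the diagonal of the inverse Hessian of -LL.
se <- sqrt(diag(solve(fit$hessian)))
print(cbind(estimate = beta_hat, std.error = se))
```

Here the LL function is smooth, so both the optimization and the Hessian are unproblematic; the difficulty discussed in the text arises when the choice probabilities themselves have to be simulated.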


10 Computations

10.1 Multivariate Normal CDF

We need to be able to calculate the probability of each choice. We have seen that this reduces to:

$$P(Y = j) = P(\delta^{(-j)} < d^{(j-)})$$

Since the vector $\delta^{(-j)}$ follows a normal distribution, this is simply the cumulative distribution function (CDF) of the multivariate normal distribution, for which no closed form exists. We will try two estimators for it.

10.2 Monte Carlo

The first estimator that we analyze is the naive acceptance-rejection method:

$$P(N(0, \Sigma) < d) \approx \frac{1}{B} \sum_{b=1}^{B} I(x_b < d)$$

where the $x_b$ are draws from the normal distribution $N(0, \Sigma)$. Drawing from the multivariate normal distribution is easy:

$$x_b = Lz + \mu \sim N(\mu,\ \Sigma = LL^T)$$

if z follows a multivariate standard normal distribution. z is easy to generate since all its components are independent.

Because of the random character of this estimator, we must use the same draws of z from one evaluation of the log-likelihood to another. This introduces bias in our estimation but also ensures that function evaluations are deterministic. The storage requirement for the draws, in double precision, is 8(k-1)Bn bytes. This amounts to 160 MB for a model with 5 alternatives, 1,000 simulation draws and 5,000 observations in the sample, which is a reasonable amount of memory. We do not discuss the issue of estimation bias.
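The estimator above can be sketched in a few lines of base R. The helper sim_prob is hypothetical and is not the implementation used in this report; the test case is a two-dimensional normal with correlation 0.5, for which the probability that both components are negative is known to be 1/3.

```r
# Hypothetical helper: frequency simulator of P(N(0, L L^T) < d).
# z is a (k-1) x B matrix of standard normal draws, generated once and reused.
sim_prob <- function(d, L, z) {
  x <- L %*% z                      # each column is one draw from N(0, L L^T)
  mean(colSums(x < d) == nrow(x))   # share of draws with every component below d
}

set.seed(42)
B <- 100000
z <- matrix(rnorm(2 * B), nrow = 2)
Sigma <- matrix(c(2, 1, 1, 2), 2, 2)
L <- t(chol(Sigma))                 # chol() returns the upper factor
p_hat <- sim_prob(c(0, 0), L, z)
print(p_hat)                        # to be compared with the exact value 1/3

# Reusing the same z makes repeated evaluations identical.
stopifnot(identical(sim_prob(c(0, 0), L, z), p_hat))

# Storage for the draws in double precision: 8 (k - 1) B n bytes.
k <- 5; B_draws <- 1000; n <- 5000
mem_mb <- 8 * (k - 1) * B_draws * n / 1e6
print(mem_mb)                       # 160
```

The last lines reproduce the memory figure quoted in the text for 5 alternatives, 1,000 draws and 5,000 observations.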

The acceptance-rejection algorithm is fairly easy to implement:

    // Frequency simulator for one observation: the share of simulated error
    // differences that all fall below the utility differences d.
    double Probit::probaSim(const arma::rowvec& d, const arma::mat& L
                            , int choice, int obs){
      // correlated draws for this observation (one column per simulation)
      arma::mat err = L * z.at(obs);
      int nsucc = 0;
      for(int i = 0; i < nsim; ++i){
        bool is_suc = true;
        for(int diff = 0; diff < nalt - 1; ++diff)
          if(err(diff,i) > d(diff)){
            is_suc = false;
            break;
          }
        if(is_suc)
          nsucc++;
      }

      if(nsucc)
        return ((double) nsucc) / ((double) nsim);
      else
        // cannot return a 0 probability, so we return 0.1 success / nsim
        return 0.1 / ((double) nsim);
    }

code sim.cpp

The major drawback of the algorithm is that it provides a highly non-smooth estimate of the LL function. We generated an artificial sample of 1,000 observations, found the maximum and plotted the LL function with respect to $\beta_1$ and $L_{2,2}$, both at the maximum and at the starting value of the maximization, to appreciate the smoothness, or lack thereof, of the function. The plots are in figure 1.

We note a few things:

• The function is not smooth enough to calculate centered difference derivatives, but if we take a big enough step (h = 0.1) we can estimate derivatives well enough to perform some optimization. This is the method suggested by Train.

• The function is flat around the max, so any maximum found will be fuzzy.

• It will not be possible to estimate the Hessian matrix at the max.

• The estimate is much smoother for $\beta_1$ than for $L_{2,2}$.
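The first point can be reproduced on a one-dimensional toy problem. The base-R sketch below is an illustration only: the function names are assumptions, and the simulated quantity is P(N(0,1) < d), whose exact derivative with respect to d is the standard normal density.

```r
set.seed(1)
B <- 10000
draws <- rnorm(B)                      # fixed draws, reused in every evaluation

# Simulated P(N(0, 1) < d): a step function of d.
p_sim <- function(d) mean(draws < d)

# Centered difference derivative with step h.
centered_diff <- function(f, x, h) (f(x + h) - f(x - h)) / (2 * h)

exact  <- dnorm(0)                          # true derivative at d = 0
d_tiny <- centered_diff(p_sim, 0, 1e-9)     # tiny step: no draw falls inside the interval
d_big  <- centered_diff(p_sim, 0, 0.1)      # big step: averages over many draws
print(c(exact = exact, tiny_step = d_tiny, big_step = d_big))
```

With a tiny step the simulated function is locally constant and the derivative estimate collapses to zero, while the step h = 0.1 gives a rough but usable value.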

10.3 Genz’s Algorithm

The competing algorithm was suggested by Genz and involves quasi-Monte Carlo sampling and other transformations. It transforms subspaces of the normal distribution's support into unit hypercubes, to which quasi-Monte Carlo sampling is applied. The algorithm is very difficult to implement and is delivered in FORTRAN. We translated this code to C++ using f2c and added minor linking directives, as well as a random generator reset function. The reset is necessary to make sure that the LL function evaluations are deterministic, even if the variance in function evaluations is very small.

As shown in the plots in figure 2, Genz's algorithm produces a smooth estimate of the LL function. This comes at a high cost, since one run of the algorithm takes as long as computing 30,000 simulations with the ordinary Monte Carlo. This looks like a steep price; however, we definitely get what we pay for, and more, since we ultimately obtain a good estimator of the probit model, along with standard errors of the estimates. Note also that the ordinary Monte Carlo makes intensive use of highly optimized multi-threaded matrix computation from OpenBLAS, whereas Genz's algorithm is used as delivered and leaves a lot of potential for improvement.
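To give an idea of why such a transformation produces a smooth estimate, the sketch below implements only the sequential conditioning step that maps the integration region to the unit hypercube. It is not Genz's FORTRAN code: it uses plain pseudo-random uniforms instead of quasi-Monte Carlo points, it omits the variable reordering and error control of the full algorithm, and the helper name seq_prob is an assumption.

```r
# Hypothetical helper: P(N(0, L L^T) < d) by sequential conditioning.
# w is a B x (m - 1) matrix of uniform draws in (0, 1), fixed across evaluations.
seq_prob <- function(d, L, w) {
  m <- length(d)
  z <- matrix(0, nrow(w), m)
  p <- rep(1, nrow(w))
  for (i in seq_len(m)) {
    # Contribution of the components already drawn to the i-th bound.
    shift <- if (i > 1) drop(z[, 1:(i - 1), drop = FALSE] %*% L[i, 1:(i - 1)]) else 0
    e <- pnorm((d[i] - shift) / L[i, i])     # conditional probability of the i-th bound
    p <- p * e
    if (i < m) z[, i] <- qnorm(w[, i] * e)   # z_i drawn from the region that satisfies it
  }
  mean(p)
}

set.seed(7)
w <- matrix(runif(20000), ncol = 1)
Sigma <- matrix(c(2, 1, 1, 2), 2, 2)
L <- t(chol(Sigma))
p0 <- seq_prob(c(0, 0), L, w)
print(p0)                                    # to be compared with the exact value 1/3

# A tiny change in d produces a tiny change in the estimate.
step <- seq_prob(c(1e-6, 0), L, w) - p0
print(step)
```

Each draw contributes a product of normal CDF values rather than a 0/1 indicator, so the estimate varies continuously with d and with L, which is what makes derivatives and Hessians computable.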


[Figure 1, plots not reproduced: four panels titled "LL at max for simulation" and "LL at start for simulation", each shown against β1 and against L23, with the log-likelihood on the vertical axis.]

Figure 1: Smoothness of Acceptance-Rejection Algorithm


[Figure 2, plots not reproduced: four panels titled "LL at max for pmvnorm" and "LL at start for pmvnorm", each shown against β1 and against L23, with the log-likelihood on the vertical axis.]

Figure 2: Smoothness of Genz's Algorithm


11 Sample

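To validate the suggested estimators, a sample of size n = 1000 is generated artificially. Variables $X_1, \dots, X_8$ are generated following a standard normal distribution (mean 0 and variance 1). Utilities are specified by:

$$\begin{aligned}
U_1 &= \beta_1 X_1 + \beta_2 X_5 + \varepsilon_1 \\
U_2 &= \beta_1 X_2 + \beta_3 X_6 + \varepsilon_2 \\
U_3 &= \beta_1 X_3 + \beta_4 X_7 + \varepsilon_3 \\
U_4 &= \beta_1 X_4 + \beta_5 X_8 + \varepsilon_4
\end{aligned}$$

The value of the parameter $\beta = (\beta_1, \beta_2, \beta_3, \beta_4, \beta_5)^T$ is $(-1, 1, 1, 1, 1)^T$. The differences in error terms $\delta^{(-1)}$ follow a multivariate normal distribution:

$$\delta^{(-1)} \sim N\left(0,\ \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}\right)$$

The corresponding Cholesky matrix to estimate is:

$$L = \begin{pmatrix} \sqrt{2} & 0 & 0 \\ 0.707 & 1.225 & 0 \\ 0.707 & 0.408 & 1.155 \end{pmatrix}$$

where the $\sqrt{2}$ element does not need to be estimated.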
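The entries of this Cholesky matrix can be verified with base R. Note that chol() returns the upper triangular factor, so it is transposed here to obtain the lower triangular L used in the text.

```r
Sigma <- matrix(c(2, 1, 1,
                  1, 2, 1,
                  1, 1, 2), nrow = 3, byrow = TRUE)

# Lower-triangular Cholesky factor: L %*% t(L) = Sigma.
L <- t(chol(Sigma))
print(round(L, 3))

# The product of the factor with its transpose gives back Sigma.
stopifnot(isTRUE(all.equal(L %*% t(L), Sigma)))
```

The first element is $\sqrt{2} \approx 1.414$, consistent with the normalization $\Sigma_{1,1} = 2$.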

We do not expect to recover the real parameters since we work with a finite sample with error terms. We want to compute estimates that are close to the true values. This is not a numerical issue, as the maximum of the LL function is not the true value of the parameters. We use the original parameters as a guideline to ensure that our algorithm is implemented properly.
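A sample of this kind can be generated with base R as sketched below. Two points are assumptions and not statements from the report: each alternative's specific coefficient is taken to multiply its own variable ($X_5$ to $X_8$), and the errors are taken to be independent standard normals, which is one error structure that yields the covariance of differences given above (2 on the diagonal, 1 elsewhere). The seed is arbitrary.

```r
set.seed(2015)
n <- 1000
X <- matrix(rnorm(n * 8), n, 8)   # X1, ..., X8 ~ N(0, 1)
beta <- c(-1, 1, 1, 1, 1)         # generic coefficient, then four specific ones

# Deterministic utilities: beta[1] on X1..X4, beta[2..5] on X5..X8 (assumption).
V <- beta[1] * X[, 1:4] + sweep(X[, 5:8], 2, beta[2:5], `*`)

# Assumption: iid N(0, 1) errors, so that Cov(delta(-1)) = I + 1 1^T.
eps <- matrix(rnorm(n * 4), n, 4)
U <- V + eps
Y <- max.col(U, ties.method = "first")   # index of the largest utility in each row

# Empirical covariance of the error differences against the first alternative.
delta1 <- eps[, 2:4] - eps[, 1]
print(round(cov(delta1), 2))
print(table(Y))
```

The empirical covariance of the differences is close to, but not equal to, the theoretical matrix, which illustrates why the estimates are not expected to match the true parameters exactly.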


12 Results

12.1 Coefficients

We use both methods to try to retrieve the original parameters using the starting valuesdefined in figure 3.

The number of simulation is set to B = 1000. We obtain the estimates in table 6.

Results in table 6 present the Monte Carlo method in a very good light. Typically MonteCarlo performs well, at best, on an artificial sample and poorly on an actual sample.

We recall that the convergence of optimization algorithms can depend on the choice of startingvalues. To illustrate this issue we attempt to solve the problem with each method using fiverandomly chosen starting vectors. These values are selected randomly within the [–5, 5]interval.

As expected, results for Genz's algorithm presented in table 7 appear to be independent of the starting values used in the optimization. Even if we are generally able to find good starting values, for instance using output from the logit model, having a robust estimation method provides additional security. Also, covariance matrices are typically difficult to estimate because there appear to be many points where the derivative with respect to some of their components is close to zero.

Results from table 8 are not satisfying. We observe a lot of variation in the estimated Cholesky matrix and the results for the first starting vector are completely off. This demonstrates that the estimated LL function is not smooth and that the solver gets stuck at sub-optimal points. Note that we used the same simulation draws for all five starting values.
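Why a simulated objective can trap a gradient-based solver is easiest to see in one dimension. The sketch below assumes a crude frequency simulator (count the share of fixed draws that fall below a threshold); this is our simplification and may differ from the simulator used for table 8. With the draws held fixed, the simulated probability is a step function of the parameter, so its finite-difference derivative is zero almost everywhere.

```r
set.seed(7)
B <- 1000            # number of simulation draws, as in the text
draws <- rnorm(B)    # drawn once and reused for every parameter value

# Crude frequency simulator of P(eps < b): a step function of b
p_sim <- function(b) mean(draws < b)

# Smooth reference: the exact normal CDF
p_exact <- function(b) pnorm(b)

# Centered finite difference with a small step
centered_diff <- function(f, b, h = 1e-6) (f(b + h) - f(b - h)) / (2 * h)

b0 <- 0.3
# Zero unless a draw happens to fall within h of b0, in which case the
# value jumps to at least 1 / (2 * h * B)
d_sim   <- centered_diff(p_sim, b0)
# Close to the normal density at b0
d_exact <- centered_diff(p_exact, b0)
print(c(simulated = d_sim, exact = d_exact, density = dnorm(b0)))
```

A solver that sees a zero gradient has no direction to move in, which is consistent with estimates that stay far from the optimum for some starting vectors.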

It is much more difficult to evaluate the convergence of a covariance matrix because it describes error terms. Typically the researcher has already put everything known about the dependent variable into the predictors, and so cannot independently judge the quality of the estimated covariance matrix. It is easier to appreciate estimated coefficients (the βi's) since they have a meaning in the equations. For example, if we want to predict travel mode (bicycle, car or bus), we have an idea of how people trade off between travel time and cost, so we can compare these coefficients and see if the behavior of people appears to be reasonable. Therefore we cannot

$$
\beta_{start} = (0, 0, 0, 0, 0)^T, \qquad
L_{start} = \begin{pmatrix} \sqrt{2} & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix}
$$

Figure 3: starting values


Table 6: Comparison of Genz's algorithm and Monte Carlo

| parameter | true value | Genz | Monte Carlo |
|---|---|---|---|
| β1 | -1 | -0.927 | -0.926 |
| β2 | 1 | 1.043 | 1.045 |
| β3 | 1 | 0.820 | 0.821 |
| β4 | 1 | 0.893 | 0.869 |
| β5 | 1 | 0.943 | 0.948 |
| L2,1 | 0.707 | 0.678 | 0.697 |
| L2,2 | 1.225 | 1.249 | 1.256 |
| L3,1 | 0.707 | 0.545 | 0.542 |
| L3,2 | 0.408 | 0.499 | 0.493 |
| L3,3 | 1.155 | 0.945 | 0.937 |

Table 7: Five optimizations with five different starting values - Genz's algorithm

| parameter | true value | 1st start | 2nd start | 3rd start | 4th start | 5th start |
|---|---|---|---|---|---|---|
| β1 | -1 | -0.93 | -0.93 | -0.93 | -0.93 | -0.93 |
| β2 | 1 | 1.04 | 1.04 | 1.04 | 1.04 | 1.04 |
| β3 | 1 | 0.82 | 0.82 | 0.82 | 0.82 | 0.82 |
| β4 | 1 | 0.89 | 0.89 | 0.89 | 0.89 | 0.89 |
| β5 | 1 | 0.94 | 0.94 | 0.94 | 0.94 | 0.94 |
| L2,1 | 0.71 | 0.68 | 0.68 | 0.68 | 0.68 | 0.68 |
| L2,2 | 1.23 | 1.25 | 1.25 | 1.25 | 1.25 | 1.25 |
| L3,1 | 0.71 | 0.55 | 0.55 | 0.55 | 0.54 | 0.55 |
| L3,2 | 0.41 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| L3,3 | 1.16 | 0.95 | 0.95 | 0.94 | 0.94 | 0.94 |


Table 8: Five optimizations with five different starting values - Monte Carlo

| parameter | true value | 1st start | 2nd start | 3rd start | 4th start | 5th start |
|---|---|---|---|---|---|---|
| β1 | -1 | -6.34 | -0.93 | -0.95 | -0.91 | -0.9 |
| β2 | 1 | 5.34 | 1.02 | 1.08 | 1.02 | 1 |
| β3 | 1 | 3.59 | 0.83 | 0.84 | 0.81 | 0.77 |
| β4 | 1 | 5.2 | 0.86 | 0.9 | 0.86 | 0.83 |
| β5 | 1 | 4.84 | 0.9 | 0.96 | 0.91 | 0.87 |
| L2,1 | 0.71 | -10 | 0.7 | 0.74 | 0.74 | 0.65 |
| L2,2 | 1.23 | 0.94 | 1.24 | 1.31 | 1.23 | 1.17 |
| L3,1 | 0.71 | -1.71 | 0.53 | 0.56 | 0.56 | 0.62 |
| L3,2 | 0.41 | -4.08 | 0.52 | 0.5 | 0.48 | 0.42 |
| L3,3 | 1.16 | 10 | -0.91 | -0.96 | -0.92 | -0.91 |

satisfy ourselves with a method that does not provide a reliable estimate of the Cholesky matrix.
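One informal way to judge an estimated Cholesky matrix is to rebuild the covariance it implies and compare it with the covariance used to generate the data. The R sketch below does this for the Genz column of table 6; the names `L_hat` and `Sigma_hat` are ours, and the comparison is only a descriptive check, not a formal test.

```r
# Cholesky estimates from the Genz column of table 6; the (1,1) element is fixed
L_hat <- matrix(c(sqrt(2), 0,     0,
                  0.678,   1.249, 0,
                  0.545,   0.499, 0.945), nrow = 3, byrow = TRUE)

# True covariance of the error differences used to generate the data
Sigma_true <- matrix(c(2, 1, 1,
                       1, 2, 1,
                       1, 1, 2), nrow = 3, byrow = TRUE)

# Implied covariance and its element-wise deviation from the truth
Sigma_hat <- L_hat %*% t(L_hat)
print(round(Sigma_hat, 3))
print(round(Sigma_hat - Sigma_true, 3))
```

Working on the covariance scale also removes the sign ambiguity of the Cholesky elements, since a row of L and its negative imply the same variance.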

12.2 Standard Errors

The last point that we want to explore is the computation of t-statistics. In maximum likelihood estimation, if a random variable X is parametrized by θ, then the maximum likelihood estimator of θ converges in distribution to a normal:

$$
\hat{\theta}_{mle} \xrightarrow{d} N(\theta, H^{-1})
$$

where H is the Hessian matrix of the negative LL function (the observed information) evaluated at the maximum of the LL. Therefore it is very important to be able to estimate the Hessian matrix, but only at the maximum. It is clear from the plots that all hope of estimating it with the Monte Carlo method is lost. We need a very reliable estimate of the Hessian matrix because we need to invert it. Among the problems that can arise are a non-invertible matrix, or an inverse that is not positive semi-definite. In the latter case $H^{-1}$ would not describe a covariance matrix. The worst case is when $H^{-1}$ exists and is positive semi-definite but completely off, because we may fail to diagnose a failure in the estimation when we obtain plausible-looking numbers for it.

We use centered differences with h = 0.001 to estimate the elements of H. The analysis of the full structure of the covariance of $\hat{\theta}_{mle}$ is tedious, so we only analyze its diagonal elements. They correspond to the estimated variance of the estimator, thus ignoring covariance effects between pairs of predictors. Reporting the covariance between pairs of estimates is not standard practice in statistics, so we lose little by ignoring it. The square roots of the diagonal elements are called the standard errors (and not standard deviations, because they are estimated).
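The computation described above can be sketched in base R as follows. The function `num_hessian` and the stand-in model (an i.i.d. normal sample with parameters mean and log standard deviation) are our assumptions for illustration; they are not the report's implementation. The objective is the negative LL, so its Hessian at the optimum is positive definite when the fit is well behaved.

```r
# Hessian of f at x by centered differences with step h
num_hessian <- function(f, x, h = 0.001) {
  k <- length(x)
  H <- matrix(0, k, k)
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      ei <- replace(numeric(k), i, h)   # step h along coordinate i
      ej <- replace(numeric(k), j, h)   # step h along coordinate j
      H[i, j] <- (f(x + ei + ej) - f(x + ei - ej) -
                  f(x - ei + ej) + f(x - ei - ej)) / (4 * h^2)
    }
  }
  (H + t(H)) / 2  # symmetrize to remove rounding asymmetry
}

# Stand-in model: i.i.d. normal sample, parameters (mean, log sd)
set.seed(123)
y <- rnorm(400, mean = 2, sd = 1.5)
neg_ll <- function(theta) -sum(dnorm(y, theta[1], exp(theta[2]), log = TRUE))

# Start from the sample moments to keep the optimization trivial
fit <- optim(c(mean(y), log(sd(y))), neg_ll, method = "BFGS")

H <- num_hessian(neg_ll, fit$par)   # Hessian of the negative LL at the optimum
V <- solve(H)                       # estimated covariance of the estimator
std_err <- sqrt(diag(V))            # standard errors from the diagonal only
print(rbind(estimate = fit$par, std_error = std_err))
```

Checking that all eigenvalues of `H` are positive before calling `solve()` is a cheap way to catch the failure modes listed above.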


Table 9: Genz's algorithm with standard errors

| parameter | estimate | standard error |
|---|---|---|
| β1 | -0.9267 | 0.0561 |
| β2 | 1.0426 | 0.0819 |
| β3 | 0.8197 | 0.0801 |
| β4 | 0.8927 | 0.0815 |
| β5 | 0.9427 | 0.0793 |
| L2,1 | 0.6782 | 0.1387 |
| L2,2 | 1.2486 | 0.1075 |
| L3,1 | 0.5453 | 0.1321 |
| L3,2 | 0.4994 | 0.1441 |
| L3,3 | 0.9451 | 0.0867 |
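The t-statistics follow directly from table 9. The R sketch below computes them for the null hypothesis that each parameter equals zero, together with normal-approximation 95% intervals; the choice of null value and the column names are our assumptions.

```r
# Estimates and standard errors copied from table 9
tab9 <- data.frame(
  parameter = c("beta1", "beta2", "beta3", "beta4", "beta5",
                "L21", "L22", "L31", "L32", "L33"),
  estimate  = c(-0.9267, 1.0426, 0.8197, 0.8927, 0.9427,
                0.6782, 1.2486, 0.5453, 0.4994, 0.9451),
  std_error = c(0.0561, 0.0819, 0.0801, 0.0815, 0.0793,
                0.1387, 0.1075, 0.1321, 0.1441, 0.0867)
)

# t-statistic for H0: parameter = 0, and a normal-approximation 95% interval
tab9$t_stat <- tab9$estimate / tab9$std_error
tab9$lower  <- tab9$estimate - qnorm(0.975) * tab9$std_error
tab9$upper  <- tab9$estimate + qnorm(0.975) * tab9$std_error
print(tab9, digits = 3)
```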

There are many reasons why we are satisfied with these standard errors. First, they are all defined. Second, their scale makes sense. We expect the standard error of β1 to be smaller than the standard error of all other βi. This is because β1 is required to estimate the utility of all observed dependent variables, whereas all other βi only affect one utility so, informally speaking, we accumulated less information about them in the sample. Third, the elements of the Cholesky matrix have higher standard errors, and we do expect them to converge more slowly than the coefficients. Finally, these numbers are comparable to what we generally obtain from similar models with similar sample sizes.

12.3 Bootstrap Variance Estimation

One way to further improve our confidence that the standard errors are good would be to generate a sample of estimates. An estimate is a realization of an estimator, and an estimator is a random variable. We would proceed by generating a sample of samples from the same coefficients. For example, we could generate B∗ = 500 samples and calculate the maximum likelihood estimate of each of them. That would give us a sample from the distribution of the estimator, from which we could calculate some of its properties, like expectation and variance. This technique is called the bootstrap.

If we relied on pure Monte Carlo, we would have to use bootstrap variance estimation to estimate standard deviations. This is why we think that the overhead of Genz's algorithm is acceptable, since it removes one layer of simulation.
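The procedure described above can be sketched in base R. To keep the example self-contained we assume a stand-in binary probit fitted with `glm()` instead of the multinomial probit of this report; the names `one_estimate`, `B_star` and `est` are hypothetical.

```r
set.seed(2015)
n      <- 500            # observations per generated sample
B_star <- 500            # number of generated samples, as in the text
beta   <- c(0.5, 1.0)    # true coefficients of the stand-in model

# Generate one sample from the true model and return its ML estimate
one_estimate <- function() {
  x <- rnorm(n)
  y <- as.integer(beta[1] + beta[2] * x + rnorm(n) > 0)
  coef(glm(y ~ x, family = binomial(link = "probit")))
}

# B_star realizations of the estimator, one row per generated sample
est <- t(replicate(B_star, one_estimate()))

boot_mean <- colMeans(est)        # approximates the expectation of the estimator
boot_sd   <- apply(est, 2, sd)    # approximates its standard deviation
print(rbind(mean = boot_mean, sd = boot_sd))
```

Each replication requires a full estimation, so the cost grows linearly with the number of generated samples. This is the extra layer of simulation that a reliable Hessian makes unnecessary.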


13 Conclusion

We have seen the difficulties of estimating the probit model with maximum likelihood estimation. We discussed two alternatives to estimate the coefficients of the model and saw that the algorithm developed by Genz is much more robust.

We ignored other approaches to parameter estimation. In particular, Bayesian statistics provide a very clever way to fit a probit model efficiently, using data augmentation and the Bernstein-von Mises theorem. However, these methods are less systematic than frequentist statistics and require sheer creativity for each model.

The next step is to combine several models together. In particular, we want to combine a continuous variable with a categorical variable and model them jointly. The maximum likelihood approach that we investigated will allow us to make this transition smoothly. One research problem that we aim to solve is the modeling of car holdings and use, which has important implications in civil engineering and policy analysis.


14 Discrete Continuous Ordered Probit

    library(dcmodels)

    # generate a data set
    nObs = 100
    nVars = 10
    type = "norm"
    seed = 1234
    D = genData(nObs, nVars, type, seed)

    # generate an ordered probit (OP) response
    Z = c("X1", "X2", "X3", "X4", "X5")
    a = c(1, 1)
    bOp = c(1, -1, 0.5, 0.5, 2)
    D$Yop = genOprobit(Z, bOp, a, D)

    # generate a regression. this is not correlated with the OP so we expect a correlation of 0
    bReg = c(1, 1, 1, 2, -2)
    sReg = 5
    reg = c("X6", "X7", "X8", "X9", "X10")
    D$Yreg = as.matrix(D[, reg]) %*% bReg + rnorm(nObs, 0, sReg)

    # DC2 specs
    specDC2 = list(
      op = Z,
      reg = reg,
      Yop = "Yop",
      Yreg = "Yreg"
      #Y = "Yop",
      #Z = c("X1","X2","X3","X4","X5")
    )

    spec = specDC2
    modelFns = dc2

    # estimate the joint ordered probit / regression model
    dc2Model = model(dc2, specDC2, D)

    # export predictors and responses as plain text files
    write.table(D[, c("X1", "X2", "X3", "X4", "X5")], "~/xdisc.txt", row.names = FALSE, col.names = FALSE)
    write.table(D[, c("X6", "X7", "X8", "X9", "X10")], "~/xcont.txt", row.names = FALSE, col.names = FALSE)
    write.table(D$Yop, "~/ydisc.txt", row.names = FALSE, col.names = FALSE)
    write.table(D$Yreg, "~/ycont.txt", row.names = FALSE, col.names = FALSE)

../R/examples/exDC2.R


15 Discrete Continuous Probit

    library(dcmodels)

    # generate a data set
    nObs = 500
    nVars = 10
    type = "norm"
    seed = 1234
    D = genData(nObs, nVars, type, seed)

    # model spec
    generic = list(c("X1", "X2", "X3"))
    specific = list(c("X4"), c("X5"), c("X6"))
    ASC = FALSE # to keep it fast

    # utility coefficients. typically you don't know that
    # but we're generating choices here
    b = c(-1, 1, 1, 1)
    # covariance of differences, typically you don't know that either
    SDiff = matrix(c(1, 0.5, 0.5, 1), 2, 2)
    D$Ypr = genChoiceProbit(generic, specific, b, D, NULL, SDiff, ASC)

    # generate a regression. this is not correlated with the probit choice so we expect a correlation of 0
    bReg = c(1, 1, 2, -2)
    sReg = 5
    reg = c("X7", "X8", "X9", "X10")
    D$Yreg = as.matrix(D[, reg]) %*% bReg + rnorm(nObs, 0, sReg)

    # DC4 specs
    specDC4 = list(
      reg = reg,
      generic = generic,
      specific = specific,
      method = "genz",
      ASC = TRUE,
      Yreg = "Yreg",
      Ypr = "Ypr",
      SD = "none"
    )

    # estimate the joint probit / regression model
    dc4Model = model(dc4, specDC4, D)

../R/examples/exDC4.R


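$$
Z_i(t)
$$

$$
H_{it} = Z_i(t)\,\alpha_z + G_i\,\alpha_g + \tau_i(t)\,\alpha_\tau = M_i(t)\,\alpha
$$

$$
U_{static} = V + \varepsilon = X\beta + \varepsilon
$$

$$
V_{itj} = X_i(t)\,\beta_j + Z_i(t)\,\gamma_j
$$

$$
V = f(D, \tau, G, Z)
$$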


$$
v_{it} = \max_j V_{itj}
$$


$$
r_t = \alpha\, r_{t-1} + \gamma + \varepsilon_t
$$


[Figure: Comparison of Observed vs. Estimated evacuations. Categories: No Evac, Forecast 1, Forecast 2, Forecast 3, Forecast 4. Series: Observed, Dyn. Logit with PK, Dyn. Logit AR(1). Vertical axis: 0% to 45%.]
