Download - Core Training Presentations- 3 Estimating an Ag Database using CE Methods

National Accounts and SAM Estimation Using Cross-Entropy Methods

Sherman Robinson

Estimation Problem

• Partial equilibrium models such as IMPACT require balanced and consistent datasets that represent disaggregated production and demand by commodity

• Estimating such a dataset requires an efficient method to incorporate and reconcile information from a variety of sources

Primary Data Sources for IMPACT Base Year

• FAOSTAT for country totals for:
– Production: Area, Yields and Supply
– Demand: Total, Food, Intermediate, Feed, Other Demands
– Trade: Exports, Imports, Net Trade
– Nutrition: Calories per capita, calories per kg of commodity

• AQUASTAT for country irrigated and rainfed production

• SPAM pixel-level estimation of global allocation of production

Estimating a Consistent and Disaggregated Database

• Step 1: Estimate IMPACT Country Database
– Input: FAOSTAT

• Step 2: Estimate Technology Disaggregated Production
– Inputs: IMPACT Country Database, FAO AQUASTAT

• Step 3: Estimate Geographic Disaggregated Production
– Inputs: Technology Disaggregated Production, SPAM

Bayesian Work Plan

• Source data (FAO, SPAM)

• Priors on values and estimation errors of production, demand, and trade

• Estimation by Cross-Entropy Method

• Check results against priors and identify potential data problems

• New information to correct identified problems

• Feedback to data source

Information Theory Approach

• Goal is to recover parameters and data we observe imperfectly. Estimation rather than prediction.

• Assume very little information about the error generating process and nothing about the functional form of the error distribution.

• Very different from standard statistical approaches (e.g., econometrics).
– Usually have lots of data

Estimation Principles

• Use all the information you have.

• Do not use or assume any information you do not have.

• Arnold Zellner: “Efficient Information Processing Rule (IPR).”

• Close links to Bayesian estimation

Information Theory

• Need to be flexible in incorporating information in parameter/data estimation
– Lots of different forms of information

• In classic statistics, “information” in a data set can be summarized by the moments of the distribution of the data
– Summarizes what is needed for estimation

• We need a broader view of “estimation” and need to define “information”

An analogy from physics

initial state of motion → [Force] → final state of motion

Force is whatever induces a change of motion: $F = \frac{dp}{dt}$

Inference is dynamics as well

old beliefs → [information] → new beliefs

“Information” is what induces a change in rational beliefs.

Information Theory

• Suppose an event E will occur with probability p. What is the information content of a message stating that E occurs?

• If p is “high”, event occurrence has little “information.” If p is low, event occurrence is a surprise, and contains a lot of information
– Content of the message is not the issue: amount, not meaning, of information

Information Theory

• Shannon (1948) developed a formal measure of “information content” of the arrival of a message (he worked for AT&T)

$$h(p) = \log(1/p)$$

IF $p = 1$ then $h(p) = 0$; IF $p = 0$ then $h(p) = \infty$

Information Theory

• For a set of events, the expected information content of a message before it arrives is the entropy measure:

$$H(p) = \sum_{k=1}^{n} p_k\, h(p_k) = -\sum_{k=1}^{n} p_k \log(p_k) \quad \text{and} \quad \sum_{k} p_k = 1$$
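The two measures above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original material: the function names and the two example distributions are hypothetical, and natural logarithms are assumed (the slides do not fix the base).

```python
import math

def information_content(p: float) -> float:
    """Shannon information h(p) = log(1/p) of an event with probability p."""
    if p <= 0.0:
        return math.inf  # an impossible event would be an infinite surprise
    return math.log(1.0 / p)

def entropy(probs) -> float:
    """Expected information H(p) = -sum_k p_k log p_k, taking 0*log(0) as 0."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical distributions over four events
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.85, 0.05, 0.05, 0.05]
h_uniform = entropy(uniform)   # equals log(4), the largest value for four events
h_peaked = entropy(peaked)     # smaller: the outcome is less of a surprise on average
```

The uniform case has the higher entropy because every message is equally surprising; concentrating probability on one event lowers the expected information.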


Claude Shannon

E.T. Jaynes

• Jaynes proposed using the Shannon entropy measure in estimation

• Maximum entropy (MaxEnt) principle:
– Out of all probability distributions that are consistent with the constraints, choose the one that has maximum uncertainty (maximizes the Shannon entropy metric)

• Idea of estimating probabilities (or frequencies)
– In the absence of any constraints, entropy is maximized for the uniform distribution

E.T. Jaynes

Estimation With a Prior

• The estimation problem is to estimate a set of probabilities that are “close” to a known prior and that satisfy various known moment constraints.

• Jaynes suggested using the criterion of minimizing the Kullback-Leibler “cross entropy” (CE) “divergence” between the estimated probabilities and the prior.


Cross Entropy Estimation

Minimize:

$$\sum_{k} p_k \log\frac{p_k}{\bar{p}_k} = \sum_{k} p_k \log p_k - \sum_{k} p_k \log \bar{p}_k$$

where $\bar{p}_k$ is the prior probability.

“Divergence”, not “distance”. Measure is not symmetric and does not satisfy the triangle inequality. It is not a “norm”.
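The lack of symmetry is easy to see numerically. The sketch below is illustrative only: the function name and the two three-element distributions are made up for the example, and natural logarithms are assumed.

```python
import math

def cross_entropy_divergence(p, prior):
    """Kullback-Leibler divergence sum_k p_k * log(p_k / prior_k)."""
    total = 0.0
    for pk, qk in zip(p, prior):
        if pk > 0.0:
            if qk <= 0.0:
                return math.inf  # estimate puts mass where the prior has none
            total += pk * math.log(pk / qk)
    return total

# Hypothetical estimate and prior
p = [0.7, 0.2, 0.1]
prior = [0.4, 0.4, 0.2]

d_forward = cross_entropy_divergence(p, prior)   # divergence of p from the prior
d_reverse = cross_entropy_divergence(prior, p)   # roles swapped: a different number
d_self = cross_entropy_divergence(p, p)          # zero when estimate equals prior
```

Swapping the arguments changes the value, which is why the measure is described as a divergence rather than a distance.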

MaxEnt vs Cross-Entropy

• If the prior is specified as a uniform distribution, the CE estimate is equivalent to the MaxEnt estimate

• Laplace’s Principle of Insufficient Reason: In the absence of any information, you should choose the uniform distribution, which has maximum uncertainty
– Uniform distribution as a prior is an admission of “ignorance”, not knowledge
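The equivalence can be checked with a short worked derivation. It assumes a uniform prior $\bar{p}_k = 1/K$ over $K$ outcomes and writes $H(p)$ for the Shannon entropy; the symbols are chosen for this illustration.

```latex
\sum_{k=1}^{K} p_k \ln\frac{p_k}{1/K}
  = \sum_{k=1}^{K} p_k \ln p_k + \ln K \sum_{k=1}^{K} p_k
  = \ln K - H(p)
```

Because $\ln K$ is a constant, minimizing the cross entropy against a uniform prior selects the same probabilities as maximizing $H(p)$.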

Cross Entropy Measure

• Two kinds of information
– Prior distribution of the probabilities
– Moments of the distribution

• Can know any moments
– Can also specify inequalities
– Moments with error will be considered
– Summary statistics such as quantiles

Cross-Entropy Measure

Minimize

$$\sum_{k=1}^{K} p_k \ln\frac{p_k}{\bar{p}_k}$$

subject to constraints (information) about moments

$$\sum_{k=1}^{K} p_k\, x_{t,k} = y_t, \quad t = 1, \dots, T$$

and the adding-up constraint (finite distribution)

$$\sum_{k=1}^{K} p_k = 1$$

Lagrangian

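$$L = \sum_{k=1}^{K} p_k \ln\frac{p_k}{\bar{p}_k} + \sum_{t=1}^{T} \lambda_t \left( y_t - \sum_{k=1}^{K} p_k\, x_{t,k} \right) + \mu \left( 1 - \sum_{k=1}^{K} p_k \right)$$

where $\lambda_t$ and $\mu$ are the Lagrange multipliers.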

First Order Conditions

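$$\ln p_k - \ln \bar{p}_k + 1 - \sum_{t=1}^{T} \lambda_t\, x_{t,k} - \mu = 0$$

$$y_t - \sum_{k=1}^{K} p_k\, x_{t,k} = 0$$

$$1 - \sum_{k=1}^{K} p_k = 0$$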

Solution

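$$p_k = \frac{\bar{p}_k}{\Omega(\lambda_1, \lambda_2, \dots, \lambda_T)} \exp\left( \sum_{t=1}^{T} \lambda_t\, x_{t,k} \right)$$

where

$$\Omega(\lambda_1, \lambda_2, \dots, \lambda_T) = \sum_{k=1}^{K} \bar{p}_k \exp\left( \sum_{t=1}^{T} \lambda_t\, x_{t,k} \right)$$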
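The closed form above can be evaluated numerically once the multipliers are known. The sketch below handles the simplest case of a single moment constraint and finds the multiplier by bisection. It is an illustration under stated assumptions, not the method used in the IMPACT work: the function name `ce_estimate`, the search bounds, and the example (a six-sided die with a uniform prior and a required mean of 4.5) are all hypothetical.

```python
import math

def ce_estimate(prior, x, y, lo=-50.0, hi=50.0, tol=1e-12):
    """Minimum cross-entropy probabilities for one constraint sum_k p_k x_k = y.

    Uses p_k = prior_k * exp(lam * x_k) / Omega(lam) and finds lam by
    bisection, since the implied mean is increasing in lam.
    """
    def probs(lam):
        # Shift exponents by their maximum for numerical stability;
        # the shift cancels when dividing by the partition function.
        m = max(lam * xk for xk in x)
        weights = [pk * math.exp(lam * xk - m) for pk, xk in zip(prior, x)]
        omega = sum(weights)
        return [w / omega for w in weights]

    def mean(lam):
        return sum(pk * xk for pk, xk in zip(probs(lam), x))

    if not (min(x) < y < max(x)):
        raise ValueError("y must lie strictly inside the range of x")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) < y:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, probs(lam)

# Hypothetical example: uniform prior on die faces, observed mean of 4.5
faces = [1, 2, 3, 4, 5, 6]
uniform_prior = [1 / 6] * 6
lam, p = ce_estimate(uniform_prior, faces, y=4.5)
```

With several moment constraints the same structure holds, but the multipliers must be found jointly, which is where a general nonlinear solver becomes necessary.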

Cross-Entropy (CE) Estimates

• Ω is called the “partition function”.

• Can be viewed as a limiting form (non-parametric) of a Bayesian estimator, transforming prior and sample information into posterior estimates of probabilities.

• Not strictly Bayesian because you do not specify the prior as a frequency function, but a discrete set of probabilities.

From Probabilities to Parameters

• From information theory, we now have a way to use “information” to estimate probabilities

• But in economics, we want to estimate parameters of a model or a “consistent” data set

• How do we move from estimating probabilities to estimating parameters and/or data?


Types of Information

• Values:
– Areas, production, demand, trade

• Coefficients: technology
– Crop and livestock yields
– Input-output coefficients for processed commodities (sugar, oils)

• Prior distribution of measurement error:
– Mean
– Standard error of measurement
– “Informative” or “uninformative” prior distribution

Data Estimation

• Generate a prior “best” estimate of all entries: Values and/or coefficients.

• A “prototype” based on:
– Values and aggregates
  • Historical and current data
  • Expert knowledge
– Coefficients: technology and behavior
  • Current and/or historical data
  • Assumption of behavioral and technical stability

Estimation Constraints

• Nationally
– Area times Yield = Production by crop
– Total area = Sum of area over crops
– Total Demand = Sum of demand over types of demand
– Net trade = Supply – Demand

• Globally
– Net trade sums to 0
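These identities can be checked against raw data before estimation. The sketch below computes the residual of several of them for a single commodity. It is illustrative only: the function names and numbers are hypothetical, and supply is taken to equal production, a simplification that ignores items such as stock changes.

```python
def commodity_residuals(area, yield_per_area, production,
                        demand_by_type, total_demand, net_trade):
    """Residuals of the national identities; all are zero for balanced data."""
    return {
        "area_x_yield": area * yield_per_area - production,
        "demand_sum": sum(demand_by_type.values()) - total_demand,
        # Assumption of this sketch: supply equals production
        "net_trade": (production - total_demand) - net_trade,
    }

def global_net_trade_residual(net_trade_by_country):
    """World net trade, which should sum to zero."""
    return sum(net_trade_by_country.values())

demand = {"food": 4.0, "feed": 2.0, "other": 0.5}

# A balanced record: every identity holds exactly
balanced = commodity_residuals(2.0, 3.5, 7.0, demand, 6.5, 0.5)

# A raw record whose reported production disagrees with area times yield
raw = commodity_residuals(2.0, 3.5, 7.5, demand, 6.5, 0.5)

world = global_net_trade_residual({"A": 0.5, "B": -1.5, "C": 1.0})
```

Nonzero residuals in the raw record are exactly what the estimation step has to remove while staying as close as possible to the priors.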

Measurement Error

• Error specification
– Error on coefficients or values
– Additive or multiplicative errors

• Multiplicative errors
– Logarithmic distribution
– Errors cannot be negative

• Additive
– Possibility of entries changing sign

Error Specification

Typical error specification (additive):

$$x_i = \bar{x}_i + e_i$$

$$e_i = \sum_{k} W_{i,k}\, v_{i,k}$$

where $0 \le W_{i,k} \le 1$ and $\sum_{k} W_{i,k} = 1$

and $v_{i,k}$ is the "support set" for the errors

Error Specification

• Errors are weighted averages of support set values
– The v parameters are fixed and have units of item being estimated.
– The W variables are probabilities that need to be estimated.

• Convert problem of estimating errors to one of estimating probabilities.
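A minimal sketch of the weighted-average idea follows. The function name, the prior value of 100, and the support and weights are hypothetical numbers chosen for illustration.

```python
def weighted_error(weights, support):
    """Error e = sum_k W_k * v_k, where the W_k are probabilities."""
    if len(weights) != len(support):
        raise ValueError("weights and support must have the same length")
    if any(w < 0.0 or w > 1.0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be probabilities that sum to 1")
    return sum(w * v for w, v in zip(weights, support))

prior_value = 100.0
support = [-6.0, 0.0, 6.0]     # fixed, in the units of the item being estimated

symmetric = [0.1, 0.8, 0.1]    # symmetric weights imply a zero error
shifted = [0.05, 0.8, 0.15]    # moving weight to the upper bound raises the estimate

estimate_symmetric = prior_value + weighted_error(symmetric, support)
estimate_shifted = prior_value + weighted_error(shifted, support)
```

Only the weights change between the two cases; the support stays fixed, so estimating the error reduces to estimating probabilities.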

Error Specification

• The technique provides a bridge between standard estimation where parameters to be estimated are in “natural” units and the information approach where the parameters are probabilities.
– The specified support set provides the link.

Error Specification

• Conversion of a “standard” stochastic specification with continuous random variables into a specification with a discrete set of probabilities
– Golan, Judge, Miller

• Problem is to estimate a discrete probability distribution
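To make the conversion concrete, the sketch below balances a single commodity (production minus demand minus net trade equals zero) by estimating error probabilities rather than the values themselves. Everything in it is an assumption made for illustration: the function name `ce_balance`, the prior values and standard errors, the three-point support at −3σ, 0 and +3σ with prior weights 1/18, 16/18 and 1/18 (which give zero mean and variance σ²), and the use of bisection on one multiplier. A real application has many linked constraints and needs a general solver.

```python
import math

def ce_balance(prior_values, coeffs, sigmas, lo=-20.0, hi=20.0, tol=1e-12):
    """Minimum cross-entropy adjustment so that sum_i coeffs[i] * x_i = 0.

    Each x_i = prior_values[i] + sum_k W[i][k] * v[i][k] with support
    v[i] = (-3*sigma_i, 0, +3*sigma_i) and prior weights (1/18, 16/18, 1/18).
    """
    prior_w = (1 / 18, 16 / 18, 1 / 18)
    supports = [(-3 * s, 0.0, 3 * s) for s in sigmas]

    def weights(lam):
        out = []
        for a, v in zip(coeffs, supports):
            expo = [lam * a * vk for vk in v]
            m = max(expo)  # shift for numerical stability
            raw = [w * math.exp(e - m) for w, e in zip(prior_w, expo)]
            omega = sum(raw)
            out.append([r / omega for r in raw])
        return out

    def values(lam):
        return [x + sum(wk * vk for wk, vk in zip(w, v))
                for x, w, v in zip(prior_values, weights(lam), supports)]

    def residual(lam):
        return sum(a * x for a, x in zip(coeffs, values(lam)))

    if residual(lo) > 0 or residual(hi) < 0:
        raise ValueError("constraint cannot be met within the error bounds")
    while hi - lo > tol:  # the residual is increasing in lam
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return values(lam), weights(lam)

# Hypothetical priors: production 100 (sigma 2), demand 90 (sigma 3), net trade 7 (sigma 1)
estimates, W = ce_balance([100.0, 90.0, 7.0], [1.0, -1.0, -1.0], [2.0, 3.0, 1.0])
production, demand, net_trade = estimates
```

The prior values miss the balance by 3 units. In the estimate, production falls and demand and net trade rise until the identity holds, and the item with the largest prior standard error absorbs the largest share of the adjustment.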

Uninformative Prior

• Prior incorporates only information about the bounds between which the errors must fall.

• Uniform distribution is the continuous uninformative prior in Bayesian analysis.
– Laplace: Principle of insufficient reason

• We specify a finite probability distribution that approximates the uniform distribution.

Uninformative Prior

• Assume that the bounds are set at ±3s where s is a constant.

• For uniform distribution, the variance is:

$$\sigma^2 = \frac{\left(3s - (-3s)\right)^2}{12} = 3s^2$$

7-Element Support Set

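$$v_1 = -3s,\quad v_2 = -2s,\quad v_3 = -s,\quad v_4 = 0,\quad v_5 = s,\quad v_6 = 2s,\quad v_7 = 3s$$

and the prior is $w_k = \frac{1}{7}$

$$\sigma^2 = \sum_{k} w_k\, v_k^2 = \frac{s^2}{7}\left(9 + 4 + 1 + 1 + 4 + 9\right) = 4s^2$$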

Uninformative Prior

• Finite uniform prior with 7-element support set is a conservative uninformative prior.

• Adding more elements would more closely approximate the continuous uniform distribution, reducing the prior variance toward the limit of 3s².

• Posterior distribution is essentially unconstrained.
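The approach to the limit can be verified with exact arithmetic. The sketch assumes equal prior weights on evenly spaced points between −3s and +3s and reports the variance in units of s²; the function name is hypothetical.

```python
from fractions import Fraction

def uniform_support_variance(n_points: int) -> Fraction:
    """Variance, in units of s**2, of equal weights on n_points evenly spaced in [-3s, 3s]."""
    if n_points < 2:
        raise ValueError("need at least two support points")
    step = Fraction(6, n_points - 1)
    support = [Fraction(-3) + k * step for k in range(n_points)]
    weight = Fraction(1, n_points)
    mean = sum(weight * v for v in support)
    return sum(weight * (v - mean) ** 2 for v in support)

var_7 = uniform_support_variance(7)      # the 7-element support set: 4
var_13 = uniform_support_variance(13)    # finer grid: 7/2
var_601 = uniform_support_variance(601)  # very fine grid: 301/100, close to 3
```

The discrete variance is always above the continuous value of 3s², which is the sense in which the 7-element prior is conservative.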

Informative Prior

• Start with a prior on both mean and standard deviation of the error distribution
– Prior mean is normally zero.
– Standard deviation of e is the prior on the standard error of measurement of item.

• Define the support set with s=σ so that the bounds are now ±3σ.

Informative Prior, 2 Parameters

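$$\sum_{k} W_{i,k}\, v_{i,k} = 0 \quad \text{(Mean)}$$

$$\sum_{k} W_{i,k}\, v_{i,k}^2 = \sigma_i^2 \quad \text{(Variance)}$$

Three-element support set:

$$v_{i,1} = -3\sigma_i,\quad v_{i,2} = 0,\quad v_{i,3} = +3\sigma_i$$

$$\sigma_i^2 = 9\sigma_i^2\, W_{i,1} + 0 \cdot W_{i,2} + 9\sigma_i^2\, W_{i,3}$$

$$W_{i,1} = W_{i,3} = \frac{1}{18}, \qquad W_{i,2} = 1 - W_{i,1} - W_{i,3} = \frac{16}{18}$$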
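The prior weights follow directly from the mean and variance conditions and can be reproduced with exact fractions. The sketch works in units of σ and allows the bound to differ from 3 as a hypothetical generalization; the function name is made up for the example.

```python
from fractions import Fraction

def three_point_prior(bound_in_sigmas=3):
    """Weights on (-b*sigma, 0, +b*sigma) giving mean 0 and variance sigma**2."""
    b = Fraction(bound_in_sigmas)
    if b < 1:
        raise ValueError("bounds narrower than one sigma cannot reach variance sigma**2")
    tail = 1 / (2 * b * b)  # from the variance condition 2 * tail * b**2 = 1
    return [tail, 1 - 2 * tail, tail]

weights = three_point_prior(3)
support = [Fraction(-3), Fraction(0), Fraction(3)]  # in units of sigma

mean = sum(w * v for w, v in zip(weights, support))
variance = sum(w * v ** 2 for w, v in zip(weights, support))
```

Symmetry takes care of the mean, so the single variance condition pins down the tail weight and the adding-up constraint gives the center weight.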

Informative Prior: 4 Parameters

• Must specify prior for additional statistics
– Skewness and Kurtosis

• Assume symmetric distribution:
– Skewness is zero.

• Specify normal prior:
– Kurtosis is a function of σ.

• Can recover additional information on error distribution.


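$$\sum_{k} W_{i,k}\, v_{i,k} = 0 \quad \text{(Mean)}$$

$$\sum_{k} W_{i,k}\, v_{i,k}^2 = \sigma_i^2 \quad \text{(Variance)}$$

$$\sum_{k} W_{i,k}\, v_{i,k}^3 = 0 \quad \text{(Skewness)}$$

$$\sum_{k} W_{i,k}\, v_{i,k}^4 = 3\sigma_i^4 \quad \text{(Kurtosis)}$$

Five-element support set:

$$v_{i,1} = -3.0\sigma_i,\quad v_{i,2} = -1.5\sigma_i,\quad v_{i,3} = 0,\quad v_{i,4} = +1.5\sigma_i,\quad v_{i,5} = +3.0\sigma_i$$

By symmetry, $W_{i,4} = W_{i,2}$ and $W_{i,5} = W_{i,1}$, so:

$$\sigma_i^2 = 9\sigma_i^2\, W_{i,1} + 2.25\sigma_i^2\, W_{i,2} + 0 \cdot W_{i,3} + 2.25\sigma_i^2\, W_{i,2} + 9\sigma_i^2\, W_{i,1}$$

$$3\sigma_i^4 = 81\sigma_i^4\, W_{i,1} + \frac{81}{16}\sigma_i^4\, W_{i,2} + 0 \cdot W_{i,3} + \frac{81}{16}\sigma_i^4\, W_{i,2} + 81\sigma_i^4\, W_{i,1}$$

$$W_{i,1} = W_{i,5} = \frac{1}{162}; \qquad W_{i,2} = W_{i,4} = \frac{16}{81}; \qquad W_{i,3} = \frac{48}{81}$$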
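The five weights come from two linear equations in two unknowns once symmetry is imposed. The sketch below solves them exactly by Cramer's rule, working in units of σ. The function name and the option to vary the support points are hypothetical additions for illustration.

```python
from fractions import Fraction

def five_point_prior(outer=Fraction(3), inner=Fraction(3, 2)):
    """Symmetric weights on (-outer, -inner, 0, inner, outer), in sigma units,
    matching variance 1 and fourth moment 3 (the normal kurtosis)."""
    # Unknowns: a = W1 = W5 and b = W2 = W4
    #   2 * (outer**2 * a + inner**2 * b) = 1   (variance)
    #   2 * (outer**4 * a + inner**4 * b) = 3   (kurtosis)
    a11, a12, b1 = 2 * outer ** 2, 2 * inner ** 2, Fraction(1)
    a21, a22, b2 = 2 * outer ** 4, 2 * inner ** 4, Fraction(3)
    det = a11 * a22 - a12 * a21
    a = (b1 * a22 - a12 * b2) / det
    b = (a11 * b2 - b1 * a21) / det
    center = 1 - 2 * a - 2 * b  # adding-up constraint
    return [a, b, center, b, a]

weights = five_point_prior()
support = [Fraction(-3), Fraction(-3, 2), Fraction(0), Fraction(3, 2), Fraction(3)]

# Mean, variance, third and fourth moments of the prior
moments = [sum(w * v ** n for w, v in zip(weights, support)) for n in (1, 2, 3, 4)]
```

The odd moments vanish by symmetry, so only the variance and kurtosis conditions carry information about the weights.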

Implementation

• Implement program in GAMS
– Large, difficult estimation problem
– Major advances in solvers. Solution is now robust and routine.

• CE minimand similar to maximum likelihood estimators.

• Excel front end for GAMS program
– Easy to use

Implementation

IMPACT 3 FAOSTAT Database

• Data Estimation with Cross Entropy
– Nationally: Trade = Supply - Demand
– Nationally: Area × Yield = Supply
– Globally: Supply = Demand

• Data Cleaning and Setting Priors
– Crop Production
– Livestock Production
– Commodity Demand and Trade
– Processed Commodities (oilseeds, sugar, etc.)

• Data Collection
– Commodity Balance
– Food Balance
