National Accounts and SAM Estimation Using Cross-Entropy Methods
Sherman Robinson
Estimation Problem
• Partial equilibrium models such as IMPACT require balanced and consistent datasets that represent disaggregated production and demand by commodity
• Estimating such a dataset requires an efficient method to incorporate and reconcile information from a variety of sources
Primary Data Sources for IMPACT Base Year
• FAOSTAT for country totals for:
  – Production: Area, Yields and Supply
  – Demand: Total, Food, Intermediate, Feed, Other Demands
  – Trade: Exports, Imports, Net Trade
  – Nutrition: Calories per capita, calories per kg of commodity
• AQUASTAT for country irrigated and rainfed production
• SPAM pixel-level estimation of the global allocation of production
Estimating a Consistent and Disaggregated Database
• Estimate IMPACT Country Database
  – Source: FAOSTAT
• Estimate Technology Disaggregated Production
  – Sources: IMPACT Country Database, FAO AQUASTAT
• Estimate Geographic Disaggregated Production
  – Sources: Technology Disaggregated Production, SPAM
Bayesian Work Plan
• Source Data (FAO, SPAM)
• Priors on values and estimation errors of production, demand, and trade
• Estimation by Cross-Entropy Method
• Check results against priors and identify potential data problems
• New information to correct identified problems
• Feedback to data source
Information Theory Approach
• Goal is to recover parameters and data we observe imperfectly. Estimation rather than prediction.
• Assume very little information about the error generating process and nothing about the functional form of the error distribution.
• Very different from standard statistical approaches (e.g., econometrics)
  – These approaches usually have lots of data
Estimation Principles
• Use all the information you have.
• Do not use or assume any information you do not have.
• Arnold Zellner: “Efficient Information Processing Rule (IPR).”
• Close links to Bayesian estimation
Information Theory
• Need to be flexible in incorporating information in parameter/data estimation
  – Lots of different forms of information
• In classic statistics, “information” in a data set can be summarized by the moments of the distribution of the data
  – Summarizes what is needed for estimation
• We need a broader view of “estimation” and need to define “information”
An analogy from physics
Initial state of motion → (force) → final state of motion.
Force is whatever induces a change of motion: $F = dp/dt$
Inference is dynamics as well
Old beliefs → (information) → new beliefs.
“Information” is what induces a change in rational beliefs.
Information Theory
• Suppose an event E will occur with probability p. What is the information content of a message stating that E occurs?
• If p is “high”, event occurrence has little “information.” If p is low, event occurrence is a surprise, and contains a lot of information
  – Content of the message is not the issue: amount, not meaning, of information
Information Theory
• Shannon (1948) developed a formal measure of the “information content” of the arrival of a message (he worked for AT&T)
$$h(p) = \log(1/p)$$
If $p = 1$ then $h(p) = 0$; if $p \to 0$ then $h(p) \to \infty$.
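A minimal Python sketch of the information-content measure. The slide does not fix the base of the logarithm; this sketch assumes natural logs (nats), and base 2 would give bits instead. The function name `information_content` is illustrative.

```python
import math

def information_content(p: float) -> float:
    """Shannon information content h(p) = log(1/p), in nats (natural log assumed)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return math.log(1.0 / p)

# A certain event carries no information; rarer events carry more.
for p in (1.0, 0.5, 0.01):
    print(f"p = {p}: h(p) = {information_content(p):.3f} nats")
```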
Information Theory
• For a set of events, the expected information content of a message before it arrives is the entropy measure:
$$H(p) = \sum_{k=1}^{n} p_k\, h(p_k) = -\sum_{k=1}^{n} p_k \log(p_k), \qquad \text{and} \quad \sum_{k=1}^{n} p_k = 1$$
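The entropy measure can be sketched the same way, again assuming natural logs (`entropy` is an illustrative name). Each term is written as $p_k h(p_k)$, and terms with $p_k = 0$ are dropped, following the usual convention $0 \log 0 = 0$.

```python
import math

def entropy(p: list[float]) -> float:
    """Shannon entropy H(p) = sum_k p_k * log(1/p_k) (natural log); zero-probability terms are skipped."""
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(pk * math.log(1.0 / pk) for pk in p if pk > 0.0)

print(entropy([0.25, 0.25, 0.25, 0.25]))  # uniform over 4 outcomes: log 4, the maximum for n = 4
print(entropy([0.7, 0.1, 0.1, 0.1]))      # more concentrated, lower entropy
print(entropy([1.0, 0.0, 0.0, 0.0]))      # certain outcome: zero entropy
```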
[Photo: Claude Shannon]
E.T. Jaynes
• Jaynes proposed using the Shannon entropy measure in estimation
• Maximum entropy (MaxEnt) principle:
  – Out of all probability distributions that are consistent with the constraints, choose the one that has maximum uncertainty (maximizes the Shannon entropy metric)
• Idea of estimating probabilities (or frequencies)
  – In the absence of any constraints, entropy is maximized for the uniform distribution
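A small sketch of the MaxEnt principle with one moment constraint, assuming a six-sided die whose average roll is known (4.5 is an illustrative number). With only the adding-up constraint and a mean constraint, the MaxEnt solution has the exponential form $p_k \propto \exp(\lambda x_k)$, and the mean is increasing in λ, so the sketch finds the single multiplier by bisection. Bisection is a stand-in for a general solver, and `maxent_with_mean` is a hypothetical name.

```python
import math

def maxent_with_mean(values: list[float], target_mean: float, tol: float = 1e-12) -> list[float]:
    """MaxEnt distribution over `values` subject to sum(p) = 1 and sum(p * x) = target_mean."""
    if not min(values) < target_mean < max(values):
        raise ValueError("target mean must lie strictly inside the range of values")

    def dist(lam: float) -> list[float]:
        shift = max(lam * x for x in values)          # keep exp() from overflowing
        w = [math.exp(lam * x - shift) for x in values]
        z = sum(w)
        return [wk / z for wk in w]

    def mean(lam: float) -> float:
        return sum(pk * x for pk, x in zip(dist(lam), values))

    lo, hi = -1.0, 1.0
    while mean(lo) > target_mean:                     # widen the bracket until it holds the root
        lo *= 2.0
    while mean(hi) < target_mean:
        hi *= 2.0
    while hi - lo > tol:                              # mean(lam) is increasing, so bisect
        mid = 0.5 * (lo + hi)
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist(0.5 * (lo + hi))

faces = [1, 2, 3, 4, 5, 6]
print([round(p, 4) for p in maxent_with_mean(faces, 3.5)])  # mean of the uniform: stays uniform
print([round(p, 4) for p in maxent_with_mean(faces, 4.5)])  # probabilities tilt toward high faces
```

When the constraint carries no information beyond the uniform distribution (mean 3.5), the estimate stays uniform; any other mean tilts the probabilities as little as the constraint allows.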
[Photo: E.T. Jaynes]
Estimation With a Prior
• The estimation problem is to estimate a set of probabilities that are “close” to a known prior and that satisfy various known moment constraints.
• Jaynes suggested using the criterion of minimizing the Kullback-Leibler “cross entropy” (CE) “divergence” between the estimated probabilities and the prior.
Cross Entropy Estimation
Minimize:
$$\sum_k p_k \log\frac{p_k}{\bar{p}_k} = \sum_k p_k \log p_k - \sum_k p_k \log \bar{p}_k$$
where $\bar{p}_k$ is the prior probability.
“Divergence”, not “distance”. Measure is not symmetric and does not satisfy the triangle inequality. It is not a “norm”.
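The asymmetry is easy to see numerically. The sketch below (illustrative function name `cross_entropy`, natural logs) evaluates the divergence in both directions for an arbitrary pair of distributions.

```python
import math

def cross_entropy(p: list[float], prior: list[float]) -> float:
    """Kullback-Leibler divergence sum_k p_k ln(p_k / prior_k); terms with p_k = 0 contribute 0."""
    total = 0.0
    for pk, qk in zip(p, prior):
        if pk > 0.0:
            if qk <= 0.0:
                return math.inf          # p puts mass where the prior rules it out
            total += pk * math.log(pk / qk)
    return total

p = [0.6, 0.3, 0.1]
q = [1 / 3, 1 / 3, 1 / 3]
print(cross_entropy(p, q), cross_entropy(q, p))  # two different values: the measure is not symmetric
print(cross_entropy(p, p))                       # zero when the estimate equals the prior
```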
MaxEnt vs Cross-Entropy
• If the prior is specified as a uniform distribution, the CE estimate is equivalent to the MaxEnt estimate
• Laplace’s Principle of Insufficient Reason: In the absence of any information, you should choose the uniform distribution, which has maximum uncertainty
  – Uniform distribution as a prior is an admission of “ignorance”, not knowledge
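With a uniform prior $\bar{p}_k = 1/K$, the equivalence follows in one line from the adding-up constraint $\sum_k p_k = 1$:

$$\sum_{k=1}^{K} p_k \ln\frac{p_k}{1/K} = \sum_{k=1}^{K} p_k \ln p_k + \ln K \sum_{k=1}^{K} p_k = \ln K - H(p)$$

Because $\ln K$ is a constant, minimizing the CE measure subject to a set of constraints selects the same distribution as maximizing the entropy $H(p)$ subject to those constraints.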
Cross Entropy Measure
• Two kinds of information
  – Prior distribution of the probabilities
  – Moments of the distribution
• Can use any known moments
  – Can also specify inequalities
  – Moments with error will be considered
  – Summary statistics such as quantiles
Cross-Entropy Measure
Minimize
$$\sum_{k=1}^{K} p_k \ln\frac{p_k}{\bar{p}_k}$$
subject to constraints (information) about moments
$$\sum_{k=1}^{K} p_k x_{t,k} = y_t, \qquad t = 1, \dots, T$$
and the adding-up constraint (finite distribution)
$$\sum_{k=1}^{K} p_k = 1$$
Lagrangian
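$$L = \sum_{k=1}^{K} p_k \ln\frac{p_k}{\bar{p}_k} + \sum_{t=1}^{T} \lambda_t \Big( y_t - \sum_{k=1}^{K} p_k x_{t,k} \Big) + \mu \Big( 1 - \sum_{k=1}^{K} p_k \Big)$$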
First Order Conditions
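$$\frac{\partial L}{\partial p_k} = \ln p_k - \ln \bar{p}_k + 1 - \sum_{t=1}^{T} \lambda_t x_{t,k} - \mu = 0$$
$$\frac{\partial L}{\partial \lambda_t} = y_t - \sum_{k=1}^{K} p_k x_{t,k} = 0$$
$$\frac{\partial L}{\partial \mu} = 1 - \sum_{k=1}^{K} p_k = 0$$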
Solution
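$$p_k = \frac{\bar{p}_k}{\Omega(\lambda_1, \lambda_2, \dots, \lambda_T)} \exp\Big( \sum_{t=1}^{T} \lambda_t x_{t,k} \Big)$$
where
$$\Omega(\lambda_1, \lambda_2, \dots, \lambda_T) = \sum_{k=1}^{K} \bar{p}_k \exp\Big( \sum_{t=1}^{T} \lambda_t x_{t,k} \Big)$$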
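A minimal Python sketch of this solution, under stated assumptions: the moment constraints are equalities, the problem is feasible, and the multipliers λ are found by undamped Newton steps on the convex dual $\ln \Omega(\lambda) - \sum_t \lambda_t y_t$, whose gradient is $E_p[x_t] - y_t$ and whose Hessian is the covariance matrix of the moment functions. The names `ce_estimate` and `solve_linear` and the example numbers are hypothetical; this is not the GAMS program mentioned under Implementation, and a robust version would add a line search and support inequality constraints.

```python
import math

def solve_linear(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]            # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):                            # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ce_estimate(prior, x, y, iters=100, tol=1e-12):
    """Minimize sum_k p_k ln(p_k / prior_k) s.t. sum_k p_k x[t][k] = y[t] and sum_k p_k = 1."""
    T, K = len(y), len(prior)
    lam = [0.0] * T

    def probs(lam):
        # p_k = prior_k * exp(sum_t lam_t x_tk) / Omega, with exponents shifted for stability
        expo = [sum(lam[t] * x[t][k] for t in range(T)) for k in range(K)]
        shift = max(expo)
        w = [prior[k] * math.exp(expo[k] - shift) for k in range(K)]
        omega = sum(w)
        return [wk / omega for wk in w]

    for _ in range(iters):
        p = probs(lam)
        mean = [sum(p[k] * x[t][k] for k in range(K)) for t in range(T)]
        grad = [mean[t] - y[t] for t in range(T)]            # dual gradient: E_p[x_t] - y_t
        if max(abs(g) for g in grad) < tol:
            break
        hess = [[sum(p[k] * (x[t][k] - mean[t]) * (x[u][k] - mean[u]) for k in range(K))
                 for u in range(T)] for t in range(T)]       # dual Hessian: covariance of x_t, x_u
        step = solve_linear(hess, grad)
        lam = [lam[t] - step[t] for t in range(T)]
    return probs(lam), lam

support = [-2.0, -1.0, 0.0, 1.0, 2.0]
prior = [0.1, 0.2, 0.4, 0.2, 0.1]                           # prior mean 0, second moment 1.2
moments = [support, [v * v for v in support]]               # x[t][k]: first and second moments
p, lam = ce_estimate(prior, moments, [0.3, 1.5])
print([round(pk, 4) for pk in p], [round(l, 4) for l in lam])
```

If the targets $y_t$ equal the prior's own moments, the Newton loop stops at λ = 0 and the estimate is the prior itself, as the solution formula implies.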
Cross-Entropy (CE) Estimates
• Ω is called the “partition function”.
• Can be viewed as a limiting form (non-parametric) of a Bayesian estimator, transforming prior and sample information into posterior estimates of probabilities.
• Not strictly Bayesian because you do not specify the prior as a frequency function, but as a discrete set of probabilities.
From Probabilities to Parameters
• From information theory, we now have a way to use “information” to estimate probabilities
• But in economics, we want to estimate parameters of a model or a “consistent” data set
• How do we move from estimating probabilities to estimating parameters and/or data?
Types of Information
• Values:
  – Areas, production, demand, trade
• Coefficients: technology
  – Crop and livestock yields
  – Input-output coefficients for processed commodities (sugar, oils)
• Prior distribution of measurement error:
  – Mean
  – Standard error of measurement
  – “Informative” or “uninformative” prior distribution
Data Estimation
• Generate a prior “best” estimate of all entries: values and/or coefficients.
• A “prototype” based on:
  – Values and aggregates
    • Historical and current data
    • Expert knowledge
  – Coefficients: technology and behavior
    • Current and/or historical data
    • Assumptions of behavioral and technical stability
Estimation Constraints
• Nationally
  – Area times Yield = Production by crop
  – Total area = Sum of area over crops
  – Total Demand = Sum of demand over types of demand
  – Net trade = Supply − Demand
• Globally
  – Net trade sums to 0
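The constraints can be written as residuals that a consistent dataset drives to zero. The sketch below uses a hypothetical dictionary layout and toy numbers for one crop in two countries, and assumes production stands in for supply (no stock changes). It only reports inconsistencies; the cross-entropy estimation instead adjusts the entries so that every residual is zero.

```python
def accounting_residuals(data: dict, total_area: dict) -> dict:
    """Residuals of the national and global identities; all are 0 in a consistent dataset.

    data[country][crop] holds 'area', 'yield', 'production', 'demand' (a dict by type of
    demand), 'total_demand', and 'net_trade'. Production stands in for supply (assumption).
    """
    res = {}
    for country, crops in data.items():
        for crop, rec in crops.items():
            res[(country, crop, "area*yield-production")] = rec["area"] * rec["yield"] - rec["production"]
            res[(country, crop, "demand types-total demand")] = sum(rec["demand"].values()) - rec["total_demand"]
            res[(country, crop, "net trade-(supply-demand)")] = (
                rec["net_trade"] - (rec["production"] - rec["total_demand"]))
        res[(country, "crop areas-total area")] = sum(c["area"] for c in crops.values()) - total_area[country]
    all_crops = {crop for crops in data.values() for crop in crops}
    for crop in sorted(all_crops):                      # globally, net trade must sum to zero
        res[("world", crop, "net trade")] = sum(c[crop]["net_trade"] for c in data.values() if crop in c)
    return res

# Hypothetical toy data: one crop, two countries, internally consistent.
data = {
    "A": {"maize": {"area": 10, "yield": 5, "production": 50, "total_demand": 40, "net_trade": 10,
                    "demand": {"food": 20, "feed": 15, "other": 5}}},
    "B": {"maize": {"area": 8, "yield": 4, "production": 32, "total_demand": 42, "net_trade": -10,
                    "demand": {"food": 30, "feed": 10, "other": 2}}},
}
total_area = {"A": 10, "B": 8}
print({k: v for k, v in accounting_residuals(data, total_area).items() if v != 0})  # {} when consistent

data["B"]["maize"]["production"] = 31   # one reporting error breaks two identities
print({k: v for k, v in accounting_residuals(data, total_area).items() if v != 0})
```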
Measurement Error
• Error specification
  – Error on coefficients or values
  – Additive or multiplicative errors
• Multiplicative errors
  – Logarithmic distribution
  – Error factors cannot be negative, so entries cannot change sign
• Additive errors
  – Possibility of entries changing sign
Error Specification
Typical error specification (additive):
$$x_i = \bar{x}_i + e_i$$
where
$$e_i = \sum_k W_{i,k} v_{i,k}, \qquad 0 \le W_{i,k} \le 1, \qquad \sum_k W_{i,k} = 1$$
and $v_i$ is the “support set” for the errors.
Error Specification
• Errors are weighted averages of support set values
  – The v parameters are fixed and have the units of the item being estimated.
  – The W variables are probabilities that need to be estimated.
• Converts the problem of estimating errors into one of estimating probabilities.
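A toy version of the whole approach, assuming a single global constraint (net trade sums to zero), additive errors with a 7-element support set on ±3s for each country, and a uniform prior on the weights. With one constraint, the CE solution tilts every country's prior weights by a common multiplier λ, $W_{i,k} \propto \bar{W}_{i,k} \exp(\lambda v_{i,k})$, so λ can be found by bisection. Country codes, reported values, error scales s, and the function names are hypothetical.

```python
import math

def support_set(s: float, n: int = 7) -> list[float]:
    """n equally spaced support points on [-3s, 3s]; n = 7 gives -3s, -2s, ..., 3s."""
    return [s * (-3.0 + 6.0 * j / (n - 1)) for j in range(n)]

def tilted_weights(support: list[float], prior: list[float], lam: float) -> list[float]:
    """CE posterior weights W_k proportional to prior_k * exp(lam * v_k)."""
    shift = max(lam * v for v in support)               # avoid overflow in exp
    w = [q * math.exp(lam * v - shift) for v, q in zip(support, prior)]
    z = sum(w)
    return [wk / z for wk in w]

def balance_net_trade(reported: dict, s: dict, n: int = 7, tol: float = 1e-12) -> dict:
    """Adjust reported net trade by cross-entropy so that it sums to zero across countries."""
    if abs(sum(reported.values())) >= 3.0 * sum(s.values()):
        raise ValueError("imbalance exceeds what the +/-3s support sets can absorb")
    supports = {c: support_set(s[c], n) for c in reported}
    prior = [1.0 / n] * n                                # uniform (uninformative) prior on the weights

    def estimates(lam: float) -> dict:
        # x_i = reported_i + e_i, with e_i = sum_k W_ik * v_ik
        return {c: reported[c] + sum(w * v for w, v in zip(tilted_weights(supports[c], prior, lam), supports[c]))
                for c in reported}

    lo, hi = -1.0, 1.0                                   # world net trade is increasing in lam
    while sum(estimates(lo).values()) > 0.0:
        lo *= 2.0
    while sum(estimates(hi).values()) < 0.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(estimates(mid).values()) < 0.0:
            lo = mid
        else:
            hi = mid
    return estimates(0.5 * (lo + hi))

reported = {"A": 120.0, "B": -80.0, "C": -25.0}         # hypothetical; sums to +15
s = {"A": 5.0, "B": 2.0, "C": 5.0}                      # hypothetical prior error scales
balanced = balance_net_trade(reported, s)
print({c: round(x, 3) for c, x in balanced.items()}, round(sum(balanced.values()), 9))
```

Countries with a larger s (less reliable data) absorb more of the imbalance: A and C share the same s and receive the same adjustment, while B's smaller s limits its adjustment.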
Error Specification
• The technique provides a bridge between standard estimation, where the parameters to be estimated are in “natural” units, and the information approach, where the parameters are probabilities.
  – The specified support set provides the link.
Error Specification
• Conversion of a “standard” stochastic specification with continuous random variables into a specification with a discrete set of probabilities
  – Golan, Judge, Miller
• Problem is to estimate a discrete probability distribution
Uninformative Prior
• Prior incorporates only information about the bounds between which the errors must fall.
• Uniform distribution is the continuous uninformative prior in Bayesian analysis.
  – Laplace: Principle of insufficient reason
• We specify a finite probability distribution that approximates the uniform distribution.
Uninformative Prior
• Assume that the bounds are set at ±3s where s is a constant.
• For a uniform distribution on [−3s, 3s], the variance is:
$$\sigma^2 = \frac{\big(3s - (-3s)\big)^2}{12} = \frac{36 s^2}{12} = 3 s^2$$
7-Element Support Set
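$$v_1 = -3s, \quad v_2 = -2s, \quad v_3 = -s, \quad v_4 = 0, \quad v_5 = s, \quad v_6 = 2s, \quad v_7 = 3s$$
$\sum_k w_k = 1$ and the prior is $w_k = 1/7$, so the prior variance is
$$\sum_k w_k v_k^2 = \frac{9s^2 + 4s^2 + s^2 + 0 + s^2 + 4s^2 + 9s^2}{7} = 4 s^2$$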
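The variance figures can be checked exactly with rational arithmetic. The sketch assumes n equally spaced support points spanning ±3s with equal weights 1/n, and reports the prior variance in units of s²; `uniform_support_variance` is an illustrative name.

```python
from fractions import Fraction

def uniform_support_variance(n: int) -> Fraction:
    """Prior variance, in units of s**2, of weights 1/n on n equally spaced points spanning [-3s, 3s]."""
    points = [Fraction(-3) + Fraction(6 * j, n - 1) for j in range(n)]
    return sum(v * v for v in points) / n

for n in (3, 5, 7, 15, 101):
    var = uniform_support_variance(n)
    print(f"n = {n:3d}: variance = {var} s^2 = {float(var):.4f} s^2")
```

Three points give 6s², five give 4.5s², seven give 4s², and 101 give 3.06s²; in closed form the value is 3s²(n + 1)/(n − 1), which falls toward the continuous limit of 3s² as n grows.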
Uninformative Prior
• Finite uniform prior with 7-element support set is a conservative uninformative prior.
• Adding more elements would more closely approximate the continuous uniform distribution, reducing the prior variance toward the limit of 3s².
• Posterior distribution is essentially unconstrained.
Informative Prior
• Start with a prior on both mean and standard deviation of the error distribution
  – Prior mean is normally zero.
  – Standard deviation of e is the prior on the standard error of measurement of the item.
• Define the support set with s = σ so that the bounds are now ±3σ.
Informative Prior, 2 Parameters
$$\sum_k W_{i,k} v_{i,k}^2 = \sigma_i^2 \qquad \text{(Variance)}$$
$$\sum_k W_{i,k} v_{i,k} = 0 \qquad \text{(Mean)}$$
3-Element Support Set
$$v_{i,1} = -3\sigma_i, \qquad v_{i,2} = 0, \qquad v_{i,3} = 3\sigma_i$$
Informative Prior, 2 Parameters
$$9\sigma_i^2 W_{i,1} + 0 \cdot W_{i,2} + 9\sigma_i^2 W_{i,3} = \sigma_i^2$$
$$W_{i,1} = W_{i,3} = \frac{1}{18}, \qquad W_{i,2} = 1 - W_{i,1} - W_{i,3} = \frac{16}{18}$$
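An exact check of the 3-element weights with Python's `fractions` module, working in units of σ_i. The zero-mean condition forces $W_{i,1} = W_{i,3}$, the variance condition then fixes them, and adding-up gives $W_{i,2}$.

```python
from fractions import Fraction

# Support set in units of sigma_i: v = (-3, 0, 3).
v = [Fraction(-3), Fraction(0), Fraction(3)]

# Zero mean (-3*W1 + 3*W3 = 0) forces W1 = W3; the variance condition 9*W1 + 9*W3 = 1
# then gives W1, and the adding-up condition gives W2.
w_tail = Fraction(1, 2) / v[2] ** 2
w = [w_tail, 1 - 2 * w_tail, w_tail]

total = sum(w)
mean = sum(wk * vk for wk, vk in zip(w, v))
variance = sum(wk * vk ** 2 for wk, vk in zip(w, v))
print(w, total, mean, variance)
```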
Informative Prior: 4 Parameters
• Must specify prior for additional statistics
  – Skewness and kurtosis
• Assume symmetric distribution:
  – Skewness is zero.
• Specify normal prior:
  – Kurtosis is a function of σ (for a normal distribution, the fourth moment is 3σ⁴).
• Can recover additional information on error distribution.
Informative Prior, 4 Parameters
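$$\sum_k W_{i,k} v_{i,k} = 0 \qquad \text{(Mean)}$$
$$\sum_k W_{i,k} v_{i,k}^2 = \sigma_i^2 \qquad \text{(Variance)}$$
$$\sum_k W_{i,k} v_{i,k}^3 = 0 \qquad \text{(Skewness)}$$
$$\sum_k W_{i,k} v_{i,k}^4 = 3\sigma_i^4 \qquad \text{(Kurtosis)}$$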
5-Element Support Set
$$v_{i,1} = -3.0\sigma_i, \quad v_{i,2} = -1.5\sigma_i, \quad v_{i,3} = 0, \quad v_{i,4} = 1.5\sigma_i, \quad v_{i,5} = 3.0\sigma_i$$
Informative Prior, 4 Parameters
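$$9\sigma_i^2 W_{i,1} + 2.25\sigma_i^2 W_{i,2} + 0 \cdot W_{i,3} + 2.25\sigma_i^2 W_{i,4} + 9\sigma_i^2 W_{i,5} = \sigma_i^2$$
$$81\sigma_i^4 W_{i,1} + \tfrac{81}{16}\sigma_i^4 W_{i,2} + 0 \cdot W_{i,3} + \tfrac{81}{16}\sigma_i^4 W_{i,4} + 81\sigma_i^4 W_{i,5} = 3\sigma_i^4$$
$$W_{i,1} = W_{i,5} = \frac{1}{162}; \qquad W_{i,2} = W_{i,4} = \frac{16}{81}; \qquad W_{i,3} = \frac{48}{81}$$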
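More generally, with n support points and n moment conditions (adding-up plus n − 1 moments), the prior weights solve a square Vandermonde system, which has a unique solution when the support points are distinct. The sketch below is a hypothetical helper, `moment_matching_weights`, using exact fractions and Gauss-Jordan elimination; it works in units of σ_i with the normal-distribution moments 0, 1, 0, 3, and reproduces both the 3-element and the 5-element weights.

```python
from fractions import Fraction

def moment_matching_weights(support, moments):
    """Exact weights W with sum_k W_k * v_k**r == moments[r] for r = 0, ..., n-1.

    support: n distinct points (in units of sigma); moments[0] = 1 is the adding-up condition.
    """
    n = len(support)
    # Augmented matrix [v_k**r | moments[r]], one row per moment condition.
    a = [[Fraction(v) ** r for v in support] + [Fraction(moments[r])] for r in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if a[r][col] != 0)   # exact arithmetic: any nonzero pivot works
        a[col], a[piv] = a[piv], a[col]
        for r in range(n):                                        # clear this column in every other row
            if r != col and a[r][col] != 0:
                f = a[r][col] / a[col][col]
                a[r] = [xr - f * xc for xr, xc in zip(a[r], a[col])]
    return [a[r][n] / a[r][r] for r in range(n)]

normal_moments = [1, 0, 1, 0, 3]     # adding-up, mean, variance, skewness, kurtosis (sigma = 1)
three = [-3, 0, 3]
five = [-3, Fraction(-3, 2), 0, Fraction(3, 2), 3]
print(moment_matching_weights(three, normal_moments[:3]))
print(moment_matching_weights(five, normal_moments))
```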
Implementation
• Implement program in GAMS
  – Large, difficult estimation problem
  – Major advances in solvers. Solution is now robust and routine.
• CE minimand similar to maximum likelihood estimators.
• Excel front end for GAMS program
  – Easy to use
Implementation
• Data Collection: Commodity Balance, Food Balance
• Data Cleaning and Setting Priors: Crop Production, Livestock Production, Commodity Demand and Trade, Processed Commodities (oilseeds, sugar, etc.)
• Data Estimation with Cross Entropy:
  – Nationally: Trade = Supply − Demand
  – Nationally: Area × Yield = Supply
  – Globally: Supply = Demand
• Result: IMPACT 3 FAOSTAT Database