Effects of outliers on productivity analyses based on the Farm Accountancy Data Network (EU-FADN - DG AGRI)
Thomas Kirschstein∗, Mathias Kloss†, Steffen Liebscher∗, Martin Petrick†
∗Martin-Luther-University Halle-Wittenberg, †Leibniz Institute of Agricultural Development in Central and Eastern Europe (IAMO)
Motivation
[Figure: deflated gross value added of the agricultural industry in EUR million, by year (2005-2012), for Germany, Spain, France, Italy and the UK]
• stagnating agricultural productivity
• data used for productivity analysis may contain abnormal observations (outliers)
• in this work a two-step approach is used that combines
– non-parametric multivariate outlier identification procedures
– production function estimation
Data base
• FADN individual farm-level data for Germany (East/West) made available by the EC

FADN code   Variable description
Outputs
SE131       Total output (EUR)
Inputs
SE011       Labour input (hours)
SE025       Total utilised agricultural area (ha) = land
SE275       Total intermediate consumption (EUR) = materials
SE360       Depreciation (EUR) = fixed capital
Conclusion
• outliers present in the samples influence the elasticity estimates
• after outlier correction, returns to scale are significantly ≠ 1
• outliers: small companies (East) and labour-intensive companies (West)
⇒ decision makers should consider advanced outlier detection procedures
Step 1: Outlier identification by pruning the minimum spanning tree
[Figure: two scatter plots of log. brain weight against log. body weight with the observations flagged as outliers, one panel labelled r = 0.56 and one labelled r = 1.55]
Basic idea: Outlying observations are clearly separated from the main bulk of the data, which manifests itself in long edges in nearest-neighbour graphs.
Outline of the pMST procedure (Kirschstein et al., 2013):
1. calculate the Euclidean distances between each pair of observations
2. calculate the minimum spanning tree (MST)
3. find a threshold r and drop all edges longer than r from the MST ⇒ pruned MST
4. determine the largest connected subset of the pruned MST ⇒ "good" subset
5. all observations not belonging to the "good" subset are handled as outliers
Note: The threshold r is determined by a two-step approach using Chebychev's inequality, see Liebscher and Kirschstein (2013).
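The five steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `minimum_spanning_tree` and `pmst_outliers` are hypothetical, the MST is built with a plain O(n²) Prim's algorithm, and the threshold r is simply passed in by the caller, because the Chebychev-based rule for choosing r is not described on the poster.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def minimum_spanning_tree(points):
    """Steps 1-2: Prim's algorithm on the complete Euclidean graph.

    Returns the MST as a list of (i, j, length) edges.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # shortest known edge into the tree
    best_from = [-1] * n         # tree vertex that edge starts from
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # pick the point outside the tree that is closest to it
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u, best_dist[u]))
        for v in range(n):
            if not in_tree[v]:
                d = euclidean(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def pmst_outliers(points, r):
    """Steps 3-5: prune MST edges longer than r (r is an assumed input)
    and return the indices outside the largest connected subset."""
    n = len(points)
    if n == 0:
        return []
    parent = list(range(n))      # union-find forest over the observations

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j, length in minimum_spanning_tree(points):
        if length <= r:          # edges longer than r are dropped
            parent[find(i)] = find(j)

    roots = [find(i) for i in range(n)]
    good_root = Counter(roots).most_common(1)[0][0]
    return [i for i in range(n) if roots[i] != good_root]


# Toy data: four points on a unit square plus one far-away point.
toy = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
print(pmst_outliers(toy, r=2.0))  # [4]
```

In the toy example only the edge connecting the distant point is longer than r, so pruning leaves the square as the "good" subset and flags the fifth point.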
Step 2: Estimation of Cobb-Douglas production function
$\ln Y_{it} = \alpha_L \ln L_{it} + \alpha_A \ln A_{it} + \alpha_M \ln M_{it} + \alpha_K \ln K_{it} + \omega_{it} + \varepsilon_{it}$
Y: output; L: labour; A: land; M: materials (working capital); K: capital (fixed); ω: farm- and time-specific factor(s) known to the farmer, unobserved by the analyst; ε: IID noise; i, t: farm and time indices; α: production elasticities to be estimated
• unbalanced panel over 8 years (2001-2008) for 381 (East) and 844 (West) field crop farms
• added year dummies to control for annual fixed effects
• assume ω evolves along with observed firm characteristics (Olley and Pakes, 1996)
• Levinsohn and Petrin (2003) suggest that materials is a good control candidate for ω
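To make the link between the elasticities α and returns to scale concrete, the following sketch fits the log-linear Cobb-Douglas equation by plain ordinary least squares on synthetic data and sums the estimated elasticities. It is an assumption-laden illustration only: plain OLS is swapped in for the control-function estimators of Olley and Pakes (1996) and Levinsohn and Petrin (2003) used in this work, ω is reduced to a constant intercept, there are no year dummies, and the "true" elasticities are made-up numbers, not results from the poster.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda row_idx: abs(m[row_idx][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row_idx in range(col + 1, n):
            factor = m[row_idx][col] / m[col][col]
            for c in range(col, n + 1):
                m[row_idx][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic farms with hypothetical elasticities (NOT values from the poster).
random.seed(0)
true_alpha = {"labour": 0.2, "land": 0.1, "materials": 0.6, "capital": 0.1}
X, y = [], []
for _ in range(500):
    log_inputs = [random.uniform(0.0, 5.0) for _ in true_alpha]
    noise = random.gauss(0.0, 0.05)                 # plays the role of epsilon
    X.append([1.0] + log_inputs)                    # leading 1.0 = intercept
    y.append(1.0 + sum(a * x for a, x in zip(true_alpha.values(), log_inputs))
             + noise)

coef = ols(X, y)
alpha_hat = dict(zip(true_alpha, coef[1:]))         # skip the intercept
rts = sum(alpha_hat.values())                       # returns to scale
print(alpha_hat, rts)
```

Returns to scale are the sum of the four production elasticities, so a value significantly different from 1 indicates increasing or decreasing returns. Running the same fit before and after dropping the observations flagged in Step 1 is the comparison the results section reports.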
Results #1: Outlier characteristics
[Figure: box plots of Output, Labour, Land, Materials and Capital on a log scale (log(hours), log(ha), log(EUR)), comparing 'good' observations with outliers found by pMST. East panel: 'good' observations (n=3652), outliers found by pMST (n=139). West panel: 'good' observations (n=7481), outliers found by pMST (n=1210).]
Results #2: Cobb-Douglas production elasticities & Returns to Scale
[Figure: estimated production elasticities for Labour, Land, Materials and Capital and returns to scale (RTS), each for East and West, on a scale from 0.0 to 1.2. Legend: without outlier control vs. after removing outliers found by pMST, each marked as significant (estimate significantly different from 0, resp. 1, at the 10% level) or insignificant.]
Acknowledgements
This research is part of the 'Factor Markets' project funded within the EU's seventh framework research program.
References
Kirschstein, T., Liebscher, S., and Becker, C. (2013). Robust estimation of location and scatter by pruning the minimum spanning tree. Journal of Multivariate Analysis, 120:173–184.
Levinsohn, J. and Petrin, A. (2003). Estimating Production Functions Using Inputs to Control for Unobservables. Review of Economic Studies, 70(2):317–342.
Liebscher, S. and Kirschstein, T. (2013). Efficiency of the pMST and RDELA Scatter Estimators. Under review.
Olley, S. and Pakes, A. (1996). The dynamics of productivity in the telecommunications equipment industry. Econometrica, 64:1263–97.
Petrick, M. and Kloss, M. (2013). Identifying Factor Productivity from Micro-data: The case of EU agriculture. Technical report, Centre for European Policy Studies.
Contact
Thomas Kirschstein ([email protected]), Steffen Liebscher ([email protected])
Martin-Luther-University, Gr. Steinstr. 73, 06099 Halle
Mathias Kloss ([email protected]), Martin Petrick ([email protected])
IAMO, Theodor-Lieser-Str. 2, 06020 Halle