Improved County Level Estimation of Crop Yield Using Model-Based Methodology With a Spatial Component
Michael E. Bellow, USDA/NASS
Outline
• Background
• Simulation Methodology
• Results of Ten State Study
• Convergence Evaluation
• Summary
County Level Commodity Estimation
• NASS program since 1917
• Estimates used by private sector, academia, government
• Data from various sources used
• NASS County Estimates System developed to facilitate the estimation process
Available Data Sources
• Voluntary response surveys of farm operators
• List frame control data (lists of known farming operations)
• Previous year official estimates
• Census of Agriculture data (NASS conducts Census every five years)
• Earth resources satellite data
County Crop Yield Estimation
• Yield is ratio of crop production to harvested area (acres)
• Accurate estimation challenging due to
- reliable administrative data seldom available
- high year-to-year variability of yields (weather sensitive)
- lack of adequate sample survey data
Desirable Features of a County Yield Estimation Method
• Repeatability
• Accurate variance estimation
• Produce estimates for counties having no survey data
Ratio (R) Estimator
• Traditional crop yield estimator used by NASS
• Computed as ratio between production and harvested area estimates (with minor adjustment)
• Can produce inconsistent yields due to fluctuations in harvested acreage
• No utilization of survey data from counties other than the one being estimated
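
The ratio computation can be sketched in a few lines. This is only an illustration: the function name and the county figures are hypothetical, and the "minor adjustment" applied by NASS is not described in these slides, so it is left out.

```python
def ratio_yield(production, harvested_acres):
    """Yield as production divided by harvested area (no adjustment applied)."""
    if harvested_acres <= 0:
        raise ValueError("harvested area must be positive")
    return production / harvested_acres


# Hypothetical county: one production estimate, two competing acreage estimates.
# The spread between the two results shows how sensitive the ratio is to
# fluctuations in harvested acreage.
low_acres = ratio_yield(1_200_000, 8_000)
high_acres = ratio_yield(1_200_000, 10_000)
print(low_acres, high_acres)
```

Because only the county's own production and acreage enter the calculation, nothing is borrowed from neighboring counties.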
Model-Based County Estimation Methods
• Based on linear or non-linear models relating true yields to survey reported values
• Generally fit using an iterative algorithm
• Convergence not always guaranteed
• Estimates can be adjusted for consistency with published state figures
Stasny-Goel (SG) Method
• Developed at Ohio State University under cooperative agreement with NASS
• Assumes mixed effects model with farm size group as fixed effect and county as random effect
• Random effect assumed multivariate normal with covariance matrix reflecting spatial correlation among neighboring counties:
$\mathrm{corr}_{ij} = \rho$ if county $i$ borders county $j$
$\mathrm{corr}_{ij} = 0$ otherwise
• EM algorithm used to fit model
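
The neighbor-based correlation structure can be pictured with a small sketch. It assumes a single correlation parameter (called `rho` here) for every pair of bordering counties and a diagonal of 1. The county names, the border list and the function name are hypothetical, and this is not the SG program's own code.

```python
def neighbor_correlation(counties, borders, rho):
    """Correlation matrix: 1 on the diagonal (assumed), rho for bordering
    pairs, 0 for all other pairs."""
    index = {name: k for k, name in enumerate(counties)}
    n = len(counties)
    corr = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, b in borders:
        i, j = index[a], index[b]
        corr[i][j] = rho
        corr[j][i] = rho  # bordering is symmetric
    return corr


# Three hypothetical counties in a row: A borders B, B borders C.
matrix = neighbor_correlation(["A", "B", "C"], [("A", "B"), ("B", "C")], rho=0.3)
for row in matrix:
    print(row)
```

Counties A and C share no border, so their entry stays at zero even though both are correlated with B.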
Stasny-Goel Method (cont.)
• Previous year county yields used to derive initial estimates of county and size group effects
• Processing continues until at least one of the following two conditions is satisfied:
- relative group and log-likelihood distances fall below preset limits
- maximum allowable number of iterations reached
• County yield estimates computed as weighted averages of individual farm level estimates (weights derived from Census of Agriculture data)
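
A minimal sketch of the stopping rule and the final weighted average follows, under stated assumptions. The tolerance values are hypothetical, the two "distances" are taken as already computed (their exact definitions are not given in the slides), and the weights are assumed to be one Census-derived share per size group. The 5000-iteration default matches the limit quoted in the convergence table of this talk.

```python
def should_stop(iteration, group_distance, loglik_distance,
                group_tol=1e-6, loglik_tol=1e-6, max_iter=5000):
    """Stop when both distances fall below their limits, or when the
    iteration cap is reached (tolerances are illustrative only)."""
    converged = group_distance < group_tol and loglik_distance < loglik_tol
    return converged or iteration >= max_iter


def county_yield(group_estimates, census_weights):
    """Weighted average of size-group yield estimates for one county."""
    total = sum(census_weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * y for w, y in zip(census_weights, group_estimates)) / total


# Hypothetical county with two size groups holding 25% and 75% of farm acres.
print(county_yield([100.0, 120.0], [0.25, 0.75]))
```

A run that stops only because `iteration >= max_iter` is a non-convergent run.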
Griffith (G) Method
• Developed by Dr. Dan Griffith at Syracuse University under cooperative agreement with NASS
• Predicts yield values using published number of farms producing crop of interest
• Assumes autoregressive model
• Employs Box-Cox and Box-Tidwell transformations
• Spatial imputation routine can compute estimates for counties with missing survey data
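
For readers unfamiliar with the Box-Cox transformation named above, its standard one-parameter form is sketched here. How the Griffith method chooses the parameter is not described in these slides, so `lam` is simply supplied by the caller, and the function name is hypothetical.

```python
import math


def box_cox(y, lam):
    """Standard Box-Cox transform of a positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires a positive value")
    if lam == 0:
        return math.log(y)          # limiting case as lam -> 0
    return (y ** lam - 1.0) / lam


# lam = 1 only shifts the data; lam = 0.5 is a square-root-type transform.
print(box_cox(4.0, 1.0), box_cox(4.0, 0.5), box_cox(4.0, 0.0))
```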
Previous Research on Model-Based Methods
• Stasny, Goel and Rumsey (1991) - early version of SG method tested on Kansas wheat production data
• Stasny et al. (1995) - improved version of SG tested on Ohio corn yield data
• Crouse (2000) - SG evaluated for Michigan corn and barley yield
• Griffith (2000) - Griffith method tested on Michigan corn yield data
• Bellow (2004) - SG and Griffith methods compared for North Dakota oats and barley yield (presented at FCSM Research Conference)
Ten-State Research Study
• Compare performance of Stasny-Goel, Griffith and ratio methods for various crops in ten geographically dispersed states:
NY, OH, MI, TN, MS, FL, ND, OK, CO, WA
• Criteria for comparison - bias, variance, MSE, outlier properties, convergence percentage
States In Study Area
[Figure: map of the states in the study area]
Post-Stratification Size Groups
• NASS statewide survey data post-stratified by county and farm size based on COA data (two or three size groups defined)
• Percentages of Census farm acres by size group used as weights for SG algorithm
• Equal total land in farms criterion used to form groups
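
One way to read the "equal total land in farms" criterion is to sort farms by size and cut the sorted list where cumulative acreage crosses equal shares of the total. The sketch below implements that reading only; the actual NASS grouping rule may differ, and the function name and farm sizes are hypothetical.

```python
import math


def size_groups(farm_acres, n_groups):
    """Assign each farm a size group (0 = smallest farms) so that groups
    hold roughly equal total land. Returns groups in the input order."""
    total = sum(farm_acres)
    order = sorted(range(len(farm_acres)), key=lambda k: farm_acres[k])
    groups = [0] * len(farm_acres)
    cumulative = 0
    for k in order:
        cumulative += farm_acres[k]
        # Group is set by which share of total land the running sum falls in.
        share = math.ceil(cumulative * n_groups / total) - 1
        groups[k] = min(n_groups - 1, max(0, share))
    return groups


# Five hypothetical farms: the four smallest together hold as much land
# as the single largest one.
print(size_groups([100, 10, 40, 20, 30], n_groups=2))
```

With real data the split is only approximately equal, because a single large farm cannot be divided between groups.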
Data Sources For Research Study
• 2002-03 Quarterly Agricultural Survey
• 2001-03 County Estimates Survey
• 2001-02 official crop yield estimates (‘previous year’ data)
• 2002 Census of Agriculture (number of farms, land in farms)
Simulation Procedure
• Multiple regression performed on survey reported yield vs. official county yields, weighted average neighbor yields, size group membership variables
• Artificial population of 10,000 simulated survey data sets used to compute ‘true’ population parameter values
• 250 sample data sets selected at random from population
Simulation Procedure (cont.)
• Moran’s I computed to test whether simulated data sets reflect spatial correlation of real survey data
• SG, G and R methods applied to each of the 250 sampled data sets
• Average simulated parameter values compared with corresponding population values for each estimation method
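
Moran's I, used above as the check on spatial correlation, has a standard definition that can be written out directly. The sketch assumes a symmetric 0/1 neighbor weight matrix; the weighting scheme actually used in the study is not stated in these slides, and the county values below are made up.

```python
def morans_i(values, weights):
    """Moran's I for values observed on n areas with an n x n weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)          # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    spread = sum(d * d for d in dev)
    return (n / s0) * cross / spread


# Four hypothetical counties in a row; each borders the next one.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
print(morans_i([1.0, 2.0, 3.0, 4.0], chain))    # smooth gradient: positive
print(morans_i([1.0, -1.0, 1.0, -1.0], chain))  # alternating: negative
```

Positive values indicate that neighboring counties tend to have similar yields, which is the pattern the simulated data sets are meant to reproduce.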
Measures of Estimator Performance
• Absolute Bias - average absolute difference between simulated yield estimates and true (population) yield
• Variance - sample variance of simulated yield estimates
• Mean Square Error - average squared deviation between simulated estimates and true yield (SG program also computes analytic MSE)
• Lower (Upper) Tail Proximity - average absolute difference between 5th (95th) percentile of simulated yield estimates and true yield
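
The five measures can be computed for one county from its simulated estimates roughly as follows. This is a sketch under assumptions: the percentile interpolation rule (here the "inclusive" method of Python's `statistics.quantiles`) and the function name are not taken from the study, and the tail proximities are shown for a single county rather than averaged.

```python
import statistics


def performance_measures(estimates, true_yield):
    """Absolute bias, variance, MSE and tail proximities for one county."""
    n = len(estimates)
    abs_bias = sum(abs(e - true_yield) for e in estimates) / n
    variance = statistics.variance(estimates)            # sample variance
    mse = sum((e - true_yield) ** 2 for e in estimates) / n
    # 19 cut points for n=20: the first is the 5th percentile, the last the 95th.
    cuts = statistics.quantiles(estimates, n=20, method="inclusive")
    return {
        "abs_bias": abs_bias,
        "variance": variance,
        "mse": mse,
        "ltp": abs(cuts[0] - true_yield),
        "utp": abs(cuts[-1] - true_yield),
    }


# Hypothetical simulated estimates for a county whose true yield is 100.
print(performance_measures([98.0, 100.0, 102.0, 104.0], 100.0))
```

In the pairwise tables, a county "favors" a method when that method has the smaller value of the measure in question.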
Pairwise Estimator Comparison for Absolute Bias (* - better method)
Percent of Counties Favoring Each Method
Crop                Stasny-Goel vs. Ratio    Stasny-Goel vs. Griffith
                    SG        R              SG        G
Barley              90*       10             82*       18
Corn                92*       8              66*       34
Cotton (upland)     86*       14             58*       42
Dry Beans           93*       7              73*       27
Oats                88*       12             63*       37
Rye                 83*       17             47        53*
Sorghum             84*       16             59*       41
Soybeans            88*       12             62*       38
Sunflower           94*       6              69*       31
Tobacco (burley)    98*       2              56*       44
Wheat (spring)      83*       17             78*       22
Wheat (winter)      83*       17             66*       34
Pairwise Estimator Comparison for Variance
Percent of Counties Favoring Each Method
Crop                Stasny-Goel vs. Ratio    Stasny-Goel vs. Griffith
                    SG        R              SG        G
Barley              100*      0              51*       49
Corn                99.9*     0.1            33        67*
Cotton (upland)     100*      0              13        87*
Dry Beans           100*      0              20        80*
Oats                100*      0              36        64*
Rye                 97*       3              77*       23
Sorghum             98*       2              25        75*
Soybeans            100*      0              40        60*
Sunflower           100*      0              56*       44
Tobacco (burley)    100*      0              49        51*
Wheat (spring)      100*      0              62*       38
Wheat (winter)      100*      0              43        57*
Pairwise Estimator Comparison for MSE
Percent of Counties Favoring Each Method
Crop                Stasny-Goel vs. Ratio    Stasny-Goel vs. Griffith
                    SG        R              SG        G
Barley              92*       8              77*       23
Corn                94*       6              62.5*     37.5
Cotton (upland)     89*       11             55*       45
Dry Beans           96*       4              75*       25
Oats                90*       10             61*       39
Rye                 87*       13             40        60*
Sorghum             84*       16             51*       49
Soybeans            89*       11             57*       43
Sunflower           95.5*     4.5            65*       35
Tobacco (burley)    100*      0              53*       47
Wheat (spring)      85*       15             80*       20
Wheat (winter)      86*       14             64*       36
Pairwise Estimator Comparison for Lower Tail Proximity (LTP)
Percent of Counties Favoring Each Method
Crop                Stasny-Goel vs. Ratio    Stasny-Goel vs. Griffith
                    SG        R              SG        G
Barley              92*       8              55*       45
Corn                93*       7              41        59*
Cotton (upland)     84*       16             41        59*
Dry Beans           96*       4              64*       36
Oats                94*       6              52*       48
Rye                 90*       10             40        60*
Sorghum             97*       3              59*       41
Soybeans            85*       15             38        62*
Sunflower           96*       4              56*       44
Tobacco (burley)    100*      0              31        69*
Wheat (spring)      99*       1              69*       31
Wheat (winter)      89*       11             50        50*
Pairwise Estimator Comparison for Upper Tail Proximity (UTP)
Percent of Counties Favoring Each Method
Crop                Stasny-Goel vs. Ratio    Stasny-Goel vs. Griffith
                    SG        R              SG        G
Barley              93*       7              61*       39
Corn                98*       2              56*       44
Cotton (upland)     97*       3              53*       47
Dry Beans           98*       2              49        51*
Oats                92*       8              43        57*
Rye                 97*       3              33        67*
Sorghum             84*       16             32        68*
Soybeans            99*       1              53*       47
Sunflower           91*       9              43        57*
Tobacco (burley)    98*       2              69*       31
Wheat (spring)      85*       15             47        53*
Wheat (winter)      90*       10             53*       47
Additional Bias Evaluation
• Wilcoxon Rank Sum Test - compare median absolute error (over simulation runs) of SG vs. R, SG vs. G for each county
• Wilcoxon Signed Rank Test - assess whether median error of SG, G, R is negative, positive or zero (two one-sided tests performed for each county)
Results of Rank Sum Tests on Absolute Bias
Percent of Counties Favoring Each Method
Crop                Stasny-Goel vs. Ratio         Stasny-Goel vs. Griffith
                    SG      R       Neither       SG      G       Neither
Barley              82*     9       10            74*     13      13
Corn                85*     7       8             62*     27      11
Cotton (upland)     78*     13      9             54*     33      13
Dry Beans           84*     7       9             67*     22      11
Oats                76*     11      13            61*     30      10
Rye                 63*     10      27            40      40      20
Sorghum             65*     13      22            56*     35      10
Soybeans            80*     12      9             60*     32      7
Sunflower           85*     5       10            66*     25      8
Tobacco (burley)    95*     2       3             45*     38      16
Wheat (spring)      78*     15      7             75*     17      9
Wheat (winter)      72*     16      12            61*     27      12
All                 79*     11      10            62*     27      11
Summary of Signed Rank Test Results (All Crops Combined)
Method         Bias < 0              Bias > 0              Bias = 0
               No. Counties   %      No. Counties   %      No. Counties   %
Stasny-Goel    1607           59     887            32     243            9
Griffith       1456           54     1174           43     82             3
Ratio          292            11     245            9      2200           80
Percent of Counties With Average Underestimate Less Than 10% of True Yield (* - best method)
Crop                Stasny-Goel    Griffith    Ratio
Barley              81*            62          46
Corn                83*            71          42
Cotton (upland)     79*            78          64.5
Dry Beans           95*            74          62.5
Oats                70.5*          54          21
Rye                 41             52*         13
Sorghum             52*            41          11
Soybeans            84*            76          62
Sunflower           80*            63.5        50
Tobacco (burley)    93             98*         27
Wheat (spring)      94*            55          54
Wheat (winter)      86*            75          51.5
Convergence Issues
• SG algorithm not guaranteed to converge within fixed limit on number of iterations
• Non-convergence associated with numerical instability conditions
• Yield estimates produced for non-convergent runs may be suspect
• Convergence generally most reliable for highly prevalent crops, least reliable for rare crops
Algorithm Convergence Percentage By Crop (Limit of 5000 Iterations)
Crop                Stasny-Goel    Griffith
Barley              93             68
Corn                87             77
Cotton (upland)     81             89
Dry Beans           89             75
Oats                80             71
Rye                 74             83
Sorghum             85             66
Soybeans            93             73
Sunflower           90.5           80
Tobacco (burley)    41             52
Wheat (spring)      63             52.5
Wheat (winter)      88             65
Two Approaches to Dealing With SG Non-Convergence
• SG(1) - use estimate generated at final allowable iteration (N0)
• SG(2) - keep track of which iteration (i*) maximized the log-likelihood
- if i* < N0, rerun algorithm to i* and use that estimate
- if i* = N0, resume processing at iteration (N0+1) and continue until either
o convergence occurs (use that estimate) OR
o log-likelihood decreases from one iteration to next (use estimate at next-to-last iteration)
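
The SG(2) rule can be expressed as a small selection routine. This is a sketch of the logic on the slide, not the SG program itself: `loglik_at` and `converged_at` are hypothetical callables standing in for the fitted algorithm, ties for the maximum are broken by taking the earliest iteration, and `hard_cap` is an added safeguard that is not part of the slide.

```python
def sg2_select(loglik_at, converged_at, n0, hard_cap=None):
    """Return the iteration whose estimate SG(2) would use when the
    algorithm has not converged by iteration n0 (iterations are 1-based)."""
    best = max(range(1, n0 + 1), key=loglik_at)   # i*: earliest maximizer
    if best < n0:
        return best                 # rerun to i* and use that estimate
    i = n0
    while hard_cap is None or i < hard_cap:
        i += 1                      # resume processing past n0
        if converged_at(i):
            return i                # convergence occurred
        if loglik_at(i) < loglik_at(i - 1):
            return i - 1            # log-likelihood fell: next-to-last iteration
    return i                        # safeguard reached (assumption, see lead-in)


# Hypothetical log-likelihood trace that peaks at iteration 2 of 4.
trace = {1: -10.0, 2: -5.0, 3: -7.0, 4: -8.0}
print(sg2_select(trace.__getitem__, lambda i: False, n0=4))
```

SG(1), by contrast, would simply return `n0`.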
Non-Convergence Study
• Does SG(1) or SG(2) outperform ratio estimator in cases where SG failed to converge?
• Six cases with high non-convergence percentage selected for comparison of SG(1), SG(2) and R
- 2002 CO barley (37 simulation runs)
- 2002 MS soybeans (105)
- 2002 NY winter wheat (39)
- 2002 ND dry beans (38)
- 2002 OH oats (50)
- 2003 OK rye (59)
Combined Pairwise Estimator Comparison for Non-Convergence Test Cases
Percent of Counties Favoring Each Method
Measure          SG(1) vs. Ratio      SG(2) vs. Ratio      SG(1) vs. SG(2)
                 SG(1)     R          SG(2)     R          SG(1)     SG(2)
Absolute Bias    78*       22         80*       20         23        77*
Variance         95*       5          99*       1          0         100*
MSE              81*       19         83*       17         15        85*
LTP              74*       26         88*       12         13        87*
UTP              84*       16         90*       10         15        85*
Summary
• SG yield estimation method outperforms R in all efficiency categories and G in most categories (G outperforms R)
• Convergence problems can be alleviated using enhanced SG approach
• SG method recommended for integration into NASS County Estimates System