Iowa State University
From the SelectedWorks of Bradley A. Miller
March, 2016
Towards Mapping Soil Carbon Landscapes: Issues of Sampling Scale and Transferability
Bradley A. Miller, Leibniz Centre for Agricultural Landscape Research (ZALF)
Sylvia Koszinski, Leibniz Centre for Agricultural Landscape Research (ZALF)
Wilfried Hierold, Leibniz Centre for Agricultural Landscape Research (ZALF)
Helmut Rogasik, Leibniz Centre for Agricultural Landscape Research (ZALF)
Boris Schröder, Technische Universität Braunschweig, et al.
This work is licensed under a Creative Commons CC BY-NC-ND International License.
Available at: https://works.bepress.com/bradley_miller/8/
Towards Mapping Soil Carbon Landscapes: Issues of Sampling Scale and Transferability

Bradley A. Miller1*, Sylvia Koszinski1, Wilfried Hierold1, Helmut Rogasik1, Boris Schröder2,3, Kristof Van Oost4, Marc Wehrhan1, Michael Sommer1

1Leibniz Centre for Agricultural Landscape Research (ZALF), Institute of Soil Landscape Research, Eberswalder Straße 84, 15374 Müncheberg, Germany
2Technische Universität Braunschweig, Institute of Geoecology, 38106 Braunschweig, Germany
3Berlin-Brandenburg Institute of Advanced Biodiversity Research (BBIB), 14195 Berlin, Germany
4Université Catholique de Louvain, George Lemaitre Center for Earth and Climate, Earth and Life Institute, 1348 Louvain-la-Neuve, Belgium
*Corresponding author
Abstract
The conversion of point observations to a geographic field is a necessary step in soil mapping. When pursuing goals of mapping soil carbon at the landscape scale, the relationships between sampling scale, representation of spatial variation, and accuracy of estimated error need to be considered. This study examines the spatial patterns and accuracy of predictions made by different spatial modelling methods on sample sets taken at two different scales. These spatial models are then tested on independent validation sets taken at three different scales. Each spatial modelling method produced similar, but unique, maps of soil organic carbon content (SOC%). Kriging approaches excelled at internal spatial prediction with more densely spaced sample points. Because kriging depends on spatial autocorrelation, kriging performance was naturally poor in areas of spatial extrapolation. In contrast, the spatial regression approaches tested could continue to perform well in areas of spatial extrapolation. However, the problem of induction meant that predictions could still fail in some areas, and where this would occur was difficult to anticipate; the same limitation applied to the kriging approaches. Spatial phenomena occurring between sampling points could also be missed by kriging models. Use of covariates with kriging can help, but the
requirement of capturing the full feature space in the map remains. Methods that utilize spatial association, such as spatial regression, can map soil properties for landscape scales at a high resolution, but are highly dependent on the inclusion of the full attribute space in the calibration of the model and the availability of transferable covariates.

This is a manuscript of an article from Soil and Tillage Research (2015): in press, doi:10.1016/j.still.2015.07.004. Posted with permission.
Keywords: soil mapping, sampling design, spatial regression, kriging, estimated error, uncertainty, spatial autocorrelation, spatial association
Highlights
1. Different modelled patterns can have similar levels of performance.
2. Spatial regression approaches can extrapolate spatially.
3. However, all spatial models are limited by calibration in feature space.
4. The problem of induction makes predicting that limitation difficult.
1. Introduction
Erosion and deposition processes redistribute large amounts of mineral soil and soil organic 41
carbon (SOC) across agricultural landscapes (Van Oost et al., 2007). SOC dynamics at the landscape scale 42
show fluctuations in space and time that challenge research on soil and SOC erosion (Kirkels et al., 2014). 43
A key component for monitoring carbon dynamics in soil landscapes is converting point observations to 44
areally extensive maps. This transition from sample points to a geographic field necessitates some type 45
of spatial prediction. Practical constraints limit the number of points that can be sampled, and fewer samples mean the map must rely more heavily on the spatial prediction method (Webster and Oliver, 1990). Nonetheless, sampling locations can be chosen strategically to optimize their utility for
the spatial model. Previous studies have examined the effect of sampling distribution within the same 49
extent (spatial domain) with respect to different modelling methods (Mueller and Pierce, 2003; Corwin 50
et al., 2010; Schmidt et al., 2014). However, for mapping landscapes, issues of scale, representativeness, and uncertainty become increasingly important; those issues are the focus of this research.
Methods for spatial prediction commonly described as spatial interpolation (e.g., inverse 53
distance weighting, kriging) rely on spatial autocorrelation (Burgess and Webster, 1980; Goovaerts, 1999; 54
Schloeder et al., 2001). For this reason, greater sampling density increases the spatial support of the 55
model and prediction error increases with distance away from sampling points. Similarly, because these 56
methods are intended only for spatial interpolation, they are considered inappropriate for extrapolating 57
beyond the extent of the sampling points. 58
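As a minimal illustration of this family of methods (not part of the study's analysis), inverse distance weighting predicts each unsampled location as a distance-weighted mean of the observations, so the prediction leans on nearby points and degrades with distance. A sketch with invented SOC% values:

```python
import numpy as np

def idw_predict(xy_obs, z_obs, xy_new, power=2.0):
    """Inverse distance weighting: the prediction is a weighted mean of the
    observations, with weights decaying as distance ** -power."""
    d = np.linalg.norm(xy_obs - xy_new, axis=1)
    if np.any(d == 0.0):                  # prediction point coincides with a sample
        return float(z_obs[np.argmin(d)])
    w = d ** -power
    return float(np.sum(w * z_obs) / np.sum(w))

# Toy SOC% observations (coordinates in metres, values in %)
xy  = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])
soc = np.array([1.2, 2.0, 1.6])
print(idw_predict(xy, soc, np.array([10.0, 10.0])))   # dominated by the nearest point
```

Because the weights depend only on distance, such a method has no basis for predicting beyond the sampled extent, which is the limitation discussed above.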
Recognizing the utility of spatial association approaches used in traditional soil mapping (Odeh et 59
al., 1994; McBratney et al., 2003), some varieties of kriging leverage spatial covariates to improve 60
predictions. Examples of approaches that incorporate spatial association with spatial autocorrelation 61
include co-kriging (McBratney and Webster, 1983; Juang and Lee, 1998) and universal kriging (Hengl et 62
al., 2007; Li et al., 2015). The covariates used are typically more easily measured than the target variable 63
and thus usually have better spatial coverage than samples of the target variable. However, spatial 64
autocorrelation still has an important role in all forms of kriging. Thus kriging at the landscape scale 65
continues to present a conflict between the size of the mapping extent and the number of observations 66
that need to be taken to produce an adequate sample density for the desired range of uncertainty. 67
Approaches that rely more purely on spatial association have become more quantitative and are 68
using more sophisticated techniques of predictor identification and spatial modelling. Some examples 69
include spatial regression or environmental correlation (McKenzie and Austin, 1993; Moore et al., 1993), 70
regression trees (Adhikari et al., 2014; Lacoste et al., 2014; Miller et al., 2015a), random forests (Vasques 71
et al., 2010; Häring et al. 2012; Schmidt et al., 2014), boosting algorithms (Häring et al. 2014) and 72
artificial neural networks (Tamari et al., 1996; Behrens et al., 2005). In contrast to spatial autocorrelation 73
techniques’ characteristic of prediction error increasing with distance from samples, spatial association 74
techniques’ error depends on the model’s ability to fit equations to the full feature space using available 75
covariates. However, spatial autocorrelation would still suggest that areas further away are more likely 76
to be outside the feature space of the sampled area. For these reasons, studies have recommended 77
stratification of the feature space to optimize sampling designs for models utilizing this prediction 78
strategy (Gessler et al., 1995; Hengl et al., 2003). 79
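The feature-space limitation described here can be checked explicitly. The sketch below (illustrative only, with invented covariate values) flags prediction locations whose covariates fall outside the range spanned by the calibration sample:

```python
import numpy as np

def outside_feature_space(X_cal, X_new):
    """Flag locations whose covariate values fall outside the range spanned
    by the calibration sample (a simple per-covariate box test; a convex-hull
    or density criterion would be stricter)."""
    lo, hi = X_cal.min(axis=0), X_cal.max(axis=0)
    return np.any((X_new < lo) | (X_new > hi), axis=1)

# Invented covariate columns (e.g. relative elevation, ECa) at calibration points
X_cal = np.array([[-2.0, 20.0], [0.5, 35.0], [3.0, 28.0]])
X_new = np.array([[0.0, 30.0],    # inside the calibrated range
                  [5.0, 30.0]])   # first covariate beyond any calibration point
print(outside_feature_space(X_cal, X_new))   # second location flagged
```

Locations flagged by such a test are the ones where a spatial-association model is extrapolating in attribute space, regardless of their geographic distance from the samples.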
Comparison of resulting maps should consider several factors. Typically, maps produced by
models are evaluated by error statistics for a single set of validation points, which provides a quantitative 81
comparison. However, spatial model realizations can have similar performance metrics at the designated 82
validation points while still representing differing spatial structures (Mueller and Pierce, 2003; Corwin et 83
al., 2010; Adhikari et al., 2013). This aspect can have implications for the interpretation of landscape 84
processes and the eventual use of the map, which should not be overlooked. Similarly, different 85
combinations of sampling designs and spatial modelling methods will have different patterns of error 86
magnitude, which can also have bearing on the suitability of the map for the desired purpose 87
(McBratney et al., 2000). This study considers each of these criteria in its comparison of maps for SOC% at 88
multiple scales. 89
We focus on the attribute of soil organic carbon content (SOC%) in the topsoil because of its 90
importance in monitoring and modelling carbon dynamics in soil landscapes. The highest concentrations, and thus the largest storage, of carbon are in the topsoil. However, SOC% can be highly spatially variable,
which greatly impacts the mass balance of carbon at the landscape scale. This soil property has been 93
heavily sampled for the CarboZALF project under different sampling designs for different research
purposes. Therefore, these samples provide a unique opportunity for comparing the nature and 95
performance of spatial modelling methods with respect to samples taken at different scales. The 96
objective of this study is to evaluate six spatial models, built from two sample sets taken at different 97
scales, in terms of their prediction performance as well as the distribution and reliability of their error 98
estimations. As the spatial scales of the calibration and validation point sets are shifted, the degree of 99
extendibility or transferability of the models is uniquely tested. 100
2. Methods
2.1. Study Area
Situated in the Northeast German Plain, the site for this study belongs to the main experimental 103
area of the Leibniz Centre for Agricultural Landscape Research (ZALF), located within the rural and 104
agricultural landscape of Uckermark (Figure 1). Initiated in 2008, the “CarboZALF” research project was 105
established to take a multiscale and interdisciplinary approach for quantifying and understanding 106
processes relevant to ecosystem carbon dynamics as well as their driving forces. The region is well-known for a long history of agricultural use since medieval deforestation and is dominated by large fields
(great manors and estates of rural gentry until 1945, later socialist agricultural cooperatives, and private 109
farmers and farm cooperatives within the last 25 years). 110
Due to the heterogeneity of soil associations within the hummocky ground moraine from the 111
Pomeranian stage of the Weichselian glaciation (Koszinski et al., 2013), small-scale variation of soil
properties is pronounced. The spatial variability of SOC, in particular, is increased by the anthropogenic 113
effects from hundreds of years of crop and tillage systems (Deumlich and Frielinghaus, this issue; Gerke 114
et al., this issue). Soil parent materials are calcareous till and glaciofluvial sands or gravels that have 115
developed into quite different soils. Soils in the area include Haplic Luvisols or Luvic Arenosols, 116
accompanied by Gleysols, Stagnosols, and even Histosols within typical closed depressions (kettle holes) 117
and wide outwash valleys (IUSS Working Group WRB, 2014). Many of these soils are clearly affected by 118
erosion processes, resulting in additional changes of soil properties including carbon concentration and 119
storage (Sommer et al., 2008; Deumlich et al., 2010). More specifically, the soil type inventory of the 6 ha 120
experimental site consists of Albic Cutanic Luvisols, Calcic Cutanic Luvisols, Calcaric Regosols, and 121
Endogleyic Colluvic Regosols (Eutric) over peat. The subcontinental climate is characterized by a mean 122
annual air temperature of 8.7°C and a mean annual precipitation of 483 mm (1992-2011, ZALF research station Dedelow).
2.2. Sampling Design
Separate sets of samples at a meso- (80 points covering 6 ha) and macro- (28 points covering 65 ha) scale provided unique calibration sets for building the models examined in this study (Figure 2). The
meso-scale calibration set used a spatially balanced stratified random sampling design according to 128
Theobald et al. (2007) in order to maximize the benefit of covariates in the models. The macro-scale 129
calibration set applied a similar strategy, but relied more on expert knowledge to determine sample 130
locations. For stratification at the meso-scale, we utilized maps of leaf area index (LAI), apparent 131
electrical conductivity (ECa), and topographic position index (TPI; analysis scale of 5 m). These three 132
likely covariates were partitioned into eight quantiles for the respective sampling extents and 133
intersected with one another. To increase the efficiency of sampling the feature space range, the 134
selection of random locations within these resulting zones skipped the third and sixth quantiles of the 135
respective classifications. The macro-scale calibration point locations were purposively selected by 136
expert knowledge with consideration of the same covariates used in the stratified random sampling 137
applied at the meso-scale. Although the distributions of the resulting SOC% data sets were not perfectly 138
normal (Figure 3), we did not transform them in order to avoid issues with back-transformations and 139
comparability in the validation of error estimations. 140
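The stratification described above can be sketched as follows (illustrative Python with synthetic covariate grids standing in for LAI, ECa, and TPI; this is not the original workflow, which followed Theobald et al., 2007):

```python
import numpy as np

rng = np.random.default_rng(42)

def octile_class(values):
    """Assign each cell to one of eight quantile classes (1..8)."""
    edges = np.quantile(values, np.linspace(0, 1, 9)[1:-1])
    return np.digitize(values, edges) + 1

# Synthetic covariate grids, flattened to 1-D (stand-ins for LAI, ECa, TPI)
n = 10_000
lai, eca, tpi = rng.normal(size=(3, n))

classes = np.stack([octile_class(v) for v in (lai, eca, tpi)])
keep = np.all(~np.isin(classes, [3, 6]), axis=0)   # skip the 3rd and 6th octiles

# Intersect the three classifications into zones; draw one random cell per zone
zone = classes[0] * 100 + classes[1] * 10 + classes[2]
sample_cells = [int(rng.choice(np.flatnonzero(keep & (zone == z))))
                for z in np.unique(zone[keep])]
print(len(sample_cells), "candidate sampling locations")
```

Skipping two octiles per covariate thins the intersection to the more contrasting parts of the feature space, which is the efficiency gain described in the text.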
The respective models were then tested on independent validation sets sampled on micro- (145 141
points at 5 m spacing) and meso- (128 points at 20 m spacing) scale grids. Note that the terms micro, 142
meso, and macro used here describe the relative scale of the sampling and do not necessarily 143
correspond to specific scale ranges that may be defined elsewhere. The micro-scale validation grid 144
samples were all spatially internal to the meso-scale calibration points. The meso-scale validation points 145
were distinguished between points spatially internal versus external to the area covered by the meso-scale calibration points (Figure 2b). As a test of upscaling, the models built from the meso-scale
calibration points were also tested at the macro-scale by using the macro-scale calibration points as 148
macro-scale validation points for the meso-scale models. For this reason, the models calibrated at the 149
macro-scale could not be independently validated at the macro-scale. The sequence of testing the meso-scale models on the meso-scale internal, meso-scale external, and macro-scale validation points
provided an examination of transferability for the models developed at the meso-scale. 152
Soil samples for the meso- and macro-scale calibration, as well as the meso-scale validation 153
points were collected from cores extracted by a hydraulic probe. Soil cores of 10 cm in diameter were 154
taken down to a 1.5 m depth at minimum. For this study, the cores were analyzed for the top 30 cm of 155
the profile, which in most cases coincided with the Ap horizon. A representative mixed mass section 156
taken across the 30 cm and consisting of at least 500 g of air-dried soil was analyzed in the lab. Soil
samples for the micro-scale validation were taken by hand, providing a mixed sample of at least 500 g for 158
the top 30 cm at every point. The total carbon content was determined by dry combustion at 1250°C and 159
infrared detection of carbon dioxide. Analysis of samples was conducted in duplicate, using the mean of 160
the results in the final data set. The precision of this method for measuring SOC% was approximately 161
±0.1. 162
2.3. Spatial Modelling
A range of spatial modelling methods were selected to represent the gradient of approaches 164
from spatial autocorrelation to spatial association. Ordinary kriging (OK) was used as a purely spatial 165
autocorrelation approach. Co-kriging (CK) and universal kriging (UK) were used as hybrids between 166
spatial autocorrelation and spatial association. Rule-based multiple linear regression (MLR) models
were used to represent the spatial association family of digital soil mapping approaches. When covariate 168
grids were layered together the finest resolution was maintained, which resulted in the MLR maps 169
having a resolution of 1 m. The kriging product maps were also exported at a resolution of 1 m for 170
comparison. 171
A pool of 297 potential covariates was considered in the production of the MLR models (Table 1). 172
Most of these potential covariates were terrain derivatives from a 1 m resolution digital elevation model, 173
generated by LiDAR. ECa was determined by using an EM38 DD device. Point LAI data was collected in 174
the field and then used to calibrate a semi-physical model for estimating LAI from Quickbird satellite 175
imagery. A low-pass filter was then applied to the calculated LAI grid to reduce the effect of gaps between rows.
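The effect of such a low-pass filter can be sketched with a simple moving average (illustrative only; the study's exact filter is not specified beyond "low pass", and the grid values here are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1 m LAI grid with a periodic dip between crop rows (invented values)
lai = 3.0 + 0.1 * rng.standard_normal((100, 100))
lai[:, ::4] -= 1.0                          # artificial inter-row gaps every 4 m

# 5-cell moving-average low-pass filter applied along each row
kernel = np.ones(5) / 5.0
smoothed = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, lai)

# The gap cells are pulled back toward the field mean, attenuating the striping
print(lai[:, ::4].mean(), smoothed[:, ::4].mean())
```

As the text notes for the real LAI grid, the row pattern is attenuated rather than removed: the smoothed gap cells move toward, but do not reach, the between-row values.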
A common problem in spatial association approaches for digital soil mapping is the identification 178
of optimal predictor covariates. Therefore, we used the Cubist 2.08 software (Quinlan, 1994), which not 179
only builds optimized model structures, but has also demonstrated ability to select optimal covariates 180
from a pool of potential predictors (Bui et al. 2006; Minasny and McBratney, 2008; Adhikari et al., 2013; 181
Lacoste et al., 2014; Miller et al., 2015b). Detailed description of Cubist’s process for selecting predictors 182
and building models is provided in Quinlan (1993) and Holmes et al. (1999) and will not be repeated 183
here. To reduce the possibility that the resulting models were overfit, covariates selected by Cubist were 184
re-entered as a more limited covariate pool until the set of selected covariates equaled the set in the 185
supplied covariate pool. Maps of SOC% were produced by combining the raster maps of the covariates as 186
described by the Cubist-generated regression equations using map algebra in ArcGIS 10.1
(www.esri.com/software/arcgis). 188
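The re-entry loop used to reduce overfitting can be sketched as follows. Because Cubist's internal selection is not reproduced here, a hypothetical correlation-threshold selector stands in for it, and the covariate names are invented for the example:

```python
import numpy as np

def select_covariates(X, y, names, r_min=0.3):
    """Hypothetical stand-in for Cubist's covariate selection: keep
    covariates whose absolute correlation with y exceeds r_min."""
    keep = []
    for j, name in enumerate(names):
        r = abs(np.corrcoef(X[:, j], y)[0, 1])
        if r > r_min:
            keep.append(name)
    return keep

def selection_fixed_point(X, y, names):
    """Re-enter the selected covariates as the new pool until the selected
    set equals the supplied pool (the anti-overfitting loop in the text)."""
    pool = list(names)
    while True:
        cols = [names.index(n) for n in pool]
        chosen = select_covariates(X[:, cols], y, pool)
        if chosen == pool:
            return pool
        pool = chosen

rng = np.random.default_rng(1)
n = 200
elev = rng.normal(size=n)        # stands in for relative elevation
noise = rng.normal(size=n)       # an irrelevant covariate
y = 1.5 - 0.8 * elev + 0.1 * rng.normal(size=n)   # synthetic SOC%
X = np.column_stack([elev, noise])
selected = selection_fixed_point(X, y, ["rel_elev_5200m", "noise_cov"])
print(selected)
```

The loop terminates when selection is stable, i.e., when offering the model only the covariates it already chose does not change its choice.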
In order to keep the spatial modelling approaches as comparable as possible, the covariates 189
selected by Cubist were also used for the appropriate kriging approaches using Geostatistical Analyst in 190
ArcGIS 10.1. At the respective sampling scales, the top covariate in the Cubist model (MLR-all) was 191
selected for use with co-kriging. Recognizing that covariates such as LAI and ECa may not always be 192
practical to obtain for landscape scale mapping, Cubist was also run with those potential covariates 193
excluded (MLR-limited). The top covariate selected by Cubist from this more limited covariate pool was 194
also tested with co-kriging. In addition, universal kriging was tested with the two covariates used for the 195
co-kriging models at the respective scales, together. Therefore, six spatial modelling approaches were 196
calibrated on the two scales of sampling design:
ordinary kriging (OK),
co-kriging with the top covariate from Cubist (CK-LAI),
co-kriging with the top covariate from Cubist when LAI and ECa were excluded (CK-limited),
universal kriging with the two covariates tested with co-kriging (UK),
MLR with all covariates available (MLR-all), and
MLR limited by excluding LAI and ECa (MLR-limited).
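For reference, the ordinary kriging system underlying the first of these approaches can be written out compactly. The sketch below uses an assumed spherical semivariogram (nugget 0.1 (%)², partial sill 0.3 (%)², range 46 m are illustrative values, loosely echoing the meso-scale OK model reported later) and is not the Geostatistical Analyst implementation:

```python
import numpy as np

def spherical_gamma(h, nugget=0.1, psill=0.3, rng_m=46.0):
    """Spherical semivariogram: rises from the nugget to nugget + partial
    sill at the range, and is flat beyond it."""
    h = np.asarray(h, dtype=float)
    g = nugget + psill * (1.5 * h / rng_m - 0.5 * (h / rng_m) ** 3)
    g = np.where(h >= rng_m, nugget + psill, g)
    return np.where(h == 0.0, 0.0, g)

def ordinary_krige(xy_obs, z_obs, xy_new):
    """Solve the ordinary kriging system for one prediction location;
    returns the prediction and the kriging standard error."""
    n = len(z_obs)
    d_obs = np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=2)
    A = np.empty((n + 1, n + 1))
    A[:n, :n] = spherical_gamma(d_obs)          # semivariances between samples
    A[n, :n] = A[:n, n] = 1.0                   # unbiasedness constraint
    A[n, n] = 0.0
    b = np.append(spherical_gamma(np.linalg.norm(xy_obs - xy_new, axis=1)), 1.0)
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]
    pred = float(w @ z_obs)
    var = float(b[:n] @ w + mu)                 # kriging variance
    return pred, np.sqrt(max(var, 0.0))

xy  = np.array([[0.0, 0.0], [30.0, 0.0], [0.0, 30.0], [30.0, 30.0]])
soc = np.array([1.1, 1.8, 1.4, 1.6])
pred, se = ordinary_krige(xy, soc, np.array([15.0, 15.0]))
print(round(pred, 3), round(se, 3))
```

The same machinery extends to CK and UK by adding covariate terms; the standard error returned here is the estimated error discussed next.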
The prediction of a standard error is included in the calculations of the kriging methods (Goovaerts, 2001), which is part of kriging's popularity. However, estimating error within spatial regression approaches has also recently shown promise. For spatial regression, error can be estimated using the residuals between the regression equation and observed values, and this empirical estimation of error can be transferred with the model predictions (Shrestha and Solomatine, 2006; Malone et al., 2011; Lemercier et al., 2012). Under the rule-based approach, rule conditions can be used to identify areas expected to have similar errors (Miller et al., 2015a). The performance of these modelled errors is tested in this study as well.
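The rule-based error estimation can be illustrated with a toy example: calibration residuals are grouped by the rule that covers each point, and the per-rule mean absolute residual becomes the error estimate transferred with the predictions (all values invented):

```python
import numpy as np

def rule_error_estimates(rule_id, y_obs, y_pred):
    """Empirical uncertainty per rule: the mean absolute residual of the
    calibration points covered by each rule."""
    return {int(r): float(np.mean(np.abs(y_obs[rule_id == r] - y_pred[rule_id == r])))
            for r in np.unique(rule_id)}

# Invented calibration residuals for a two-rule model (cf. the meso-scale models)
rule_id = np.array([1, 1, 1, 2, 2, 2])
y_obs  = np.array([1.2, 1.4, 1.3, 3.0, 2.4, 2.7])
y_pred = np.array([1.3, 1.3, 1.3, 2.6, 2.6, 2.6])
est = rule_error_estimates(rule_id, y_obs, y_pred)
print(est)   # rule 2 carries the larger expected error
```

Mapping these per-rule values over the cells each rule covers yields an uncertainty map that varies with the rule conditions rather than with distance from the samples.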
2.4. Comparative Analysis
Comparison between the six spatial modelling approaches began with standard measures of 213
model performance. The coefficient of determination (R²) indicated the models’ ability to represent the
spatial variation in the validation points. The mean absolute error (MAE) measured the models’ 215
prediction accuracy at the validation points. As a baseline for comparison, these same performance 216
statistics were calculated for the mean model (i.e., the null model), which was simply the use of the calibration points’ mean to predict SOC% everywhere.
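These performance measures and the null-model baseline can be sketched directly (synthetic values; R² is written here in its 1 − SSres/SStot formulation):

```python
import numpy as np

def r2(y, yhat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    return float(1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2))

def mae(y, yhat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - yhat)))

# Synthetic validation SOC% vs. one model and the null (calibration-mean) model
y_val   = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
y_model = np.array([1.1, 1.4, 2.2, 2.4, 2.8])
y_null  = np.full_like(y_val, 1.9)   # calibration points' mean used everywhere
print(r2(y_val, y_model), mae(y_val, y_model), mae(y_val, y_null))
```

A model is only informative to the extent that its MAE beats the null model's; the R² of the null model is at most zero by construction, which is why it serves as the baseline.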
Part of this study’s objective was to compare the spatial realizations produced by the respective 219
modelling methods. In addition to the visual comparison, the differences in spatial predictions were 220
calculated in ArcGIS 10.1 by map algebra. Also, uncertainty maps were produced using the error 221
estimation methods associated with each of the respective spatial modelling approaches. The accuracy 222
of these error estimations was tested on the multiple scales of more densely sampled point grids used 223
only for validation purposes. The accuracies of error estimations were then compared between spatial 224
modelling methods. 225
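Both comparison steps reduce to simple array operations: differencing two prediction grids cell by cell, and checking estimated errors against observed validation residuals. A sketch with synthetic grids (the coverage statistic shown is one possible way to test an error estimate, not necessarily the study's):

```python
import numpy as np

rng = np.random.default_rng(3)

# Two synthetic 1 m resolution SOC% prediction grids (stand-ins for UK and MLR maps)
uk_map  = 1.5 + 0.3 * rng.standard_normal((50, 50))
mlr_map = uk_map + 0.05 + 0.02 * rng.standard_normal((50, 50))

# Map algebra: cell-by-cell difference between the two realizations
diff = mlr_map - uk_map
print(float(diff.mean()))

# Testing an error estimation: share of validation residuals that fall
# within the estimated error at those points (here a constant estimate)
est_err   = np.full(100, 0.25)
residuals = 0.2 * rng.standard_normal(100)
coverage  = float(np.mean(np.abs(residuals) <= est_err))
print(coverage)
```

A well-calibrated error estimate should yield a coverage consistent with its nominal confidence; systematic over- or under-coverage indicates over- or under-stated uncertainty.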
3. Results
3.1. Configuration of models
From the large pool of potential covariates, Cubist selected very few predictors for SOC% based 228
on the points in our study. With the exception of the model built from the meso-scale calibration points 229
using all covariates, the models generated by Cubist used only one predictor each. The MLR-all model 230
based on the meso-scale points utilized four predictors (Table 2). The generated models were also simple 231
in the number of rules produced. Both meso-scale models utilized two rules, while both macro-scale 232
models used only one. Our experience has been that this is related to the quantity and range of 233
conditions covered by the calibration points; more data points can provide Cubist with a basis for 234
creating more rules. 235
The Cubist models calibrated on the meso-scale points focused on the use of relative elevation at 236
an analysis scale of 5200 m. Although this is a very regional context for relative elevation, it is still 237
somewhat different from elevation a.s.l. (r = 0.90 at the calibration points). When LAI was made available
for model building from the same points, it dominated the estimation of SOC% in the map area because it 239
was the primary covariate for areas above -7.3 m of relative elevation (5200 m analysis scale). For the 240
meso-scale area, this essentially excluded the use of LAI in the wet swale, which runs along the border 241
between the spatial interpolation and extrapolation test zones. By calibrating on the macro-scale points, 242
LAI was actually the only covariate selected by Cubist. Intriguingly, when LAI and ECa were removed from 243
the covariate pool for the macro-scale model, relative elevation at a 20 m analysis scale was selected as 244
the only predictor. This smaller analysis scale for relative elevation suggests a greater focus on variation 245
coinciding with more local relief. 246
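Relative elevation at a given analysis scale can be computed as the elevation minus the mean elevation within a moving window whose size sets the analysis scale. A minimal sketch (synthetic DEM, not the study's 1 m LiDAR data; a brute-force window loop rather than an optimized focal statistic):

```python
import numpy as np

def relative_elevation(dem, k):
    """Elevation minus the mean elevation within a k x k moving window;
    the window size corresponds to the 'analysis scale' of the derivative."""
    pad = np.pad(dem, k // 2, mode="edge")
    out = np.empty_like(dem, dtype=float)
    for i in range(dem.shape[0]):
        for j in range(dem.shape[1]):
            out[i, j] = dem[i, j] - pad[i:i + k, j:j + k].mean()
    return out

# Tiny synthetic DEM: a knoll on a plane (cells would be 1 m in this study)
x, y = np.meshgrid(np.arange(21), np.arange(21))
dem = 50.0 + 3.0 * np.exp(-((x - 10) ** 2 + (y - 10) ** 2) / 20.0)

rel5 = relative_elevation(dem, 5)
print(float(rel5[10, 10]), float(rel5[0, 0]))   # positive on the knoll top
```

With a small window the derivative responds to local relief, as with the 20 m analysis scale; widening the window toward thousands of cells shifts it toward the regional context of the 5200 m scale.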
The simplicity of the Cubist models facilitated the choice of covariates to use with the CK and UK 247
models for comparison. LAI was chosen as the covariate for the first test of co-kriging (CK-LAI) at both 248
calibration scales. To explore the scenario when covariates such as LAI and ECa are not available, the 249
second co-kriging test used the only covariate selected by Cubist at the respective scales (CK-limited). 250
Specifically, the CK-limited spatial modelling was done using the covariates of relative elevation (5200 m) 251
at the meso-scale and relative elevation (20 m) at the macro-scale. In addition, UK was tested using the 252
LAI and the respective analysis scale of relative elevation together as covariates. Although many other 253
combinations of methods were possible, this provided some representation of the possibilities between 254
the two extremes of spatial modelling approaches. 255
For modelling the semivariograms used to conduct the kriging, a spherical model was generally 256
the most appropriate and was used consistently (Figure 4). The semivariogram models for the kriging of 257
the meso-scale calibration points had a nugget between 0 and 0.4 (%)². The meso-scale semivariogram
models for OK and CK-LAI had ranges of 46 m and 58 m, respectively. The meso-scale semivariogram 259
models for CK-limited and UK had larger ranges of 273 m and 266 m, respectively. These two spatial 260
models were the kriging approaches that included relative elevation (5200 m) as a covariate. For the 261
macro-scale kriging models, calculated nuggets ranged between 0.12 and 0.33 (%)². However, a range of
211 m was consistent for all the semivariograms built from the macro-scale points. 263
3.2. Model prediction performance
The models built from the meso-scale points all had similar levels of performance for predicting 265
validation points spatially internal to the calibration points’ extent (Tables 3 and 4). The kriging models 266
particularly excelled at predicting the validation points taken at the same scale and that were spatially 267
internal. As expected, the performance of the kriging-based models failed outside the spatial extent of 268
the calibration points. Although the performance of the MLR models also declined in the prediction areas 269
that were spatially external, in most cases they still provided reasonably useful predictions. The 270
exception was the predictions made by the MLR-limited model for meso-scale points in the external 271
validation area. Despite having a low MAE, that model was unable to represent the spatial variation in 272
that opposite hillslope. This suggests that although Cubist did construct a model that minimized the 273
residuals, the LAI provided important information for predicting the variation in the external, meso-scale 274
validation. 275
The performances of the macro-scale models were generally lower than the models based on 276
the meso-scale points (Tables 5 and 6). This was likely due to the lower quantity and density of points 277
available for calibration at the macro-scale. Nonetheless, the independent validation for most of the 278
models showed that they made useful predictions in the majority of the area. All four kriging approaches 279
again produced very similar results. Although all of the validation point sets were spatially internal in this 280
model building scenario, the prediction performance on the points that were external at the meso-scale 281
was still very poor. This suggests that there was something unique about the conditions of that hillslope 282
area and making that area spatially internal to the calibration points for the kriging approaches was not 283
sufficient to effectively model that area. 284
The macro-scale MLR models were especially intriguing because their prediction results were 285
both the lowest and the highest of the comparable models. The MLR-limited model performed the worst 286
at virtually all scales. It only improved upon the use of the calibration points’ mean (null model) as a 287
prediction at the micro-scale validation. This could be related to the macro-scale MLR-limited model’s 288
focus on relative elevation (20 m) as a predictor. In contrast, the MLR-all model consistently had the 289
highest R² for all validation sets. It also had the lowest MAE, of models built from either scale, for the difficult-to-predict meso-scale external points.
3.3. Spatial patterns represented by models
3.3.1. Meso-scale variation
Although all of the meso-scale based models showed similar validation performance, they 294
produced unique realizations of the spatial pattern of SOC% (Figure 5). This is most clearly seen in the
different location and shape of the boundaries between the mapped attribute classes. The maps 296
produced by the meso-scale models all recognize higher concentrations in the swale that follows the 297
border between this study’s zones for spatial interpolation and extrapolation. However, each map shows 298
different shapes and maximum values. The OK, CK-LAI, and UK maps were the most similar. CK-limited 299
displayed the least amount of spatial variation, reflecting that model’s reliance on relative elevation at a 300
large analysis scale as a covariate. The common inheritance of patterns from the LAI covariate can be 301
observed in both the CK-LAI and the MLR-all maps. LAI was only available for the field containing the 302
CarboZALF plots and an adjacent field to the north. Therefore, in the area south of the CarboZALF 303
experimental plots, the MLR-all model produced no predictions. The kriging model predictions were also 304
not reliable in that area because it was an area of spatial extrapolation, especially for the CK-LAI and UK 305
models because their covariate of LAI was missing there. The two MLR-based maps were nearly identical in the swale area, but differed considerably in the remaining area.
Using the UK map as a baseline, its predicted values were subtracted from the predicted values 308
of the other models to produce visualizations of their differences (Figure 6). This analysis revealed that 309
the actual difference between the meso-scale kriging models was small (< 0.1 SOC%). The pattern of the 310
differences aligning with the orientation of the crop rows indicated the influence of the LAI covariate. 311
The LAI covariate map had a pattern of lower values between rows, which we attempted to remove by 312
low pass filtering. However, row patterning could only be minimized, not completely removed. The MLR 313
models showed their similarity with each other by contrast with the UK model. Their differences with the 314
UK map were greatest at the extremes of rule attribute ranges, which often correspond with the edges 315
of mapped rule boundaries. For example, the boundary of the rule applied to the swale is apparent by 316
the lower estimation of SOC% by the MLR models compared to the UK model. Conversely, at the center 317
of a depression in the swale, the MLR models predicted more SOC% than the UK model. 318
3.3.2. Macro-scale variation
At the macro-scale, the maps produced by the kriging models were even more alike (Figure 7). 320
The similarity of the CK and UK maps to the map produced by OK would suggest that the provided 321
covariates were not very influential in making the predictions for those maps. The MLR-all map was 322
again limited to only locations where LAI data was available. In most studies, the area outside the field of 323
interest would be masked for the kriging maps, but the behavior of the spatial models in areas of spatial 324
extrapolation is within the scope of this research. 325
In contrast to the kriging maps, the MLR models each produced unique maps of SOC%. Notably, 326
the MLR-all map shows a pattern that more closely matches the shape of the swale and wetlands 327
running through the center of the field from southwest to northeast. Technically the LAI is not suitable 328
for the wetlands because it was calculated on the basis of field crops and not for wetland vegetation. 329
Nonetheless, the MLR-all map follows an expected trend of SOC% and better resembles the spatial 330
patterns of field features expected to be influential than any of the kriging maps. 331
Although the MLR-limited map follows many of the same patterns as the MLR-all map, the MLR-limited model displays an apparent flaw within the wetlands. The remotely sensed LiDAR elevation was not able to detect the true elevation of the mineral surface under the vegetation and water of the wetlands. Also, no calibration samples were taken in flooded locations. The result was a misrepresentation of the ground surface elevation in the wetlands and thereby an underprediction of SOC% by the MLR-limited model, which strongly relied on local relative elevation (20 m). If the calibration
set had included points in these areas, it is likely that the Cubist generated model would have been 338
different. 339
Again using the map produced by UK as the baseline, the values predicted by macro-scale UK were subtracted from the other macro-scale models' maps to create visualizations of their differences (Figure 8). Although the kriging SOC% maps appear practically identical with the eight attribute classes used to present the maps, the difference calculation reveals patterns of difference related to the respective covariates. For example, the differences shown for the CK-LAI map reflect the influence that relative elevation (20 m) had on the UK map. That the difference maps for OK and CK-limited resemble each other demonstrates that the LAI covariate dominated the prediction in the UK map. However, these differences are relatively small (< 1 SOC%) in the context of the range of SOC% observed in this map area.
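The difference maps described here amount to a per-pixel subtraction of one prediction grid from another. A minimal sketch with numpy follows; the array names and SOC% values are hypothetical, chosen only to illustrate the calculation:

```python
import numpy as np

# Hypothetical 3x3 grids of predicted SOC% from two models (illustrative values)
soc_uk = np.array([[1.2, 1.4, 1.6],
                   [1.3, 1.8, 2.1],
                   [1.1, 1.5, 1.9]])
soc_ck = np.array([[1.1, 1.5, 1.7],
                   [1.4, 1.7, 2.0],
                   [1.0, 1.6, 2.0]])

# Difference map: positive cells are where CK predicts more SOC% than UK
diff = soc_ck - soc_uk

# Magnitude of the largest disagreement between the two models
max_abs_diff = float(np.abs(diff).max())
```

In practice, the same subtraction would be applied to co-registered rasters rather than toy arrays, and the result classified for display as in Figure 8.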
The differences between the macro-scale MLR maps highlight the issues in and near the wetlands. The MLR-limited map clearly underestimated SOC% within areas of wetland vegetation, yet its difference with the UK map was minimal in those areas. The MLR-all map predicted higher SOC% in the wetland area with a pattern that better matched field observations. The major difference between the kriging maps and the MLR maps was underscored by the spike in SOC% in the central part of the kriging maps. There, a single point that was influenced by wetland conditions elevated the kriging predictions in a circular area surrounding it. The MLR maps more realistically followed the shape of the terrain features. Thus, in the MLR difference maps, the contrast in the shape of the predicted high-SOC% areas can be seen.
3.4. Distribution and performance of error estimation
The maps of the respective models' estimated error highlighted the strategies and relative strengths of the spatial modelling approaches (Figures 9 and 10). At both the meso- and macro-scales, the maps of error estimation illustrated the kriging models' use of spatial autocorrelation, with estimated errors increasing radially from the calibration points. The addition of covariates to the kriging models lowered the estimated error in an area extending further out from those points. The method used to map the estimated errors of the MLR models did not provide error estimations that were as continuous. Instead, areas needed to be classified by the model rule applied and assigned a single error estimation value. However, the MLR error estimations were comparable to the error estimations found only in close proximity to calibration points in the kriging maps.
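The per-rule error mapping described above — classifying each cell by the rule applied and assigning that rule's single error value — can be sketched as follows. The rule ids and RMSE values are hypothetical; the point is only the broadcasting of one error value per rule onto the map:

```python
import numpy as np

# Hypothetical grid of rule ids: which model rule applies at each map cell
rule_id = np.array([[0, 0, 1],
                    [0, 1, 1],
                    [2, 2, 1]])

# One error estimate (e.g., the rule's calibration RMSE) per rule
rule_rmse = {0: 0.15, 1: 0.30, 2: 0.55}

# Broadcast the per-rule error onto the grid -> a stepped (discontinuous)
# error map, in contrast to kriging's smoothly varying error surface
error_map = np.vectorize(rule_rmse.get)(rule_id)
```

The stepped result makes explicit why these error maps are less continuous than the kriging variance maps.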
In addition to validating model predictions of SOC%, we also used the validation points to test the error estimations mapped by the different models. In theory, the observed error at the validation points should be within the estimated error range approximately 68% of the time (one standard error), assuming the errors were normally distributed. Although the validation points are only a sample set, this evaluation provides a practical test of the estimated errors.
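This coverage check — counting how often the observed error falls within the mapped one-standard-error range — is straightforward to compute. A sketch with hypothetical observed, predicted, and estimated-error values:

```python
import numpy as np

# Hypothetical validation data: observed and predicted SOC%, plus the
# error (one standard error) the model estimated at each point's location
observed  = np.array([2.1, 1.5, 3.2, 0.9, 2.8])
predicted = np.array([1.9, 1.7, 2.6, 1.0, 2.9])
est_error = np.array([0.3, 0.1, 0.4, 0.2, 0.3])

# Percentage of points whose absolute residual lies within the estimated
# range; for normally distributed errors this should be near 68%
within = np.abs(observed - predicted) <= est_error
coverage = within.mean() * 100.0
```

Values well below 68% indicate underestimated error ranges; values near 100% indicate overestimation, mirroring the patterns in Tables 9 and 10.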
Evaluation of the residuals for both the calibration and validation points showed that they were sometimes skewed, depending on the model and sample set. The SOC% values of the meso-scale points were positively skewed, as indicated by the positive skew of the mean model's residuals (Table 7). However, the residuals for the kriging and MLR models at the meso-scale calibration points did not have a strong skew. At the meso-scale, the residuals at the validation points generally had a low to moderate level of skewness. The CK-limited and UK models were an exception when tested on all of the meso-scale validation points together, which produced distributions of residuals with skewness coefficients a little below -1. The residuals of the meso-scale kriging and MLR models' predictions of the micro-scale points were all negatively skewed. The meso-scale MLR models' residuals were also negatively skewed for the macro-scale validation points, while the kriging models' residuals became positively skewed. Thus, the direction of observed skewness in the residuals depended upon the validation points used, which varied by spatial scale.
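The skewness coefficients discussed here are standard moment-based (Fisher-Pearson) coefficients, which can be computed directly from a set of residuals. A minimal sketch; the residual values are hypothetical:

```python
import numpy as np

def skewness(x):
    """Fisher-Pearson moment coefficient of skewness (biased form)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

# A symmetric residual set has zero skew; a long positive tail does not
assert abs(skewness([-1.0, 0.0, 1.0])) < 1e-12
positively_skewed = skewness([0.1, 0.2, 0.1, 0.15, 2.0])
```

Coefficients below about -1 or above +1, as reported for some validation sets, indicate strong asymmetry in the residual distribution.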
The skewness coefficients of the macro-scale models' residuals were very different (Table 8). With the exception of the MLR-limited model, the residuals at the calibration points were all strongly, positively skewed. However, this did not hold true for the residuals of these models at the validation points. Intriguingly, residuals of the macro-scale models were not skewed for the points considered as meso-scale, internal validation, but when considered together with the meso-scale, external validation points, the distribution of residuals matched the skewness of the micro-scale validation residuals. The exception to this trend was the MLR-limited model: while its residuals were not skewed at the calibration points, they were strongly skewed at the validation points. The degree to which the residuals are normally distributed, or not, could have an important impact on the models' ability to estimate error and will be discussed later.
The mean estimated errors at validation points for the meso-scale kriging models increased from the micro to the macro scale, which reflects the increasingly external position of the validation points (Table 9). The mean estimated errors at validation points for the meso-scale MLR models were relatively stable at all scales and were comparable in magnitude to the kriging models at the micro and meso validation scales. The percentage of validation points with observed errors within the estimated error range for their location followed the size of the mean estimated error. Naturally, a greater range of uncertainty is more likely to include the actual or measured value. The use of covariates resulted in a lower estimation of error for the kriging models at the internal validation points. However, these estimations appeared to be a little too low for the micro- and meso-scale validation points, with between 34% and 57% of errors being observed within those estimated ranges. The performance of error estimations by the meso-scale kriging models for the macro-scale points emphasized the inappropriateness of using those models in spatially external areas. Even though error estimations for those points were greatly increased, the observed error was outside that range more than 75% of the time. The accuracy of the MLR models' error estimations was generally better than that of the kriging models, but the performance of the MLR models' error estimations also declined from the micro- to macro-scale validations.
Estimations of error for all the models increased when calibrated at the macro-scale (Table 10). However, the MLR models' estimated errors were much less than the errors estimated by the kriging models. Nonetheless, the uncertainties estimated by all of the models appeared to be overestimations of the actual error. Observed errors were almost always within the estimated error ranges (86-100%). The exception was the MLR-limited model's estimation of error at the meso-scale validation points, but even there the observed error was within the estimated range more than the expected 68% of the time.
4. Discussion
4.1. Comparison with previous studies
In most cases, previous studies have shown that greater use of covariates improves model performance, but the results have not been consistent due to varying target variables, sampling designs, and selected covariates (Knotters et al., 1995; Triantafilis et al., 2001; Mueller and Pierce, 2003; Zhu and Lin, 2010). The differences in results are likely related to the approaches' sensitivity to modelling conditions. For example, kriging performance can be significantly affected by the variability of the data (Leenaers et al., 1990), choices made in modelling the variogram (Kravchenko and Bullock, 1999; Oliver and Webster, 2014), and sampling design (Voltz and Webster, 1990; Englund et al., 1992; Laslett, 1994; Wollenhaupt et al., 1994; Gotway et al., 1996). In the case of spatial association approaches, the performance of the models can be heavily dependent upon the covariates used (Levi and Rasmussen, 2014; Miller et al., 2015b). In general, studies comparing spatial prediction methods have found only small differences in their performance, but each with respective advantages (Zhu and Lin, 2010; Adhikari et al., 2013). In this study, we have attempted to explore further the reasons behind these respective advantages and to better elucidate when the different approaches are most appropriate, especially with respect to sampling scale and transferability.
Odeh et al. (1994) compared spatial modelling methods in a manner similar to ours in south-central Australia, comparing OK, CK, UK, and MLR. That study included an additional comparison with regression kriging, but for our study we wanted to focus on the results of the spatial modelling approaches directly, without additional steps to correct predictions. In the Odeh et al. (1994) study, samples were taken on a 25 m spaced grid covering approximately 24 ha. This contrasts with the stratified distribution of the calibration points in our study. The grid sampling design should be more ideal for kriging approaches because it minimizes the distance between points across the mapping area. Model results were tested on random points from about 30% of the sampling grid that were excluded from model calibrations. For their predictions of solum depth, depth to bedrock, topsoil gravel, and subsoil clay, the MLR models consistently produced lower root mean square errors than the kriging models. Although the sample distribution in that study was ideal for kriging, the sample density and distribution were also sufficient to ensure a likely coverage of the feature space. In our study, the MLR models were not always the better performing prediction method, but the lower sampling density increased the odds of missing something from the feature space.
Bostan et al. (2012) tested spatial predictions of average annual precipitation by comparing OK, UK, and MLR, as well as regression kriging and geographically weighted regression. Certainly, the spatial distribution of average annual precipitation is a different phenomenon from SOC% for a single round of sampling, but their results confirmed several of the same geographic principles observed in the present study. Specifically, kriging methods are only suitable for spatial interpolation of the target variable, and spatial extrapolation is the most difficult area for spatial modelling. However, similar to our study, the regression methods demonstrated the strongest performance in spatial extrapolation. Bostan et al. (2012) did not test performance dynamics with respect to scale.
Like our study, Mueller and Pierce (2003) sought to examine the impact of sampling scale on the prediction accuracies of OK, CK, kriging with external drift (~UK), and MLR. The models were tested for their ability to predict total carbon concentration (g kg-1) for a field in central Michigan, USA. Their experimental design included samples taken on 30.5, 61, and 100 m grids for model calibrations, which were then independently validated on a set of 24 points. Their results showed that prediction error generally increased with sampling scale and the corresponding decrease in sampling density. They also determined that as the spacing of the calibration points increased, the use of covariates increased in importance for reducing the error of predictions. Although the sample size for the 100 m grid was determined to be too small for kriging approaches, MLR continued to perform well. Our findings generally support these conclusions, but suggest that the sampling density most important to spatial association approaches (i.e., MLR) is density across the feature space, not necessarily across locational space.
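One crude way to operationalize "density across the feature space" is to flag prediction locations whose covariate values fall outside the range spanned by the calibration samples. This axis-aligned range check is only an illustration under assumed data, not the study's method; the covariate values below are hypothetical:

```python
import numpy as np

# Hypothetical covariate values (e.g., relative elevation, LAI) at
# calibration sample points: rows are points, columns are covariates
calibration = np.array([[0.2, 1.1],
                        [0.5, 2.3],
                        [0.9, 1.8],
                        [0.4, 3.0]])
prediction = np.array([[0.3, 2.0],   # inside the calibrated feature space
                       [1.5, 2.5]])  # first covariate outside its range

lo, hi = calibration.min(axis=0), calibration.max(axis=0)
# True where every covariate lies within the calibrated range -- a simple
# proxy for whether a location is a feature-space extrapolation
inside = np.all((prediction >= lo) & (prediction <= hi), axis=1)
```

Locations flagged False are feature-space extrapolations even if they are spatially internal to the calibration points, which is the distinction drawn above.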
4.2. Issues with spatial extrapolation (transferability)
The primary difference between the focus of this study and the methods of previous studies comparing kriging and MLR models was the arrangement of sampling points. Specifically, the juxtaposition of the meso-scale calibration points and their respective validation points allowed for a test of the predictive power of the models with increasing distance from the area spatially internal to the calibration points. It was already well known that the kriging methods were ill-suited for spatial extrapolation, but given their popularity in digital soil mapping, they provided a basis for contrast. The MLR models, which do not rely on spatial autocorrelation, were still affected by it, as indicated by the decrease in predictive ability in areas spatially external to their calibration. However, their performance did not necessarily decline with increasing distance from the calibration area. In fact, both of the meso-scale MLR models performed better on the more distant macro-scale validation points than on the meso-scale external validation points.
The reason for this lack of a distance-related pattern in the MLR models' performance is that they relied on spatial association, which means that the models' performances were more directly tied to their ability to represent the feature space. The lowest performance of these models, which occurred at the external validation points at the meso-scale, indicated that something about the conditions of that validation area placed it outside the feature space used in calibration.
However, the issue of poor prediction performance in the unique conditions of the area designated as the meso-scale external validation was not limited to the MLR models. Clearly, the meso-scale external validation area was inappropriate to model using the meso-scale calibration points with kriging approaches, due to the models' theoretical underpinnings. In contrast, the same was not true for macro-scale calibration of the kriging approaches applied to predict the meso-scale external validation points. That area was spatially internal to the macro-scale calibration points and thus within the theoretical bounds of the kriging approach. Yet the macro-scale kriging models still had very poor prediction performance in that area, despite showing reasonable prediction performance on the adjacent and also spatially internal meso-scale internal calibration points.
This demonstration of contrasting model performance in two areas that, with pre-validation information, would have been assumed to be similar is an illustration of the problem of induction. Although not often discussed in modern science and usually relegated to discussions of philosophy, it is instructive for evaluating methods that predict or infer. The problem of induction points out the fallibility of assuming that generalizations (i.e., models) produced by a limited number of observations will be equally informative about unobserved instances (Hume, 1739/2001; Popper, 1959/2005). Of course, the only way to combat this dilemma is to maximize observations. However, we cannot remove the possibility that a prediction will be ill-equipped to accurately estimate the target variable, or the potential error, for instances that result from events about which our calibration points could not inform.
4.3. Assumption of normal distributions in the model error
Although the SOC% values in our calibration data sets were considered to be sufficiently normal in distribution, the degree of skewness of the models' residuals varied considerably. Most models rely on the assumption that the model residuals are also normally distributed for estimating the uncertainty of the predictions. If the residuals are negatively skewed, but assumed normal, the mean of the errors could be overestimated, and vice versa. One could argue that the SOC% values used in this study were positively skewed, particularly the macro-scale calibration set. However, this did not necessarily translate to a corresponding skewness in the residuals for the models at the calibration or validation points. For example, the residuals of the meso-scale kriging models were moderately, positively skewed at the calibration points, but went from a negative skew to a positive skew when tested on the micro- to macro-scales, respectively. While this might suggest that the error estimations from the meso-scale kriging models would be overestimated at the micro-scale and underestimated at the macro-scale, the observed error of these models at those points indicated that the error was underestimated at all scales. However, the shift to strongly positive skewness coefficients for the residuals at the macro-scale validation points coincides with the decrease in error estimation performance. This again reinforces the known problem of using kriging to model areas outside the extent of the calibration points.
The residuals of the meso-scale MLR models had similar levels of skewness as the kriging models. One exception was the MLR-all model's residuals at the calibration points, which had a distribution that was practically normal. Another difference between the MLR models and the kriging models at the meso-scale was that the residuals were negatively skewed at both the micro- and macro-scale validation points. This would suggest overestimations at the micro- and macro-scales. However, this still did not fit the observed pattern of greater underestimations of error as validation scales increased. In addition, despite having similar levels of skewness, the MLR models' error estimations were less underestimated at the micro-scale validation points than those of the kriging models using covariates.
At the macro-scale, almost all of the model residuals were strongly, positively skewed at the calibration points. However, the residuals for those same models were moderately, negatively skewed at the validation points. In this case, the positive skewness of the calibration point residuals may have been an indication of the error estimation performance, because the errors were overestimated at all of the validation points.
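The effect underlying this section — skewed residuals degrading the nominal 68% coverage of a symmetric one-standard-deviation interval — can be demonstrated with a toy simulation (simulated residuals, not the study's data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Strongly positively skewed residuals: a lognormal shifted to mean ~0
resid = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)
resid -= resid.mean()

# Empirical coverage of a symmetric +/- one-standard-deviation interval;
# for normally distributed residuals this would be ~0.68, but skewness
# pushes it away from that nominal value
sigma = resid.std()
coverage = np.mean(np.abs(resid) <= sigma)
```

Because the long positive tail inflates the standard deviation, the symmetric interval captures far more than 68% of the mass on one side, so nominal and empirical coverage diverge, which is the practical consequence of assuming normality for skewed residuals.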
5. Conclusions
The multi-scale comparisons made in this study demonstrated the advantages of spatial association approaches (i.e., MLR) for soil mapping at the landscape scale. These advantages are a lower density of samples needed and the potential for transferability, which can be tested. Kriging, of course, performed best with the highest quantity/density of calibration points available and a more normal distribution of values. However, the MLR models were able to produce competitive validation results in those ideal sampling conditions while remaining robust under conditions where the kriging models severely declined in performance. The key to building a robust MLR model was identifying adequate covariates and calibrating the model across the full feature space of the map area.
Spatial interpolation methods, which rely on spatial autocorrelation, perform best when the distance between sample points is minimized and are only appropriate for making predictions between points. As the distance between a predicted location and the nearest sampled location increases, the estimated error also increases. This is a problem for landscape-scale mapping because it may be impractical to sample at the density required to achieve the reduction in prediction errors needed for the respective maps' purpose. Adding spatially exhaustive covariates to kriging models can reduce errors in spatial predictions. However, our empirical observations of model error showed that the reduction in predicted error between ordinary kriging and kriging methods using covariates may not be fully warranted, as the reliability of those estimated error ranges decreased.
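The growth of estimated error with distance from the nearest sample follows directly from the kriging variance. A minimal 1D ordinary-kriging sketch illustrates this; the exponential covariance model and its parameters are arbitrary choices for illustration, not the variogram fitted in this study:

```python
import numpy as np

def ok_variance(x_samples, x0, sill=1.0, rng_par=50.0):
    """Ordinary-kriging variance at x0 for an exponential covariance model."""
    c = lambda h: sill * np.exp(-np.abs(h) / rng_par)
    n = len(x_samples)
    # Kriging system: covariances among samples plus unbiasedness constraint
    A = np.empty((n + 1, n + 1))
    A[:n, :n] = c(np.subtract.outer(x_samples, x_samples))
    A[n, :n] = A[:n, n] = 1.0
    A[n, n] = 0.0
    b = np.append(c(np.asarray(x_samples) - x0), 1.0)
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]
    # Kriging variance: sill minus weighted covariances minus Lagrange term
    return sill - w @ b[:n] - mu

samples = [0.0, 20.0, 40.0]         # sample locations along a transect
near = ok_variance(samples, 10.0)   # prediction between samples
far = ok_variance(samples, 200.0)   # prediction far beyond the data
```

The variance at the distant location substantially exceeds that between samples, which is why sparse landscape-scale sampling inflates kriging's estimated errors away from the data.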
The MLR method tested in this study, which depended on spatial association instead of spatial autocorrelation, achieved accuracies of prediction and error estimation comparable to the kriging methods. However, it is noteworthy that when the appropriate covariates were used, the MLR models were able to represent features that were poorly represented in the maps of the other models. For example, the area designated as a spatial extrapolation zone for the meso-scale tests was spatially internal for the macro-scale calibration points. Despite being compatible with the modelling strategy, without calibration points taken on that unique hillslope, the macro-scale kriging models were still not able to predict SOC% well for that area. The only models that could provide useful predictions for that area were the MLR-all models calibrated at either scale, both of which utilized LAI as a covariate. Although two of the kriging approaches were also able to use LAI as a covariate, it did not improve those modelling approaches' ability to predict variations in that area.
Despite the strength of statistical approaches for spatial modelling, soil mappers must not forget the basic problem of induction. Although it is known that spatial association approaches, such as MLR, are susceptible to issues of capturing the full feature space, results from this study demonstrated that this can equally be a problem for spatial interpolation approaches, such as kriging.
Acknowledgements
Data used in this research was collected as part of the CarboZALF project. We thank Ingrid Onasch for her field work on the micro-scale validation points, Emilien Aldana-Jague for soil sampling the points used for the meso-scale validation, and Norbert Wypler for supporting the EM38 mapping and soil sampling the points for the meso- and macro-scale calibrations. We also thank Ute Moritz for her support in soil sampling and overall data management. Detlef Deumlich gave valuable comments for evaluating the database and provided the TPI maps.
References
Adhikari, K., Kheir, R.B., Greve, M.B., Greve, M.H., 2013. Comparing kriging and regression approaches for mapping soil clay content in a diverse Danish landscape. Soil Science 178(9), 505-517. doi:10.1097/SS.0000000000000013
Adhikari, K., Hartemink, A.E., Minasny, B., Bou Kheir, R., Greve, M.B., Greve, M.H., 2014. Digital mapping of soil organic carbon contents and stocks in Denmark. PLoS ONE 9:e105519. doi:10.1371/journal.pone.0105519
Behrens, T., Förster, H., Scholten, T., Steinrücken, U., Spies, E.D., Goldschmitt, M., 2005. Digital soil mapping using artificial neural networks. J. Plant Nutr. Soil Sci. 168(1), 21-33. doi:10.1002/jpln.200421414
Bostan, P.A., Heuvelink, G.B.M., Akyurek, S.Z., 2012. Comparison of regression and kriging techniques for mapping the average annual precipitation of Turkey. International Journal of Applied Earth Observation and Geoinformation 19, 115-126. doi:10.1016/j.jag.2012.04.010
Bui, E.N., Henderson, B.L., Viergever, K., 2006. Knowledge discovery from models of soil properties developed through data mining. Ecological Modelling 191, 431-446. doi:10.1016/j.ecolmodel.2005.05.021
Burgess, T.M., Webster, R., 1980. Optimal interpolation and isarithmic mapping of soil properties. I: the semi-variogram and punctual kriging. J. Soil Sci. 31(2), 315-331. doi:10.1111/j.1365-2389.1980.tb02084.x
Corwin, D.L., Lesch, S.M., Segal, E., Skaggs, T.H., Bradford, S.A., 2010. Comparison of sampling strategies for characterizing spatial variability with apparent soil electrical conductivity directed soil sampling. J. Environ. Eng. Geophys. 15, 147-162. doi:10.2113/JEEG15.3.147
Deumlich, D., Frielinghaus, M., this issue. Imprints of historical erosion patterns on recent erosion processes depicted by C-transport and soil fertility parameters in NE Germany. Soil and Tillage Research.
Deumlich, D., Schmidt, R., Sommer, M., 2010. A multiscale soil-landform relationship in the glacial-drift area based on digital terrain analysis and soil attributes. Journal of Plant Nutrition and Soil Science 173(6), 843-851. doi:10.1002/jpln.200900094
Englund, E., Weber, D., Leviant, N., 1992. The effects of sampling design parameters on block selection. Mathematical Geology 24(3), 329-343. doi:10.1007/BF00893753
Gerke, H.H., Rieckh, H., Sommer, M., this issue. Feedbacks between crop, water, and dissolved carbon in a hummocky landscape with erosion-affected pedogenesis. Soil and Tillage Research.
Gessler, P.E., Moore, I.D., McKenzie, N.J., Ryan, P.J., 1995. Soil-landscape modelling and spatial prediction of soil attributes. Int. J. Geogr. Inf. Syst. 9, 421-432. doi:10.1080/02693799508902047
Goovaerts, P., 1999. Geostatistics in soil science: state-of-the-art and perspectives. Geoderma 89, 1-45. doi:10.1016/S0016-7061(98)00078-0
Goovaerts, P., 2001. Geostatistical modelling of uncertainty in soil science. Geoderma 103, 3-26. doi:10.1016/S0016-7061(01)00067-2
Gotway, C.A., Ferguson, R.B., Hergert, G.W., Peterson, T.A., 1996. Comparison of kriging and inverse-distance methods for mapping soil parameters. Soil Sci. Soc. Am. J. 60, 1237-1247. doi:10.2136/sssaj1996.03615995006000040040x
Häring, T., Dietz, E., Osenstetter, S., Koschitzki, T., Schröder, B., 2012. Spatial disaggregation of complex soil map units: A decision tree based approach in Bavarian forest soils. Geoderma 185-186, 37-47. doi:10.1016/j.geoderma.2012.04.001
Häring, T., Reger, B., Ewald, J., Hothorn, T., Schröder, B., 2014. Regionalising indicator values for soil reaction in the Bavarian Alps - how reliable are averaged indicator values for prediction? Folia Geobot. 49, 385-405. doi:10.1007/s12224-013-9157-1
Hengl, T., Heuvelink, G.B.M., Rossiter, D.G., 2007. About regression-kriging: From equations to case studies. Comput. Geosci. 33, 1301-1315. doi:10.1016/j.cageo.2007.05.001
Hengl, T., Rossiter, D., Stein, A., 2003. Soil sampling strategies for spatial prediction by correlation with auxiliary maps. Aust. J. Soil Res. 41, 1403-1422. doi:10.1071/SR03005
Holmes, G., Hall, M., Frank, E., 1999. Generating rule sets from model trees. Advanced Topics in Artificial Intelligence, Lecture Notes in Computer Science 1747, 1-12. doi:10.1007/3-540-46695-9_1
Hume, D., 1739/2001. A Treatise of Human Nature. Oxford University Press, New York. 622 p.
IUSS Working Group WRB, 2014. World Reference Base for Soil Resources 2014. International soil classification system for naming soils and creating legends for soil maps. World Soil Resources Reports No. 106. FAO, Rome.
Juang, K.-W., Lee, D.-Y., 1998. A comparison of three kriging methods using auxiliary variables in heavy-metal contaminated soils. J. Environ. Qual. 27, 355-363. doi:10.2134/jeq1998.00472425002700020016x
Kirkels, F.M.S.A., Cammeraat, L.H., Kuhn, N.J., 2014. The fate of soil organic carbon upon erosion, transport and deposition in agricultural landscapes - A review of different concepts. Geomorphology 226, 94-105. doi:10.1016/j.geomorph.2014.07.023
Knotters, M., Brus, D.J., Oude Voshaar, J.H., 1995. A comparison of kriging, co-kriging and kriging combined with regression for spatial interpolation of horizon depth with censored observations. Geoderma 67, 227-246. doi:10.1016/0016-7061(95)00011-C
Koszinski, S., Gerke, H.H., Hierold, W., Sommer, M., 2013. Geophysical-based modeling of a kettle hole catchment of the morainic soil landscape. Vadose Zone Journal 12(4). doi:10.2136/vzj2013.02.0044
Kravchenko, A., Bullock, D.G., 1999. A comparative study of interpolation methods for mapping soil properties. Agronomy Journal 91, 393-400. doi:10.2134/agronj1999.00021962009100030007x
Lacoste, M., Minasny, B., McBratney, A., Michot, D., Viaud, V., Walter, C., 2014. High resolution 3D mapping of soil organic carbon in a heterogeneous agricultural landscape. Geoderma 213, 296-311. doi:10.1016/j.geoderma.2013.07.002
Laslett, G.M., 1994. Kriging and splines: an empirical comparison of their predictive performance in some applications. Journal of the American Statistical Association 83(426), 391-409. doi:10.1080/01621459.1994.10476759
Leenaers, H., Okx, J.P., Burrough, P.A., 1990. Comparison of spatial prediction methods for mapping floodplain soil pollution. Catena 17, 535-550. doi:10.1016/0341-8162(90)90028-C
Lemercier, B., Lacoste, M., Loum, M., Walter, C., 2012. Extrapolation at regional scale of local soil knowledge using boosted classification trees: a two-step approach. Geoderma 171-172, 75-84. doi:10.1016/j.geoderma.2011.03.010
Levi, M.R., Rasmussen, C., 2014. Covariate selection with iterative principal component analysis for predicting physical soil properties. Geoderma 219-220, 46-57. doi:10.1016/j.geoderma.2013.12.013
Li, H.Y., Webster, R., Shi, Z., 2015. Mapping soil salinity in the Yangtze delta: REML and universal kriging (E-BLUP) revisited. Geoderma 237-238, 71-77. doi:10.1016/j.geoderma.2014.08.008
Malone, B.P., McBratney, A.B., Minasny, B., 2011. Empirical estimates of uncertainty for mapping continuous depth functions of soil attributes. Geoderma 160, 614-626. doi:10.1016/j.geoderma.2010.11.013
McBratney, A.B., Mendonça Santos, M.L., Minasny, B., 2003. On digital soil mapping. Geoderma 117, 3-52. doi:10.1016/S0016-7061(03)00223-4
McBratney, A.B., Odeh, I.O.A., Bishop, T.F.A., Dunbar, M.S., Shatar, T.M., 2000. An overview of pedometric techniques for use in soil survey. Geoderma 97, 293-327. doi:10.1016/S0016-7061(00)00043-4
McBratney, A.B., Webster, R., 1983. Optimal interpolation and isarithmic mapping of soil properties v. co-regionalization and multiple sampling strategy. J. Soil Sci. 34, 137-162. doi:10.1111/j.1365-2389.1983.tb00820.x
McKenzie, N.J., Austin, M.P., 1993. A quantitative Australian approach to medium and small scale surveys based on soil stratigraphy and environmental correlation. Geoderma 57(4), 329-355. doi:10.1016/0016-7061(93)90049-Q
Miller, B.A., Koszinski, S., Wehrhan, M., Sommer, M., 2015a. Comparison of spatial association approaches for landscape mapping of soil organic carbon stocks. SOIL 1(1), 217-233. doi:10.5194/soil-1-217-2015
Miller, B.A., Koszinski, S., Wehrhan, M., Sommer, M., 2015b. Impact of multi-scale predictor selection for modeling soil properties. Geoderma 239-240, 97-106. doi:10.1016/j.geoderma.2014.09.018
Minasny, B., McBratney, A.B., 2008. Regression rules as a tool for predicting soil properties from infrared reflectance spectroscopy. Chemometrics and Intelligent Laboratory Systems 94(1), 72-79. doi:10.1016/j.chemolab.2008.06.003
Moore, I.D., Gessler, P.E., Nielsen, G.A., Peterson, G.A., 1993. Soil attribute prediction using terrain analysis. Soil Sci. Soc. Am. J. 57, 443-452. doi:10.2136/sssaj1993.03615995005700020026x
Mueller, T.G., Pierce, F.J., 2003. Soil carbon maps: enhancing spatial estimates with simple terrain attributes at multiple scales. Soil Sci. Soc. Am. J. 67, 258-267. doi:10.2136/sssaj2003.2580
Odeh, I.O.A., McBratney, A.B., Chittleborough, D.J., 1994. Spatial prediction of soil properties from landform attributes derived from a digital elevation model. Geoderma 63, 197-214. doi:10.1016/0016-7061(94)90063-9
Oliver, M.A., Webster, R., 2014. A tutorial guide to geostatistics: Computing and modelling variograms and kriging. Catena 113, 56-59. doi:10.1016/j.catena.2013.09.006
Popper, K.R., 1959/2005. The Logic of Scientific Discovery. Routledge, New York. 513 p.
Quinlan, J.R., 1993. Combining instance-based and model-based learning, in: Proceedings of the Tenth International Conference on Machine Learning, Kaufmann, M. (Ed.), 236-243.
Quinlan, J.R., 1994. C4.5: Programs for machine learning. Machine Learning 16, 235-240.
Schloeder, C.A., Zimmermann, N.E., Jacobs, M.J., 2001. Comparison of methods for interpolating soil properties using limited data. Soil Sci. Soc. Am. J. 65, 470-479. doi:10.2136/sssaj2001.652470x
Schmidt, K., Behrens, T., Daumann, J., Ramirez-Lopez, L., Werban, U., Dietrich, P., Scholten, T., 2014. A comparison of calibration sampling schemes at the field scale. Geoderma 232-234, 243-256. doi:10.1016/j.geoderma.2014.05.013
Shrestha, D.L., Solomatine, D.P., 2006. Machine learning approaches for estimation of prediction interval for the model output. Neural Networks 19(2), 225-235. doi:10.1016/j.neunet.2006.01.012
Sommer, M., Gerke, H.H., Deumlich, D., 2008. Modelling soil landscape genesis: A "time split" approach
for hummocky agricultural landscapes. Geoderma 145, 3-4, 480-493. 698
doi:10.1016/j.geoderma.2008.01.012 699
Tamari, S., Wösten, J.H.M., Ruiz- Suárez, J.C., 1996. Testing an artificial neural network for predicting soil 700
hydraulic conductivity. Soil Sci. Soc. Am. J. 60(6), 1732-1741. 701
doi:10.2136/sssaj1996.03615995006000060018x 702
30
Theobald, D.M., D.L. Stevens, J.D.W., Urquhart, N.S., Olsen, A.R., Norman, J.B., 2007. Using GIS to 703
generate spatially-balanced random survey designs for natural resource applications. Environ. 704
Manage. 40, 134-146. doi:10.1007/s00267-005-0199-x 705
Triantafilis, J., Odeh, I.O.A., McBratney, A.B., 2001. Five geostatistical models to predict soil salinity from 706
electromagnetic induction data across irrigated cotton. Soil Sci. Soc. Am. J. 65, 869–878. 707
doi:10.2136/sssaj2001.653869x 708
Vasques, G.M., Grunwald, S., Comerford, N.B., Sickman, J.O., 2010. Regional modelling of soil carbon at 709
multiple depths within a subtropical watershed. Geoderma 156, 326–336. 710
doi:10.1016/j.geoderma.2010.03.002 711
Van Oost, K., Quine, T. A., Govers, G., De Gryze, S., Six, J., Harden, J. W., Ritchie, J. C., McCarty, G. W., 712
Heckrath, G., Kosmas, C., Giraldez, J. V., da Silva, J. R. Marques, Merckx, R., 2007. The impact of 713
agricultural soil erosion on the global carbon cycle. Science 318, 626-629. 714
doi:10.1126/science.1145724 715
Voltz, M., Webster, R., 1990. A comparison of kriging, cubic splines and classification for predicting soil 716
properties from sample information. Journal of Soil Science 41, 473-490. doi:10.1111/j.1365-717
2389.1990.tb00080.x 718
Webster, R., Oliver, M.A., 1990. Statistical Methods in Soil and Land Resource Survey: Spatial Information 719
Systems. Oxford University Press, Oxford, UK. 316 p. 720
Wollenhaupt, N.C., Wolkowski, R.P., Clayton, M.K., 1994. Mapping soil test phosphorus and potassium 721
for variable-rate fertilizer application. Journal of Production Agriculture 7, 441-448. 722
doi:10.2134/jpa1994.0441 723
Zhu, Q., Lin, H.S., 2010. Comparing Ordinary Kriging and Regression Kriging for Soil Properties in 724
Contrasting Landscapes. Pedosphere 20, 594–606. doi:10.1016/S1002-0160(10)60049-5 725
Tables
Table 1. Covariates considered for selection by Cubist. The resolution of the original digital elevation model was maintained for all of the scale-dependent terrain derivatives.
Covariate Software Analysis Scale
Elevation (2011 LiDAR, bare-earth) n/a 1 m
Slope gradient GRASS 3 - 490 m
Profile curvature GRASS 3 - 490 m
Plan curvature GRASS 3 - 490 m
Aspect (8 classes) ArcGIS (raster calculator) 3 - 490 m
Aspect - west (rotated for N, E, and S) GRASS 3 - 490 m
Northness transformed from aspect 3 - 490 m
Eastness transformed from aspect 3 - 490 m
Relative elevation - rect. neighborhood ArcGIS toolbox 10 - 10000 m
Relative elevation - circ. neighborhood ArcGIS toolbox 10 - 10000 m
Topographic position index (TPI) ArcGIS toolbox 10 - 10000 m
TPI - slope position ArcGIS toolbox multiple
TPI - landform classification ArcGIS toolbox multiple
Hillslope position ArcGIS toolbox multiple
Catchment area SAGA n/a
Catchment slope SAGA n/a
Channel network base level SAGA n/a
Convergence index SAGA n/a
Flow path length SAGA n/a
Length-slope factor SAGA n/a
Modified catchment area SAGA n/a
SAGA wetness index SAGA n/a
Stream power index SAGA n/a
Vertical distance to channel SAGA n/a
Wetness index SAGA n/a
Covariate Resolution Date
ECa (vertical mode) 1 m (for the meso-scale) 04 Mar 2009
ECa (vertical mode) 5 m (for the macro-scale) 04 Apr 2007
ECa (horizontal mode) 1 m (for the meso-scale) 04 Mar 2009
ECa (horizontal mode) 5 m (for the macro-scale) 04 Apr 2007
LAI (Quickbird) 5 m 26 Jun 2005
Table 2. Relative use (%) of covariates in models derived by Cubist for predicting SOC%.
Meso-scale, all covariates (2 rules):
Relative elev. - rect. (5200 m): rules 100%, MLR 14%
Eastness (3 m): MLR 100%
Eastness (9 m): MLR 100%
LAI: MLR 100%
Meso-scale, without LAI or ECa (limited; 2 rules):
Relative elev. - rect. (5200 m): rules 100%, MLR 100%
Macro-scale, all covariates (1 rule):
LAI: rules 100%, MLR 100%
Macro-scale, without LAI or ECa (limited; 1 rule):
Relative elev. - rect. (20 m): rules 100%, MLR 100%
Table 3. Coefficient of determination statistics (R2) for models built from the meso-scale calibration points. Italics highlight the strongest performing model/s at the respective validation scales.
Validation Points
Models Micro Meso-All Meso-Internal Meso-External Macro
Mean model 0.00 0.00 0.00 0.00 0.00
OK 0.56 0.61 0.71 0.05 0.00
CK-LAI 0.57 0.60 0.71 0.03 0.03
CK-limited 0.54 0.54 0.70 0.06 0.02
UK 0.55 0.55 0.71 0.06 0.05
MLR-all 0.59 0.55 0.61 0.28 0.34
MLR-limited 0.58 0.45 0.48 0.04 0.32
Table 4. Mean absolute error statistics (MAE) for models built from the meso-scale calibration points. Italics highlight the strongest performing model/s at the respective validation scales.
Validation Points
Models Micro Meso-All Meso-Internal Meso-External Macro
Mean model 0.143 0.124 0.114 0.172 0.460
OK 0.111 0.085 0.071 0.155 0.505
CK-LAI 0.110 0.088 0.069 0.183 0.477
CK-limited 0.123 0.097 0.071 0.229 0.501
UK 0.120 0.096 0.069 0.228 0.486
MLR-all 0.109 0.085 0.078 0.119 0.450
MLR-limited 0.102 0.097 0.093 0.115 0.476
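The validation statistics reported in Tables 3–6 can be illustrated with a short sketch (not code from the study; the SOC% values below are invented). R2 is taken here as one minus the ratio of residual to total sum of squares, a common definition that makes the mean model score exactly 0.00, consistent with the "Mean model" rows of Tables 3 and 5; the study does not state its exact formula, so this is an assumption:

```python
import numpy as np

def r_squared(observed, predicted):
    # R2 relative to the mean model: 1 - SS_res / SS_tot.
    # Predicting the mean of the observations everywhere yields 0.00.
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def mean_absolute_error(observed, predicted):
    # MAE, as reported in Tables 4 and 6.
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(observed - predicted)))

# Invented SOC% values at hypothetical validation points
obs = [1.2, 0.9, 1.5, 1.1]
pred = [1.1, 1.0, 1.4, 1.2]
print(round(r_squared(obs, pred), 2))            # 0.79
print(round(mean_absolute_error(obs, pred), 2))  # 0.1
```

Reporting both statistics matters because they disagree here: MLR-all has the weaker R2 at some meso scales but the lowest MAE at the external and macro validation points.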
Table 5. Coefficient of determination statistics (R2) for models built from the macro-scale calibration points. Italics highlight the strongest performing model/s at the respective validation scales.
Validation Points
Model Micro Meso-All Meso-Internal Meso-External
Mean model 0.00 0.00 0.00 0.00
OK 0.41 0.39 0.42 0.07
CK-LAI 0.41 0.39 0.43 0.06
CK-limited 0.41 0.39 0.42 0.07
UK 0.41 0.39 0.43 0.06
MLR-all 0.57 0.56 0.56 0.56
MLR-limited 0.28 0.01 0.06 0.03
Table 6. Mean absolute error statistics (MAE) for models built from the macro-scale calibration points. Italics highlight the strongest performing model/s at the respective validation scales.
Validation Points
Model Micro Meso-All Meso-Internal Meso-External
Mean model 0.231 0.307 0.334 0.172
OK 0.130 0.150 0.118 0.312
CK-LAI 0.129 0.148 0.116 0.312
CK-limited 0.130 0.150 0.118 0.312
UK 0.129 0.148 0.116 0.312
MLR-all 0.161 0.138 0.145 0.102
MLR-limited 0.176 0.310 0.281 0.461
Table 7. Skewness coefficients of the residuals (observed minus predicted) for the meso-scale models. The first data column is for the calibration points; the remaining columns are for the validation points.
Models Calibration Micro Meso-All Meso-Internal Meso-External Macro
Mean model 1.35 0.65 1.25 1.72 -0.53 1.76
OK 0.50 -1.14 -0.40 0.59 0.14 1.68
CK-LAI 0.34 -1.20 -0.59 0.53 0.32 1.78
CK-limited 0.37 -1.27 -1.05 0.62 0.23 1.53
UK 0.22 -1.25 -1.03 0.59 0.27 1.59
MLR-all -0.03 -1.21 0.71 0.81 0.73 -1.77
MLR-limited -0.57 -1.40 0.33 0.31 0.68 -1.70
Table 8. Skewness coefficients of the residuals (observed minus predicted) for the macro-scale models. The first data column is for the calibration points; the remaining columns are for the validation points.
Models Calibration Micro Meso-All Meso-Internal Meso-External
Mean model 1.76 0.65 1.25 1.72 -0.53
OK 1.79 -0.72 -0.74 -0.02 -0.17
CK-LAI 1.79 -0.75 -0.75 -0.01 -0.15
CK-limited 1.79 -0.72 -0.74 -0.02 -0.17
UK 1.79 -0.75 -0.75 -0.01 -0.15
MLR-all 2.28 -0.58 -0.11 -0.06 -0.94
MLR-limited 0.13 -1.28 4.73 1.46 3.21
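The skewness coefficients in Tables 7 and 8 summarize the asymmetry of each residual distribution. A minimal sketch of the standard Fisher–Pearson moment coefficient — assumed here to be the statistic used, since the text does not name it — with invented residuals:

```python
import numpy as np

def skewness(residuals):
    # Fisher-Pearson moment coefficient: g1 = m3 / m2**1.5,
    # where m2 and m3 are the second and third central moments.
    r = np.asarray(residuals, dtype=float)
    d = r - r.mean()
    m2 = np.mean(d ** 2)
    m3 = np.mean(d ** 3)
    return float(m3 / m2 ** 1.5)

# A symmetric set of residuals has zero skew...
print(skewness([-0.2, 0.0, 0.2]))  # 0.0
# ...while a single large positive residual (a strong under-prediction)
# pulls the coefficient positive.
print(skewness([0.02, -0.05, 0.40, -0.03, 0.01]) > 0)  # True
```

A positive coefficient thus indicates a tail of under-predicted points, a negative coefficient a tail of over-predicted points.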
Table 9. Mean of the errors estimated by the respective meso-scale models and the percentage of observed errors within the estimated error range at the validation points. Assuming the estimated (modelled) error represents one standard deviation, the actual SOC% values should fall within the estimated error range approximately 68% of the time. Rates below this threshold indicate that the error has been underestimated; rates above it indicate overestimation.
OK CK-LAI CK-limited UK MLR-all MLR-limited
Micro Validation
Mean Est. Error 0.096 0.066 0.057 0.056 0.085 0.093
Within Range 59% 45% 37% 34% 58% 70%
Meso-Internal Validation
Mean Est. Error 0.098 0.067 0.058 0.056 0.078 0.090
Within Range 73% 61% 57% 54% 56% 56%
Meso Validation
Mean Est. Error 0.109 0.080 0.071 0.070 0.079 0.091
Within Range 70% 57% 53% 51% 55% 54%
Macro Validation
Mean Est. Error 0.160 0.149 0.262 0.259 0.084 0.092
Within Range 21% 21% 25% 25% 29% 29%
Table 10. Mean of the errors estimated by the respective macro-scale models and the percentage of observed errors within the estimated error range at the validation points. Assuming the estimated (modelled) error represents one standard deviation, the actual SOC% values should fall within the estimated error range approximately 68% of the time. Rates above this threshold indicate that the error has been overestimated; rates below it indicate underestimation.
OK CK-LAI CK-limited UK MLR-all MLR-limited
Micro Validation
Mean Est. Error 0.587 0.587 0.587 0.587 0.306 0.382
Within Range 100% 100% 100% 100% 86% 90%
Meso Validation
Mean Est. Error 0.631 0.631 0.631 0.631 0.306 0.382
Within Range 98% 98% 98% 98% 96% 76%
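The "Within Range" percentages of Tables 9 and 10 amount to counting how often the observed value falls inside prediction ± estimated error. A sketch of that check with invented values (not the study's data):

```python
import numpy as np

def within_range_rate(observed, predicted, est_error):
    # Fraction of validation points where |observed - predicted|
    # is within the model's estimated error. Treating the estimated
    # error as one standard deviation of a Gaussian error model, a
    # well-calibrated model should score near 0.68.
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    est_error = np.asarray(est_error, dtype=float)
    return float(np.mean(np.abs(observed - predicted) <= est_error))

# Invented SOC% observations, predictions, and error estimates
obs = [1.2, 0.9, 1.5, 1.1, 1.3]
pred = [1.1, 1.0, 1.2, 1.1, 1.0]
err = [0.15, 0.05, 0.10, 0.10, 0.20]
print(within_range_rate(obs, pred, err))  # 0.4 -> error underestimated
```

By this measure, the meso-scale models at the macro validation points (21–29%) badly underestimate their error, while the macro-scale kriging models at the micro validation points (100%) overestimate theirs.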
Figures
Figure 1. Location of study area within the Uckermark district of northeast Germany.
Figure 2. Locations of the sample points at the a) macro-scale and b) meso- and micro-scale. The points used to calibrate the macro-scale models were also used to independently validate the meso-scale models. The spatial interpolation area of b) is the experimental plot area for the CarboZALF project. The aerial images are enhanced with hillshading of the LiDAR elevation model. Note that the soil sampling and collection of covariate data were completed before this imagery was taken and before the infrastructure for the CarboZALF experimental plots was installed.
Figure 3. Histograms for the a) micro-scale and b) macro-scale calibration points. Although it could be argued that these calibration sets should be transformed to produce less skewed distributions, the original values were maintained to avoid complications from back-transformations when analyzing estimated and actual error with the validation points. This study takes a heuristic approach by validating all resulting models with independent data sets that contain a larger number of samples.
Figure 4. Experimental variograms of a) meso-scale SOC%, b) macro-scale SOC%, c) meso-scale SOC% with LAI, d) macro-scale SOC% with LAI, e) meso-scale SOC% with relative elevation (5200 m analysis scale), and f) macro-scale SOC% with relative elevation (20 m analysis scale). Points are the means of lags binned over all directions; the solid lines show spherical models fitted to them.
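The experimental variograms of Fig. 4 are built by binning point pairs by separation distance and averaging half the squared value differences in each lag. A minimal omnidirectional sketch (an illustrative helper, not the study's code):

```python
import numpy as np

def experimental_variogram(coords, values, lag_width, n_lags):
    # Omnidirectional experimental semivariogram: for each lag bin,
    # gamma(h) = 0.5 * mean((z_i - z_j)^2) over the point pairs whose
    # separation distance falls in that bin.
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)  # all unique point pairs
    dists = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2
    gamma = np.full(n_lags, np.nan)
    for k in range(n_lags):
        in_bin = (dists >= k * lag_width) & (dists < (k + 1) * lag_width)
        if in_bin.any():
            gamma[k] = 0.5 * sqdiff[in_bin].mean()
    return gamma

# Three collinear points with SOC% rising along a transect
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
values = [0.0, 1.0, 2.0]
print(experimental_variogram(coords, values, lag_width=1.5, n_lags=2))
```

A spherical model would then be fitted to these binned means, giving the solid lines shown in Fig. 4.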
Figure 5. Map realizations of SOC% from meso-scale calibration points by modelling with: a) ordinary kriging (OK), b) co-kriging with LAI (CK-LAI), c) co-kriging with a more limited pool of covariates (CK-limited), d) universal kriging with LAI and relative elevation at a 5200 m analysis scale (UK), e) rule-based multiple linear regression with all available covariates (MLR-all), and f) rule-based multiple linear regression with a more limited pool of covariates (MLR-limited).
Figure 6. Differences between the respective meso-scale models, using the UK map as a baseline. Positive values indicate the model predicted more than the UK model; negative values indicate it predicted less. a) OK, b) CK-LAI, c) CK-limited, d) MLR-all, and e) MLR-limited. Note that because the UK model includes both covariates, the difference between it and each of the other kriging maps reflects the pattern of the spatial information missing from that kriging approach.
Figure 7. Map realizations of SOC% from macro-scale calibration points by modelling with: a) OK, b) CK-LAI, c) CK-limited (for this model, relative elevation at a 20 m analysis scale was the covariate), d) UK, e) MLR-all, and f) MLR-limited.
Figure 8. Differences between the respective macro-scale models, using the UK map as a baseline. Positive values indicate the model predicted more than the UK model; negative values indicate it predicted less. a) OK, b) CK-LAI, c) CK-limited, d) MLR-all, and e) MLR-limited. Note that because the UK model includes both covariates, the difference between it and each of the other kriging maps reflects the pattern of the spatial information missing from that kriging approach.
Figure 9. Estimated error maps produced by the respective meso-scale models: a) OK, b) CK-LAI, c) CK-limited, d) UK, e) MLR-all, and f) MLR-limited.
Figure 10. Estimated error maps produced by the respective macro-scale models: a) OK, b) CK-LAI, c) CK-limited, d) UK, e) MLR-all, and f) MLR-limited.