Iowa State University
From the SelectedWorks of Bradley A. Miller
March, 2016
Towards Mapping Soil Carbon Landscapes: Issues of Sampling Scale and Transferability
Bradley A. Miller, Leibniz Centre for Agricultural Landscape Research (ZALF)
Sylvia Koszinski, Leibniz Centre for Agricultural Landscape Research (ZALF)
Wilfried Hierold, Leibniz Centre for Agricultural Landscape Research (ZALF)
Helmut Rogasik, Leibniz Centre for Agricultural Landscape Research (ZALF)
Boris Schröder, Technische Universität Braunschweig, et al.
This work is licensed under a Creative Commons CC BY-NC-ND International License.
Available at: https://works.bepress.com/bradley_miller/8/
Towards Mapping Soil Carbon Landscapes: Issues of Sampling Scale and Transferability

Bradley A. Miller1*, Sylvia Koszinski1, Wilfried Hierold1, Helmut Rogasik1, Boris Schröder2,3, Kristof Van Oost4, Marc Wehrhan1, Michael Sommer1

1Leibniz Centre for Agricultural Landscape Research (ZALF), Institute of Soil Landscape Research, Eberswalder Straße 84, 15374 Müncheberg, Germany
2Technische Universität Braunschweig, Institute of Geoecology, 38106 Braunschweig, Germany
3Berlin-Brandenburg Institute of Advanced Biodiversity Research (BBIB), 14195 Berlin, Germany
4Université Catholique de Louvain, George Lemaitre Center for Earth and Climate, Earth and Life Institute, 1348 Louvain-la-Neuve, Belgium
*Corresponding author
Abstract
The conversion of point observations to a geographic field is a necessary step in soil mapping. When pursuing goals of mapping soil carbon at the landscape scale, the relationships between sampling scale, representation of spatial variation, and accuracy of estimated error need to be considered. This study examines the spatial patterns and accuracy of predictions made by different spatial modelling methods on sample sets taken at two different scales. These spatial models are then tested on independent validation sets taken at three different scales. Each spatial modelling method produced similar, but unique, maps of soil organic carbon content (SOC%). Kriging approaches excelled at internal spatial prediction with more densely spaced sample points. Because kriging depends on spatial autocorrelation, kriging performance was naturally poor in areas of spatial extrapolation. In contrast, the spatial regression approaches tested could continue to perform well in areas of spatial extrapolation. However, the problem of induction meant that predictions could still fail in some areas, and where this would occur was difficult to anticipate; the same limitation applied to the kriging approaches. Spatial phenomena occurring between sampling points could also be missed by kriging models. Use of covariates with kriging can help, but the
requirement of capturing the full feature space in the map remains. Methods that utilize spatial association, such as spatial regression, can map soil properties for landscape scales at a high resolution, but are highly dependent on the inclusion of the full attribute space in the calibration of the model and the availability of transferable covariates.

This is a manuscript of an article from Soil and Tillage Research (2015): in press, doi:10.1016/j.still.2015.07.004. Posted with permission.
Keywords: soil mapping, sampling design, spatial regression, kriging, estimated error, uncertainty, spatial autocorrelation, spatial association
Highlights
1. Different modelled patterns can have similar levels of performance.
2. Spatial regression approaches can extrapolate spatially.
3. However, all spatial models are limited by calibration in feature space.
4. The problem of induction makes predicting that limitation difficult.
1. Introduction
Erosion and deposition processes redistribute large amounts of mineral soil and soil organic 41
carbon (SOC) across agricultural landscapes (Van Oost et al., 2007). SOC dynamics at the landscape scale 42
show fluctuations in space and time that challenge research on soil and SOC erosion (Kirkels et al., 2014). 43
A key component for monitoring carbon dynamics in soil landscapes is converting point observations to 44
areally extensive maps. This transition from sample points to a geographic field necessitates some type 45
of spatial prediction. Practical constraints limit the number of points that can be sampled, and fewer samples mean the map must rely more heavily on the spatial prediction method (Webster and Oliver, 1990). Nonetheless, sampling locations can be chosen strategically to optimize their utility for
the spatial model. Previous studies have examined the effect of sampling distribution within the same 49
extent (spatial domain) with respect to different modelling methods (Mueller and Pierce, 2003; Corwin 50
et al., 2010; Schmidt et al., 2014). However, for mapping landscapes, issues of scale, representativeness, and uncertainty become increasingly important; those issues are the focus of this research.
Methods for spatial prediction commonly described as spatial interpolation (e.g., inverse 53
distance weighting, kriging) rely on spatial autocorrelation (Burgess and Webster, 1980; Goovaerts, 1999; 54
Schloeder et al., 2001). For this reason, greater sampling density increases the spatial support of the 55
model and prediction error increases with distance away from sampling points. Similarly, because these 56
methods are intended only for spatial interpolation, they are considered inappropriate for extrapolating 57
beyond the extent of the sampling points. 58
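As a minimal illustration of this family of methods (not part of the study's analysis), inverse distance weighting predicts each unsampled location as a distance-weighted mean of the observations, so the prediction leans on nearby points and degrades with distance. A sketch with invented SOC% values:

```python
import numpy as np

def idw_predict(xy_obs, z_obs, xy_new, power=2.0):
    """Inverse distance weighting: the prediction is a weighted mean of the
    observations, with weights decaying as distance ** -power."""
    d = np.linalg.norm(xy_obs - xy_new, axis=1)
    if np.any(d == 0.0):                  # prediction point coincides with a sample
        return float(z_obs[np.argmin(d)])
    w = d ** -power
    return float(np.sum(w * z_obs) / np.sum(w))

# Toy SOC% observations (coordinates in metres, values in %)
xy  = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])
soc = np.array([1.2, 2.0, 1.6])
print(idw_predict(xy, soc, np.array([10.0, 10.0])))   # dominated by the nearest point
```

Because the weights depend only on distance, such a method has no basis for predicting beyond the sampled extent, which is the limitation discussed above.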
Recognizing the utility of spatial association approaches used in traditional soil mapping (Odeh et 59
al., 1994; McBratney et al., 2003), some varieties of kriging leverage spatial covariates to improve 60
predictions. Examples of approaches that incorporate spatial association with spatial autocorrelation 61
include co-kriging (McBratney and Webster, 1983; Juang and Lee, 1998) and universal kriging (Hengl et 62
al., 2007; Li et al., 2015). The covariates used are typically more easily measured than the target variable 63
and thus usually have better spatial coverage than samples of the target variable. However, spatial 64
autocorrelation still has an important role in all forms of kriging. Thus kriging at the landscape scale 65
continues to present a conflict between the size of the mapping extent and the number of observations 66
that need to be taken to produce an adequate sample density for the desired range of uncertainty. 67
Approaches that rely more purely on spatial association have become more quantitative and are 68
using more sophisticated techniques of predictor identification and spatial modelling. Some examples 69
include spatial regression or environmental correlation (McKenzie and Austin, 1993; Moore et al., 1993), 70
regression trees (Adhikari et al., 2014; Lacoste et al., 2014; Miller et al., 2015a), random forests (Vasques 71
et al., 2010; Häring et al. 2012; Schmidt et al., 2014), boosting algorithms (Häring et al. 2014) and 72
artificial neural networks (Tamari et al., 1996; Behrens et al., 2005). In contrast to spatial autocorrelation 73
techniques’ characteristic of prediction error increasing with distance from samples, spatial association 74
techniques’ error depends on the model’s ability to fit equations to the full feature space using available 75
covariates. However, spatial autocorrelation would still suggest that areas further away are more likely 76
to be outside the feature space of the sampled area. For these reasons, studies have recommended 77
stratification of the feature space to optimize sampling designs for models utilizing this prediction 78
strategy (Gessler et al., 1995; Hengl et al., 2003). 79
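The feature-space limitation described here can be checked explicitly. The sketch below (illustrative only, with invented covariate values) flags prediction locations whose covariates fall outside the range spanned by the calibration sample:

```python
import numpy as np

def outside_feature_space(X_cal, X_new):
    """Flag locations whose covariate values fall outside the range spanned
    by the calibration sample (a simple per-covariate box test; a convex-hull
    or density criterion would be stricter)."""
    lo, hi = X_cal.min(axis=0), X_cal.max(axis=0)
    return np.any((X_new < lo) | (X_new > hi), axis=1)

# Invented covariate columns (e.g. relative elevation, ECa) at calibration points
X_cal = np.array([[-2.0, 20.0], [0.5, 35.0], [3.0, 28.0]])
X_new = np.array([[0.0, 30.0],    # inside the calibrated range
                  [5.0, 30.0]])   # first covariate beyond any calibration point
print(outside_feature_space(X_cal, X_new))   # second location flagged
```

Locations flagged by such a test are the ones where a spatial-association model is extrapolating in attribute space, regardless of their geographic distance from the samples.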
Comparison of resulting maps should consider several factors. Typically, maps produced by
models are evaluated by error statistics for a single set of validation points, which provides a quantitative 81
comparison. However, spatial model realizations can have similar performance metrics at the designated 82
validation points while still representing differing spatial structures (Mueller and Pierce, 2003; Corwin et 83
al., 2010; Adhikari et al., 2013). This aspect can have implications for the interpretation of landscape 84
processes and the eventual use of the map, which should not be overlooked. Similarly, different 85
combinations of sampling designs and spatial modelling methods will have different patterns of error 86
magnitude, which can also have bearing on the suitability of the map for the desired purpose 87
(McBratney et al., 2000). This study considers each of these criteria in its comparison of maps for SOC% at 88
multiple scales. 89
We focus on the attribute of soil organic carbon content (SOC%) in the topsoil because of its 90
importance in monitoring and modelling carbon dynamics in soil landscapes. The highest concentrations, and thus the largest storage, of carbon are in the topsoil. However, SOC% can be highly spatially variable,
which greatly impacts the mass balance of carbon at the landscape scale. This soil property has been 93
heavily sampled for the CarboZALF project under different sampling designs for different research
purposes. Therefore, these samples provide a unique opportunity for comparing the nature and 95
performance of spatial modelling methods with respect to samples taken at different scales. The 96
objective of this study is to evaluate six spatial models, built from two sample sets taken at different 97
scales, in terms of their prediction performance as well as the distribution and reliability of their error 98
estimations. As the spatial scales of the calibration and validation point sets are shifted, the degree of 99
extendibility or transferability of the models is uniquely tested. 100
2. Methods
2.1. Study Area
Situated in the Northeast German Plain, the site for this study belongs to the main experimental 103
area of the Leibniz Centre for Agricultural Landscape Research (ZALF), located within the rural and 104
agricultural landscape of Uckermark (Figure 1). Initiated in 2008, the “CarboZALF” research project was 105
established to take a multiscale and interdisciplinary approach for quantifying and understanding 106
processes relevant to ecosystem carbon dynamics as well as their driving forces. The region is well-known for a long history of agricultural use since medieval deforestation and is dominated by large fields
(great manors and estates of rural gentry until 1945, later socialist agricultural cooperatives, and private 109
farmers and farm cooperatives within the last 25 years). 110
Due to the heterogeneity of soil associations within the hummocky ground moraine from the 111
Pomeranian stage of the Weichselian glaciation (Koszinski et al., 2013), small-scale variation of soil
properties is pronounced. The spatial variability of SOC, in particular, is increased by the anthropogenic 113
effects from hundreds of years of crop and tillage systems (Deumlich and Frielinghaus, this issue; Gerke 114
et al., this issue). Soil parent materials are calcareous till and glaciofluvial sands or gravels that have 115
developed into quite different soils. Soils in the area include Haplic Luvisols or Luvic Arenosols, 116
accompanied by Gleysols, Stagnosols, and even Histosols within typical closed depressions (kettle holes) 117
and wide outwash valleys (IUSS Working Group WRB, 2014). Many of these soils are clearly affected by 118
erosion processes, resulting in additional changes of soil properties including carbon concentration and 119
storage (Sommer et al., 2008; Deumlich et al., 2010). More specifically, the soil type inventory of the 6 ha 120
experimental site consists of Albic Cutanic Luvisols, Calcic Cutanic Luvisols, Calcaric Regosols, and 121
Endogleyic Colluvic Regosols (Eutric) over peat. The subcontinental climate is characterized by a mean 122
annual air temperature of 8.7°C and a mean annual precipitation of 483 mm (1992-2011, ZALF research station Dedelow).
2.2. Sampling Design
Separate sets of samples at a meso- (80 points covering 6 ha) and macro- (28 points covering 65 ha) scale provided unique calibration sets for building the models examined in this study (Figure 2). The
meso-scale calibration set used a spatially balanced stratified random sampling design according to 128
Theobald et al. (2007) in order to maximize the benefit of covariates in the models. The macro-scale 129
calibration set applied a similar strategy, but relied more on expert knowledge to determine sample 130
locations. For stratification at the meso-scale, we utilized maps of leaf area index (LAI), apparent 131
electrical conductivity (ECa), and topographic position index (TPI; analysis scale of 5 m). These three 132
likely covariates were partitioned into eight quantiles for the respective sampling extents and 133
intersected with one another. To increase the efficiency of sampling the feature space range, the 134
selection of random locations within these resulting zones skipped the third and sixth quantiles of the 135
respective classifications. The macro-scale calibration point locations were purposively selected by 136
expert knowledge with consideration of the same covariates used in the stratified random sampling 137
applied at the meso-scale. Although the distributions of the resulting SOC% data sets were not perfectly 138
normal (Figure 3), we did not transform them in order to avoid issues with back-transformations and 139
comparability in the validation of error estimations. 140
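The stratification described above can be sketched as follows (illustrative Python with synthetic covariate grids standing in for LAI, ECa, and TPI; this is not the original workflow, which followed Theobald et al., 2007):

```python
import numpy as np

rng = np.random.default_rng(42)

def octile_class(values):
    """Assign each cell to one of eight quantile classes (1..8)."""
    edges = np.quantile(values, np.linspace(0, 1, 9)[1:-1])
    return np.digitize(values, edges) + 1

# Synthetic covariate grids, flattened to 1-D (stand-ins for LAI, ECa, TPI)
n = 10_000
lai, eca, tpi = rng.normal(size=(3, n))

classes = np.stack([octile_class(v) for v in (lai, eca, tpi)])
keep = np.all(~np.isin(classes, [3, 6]), axis=0)   # skip the 3rd and 6th octiles

# Intersect the three classifications into zones; draw one random cell per zone
zone = classes[0] * 100 + classes[1] * 10 + classes[2]
sample_cells = [int(rng.choice(np.flatnonzero(keep & (zone == z))))
                for z in np.unique(zone[keep])]
print(len(sample_cells), "candidate sampling locations")
```

Skipping two octiles per covariate thins the intersection to the more contrasting parts of the feature space, which is the efficiency gain described in the text.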
The respective models were then tested on independent validation sets sampled on micro- (145 141
points at 5 m spacing) and meso- (128 points at 20 m spacing) scale grids. Note that the terms micro, 142
meso, and macro used here describe the relative scale of the sampling and do not necessarily 143
correspond to specific scale ranges that may be defined elsewhere. The micro-scale validation grid 144
samples were all spatially internal to the meso-scale calibration points. The meso-scale validation points 145
were distinguished between points spatially internal versus external to the area covered by the meso-scale calibration points (Figure 2b). As a test of upscaling, the models built from the meso-scale
calibration points were also tested at the macro-scale by using the macro-scale calibration points as 148
macro-scale validation points for the meso-scale models. For this reason, the models calibrated at the 149
macro-scale could not be independently validated at the macro-scale. The sequence of testing the meso-scale models on the meso-scale internal, meso-scale external, and macro-scale validation points
provided an examination of transferability for the models developed at the meso-scale. 152
Soil samples for the meso- and macro-scale calibration, as well as the meso-scale validation 153
points were collected from cores extracted by a hydraulic probe. Soil cores of 10 cm in diameter were 154
taken down to a 1.5 m depth at minimum. For this study, the cores were analyzed for the top 30 cm of 155
the profile, which in most cases coincided with the Ap horizon. A representative mixed mass section 156
taken across the 30 cm and consisting of at least 500 g of air-dried soil was analyzed in the lab. Soil
samples for the micro-scale validation were taken by hand, providing a mixed sample of at least 500 g for 158
the top 30 cm at every point. The total carbon content was determined by dry combustion at 1250°C and 159
infrared detection of carbon dioxide. Analysis of samples was conducted in duplicate, using the mean of 160
the results in the final data set. The precision of this method for measuring SOC% was approximately 161
±0.1. 162
2.3. Spatial Modelling
A range of spatial modelling methods were selected to represent the gradient of approaches 164
from spatial autocorrelation to spatial association. Ordinary kriging (OK) was used as a purely spatial 165
autocorrelation approach. Co-kriging (CK) and universal kriging (UK) were used as hybrids between 166
spatial autocorrelation and spatial association. Rule-based multiple linear regression (MLR) models
were used to represent the spatial association family of digital soil mapping approaches. When covariate 168
grids were layered together the finest resolution was maintained, which resulted in the MLR maps 169
having a resolution of 1 m. The kriging product maps were also exported at a resolution of 1 m for 170
comparison. 171
A pool of 297 potential covariates was considered in the production of the MLR models (Table 1). 172
Most of these potential covariates were terrain derivatives from a 1 m resolution digital elevation model, 173
generated by LiDAR. ECa was determined by using an EM38 DD device. Point LAI data was collected in 174
the field and then used to calibrate a semi-physical model for estimating LAI from Quickbird satellite 175
imagery. A low-pass filter was then applied to the calculated LAI grid to reduce the effect of gaps between rows.
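The effect of such a low-pass filter can be sketched with a simple moving average (illustrative only; the study's exact filter is not specified beyond "low pass", and the grid values here are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1 m LAI grid with a periodic dip between crop rows (invented values)
lai = 3.0 + 0.1 * rng.standard_normal((100, 100))
lai[:, ::4] -= 1.0                          # artificial inter-row gaps every 4 m

# 5-cell moving-average low-pass filter applied along each row
kernel = np.ones(5) / 5.0
smoothed = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, lai)

# The gap cells are pulled back toward the field mean, attenuating the striping
print(lai[:, ::4].mean(), smoothed[:, ::4].mean())
```

As the text notes for the real LAI grid, the row pattern is attenuated rather than removed: the smoothed gap cells move toward, but do not reach, the between-row values.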
A common problem in spatial association approaches for digital soil mapping is the identification 178
of optimal predictor covariates. Therefore, we used the Cubist 2.08 software (Quinlan, 1994), which not 179
only builds optimized model structures, but has also demonstrated ability to select optimal covariates 180
from a pool of potential predictors (Bui et al. 2006; Minasny and McBratney, 2008; Adhikari et al., 2013; 181
Lacoste et al., 2014; Miller et al., 2015b). Detailed description of Cubist’s process for selecting predictors 182
and building models is provided in Quinlan (1993) and Holmes et al. (1999) and will not be repeated 183
here. To reduce the possibility that the resulting models were overfit, covariates selected by Cubist were 184
re-entered as a more limited covariate pool until the set of selected covariates equaled the set in the 185
supplied covariate pool. Maps of SOC% were produced by combining the raster maps of the covariates as 186
described by the Cubist-generated regression equations using map algebra in ArcGIS 10.1
(www.esri.com/software/arcgis). 188
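The re-entry loop used to reduce overfitting can be sketched as follows. Because Cubist's internal selection is not reproduced here, a hypothetical correlation-threshold selector stands in for it, and the covariate names are invented for the example:

```python
import numpy as np

def select_covariates(X, y, names, r_min=0.3):
    """Hypothetical stand-in for Cubist's covariate selection: keep
    covariates whose absolute correlation with y exceeds r_min."""
    keep = []
    for j, name in enumerate(names):
        r = abs(np.corrcoef(X[:, j], y)[0, 1])
        if r > r_min:
            keep.append(name)
    return keep

def selection_fixed_point(X, y, names):
    """Re-enter the selected covariates as the new pool until the selected
    set equals the supplied pool (the anti-overfitting loop in the text)."""
    pool = list(names)
    while True:
        cols = [names.index(n) for n in pool]
        chosen = select_covariates(X[:, cols], y, pool)
        if chosen == pool:
            return pool
        pool = chosen

rng = np.random.default_rng(1)
n = 200
elev = rng.normal(size=n)        # stands in for relative elevation
noise = rng.normal(size=n)       # an irrelevant covariate
y = 1.5 - 0.8 * elev + 0.1 * rng.normal(size=n)   # synthetic SOC%
X = np.column_stack([elev, noise])
selected = selection_fixed_point(X, y, ["rel_elev_5200m", "noise_cov"])
print(selected)
```

The loop terminates when selection is stable, i.e., when offering the model only the covariates it already chose does not change its choice.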
In order to keep the spatial modelling approaches as comparable as possible, the covariates 189
selected by Cubist were also used for the appropriate kriging approaches using Geostatistical Analyst in 190
ArcGIS 10.1. At the respective sampling scales, the top covariate in the Cubist model (MLR-all) was 191
selected for use with co-kriging. Recognizing that covariates such as LAI and ECa may not always be 192
practical to obtain for landscape scale mapping, Cubist was also run with those potential covariates 193
excluded (MLR-limited). The top covariate selected by Cubist from this more limited covariate pool was 194
also tested with co-kriging. In addition, universal kriging was tested with the two covariates used for the 195
co-kriging models at the respective scales, together. Therefore, six spatial modelling approaches were 196
calibrated on the two scales of sampling design:
ordinary kriging (OK),
co-kriging with the top covariate from Cubist (CK-LAI),
co-kriging with the top covariate from Cubist when LAI and ECa were excluded (CK-limited),
universal kriging with the two covariates tested with co-kriging (UK),
MLR with all covariates available (MLR-all), and
MLR limited by excluding LAI and ECa (MLR-limited).
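For reference, the ordinary kriging system underlying the first of these approaches can be written out compactly. The sketch below uses an assumed spherical semivariogram (nugget 0.1 (%)², partial sill 0.3 (%)², range 46 m are illustrative values, loosely echoing the meso-scale OK model reported later) and is not the Geostatistical Analyst implementation:

```python
import numpy as np

def spherical_gamma(h, nugget=0.1, psill=0.3, rng_m=46.0):
    """Spherical semivariogram: rises from the nugget to nugget + partial
    sill at the range, and is flat beyond it."""
    h = np.asarray(h, dtype=float)
    g = nugget + psill * (1.5 * h / rng_m - 0.5 * (h / rng_m) ** 3)
    g = np.where(h >= rng_m, nugget + psill, g)
    return np.where(h == 0.0, 0.0, g)

def ordinary_krige(xy_obs, z_obs, xy_new):
    """Solve the ordinary kriging system for one prediction location;
    returns the prediction and the kriging standard error."""
    n = len(z_obs)
    d_obs = np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=2)
    A = np.empty((n + 1, n + 1))
    A[:n, :n] = spherical_gamma(d_obs)          # semivariances between samples
    A[n, :n] = A[:n, n] = 1.0                   # unbiasedness constraint
    A[n, n] = 0.0
    b = np.append(spherical_gamma(np.linalg.norm(xy_obs - xy_new, axis=1)), 1.0)
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]
    pred = float(w @ z_obs)
    var = float(b[:n] @ w + mu)                 # kriging variance
    return pred, np.sqrt(max(var, 0.0))

xy  = np.array([[0.0, 0.0], [30.0, 0.0], [0.0, 30.0], [30.0, 30.0]])
soc = np.array([1.1, 1.8, 1.4, 1.6])
pred, se = ordinary_krige(xy, soc, np.array([15.0, 15.0]))
print(round(pred, 3), round(se, 3))
```

The same machinery extends to CK and UK by adding covariate terms; the standard error returned here is the estimated error discussed next.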
The prediction of a standard error is included in the calculations of the kriging methods (Goovaerts, 2001), which is part of kriging's popularity. However, estimating error within spatial regression approaches has also recently shown promise. For spatial regression, error can be estimated using the residuals between the regression equation and observed values, and this empirical estimation of error can be transferred with the model predictions (Shrestha and Solomatine, 2006; Malone et al., 2011; Lemercier et al., 2012). Under the rule-based approach, rule conditions can be used to identify areas expected to have similar errors (Miller et al., 2015a). The performance of these modelled errors is tested in this study as well.
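The rule-based error estimation can be illustrated with a toy example: calibration residuals are grouped by the rule that covers each point, and the per-rule mean absolute residual becomes the error estimate transferred with the predictions (all values invented):

```python
import numpy as np

def rule_error_estimates(rule_id, y_obs, y_pred):
    """Empirical uncertainty per rule: the mean absolute residual of the
    calibration points covered by each rule."""
    return {int(r): float(np.mean(np.abs(y_obs[rule_id == r] - y_pred[rule_id == r])))
            for r in np.unique(rule_id)}

# Invented calibration residuals for a two-rule model (cf. the meso-scale models)
rule_id = np.array([1, 1, 1, 2, 2, 2])
y_obs  = np.array([1.2, 1.4, 1.3, 3.0, 2.4, 2.7])
y_pred = np.array([1.3, 1.3, 1.3, 2.6, 2.6, 2.6])
est = rule_error_estimates(rule_id, y_obs, y_pred)
print(est)   # rule 2 carries the larger expected error
```

Mapping these per-rule values over the cells each rule covers yields an uncertainty map that varies with the rule conditions rather than with distance from the samples.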
2.4. Comparative Analysis
Comparison between the six spatial modelling approaches began with standard measures of 213
model performance. The coefficient of determination (R²) indicated the models’ ability to represent the
spatial variation in the validation points. The mean absolute error (MAE) measured the models’ 215
prediction accuracy at the validation points. As a baseline for comparison, these same performance 216
statistics were calculated for the mean model (i.e., the null model), which was simply the use of the calibration points’ mean to predict SOC% everywhere.
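These performance measures and the null-model baseline can be sketched directly (synthetic values; R² is written here in its 1 − SSres/SStot formulation):

```python
import numpy as np

def r2(y, yhat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    return float(1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2))

def mae(y, yhat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - yhat)))

# Synthetic validation SOC% vs. one model and the null (calibration-mean) model
y_val   = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
y_model = np.array([1.1, 1.4, 2.2, 2.4, 2.8])
y_null  = np.full_like(y_val, 1.9)   # calibration points' mean used everywhere
print(r2(y_val, y_model), mae(y_val, y_model), mae(y_val, y_null))
```

A model is only informative to the extent that its MAE beats the null model's; the R² of the null model is at most zero by construction, which is why it serves as the baseline.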
Part of this study’s objective was to compare the spatial realizations produced by the respective 219
modelling methods. In addition to the visual comparison, the differences in spatial predictions were 220
calculated in ArcGIS 10.1 by map algebra. Also, uncertainty maps were produced using the error 221
estimation methods associated with each of the respective spatial modelling approaches. The accuracy 222
of these error estimations was tested on the multiple scales of more densely sampled point grids used 223
only for validation purposes. The accuracies of error estimations were then compared between spatial 224
modelling methods. 225
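Both comparison steps reduce to simple array operations: differencing two prediction grids cell by cell, and checking estimated errors against observed validation residuals. A sketch with synthetic grids (the coverage statistic shown is one possible way to test an error estimate, not necessarily the study's):

```python
import numpy as np

rng = np.random.default_rng(3)

# Two synthetic 1 m resolution SOC% prediction grids (stand-ins for UK and MLR maps)
uk_map  = 1.5 + 0.3 * rng.standard_normal((50, 50))
mlr_map = uk_map + 0.05 + 0.02 * rng.standard_normal((50, 50))

# Map algebra: cell-by-cell difference between the two realizations
diff = mlr_map - uk_map
print(float(diff.mean()))

# Testing an error estimation: share of validation residuals that fall
# within the estimated error at those points (here a constant estimate)
est_err   = np.full(100, 0.25)
residuals = 0.2 * rng.standard_normal(100)
coverage  = float(np.mean(np.abs(residuals) <= est_err))
print(coverage)
```

A well-calibrated error estimate should yield a coverage consistent with its nominal confidence; systematic over- or under-coverage indicates over- or under-stated uncertainty.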
3. Results
3.1. Configuration of models
From the large pool of potential covariates, Cubist selected very few predictors for SOC% based 228
on the points in our study. With the exception of the model built from the meso-scale calibration points 229
using all covariates, the models generated by Cubist used only one predictor each. The MLR-all model 230
based on the meso-scale points utilized four predictors (Table 2). The generated models were also simple 231
in the number of rules produced. Both meso-scale models utilized two rules, while both macro-scale 232
models used only one. Our experience has been that this is related to the quantity and range of 233
conditions covered by the calibration points; more data points can provide Cubist with a basis for 234
creating more rules. 235
The Cubist models calibrated on the meso-scale points focused on the use of relative elevation at 236
an analysis scale of 5200 m. Although this is a very regional context for relative elevation, it is still 237
somewhat different from elevation a.s.l. (r = 0.90 at the calibration points). When LAI was made available
for model building from the same points, it dominated the estimation of SOC% in the map area because it 239
was the primary covariate for areas above -7.3 m of relative elevation (5200 m analysis scale). For the 240
meso-scale area, this essentially excluded the use of LAI in the wet swale, which runs along the border 241
between the spatial interpolation and extrapolation test zones. By calibrating on the macro-scale points, 242
LAI was actually the only covariate selected by Cubist. Intriguingly, when LAI and ECa were removed from 243
the covariate pool for the macro-scale model, relative elevation at a 20 m analysis scale was selected as 244
the only predictor. This smaller analysis scale for relative elevation suggests a greater focus on variation 245
coinciding with more local relief. 246
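Relative elevation at a given analysis scale can be computed as the elevation minus the mean elevation within a moving window whose size sets the analysis scale. A minimal sketch (synthetic DEM, not the study's 1 m LiDAR data; a brute-force window loop rather than an optimized focal statistic):

```python
import numpy as np

def relative_elevation(dem, k):
    """Elevation minus the mean elevation within a k x k moving window;
    the window size corresponds to the 'analysis scale' of the derivative."""
    pad = np.pad(dem, k // 2, mode="edge")
    out = np.empty_like(dem, dtype=float)
    for i in range(dem.shape[0]):
        for j in range(dem.shape[1]):
            out[i, j] = dem[i, j] - pad[i:i + k, j:j + k].mean()
    return out

# Tiny synthetic DEM: a knoll on a plane (cells would be 1 m in this study)
x, y = np.meshgrid(np.arange(21), np.arange(21))
dem = 50.0 + 3.0 * np.exp(-((x - 10) ** 2 + (y - 10) ** 2) / 20.0)

rel5 = relative_elevation(dem, 5)
print(float(rel5[10, 10]), float(rel5[0, 0]))   # positive on the knoll top
```

With a small window the derivative responds to local relief, as with the 20 m analysis scale; widening the window toward thousands of cells shifts it toward the regional context of the 5200 m scale.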
The simplicity of the Cubist models facilitated the choice of covariates to use with the CK and UK 247
models for comparison. LAI was chosen as the covariate for the first test of co-kriging (CK-LAI) at both 248
calibration scales. To explore the scenario when covariates such as LAI and ECa are not available, the 249
second co-kriging test used the only covariate selected by Cubist at the respective scales (CK-limited). 250
Specifically, the CK-limited spatial modelling was done using the covariates of relative elevation (5200 m) 251
at the meso-scale and relative elevation (20 m) at the macro-scale. In addition, UK was tested using the 252
LAI and the respective analysis scale of relative elevation together as covariates. Although many other 253
combinations of methods were possible, this provided some representation of the possibilities between 254
the two extremes of spatial modelling approaches. 255
For modelling the semivariograms used to conduct the kriging, a spherical model was generally 256
the most appropriate and was used consistently (Figure 4). The semivariogram models for the kriging of 257
the meso-scale calibration points had a nugget between 0 and 0.4 (%)². The meso-scale semivariogram
models for OK and CK-LAI had ranges of 46 m and 58 m, respectively. The meso-scale semivariogram 259
models for CK-limited and UK had larger ranges of 273 m and 266 m, respectively. These two spatial 260
models were the kriging approaches that included relative elevation (5200 m) as a covariate. For the 261
macro-scale kriging models, calculated nuggets ranged between 0.12 and 0.33 (%)². However, a range of
211 m was consistent for all the semivariograms built from the macro-scale points. 263
3.2. Model prediction performance
The models built from the meso-scale points all had similar levels of performance for predicting 265
validation points spatially internal to the calibration points’ extent (Tables 3 and 4). The kriging models 266
particularly excelled at predicting the validation points taken at the same scale and that were spatially 267
internal. As expected, the performance of the kriging-based models failed outside the spatial extent of 268
the calibration points. Although the performance of the MLR models also declined in the prediction areas 269
that were spatially external, in most cases they still provided reasonably useful predictions. The 270
exception was the predictions made by the MLR-limited model for meso-scale points in the external 271
validation area. Despite having a low MAE, that model was unable to represent the spatial variation in 272
that opposite hillslope. This suggests that although Cubist did construct a model that minimized the 273
residuals, the LAI provided important information for predicting the variation in the external, meso-scale 274
validation. 275
The performances of the macro-scale models were generally lower than the models based on 276
the meso-scale points (Tables 5 and 6). This was likely due to the lower quantity and density of points 277
available for calibration at the macro-scale. Nonetheless, the independent validation for most of the 278
models showed that they made useful predictions in the majority of the area. All four kriging approaches 279
again produced very similar results. Although all of the validation point sets were spatially internal in this 280
model building scenario, the prediction performance on the points that were external at the meso-scale 281
was still very poor. This suggests that there was something unique about the conditions of that hillslope 282
area and making that area spatially internal to the calibration points for the kriging approaches was not 283
sufficient to effectively model that area. 284
The macro-scale MLR models were especially intriguing because their prediction results were 285
both the lowest and the highest of the comparable models. The MLR-limited model performed the worst 286
at virtually all scales. It only improved upon the use of the calibration points’ mean (null model) as a 287
prediction at the micro-scale validation. This could be related to the macro-scale MLR-limited model’s 288
focus on relative elevation (20 m) as a predictor. In contrast, the MLR-all model consistently had the 289
highest R² for all validation sets. It also had the lowest MAE, of models built from either scale, for the difficult-to-predict meso-scale external points.
3.3. Spatial patterns represented by models
3.3.1. Meso-scale variation
Although all of the meso-scale based models showed similar validation performance, they 294
produced unique realizations of the spatial pattern of SOC% (Figure 5). This is most clearly seen in the
different location and shape of the boundaries between the mapped attribute classes. The maps 296
produced by the meso-scale models all recognize higher concentrations in the swale that follows the 297
border between this study’s zones for spatial interpolation and extrapolation. However, each map shows 298
different shapes and maximum values. The OK, CK-LAI, and UK maps were the most similar. CK-limited 299
displayed the least amount of spatial variation, reflecting that model’s reliance on relative elevation at a 300
large analysis scale as a covariate. The common inheritance of patterns from the LAI covariate can be 301
observed in both the CK-LAI and the MLR-all maps. LAI was only available for the field containing the 302
CarboZALF plots and an adjacent field to the north. Therefore, in the area south of the CarboZALF 303
experimental plots, the MLR-all model produced no predictions. The kriging model predictions were also 304
not reliable in that area because it was an area of spatial extrapolation, especially for the CK-LAI and UK 305
models because their covariate of LAI was missing there. The two MLR-based maps were nearly identical in the swale area, but differed considerably in the remaining area.
Using the UK map as a baseline, its predicted values were subtracted from the predicted values 308
of the other models to produce visualizations of their differences (Figure 6). This analysis revealed that 309
the actual difference between the meso-scale kriging models was small (< 0.1 SOC%). The pattern of the 310
differences aligning with the orientation of the crop rows indicated the influence of the LAI covariate. 311
The LAI covariate map had a pattern of lower values between rows, which we attempted to remove by 312
low pass filtering. However, row patterning could only be minimized, not completely removed. The MLR 313
models showed their similarity with each other by contrast with the UK model. Their differences with the 314
UK map were greatest at the extremes of rule attribute ranges, which often correspond with the edges 315
of mapped rule boundaries. For example, the boundary of the rule applied to the swale is apparent by 316
the lower estimation of SOC% by the MLR models compared to the UK model. Conversely, at the center 317
of a depression in the swale, the MLR models predicted more SOC% than the UK model. 318
3.3.2. Macro-scale variation
At the macro-scale, the maps produced by the kriging models were even more alike (Figure 7). 320
The similarity of the CK and UK maps to the map produced by OK would suggest that the provided 321
covariates were not very influential in making the predictions for those maps. The MLR-all map was 322
again limited to only locations where LAI data was available. In most studies, the area outside the field of 323
interest would be masked for the kriging maps, but the behavior of the spatial models in areas of spatial 324
extrapolation is within the scope of this research. 325
In contrast to the kriging maps, the MLR models each produced unique maps of SOC%. Notably, 326
the MLR-all map shows a pattern that more closely matches the shape of the swale and wetlands 327
running through the center of the field from southwest to northeast. Technically the LAI is not suitable 328
for the wetlands because it was calculated on the basis of field crops and not for wetland vegetation. 329
Nonetheless, the MLR-all map follows an expected trend of SOC% and better resembles the spatial 330
patterns of field features expected to be influential than any of the kriging maps. 331
Although the MLR-limited map follows many of the same patterns as the MLR-all map, the MLR-limited model displays an apparent flaw within the wetlands. The remotely sensed LiDAR elevation was not able to detect the true elevation of the mineral surface under the vegetation and water of the wetlands. Also, no calibration samples were taken in flooded locations. The result was a misrepresentation of the ground surface elevation in the wetlands and thereby an underprediction of SOC% by the MLR-limited model, which strongly relied on local relative elevation (20 m). If the calibration
set had included points in these areas, it is likely that the Cubist generated model would have been 338
different. 339
Again using the map produced by UK as the baseline, the values predicted by macro-scale UK were subtracted from the other macro-scale models' maps to create visualizations of their differences (Figure 8). Although the kriging SOC% maps appear practically identical with the eight attribute classes used to present the maps, the difference calculation reveals patterns of difference related to the respective covariates. For example, the differences shown for the CK-LAI map reflect the influence that relative elevation (20 m) had on the UK map. That the difference maps for OK and CK-limited resemble each other demonstrates that the LAI covariate dominated the prediction in the UK map. However, these differences are relatively small (< 1 SOC%) in the context of the range of SOC% observed in this map area.
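The difference maps described here amount to a per-pixel subtraction of one prediction grid from another. A minimal sketch with numpy follows; the array names and SOC% values are hypothetical, chosen only to illustrate the calculation:

```python
import numpy as np

# Hypothetical 3x3 grids of predicted SOC% from two models (illustrative values)
soc_uk = np.array([[1.2, 1.4, 1.6],
                   [1.3, 1.8, 2.1],
                   [1.1, 1.5, 1.9]])
soc_ck = np.array([[1.1, 1.5, 1.7],
                   [1.4, 1.7, 2.0],
                   [1.0, 1.6, 2.0]])

# Difference map: positive cells are where CK predicts more SOC% than UK
diff = soc_ck - soc_uk

# Magnitude of the largest disagreement between the two models
max_abs_diff = float(np.abs(diff).max())
```

In practice, the same subtraction would be applied to co-registered rasters rather than toy arrays, and the result classified for display as in Figure 8.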
The differences between the macro-scale MLR maps highlight the issues in and near the wetlands. The MLR-limited map clearly underestimated SOC% within areas of wetland vegetation, yet its difference with the UK map was minimal in those areas. The MLR-all map predicted higher SOC% in the wetland area with a pattern that better matched field observations. The major difference between the kriging maps and the MLR maps was underscored by the spike in SOC% in the central part of the kriging maps. There, a single point that was influenced by wetland conditions elevated the kriging predictions in a circular area surrounding it. The MLR maps more realistically followed the shape of the terrain features. Thus, in the MLR difference maps, the contrast in the shape of the predicted high-SOC% areas can be seen.
3.4. Distribution and performance of error estimation
The maps of the respective models' estimated error highlighted the strategies and relative strengths of the spatial modelling approaches (Figures 9 and 10). At both the meso- and macro-scales, the maps of error estimation illustrated the kriging models' use of spatial autocorrelation, with estimated errors increasing radially from the calibration points. The addition of covariates to the kriging models lowered the estimated error in an area extending further out from those points. The method used to map the estimated errors of the MLR models did not provide error estimations that were as continuous. Instead, areas needed to be classified by the model rule applied and assigned a single error estimation value. However, the MLR error estimations were comparable to the error estimations found only in close proximity to calibration points in the kriging maps.
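The per-rule error mapping described above — classifying each cell by the rule applied and assigning that rule's single error value — can be sketched as follows. The rule ids and RMSE values are hypothetical; the point is only the broadcasting of one error value per rule onto the map:

```python
import numpy as np

# Hypothetical grid of rule ids: which model rule applies at each map cell
rule_id = np.array([[0, 0, 1],
                    [0, 1, 1],
                    [2, 2, 1]])

# One error estimate (e.g., the rule's calibration RMSE) per rule
rule_rmse = {0: 0.15, 1: 0.30, 2: 0.55}

# Broadcast the per-rule error onto the grid -> a stepped (discontinuous)
# error map, in contrast to kriging's smoothly varying error surface
error_map = np.vectorize(rule_rmse.get)(rule_id)
```

The stepped result makes explicit why these error maps are less continuous than the kriging variance maps.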
In addition to validating model predictions of SOC%, we also used the validation points to test the error estimations mapped by the different models. In theory, the observed error at the validation points should be within the estimated error range approximately 68% of the time (one standard error), assuming the errors were normally distributed. Although the validation points are only a sample set, this evaluation provides a practical test of the estimated errors.
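This coverage check — counting how often the observed error falls within the mapped one-standard-error range — is straightforward to compute. A sketch with hypothetical observed, predicted, and estimated-error values:

```python
import numpy as np

# Hypothetical validation data: observed and predicted SOC%, plus the
# error (one standard error) the model estimated at each point's location
observed  = np.array([2.1, 1.5, 3.2, 0.9, 2.8])
predicted = np.array([1.9, 1.7, 2.6, 1.0, 2.9])
est_error = np.array([0.3, 0.1, 0.4, 0.2, 0.3])

# Percentage of points whose absolute residual lies within the estimated
# range; for normally distributed errors this should be near 68%
within = np.abs(observed - predicted) <= est_error
coverage = within.mean() * 100.0
```

Values well below 68% indicate underestimated error ranges; values near 100% indicate overestimation, mirroring the patterns in Tables 9 and 10.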
Evaluation of the residuals for both the calibration and validation points showed that they were sometimes skewed, depending on the model and sample set. The SOC% values of the meso-scale points were positively skewed, as indicated by the positive skew of the mean model's residuals (Table 7). However, the residuals for the kriging and MLR models at the meso-scale calibration points did not have a strong skew. At the meso-scale, the residuals at the validation points generally had a low to moderate level of skewness. The CK-limited and UK models were an exception when tested on all of the meso-scale validation points together, which produced distributions of residuals with skewness coefficients a little below -1. The residuals of the meso-scale kriging and MLR models' predictions of the micro-scale points were all negatively skewed. The meso-scale MLR models' residuals were also negatively skewed for the macro-scale validation points, while the kriging models' residuals became positively skewed. Thus, the direction of observed skewness in the residuals depended upon the validation points used, which varied by spatial scale.
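The skewness coefficients discussed here are standard moment-based (Fisher-Pearson) coefficients, which can be computed directly from a set of residuals. A minimal sketch; the residual values are hypothetical:

```python
import numpy as np

def skewness(x):
    """Fisher-Pearson moment coefficient of skewness (biased form)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

# A symmetric residual set has zero skew; a long positive tail does not
assert abs(skewness([-1.0, 0.0, 1.0])) < 1e-12
positively_skewed = skewness([0.1, 0.2, 0.1, 0.15, 2.0])
```

Coefficients below about -1 or above +1, as reported for some validation sets, indicate strong asymmetry in the residual distribution.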
The skewness coefficients of the macro-scale models' residuals were very different (Table 8). With the exception of the MLR-limited model, the residuals at the calibration points were all strongly, positively skewed. However, this did not hold true for the residuals of these models at the validation points. Intriguingly, residuals of the macro-scale models were not skewed for the points considered as meso-scale, internal validation, but when considered together with the meso-scale, external validation points, the distribution of residuals matched the skewness of the micro-scale validation residuals. The exception to this trend was the MLR-limited model: while its residuals were not skewed at the calibration points, they were strongly skewed at the validation points. The degree to which the residuals are normally distributed, or not, could have an important impact on the models' ability to estimate error and will be discussed later.
The mean estimated errors at validation points for the meso-scale kriging models increased from the micro to the macro scale, which reflects the increasingly external position of the validation points (Table 9). The mean estimated errors at validation points for the meso-scale MLR models were relatively stable at all scales and were comparable in magnitude to the kriging models at the micro and meso validation scales. The percentage of validation points with observed errors within the estimated error range for their location followed the size of the mean estimated error. Naturally, a greater range of uncertainty is more likely to include the actual or measured value. The use of covariates resulted in a lower estimation of error for the kriging models at the internal validation points. However, these estimations appeared to be a little too low for the micro- and meso-scale validation points, with between 34% and 57% of errors being observed within those estimated ranges. The performance of error estimations by the meso-scale kriging models for the macro-scale points emphasized the inappropriateness of using those models in spatially external areas. Even though error estimations for those points were greatly increased, the observed error was outside that range more than 75% of the time. The accuracy of the MLR models' error estimations was generally better than that of the kriging models, but the performance of the MLR models' error estimations also declined from the micro- to macro-scale validations.
Estimations of error for all the models increased when calibrated at the macro-scale (Table 10). However, the MLR models' estimated errors were much less than the errors estimated by the kriging models. Nonetheless, the uncertainties estimated by all of the models appeared to be overestimations of the actual error. Observed errors were almost always within the estimated error ranges (86-100%). The exception was the MLR-limited model's estimation of error at the meso-scale validation points, but even there the observed error was within the estimated range more than the expected 68% of the time.
4. Discussion
4.1. Comparison with previous studies
In most cases, previous studies have shown that greater use of covariates improves model performance, but the results have not been consistent due to varying target variables, sampling designs, and selected covariates (Knotters et al., 1995; Triantafilis et al., 2001; Mueller and Pierce, 2003; Zhu and Lin, 2010). The differences in results are likely related to the approaches' sensitivity to modelling conditions. For example, kriging performance can be significantly affected by the variability of the data (Leenaers et al., 1990), choices made in modelling the variogram (Kravchenko and Bullock, 1999; Oliver and Webster, 2014), and sampling design (Voltz and Webster, 1990; Englund et al., 1992; Laslett, 1994; Wollenhaupt et al., 1994; Gotway et al., 1996). In the case of spatial association approaches, the performance of the models can be heavily dependent upon the covariates used (Levi and Rasmussen, 2014; Miller et al., 2015b). In general, studies comparing spatial prediction methods have found only small differences in their performance, but each with respective advantages (Zhu and Lin, 2010; Adhikari et al., 2013). In this study, we have attempted to explore further the reasons behind these respective advantages and to better elucidate when the different approaches are most appropriate, especially with respect to sampling scale and transferability.
Odeh et al. (1994) compared spatial modelling methods in a manner similar to ours in south-central Australia, comparing OK, CK, UK, and MLR. That study included an additional comparison with regression kriging, but for our study we wanted to focus on the results of the spatial modelling approaches directly, without additional steps to correct predictions. In the Odeh et al. (1994) study, samples were taken on a 25 m spaced grid covering approximately 24 ha. This contrasts with the stratified distribution of the calibration points in our study. The grid sampling design should be more ideal for kriging approaches because it minimizes the distance between points across the mapping area. Model results were tested on random points from about 30% of the sampling grid that were excluded from model calibrations. For their predictions of solum depth, depth to bedrock, topsoil gravel, and subsoil clay, the MLR models consistently produced lower root mean square errors than the kriging models. Although the sample distribution in that study was ideal for kriging, the sample density and distribution were also sufficient to ensure a likely coverage of the feature space. In our study, the MLR models were not always the better performing prediction method, but the lower sampling density increased the odds of missing something from the feature space.
Bostan et al. (2012) tested spatial predictions of average annual precipitation by comparing OK, UK, and MLR, as well as regression kriging and geographically weighted regression. Certainly, the spatial distribution of average annual precipitation is a different phenomenon from SOC% for a single round of sampling, but their results confirmed several of the same geographic principles observed in the present study. Specifically, kriging methods are only suitable for spatial interpolation of the target variable, and spatial extrapolation is the most difficult area for spatial modelling. However, similar to our study, the regression methods demonstrated the strongest performance in spatial extrapolation. Bostan et al. (2012) did not test performance dynamics with respect to scale.
Like our study, Mueller and Pierce (2003) sought to examine the impact of sampling scale on the prediction accuracies of OK, CK, kriging with external drift (~UK), and MLR. The models were tested for their ability to predict total carbon concentration (g kg-1) for a field in central Michigan, USA. Their experimental design included samples taken on 30.5, 61, and 100 m grids for model calibrations, which were then independently validated on a set of 24 points. Their results showed that prediction error generally increased with sampling scale and the corresponding decrease in sampling density. They also determined that as the spacing of the calibration points increased, the use of covariates increased in importance for reducing the error of predictions. Although the sample size for the 100 m grid was determined to be too small for kriging approaches, MLR continued to perform well. Our findings generally support these conclusions, but suggest that the sampling density most important to spatial association approaches (i.e., MLR) is density across the feature space, not necessarily across locational space.
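One crude way to operationalize "density across the feature space" is to flag prediction locations whose covariate values fall outside the range spanned by the calibration samples. This axis-aligned range check is only an illustration under assumed data, not the study's method; the covariate values below are hypothetical:

```python
import numpy as np

# Hypothetical covariate values (e.g., relative elevation, LAI) at
# calibration sample points: rows are points, columns are covariates
calibration = np.array([[0.2, 1.1],
                        [0.5, 2.3],
                        [0.9, 1.8],
                        [0.4, 3.0]])
prediction = np.array([[0.3, 2.0],   # inside the calibrated feature space
                       [1.5, 2.5]])  # first covariate outside its range

lo, hi = calibration.min(axis=0), calibration.max(axis=0)
# True where every covariate lies within the calibrated range -- a simple
# proxy for whether a location is a feature-space extrapolation
inside = np.all((prediction >= lo) & (prediction <= hi), axis=1)
```

Locations flagged False are feature-space extrapolations even if they are spatially internal to the calibration points, which is the distinction drawn above.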
4.2. Issues with spatial extrapolation (transferability)
The primary difference between the focus of this study and the methods of previous studies comparing kriging and MLR models was the arrangement of sampling points. Specifically, the juxtaposition of the meso-scale calibration points and their respective validation points allowed for a test of the predictive power of the models with increasing distance from the area spatially internal to the calibration points. It was already well known that the kriging methods were ill-suited for spatial extrapolation, but given their popularity in digital soil mapping, they provided a basis for contrast. The MLR models, which do not rely on spatial autocorrelation, were still affected by it, as indicated by the decrease in predictive ability in areas spatially external to their calibration. However, their performance did not necessarily decline with increasing distance from the calibration area. In fact, both of the meso-scale MLR models performed better on the more distant macro-scale validation points than on the meso-scale external validation points.
The reason for this lack of a distance-related pattern in the MLR models' performance is that they relied on spatial association, which means that the models' performances were more directly tied to their ability to represent the feature space. The lowest performance of these models, which occurred at the external validation points at the meso-scale, indicated that something about the conditions of that validation area placed it outside the feature space used in calibration.
However, the issue of poor prediction performance in the unique conditions of the area designated as the meso-scale external validation was not limited to the MLR models. Clearly, the meso-scale external validation area was inappropriate to model using the meso-scale calibration points with kriging approaches, due to the models' theoretical underpinnings. In contrast, the same was not true for macro-scale calibration of the kriging approaches applied to predict the meso-scale external validation points. That area was spatially internal to the macro-scale calibration points and thus within the theoretical bounds of the kriging approach. Yet the macro-scale kriging models still had very poor prediction performance in that area, despite showing reasonable prediction performance on the adjacent and also spatially internal meso-scale internal calibration points.
This demonstration of contrasting model performance in two areas that, with pre-validation information, would have been assumed to be similar is an illustration of the problem of induction. Although not often discussed in modern science and usually relegated to discussions of philosophy, it is instructive for evaluating methods that predict or infer. The problem of induction points out the fallibility of assuming that generalizations (i.e., models) produced by a limited number of observations will be equally informative about unobserved instances (Hume, 1739/2001; Popper, 1959/2005). Of course, the only way to combat this dilemma is to maximize observations. However, we cannot remove the possibility that a prediction will be ill-equipped to accurately estimate the target variable, or the potential error, for instances that result from events about which our calibration points could not inform.
4.3. Assumption of normal distributions in the model error
Although the SOC% values in our calibration data sets were considered to be sufficiently normal in distribution, the degree of skewness of the models' residuals varied considerably. Most models rely on the assumption that the model residuals are also normally distributed for estimating the uncertainty of the predictions. If the residuals are negatively skewed, but assumed normal, the mean of the errors could be overestimated, and vice versa. One could argue that the SOC% values used in this study were positively skewed, particularly the macro-scale calibration set. However, this did not necessarily translate to a corresponding skewness in the residuals for the models at the calibration or validation points. For example, the residuals of the meso-scale kriging models were moderately, positively skewed at the calibration points, but went from a negative skew to a positive skew when tested on the micro- to macro-scales, respectively. While this might suggest that the error estimations from the meso-scale kriging models would be overestimated at the micro-scale and underestimated at the macro-scale, the observed error of these models at those points indicated that the error was underestimated at all scales. However, the shift to strongly positive skewness coefficients for the residuals at the macro-scale validation points coincides with the decrease in error estimation performance. This again reinforces the known problem of using kriging to model areas outside the extent of the calibration points.
The residuals of the meso-scale MLR models had similar levels of skewness as the kriging models. One exception was the MLR-all model's residuals at the calibration points, which had a distribution that was practically normal. Another difference between the MLR models and the kriging models at the meso-scale was that the residuals were negatively skewed at both the micro- and macro-scale validation points. This would suggest overestimations at the micro- and macro-scales. However, this still did not fit the observed pattern of greater underestimations of error as validation scales increased. In addition, despite having similar levels of skewness, the MLR models' error estimations were less underestimated at the micro-scale validation points than those of the kriging models using covariates.
At the macro-scale, almost all of the model residuals were strongly, positively skewed at the calibration points. However, the residuals for those same models were moderately, negatively skewed at the validation points. In this case, the positive skewness of the calibration point residuals may have been an indication of the error estimation performance, because the errors were overestimated at all of the validation points.
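The effect underlying this section — skewed residuals degrading the nominal 68% coverage of a symmetric one-standard-deviation interval — can be demonstrated with a toy simulation (simulated residuals, not the study's data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Strongly positively skewed residuals: a lognormal shifted to mean ~0
resid = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)
resid -= resid.mean()

# Empirical coverage of a symmetric +/- one-standard-deviation interval;
# for normally distributed residuals this would be ~0.68, but skewness
# pushes it away from that nominal value
sigma = resid.std()
coverage = np.mean(np.abs(resid) <= sigma)
```

Because the long positive tail inflates the standard deviation, the symmetric interval captures far more than 68% of the mass on one side, so nominal and empirical coverage diverge, which is the practical consequence of assuming normality for skewed residuals.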
5. Conclusions
The multi-scale comparisons made in this study demonstrated the advantages of spatial association approaches (i.e., MLR) for soil mapping at the landscape scale. These advantages are a lower density of samples needed and the potential for transferability, which can be tested. Kriging, of course, performed best with the highest quantity/density of calibration points available and a more normal distribution of values. However, the MLR models were able to produce competitive validation results in those ideal sampling conditions while remaining robust under conditions where the kriging models severely declined in performance. The key to building a robust MLR model was identifying adequate covariates and calibrating the model across the full feature space of the map area.
Spatial interpolation methods, which rely on spatial autocorrelation, perform best when the distance between sample points is minimized and are only appropriate for making predictions between points. As the distance between a predicted location and the nearest sampled location increases, the estimated error also increases. This is a problem for landscape-scale mapping because it may be impractical to sample at the density required to achieve the reduction in prediction errors needed for the respective maps' purpose. Adding spatially exhaustive covariates to kriging models can reduce errors in spatial predictions. However, our empirical observations of model error showed that the reduction in predicted error between ordinary kriging and kriging methods using covariates may not be fully warranted, as the reliability of those estimated error ranges decreased.
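The growth of estimated error with distance from the nearest sample follows directly from the kriging variance. A minimal 1D ordinary-kriging sketch illustrates this; the exponential covariance model and its parameters are arbitrary choices for illustration, not the variogram fitted in this study:

```python
import numpy as np

def ok_variance(x_samples, x0, sill=1.0, rng_par=50.0):
    """Ordinary-kriging variance at x0 for an exponential covariance model."""
    c = lambda h: sill * np.exp(-np.abs(h) / rng_par)
    n = len(x_samples)
    # Kriging system: covariances among samples plus unbiasedness constraint
    A = np.empty((n + 1, n + 1))
    A[:n, :n] = c(np.subtract.outer(x_samples, x_samples))
    A[n, :n] = A[:n, n] = 1.0
    A[n, n] = 0.0
    b = np.append(c(np.asarray(x_samples) - x0), 1.0)
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]
    # Kriging variance: sill minus weighted covariances minus Lagrange term
    return sill - w @ b[:n] - mu

samples = [0.0, 20.0, 40.0]         # sample locations along a transect
near = ok_variance(samples, 10.0)   # prediction between samples
far = ok_variance(samples, 200.0)   # prediction far beyond the data
```

The variance at the distant location substantially exceeds that between samples, which is why sparse landscape-scale sampling inflates kriging's estimated errors away from the data.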
The MLR method tested in this study, which depended on spatial association instead of spatial autocorrelation, achieved accuracies of prediction and error estimation comparable to the kriging methods. However, it is noteworthy that when the appropriate covariates were used, the MLR models were able to represent features that were poorly represented in the maps of the other models. For example, the area designated as a spatial extrapolation zone for the meso-scale tests was spatially internal for the macro-scale calibration points. Despite being compatible with the modelling strategy, without calibration points taken on that unique hillslope, the macro-scale kriging models were still not able to predict SOC% well for that area. The only models that could provide useful predictions for that area were the MLR-all models calibrated at either scale, both of which utilized LAI as a covariate. Although two of the kriging approaches were also able to use LAI as a covariate, it did not improve those modelling approaches' ability to predict variations in that area.
Despite the strength of statistical approaches for spatial modelling, soil mappers must not forget the basic problem of induction. Although it is known that spatial association approaches, such as MLR, are susceptible to issues of capturing the full feature space, results from this study demonstrated that this can equally be a problem for spatial interpolation approaches, such as kriging.
Acknowledgements
Data used in this research was collected as part of the CarboZALF project. We thank Ingrid Onasch for her field work on the micro-scale validation points, Emilien Aldana-Jague for soil sampling the points used for the meso-scale validation, and Norbert Wypler for supporting the EM38 mapping and soil sampling the points for the meso- and macro-scale calibrations. We also thank Ute Moritz for her support in soil sampling and overall data management. Detlef Deumlich gave valuable comments for evaluating the database and provided the TPI maps.
References
Adhikari, K., Kheir, R.B., Greve, M.B., Greve, M.H., 2013. Comparing kriging and regression approaches for mapping soil clay content in a diverse Danish landscape. Soil Science 178(9), 505-517. doi:10.1097/SS.0000000000000013
Adhikari, K., Hartemink, A.E., Minasny, B., Bou Kheir, R., Greve, M.B., Greve, M.H., 2014. Digital mapping of soil organic carbon contents and stocks in Denmark. PLoS ONE 9:e105519. doi:10.1371/journal.pone.0105519
Behrens, T., Förster, H., Scholten, T., Steinrücken, U., Spies, E.D., Goldschmitt, M., 2005. Digital soil mapping using artificial neural networks. J. Plant Nutr. Soil Sci. 168(1), 21-33. doi:10.1002/jpln.200421414
Bostan, P.A., Heuvelink, G.B.M., Akyurek, S.Z., 2012. Comparison of regression and kriging techniques for mapping the average annual precipitation of Turkey. International Journal of Applied Earth Observation and Geoinformation 19, 115-126. doi:10.1016/j.jag.2012.04.010
Bui, E.N., Henderson, B.L., Viergever, K., 2006. Knowledge discovery from models of soil properties developed through data mining. Ecological Modelling 191, 431-446. doi:10.1016/j.ecolmodel.2005.05.021
Burgess, T.M., Webster, R., 1980. Optimal interpolation and isarithmic mapping of soil properties. I: the semi-variogram and punctual kriging. J. Soil Sci. 31(2), 315-331. doi:10.1111/j.1365-2389.1980.tb02084.x
Corwin, D.L., Lesch, S.M., Segal, E., Skaggs, T.H., Bradford, S.A., 2010. Comparison of sampling strategies for characterizing spatial variability with apparent soil electrical conductivity directed soil sampling. J. Environ. Eng. Geophys. 15, 147-162. doi:10.2113/JEEG15.3.147
Deumlich, D., Frielinghaus, M., this issue. Imprints of historical erosion patterns on recent erosion processes depicted by C-transport and soil fertility parameters in NE Germany. Soil and Tillage Research.
Deumlich, D., Schmidt, R., Sommer, M., 2010. A multiscale soil-landform relationship in the glacial-drift area based on digital terrain analysis and soil attributes. Journal of Plant Nutrition and Soil Science 173(6), 843-851. doi:10.1002/jpln.200900094
Englund, E., Weber, D., Leviant, N., 1992. The effects of sampling design parameters on block selection. Mathematical Geology 24(3), 329-343. doi:10.1007/BF00893753
Gerke, H.H., Rieckh, H., Sommer, M., this issue. Feedbacks between crop, water, and dissolved carbon in a hummocky landscape with erosion-affected pedogenesis. Soil and Tillage Research.
Gessler, P.E., Moore, I.D., McKenzie, N.J., Ryan, P.J., 1995. Soil-landscape modelling and spatial prediction of soil attributes. Int. J. Geogr. Inf. Syst. 9, 421-432. doi:10.1080/02693799508902047
Goovaerts, P., 1999. Geostatistics in soil science: state-of-the-art and perspectives. Geoderma 89, 1-45. doi:10.1016/S0016-7061(98)00078-0
Goovaerts, P., 2001. Geostatistical modelling of uncertainty in soil science. Geoderma 103, 3-26. doi:10.1016/S0016-7061(01)00067-2
Gotway, C.A., Ferguson, R.B., Hergert, G.W., Peterson, T.A., 1996. Comparison of kriging and inverse-distance methods for mapping soil parameters. Soil Sci. Soc. Am. J. 60, 1237-1247. doi:10.2136/sssaj1996.03615995006000040040x
Häring, T., Dietz, E., Osenstetter, S., Koschitzki, T., Schröder, B., 2012. Spatial disaggregation of complex soil map units: A decision tree based approach in Bavarian forest soils. Geoderma 185-186, 37-47. doi:10.1016/j.geoderma.2012.04.001
Häring, T., Reger, B., Ewald, J., Hothorn, T., Schröder, B., 2014. Regionalising indicator values for soil reaction in the Bavarian Alps - how reliable are averaged indicator values for prediction? Folia Geobot. 49, 385-405. doi:10.1007/s12224-013-9157-1
Hengl, T., Heuvelink, G.B.M., Rossiter, D.G., 2007. About regression-kriging: From equations to case studies. Comput. Geosci. 33, 1301-1315. doi:10.1016/j.cageo.2007.05.001
Hengl, T., Rossiter, D., Stein, A., 2003. Soil sampling strategies for spatial prediction by correlation with auxiliary maps. Aust. J. Soil Res. 41, 1403-1422. doi:10.1071/SR03005
Holmes, G., Hall, M., Frank, E., 1999. Generating rule sets from model trees. Advanced Topics in Artificial Intelligence, Lecture Notes in Computer Science 1747, 1-12. doi:10.1007/3-540-46695-9_1
Hume, D., 1739/2001. A Treatise of Human Nature. Oxford University Press, New York. 622 p.
IUSS Working Group WRB, 2014. World Reference Base for Soil Resources 2014. International soil classification system for naming soils and creating legends for soil maps. World Soil Resources Reports No. 106. FAO, Rome.
Juang, K.-W., Lee, D.-Y., 1998. A comparison of three kriging methods using auxiliary variables in heavy-metal contaminated soils. J. Environ. Qual. 27, 355-363. doi:10.2134/jeq1998.00472425002700020016x
Kirkels, F.M.S.A., Cammeraat, L.H., Kuhn, N.J., 2014. The fate of soil organic carbon upon erosion, transport and deposition in agricultural landscapes - A review of different concepts. Geomorphology 226, 94-105. doi:10.1016/j.geomorph.2014.07.023
Knotters, M., Brus, D.J., Oude Voshaar, J.H., 1995. A comparison of kriging, co-kriging and kriging combined with regression for spatial interpolation of horizon depth with censored observations. Geoderma 67, 227-246. doi:10.1016/0016-7061(95)00011-C
Koszinski, S., Gerke, H.H., Hierold, W., Sommer, M., 2013. Geophysical-based modeling of a kettle hole catchment of the morainic soil landscape. Vadose Zone Journal 12(4). doi:10.2136/vzj2013.02.0044
Kravchenko, A., Bullock, D.G., 1999. A comparative study of interpolation methods for mapping soil properties. Agronomy Journal 91, 393-400. doi:10.2134/agronj1999.00021962009100030007x
Lacoste, M., Minasny, B., McBratney, A., Michot, D., Viaud, V., Walter, C., 2014. High resolution 3D mapping of soil organic carbon in a heterogeneous agricultural landscape. Geoderma 213, 296-311. doi:10.1016/j.geoderma.2013.07.002
Laslett, G.M., 1994. Kriging and splines: an empirical comparison of their predictive performance in some applications. Journal of the American Statistical Association 83(426), 391-409. doi:10.1080/01621459.1994.10476759
Leenaers, H., Okx, J.P., Burrough, P.A., 1990. Comparison of spatial prediction methods for mapping floodplain soil pollution. Catena 17, 535-550. doi:10.1016/0341-8162(90)90028-C
Lemercier, B., Lacoste, M., Loum, M., Walter, C., 2012. Extrapolation at regional scale of local soil knowledge using boosted classification trees: a two-step approach. Geoderma 171-172, 75-84. doi:10.1016/j.geoderma.2011.03.010
Levi, M.R., Rasmussen, C., 2014. Covariate selection with iterative principal component analysis for predicting physical soil properties. Geoderma 219-220, 46-57. doi:10.1016/j.geoderma.2013.12.013
Li, H.Y., Webster, R., Shi, Z., 2015. Mapping soil salinity in the Yangtze delta: REML and universal kriging (E-BLUP) revisited. Geoderma 237-238, 71-77. doi:10.1016/j.geoderma.2014.08.008
Malone, B.P., McBratney, A.B., Minasny, B., 2011. Empirical estimates of uncertainty for mapping continuous depth functions of soil attributes. Geoderma 160, 614-626. doi:10.1016/j.geoderma.2010.11.013
McBratney, A.B., Mendonça Santos, M.L., Minasny, B., 2003. On digital soil mapping. Geoderma 117, 3-52. doi:10.1016/S0016-7061(03)00223-4
McBratney, A.B., Odeh, I.O.A., Bishop, T.F.A., Dunbar, M.S., Shatar, T.M., 2000. An overview of pedometric techniques for use in soil survey. Geoderma 97, 293-327. doi:10.1016/S0016-7061(00)00043-4
McBratney, A.B., Webster, R., 1983. Optimal interpolation and isarithmic mapping of soil properties v. co-regionalization and multiple sampling strategy. J. Soil Sci. 34, 137-162. doi:10.1111/j.1365-2389.1983.tb00820.x
McKenzie, N.J., Austin, M.P., 1993. A quantitative Australian approach to medium and small scale surveys based on soil stratigraphy and environmental correlation. Geoderma 57(4), 329-355. doi:10.1016/0016-7061(93)90049-Q
Miller, B.A., Koszinski, S., Wehrhan, M., Sommer, M., 2015a. Comparison of spatial association approaches for landscape mapping of soil organic carbon stocks. SOIL 1(1), 217-233. doi:10.5194/soil-1-217-2015
Miller, B.A., Koszinski, S., Wehrhan, M., Sommer, M., 2015b. Impact of multi-scale predictor selection for modeling soil properties. Geoderma 239-240, 97-106. doi:10.1016/j.geoderma.2014.09.018
Minasny, B., McBratney, A.B., 2008. Regression rules as a tool for predicting soil properties from infrared reflectance spectroscopy. Chemometrics and Intelligent Laboratory Systems 94(1), 72-79. doi:10.1016/j.chemolab.2008.06.003
Moore, I.D., Gessler, P.E., Nielsen, G.A., Peterson, G.A., 1993. Soil attribute prediction using terrain analysis. Soil Sci. Soc. Am. J. 57, 443-452. doi:10.2136/sssaj1993.03615995005700020026x
Mueller, T.G., Pierce, F.J., 2003. Soil carbon maps: enhancing spatial estimates with simple terrain attributes at multiple scales. Soil Sci. Soc. Am. J. 67, 258-267. doi:10.2136/sssaj2003.2580
Odeh, I.O.A., McBratney, A.B., Chittleborough, D.J., 1994. Spatial prediction of soil properties from landform attributes derived from a digital elevation model. Geoderma 63, 197-214. doi:10.1016/0016-7061(94)90063-9
Oliver, M.A., Webster, R., 2014. A tutorial guide to geostatistics: Computing and modelling variograms and kriging. Catena 113, 56-59. doi:10.1016/j.catena.2013.09.006
Popper, K.R., 1959/2005. The Logic of Scientific Discovery. Routledge, New York. 513 p.
Quinlan, J.R., 1993. Combining instance-based and model-based learning, in: Proceedings of the Tenth International Conference on Machine Learning, Kaufmann, M. (Ed.), 236-243.
Quinlan, J.R., 1994. C4.5: Programs for machine learning. Machine Learning 16, 235-240.
Schloeder, C.A., Zimmermann, N.E., Jacobs, M.J., 2001. Comparison of methods for interpolating soil properties using limited data. Soil Sci. Soc. Am. J. 65, 470-479. doi:10.2136/sssaj2001.652470x
Schmidt, K., Behrens, T., Daumann, J., Ramirez-Lopez, L., Werban, U., Dietrich, P., Scholten, T., 2014. A comparison of calibration sampling schemes at the field scale. Geoderma 232-234, 243-256. doi:10.1016/j.geoderma.2014.05.013
Shrestha, D.L., Solomatine, D.P., 2006. Machine learning approaches for estimation of prediction interval for the model output. Neural Networks 19(2), 225-235. doi:10.1016/j.neunet.2006.01.012
Sommer, M., Gerke, H.H., Deumlich, D., 2008. Modelling soil landscape genesis: A "time split" approach
for hummocky agricultural landscapes. Geoderma 145, 3-4, 480-493. 698
doi:10.1016/j.geoderma.2008.01.012 699
Tamari, S., Wösten, J.H.M., Ruiz- Suárez, J.C., 1996. Testing an artificial neural network for predicting soil 700
hydraulic conductivity. Soil Sci. Soc. Am. J. 60(6), 1732-1741. 701
doi:10.2136/sssaj1996.03615995006000060018x 702
30
Theobald, D.M., D.L. Stevens, J.D.W., Urquhart, N.S., Olsen, A.R., Norman, J.B., 2007. Using GIS to 703
generate spatially-balanced random survey designs for natural resource applications. Environ. 704
Manage. 40, 134-146. doi:10.1007/s00267-005-0199-x 705
Triantafilis, J., Odeh, I.O.A., McBratney, A.B., 2001. Five geostatistical models to predict soil salinity from 706
electromagnetic induction data across irrigated cotton. Soil Sci. Soc. Am. J. 65, 869–878. 707
doi:10.2136/sssaj2001.653869x 708
Vasques, G.M., Grunwald, S., Comerford, N.B., Sickman, J.O., 2010. Regional modelling of soil carbon at 709
multiple depths within a subtropical watershed. Geoderma 156, 326–336. 710
doi:10.1016/j.geoderma.2010.03.002 711
Van Oost, K., Quine, T. A., Govers, G., De Gryze, S., Six, J., Harden, J. W., Ritchie, J. C., McCarty, G. W., 712
Heckrath, G., Kosmas, C., Giraldez, J. V., da Silva, J. R. Marques, Merckx, R., 2007. The impact of 713
agricultural soil erosion on the global carbon cycle. Science 318, 626-629. 714
doi:10.1126/science.1145724 715
Voltz, M., Webster, R., 1990. A comparison of kriging, cubic splines and classification for predicting soil 716
properties from sample information. Journal of Soil Science 41, 473-490. doi:10.1111/j.1365-717
2389.1990.tb00080.x 718
Webster, R., Oliver, M.A., 1990. Statistical Methods in Soil and Land Resource Survey: Spatial Information 719
Systems. Oxford University Press, Oxford, UK. 316 p. 720
Wollenhaupt, N.C., Wolkowski, R.P., Clayton, M.K., 1994. Mapping soil test phosphorus and potassium 721
for variable-rate fertilizer application. Journal of Production Agriculture 7, 441-448. 722
doi:10.2134/jpa1994.0441 723
Zhu, Q., Lin, H.S., 2010. Comparing Ordinary Kriging and Regression Kriging for Soil Properties in 724
Contrasting Landscapes. Pedosphere 20, 594–606. doi:10.1016/S1002-0160(10)60049-5 725
Tables
Table 1. Covariates considered for selection by Cubist. The resolution of the original digital elevation model was maintained for all of the scale-dependent terrain derivatives.
Covariate Software Analysis Scale
Elevation (2011 LiDAR, bare-earth) n/a 1 m
Slope gradient GRASS 3 - 490 m
Profile curvature GRASS 3 - 490 m
Plan curvature GRASS 3 - 490 m
Aspect (8 classes) ArcGIS (raster calculator) 3 - 490 m
Aspect - west (rotated for N, E, and S) GRASS 3 - 490 m
Northness transformed from aspect 3 - 490 m
Eastness transformed from aspect 3 - 490 m
Relative elevation - rect. neighborhood ArcGIS toolbox 10 - 10000 m
Relative elevation - circ. neighborhood ArcGIS toolbox 10 - 10000 m
Topographic position index (TPI) ArcGIS toolbox 10 - 10000 m
TPI - slope position ArcGIS toolbox multiple
TPI - landform classification ArcGIS toolbox multiple
Hillslope position ArcGIS toolbox multiple
Catchment area SAGA n/a
Catchment slope SAGA n/a
Channel network base level SAGA n/a
Convergence index SAGA n/a
Flow path length SAGA n/a
Length-slope factor SAGA n/a
Modified catchment area SAGA n/a
SAGA wetness index SAGA n/a
Stream power index SAGA n/a
Vertical distance to channel SAGA n/a
Wetness index SAGA n/a
Covariate Resolution Date
ECa (vertical mode) 1 m (for the meso-scale) 04 Mar 2009
ECa (vertical mode) 5 m (for the macro-scale) 04 Apr 2007
ECa (horizontal mode) 1 m (for the meso-scale) 04 Mar 2009
ECa (horizontal mode) 5 m (for the macro-scale) 04 Apr 2007
LAI (Quickbird) 5 m 26 Jun 2005
Table 2. Relative use (%) of covariates in models derived by Cubist for predicting SOC%.
Meso-scale, all covariates (2 rules):
Relative elev. - rect. (5200 m): rules 100%, MLR 14%
Eastness (3 m): MLR 100%
Eastness (9 m): MLR 100%
LAI: MLR 100%
Meso-scale, without LAI or ECa (limited; 2 rules):
Relative elev. - rect. (5200 m): rules 100%, MLR 100%
Macro-scale, all covariates (1 rule):
LAI: rules 100%, MLR 100%
Macro-scale, without LAI or ECa (limited; 1 rule):
Relative elev. - rect. (20 m): rules 100%, MLR 100%
Table 3. Coefficient of determination statistics (R2) for models built from the meso-scale calibration points. Italics highlight the strongest performing model/s at the respective validation scales.
Validation Points
Models Micro Meso-All Meso-Internal Meso-External Macro
Mean model 0.00 0.00 0.00 0.00 0.00
OK 0.56 0.61 0.71 0.05 0.00
CK-LAI 0.57 0.60 0.71 0.03 0.03
CK-limited 0.54 0.54 0.70 0.06 0.02
UK 0.55 0.55 0.71 0.06 0.05
MLR-all 0.59 0.55 0.61 0.28 0.34
MLR-limited 0.58 0.45 0.48 0.04 0.32
Table 4. Mean absolute error statistics (MAE) for models built from the meso-scale calibration points. Italics highlight the strongest performing model/s at the respective validation scales.
Validation Points
Models Micro Meso-All Meso-Internal Meso-External Macro
Mean model 0.143 0.124 0.114 0.172 0.460
OK 0.111 0.085 0.071 0.155 0.505
CK-LAI 0.110 0.088 0.069 0.183 0.477
CK-limited 0.123 0.097 0.071 0.229 0.501
UK 0.120 0.096 0.069 0.228 0.486
MLR-all 0.109 0.085 0.078 0.119 0.450
MLR-limited 0.102 0.097 0.093 0.115 0.476
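The validation statistics reported in Tables 3–6 can be illustrated with a short sketch (not code from the study; the SOC% values below are invented). R2 is taken here as one minus the ratio of residual to total sum of squares, a common definition that makes the mean model score exactly 0.00, consistent with the "Mean model" rows of Tables 3 and 5; the study does not state its exact formula, so this is an assumption:

```python
import numpy as np

def r_squared(observed, predicted):
    # R2 relative to the mean model: 1 - SS_res / SS_tot.
    # Predicting the mean of the observations everywhere yields 0.00.
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def mean_absolute_error(observed, predicted):
    # MAE, as reported in Tables 4 and 6.
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(observed - predicted)))

# Invented SOC% values at hypothetical validation points
obs = [1.2, 0.9, 1.5, 1.1]
pred = [1.1, 1.0, 1.4, 1.2]
print(round(r_squared(obs, pred), 2))            # 0.79
print(round(mean_absolute_error(obs, pred), 2))  # 0.1
```

Reporting both statistics matters because they disagree here: MLR-all has the weaker R2 at some meso scales but the lowest MAE at the external and macro validation points.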
Table 5. Coefficient of determination statistics (R2) for models built from the macro-scale calibration points. Italics highlight the strongest performing model/s at the respective validation scales.
Validation Points
Model Micro Meso-All Meso-Internal Meso-External
Mean model 0.00 0.00 0.00 0.00
OK 0.41 0.39 0.42 0.07
CK-LAI 0.41 0.39 0.43 0.06
CK-limited 0.41 0.39 0.42 0.07
UK 0.41 0.39 0.43 0.06
MLR-all 0.57 0.56 0.56 0.56
MLR-limited 0.28 0.01 0.06 0.03
Table 6. Mean absolute error statistics (MAE) for models built from the macro-scale calibration points. Italics highlight the strongest performing model/s at the respective validation scales.
Validation Points
Model Micro Meso-All Meso-Internal Meso-External
Mean model 0.231 0.307 0.334 0.172
OK 0.130 0.150 0.118 0.312
CK-LAI 0.129 0.148 0.116 0.312
CK-limited 0.130 0.150 0.118 0.312
UK 0.129 0.148 0.116 0.312
MLR-all 0.161 0.138 0.145 0.102
MLR-limited 0.176 0.310 0.281 0.461
Table 7. Skewness coefficients of the residuals (observed minus predicted) for the meso-scale models. The first data column is for the calibration points; the remaining columns are for the validation points.
Models Calibration Micro Meso-All Meso-Internal Meso-External Macro
Mean model 1.35 0.65 1.25 1.72 -0.53 1.76
OK 0.50 -1.14 -0.40 0.59 0.14 1.68
CK-LAI 0.34 -1.20 -0.59 0.53 0.32 1.78
CK-limited 0.37 -1.27 -1.05 0.62 0.23 1.53
UK 0.22 -1.25 -1.03 0.59 0.27 1.59
MLR-all -0.03 -1.21 0.71 0.81 0.73 -1.77
MLR-limited -0.57 -1.40 0.33 0.31 0.68 -1.70
Table 8. Skewness coefficients of the residuals (observed minus predicted) for the macro-scale models. The first data column is for the calibration points; the remaining columns are for the validation points.
Models Calibration Micro Meso-All Meso-Internal Meso-External
Mean model 1.76 0.65 1.25 1.72 -0.53
OK 1.79 -0.72 -0.74 -0.02 -0.17
CK-LAI 1.79 -0.75 -0.75 -0.01 -0.15
CK-limited 1.79 -0.72 -0.74 -0.02 -0.17
UK 1.79 -0.75 -0.75 -0.01 -0.15
MLR-all 2.28 -0.58 -0.11 -0.06 -0.94
MLR-limited 0.13 -1.28 4.73 1.46 3.21
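The skewness coefficients in Tables 7 and 8 summarize the asymmetry of each residual distribution. A minimal sketch of the standard Fisher–Pearson moment coefficient — assumed here to be the statistic used, since the text does not name it — with invented residuals:

```python
import numpy as np

def skewness(residuals):
    # Fisher-Pearson moment coefficient: g1 = m3 / m2**1.5,
    # where m2 and m3 are the second and third central moments.
    r = np.asarray(residuals, dtype=float)
    d = r - r.mean()
    m2 = np.mean(d ** 2)
    m3 = np.mean(d ** 3)
    return float(m3 / m2 ** 1.5)

# A symmetric set of residuals has zero skew...
print(skewness([-0.2, 0.0, 0.2]))  # 0.0
# ...while a single large positive residual (a strong under-prediction)
# pulls the coefficient positive.
print(skewness([0.02, -0.05, 0.40, -0.03, 0.01]) > 0)  # True
```

A positive coefficient thus indicates a tail of under-predicted points, a negative coefficient a tail of over-predicted points.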
Table 9. Mean of the errors estimated by the respective meso-scale models and the percentage of observed errors within the estimated error range at the validation points. Assuming the estimated (modelled) error represents one standard deviation, the actual SOC% values should fall within the estimated error range approximately 68% of the time. Rates below this threshold indicate that the error has been underestimated; rates above it indicate overestimation.
OK CK-LAI CK-limited UK MLR-all MLR-limited
Micro Validation
Mean Est. Error 0.096 0.066 0.057 0.056 0.085 0.093
Within Range 59% 45% 37% 34% 58% 70%
Meso-Internal Validation
Mean Est. Error 0.098 0.067 0.058 0.056 0.078 0.090
Within Range 73% 61% 57% 54% 56% 56%
Meso Validation
Mean Est. Error 0.109 0.080 0.071 0.070 0.079 0.091
Within Range 70% 57% 53% 51% 55% 54%
Macro Validation
Mean Est. Error 0.160 0.149 0.262 0.259 0.084 0.092
Within Range 21% 21% 25% 25% 29% 29%
Table 10. Mean of the errors estimated by the respective macro-scale models and the percentage of observed errors within the estimated error range at the validation points. Assuming the estimated (modelled) error represents one standard deviation, the actual SOC% values should fall within the estimated error range approximately 68% of the time. Rates above this threshold indicate that the error has been overestimated; rates below it indicate underestimation.
OK CK-LAI CK-limited UK MLR-all MLR-limited
Micro Validation
Mean Est. Error 0.587 0.587 0.587 0.587 0.306 0.382
Within Range 100% 100% 100% 100% 86% 90%
Meso Validation
Mean Est. Error 0.631 0.631 0.631 0.631 0.306 0.382
Within Range 98% 98% 98% 98% 96% 76%
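The "Within Range" percentages of Tables 9 and 10 amount to counting how often the observed value falls inside prediction ± estimated error. A sketch of that check with invented values (not the study's data):

```python
import numpy as np

def within_range_rate(observed, predicted, est_error):
    # Fraction of validation points where |observed - predicted|
    # is within the model's estimated error. Treating the estimated
    # error as one standard deviation of a Gaussian error model, a
    # well-calibrated model should score near 0.68.
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    est_error = np.asarray(est_error, dtype=float)
    return float(np.mean(np.abs(observed - predicted) <= est_error))

# Invented SOC% observations, predictions, and error estimates
obs = [1.2, 0.9, 1.5, 1.1, 1.3]
pred = [1.1, 1.0, 1.2, 1.1, 1.0]
err = [0.15, 0.05, 0.10, 0.10, 0.20]
print(within_range_rate(obs, pred, err))  # 0.4 -> error underestimated
```

By this measure, the meso-scale models at the macro validation points (21–29%) badly underestimate their error, while the macro-scale kriging models at the micro validation points (100%) overestimate theirs.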
Figures
Figure 1. Location of study area within the Uckermark district of northeast Germany.
Figure 2. Locations of the sample points at the a) macro-scale and b) meso- and micro-scale. The points used to calibrate the macro-scale models were also used to independently validate the meso-scale models. The spatial interpolation area of b) is the experimental plot area for the CarboZALF project. The aerial images are enhanced with hillshading of the LiDAR elevation model. Note that the soil sampling and collection of covariate data were completed before this imagery was taken and before the infrastructure for the CarboZALF experimental plots was installed.
Figure 3. Histograms for the a) micro-scale and b) macro-scale calibration points. Although it could be argued that these calibration sets should be transformed to produce less skewed distributions, the original values were maintained to avoid complications from back-transformations when analyzing estimated and actual error with the validation points. This study takes a heuristic approach by validating all resulting models with independent data sets that contain a larger number of samples.
Figure 4. Experimental variograms of a) meso-scale SOC%, b) macro-scale SOC%, c) meso-scale SOC% with LAI, d) macro-scale SOC% with LAI, e) meso-scale SOC% with relative elevation (5200 m analysis scale), and f) macro-scale SOC% with relative elevation (20 m analysis scale). Points are the means of lags binned over all directions; the solid lines show spherical models fitted to them.
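The experimental variograms of Fig. 4 are built by binning point pairs by separation distance and averaging half the squared value differences in each lag. A minimal omnidirectional sketch (an illustrative helper, not the study's code):

```python
import numpy as np

def experimental_variogram(coords, values, lag_width, n_lags):
    # Omnidirectional experimental semivariogram: for each lag bin,
    # gamma(h) = 0.5 * mean((z_i - z_j)^2) over the point pairs whose
    # separation distance falls in that bin.
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)  # all unique point pairs
    dists = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2
    gamma = np.full(n_lags, np.nan)
    for k in range(n_lags):
        in_bin = (dists >= k * lag_width) & (dists < (k + 1) * lag_width)
        if in_bin.any():
            gamma[k] = 0.5 * sqdiff[in_bin].mean()
    return gamma

# Three collinear points with SOC% rising along a transect
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
values = [0.0, 1.0, 2.0]
print(experimental_variogram(coords, values, lag_width=1.5, n_lags=2))
```

A spherical model would then be fitted to these binned means, giving the solid lines shown in Fig. 4.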
Figure 5. Map realizations of SOC% from meso-scale calibration points by modelling with: a) ordinary kriging (OK), b) co-kriging with LAI (CK-LAI), c) co-kriging with a more limited pool of covariates (CK-limited), d) universal kriging with LAI and relative elevation at a 5200 m analysis scale (UK), e) rule-based multiple linear regression with all available covariates (MLR-all), and f) rule-based multiple linear regression with a more limited pool of covariates (MLR-limited).
Figure 6. Differences between the respective meso-scale models, using the UK map as a baseline. Positive values indicate the model predicted more than the UK model; negative values indicate it predicted less. a) OK, b) CK-LAI, c) CK-limited, d) MLR-all, and e) MLR-limited. Note that because the UK model includes both covariates, the difference between it and each of the other kriging maps reflects the pattern of the spatial information missing from that kriging approach.
Figure 7. Map realizations of SOC% from macro-scale calibration points by modelling with: a) OK, b) CK-LAI, c) CK-limited (for this model, relative elevation at a 20 m analysis scale was the covariate), d) UK, e) MLR-all, and f) MLR-limited.
Figure 8. Differences between the respective macro-scale models, using the UK map as a baseline. Positive values indicate the model predicted more than the UK model; negative values indicate it predicted less. a) OK, b) CK-LAI, c) CK-limited, d) MLR-all, and e) MLR-limited. Note that because the UK model includes both covariates, the difference between it and each of the other kriging maps reflects the pattern of the spatial information missing from that kriging approach.
Figure 9. Estimated error maps produced by the respective meso-scale models: a) OK, b) CK-LAI, c) CK-limited, d) UK, e) MLR-all, and f) MLR-limited.
Figure 10. Estimated error maps produced by the respective macro-scale models: a) OK, b) CK-LAI, c) CK-limited, d) UK, e) MLR-all, and f) MLR-limited.