A comparison of learning methods to predict N2O fluxes and N leaching
April 4th, 2012, BIA seminar, INRA d’Auzeville, Toulouse
Nathalie Villa-Vialaneix
http://www.nathalievilla.org
Joint work with Marco Follador, Marco Ratto and Adrian Leip (EC, Ispra, Italy)
April 27th, 2012 - BIA, INRA Auzeville
SAMM (Université Paris 1) &
IUT de Carcassonne (Université de Perpignan)
Nathalie Villa-Vialaneix (April 27th, 2012) Comparison of metamodels SAMM & UPVD 1 / 27
Outline
1 DNDC-Europe model description
2 Methodology
3 Results
DNDC-Europe model description
General overview
Modern issues in agriculture:
• fighting the food crisis;
• while preserving the environment.
The EC needs simulation tools to:
• link direct aids to the respect of standards ensuring proper management;
• quantify the environmental impact of European policies ("Cross Compliance").
Cross Compliance Assessment Tool (CCAT)
DNDC is a biogeochemical model.
Zoom on DNDC-EUROPE
Moving from DNDC-Europe to metamodeling
Needs for metamodeling:
• easier integration into CCAT;
• faster execution, allowing responsive scenario analysis.
Data [Villa-Vialaneix et al., 2012]
Data extracted from the biogeochemical simulator DNDC-EUROPE: ∼19,000 HSMU (Homogeneous Soil Mapping Units, ≃ 1 km² each, although the area varies considerably) used for corn cultivation:
• corn corresponds to ≃ 4.6% of UAA (utilised agricultural area);
• HSMU for which at least 10% of the agricultural land was used for corn were selected.
2 outputs to be estimated (independently) from the inputs:
• N2O fluxes (greenhouse gas);
• N leaching (one major cause of water pollution).
11 input (explanatory) variables (selected by experts and previous simulations):
• N_FR (N input through fertilization; kg/ha/y);
• N_MR (N input through manure spreading; kg/ha/y);
• Nfix (N input from biological fixation; kg/ha/y);
• Nres (N input from root residue; kg/ha/y);
• BD (bulk density; g/cm³);
• SOC (soil organic carbon in topsoil; mass fraction);
• PH (soil pH);
• Clay (ratio of soil clay content);
• Rain (annual precipitation; mm/y);
• Tmean (annual mean temperature; °C);
• Nr (concentration of N in rain; ppm).
Methodology
Purpose: compare several metamodeling approaches (accuracy, computational time, ...).
For every data set, every output and every method:
1 The data set was split into a training set and a test set (on an 80%/20% basis);
2 The regression function was learned from the training set (with a full validation process for hyperparameter tuning);
3 Performance was computed on the test set: predictions were made from the test inputs and compared to the true outputs.
Methods
• 2 linear models:
  • one with the 11 explanatory variables;
  • one with the 11 explanatory variables plus several nonlinear transformations of these variables (square, log, ...): stepwise AIC was used to train the model;
• MLP;
• SVM;
• RF;
• 3 approaches based on splines: ACOSSO (ANOVA splines), SDR (an improvement of ACOSSO) and DACE (a kriging-based approach).
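The second linear model relies on stepwise AIC selection over the raw inputs and their transformations. The sketch below is a simplified, forward-only variant; stepwise procedures can also drop terms, and the exact candidate set used in the study is not given here. The data and the names `gaussian_aic` and `forward_stepwise_aic` are hypothetical. AIC is computed for a Gaussian linear model as n log(RSS/n) + 2k, up to an additive constant.

```python
import numpy as np

def gaussian_aic(A, y):
    """AIC (up to an additive constant) of an OLS fit of y on the columns of A."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ beta) ** 2)
    n, k = A.shape
    return n * np.log(rss / n) + 2 * k

def forward_stepwise_aic(candidates, y):
    """Greedy forward selection: repeatedly add the candidate column that lowers
    the AIC the most; stop when no addition lowers it.
    `candidates` maps a variable name to its column (1-D array)."""
    selected = []
    A = np.ones((len(y), 1))                 # start from the intercept-only model
    best_aic = gaussian_aic(A, y)
    while True:
        trials = {name: gaussian_aic(np.column_stack([A, col]), y)
                  for name, col in candidates.items() if name not in selected}
        if not trials:
            break
        name = min(trials, key=trials.get)
        if trials[name] >= best_aic:         # no candidate improves the AIC
            break
        selected.append(name)
        A = np.column_stack([A, candidates[name]])
        best_aic = trials[name]
    return selected, best_aic

# Hypothetical example: two raw inputs plus square and log transformations
rng = np.random.default_rng(0)
x1 = rng.uniform(1, 5, size=500)
x2 = rng.uniform(1, 5, size=500)
y = 2 * np.log(x1) + 0.3 * x2 ** 2 + 0.1 * rng.normal(size=500)
candidates = {"x1": x1, "x2": x2, "x1^2": x1 ** 2, "x2^2": x2 ** 2,
              "log(x1)": np.log(x1), "log(x2)": np.log(x2)}
selected, aic = forward_stepwise_aic(candidates, y)
print(selected)
```

Each step refits one OLS model per remaining candidate. With many inputs and transformations, that cost adds up quickly.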
Regression
Consider the problem where:
• $Y \in \mathbb{R}$ has to be estimated from $X \in \mathbb{R}^d$;
• we are given a learning set, i.e., N i.i.d. observations of (X, Y): $(x_1, y_1), \ldots, (x_N, y_N)$.
Example: predict N2O fluxes from pH, climate, concentration of N in rain, fertilization, ... for a large number of HSMU.
Multilayer perceptrons (MLP)
A “one-hidden-layer perceptron” takes the form:
$$\Phi_w : x \in \mathbb{R}^d \mapsto \sum_{i=1}^{Q} w^{(2)}_i \, G\left(x^T w^{(1)}_i + w^{(0)}_i\right) + w^{(2)}_0$$
where:
• the w are the weights of the MLP, which have to be learned from the learning set;
• G is a given activation function, typically $G(z) = \frac{1 - e^{-z}}{1 + e^{-z}}$;
• Q is the number of neurons on the hidden layer. It controls the flexibility of the MLP. Q is a hyper-parameter that is usually tuned during the learning process.
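The following is a direct NumPy transcription of the formula for Φw, offered as a sketch. The array layout (W1 holding the vectors w_i^(1) as columns) and the function names are assumptions. G is implemented as tanh(z/2). This is algebraically identical to (1 − e^{−z})/(1 + e^{−z}) but does not overflow for large negative z.

```python
import numpy as np

def G(z):
    # (1 - e^{-z}) / (1 + e^{-z}) rewritten as tanh(z / 2): same function,
    # numerically safe for large |z|
    return np.tanh(z / 2.0)

def mlp_forward(X, W1, b1, w2, b2):
    """One-hidden-layer perceptron.
    X: (N, d) inputs; W1: (d, Q) whose column i is w_i^(1); b1: (Q,) biases w_i^(0);
    w2: (Q,) output weights w_i^(2); b2: scalar output bias w_0^(2)."""
    hidden = G(X @ W1 + b1)        # (N, Q): G(x^T w_i^(1) + w_i^(0)) for every neuron
    return hidden @ w2 + b2        # (N,): weighted sum of the neurons plus w_0^(2)

# Hypothetical weights: d = 11 inputs, Q = 5 hidden neurons
rng = np.random.default_rng(0)
d, Q = 11, 5
W1 = rng.normal(size=(d, Q))
b1 = rng.normal(size=Q)
w2 = rng.normal(size=Q)
b2 = 0.3
X = rng.normal(size=(4, d))
print(mlp_forward(X, W1, b1, w2, b2).shape)   # one prediction per row of X
```

The number of weights is Q(d + 2) + 1, so increasing Q directly increases the flexibility, and the risk of overfitting, mentioned above.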
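Symbolic representation of MLP
[Figure: inputs x1, x2, ..., xd are connected through the weights w(1) to hidden neurons 1 to Q (with biases w(0)); the neuron outputs are combined through the weights w(2) into the output φw(x).]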
Learning MLP
• Learning the weights: w are learned by a mean squared error minimization scheme, penalized by a weight decay to avoid overfitting (i.e., to ensure a better generalization ability):
$$w^* = \arg\min_w \sum_{i=1}^{N} L(y_i, \Phi_w(x_i)) + C\|w\|^2,$$
where $L(y, \hat{y}) = (y - \hat{y})^2$ is the squared error.
Problem: the MSE is not quadratic in w, so the optimization may end in a local minimum.
• Tuning the hyper-parameters C and Q: simple validation was used to tune C and Q.
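The sketch below minimizes the penalized criterion above by plain full-batch gradient descent and tunes (Q, C) by simple validation on a held-out part of the data. The talk does not say which optimizer was used. The optimizer, the initialization, the learning rate, the function names and the toy data are therefore all assumptions. Because the criterion is not quadratic, a different `seed` can lead to a different local minimum.

```python
import numpy as np

def G(z):
    return np.tanh(z / 2.0)                      # = (1 - e^{-z}) / (1 + e^{-z})

def G_prime(z):
    return 0.5 * (1.0 - np.tanh(z / 2.0) ** 2)   # derivative of tanh(z / 2)

def predict(params, X):
    W1, b1, w2, b2 = params
    return G(X @ W1 + b1) @ w2 + b2

def train_mlp(X, y, Q, C, lr=0.1, n_iter=3000, seed=0):
    """Gradient descent on sum_i (y_i - Phi_w(x_i))^2 + C * ||w||^2, where w gathers
    all weights (biases included). Each step moves by lr / N times the gradient,
    so that the step size does not grow with N."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, Q))
    b1 = np.zeros(Q)
    w2 = rng.normal(scale=0.5, size=Q)
    b2 = 0.0
    step = lr / N
    for _ in range(n_iter):
        Z = X @ W1 + b1                          # (N, Q) hidden pre-activations
        H = G(Z)                                 # (N, Q) hidden outputs
        g_out = 2.0 * (H @ w2 + b2 - y)          # d(sum of squared errors)/d(prediction)
        g_Z = np.outer(g_out, w2) * G_prime(Z)   # back-propagated to the hidden layer
        # all gradients below only use quantities from the current iteration
        W1 = W1 - step * (X.T @ g_Z + 2 * C * W1)
        b1 = b1 - step * (g_Z.sum(axis=0) + 2 * C * b1)
        w2 = w2 - step * (H.T @ g_out + 2 * C * w2)
        b2 = b2 - step * (g_out.sum() + 2 * C * b2)
    return W1, b1, w2, b2

def tune_by_simple_validation(X_tr, y_tr, X_val, y_val, Qs, Cs):
    """Simple validation: fit every (Q, C) pair on the training part and keep
    the pair with the smallest MSE on the held-out validation part."""
    scores = {}
    for Q in Qs:
        for C in Cs:
            params = train_mlp(X_tr, y_tr, Q, C)
            scores[(Q, C)] = np.mean((predict(params, X_val) - y_val) ** 2)
    return min(scores, key=scores.get), scores

# Hypothetical 1-D toy problem
rng = np.random.default_rng(42)
X = rng.uniform(-2, 2, size=(200, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=200)
best, scores = tune_by_simple_validation(X[:150], y[:150], X[150:], y[150:],
                                         Qs=[2, 5], Cs=[0.01, 1.0])
print(best, scores[best])
```

Every (Q, C) pair requires a full training run. This is one reason why MLP training is comparatively slow in the timing table of the results.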
SVM
SVM is also an algorithm based on penalized error loss minimization:
1 Basic linear SVM for regression: $\Phi_{(w,b)}$ is of the form $x \mapsto w^T x + b$, with $(w, b)$ solution of
$$\arg\min_{(w,b)} \sum_{i=1}^{N} L_\varepsilon\left(y_i, \Phi_{(w,b)}(x_i)\right) + \lambda \|w\|^2$$
where
• λ is a regularization (hyper-)parameter (to be tuned);
• $L_\varepsilon(y, \hat{y}) = \max\{|y - \hat{y}| - \varepsilon, 0\}$ is an ε-insensitive loss function.
2 Nonlinear SVM for regression works the same way, except that a nonlinear (fixed) transformation of the inputs is applied first: $\varphi(x) \in \mathcal{H}$ is used instead of x.
Kernel trick: ϕ is never computed explicitly. It is only used through a kernel $K: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ such that $K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$.
Common kernel: the Gaussian kernel
$$K_\gamma(u, v) = e^{-\gamma \|u - v\|^2}$$
is known to have good theoretical properties for both accuracy and generalization.
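The two ingredients defined above, the ε-insensitive loss and the Gaussian kernel, are written out in NumPy below as an illustrative sketch. The function names are hypothetical. The kernel matrix uses the expansion ‖u − v‖² = ‖u‖² + ‖v‖² − 2uᵀv to avoid an explicit double loop.

```python
import numpy as np

def eps_insensitive_loss(y, y_hat, eps):
    """L_eps(y, y_hat) = max(|y - y_hat| - eps, 0): errors smaller than eps cost nothing."""
    return np.maximum(np.abs(np.asarray(y) - np.asarray(y_hat)) - eps, 0.0)

def gaussian_kernel_matrix(U, V, gamma):
    """K[i, j] = exp(-gamma * ||U[i] - V[j]||^2) for the rows of U (n, d) and V (m, d)."""
    sq_dists = (np.sum(U ** 2, axis=1)[:, None]
                + np.sum(V ** 2, axis=1)[None, :]
                - 2.0 * U @ V.T)
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))   # clip tiny negative round-off

# Errors of 0.05, 0.5 and 1.0 with eps = 0.1
print(eps_insensitive_loss([1.0, 1.0, 1.0], [1.05, 1.5, 0.0], eps=0.1))
U = np.array([[0.0, 0.0], [1.0, 1.0]])
print(gaussian_kernel_matrix(U, U, gamma=0.5))
```

Small errors (below ε) contribute nothing to the loss. The Gaussian kernel equals 1 on the diagonal and decays with the squared distance at a rate set by γ.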
Learning SVM
• Learning (w, b): $w = \sum_{i=1}^{N} \alpha_i K(x_i, \cdot)$ and b are computed by an exact optimization scheme (quadratic programming). The only step that can be time-consuming is the computation of the kernel matrix $K(x_i, x_j)$ for $i, j = 1, \ldots, N$.
The resulting $\Phi_N$ is known to be of the form
$$\Phi_N(x) = \sum_{i=1}^{N} \alpha_i K(x_i, x) + b$$
where only a few $\alpha_i$ are nonzero. The corresponding $x_i$ are called support vectors.
• Tuning of the hyper-parameters $C = 1/\lambda$, ε and γ: simple validation was used. To save time, ε was not tuned in our experiments but set to its default value (1), which ensured at most 0.5N support vectors.
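Once a quadratic programming solver has produced the αi and b, prediction only needs the support vectors, as in the sketch below. The dual coefficients and intercept in the usage are hypothetical values, not the output of an actual solver. The cost of the kernel matrix is easy to quantify. With N ≈ 15,200 training HSMU (80% of ∼19,000), the full N × N matrix has about 2.3 × 10^8 entries, roughly 1.8 GB in double precision. Solvers such as LIBSVM instead compute kernel entries on demand and keep a limited cache.

```python
import numpy as np

def gaussian_kernel_matrix(U, V, gamma):
    """K[i, j] = exp(-gamma * ||U[i] - V[j]||^2)."""
    sq = np.sum(U ** 2, axis=1)[:, None] + np.sum(V ** 2, axis=1)[None, :] - 2.0 * U @ V.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def svm_predict(x_new, X_train, alpha, b, gamma):
    """Phi_N(x) = sum_i alpha_i K(x_i, x) + b, summed over the support vectors only
    (the x_i with alpha_i != 0), since every other term is zero."""
    sv = np.flatnonzero(alpha)                               # indices of support vectors
    K = gaussian_kernel_matrix(x_new, X_train[sv], gamma)    # (m, number of SVs)
    return K @ alpha[sv] + b

rng = np.random.default_rng(0)
X_train = rng.normal(size=(6, 2))
alpha = np.array([0.5, 0.0, -0.3, 0.0, 0.0, 1.2])   # hypothetical dual coefficients
b = 0.1                                              # hypothetical intercept
x_new = rng.normal(size=(3, 2))
print(svm_predict(x_new, X_train, alpha, b, gamma=0.5))
```

Prediction cost grows with the number of support vectors rather than with N. Capping them at 0.5N through ε therefore also bounds the prediction time.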
From regression tree to random forest
Example of a regression tree
[Figure: example regression tree with splits on SOCt, PH, FR and clay (root split: SOCt < 0.095); leaf predictions range from 2.685 to 59.330.]
Each split is made so that the two induced subsets are as homogeneous as possible. The prediction of a terminal node is the mean of the Y values of the observations belonging to this node.
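The split search described above can be sketched as an exhaustive CART-style scan. For each variable, the observations are sorted and every threshold is evaluated. Cumulative sums make each within-node sum of squares, Σy² − (Σy)²/n, an O(1) computation. This is an illustrative sketch with hypothetical function names, not the implementation used in the study.

```python
import numpy as np

def best_split(X, y):
    """For every variable j and threshold t, split into x_j < t and x_j >= t and keep
    the split with the smallest total within-node sum of squares (SSE)."""
    best = (None, None, np.inf)                  # (variable, threshold, SSE)
    n, d = X.shape
    for j in range(d):
        order = np.argsort(X[:, j])
        xs, ys = X[order, j], y[order]
        csum, csum2 = np.cumsum(ys), np.cumsum(ys ** 2)
        for k in range(1, n):                    # left node = first k sorted points
            if xs[k] == xs[k - 1]:
                continue                         # no threshold separates equal values
            left_sse = csum2[k - 1] - csum[k - 1] ** 2 / k
            right_sum = csum[-1] - csum[k - 1]
            right_sse = (csum2[-1] - csum2[k - 1]) - right_sum ** 2 / (n - k)
            if left_sse + right_sse < best[2]:
                best = (j, (xs[k - 1] + xs[k]) / 2.0, left_sse + right_sse)
    return best

def leaf_value(y):
    """Prediction of a terminal node: the mean of its Y values."""
    return float(np.mean(y))

# Hypothetical data: y jumps with the first variable; the second one is noise
Xs = np.array([[1, 5], [2, 3], [3, 6], [4, 1], [5, 2], [6, 4]], dtype=float)
ys = np.array([0, 0, 0, 10, 10, 10], dtype=float)
print(best_split(Xs, ys))
```

A full tree applies `best_split` recursively to each child until a stopping rule (minimum node size, no further SSE decrease) is met.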
Random forest
Basic principle: combination of a large number of individually weak regression trees (the prediction is the mean prediction of all trees).
For each tree, two simplifications of the original method are performed:
1 A given number of observations are randomly chosen from the training set: this subset of the training data is called the in-bag sample, whereas the other observations are called out-of-bag and are used to monitor the error of the tree;
2 For each node of the tree, a given number of variables are randomly chosen among all possible explanatory variables.
The best split is then computed on the basis of these variables and of the chosen observations. The chosen observations are the same for a given tree, whereas the variables taken into account change for each split.
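The two random ingredients can be isolated as in this NumPy sketch. The slide only says "a given number", so two points are assumptions. First, the in-bag sample is drawn as a bootstrap sample of size N with replacement (Breiman's original scheme). This leaves about e^{-1} ≈ 37% of the observations out-of-bag. Second, the number of variables per node is floor(d/3), the regression default of the R randomForest package. The function names are hypothetical.

```python
import numpy as np

def draw_tree_sample(n_obs, rng):
    """In-bag / out-of-bag split for one tree (assumed bootstrap scheme):
    n_obs draws with replacement; observations never drawn are out-of-bag."""
    in_bag = rng.integers(0, n_obs, size=n_obs)
    oob = np.setdiff1d(np.arange(n_obs), in_bag)
    return in_bag, oob

def draw_split_variables(n_vars, mtry, rng):
    """Candidate variables for one node: mtry of the n_vars inputs, without replacement.
    A new draw is made at every node, while the in-bag sample stays fixed per tree."""
    return rng.choice(n_vars, size=mtry, replace=False)

rng = np.random.default_rng(0)
in_bag, oob = draw_tree_sample(1000, rng)
print(len(oob) / 1000)                     # out-of-bag fraction, close to exp(-1)
mtry = max(11 // 3, 1)                     # assumed default for the 11 inputs: 3
print(draw_split_variables(11, mtry, rng))
```

Because each tree sees its own out-of-bag observations, the forest gets a built-in error estimate without a separate validation set.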
Learning a random forest
Random forests are not very sensitive to their hyper-parameters (number of observations for each tree, number of variables for each split): the default values were used.
The number of trees should be large enough for the mean squared error based on out-of-sample observations to stabilize:
[Figure: error versus number of trees (0 to 500), for the out-of-bag (training) observations and for the test set.]
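The stabilization criterion can be reproduced on toy data with the minimal random forest below. Trees are grown one at a time, and each observation is predicted only by the trees for which it is out-of-bag. The out-of-bag MSE of the current forest is recorded after every tree. Everything here is an assumption made for illustration: the bootstrap sampling, the depth control through `min_leaf`, the toy data and the function names. It is not the random forest implementation used in the study.

```python
import numpy as np

def grow_tree(X, y, rng, mtry, min_leaf=5):
    """Regression tree: a leaf is a float (mean of its y); an internal node is a tuple
    (variable, threshold, left, right). At each node only `mtry` random variables
    are searched for the split with the smallest within-node SSE."""
    n, p = X.shape
    if n < 2 * min_leaf or np.all(y == y[0]):
        return float(y.mean())
    best = None
    for j in rng.choice(p, size=mtry, replace=False):
        order = np.argsort(X[:, j])
        xs, ys = X[order, j], y[order]
        k = np.arange(1, n)                              # left child = first k sorted obs
        cs, cs2 = np.cumsum(ys)[:-1], np.cumsum(ys ** 2)[:-1]
        sse = (cs2 - cs ** 2 / k) + ((ys @ ys - cs2) - (ys.sum() - cs) ** 2 / (n - k))
        valid = (xs[1:] > xs[:-1]) & (k >= min_leaf) & (n - k >= min_leaf)
        if valid.any():
            i = np.flatnonzero(valid)[np.argmin(sse[valid])]
            if best is None or sse[i] < best[0]:
                best = (sse[i], j, (xs[i] + xs[i + 1]) / 2.0)
    if best is None:
        return float(y.mean())
    _, j, t = best
    left = X[:, j] < t
    return (j, t, grow_tree(X[left], y[left], rng, mtry, min_leaf),
            grow_tree(X[~left], y[~left], rng, mtry, min_leaf))

def predict_tree(node, x):
    while isinstance(node, tuple):                       # descend until a leaf (float)
        j, t, left, right = node
        node = left if x[j] < t else right
    return node

def forest_oob_curve(X, y, n_trees=100, mtry=None, seed=0):
    """Out-of-bag MSE of the forest after 1, 2, ..., n_trees trees."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    mtry = mtry or max(p // 3, 1)
    oob_sum, oob_count = np.zeros(n), np.zeros(n)
    curve = []
    for _ in range(n_trees):
        in_bag = rng.integers(0, n, size=n)              # bootstrap sample for this tree
        oob = np.setdiff1d(np.arange(n), in_bag)
        tree = grow_tree(X[in_bag], y[in_bag], rng, mtry)
        oob_sum[oob] += [predict_tree(tree, X[i]) for i in oob]
        oob_count[oob] += 1
        seen = oob_count > 0                             # obs out-of-bag at least once
        curve.append(np.mean((y[seen] - oob_sum[seen] / oob_count[seen]) ** 2))
    return curve

rng = np.random.default_rng(1)
X = rng.uniform(size=(300, 3))
y = 3 * X[:, 0] + 2 * np.sin(3 * X[:, 1]) + 0.3 * rng.normal(size=300)
curve = forest_oob_curve(X, y, n_trees=60, mtry=2)
print(curve[0], curve[-1])       # OOB MSE after the first and after the last tree
```

Plotting `curve` against the number of trees gives the kind of curve shown in the figure. Once it flattens, adding trees mainly costs computation time.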
Results
Influence of the training sample size
[Figure: R² versus log training-set size, for N2O prediction (left panel) and N leaching prediction (right panel); methods: LM1, LM2, DACE, SDR, ACOSSO, MLP, SVM, RF.]
Computational time
| Use | LM1 | LM2 | DACE | SDR | ACOSSO | MLP | SVM | RF |
|---|---|---|---|---|---|---|---|---|
| Train | <1 s | 50 min | 80 min | 4 hours | 65 min | 2.5 hours | 5 hours | 15 min |
| Prediction | <1 s | <1 s | 90 s | 14 min | 4 min | 1 s | 20 s | 5 s |
Time for DNDC: about 200 hours on a desktop computer and about 2 days using cluster 7!
Further comparisons
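Evaluation of the different steps (time/difficulty):

| Method | Training | Validation | Test |
|---|---|---|---|
| LM1 | ++ | | + |
| LM2 | + | | + |
| ACOSSO | = | + | - |
| SDR | = | + | - |
| DACE | = | - | - |
| MLP | - | - | + |
| SVM | = | - | - |
| RF | + | + | + |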
Understanding which inputs are important
Importance: the importance of an input variable can be measured as follows:
• randomly permute the values of that input variable and compute the predictions from these permuted inputs;
• compare the accuracy of these predictions to the accuracy of the predictions obtained with the true inputs: the increase in mean squared error is called the importance.
This comparison is made on data that were not used to train the model: either the validation set or the out-of-bag observations.
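Below is a minimal sketch of the permutation importance described above, for any fitted model exposed as a `predict` function. The model, the data and the function names in the usage are hypothetical. A variable that the model ignores gets an importance of exactly zero. A variable the model relies on heavily produces a large increase in MSE.

```python
import numpy as np

def permutation_importance(predict, X, y, rng):
    """Importance of input j = MSE with column j randomly permuted minus the MSE with
    the true inputs, computed on data not used to fit `predict`."""
    base_mse = np.mean((y - predict(X)) ** 2)
    importances = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])   # break the link between x_j and y
        importances[j] = np.mean((y - predict(X_perm)) ** 2) - base_mse
    return importances

def fitted_model(A):
    """Hypothetical already-fitted model: uses inputs 0 and 1, ignores input 2."""
    return 2 * A[:, 0] + 0.1 * A[:, 1]

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))                         # hypothetical held-out data
y = 2 * X[:, 0] + 0.1 * X[:, 1] + 0.1 * rng.normal(size=2000)
imp = permutation_importance(fitted_model, X, y, rng)
print(imp)
```

Ranking the inputs by `imp` gives the kind of importance-versus-rank plot used in the examples.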
Example (N leaching, SVM):
[Figure: importance (decrease in MSE) of the 11 input variables plotted against their rank.]
The variables N_MR, N_FR, Nres and pH are the most important for accurate predictions.
Example (N2O, RF):
[Figure: importance (mean decrease in MSE) of the 11 input variables plotted against their rank.]
The variables SOC and pH are the most important for accurate predictions.
Thank you for your attention
Any questions?
Villa-Vialaneix, N., Follador, M., Ratto, M., and Leip, A. (2012). A comparison of eight metamodeling techniques for the simulation of N2O fluxes and N leaching from corn crops. Environmental Modelling and Software, 34:51–66.