ORIGINAL PAPER

A novel ensemble modeling approach for the spatial prediction of tropical forest fire susceptibility using LogitBoost machine learning classifier and multi-source geospatial data

Mahyat Shafapour Tehrany1 & Simon Jones1 & Farzin Shabani2,3 & Francisco Martínez-Álvarez4 & Dieu Tien Bui5,6

Received: 7 May 2018 / Accepted: 3 September 2018 / Published online: 11 September 2018
© Springer-Verlag GmbH Austria, part of Springer Nature 2018

Abstract

A reliable forest fire susceptibility map is a necessity for disaster management and a primary reference source in land use planning. We set out to evaluate the use of the LogitBoost ensemble-based decision tree (LEDT) machine learning method for forest fire susceptibility mapping through a comparative case study in the Lao Cai region of Vietnam. A thorough literature search indicates that the method has not previously been applied to forest fires. Support vector machine (SVM), random forest (RF), and kernel logistic regression (KLR) were used as benchmarks in the comparative evaluation. A fire inventory database for the study area was constructed based on data of previous forest fire occurrences, and related conditioning factors were generated from a number of sources. Thereafter, forest fire probability indices were computed through each of the four modeling techniques, and performances were compared using the area under the curve (AUC), Kappa index, overall accuracy, specificity, sensitivity, positive predictive value (PPV), and negative predictive value (NPV). The LEDT model produced the best performance, on both the training and the validation datasets, demonstrating a 92% prediction capability. Its overall superiority over the benchmarking models suggests that it has the potential to be used as an efficient new tool for forest fire susceptibility mapping. Fire prevention is a critical concern for local forestry authorities in the tropical Lao Cai region, and based on the evidence of our study, the method has a potential application in forestry conservation management.

* Dieu Tien Bui [email protected]

1 Geospatial Science, School of Science, RMIT University, Melbourne, VIC 3000, Australia

2 ARC Centre of Excellence for Australian Biodiversity and Heritage, Global Ecology, College of Science and Engineering, Flinders University, GPO Box 2100, Adelaide, South Australia, Australia

3 Department of Biological Sciences, Macquarie University, Sydney, New South Wales, Australia

4 Division of Computer Science, Pablo de Olavide University of Seville, Seville, Spain

5 Geographic Information Science Research Group, Ton Duc Thang University, Ho Chi Minh City, Vietnam

6 Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City, Vietnam

Theoretical and Applied Climatology (2019) 137:637–653
https://doi.org/10.1007/s00704-018-2628-9

1 Introduction

Natural disturbances that threaten forest landscapes include wildland fires, blowdown, insect pests and forest pathogens, and, depending on the associated landforms, geomorphic mass movement such as avalanches, landslides, and debris flows (Pourghasemi 2016). Despite the fundamental ecological role of fire in the cyclic functioning of ecosystems, it remains an ever-present danger of complete or partial forest destruction, along with the associated flora and fauna, as well as causing wider ecological and atmospheric depletion (Pausas et al. 2018; Pellegrini et al. 2017). Further, forest fires, in their natural role in the forest succession-regression formative pattern, pose a perpetual threat to adjacent human settlements in terms of lives, structures, and infrastructure (Huebner et al. 2012). The symptoms of climate change (reductions in precipitation, increasing temperatures, and an extended dry season) and the negative impact of human activity have increased the potential for forest fires in many regions (Ngoc-Thach et al. 2018; Tien Bui et al. 2018a; Wallace et al. 2016), and there is global evidence of greater frequency, size, and severity, with a proportionate increase in human fatality and remedial costs (North et al. 2015). Another possible disruption of climate change would be alterations in plant distribution (Lamsal et al. 2017; Ramirez-Cabral et al. 2018; Ramirez-Cabral et al. 2017; Shabani et al. 2017), which could alter the dynamics of fuel load. Forest fires are a predominant environmental threat, undoing the costs and achievements of conservation strategies and wreaking ecological destruction and suffering on all forms of life in their path (Cortez and Morais 2007). At ground level, organic matter, fundamental to the maintenance of optimum soil humus-level components, is compromised by fire (Ghobadi et al. 2012). The ecological impact of such devastation makes it essential to recognize and manage forest fire-prone areas. The research question is thus: What improvements can be made to current techniques to enhance the reliability of forest fire probability studies?

Current methods in probability mapping pinpoint where outbreaks are likely to take place, making no clear prediction of when the event will occur. This qualifies as susceptibility mapping (Carrara and Guzzetti 2013), which is also called spatial prediction of fire danger, as defined by the Food and Agriculture Organization of the United Nations (FAO 2001). Detailed discussion of forest fire hazard and risk can be found in Chuvieco et al. (2010) and FAO (2001). Further, the current technology applied for the control of such natural events has three component categories: predicting, monitoring, and prevention (Lin et al. 2018; Yuan et al. 2015); therefore, mapping susceptibility is imperative for the planning of prevention measures. A review of the associated literature reveals that many studies have analyzed forest fire occurrence, employing a variety of methods. Fire area simulator (FARSITE) (Jahdi et al. 2014) and fuel moisture content (FMC) (Chuvieco et al. 2004) relate specifically to fire analysis, while general statistical methods include generalized Pareto distribution (Westerling et al. 2011), Poisson regression (Wotton et al. 2010), generalized additive model (Vilar et al. 2010), favorability functions (Verde and Zêzere 2010), weights of evidence (Hong et al. 2017a), and Monte Carlo simulations (Carmel et al. 2009). The analysis involves a complex, non-linear process, and model prediction accuracy is often unsatisfactory. Some studies have applied geographic information system (GIS) (Teodoro and Duarte 2013) and remote sensing (RS) (Arnett et al. 2015) techniques, incorporating a broad selection of variables. Examples of these include mapping landscape changes caused by fire (Wallace et al. 2016) and detecting and monitoring fire activity (Hally et al. 2016; Wickramasinghe et al. 2016).

Recent advances in data mining-based pattern recognition (Lemaître et al. 2017; Perner 2018) have facilitated a wide range of methodologies used in conjunction with forest fire inventories and available meteorological and geographical data. Machine learning techniques such as decision tree (DT) (Tehrany et al. 2017; Tehrany et al. 2013), support vector machine (SVM) (Gonzalez-Olabarria et al. 2012), and artificial neural networks (ANN) (Petropoulos et al. 2010); statistical approaches such as frequency ratio (FR) (Tehrany et al. 2014a) and kernel logistic regression (KLR) (Tien Bui et al. 2016b); and qualitative methods such as analytic hierarchy process (AHP) (Pourghasemi et al. 2016) are commonly used for modeling and simulation of forest fires.

There are pros and cons to each of these methods. In AHP, the forest fire and spatial data form the basis of the expert knowledge required to weight the parameter maps from which the susceptibility map is derived. Where there is a lack of information about the study area, the results may then be based on unrelated information (Yilmaz 2009). The ANN technique is classed as robust; however, Gomez and Kavzoglu (2005) have argued that a weakness is its tendency toward oversimplification when the data are inadequate and, consequently, its limitation in applications with limited data. FR and other statistical approaches are most useful for models that assume that the input variables may be based on previous occurrences (Pradhan et al. 2007). However, where the sets of influential factors are complex and factors other than the standard climatic factors of temperature, rainfall, humidity, and wind apply, omitting these may render the results inaccurate. Comparative studies have shown that, generally, machine learning models have produced greater accuracy than statistical models when applied to fire occurrences (Massada et al. 2013). Oliveira et al. (2012) demonstrated that the random forest (RF) algorithm had greater predictive ability than the standard multiple linear regression model when applied to spatial patterns of fires occurring in Mediterranean Europe. However, Pourtaghi et al. (2015) indicated that RF modeling performs less efficiently than other machine learning models. Massada et al. (2013) compared RF, maximum entropy (ME), and generalized linear model (GLM) in an examination of the distributions of wildfires in Huron–Manistee National Forest (USA), concluding that these three models were equal in terms of fit, while RF and ME showed slightly better predictive capability than GLM under that specific study application.

More recently, ensemble modeling frameworks have been seen to enhance performance in predictive modeling (Lee and Oh 2012; Tien Bui et al. 2016a; Truong et al. 2018). These include stacking, random subspace, and rotation forests (Rodriguez et al. 2006), bagging (Breiman 1996), AdaBoost, and MultiBoost (Webb 2000), which can be subdivided into the categories of heterogeneous and homogeneous (Bian and Wang 2006). The heterogeneous ensemble category contains a number of different algorithms in the final ensemble classifier (Tien Bui et al. 2016a), while the homogeneous ensemble category utilizes only one algorithm, with the basic training data divided into subsets forming classifiers from which a committee is constructed (Maudes et al. 2012). Nevertheless, the ensemble framework is still rarely explored for modeling forest fire susceptibility.

Our aim in this study is to expand the body of knowledge by proposing a new hybrid machine learning technique, in practice a combination of LogitBoost ensemble and decision tree (LEDT), which is effective in forest fire susceptibility mapping. To the best of our knowledge, LEDT has not been used previously for forest fire modeling. The main advantage of LEDT, as opposed to ANN or SVM, is its proven high prediction capability (Dettling and Bühlmann 2003). In addition, LEDT has the capability to provide probability estimates, such as in the case of forest fires, which can be used to create the susceptibility index. Forest fire susceptibility maps are now regarded as essential tools for wildfire management warning systems (Wickramasinghe et al. 2016). The effectiveness of the proposed model was further assessed by a comparative evaluation with the benchmarks SVM, RF, and KLR.

2 Study area and data

2.1 Description of the study area (Lao Cai area)

Our study area (Fig. 1) of around 1946 km² included Lao Cai City and parts of Sa Pa and Bat Xat districts in the mountainous northwest of Lao Cai Province, Vietnam. The locality is southwest of the Thao River and northeast of the Hoang Lien Son Mountains, between longitudes 103° 32′ E and 104° 05′ E and latitudes 22° 08′ N and 22° 48′ N. The site varies in altitude from approximately 200 m in the Thao River valley to 3143 m above sea level at the Fansipan peak in the Hoang Lien Son Mountains, which is the highest altitude in Vietnam. The Hoang Lien Son range has a total annual rainfall of 2000 to 3600 mm, 80–85% of which occurs between June and August. The Lao Cai area, in which the Thao is the largest river, has a high drainage density of approximately 1.75 km/km². The rock formation comprises a heterogeneous crust of sedimentary, metamorphic, and igneous rocks of different ages, formed in the multiple stages of the Indosinian orogeny (Trinh et al. 2012).

Annually during March and April, hot dry winds blowing from the Than Uyen District impact negatively on the vegetation, increasing the likelihood of forest fires at that time of year. During this season, dry leaves fall and winds blow hotter, producing ideal conditions for fires over areas with forest and shrub vegetation (Product et al. 2002). Between 1994 and 2000, Lao Cai experienced over 200 forest fires, 16 cases (8%) occurring in the protected area. Our study is based on data from forest fires occurring between 2008 and 2016, but excluding 2011, during which there were no fires recorded for the selected catchment area.

Fig. 1 Location of the study area and forest fire inventory map

2.2 Forest fire inventory

Forest fire susceptibility models analyze the correlations among the historical occurrences and corresponding conditioning factors (Higuera et al. 2015). Thus, the first stage of the process is the preparation of an inventory map of forest fires of the region, based on the assumption that future forest fire events in the same location may be predicted by analyzing the complete data of past occurrences (Tehrany and Jones 2017). The inventory map displays past forest fire locations in the selected study area. Our inventory map included 257 historical fire localities (Fig. 1). Possible inventory map sources are in situ mapping, aerial photography, and remote sensing (Cloke and Pappenberger 2009). In our study, the inventory data, based on data from forest fires occurring between 2008 and 2016 (excluding 2011), were obtained from the Department of Forest Protection, the official forest fire database for Vietnam, at http://www.kiemlam.org.vn/firewatchvn/ (Ministry of Agriculture and Rural Development of Vietnam 2016). This inventory map was produced by digitizing the aerial photos captured of the study area over the years. For the selected catchment, no fire event was recorded for 2011.

Spatial and temporal robustness are two measures for assessing an inventory (Althuwaynee et al. 2014). For temporal robustness, the multi-temporal inventory data is divided into past incidence, used for training purposes, and future incidence, reserved for validation purposes. For spatial robustness, the inventory is divided randomly into training (70%) and testing (30%) data. When a comprehensive dataset is available, these methods may be integrated. The random selection tool in ArcGIS was used to establish our training and testing datasets, with inventory data for the period 2008–2012 (excluding 2011) used for training the model and inventory data for the period 2013–2016 used for validation (Fig. 1). The training data was used for model construction and the testing data for confirming model accuracy (Chung and Fabbri 2008).
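The two partitioning schemes described above can be illustrated with a minimal sketch. It is not the ArcGIS workflow used in the study; the ten inventory records, the function names, and the random seed are hypothetical and serve only to show how a temporal split differs from a random 70/30 split.

```python
import random

# Hypothetical inventory records (fire_id, year); the real inventory holds 257 points.
inventory = list(enumerate(
    [2008, 2009, 2010, 2012, 2012, 2013, 2014, 2014, 2015, 2016]))

def temporal_split(records, last_training_year):
    """Temporal robustness: past fires train the model, later fires validate it."""
    train = [r for r in records if r[1] <= last_training_year]
    valid = [r for r in records if r[1] > last_training_year]
    return train, valid

def random_split(records, train_fraction=0.7, seed=0):
    """Spatial robustness: shuffle a copy of the records and cut it at 70%."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

train_t, valid_t = temporal_split(inventory, last_training_year=2012)
train_r, valid_r = random_split(inventory)
print(len(train_t), len(valid_t))
print(len(train_r), len(valid_r))
```

In the temporal scheme, the validation fires are strictly later than the training fires, which mimics predicting future events; the random scheme mixes years and tests spatial generalization only.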

2.3 Multi-source geospatial data

The selection of appropriate conditioning factors is paramount in modeling forest fires. Pradhan et al. (2007) have argued that soil, vegetation, slope, aspect, and land use are major factors that should be included. Ghobadi et al. (2012) made use of vegetation, topography, slope, and aspect, as well as the Normalized Difference Vegetation Index (NDVI) and meteorological factors, in developing a forest fire susceptibility map for Iran. Motazeh et al. (2013) rated vegetation, slope, and distance from settlements as major factors in a study mapping the probability of forest fires in the Asalem Nav forests of Iran. Thus, conditioning factors for the susceptibility mapping of forest fires are selected by the user from data that are available and relevant, in that the factors selected influence the quality of the resulting prediction models (Verde and Zêzere 2010). Forest fires are affected not only by the non-climatic factors of topography and vegetation but also by human factors (Guo et al. 2016), as well as standard climatic factors such as temperature, wind, and rainfall.

2.3.1 Topographic data

Topographic data rate among the most important factors for inclusion in a fire susceptibility model. The impacts of slope degree, aspect, and elevation on the behavior of fire have received much coverage (Bassett et al. 2015; Kane et al. 2015; Nami et al. 2018; Parisien et al. 2012). Topographic factors influence ignitions in terms of vegetation, climatic conditions, and human accessibility. Elevation impacts directly on temperature, moisture, and wind (Gao et al. 2011) and, thus, plays an important role in the spread of fires (Jaiswal et al. 2002). However, the severity of fire behavior is often reduced at higher altitudes, due to higher rainfall (Adab et al. 2013). Another topographic parameter that may influence the rate of spread of fires is slope, in that fire ascends a slope more quickly than it descends (Pourtaghi et al. 2015), and slope is a more influential factor in fire propagation than altitude. Slope aspect, or direction, demonstrates a correlation with the quantity of solar energy received in the area and was, thus, chosen as a forest fire conditioning factor. In the Northern Hemisphere, south-facing slopes receive greater sunlight, leading to higher temperatures, stronger winds, lower humidity, and lower fuel moistures. Thus, vegetation is likely to be drier and less dense on south-facing slopes than on those facing north (Prasad et al. 2008). Drier vegetation has a greater tendency to ignite (Setiawan et al. 2004). Additionally, earlier in the day, east-facing aspects receive greater direct sunlight and ultraviolet radiation than west-facing aspects. Consequently, east-facing slopes dry faster. To compute the slope and aspect in the study area, a digital elevation model (DEM) with a 30 × 30 m cell size was generated based on national digital topographic maps, employing ArcGIS 10.4 software. The slopes produced by the DEM for the study area are displayed in Fig. 2a, the elevations in Fig. 2b, and the aspects in Fig. 2c.
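To make the derivation of slope and aspect from a DEM concrete, the following is a minimal sketch using simple central differences on a toy grid. It is an assumption of this illustration rather than the ArcGIS 10.4 procedure used in the study (GIS packages typically apply a weighted 3 × 3 neighbourhood); the function name, the toy elevations, and the convention of returning −1 for flat cells are hypothetical.

```python
import math

def slope_aspect(dem, cell_size):
    """Return (slope_deg, aspect_deg) grids for the interior cells of a DEM.

    dem is a list of rows with row 0 at the northern edge. Border cells stay
    None because central differences need a neighbour on both sides.
    """
    rows, cols = len(dem), len(dem[0])
    slope = [[None] * cols for _ in range(rows)]
    aspect = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Elevation gradients toward the east and toward the north.
            dz_east = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_north = (dem[r - 1][c] - dem[r + 1][c]) / (2 * cell_size)
            slope[r][c] = math.degrees(math.atan(math.hypot(dz_east, dz_north)))
            if dz_east == 0 and dz_north == 0:
                aspect[r][c] = -1.0  # flat cell: aspect undefined
            else:
                # Compass bearing of the downslope direction (0 = N, 90 = E).
                aspect[r][c] = math.degrees(math.atan2(-dz_east, -dz_north)) % 360
    return slope, aspect

# Toy 3 x 3 DEM (metres) rising 30 m per 30 m cell toward the east.
dem_east = [[100, 130, 160],
            [100, 130, 160],
            [100, 130, 160]]
slope, aspect = slope_aspect(dem_east, cell_size=30)
print(slope[1][1], aspect[1][1])
```

In this toy grid the terrain rises eastward, so the central cell faces west; a grid rising northward would give a south-facing central cell, the aspect class the text associates with drier fuels in the Northern Hemisphere.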

2.3.2 Land use and NDVI

Human activities provide many potential sources of ignition that may impact on forest fire susceptibility (Nepstad et al. 2008). Thus, land use was selected as a conditioning factor. The land use map (Fig. 2d) provided by the local authority features 11 groups, at a scale of 1:50,000. The map is the product of a national land use inventory project, executed during 2014. The normalized difference vegetation index (NDVI) formed the basis of the assessment of vegetation cover. NDVI is the most popular index for assessment of live fuel moisture levels (Chuvieco et al. 2004). The NDVI was computed using Landsat-8 Operational Land Imager (OLI) imagery at a resolution of 30 m, sourced from the USGS archive (http://earthexplorer.usgs.gov).

Fig. 2 Forest fire conditioning factors. a Slope, b elevation, c aspect, d land use, e distance to residential areas, f NDVI, g rainfall, h temperature, i wind speed, and j humidity

Several steps are involved in measuring NDVI (Skakun et al. 2016). Initially, the rescaling coefficients provided in the Landsat-8 metadata file were utilized to convert digital number (DN) values to top of atmosphere (TOA) reflectance. The reflectance value was calculated using the equation below:

$$ \rho_{\lambda}' = M_{\rho} Q_{cal} + A_{\rho} \tag{1} $$

where $\rho_{\lambda}'$ = TOA reflectance, $M_{\rho}$ = band-specific multiplicative rescaling factor from the metadata, $Q_{cal}$ = quantized and calibrated standard product pixel values (DN), and $A_{\rho}$ = band-specific additive rescaling factor from the metadata.

Secondly, TOA reflectance with a correction for the sun angle is then:

$$ \rho_{\lambda} = \frac{\rho_{\lambda}'}{\cos(\theta_{SZ})} = \frac{\rho_{\lambda}'}{\sin(\theta_{SE})} \tag{2} $$

where $\rho_{\lambda}$ = TOA planetary reflectance, $\theta_{SZ}$ = local solar zenith angle, and $\theta_{SE}$ = local sun elevation angle. The scene center sun elevation angle in degrees is provided in the metadata.

Finally, the computation of NDVI was based on the following equation:

$$ \mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}} \tag{3} $$

where NIR denotes the near-infrared band (0.851–0.879 μm, band 5) and RED denotes the red band (0.636–0.673 μm, band 4). Our NDVI map (Fig. 2f) includes seven classes determined using the natural breaks algorithm.
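The three steps (Eqs. 1 to 3) can be traced for a single pixel with the short sketch below. The digital numbers, rescaling coefficients, and sun elevation are illustrative values chosen for this example, not values read from the scenes used in the study, and the function names are hypothetical.

```python
import math

def toa_reflectance(dn, mult, add, sun_elevation_deg):
    """Eqs. 1 and 2: rescale a digital number, then correct for the sun angle."""
    rho_prime = mult * dn + add                                    # Eq. 1
    return rho_prime / math.sin(math.radians(sun_elevation_deg))  # Eq. 2

def ndvi(nir, red):
    """Eq. 3; returns 0.0 when both bands are zero to avoid dividing by zero."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0

# Illustrative inputs only (assumed, not taken from the study's metadata).
MULT, ADD, SUN_ELEV = 2.0e-5, -0.1, 60.0
nir = toa_reflectance(25000, MULT, ADD, SUN_ELEV)  # band 5 pixel
red = toa_reflectance(9000, MULT, ADD, SUN_ELEV)   # band 4 pixel
print(round(ndvi(nir, red), 3))
```

Because both bands of a scene are divided by the same sun-angle term, the correction in Eq. 2 cancels in the NDVI ratio; it still matters when reflectance values themselves are compared across scenes.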

2.3.3 Distance to residential areas

Forest fires in Vietnam are frequently caused by human activities such as burning grass, hunting using fire, and forest exploitation (Le et al. 2014). Thus, "distance to residential area" was selected as a conditioning factor. The residential areas were extracted from the land use map (Fig. 2d) produced by the Ministry of Natural Resources and Environment (Vietnam) and used to generate a residential area map with six classes (Fig. 2e).

2.3.4 Climatic data

Average climatic values for the following sets of months, which had returned the most forest fires, were incorporated into the climatic dataset: (1) February, March, April, May, September, and November 2014; and (2) February 2009. Four climatic factors, temperature, wind speed, relative humidity, and rainfall, were included on the basis of their proven influence on spread rate and fire intensity in other studies (Drobyshev et al. 2012; Jolly et al. 2015). In our study, climatic data of average maximum monthly temperature, average monthly wind speed, average monthly relative humidity, and average monthly total precipitation were obtained from the Climate Forecast System Reanalysis (CFSR), available at https://www.ncdc.noaa.gov/. The temperature map includes nine classes (Fig. 2h) and the wind speed map six (Fig. 2i). The relative humidity map (Fig. 2j) includes nine classes, and the rainfall map eight classes (Fig. 2g). Classes were determined by means of the natural breaks algorithm incorporated into ArcGIS 10.4.

3 Theoretical background of the methods used

3.1 LogitBoost machine learning ensemble

A literature search gave no indication that the LogitBoost ensemble-based decision tree (LEDT) has to date been applied to forest fire susceptibility mapping. Boosting is a machine learning ensemble algorithm (Schapire and Singer 1999) within the supervised learning framework. The purpose of its design is to reduce bias and variance, and it was initially used for combining non-complex classifiers to enhance performance in classification (Cai et al. 2006). The technique optimizes the multinomial likelihood, which facilitates its application in multiclass problems (Pham et al. 2016). The LogitBoost algorithm was formulated by Friedman et al. (2000), who built on the AdaBoost algorithm to improve statistical results. LogitBoost is derived by applying logistic regression to the AdaBoost generalized additive model.

Consider the forest fire training dataset $D = \{x_i, y_i\}_{i=1}^{N}$ with $x_i \in R^n$ as input data, n conditioning factors, and N data samples. Our study included the aforementioned ten input variables, and the output $y_i \in \{1, 0\}$ denotes a separation into two classes, forest fire and non-forest fire. LogitBoost generates an ensemble model that consists of m tree-based classifiers using the following equation (Dettling and Bühlmann 2003):

$$ f^{(m)} = \arg\min_{f} \sum_{i=1}^{N} w_i^{(m)} \left( z_i^{(m)} - f(x_i) \right)^2 \tag{4} $$

where $f^{(m)}$ is a tree-based classifier, $w_i^{(m)}$ are the weights derived from Eq. 5, and $z_i^{(m)}$ are the working responses obtained from Eq. 6.

$$ w_i^{(m)} = p^{(m-1)}(x_i) \left( 1 - p^{(m-1)}(x_i) \right) \tag{5} $$

$$ z_i^{(m)} = \frac{y_i - p^{(m-1)}(x_i)}{w_i^{(m)}} \tag{6} $$

where $p(x_i)$ are the probability estimates.

The ensemble model is updated using Eq. 7, and the class probability is computed using Eq. 8, as below:

$$ F^{(m)}(x_i) = F^{(m-1)}(x_i) + \frac{1}{2} f^{(m)}(x_i) \tag{7} $$

$$ p^{(m)}(x_i) = \frac{1}{1 + e^{-2F^{(m)}(x_i)}} \tag{8} $$
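As a worked illustration of Eqs. 5 to 8, consider the first boosting round for a single forest fire sample ($y_i = 1$). The starting values $F^{(0)}(x_i) = 0$ and $p^{(0)}(x_i) = 0.5$ are assumed here as the usual uninformative initialization, and the base classifier is assumed to reproduce the working response exactly, so that $f^{(1)}(x_i) = z_i^{(1)}$:

$$ w_i^{(1)} = 0.5\,(1 - 0.5) = 0.25, \qquad z_i^{(1)} = \frac{1 - 0.5}{0.25} = 2 $$

$$ F^{(1)}(x_i) = 0 + \frac{1}{2} \cdot 2 = 1, \qquad p^{(1)}(x_i) = \frac{1}{1 + e^{-2}} \approx 0.88 $$

For a non-forest fire sample ($y_i = 0$) the same steps give $z_i^{(1)} = -2$, $F^{(1)}(x_i) = -1$, and $p^{(1)}(x_i) \approx 0.12$. Later rounds place the largest weights on samples whose probability estimate is still close to 0.5, because $p(1 - p)$ peaks there.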

3.2 Decision stump-base classifier

In this case, LogitBoost was trained using decision stumps as base learners, which is a common practice (Sheng et al. 2014; Valdes et al. 2016) in that it is effective for domains without inherent a priori learning. Decision stumps can be used with a variety of categories of features, including nominal, binary, and continuous, as well as in the case of missing values (De Comité et al. 2003). By definition, a decision stump constitutes a DT model with only one level (Iba and Langley 1992). Thus, the root links directly to the leaves, and decisions are made entirely through the value of a single input attribute.

Three parameters define decision stumps: (1) the index j ofthe feature x that it cuts, xj; (2) the threshold θ of the cut; (3) thesign of the decision (values ≠ θ).

Despite its apparent simplicity, this classifier achieves a competitive result with boosting. LogitBoost was used to generate subsets, which were used for the construction of the base classifiers using decision stumps. Because the number of base classifiers influences the model performance, a trial and error test was carried out by varying the number of subsets from 1 to 100 and then computing the classification accuracy of the resulting model. The result showed that the model with 90 subsets had the highest classification accuracy. Therefore, the best LogitBoost ensemble model in our study was based on 90 decision stump classifiers.
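The interplay between LogitBoost (Eqs. 4 to 8) and decision stumps can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the six toy samples (slope in degrees, NDVI), the weight floor of 1e-10, the clipping of working responses to ±4, and all function names are hypothetical, and the stump fitter assumes at least one feature takes two or more distinct values.

```python
import math

def weighted_mean(idx, z, w):
    total_w = sum(w[i] for i in idx)
    return sum(w[i] * z[i] for i in idx) / total_w

def fit_stump(X, z, w):
    """Eq. 4 for a one-level tree: pick the feature j and threshold theta
    whose two leaf values minimise the weighted squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            theta = (lo + hi) / 2
            left = [i for i, row in enumerate(X) if row[j] <= theta]
            right = [i for i, row in enumerate(X) if row[j] > theta]
            c_left, c_right = weighted_mean(left, z, w), weighted_mean(right, z, w)
            err = sum(w[i] * (z[i] - c_left) ** 2 for i in left)
            err += sum(w[i] * (z[i] - c_right) ** 2 for i in right)
            if best is None or err < best[0]:
                best = (err, j, theta, c_left, c_right)
    return best[1:]

def stump_predict(stump, row):
    j, theta, c_left, c_right = stump
    return c_left if row[j] <= theta else c_right

def logitboost(X, y, rounds):
    F = [0.0] * len(X)   # assumed start: F = 0, so p = 0.5
    p = [0.5] * len(X)
    stumps = []
    for _ in range(rounds):
        w = [max(pi * (1 - pi), 1e-10) for pi in p]                 # Eq. 5 (floored)
        z = [max(-4.0, min(4.0, (yi - pi) / wi))                    # Eq. 6 (clipped)
             for yi, pi, wi in zip(y, p, w)]
        stump = fit_stump(X, z, w)                                  # Eq. 4
        stumps.append(stump)
        F = [Fi + 0.5 * stump_predict(stump, row) for Fi, row in zip(F, X)]  # Eq. 7
        p = [1 / (1 + math.exp(-2 * Fi)) for Fi in F]               # Eq. 8
    return stumps

def predict_proba(stumps, row):
    F = sum(0.5 * stump_predict(s, row) for s in stumps)
    return 1 / (1 + math.exp(-2 * F))

def accuracy(stumps, X, y):
    hits = sum((predict_proba(stumps, row) >= 0.5) == bool(yi) for row, yi in zip(X, y))
    return hits / len(X)

# Hypothetical samples: [slope in degrees, NDVI]; 1 = forest fire, 0 = non-forest fire.
X = [[5, 0.80], [10, 0.70], [12, 0.75], [25, 0.30], [30, 0.35], [35, 0.20]]
y = [0, 0, 0, 1, 1, 1]
for rounds in (1, 10, 90):  # trial and error over the ensemble size
    print(rounds, accuracy(logitboost(X, y, rounds), X, y))
```

The loop at the end mirrors the trial and error test described above on a much smaller scale: the ensemble size is varied and the classification accuracy recorded for each setting. On real data the accuracy would be computed on a held-out set rather than on the training samples.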

3.3 Benchmark methods

It has already been stated that machine learning methods generally outperform statistical techniques in modeling complex non-linear phenomena (Kotsiantis et al. 2007; Oliveira et al. 2012). Consequently, for forest fire prediction, which depends on optimum results, machine learning models were a natural choice of benchmark. Therefore, RF, SVM, and KLR were used for comparison; their methodologies are outlined below, together with suggested references.

3.3.1 Random forests

Random forests (RFs), developed by Breiman (2001), are powerful, manipulable decision tree-based ensemble classifiers. The RF algorithm applies bootstrapping to subsets of the observations to produce random binary trees. The original data is sampled randomly to produce the training data for building the model. Data excluded are described as "out-of-bag" (OOB) (Catani et al. 2013). The power of a variable is estimated by the algorithm by replacing one variable at a time with OOB data and measuring the change in the prediction error. The model output defines the probability of membership to one of two classes, "forest fire" or "no-forest fire." RFs have two parameters requiring user adjustment: (1) the number of trees, "T," and (2) the number of variables, "m," chosen stochastically from the available set of features. It should be noted that trees in the RF are fully grown without pruning (Chan and Paelinckx 2008); thus, a pruning parameter is not required. Micheletti et al. (2014) proposed the selection of a large number of trees and basing the number of variables on the square root of the input space dimensionality. In our study, "T" was fixed at 500 after a preparatory analysis, while the "m" value of variables sampled at each node was calculated as 10 (Vafaei et al. 2018). The parameters were adjusted without the need for a calibration set. Two types of error calculation were applied to the RF model: average decrease in accuracy and average decrease in node impurity (average decrease Gini). This measure of difference can be applied to variable selection by ranking variables hierarchically (Calle and Urrea 2010). A more detailed description of the method may be found in Belgiu and Drăguţ (2016).
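The bootstrapping step that gives rise to the OOB data can be sketched as follows. The sketch only draws the samples and does not grow any trees; using 257 samples (the size of the full inventory) and the fixed random seed are assumptions made purely for illustration, and the square-root rule shown is the heuristic attributed above to Micheletti et al. (2014), not the m value adopted in the study.

```python
import math
import random

def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement; indices never drawn are out-of-bag."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    oob = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, oob

rng = random.Random(42)          # fixed seed, assumed for repeatability
n_samples, n_trees = 257, 500    # illustrative sample count; T = 500 as in the text

oob_fractions = []
for _ in range(n_trees):
    in_bag, oob = bootstrap_indices(n_samples, rng)
    oob_fractions.append(len(oob) / n_samples)

mean_oob = sum(oob_fractions) / n_trees
sqrt_rule_m = max(1, round(math.sqrt(10)))  # square-root heuristic for ten factors
print(round(mean_oob, 2), sqrt_rule_m)
```

Each tree therefore sees roughly two thirds of the distinct samples, and the remaining third serves as an internal test set for that tree, which is why no separate calibration set is needed.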

3.3.2 Support vector machine

SVM, like RF, is a powerful supervised learning method, capable of outperforming many conventional methods (Mojaddadi et al. 2017; Sugumaran et al. 2007; Tehrany et al. 2015; Tien Bui et al. 2013). The technique involves constructing a separating hyperplane from the training dataset. The hyperplane is generated in the original space formed by n coordinates (the x_i parameters in vector x) between the points of two distinct classes (Marjanovic et al. 2011). SVM establishes the maximum margin of separation between the two classes and creates a separating hyperplane, based on the maximum margin linear classifier (Pradhan 2013). Points above the hyperplane are classified +1, while the remaining points are classified −1. Training points nearest the optimal hyperplane are described as support vectors. After defining the decision surface, new data may be classified (Tien Bui et al. 2012a).

The SVM algorithm estimates the optimal separating hyperplane, which can divide the training dataset into the two classes, forest fire and non-forest fire (1, −1). The classification decision function is expressed as:

$$g(x) = \operatorname{sign}\left(\sum_{i=1}^{n} y_i \alpha_i K(x_i, x_j) + b\right) \tag{9}$$

where $K(x_i, x_j)$ is the kernel function.

For SVM in this study, we used the radial basis function (RBF) kernel (Scholkopf et al. 1997). Kernel parameters need careful consideration, as SVM model performance is strongly influenced by the kernel width (γ) and the regularization parameter (C). To determine these values, we used the Grid Search Method, as suggested by Tien Bui et al. (2012b), concluding that a γ value of 0.575 and a C value of 10 were best suited to the study area. Tehrany et al. (2014b) and Hong et al. (2017b) both cover this algorithm in full detail.
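To make the decision function in Eq. (9) concrete, the following sketch evaluates it for a toy model. The support vectors, labels, dual coefficients, and intercept are hypothetical values chosen for illustration, not fitted results. The kernel form exp(−γ‖u − v‖²) is an assumption about how γ enters the kernel, since the text does not state the parameterisation.

```python
import math


def rbf_kernel(u, v, gamma):
    """RBF kernel in the form exp(-gamma * ||u - v||^2) (parameterisation assumed)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, labels, alphas, b, gamma):
    """Evaluate sign(sum_i y_i * alpha_i * K(x_i, x) + b); returns +1 or -1."""
    score = sum(
        y * a * rbf_kernel(sv, x, gamma)
        for sv, y, a in zip(support_vectors, labels, alphas)
    ) + b
    return 1 if score >= 0 else -1


# Hypothetical toy model: two support vectors in a 2-D feature space.
support_vectors = [(0.0, 0.0), (1.0, 1.0)]
labels = [-1, 1]        # -1 = non-forest fire, +1 = forest fire
alphas = [1.0, 1.0]     # illustrative dual coefficients, not fitted values
b = 0.0
gamma = 0.575           # kernel width reported in the study

near_fire = svm_decision((0.9, 0.8), support_vectors, labels, alphas, b, gamma)
near_no_fire = svm_decision((0.1, 0.2), support_vectors, labels, alphas, b, gamma)
print(near_fire, near_no_fire)
```

A query point close to the forest fire support vector receives the label +1, and a point close to the non-forest fire support vector receives −1. Only the support vectors contribute to the sum.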

3.3.3 Kernel logistic regression

Kernel logistic regression (KLR) (Jaakkola and Haussler 1999) is a powerful machine learning technique, which uses kernel functions to map input data from the original space into a high-dimensional feature space in which the data are linearly separable. Assume the training dataset $(x_i, y_i)_{i=1}^{N}$ with $x_i \in R^n$ as input data, n conditioning factors, and N data samples. Our study included the aforementioned ten input variables, and the output $y_i \in \{1, 0\}$ denotes a separation into two classes, forest fire and non-forest fire. KLR constructs a non-linear decision boundary that separates these two classes in the feature space, based on the following equation:

$$p(x) = \frac{e^{y(x)}}{1 + e^{y(x)}}, \qquad y(x) = \sum_{i=1}^{N} \alpha_i K(x_i, x_j) + b \tag{10}$$

where $p(x)$ is the logistic function with values in [0, 1], $y(x)$ is the kernel expansion, $\alpha_i$ are the dual model parameters, $b$ is the intercept, and $K(x_i, x_j)$ is the kernel function. We selected RBF on the basis that it is the most frequently selected function (Hong et al. 2015; Tien Bui et al. 2016c).

$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^{2}}{2\delta^{2}}\right) \tag{11}$$

where δ is the tuning parameter controlling RBF kernel sensitivity.

The parameters $\alpha_i$ and $b$ were derived by minimizing the negative log-likelihood function in the following expression:

$$\min\; \frac{1}{2}\alpha^{\top} K \alpha + C\sum_{i=1}^{N}\log\left(1 + \exp(K_{1i}\alpha)\right) - C\sum_{i=1}^{N} y_i\,(K_{1i}\alpha) \tag{12}$$

where C is the regularization parameter controlling the tradeoff between the model complexity and degree of data fitting; $K_{1i}$ is the i-th row in the kernel matrix. The KLR modeling was performed using the RBF kernel, after selecting a γ value of 0.085 and Lambda value of 0.097, using the Grid Search Method.
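The quantities in Eqs. (10) to (12) can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions rather than the Weka routine used in the study. The toy samples, the coefficients `alphas`, and the values of `delta` and `C` are hypothetical, because the text does not say how the reported γ and Lambda values map onto δ and C.

```python
import math


def rbf_kernel(u, v, delta):
    """Eq. (11): K(u, v) = exp(-||u - v||^2 / (2 * delta^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * delta ** 2))


def klr_probability(x, train_x, alphas, b, delta):
    """Eq. (10): logistic transform of the kernel expansion y(x)."""
    y_x = sum(a * rbf_kernel(xi, x, delta) for xi, a in zip(train_x, alphas)) + b
    return 1.0 / (1.0 + math.exp(-y_x))


def klr_objective(train_x, train_y, alphas, C, delta):
    """Eq. (12): 0.5*a'Ka + C*sum(log(1 + exp(K_i a))) - C*sum(y_i * K_i a)."""
    n = len(train_x)
    K = [[rbf_kernel(train_x[i], train_x[j], delta) for j in range(n)] for i in range(n)]
    Ka = [sum(K[i][j] * alphas[j] for j in range(n)) for i in range(n)]
    quad = 0.5 * sum(alphas[i] * Ka[i] for i in range(n))
    log_term = sum(math.log(1.0 + math.exp(v)) for v in Ka)
    fit_term = sum(y * v for y, v in zip(train_y, Ka))
    return quad + C * log_term - C * fit_term


# Hypothetical toy data: 0 = non-forest fire, 1 = forest fire.
train_x = [(0.0, 0.0), (1.0, 1.0)]
train_y = [0, 1]
alphas = [-1.0, 1.0]    # illustrative coefficients, not fitted values
delta, C = 1.0, 10.0    # illustrative settings (assumption)

p_fire = klr_probability((1.0, 1.0), train_x, alphas, 0.0, delta)
p_no_fire = klr_probability((0.0, 0.0), train_x, alphas, 0.0, delta)
obj_zero = klr_objective(train_x, train_y, [0.0, 0.0], C, delta)
obj_alpha = klr_objective(train_x, train_y, alphas, C, delta)
print(f"p(fire sample) = {p_fire:.3f}, p(non-fire sample) = {p_no_fire:.3f}")
print(f"objective: zero coefficients = {obj_zero:.3f}, toy coefficients = {obj_alpha:.3f}")
```

In this toy setting the coefficients that separate the two samples give a lower objective than all-zero coefficients, which is the direction a minimizer of Eq. (12) would move in.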

4 Research methodology

The flowchart presented in Fig. 3 illustrates the organization of our study methodology, which falls into three stages:

- Geospatial database and feature selection
- Training and arrangement of model
- Assessment of model performance (TP, TN, FP, FN, PPV (%), NPV (%), sensitivity (%), specificity (%), ACC (%), kappa, and AUC)

4.1 Geospatial database and feature selection

The first stage involved the construction of the spatial database comprising the inventory data and the ten conditioning factors: slope (°), aspect (direction), elevation (m), distance to residential areas (m), land use, NDVI, rainfall (mm), temperature (°C), wind speed (m/s), and humidity. Inventory data was apportioned to training or testing.

In regression analysis, it is essential to consider how correlation affects the independent variables. In a machine learning model, removal of conditioning factors with negative predictive values and those equal to zero enhances the prediction capability (Martínez-Álvarez et al. 2013; Tien Bui et al. 2016c); hence, the comparative predictive ability of individual conditioning factors should be calculated before the main modeling. Due to its efficiency, we used the Pearson correlation method for this purpose, to ensure the relevance of individual conditioning factors to occurrences of forest fires.
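A correlation-based screening step of this kind can be sketched as follows. The sample values are hypothetical and are not the study's data. Ranking by the absolute value of the coefficient is an assumption, since the text does not specify how the sign of the correlation is handled.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_factors(factors, labels):
    """Return (name, |r|) pairs sorted from most to least correlated with the label."""
    scores = {name: abs(pearson(values, labels)) for name, values in factors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy samples: 1 = forest fire, 0 = non-forest fire.
labels = [1, 1, 1, 0, 0, 0]
factors = {
    "ndvi": [0.8, 0.7, 0.9, 0.2, 0.3, 0.1],
    "slope": [25, 10, 30, 12, 28, 8],
    "rainfall": [5, 5, 5, 5, 5, 5],   # constant, so its score is zero
}

ranking = rank_factors(factors, labels)
kept = [name for name, score in ranking if score > 0]  # drop zero-score factors
for name, score in ranking:
    print(f"{name}: {score:.3f}")
print("kept:", kept)
```

Factors whose score is zero are dropped before modeling, mirroring the screening rule described in the text. In this toy example the constant rainfall column is removed.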

4.2 Training and arrangement of model

As our primary objective was the assessment of LEDT in predicting forest fire, this technique was modeled first to establish conditioning factor and inventory data correlations. SVM, RF, and KLR were subsequently applied for comparative purposes. The programming of all models was executed using Weka 3.9 in Matlab by means of the application programming interface (API). A probability index was computed from each technique. Using AUC, Kappa index, and other evaluation techniques presented in Fig. 3, the reliability of the LogitBoost model was rated against the three benchmark models.

4.3 Assessment of model performance

The statistical evaluation measures of overall accuracy, specificity, sensitivity, positive predictive value (PPV), and negative predictive value (NPV) were applied to measure the comparative performance of our models (Pham et al. 2018a, b; Tien Bui et al. 2016b). Overall accuracy measures the proportion of all training and testing samples that are correctly classified, while sensitivity and specificity measure the proportion of forest fire and non-forest fire samples, respectively, that are correctly classified. PPV and NPV estimate the probability that training and testing samples assigned to the forest fire class and the non-forest fire class, respectively, are correctly classified.

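$$\text{Overall accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{13}$$

$$\text{Specificity} = \frac{TN}{FP + TN} \tag{14}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{15}$$

$$\text{PPV} = \frac{TP}{FP + TP} \tag{16}$$

$$\text{NPV} = \frac{TN}{FN + TN} \tag{17}$$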

where true positive (TP) and true negative (TN) are the numbers of samples in the training and validation datasets correctly classified to the forest fire and non-forest fire classes, respectively. False positive (FP) and false negative (FN) are the numbers of samples in the training and validation datasets erroneously classified to the forest fire and non-forest fire classes, respectively.
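As a worked check of Eqs. (13) to (17), the sketch below recomputes the measures from the LogitBoost training counts reported in Table 2(a). The kappa formula is the standard Cohen's kappa. Treating the paper's kappa index as Cohen's kappa is an assumption, although it reproduces the tabulated value. The function name `confusion_metrics` is hypothetical.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Compute the measures of Eqs. (13)-(17) plus Cohen's kappa from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Agreement expected by chance, from the predicted and observed class totals.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    return {
        "accuracy": accuracy,
        "specificity": tn / (fp + tn),
        "sensitivity": tp / (tp + fn),
        "ppv": tp / (fp + tp),
        "npv": tn / (fn + tn),
        "kappa": (accuracy - p_chance) / (1 - p_chance),
    }


# Counts reported for LogitBoost on the training set in Table 2(a).
metrics = confusion_metrics(tp=175, tn=167, fp=12, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Rounded to the precision used in Table 2(a), these values agree with the reported overall accuracy (91.44%), sensitivity (89.7%), specificity (93.3%), PPV (93.6%), NPV (89.3%), and kappa (0.829).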

The receiver operating characteristic (ROC) curve can be used for assessing forest fire susceptibility modeling performance. The ROC curve is formed by using sensitivity as the Y-axis and 1-specificity as the X-axis, incorporating various cutoff thresholds (Chen et al. 2017; Hosmer and Lemeshow 2000; Tien Bui et al. 2018b). The area under the ROC curve (AUC) measures the ability of a model to predict forest fire and non-forest fire pixels. An AUC value of 1 indicates a perfect model, while an AUC value of 0.5 indicates a non-informative model (Tien Bui et al. 2013). A higher AUC value indicates a greater predictive ability of a model. Bui et al. (2017) and Kantardzic (2011) correlated the prediction ability of the model and AUC as follows: poor (0.5–0.6), average (0.6–0.7), good (0.7–0.8), very good (0.8–0.9), and excellent (0.9–1).
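The AUC can be computed without drawing the curve, because it equals the probability that a randomly chosen forest fire sample receives a higher score than a randomly chosen non-forest fire sample. The sketch below uses that pairwise formulation on hypothetical scores and then maps the result onto the qualitative ranges quoted above. Assigning a boundary value such as 0.9 to the higher category is an assumption, since the quoted ranges share their endpoints.

```python
def roc_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def auc_category(auc):
    """Qualitative label following the ranges quoted in the text."""
    if auc >= 0.9:
        return "excellent"
    if auc >= 0.8:
        return "very good"
    if auc >= 0.7:
        return "good"
    if auc >= 0.6:
        return "average"
    if auc >= 0.5:
        return "poor"
    return "below the quoted ranges"


# Hypothetical probability indices: 1 = forest fire, 0 = non-forest fire.
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]

auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f} ({auc_category(auc)})")
```

The pairwise loop is quadratic in the number of samples, which is acceptable for datasets of a few hundred points such as the one used here. A rank-based computation would scale better for larger rasters.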

Fig. 3 Methodology flowchart for this research. Flowchart stages: forest fire inventory (training points and testing points) and forest fire conditioning factors dataset (slope, elevation, aspect, land use, distance to residential areas, NDVI, rainfall, temperature, wind speed, humidity); Pearson correlation evaluation; modelling with the LogitBoost ensemble-based decision tree, support vector machine (SVM)-RBF, random forest (RF), and kernel logistic regression (KLR); forest fire susceptibility maps; validation using true positive (TP), true negative (TN), false positive (FP), false negative (FN), positive predictive value (PPV) (%), negative predictive value (NPV) (%), sensitivity (%), specificity (%), overall accuracy (%), kappa statistic, and AUC.


5 Results and discussion

5.1 Feature selection using Pearson correlation evaluation analysis

Table 1 lists the predictive value of each conditioning factor, obtained using Pearson correlation evaluation (PCE) for the study area. None were excluded, as all showed a positive predictive value, with NDVI (0.479) having the highest, followed by slope (0.187) and elevation (0.103).

5.2 Model performance, validation, and comparison

A forest fire probability index and the associated quality assessment values were calculated for each of the four models, using the training dataset to compare performance and the testing dataset for validation. The TP, TN, FP, FN, PPV (%), NPV (%), sensitivity (%), specificity (%), ACC (%), kappa, and AUC values of these four models are shown in Table 2.

The comparative performance evaluation of the four machine learning forest fire models based on the training dataset is shown in Table 2(a). In classifying forest fire pixels, the LogitBoost ensemble model showed the highest sensitivity (89.7%), followed by the SVM model (87.2%) and the KLR model (86.2%). Similarly, in terms of the classification of non-forest fire pixels, LogitBoost demonstrated the greatest specificity (93.3%), followed by SVM (91.0%) and KLR (89.9%). In terms of overall accuracy, LogitBoost exceeded the other three models, with a 91.44% rating. LogitBoost also returned the highest kappa index (0.829), followed by SVM (0.781) and KLR (0.759), indicating close agreement between the model predictions and the observed data. All four models achieved an AUC > 0.9 under the ROC curve method. LogitBoost was again superior with an AUC of 0.973.

A comparative validation using the testing data was also carried out. The results are shown in Table 2(b). This dataset facilitates a comparison of the predictive power of the models.

A graphic illustration of the comparative AUC results is shown in Fig. 4.

The LogitBoost model once again demonstrated the greatest level of sensitivity (85.1%) in classifying forest fire pixels, followed by the SVM (83.6%) and KLR (81.0%) models. However, in terms of classifying non-forest fire pixels, RF showed a higher specificity (85.9%) than LogitBoost (82.2%). For overall accuracy, LogitBoost returned the highest value (83.57%), with KLR lowest (75.71%). The ROC curve validation of the testing dataset is shown in Fig. 4 and Table 2(b). Both LogitBoost and RF demonstrated high performance (AUC > 0.9), with LogitBoost slightly superior (AUC = 0.924).

In general, model predictive power based on the testing set was lower than expected and did not match the training dataset performances. KLR and SVM both demonstrated high overall accuracy and kappa values using the training dataset but performed poorly using the testing dataset. SVM surpassed RF in training data accuracy and AUC, but in terms of predictive power, the RF values of 0.909 AUC and 82.86% ACC exceeded the SVM values. Overall, LogitBoost outperformed the other methods and generated the most accurate outcomes, although all four models proved to be acceptable for susceptibility mapping in the study area. It can be concluded that LogitBoost displayed the best performance in this study and demonstrated the robustness of machine learning in complex data modeling and prediction.

5.3 Forest fire susceptibility mapping results

All LogitBoost index values of forest fire probability were in the range 0 to 1, and a forest fire susceptibility map was produced. Using the natural break method (Bednarik et al. 2010), the probability index map was reclassified into the following five susceptibility categories: no forest fire (0.0–0.5), low (0.5–0.6), moderate (0.6–0.7), high (0.7–0.8), and very high (0.8–1).
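Once the break points are fixed, the reclassification reduces to a lookup, as in the minimal sketch below. The break values are those listed above. Assigning a value that falls exactly on a break to the higher category is an assumption, and the names `BREAKS`, `CLASSES`, and `susceptibility_class` are hypothetical.

```python
from bisect import bisect_right

BREAKS = [0.5, 0.6, 0.7, 0.8]   # upper bounds of the first four categories
CLASSES = ["no forest fire", "low", "moderate", "high", "very high"]


def susceptibility_class(probability):
    """Map a probability index in [0, 1] to one of the five susceptibility categories."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability index must lie in [0, 1]")
    # bisect_right counts the breaks that are <= probability.
    return CLASSES[bisect_right(BREAKS, probability)]


# Hypothetical pixel values from a probability index map.
pixels = [0.12, 0.55, 0.65, 0.78, 0.93]
classified = [susceptibility_class(p) for p in pixels]
print(list(zip(pixels, classified)))
```

Applied to every pixel of the probability index map, this lookup yields the categorical susceptibility map. The break values themselves come from the natural break method and are specific to the distribution of index values in the study area.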

There are several classification schemes available (e.g., quantile, natural break), each of which leads to different results, as they make very different statements about how values should be categorized (Tehrany et al. 2014b). The quantile classification has a disadvantage in that it places widely differing values into the same class and, hence, was not used in this study. Using equal intervals was also found to be impractical, since it emphasizes one class of susceptibility relative to others (Ayalew et al. 2004). Based on the literature (Pourtaghi et al. 2015), natural break is a popular method to classify the probability index into susceptibility zones (Pourghasemi et al. 2012). The reason is that this technique recognizes break points by searching for patterns inherent in the data (Xiao et al. 2006). Therefore, this technique was utilized in the current research.

The susceptibility map, divided into category areas, is presented in Fig. 5. The no forest fire category covers the largest area (31%), followed by moderate (20.7%), very high (18.5%), high (17.6%), and low (12.2%). Overlaying the land use/land cover (LULC) map on the LogitBoost susceptibility map reveals that the very high category occurs mostly in agricultural regions and residential zones.

Table 1 Predictive ability assessment of the forest fire influencing factors using Pearson correlation evaluation (PCE)

| No. | Influencing factor | Predictive power value | STD |
|---|---|---|---|
| 1 | NDVI | 0.479 | 0.010 |
| 2 | Slope | 0.187 | 0.025 |
| 3 | Elevation | 0.103 | 0.013 |
| 4 | Wind speed | 0.098 | 0.022 |
| 5 | Distance to residential areas | 0.079 | 0.014 |
| 6 | Temperature | 0.073 | 0.019 |
| 7 | Humidity | 0.029 | 0.016 |
| 8 | Land use | 0.020 | 0.017 |
| 9 | Aspect | 0.014 | 0.008 |
| 10 | Rainfall | 0.013 | 0.011 |

5.4 Sources of uncertainty and the possible improvement opportunities

The estimation of areas susceptible to forest fires on a regional scale is fraught with complications, mainly due to the complex interactions of the many conditioning factors, the assemblage of which is further complicated by variations in resolution, scale, and nature, i.e., climate and weather conditions (temperature, humidity, rainfall, wind); topography (slope, aspect, elevation); and sociocultural (land use, distance from settlement). Inaccuracies and inconsistencies of GIS data, such as spatial errors caused by data that is out of date and data at different scales and resolutions, introduce further complications. A number of researchers have used fuzzy reasoning to address these issues. However, the level of subjectivity involved in establishing fuzzy membership values precludes results of a high degree of accuracy. Therefore, the development of new models and the application of untested methods to predict forest fires and overcome current limitations are crucial. Our study has edged closer, partially filling this gap in the literature, by testing an LEDT model application in a forest fire susceptibility assessment.

In the susceptibility mapping of forest fires, the selection of both modeling techniques and variables affects the quality of the evaluation. In forest fire modeling, conditioning factors differ in predictive capability, and the potential exists for interactions of conditioning factors that may cause noise, reducing the model's predictive capability. Empirical development in disaster modeling has dictated the common practice of removing conditioning factors without any predictive value. Reliability is enhanced by only including those which indicate positive predictive values. Previous studies have established that individual conditioning factors (for example, slope angle) may vary in importance in disaster susceptibility modeling. Values are site-specific and dependent on the scale of analysis and method of selection. It is essential that a correlation analysis precedes the probability analysis if a user-defined set of variables is to be included. A future study establishing the minimum dataset requirements for forest fire susceptibility mapping reliability, given the constraints of limitations in data availability, processing time, and budget, would constitute research with useful, practical value.

6 Concluding remarks

Table 2 (a) Model performance and (b) model validation of the proposed LogitBoost, RF, SVM, and KLR models

| Statistical index parameters | (a) LogitBoost | (a) RF | (a) SVM | (a) KLR | (b) LogitBoost | (b) RF | (b) SVM | (b) KLR |
|---|---|---|---|---|---|---|---|---|
| True positive (TP) | 175 | 165 | 171 | 169 | 57 | 61 | 51 | 47 |
| True negative (TN) | 167 | 147 | 162 | 160 | 60 | 55 | 60 | 59 |
| False positive (FP) | 12 | 22 | 16 | 18 | 13 | 9 | 19 | 23 |
| False negative (FN) | 20 | 40 | 25 | 27 | 10 | 15 | 10 | 11 |
| Positive predictive value (PPV) (%) | 93.6 | 88.2 | 91.4 | 90.4 | 81.4 | 87.1 | 72.9 | 67.1 |
| Negative predictive value (NPV) (%) | 89.3 | 78.6 | 86.6 | 85.6 | 85.7 | 78.6 | 85.7 | 84.3 |
| Sensitivity (%) | 89.7 | 80.5 | 87.2 | 86.2 | 85.1 | 80.3 | 83.6 | 81.0 |
| Specificity (%) | 93.3 | 87.0 | 91.0 | 89.9 | 82.2 | 85.9 | 75.9 | 72.0 |
| Overall accuracy (%) (ACC) | 91.44 | 83.42 | 89.04 | 87.97 | 83.57 | 82.86 | 79.29 | 75.71 |
| Kappa statistic | 0.829 | 0.668 | 0.781 | 0.759 | 0.671 | 0.643 | 0.586 | 0.514 |
| AUC | 0.973 | 0.919 | 0.948 | 0.945 | 0.924 | 0.909 | 0.896 | 0.871 |

Fig. 4 ROC curve and AUC of the LogitBoost model, the RF model, the SVM model, and the KLR model using the validation dataset

This research introduced and verified a new hybrid machine learning approach (LEDT), integrating LogitBoost ensemble and decision tree, and applied it to forest fire susceptibility mapping with a case study at Lao Cai Province in the northwestern region of Vietnam. According to current literature, LEDT has not been explored for forest fire susceptibility modeling. The GIS database, which was constructed with 257 fire locations and ten forest conditioning factors, was used to train and validate the proposed model. To derive the forest fire susceptibility index, the LEDT was formulated as a pattern recognition model that assigns the pixels of the study area to the two classes, forest fire and non-forest fire.

Experimental outcomes show high performance of the proposed model, demonstrating that the LEDT is capable of predicting forest fire susceptible areas with high accuracy, which contributes to more reliable prevention planning and management. The main advantage of the LEDT compared to other machine learning techniques (e.g., ANN and SVM) is that the configuration and training process is simple, with no sophisticated optimization required. However, the performance of the LEDT model is influenced by the number of tree-based classifiers used; therefore, a trial and error test is required.

Compared to the benchmarks, RF, SVM, and KLR, the performance of the LEDT model was superior, due to the fact that the LEDT model focused on processing misclassified pixels in the fire susceptibility modeling by increasing the weights of these misclassified pixels and decreasing the weights of the correctly classified pixels. Consequently, the model works better with uncertain data, which is a critical issue in forest fire susceptibility mapping because the data are often from different sources and at different resolutions. In addition, the LogitBoost ensemble is capable of handling noisy data owing to its use of a diversity of decision tree-based classifiers; the LEDT model is therefore more robust and accurate than the benchmarks. These facts indicate that the LEDT is a valid new tool for forest fire susceptibility mapping.

In final conclusion, this research may assist other scientists in the development of susceptibility maps for other areas, as well as provide an approach applicable to geo-environmental issues other than forest fire assessments.

Acknowledgements We would like to greatly thank the following institutions for providing the data for this research: (1) Ministry of Agriculture and Rural Development (Vietnam), (2) Ministry of Natural Resources and Environment (Vietnam), (3) NOAA’s National Centers for Environmental Information (USA), and (4) U.S. Geological Survey.

Fig. 5 Forest fire susceptibility map for the study area


Funding information This research was supported by the Geographic Information Science Research Group, Ton Duc Thang University, Ho Chi Minh City, Vietnam.

Compliance with ethical standards

Conflict of interest The authors declare that there is no conflict of interest.

References

Adab H, Kanniah KD, Solaimani K (2013) Modeling forest fire risk in the northeast of Iran using remote sensing and GIS techniques. Nat Hazards 65:1723–1743

Althuwaynee OF, Pradhan B, Park HJ, Lee JH (2014) A novel ensemble decision tree-based CHi-squared Automatic Interaction Detection (CHAID) and multivariate logistic regression models in landslide susceptibility mapping. Landslides 11:1063–1078

Arnett JT, Coops NC, Daniels LD, Falls RW (2015) Detecting forest damage after a low-severity fire using remote sensing at multiple scales. Int J Appl Earth Obs Geoinf 35:239–246

Ayalew L, Yamagishi H, Ugawa N (2004) Landslide susceptibility mapping using GIS-based weighted linear combination, the case in Tsugawa area of Agano River, Niigata Prefecture, Japan. Landslides 1:73–81. https://doi.org/10.1007/s10346-003-0006-9

Bassett M et al (2015) The effects of topographic variation and the fire regime on coarse woody debris: insights from a large wildfire. For Ecol Manag 340:126–134

Bednarik M, Magulová B, Matys M, Marschalko M (2010) Landslide susceptibility assessment of the Kraľovany–Liptovský Mikuláš railway case study. Physics Chemistry Earth, Parts A/B/C 35:162–171. https://doi.org/10.1016/j.pce.2009.12.002

Belgiu M, Drăguţ L (2016) Random forest in remote sensing: a review of applications and future directions. ISPRS J Photogramm Remote Sens 114:24–31

Bian S, Wang W (2006) Investigation on diversity in homogeneous and heterogeneous ensembles. In: Neural Netw, 2006. IJCNN’06. International Joint Conference on. IEEE, pp 3078–3085

Breiman L (1996) Bagging predictors. Mach Learn 24:123–140

Breiman L (2001) Random forests. Mach Learn 45:5–32

Bui DT, Tuan TA, Hoang ND, Thanh NQ, Nguyen DB, Van Liem N, Pradhan B (2017) Spatial prediction of rainfall-induced landslides for the Lao Cai area (Vietnam) using a hybrid intelligent approach of least squares support vector machines inference model and artificial bee colony optimization. Landslides 14:447–458

Cai YD, Feng KY, Lu WC, Chou KC (2006) Using LogitBoost classifier to predict protein structural classes. J Theor Biol 238:172–176

Calle ML, Urrea V (2010) Letter to the editor: stability of random forest importance measures. Brief Bioinform 12:86–89

Carmel Y, Paz S, Jahashan F, Shoshany M (2009) Assessing fire risk using Monte Carlo simulations of fire spread. For Ecol Manag 257:370–377

Carrara A, Guzzetti F (2013) Geographical information systems in assessing natural hazards vol 5. Springer Science & Business Media

Catani F, Lagomarsino D, Segoni S, Tofani V (2013) Landslide susceptibility estimation by random forests technique: sensitivity and scaling issues. Nat Hazards Earth Syst Sci 13:2815–2831

Chan JCW, Paelinckx D (2008) Evaluation of Random Forest and Adaboost tree-based ensemble classification and spectral band selection for ecotope mapping using airborne hyperspectral imagery. Remote Sens Environ 112:2999–3011. https://doi.org/10.1016/j.rse.2008.02.011

Chen W et al (2017) A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility. Catena 151:147–160

Chung CJ, Fabbri AG (2008) Predicting landslides for risk analysis - spatial models tested by a cross-validation technique. Geomorphology 94:438–452

Chuvieco E et al (2010) Development of a framework for fire risk assessment using remote sensing and geographic information system technologies. Ecol Model 221:46–58

Chuvieco E, Cocero D, Riano D, Martin P, Martínez-Vega J, de la Riva J, Perez F (2004) Combining NDVI and surface temperature for the estimation of live fuel moisture content in forest fire danger rating. Remote Sens Environ 92:322–331

Cloke H, Pappenberger F (2009) Ensemble flood forecasting: a review. J Hydrol 375:613–626

Cortez P, Morais AdJR (2007) A data mining approach to predict forest fires using meteorological data. In: Neves J, Santos MF, Machado J (eds) New trends in artificial intelligence. Proceedings of the 13th EPIA 2007 - Portuguese Conference on Artificial Intelligence, December, Guimaraes, Portugal. APPIA, pp 512–523

De Comité F, Gilleron R, Tommasi M (2003) Learning multi-label alternating decision trees from texts and data. In: Int Workshop Machine Learning Data Mining Pattern Recognition. Springer, pp 35–49

Dettling M, Bühlmann P (2003) Boosting for tumor classification with gene expression data. Bioinformatics 19:1061–1069

Drobyshev I, Niklasson M, Linderholm HW (2012) Forest fire activity in Sweden: climatic controls and geographical patterns in 20th century. Agric For Meteorol 154–155:174–186. https://doi.org/10.1016/j.agrformet.2011.11.002

FAO (2001) International handbook on forest fire protection. Technical guide for the countries of the Mediterranean basin. Division Agriculture et Forêt Méditerranéennes, Groupement d’Aix en Provence, France

Friedman J, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). Ann Stat 28:337–407

Gao X, Fei X, Xie H (2011) Forest fire risk zone evaluation based on high spatial resolution RS image in Liangyungang Huaguo Mountain scenic spot. In: Spatial Data Mining and Geographical Knowledge Services (ICSDM), 2011 IEEE International Conference on. IEEE, pp 593–596

Ghobadi GJ, Gholizadeh B, Dashliburun OM (2012) Forest fire risk zone mapping from geographic information system in northern forests of Iran (case study, Golestan province). Int J Agriculture Crop Sci 4:818–824

Gomez H, Kavzoglu T (2005) Assessment of shallow landslide susceptibility using artificial neural networks in Jabonosa River Basin, Venezuela. Eng Geol 78:11–27

Gonzalez-Olabarria JR, Brotons L, Gritten D, Tudela A, Teres JA (2012) Identifying location and causality of fire ignition hotspots in a Mediterranean region. Int J Wildland Fire 21:905–914

Guo F, Su Z, Wang G, Sun L, Lin F, Liu A (2016) Wildfire ignition in the forests of southeast China: identifying drivers and spatial distribution to predict wildfire likelihood. Appl Geogr 66:12–21

Hally B, Wallace L, Reinke K, Jones S (2016) Assessment of the utility of the advanced himawari imager to detect active fire over Australia. International Archives of the Photogrammetry, Remote Sensing & Spatial Information Sciences 41

Higuera PE, Abatzoglou JT, Littell JS, Morgan P (2015) The changing strength and nature of fire-climate relationships in the northern Rocky Mountains, USA, 1902-2008. PLoS One 10:e0127563

Hong H, Naghibi SA, Dashtpagerdi MM, Pourghasemi HR, Chen W (2017a) A comparative assessment between linear and quadratic discriminant analyses (LDA-QDA) with frequency ratio and weights-of-evidence models for forest fire susceptibility mapping in China. Arab J Geosci 10:167


Hong H, Pradhan B, Bui DT, Xu C, Youssef AM, Chen W (2017b) Comparison of four kernel functions used in support vector machines for landslide susceptibility mapping: a case study at Sichuan area (China). Geomatics, Natural Hazards Risk 8:544–569

Hong H, Pradhan B, Xu C, Bui DT (2015) Spatial prediction of landslide hazard at the Yihuang area (China) using two-class kernel logistic regression, alternating decision tree and support vector machines. Catena 133:266–281

Hosmer D, Lemeshow S (2000) Applied logistic regression, 2nd edn. Wiley-Interscience Publication. John Wiley, Hoboken, New Jersey

Huebner K, Lindo Z, Lechowicz M (2012) Post-fire succession of collembolan communities in a northern hardwood forest. Eur J Soil Biol 48:59–65

Iba W, Langley P (1992) Induction of one-level decision trees. In: Machine learning proceedings 1992. Elsevier, pp 233–240

Jaakkola TS, Haussler D (1999) Probabilistic kernel regression models. In: AISTATS

Jahdi R et al (2014) Calibration of FARSITE fire area simulator in Iranian northern forests. Natural Hazards Earth System Sci Discussions 2:6201–6240

Jaiswal RK, Mukherjee S, Raju KD, Saxena R (2002) Forest fire risk zone mapping from satellite imagery and GIS. Int J Appl Earth Obs Geoinf 4:1–10

Jolly WM, Cochrane MA, Freeborn PH, Holden ZA, Brown TJ, Williamson GJ, Bowman DM (2015) Climate-induced variations in global wildfire danger from 1979 to 2013. Nat Commun 6

Kane VR et al (2015) Mixed severity fire effects within the Rim fire: relative importance of local climate, fire weather, topography, and forest structure. For Ecol Manag 358:62–79

Kantardzic M (2011) Data mining: concepts, models, methods, and algorithms. John Wiley & Sons, Hoboken, New Jersey

Kotsiantis SB, Zaharakis I, Pintelas P (2007) Supervised machine learning: a review of classification techniques. Emerging artificial intelligence applications in computer engineering 160:3–24

Lamsal P, Kumar L, Shabani F, Atreya K (2017) The greening of the Himalayas and Tibetan Plateau under climate change. Glob Planet Chang 159:77–92

Le TH, Nguyen TNT, Lasko K, Ilavajhala S, Vadrevu KP, Justice C (2014) Vegetation fires and air pollution in Vietnam. Environ Pollut 195:267–275

Lee S, Oh HJ (2012) Ensemble-based landslide susceptibility maps in Jinbu area, Korea. In: Terrigenous mass movements. Springer, pp 193–220

Lemaître G, Nogueira F, Aridas CK (2017) Imbalanced-learn: a python toolbox to tackle the curse of imbalanced datasets in machine learning. J Machine Learning Res 18:559–563

Lin H, Liu X, Wang X, Liu Y (2018) A fuzzy inference and big data analysis algorithm for the prediction of forest fire based on rechargeable wireless sensor networks. Sustainable Computing: Informatics Systems 18:101–111. https://doi.org/10.1016/j.suscom.2017.05.004

Marjanovic M, Kovacevic M, Bajat B, Vozenílek V (2011) Landslide susceptibility assessment using SVM machine learning algorithm. Eng Geol 123:225–234. https://doi.org/10.1016/j.enggeo.2011.09.006

Martínez-Álvarez F, Reyes J, Morales-Esteban A, Rubio-Escudero C (2013) Determining the best set of seismicity indicators to predict earthquakes. Two case studies: Chile and the Iberian Peninsula. Knowl-Based Syst 50:198–210

Massada AB, Syphard AD, Stewart SI, Radeloff VC (2013) Wildfire ignition-distribution modelling: a comparative study in the Huron–Manistee National Forest, Michigan, USA. Int J Wildland Fire 22:174–183

Maudes J, Rodríguez JJ, García-Osorio C, García-Pedrajas N (2012) Random feature weights for decision tree ensemble construction. Information Fusion 13:20–30

Micheletti N, Foresti L, Robert S, Leuenberger M, Pedrazzini A, Jaboyedoff M, Kanevski M (2014) Machine learning feature selection methods for landslide susceptibility mapping. Math Geosci 46:33–57

Ministry of Agriculture and Rural Development of Vietnam (2016) The Vietnam’s FireWatch system for online monitoring and management of forest fires. http://www.kiemlam.org.vn/firewatchvn. Accessed 12/4/2016

Mojaddadi H, Pradhan B, Nampak H, Ahmad N, AHb G (2017) Ensemble machine-learning-based geospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS. Geomatics, Natural Hazards Risk 8:1080–1102

Motazeh AG, Ashtiani EF, Baniasadi R, Choobar FM (2013) Rating and mapping fire hazard in the hardwood Hyrcanian forests using GIS and expert choice software. Acknowledgement to reviewers of the manuscripts submitted to Forestry Ideas in 2013:141

Nami M, Jaafari A, Fallah M, Nabiuni S (2018) Spatial prediction ofwildfire probability in the Hyrcanian ecoregion using evidential be-lief function model and GIS. Int J Environ Sci Technol 15:373–384

Nepstad DC, Stickler CM, Soares-Filho B, Merry F (2008) Interactions among Amazon land use, forests and climate: prospects for a near-term forest tipping point. Philosophical Trans Royal Soc B: Biological Sci 363:1737–1746

Ngoc-Thach N, Bao-Toan Ngo D, Xuan-Canh P, Hong-Thi N, Hang Thi B, Nhat-Duc H, Tien Bui D (2018) Spatial pattern assessment of tropical forest fire danger at Thuan Chau area (Vietnam) using GIS-based advanced machine learning algorithms: a comparative study. Eco Inform 46:74–85

North M, Stephens S, Collins B, Agee J, Aplet G, Franklin J, Fulé P (2015) Reform forest fire management. Science 349:1280–1281

Oliveira S, Oehler F, San-Miguel-Ayanz J, Camia A, Pereira JM (2012) Modeling spatial patterns of fire occurrence in Mediterranean Europe using Multiple Regression and Random Forest vol 275

Parisien MA, Snetsinger S, Greenberg JA, Nelson CR, Schoennagel T, Dobrowski SZ, Moritz MA (2012) Spatial variability in wildfire probability across the western United States. Int J Wildland Fire 21:313–327

Pausas JG, Belliure J, Mínguez E, Montagud S (2018) Fire benefits flower beetles in a Mediterranean ecosystem. PLoS One 13:e0198951

Pellegrini AF et al (2017) Convergence of bark investment according to fire and climate structures ecosystem vulnerability to future change. Ecol Lett 20:307–316

Perner P (2018) Machine learning and data mining in pattern recognition: 14th International Conference, MLDM 2018, New York, NY, USA, July 15–19, 2018, Proceedings vol 10935. Springer

Petropoulos GP, Vadrevu KP, Xanthopoulos G, Karantounias G, Scholze M (2010) A comparison of spectral angle mapper and artificial neural network classifiers combined with Landsat TM imagery analysis for obtaining burnt area mapping. Sensors 10:1967–1985

Pham BT, Bui DT, Dholakia M, Prakash I, Pham HV (2016) A comparative study of least square support vector machines and multiclass alternating decision trees for spatial prediction of rainfall-induced landslides in a tropical cyclones area. Geotech Geol Eng 34:1807–1824

Pham BT, Prakash I, Tien Bui D (2018a) Spatial prediction of landslides using a hybrid machine learning approach based on random subspace and classification and regression trees. Geomorphology 303:256–270

Pham BT, Tien Bui D, Prakash I (2018b) Bagging based Support Vector Machines for spatial prediction of landslides. Environ Earth Sci 77:146

Pourghasemi HR (2016) GIS-based forest fire susceptibility mapping in Iran: a comparison between evidential belief function and binary logistic regression models. Scand J For Res 31:80–98

Pourghasemi HR, Beheshtirad M, Pradhan B (2016) A comparative assessment of prediction capabilities of modified analytical hierarchy process (M-AHP) and Mamdani fuzzy logic models using Netcad-GIS for forest fire susceptibility mapping. Geomatics, Natural Hazards Risk 7:861–885

Pourghasemi HR, Pradhan B, Gokceoglu C (2012) Application of fuzzy logic and analytical hierarchy process (AHP) to landslide susceptibility mapping at Haraz watershed, Iran. Nat Hazards 63:965–996

Pourtaghi ZS, Pourghasemi HR, Rossi M (2015) Forest fire susceptibility mapping in the Minudasht forests, Golestan province, Iran. Environ Earth Sci 73:1515–1533

Pradhan B (2013) A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS. Comput Geosci 51:350–365. https://doi.org/10.1016/j.cageo.2012.08.023

Pradhan B, Dini Hairi Bin Suliman M, Arshad Bin Awang M (2007) Forest fire susceptibility and risk mapping using remote sensing and geographical information systems (GIS). Disaster Prevention Management: An Int J 16:344–352

Prasad VK, Badarinath K, Eaturu A (2008) Biophysical and anthropogenic controls of forest fires in the Deccan Plateau, India. J Environ Management 86:1–13

Product GGD, Reserve NN, Areas PP (2002) Assessment of the special-use forest system and its management in Lao Cai Province

Ramirez-Cabral NYZ, Kumar L, Shabani F (2018) Suitable areas of Phakopsora pachyrhizi, Spodoptera exigua, and their host plant Phaseolus vulgaris are projected to reduce and shift due to climate change. Theor Appl Climatol:1–16

Ramirez-Cabral NYZ, Kumar L, Shabani F (2017) Global risk levels for corn rusts (Puccinia sorghi and Puccinia polysora) under climate change projections. J Phytopathol 165:563–574

Rodriguez JJ, Kuncheva LI, Alonso CJ (2006) Rotation forest: a new classifier ensemble method. IEEE Trans Pattern Anal Mach Intell 28:1619–1630

Schapire RE, Singer Y (1999) Improved boosting algorithms using confidence-rated predictions. Mach Learn 37:297–336

Scholkopf B, Sung KK, Burges CJ, Girosi F, Niyogi P, Poggio T, Vapnik V (1997) Comparing support vector machines with Gaussian kernels to radial basis function classifiers. IEEE Trans Signal Process 45:2758–2765

Setiawan I, Mahmud A, Mansor S, Mohamed Shariff A, Nuruddin A (2004) GIS-grid-based and multi-criteria analysis for identifying and mapping peat swamp forest fire hazard in Pahang, Malaysia. Disaster Prevention Management: An Int J 13:379–386

Shabani F, Kumar L, Ahmadi M (2017) Climate modelling shows increased risk to Eucalyptus sideroxylon on the Eastern Coast of Australia compared to Eucalyptus albens. Plants 6:58

Sheng VS, Gu B, Fang W, Wu J (2014) Cost-sensitive learning for defect escalation. Knowl-Based Syst 66:146–155

Skakun S, Kussul N, Shelestov AY, Lavreniuk M, Kussul O (2016) Efficiency assessment of multitemporal C-band Radarsat-2 intensity and Landsat-8 surface reflectance satellite imagery for crop classification in Ukraine. IEEE J Selected Topics Appl Earth Observations Remote Sensing 9:3712–3719

Sugumaran V, Muralidharan V, Ramachandran K (2007) Feature selection using decision tree and classification through proximal support vector machine for fault diagnostics of roller bearing. Mech Syst Signal Process 21:930–942

Tehrany M, Jones S (2017) Evaluating the variations in the flood susceptibility maps accuracies due to the alterations in the type and extent of the flood inventory. ISPRS-International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences:209–214

Tehrany MS, Kumar L, Drielsma MJ (2017) Review of native vegetation condition assessment concepts, methods and future trends. J Nat Conserv 40:12–23

Tehrany MS, Lee M-J, Pradhan B, Jebur MN, Lee S (2014a) Flood susceptibility mapping using integrated bivariate and multivariate statistical models. Environ Earth Sci 72:4001–4015

Tehrany MS, Pradhan B, Jebur MN (2013) Spatial prediction of flood susceptible areas using rule based decision tree (DT) and a novel ensemble bivariate and multivariate statistical models in GIS. J Hydrol 504:69–79

Tehrany MS, Pradhan B, Jebur MN (2014b) Flood susceptibility mapping using a novel ensemble weights-of-evidence and support vector machine models in GIS. J Hydrol 512:332–343

Tehrany MS, Pradhan B, Mansor S, Ahmad N (2015) Flood susceptibility assessment using GIS-based support vector machine model with different kernel types. Catena 125:91–101

Teodoro AC, Duarte L (2013) Forest fire risk maps: a GIS open source application–a case study in Norwest of Portugal. Int J Geogr Inf Sci 27:699–720

Tien Bui D, Pradhan B, Lofman O, Revhaug I (2012a) Landslide susceptibility assessment in Vietnam using support vector machines, decision tree, and Naive Bayes Models. Mathematical Problems in Engineering 2012

Tien Bui D, Pradhan B, Lofman O, Revhaug I, Dick OB (2012b) Application of support vector machines in landslide susceptibility assessment for the Hoa Binh province (Vietnam) with kernel functions analysis

Tien Bui D, Pradhan B, Lofman O, Revhaug I, Dick ØB (2013) Regional prediction of landslide hazard using probability analysis of intense rainfall in the Hoa Binh province, Vietnam. Nat Hazards 66:707–730

Tien Bui D, Ho TC, Pradhan B, Pham BT, Nhu VH, Revhaug I (2016a) GIS-based modeling of rainfall-induced landslides using data mining based functional trees classifier with AdaBoost, bagging, and MultiBoost ensemble frameworks. Environ Earth Sci 75:1101–1123

Tien Bui D, Le K-TT, Nguyen VC, Le HD, Revhaug I (2016b) Tropical forest fire susceptibility mapping at the Cat Ba National Park area, Hai Phong City, Vietnam, using GIS-based kernel logistic regression. Remote Sens 8:347

Tien Bui D, Tuan TA, Klempe H, Pradhan B, Revhaug I (2016c) Spatial prediction models for shallow landslide hazards: a comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree. Landslides 13:361–378

Tien Bui D, Le HV, Hoang N-D (2018a) GIS-based spatial prediction of tropical forest fire danger using a new hybrid machine learning method. Eco Inform 48:104–116

Tien Bui D et al (2018b) Land subsidence susceptibility mapping in South Korea using machine learning algorithms. Sensors 18:2464

Trinh PT et al (2012) Late Quaternary tectonics and seismotectonics along the Red River fault zone, North Vietnam. Earth Sci Rev 114:224–235

Truong XL et al (2018) Enhancing prediction performance of landslide susceptibility model using hybrid machine learning approach of bagging ensemble and logistic model tree. Appl Sci 8:1046

Vafaei S, Soosani J, Adeli K, Fadaei H, Naghavi H, Pham TD, Tien Bui D (2018) Improving accuracy estimation of forest aboveground biomass based on incorporation of ALOS-2 PALSAR-2 and Sentinel-2A imagery and machine learning: a case study of the Hyrcanian forest area (Iran). Remote Sens 10:172

Valdes G, Luna JM, Eaton E, Simone CB II, Ungar LH, Solberg TD (2016) MediBoost: a patient stratification tool for interpretable decision making in the era of precision medicine. Sci Rep 6:37854

Verde J, Zêzere J (2010) Assessment and validation of wildfire susceptibility and hazard in Portugal. Nat Hazards Earth Syst Sci 10:485–497

Vilar L, Woolford DG, Martell DL, Martín MP (2010) A model for predicting human-caused wildfire occurrence in the region of Madrid, Spain. Int J Wildland Fire 19:325–337

Wallace L, Gupta V, Reinke K, Jones S (2016) An assessment of pre- and post fire near surface fuel hazard in an Australian dry sclerophyll forest using point cloud data captured using a terrestrial laser scanner. Remote Sens 8:679


Webb GI (2000) Multiboosting: a technique for combining boosting and wagging. Mach Learn 40:159–196

Westerling AL, Turner MG, Smithwick EA, Romme WH, Ryan MG (2011) Continued warming could transform Greater Yellowstone fire regimes by mid-21st century. Proc Natl Acad Sci 108:13165–13170

Wickramasinghe CH, Jones S, Reinke K, Wallace L (2016) Development of a multi-spatial resolution approach to the surveillance of active fire lines using Himawari-8. Remote Sens 8:932

Wotton BM, Nock CA, Flannigan MD (2010) Forest fire occurrence and climate change in Canada. Int J Wildland Fire 19:253–271

Xiao J, Shen Y, Ge J, Tateishi R, Tang C, Liang Y, Huang Z (2006) Evaluating urban expansion and land use change in Shijiazhuang, China, by using GIS and remote sensing. Landsc Urban Plan 75:69–80

Yilmaz I (2009) Landslide susceptibility mapping using frequency ratio, logistic regression, artificial neural networks and their comparison: a case study from Kat landslides (Tokat–Turkey). Comput Geosci 35:1125–1138

Yuan C, Zhang Y, Liu Z (2015) A survey on technologies for automatic forest fire monitoring, detection, and fighting using unmanned aerial vehicles and remote sensing techniques. Can J For Res 45:783–792
