calibration of logistic regression coefficients from ...€¦ · as logistic regression,...

7
-515- 1 INTRODUCTION Recent landslide disasters caused by heavy rainfall in Gansu (China, August 2010) and Bududa (Uganda, March 2010) and earthquakes in Haiti (January 12 th , 2010) and Yushu (China, April 14 th , 2010) made clear that after such events landslide susceptibility and risk maps were urgently needed to identify the areas with the highest potential to have suffered serious damage from landslides (CRITECH Team - GlobeSec - IPSC - JRC, 2010). However, such maps are often not available or accessible to authorities needing them. We therefore present a method to quickly produce a landslide susceptibility map with limited landslide information. As this study contributes to the identification of landslide hazard and risk hotspots in Europe within the EU- FP7 project SafeLand (http://www.safeland-fp7.eu), the study area selected contains the whole of Europe and extends up to the Ural and Caucasus mountains. However, if tested successfully, the proposed method could be used in other regions where landslide susceptibility maps are urgently needed and relatively complete landslide inventories are not available. 2 MATERIALS AND METHODS In order to identify landslide hotspots in Europe, three models are being produced; i.e. a heuristic, a statistical and a physically-based model. In this paper, development and testing of a statistical model is presented. From our previous experience in application of different statistical models (Van Den Eeckhaut et al., 2006; 2009; 2010) we suggest that with reliable input data and full understanding of the pre-processing (e.g. classification of independent variables if necessary, and selection of representative calibration and validation samples) and the modelling, results of statistical models such as logistic regression, discriminant analysis, and weights of evidence are more or less in agreement. As logistic regression is one of the statistical models most frequently used for assessing landslide susceptibility nowadays (e.g. Carrara et al., 1995; Van Den Eeckhaut et al., 2006; 2010), we applied Calibration of logistic regression coefficients from limited landslide inventory data for European-wide landslide susceptibility modelling M. Van Den Eeckhaut & J. Hervás Institute for Environment and Sustainability, Joint Research Centre (JRC), European Commission, Ispra, Italy C. Jaedicke Norwegian Geotechnical Institute (NGI), International Centre for Geohazards (ICG), Oslo, Norway J.-P. Malet National Centre for Scientific Research (CNRS), Strasbourg, France L. Picarelli AMRA Scarl, Naples, Italy ABSTRACT: In the collaborative activity of the EU-FP7 SafeLand project dealing with the identification of landslide hazard and risk hotspots in Europe, we applied logistic regression for landslide susceptibility modelling. The main focus was put on the calibration of the regression coefficients from a limited landslide inventory. First a representative sample of landslide-affected and landslide-free grid cells was created. Given that currently no European landslide inventory is available, landslide locations were collected from analysis of Google Earth images, and extracted from scientific publications and landslide databases available for Norway, a region around Naples, Italy and the Barcelonnette Basin, France. Then, a procedure taking account of the incompleteness of the landslide inventory and the high proportion of plain areas in Europe was set up to select landslide-free grid cells. Topographical, lithological and soil parameters were selected as input variables for the stepwise logistic regression. The model with the highest area under the ROC curve was selected for producing the landslide susceptibility map of Europe. The map was reclassified and validated by visual comparison with national landslide inventory or susceptibility maps available in literature. First results are promising, although low gradient areas known to be affected by landslides are not always classified as highly susceptible. Further validation is needed to fully evaluate and improve the methodology followed and the landslide susceptibility map produced.

Upload: others

Post on 09-Jul-2020

15 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Calibration of logistic regression coefficients from ...€¦ · as logistic regression, discriminant analysis, and weights of evidence are more or less in agreement. As logistic

-515-

1 INTRODUCTION

Recent landslide disasters caused by heavy rainfall in Gansu (China, August 2010) and Bududa (Uganda, March 2010) and earthquakes in Haiti (January 12th, 2010) and Yushu (China, April 14th, 2010) made clear that after such events landslide susceptibility and risk maps were urgently needed to identify the areas with the highest potential to have suffered serious damage from landslides (CRITECH Team - GlobeSec - IPSC - JRC, 2010). However, such maps are often not available or accessible to authorities needing them. We therefore present a method to quickly produce a landslide susceptibility map with limited landslide information. As this study contributes to the identification of landslide hazard and risk hotspots in Europe within the EU-FP7 project SafeLand (http://www.safeland-fp7.eu), the study area selected contains the whole of Europe and extends up to the Ural and Caucasus mountains. However, if tested successfully, the proposed method could be used in other regions where landslide susceptibility maps are urgently needed

and relatively complete landslide inventories are not available.

2 MATERIALS AND METHODS

In order to identify landslide hotspots in Europe, three models are being produced; i.e. a heuristic, a statistical and a physically-based model. In this paper, development and testing of a statistical model is presented. From our previous experience in application of different statistical models (Van Den Eeckhaut et al., 2006; 2009; 2010) we suggest that with reliable input data and full understanding of the pre-processing (e.g. classification of independent variables if necessary, and selection of representative calibration and validation samples) and the modelling, results of statistical models such as logistic regression, discriminant analysis, and weights of evidence are more or less in agreement. As logistic regression is one of the statistical models most frequently used for assessing landslide susceptibility nowadays (e.g. Carrara et al., 1995; Van Den Eeckhaut et al., 2006; 2010), we applied

Calibration of logistic regression coefficients from limited landslide inventory data for European-wide landslide susceptibility modelling

M. Van Den Eeckhaut & J. Hervás Institute for Environment and Sustainability, Joint Research Centre (JRC), European Commission, Ispra, Italy

C. Jaedicke Norwegian Geotechnical Institute (NGI), International Centre for Geohazards (ICG), Oslo, Norway

J.-P. Malet National Centre for Scientific Research (CNRS), Strasbourg, France L. Picarelli AMRA Scarl, Naples, Italy

ABSTRACT: In the collaborative activity of the EU-FP7 SafeLand project dealing with the identification of landslide hazard and risk hotspots in Europe, we applied logistic regression for landslide susceptibility modelling. The main focus was put on the calibration of the regression coefficients from a limited landslide inventory. First a representative sample of landslide-affected and landslide-free grid cells was created. Given that currently no European landslide inventory is available, landslide locations were collected from analysis of Google Earth images, and extracted from scientific publications and landslide databases available for Norway, a region around Naples, Italy and the Barcelonnette Basin, France. Then, a procedure taking account of the incompleteness of the landslide inventory and the high proportion of plain areas in Europe was set up to select landslide-free grid cells. Topographical, lithological and soil parameters were selected as input variables for the stepwise logistic regression. The model with the highest area under the ROC curve was selected for producing the landslide susceptibility map of Europe. The map was reclassified and validated by visual comparison with national landslide inventory or susceptibility maps available in literature. First results are promising, although low gradient areas known to be affected by landslides are not always classified as highly susceptible. Further validation is needed to fully evaluate and improve the methodology followed and the landslide susceptibility map produced.

Page 2: Calibration of logistic regression coefficients from ...€¦ · as logistic regression, discriminant analysis, and weights of evidence are more or less in agreement. As logistic

-516-

stepwise logistic regression (SLR; Allison, 2001; Kleinbaum & Klein, 2002) to calibrate the weights attributed to the important independent variables. Table 1 provides an overview of the seven independent variables used in this study. A multicollinearity analysis indicated that these variables were not interdependent so that they could be introduced all together in the SRL. The base maps of the independent variables were all rescaled to the cell size of the 30 arcsec (ca. 930 m) GTOPO30 DEM available for the whole of Europe in the SafeLand project, and the categorical variables lithology, geological age, soil type and land cover, which had a high number of classes, were reclassified into 7 to 12 classes.

Table 1: Input variables used in the logistic regression analysis. __________________________________________________ Dependent variable: Data provider Landslide locations __________________________________________________ Google Earth dataset (Europe) JRC Norway NGI / ICG Naples region (9200 km2) AMRA Barcelonnette Basin (200 km2) CNRS __________________________________________________ Independent variables Data (resolution; reference) __________________________________________________ Slope gradient SRTM v2 and GTOPO30

DEMs (30 arcsec) Standard deviation of slope SRTM v2 and GTOPO30

DEMs (calculated from slope using a 3 x 3 kernel size; 30 arcsec)

Geology: material type IGME5000 (1:5M; Asch, (12 classes) 2005 ) Geology: age IGME5000 (1:5M; Asch,

(9 classes) 2005) Soil type Map of World Soil Resource

(8 classes) (5 arcmin; FAO, EC, ISRIC, 2003)

Land use GlobCover 2004-2006 (7 classes) (10 arcsec; ESA, 2008)

Soil moisture JRC (0.5°; Laguardia & Niemeyer, 2008)

Lakes * JRC (Vogt et al., 2007) __________________________________________________ * For classification of lakes on landslide susceptibility map The binary dependent variable used in the SLR is the presence (1) or absence (0) of a slide or flow because rock falls were not included in the landslide sample. First, a representative sample of landslide-affected and landslide-free grid cells was collected. A landslide inventory was produced in Google Earth and landslides were mapped as point locations. To be representative, as many landslide locations as possible were collected from all over Europe within a limited period. Not only fresh landslides in mountain regions which are generally clearly visible on Google Earth images, but also landslide locations reported in about 40 scientific publications were included in this inventory. In total, this Google Earth landslide inventory contained 1090 landslides in May 2010 (Fig. 1). Apart from this inventory,

landslide locations were available for Norway, a 9200 km2 region around Naples, Italy and the 200 km2 Barcelonnette Basin, France (Table 1). From these inventories a random sample of 100 landslides in Norway, 100 landslides in the Naples region and 50 landslides in the Barcelonnette Basin were extracted. Hence, our final sample contained 1340 landslide-affected grid cells.

Additionally, a procedure was set up to select landslide-free grid cells. Note that although we use the term landslide-free grid cell, it is not possible to be completely sure about the absence of a landslide. The strategy followed starts from the idea that the limited landslide inventory and the large proportion of low relief areas in Europe do not allow random selection of landslide-free grid cells. Therefore 41 sampling areas mainly located in mountainous and hilly regions all over Europe were defined (Table 2).

Table 2: Location of the 41 sampling areas where the landslide inventory was suggested complete and landslide-free grid cells were selected. ______________________________________________ Sampling area Number ______________________________________________ Albania, Pindus mountains 4 Belgium, Flemish Ardennes 1 Bulgaria, Balkan mountains 1 Denmark, inland 1 France, Alps 1 France, Massif Central 1 Germany, plain 1 Germany, Bavarian Alps 1 Germany, Black Forest 1 Ireland 2 Italy, Alps 2 Italy, Apennines 4 Italy, Sicily 2 Norway, Scandinavian mountains 1 Poland, Carpathians 1 Poland, Sudetes 2 Romania, Carpathians 1 Slovakia, Carpathians 2 Spain, Centre 2 Spain, Pyrenees 4 Sweden 1 Switzerland, Alps 1 UK, Pennines 3 ____________________________________________

For these sampling areas the produced landslide

inventory was as complete as possible so that we could suggest that the inventory contained most of the existing landslides in these areas and that landslide-free grid cells could be selected in each of the areas.

To increase the certainty of selecting landslide-free grid cells outside a mapped landslide a 1 km buffer was drawn around each landslide, and finally a random selection of 1827 landslide-free grid cells was made.

Page 3: Calibration of logistic regression coefficients from ...€¦ · as logistic regression, discriminant analysis, and weights of evidence are more or less in agreement. As logistic

-517-

3. RESULTS

2.1 Logistic regression model Several models using different combinations of reference categories for the categorical variables were calibrated and evaluated. The finally selected model indicates that the presence of (1) high slope gradients and high variability in slope gradient (high standard deviation of slope); (2) sedimentary rocks (shale, siltstone and sandstone) and mafic igneous rocks (basalt and gabbro); (3) Quaternary formations; (4) Cambisol, Luvisol and Leptosol soils; and (5) forests favour landslide occurrence; and that (1) old Silurian to Phanerozoic formations; (2) Chernozems, Katstanozem and Phaeozem soils; and (3) urban land cover are generally less

susceptible to landsliding. Topographical variables are the most important factors controlling the occurrence of many, though not all (see further), of the European landslides. For this model the area under the ROC curve (AUC; Swets, 1988) is 0.888, which indicates excellent discrimination of the landslide-affected and landslide-free grid cells in our sample. Hence, the model was able to correctly classify a high proportion of the landslide sample without incorrectly classifying a high proportion of the landslide-free sample.

The stability of the model was further tested by producing 10 logistic regression models using each time 75% of the sample for calibration and the remaining 25% for validation. The ROC curves for the calibration and validation datasets of this ensemble of models are similar and all have AUC

Figure 1. Landslide inventory used for calibration of the logistic regression model.

Page 4: Calibration of logistic regression coefficients from ...€¦ · as logistic regression, discriminant analysis, and weights of evidence are more or less in agreement. As logistic

-518-

values between 0.880 and 0.937. Eleven of the 12 independent variables included in the full model are also included in at least 7 of the 10 models calibrated from 75% of the data. ‘Litho1’ (sedimentary rocks such as shale, siltstone and sandstone) is the remaining independent variable and is included in 5 of the 10 models only. In addition to the 12 independent variables present in the full model, 6 other variables are included in some of the 10 models, but never more than two times and thus they are not very important for European-wide landslide susceptibility. However, given the limited sample of landslide-affected and landslide-free grid cells the repeated calibration and validation of logistic regression models with part of our sample only provides information on the stability of the model for assigning correct probabilities of landslide occurrence to our sample set, and does not provide information on the ability of the model to assign landslide probabilities to the whole selected study area. For this a larger landslide inventory of Europe is needed, or alternatively, as is presented in the next section, the landslide susceptibility map could be validated by visual comparison with excerpts from national landslide inventory or susceptibility maps available for some countries.

2.2 Landslide susceptibility map For the classification of landslide susceptibility maps currently no rules on for example the number and name of the classes and the class boundaries exist. The classification scheme used in this study contains five classes (Fig. 2) and is designed to have little areas with very high susceptibility (i.e. < 1%) and more areas with high (i.e. 4%) and moderate (i.e. 8.9%) susceptibility. Areas assigned moderate landslide susceptibility are regarded as areas where relatively regularly landslides are expected. Together these three classes contain about 85% of the landslides in our sample. Almost 60% are located in areas with very high and high susceptibility.

The classified landslide susceptibility map was then visually compared with some published landslide inventory and susceptibility maps. For Norway, for example, an overlay of the national inventory, which includes more than 19,000 landslides and avalanches, with the classified susceptibility map (Fig. 3A) showed that 41% of the slope failures are located in very high or high susceptible areas and 31.4% in moderate susceptible areas. This is considered a good result taking into account that part of the landslides in Norway are located in low relief areas with quick clays (Bentley & Smalley, 1984).

For United Kingdom and Ireland (Fig. 3B) the classified landslide susceptibility map was visually compared with landslide inventories published by

the British Geological Survey (Foster et al., 2009) and the Irish Landslides Working Group (2006). For both countries characterized by hilly regions it seems that areas classified with low susceptibility on the European landslide susceptibility map can also be affected by landslides, and that only areas classifiedwith very low susceptibility are (generally) truly stable areas.

Visual comparison was also carried out for Italy (Fig. 3C) and good agreement with a map of the AVI landslide database (Guzzetti & Tonelli, 2004) and with a landslide density map produced from the IFFI landslide database (APAT, 2007) was found.

4. DISCUSSION AND CONCLUSIONS

So far only few attempts have been made to assess landslide susceptibility, hazard or risk on a global or continental scale. Global estimates of landslide hotspot areas have been made by Nadim et al. (2006), Hong et al. (2007) and Kirschbaum et al. (2009) and a European landslide susceptibility map is under construction by the JRC landslide expert group (Hervás, 2007; Günther et al., 2008). In this study, a method was tested to quickly produce a landslide susceptibility map using a sampling strategy enabling statistical modelling even if no complete landslide inventory is available. Indeed, the absence of a more or less complete global or continental landslide inventory currently hampers calibration of statistical landslide susceptibility and hazard models. This explains why so far only heuristic models have been used.

Application of our sampling method followed by SLR resulted in a susceptibility model including slope, variation in slope, and important lithological, soil and land cover classes. The topographical variables have most influence on the model. However, according to the model also the presence of forest contributes to landslide occurrence. This might be related to the fact that active landslides surrounded by forest are easier to distinguish on Google Earth images and that older dormant or relict landslides collected through analysis of scientific publications are indeed currently located under forest because their hummocky or disrupted topography does not allow other land use practices. The attribution of a high susceptibility to Quaternary sedimentary lithologies is in agreement with the weight attributed by Nadim et al. (2006) to this lithology class.

Our results suggest that statistical modelling of landslide susceptibility for a large territory such as Europe is possible if at least two conditions with regard to the dependent variable are met.

Page 5: Calibration of logistic regression coefficients from ...€¦ · as logistic regression, discriminant analysis, and weights of evidence are more or less in agreement. As logistic

-519-

Figure 2. Classified landslide susceptibility map of Europe.

.

Figure 3. Excerpts of the landslide susceptibility map of Europe: (A) Norway; (B) UK and Ireland; and (C) Italy.

Page 6: Calibration of logistic regression coefficients from ...€¦ · as logistic regression, discriminant analysis, and weights of evidence are more or less in agreement. As logistic

-520-

The first condition is the presence of a representative dataset of landslides collected throughout various landslide-affected regions within the study area, without necessarily being complete. Within a limited time period we collected as many landslide locations in Europe as possible. From the results obtained, however, it can be suggested that inclusion of additional landslide locations in Southern Spain, Portugal, Central France and several regions in Eastern Europe would improve the results obtained. The second condition needed to be fulfilled concerns the selection of a representative dataset of grid cells that are currently not affected by landsliding, in this study defined as ‘landslide-free’. We made an expert-driven selection in representative sampling areas all over Europe (Table 2) instead of applying a random selection over Europe, because the latter (1) results in a large number of landslide-free grid cells selected in low slope gradient areas and (2) has a higher probability of assigning incorrectly landslide-affected grid cells as landslide-free. In the sampling areas used for selection of landslide-free grid cells the landslide inventory should be as complete as possible. The main conclusions drawn from the comparison of our European landslide susceptibility map with national landslide inventory and susceptibility maps published is that for mountain areas our map seems to succeed in classifying the landslide susceptible areas. For lower relief areas (e.g. hilly regions, river valleys, cliff coast lines and areas with quick clay), however, this is not always the case. This is partly due to the DEM used in this study (Table 1) and can also be related to the rather low presence of landslides in hilly regions in our limited sample. It is due to the low number of landslides in plains and hilly regions that it was not possible to produce different landslide susceptibility maps for plain and mountain regions as was done by Malet et al. (2009). Not only the quality and resolution of the DEM are important. It is clear that the availability of more accurate maps for all thematic data will allow production of more accurate landslide susceptibility maps. Finally, it can also be concluded that the fast method proposed for producing a landslide susceptibility map of Europe could be used in other regions where such a map is urgently needed and relatively complete landslide inventories are not available.

ACKNOWLEDGEMENTS

This study was carried out in the framework of the EU-FP7 project SafeLand: Living with landslide risk in Europe: Assessment, effects of global change, and risk management strategies (Grant Agreement 226479). The authors thank all the project partners that have contributed to the collection of the thematic data.

REFERENCES

Allison, P.D. 2001. Logistic Regression Using the SAS System: Theory and Application. New York: Wiley Interscience.

APAT 2007. Rapporto sulle frane in Italia: il progetto IFFI, metodologia, risultati e rapporti regionali. Rome, Italy: Agenzia per la protezione dell’ambiente e per i servizi tecnici.

Asch, K. 2005. The 1:5 Million International Geological Map of Europe and Adjacent Areas (IGME5000) map. Hannover, Germany: Bundesanstalt für Geowissenschaften und Rohstoffe.

Bentley, S. P. & Smalley, I. J. 1984. Landslips in sensitive clays. In D. Brunsden & D. B. Prior (eds.), Slope instability. John Wiley and Sons Ltd.

Carrara, A., Cardinali, M., Guzzetti, F., Reichenbach, P. 1995. GIS technology in mapping landslide hazard. In A. Carrara & F. Guzzetti (eds.), Geographical Information Systems in Assessing Natural Hazards. Kluwer Academic.

CRITECH Team - GlobeSec - IPSC – JRC 2010. HAITI Earthquakes: landslide risk map. http://www.gdacs.org/documents/JRC_slopemap_20100114.pdf

ESA 2008. GLobCover 2004 – 2006. Paris, France: European Space Agency.

FAO, EC, ISRIC 2003. WRB Map of World Soil Resources, 1:25 000 000. Rome, Italy.

Foster, C., Gibson, A., Wildman, G. 2009. The new National Landslide Database and Landslide Hazard Assessment of Great Britain. In Proceedings of the First World Landslide Forum, Tokyo, 18-21 November 2008: 203-206.

Günther, A., Reichenbach, P., Hervás, J. 2008. Approaches for Delineating Areas Susceptible to Landslides in the Framework of the European Soil Thematic Strategy. In Proceedings of the First World Landslide Forum, Tokyo, 18-21 November 2008: 235-238.

Guzzetti, F. & Tonelli, G. 2004. SICI: an information system on historical landslides and floods in Italy. Nat. Hazards Earth Syst. Sci. 4: 213-232.

Hervás, J. (ed.) 2007. Guidelines for Mapping Areas at Risk of Landslides in Europe. Proc. Experts Meeting, 23-24 October 2007, JRC, Ispra, Italy. Luxembourg: JRC Report EUR 23093 EN, Office for Official Publications of the European Communities.

Hong, Y., Adler, R., Huffman, G. 2007. Use of Satellite Remote Sensing Data in the Mapping of Global Landslide Susceptibility. Nat. Hazards 43: 245-256.

Irish Landslides Working Group 2006. Landslides in Ireland. Dublin, Ireland: Geological Survey of Ireland.

Kirschbaum, D. B. Adler, R., Hong, Y., Lerner-Lam, A. 2009. Evaluation of a preliminary satellite-based landslide hazard algorithm using global landslide inventories. Nat. Hazards Earth Syst. Sci. 9: 673-686.

Kleinbaum, D.G. & Klein, M. 2002. Logistic regression, a self-learning text, version 2. New York, Berlin, Heidelberg: Springer.

Laguardia, G. & Niemeyer, S. 2008. On the comparison between the LISFLOOD modelled and the ERS/SCAT derived soil moisture estimates. Hydrol. Earth Syst. Sci. Discuss. 5: 1227-1265.

Malet, J. P., Thiery, Y, Hervás, J., Günther, A., Puissant A, Grandjean, G. 2009. Landslide susceptibility mapping at 1:1M scale over France: exploratory results with a heuristic model. In Proc. Int. Conference on Landslide Processes: from Geomorphologic Mapping to Dynamic Modelling. A tribute to Prof. Dr. Theo van Asch, 6 -7 February 2009. Strasbourg, France: 315-320.

Nadim, F., Kjekstad, O., Peduzzi, P., Herold, C., Jaedicke, C. 2006. Global landslide and avalanche hotspots. Landslides 3: 159-173.

Page 7: Calibration of logistic regression coefficients from ...€¦ · as logistic regression, discriminant analysis, and weights of evidence are more or less in agreement. As logistic

-521-

Swets, J. A. 1988. Measuring the accuracy of diagnostic systems. Science 240: 1285-1293.

Van Den Eeckhaut, M., Vanwalleghem, T., Poesen, J., Govers, G., Verstraeten, G., Vandekerckhove, L. 2006. Prediction of landslide susceptibility using rare events logistic regression: a case-study in the Flemish Ardennes, Belgium, Geomorphology 76: 392-410.

Van Den Eeckhaut, M., Marre, A., Poesen, J. 2010. Comparison of two landslide susceptibility assessments in the Champagne-Ardenne region (France). Geomorphology 115: 141-155.

Vogt, J.V., Soille, P, de Jager, A.L. Rimaviciute, E., Mehl., W., Foisneau, S., Bódis, K., Dusart, J., Paracchini, M. L., Haastrup, P., Bamps, C. 2007. A pan-European River and Catchment Database. Luxembourg: JRC Report EUR 22920 EN, Office for Official Publications of the European Communities.