Modeling Spatial Relationships with ArcGIS




Created By: Chen Shi

Purpose of the Project

The purpose of the project is to use three Modeling Spatial Relationships tools (the Exploratory Regression tool, the Ordinary Least Squares tool, and the Geographically Weighted Regression tool) to predict per-household spending in one particular category, recreation, and to identify areas where predicted values are higher than actual ones.

Table 1: Data dictionary containing marked target variable and final predictors.

Using Exploratory Regression Tool

Table 2: Comparing Models from Exploratory Regression Tool with best model marked

Table 3: Summary of Variable Significance

Table 3 Comments: The predictors in the selected model are HealthCare, Child in Family, money spent on art per household, and money spent on communication per household. HealthCare and Child in Family can be considered strong predictors, and their relationships with the target are very stable (primarily positive). Art and Communication, on the other hand, have both positive and negative values in the significance summary, so their relationships are less stable.

Table 4: Summary of Variable Multicollinearity

Table 4 Comments: Based on the Summary of Variable Multicollinearity table, none of the predictors listed on the left has a multicollinearity issue.

Table 5: Summary of Residual Normality

Table 5 Comments: This table shows the three models with the highest p-values for the Jarque-Bera test for normality of residuals. All three p-values are higher than 0.1, so the test gives no evidence against normally distributed residuals for any of these models. For this criterion, the higher the p-value, the better a given model is.
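The Jarque-Bera check described in the Table 5 comments can be sketched as follows. This is a minimal illustration of the textbook statistic, assuming the asymptotic chi-square distribution with 2 degrees of freedom; it is not confirmed to match the exact computation inside the ArcGIS tool, and the function name is hypothetical.

```python
import math

def jarque_bera(residuals):
    """Return (JB statistic, asymptotic p-value) for regression residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments in population form (dividing by n).
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    # Upper tail of a chi-square with 2 degrees of freedom is exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value
```

A p-value above the chosen threshold (0.1 in the comments above) means the test does not reject normality; a strongly skewed set of residuals drives the p-value toward zero.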

Table 2 Comments: The most important criterion is model fit, followed by collinearity, model simplicity, and normality of residuals. The selected model, with four predictors, is close to the best fit. The selected model 4 also has the second-best JB value, and it is simpler than models 5, 6, 7 and 8, which have a similar fit. Hence, we conclude that model 4 is the best of the 8 models.

Using Ordinary Least Squares Tool

Table 6: Model Variable

Table 6 Comments: The t-statistic gives a clue to the relative importance of each variable in the model. The stronger the predictor, the larger its absolute t value; the weaker the predictor, the smaller its absolute t value. HealthCare has the largest absolute value, so it is the most useful predictor. Art is the second most useful predictor. Child in Family has the lowest value, which makes it the least useful predictor in this model.

Table 7: OLS Diagnostic

Table 8: Scatterplots from PDF created with OLS tool

Using Geographically Weighted Regression

Table 7 Comments: Both the Joint F-Statistic and the Joint Wald Statistic measure overall model statistical significance. The Joint F-Statistic is trustworthy only when the Koenker (BP) statistic is not statistically significant. Here the Koenker (BP) statistic is 34.663, which is statistically significant for a model with four predictors, so the Joint Wald Statistic should be used instead. Moreover, an asterisk next to a probability indicates a model that is significant at the 5% level, which is desired.
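As a rough check on the Koenker (BP) value of 34.663 quoted above, the sketch below computes its upper-tail chi-square probability and applies the decision rule between the two overall-significance statistics. It assumes the statistic is chi-square distributed with degrees of freedom equal to the number of predictors (four here); that assumption, the 0.05 threshold, and the function names are illustrative and not taken from the tool output.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable, for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    # sf(x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def overall_significance_statistic(koenker_bp, n_predictors, alpha=0.05):
    """Pick the statistic to trust, given the Koenker (BP) value."""
    p = chi2_sf_even_df(koenker_bp, n_predictors)
    # A significant Koenker (BP) result means the Joint F-Statistic is not reliable.
    name = "Joint Wald" if p < alpha else "Joint F"
    return name, p

statistic, p_value = overall_significance_statistic(34.663, 4)
```

Under these assumptions the p-value is far below 0.05, which points to the Joint Wald Statistic.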


Table 10 Comments: The z-score is 8.889947, so there is less than a 1% likelihood that this clustered pattern could be the result of random chance. The model's under- and over-predictions are clustered spatially.

Table 9: Histogram of Standardized Residual

Table 10: Spatial Autocorrelation of residuals

Regression equation with variable full names

The regression equation: Recreation = -369.473 + (2291.99 * CHILDFAM) + (0.902 * COMMUNICATION) + (0.764 * HEALTHCARE) + (9.085 * ART)
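The regression equation can be evaluated directly, and the result compared with an actual value to flag over-prediction, which is the stated goal of the project. This is a minimal sketch: the coefficients come from the equation above, while the function names and the input values in the usage lines are hypothetical and are not data from the study.

```python
INTERCEPT = -369.473
COEFFICIENTS = {
    "CHILDFAM": 2291.99,
    "COMMUNICATION": 0.902,
    "HEALTHCARE": 0.764,
    "ART": 9.085,
}

def predict_recreation(childfam, communication, healthcare, art):
    """Predicted recreation spending per household from the OLS equation."""
    return (
        INTERCEPT
        + COEFFICIENTS["CHILDFAM"] * childfam
        + COEFFICIENTS["COMMUNICATION"] * communication
        + COEFFICIENTS["HEALTHCARE"] * healthcare
        + COEFFICIENTS["ART"] * art
    )

def is_overpredicted(actual, predicted):
    """True where the model predicts more spending than was observed."""
    return predicted > actual

# Hypothetical polygon: the inputs and the actual value are made up for illustration.
predicted = predict_recreation(childfam=0.5, communication=1000, healthcare=2000, art=100)
flag = is_overpredicted(actual=3900.0, predicted=predicted)
```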

Table 12: Six GW scenarios with optimal scenario marked

Fig1: Bar chart with area in km2 for each category

Table 13: Frequency Table & Cross Tab Tool Results

Table 11 Comments: The z-score is -0.57, so the pattern does not appear to be significantly different from random. The model's under- and over-predictions are distributed randomly.
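The two spatial autocorrelation z-scores reported in the Table 10 and Table 11 comments (8.889947 and -0.57) can be turned into two-tailed p-values under a standard normal assumption. The sketch below is an illustration of that reading, not the tool's own code; the 0.05 threshold and the function names are assumptions.

```python
import math

def two_tailed_p(z):
    """Two-tailed p-value of a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def describe_pattern(z, alpha=0.05):
    """Label residuals as random, clustered, or dispersed from the z-score."""
    if two_tailed_p(z) >= alpha:
        return "random"
    return "clustered" if z > 0 else "dispersed"

first_pattern = describe_pattern(8.889947)   # z-score from the Table 10 comments
second_pattern = describe_pattern(-0.57)     # z-score from the Table 11 comments
```

The first z-score gives a p-value far below 0.01, consistent with clustered residuals; the second gives a p-value well above 0.05, consistent with a random pattern.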

Table 11: Spatial Autocorrelation graph

Maps (a) to (f), each with a scale bar of 0, 30, 60, 90 miles. Map titles and legend classes:

Expenditures on Recreation. Target Variable: Recreation. Classes: 1774.670 - 2747.910; 2747.911 - 3333.490; 3333.491 - 4030.380; 4030.381 - 4906.390; 4906.391 - 6545.600.

Using OLS Tools for Predicting Recreation Spending. Target Variable: Recreation (OLS). Classes: Medium high potential; Medium low potential; Moderate potential; Very high potential; Very low potential.

Residuals: Standardized Values (StdResid). Classes: Medium high potential; Medium low potential; Medium potential; Very high potential; Very low potential.

How well does the model locally fit? (LocalR2). Classes: 0.231028 - 0.575324; 0.575325 - 0.767869; 0.767870 - 0.851444; 0.851445 - 0.914738; 0.914739 - 0.980726.

Family With Child Rate: Slope Coefficient (Importance of ChildFam). Classes: -5135.899 - -2576.502; -2576.501 - 187.375; 187.376 - 2089.219; 2089.220 - 4425.692; 4425.693 - 8156.884.

Collinearity: Reclassified Values (Condition Index). Classes: Potential Collinearity Problem; Serious Collinearity Problem.

Study Area Description: The study area of this project covers the southwest part of Nova Scotia, 371 polygons in total.

Map (c): Local R2 values below 0.5 represent poor models, and values above 0.5 are acceptable. GWR predicts well in most areas. The south and west parts of the study area show low R2 values, which means the selected model is less representative there.

Map (e): The majority of the area has serious or potential collinearity problems, which shows that the results are unreliable and unstable.

Map (f): ChildFam is the most useful predictor in the model. For each unit change in ChildFam, recreation expenditure per household changes by the local slope coefficient, with the largest changes in the south and north parts and the smallest change in the west part of the study area.
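The 0.5 rule of thumb for Local R2 can be written as a small classifier. This is only a sketch: the text does not say how a value of exactly 0.5 is treated, so counting it as acceptable is an assumption, and the function name is hypothetical. The sample values are the smallest and largest Local R2 values shown in the map legend.

```python
def classify_local_r2(local_r2, threshold=0.5):
    """Label a polygon's local fit using the 0.5 rule of thumb."""
    # Assumption: a value exactly at the threshold counts as acceptable.
    return "acceptable" if local_r2 >= threshold else "poor"

lowest = classify_local_r2(0.231028)    # lowest Local R2 in the legend
highest = classify_local_r2(0.980726)   # highest Local R2 in the legend
```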

References: 2013 Standard Data Set.pdf

Projections: WGS1984

This product is intended for student training only.