Kano GIS Day 2014 - The Application of Multivariate Geostatistical Analyses in Environmental Data

Adamu Mustapha PhD

Upload: ehealth-africa

Post on 15-Jul-2015


Page 1: Kano GIS Day 2014 - The Application of Multivariate Geostatistical analyses in Environmental Data

Adamu Mustapha PhD

Page 2

There are a number of multivariate geostatistical analyses used in environmental studies to identify spatial and temporal variation in datasets:

1. Hierarchical Agglomerative Cluster Analysis (HACA)

2. Principal Component Analysis (PCA)

3. Multiple Linear Regression (MLR)

4. Pearson’s Product Moment Correlation Analysis

5. Discriminant Analysis

Page 3

1. Hierarchical Agglomerative Cluster Analysis (HACA)

HACA is a multivariate geostatistical technique whose primary purpose is to assemble similar objects based on the characteristics they possess (Shrestha and Kazama, 2007).

The levels of similarity at which observations are merged are used to construct a dendrogram of clusters (Singh et al., 2004; Chen et al., 2007; Juahir et al., 2011; Mustapha et al., 2012).

The resultant clusters exhibit high internal (within-cluster) homogeneity and high external (between-cluster) heterogeneity.
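The merging process described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the cited studies: the average-linkage rule, the Euclidean distance and the five two-variable "sites" are all assumptions made for the example, since the slides do not state which linkage or distance measure was used.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Straight-line distance between two observations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(points, members_a, members_b):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(points[i], points[j])
                for i in members_a for j in members_b)
    return total / (len(members_a) * len(members_b))

def agglomerate(points):
    """Merge the two closest clusters repeatedly until one remains.

    Returns the merge history as (members_a, members_b, distance) tuples,
    which is the information a dendrogram is drawn from.
    """
    clusters = {i: [i] for i in range(len(points))}
    next_id = len(points)
    history = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(
                points, clusters[pair[0]], clusters[pair[1]]),
        )
        distance = average_linkage(points, clusters[a], clusters[b])
        history.append((sorted(clusters[a]), sorted(clusters[b]), distance))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return history

# Hypothetical standardized measurements at five sampling sites
sites = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9), (9.0, 0.5)]
merges = agglomerate(sites)
for left, right, distance in merges:
    print(left, "+", right, "merged at distance", round(distance, 3))
```

Sites that merge at a small distance are similar to each other. Cutting the merge history at a chosen distance gives the groups of sites that would be read off a dendrogram.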

Page 4

[Figure: Source apportionment of the Jakara Basin (upstream), Nigeria. Map showing Jakara Dam and sampling sites S1 to S30, with domestic, industrial and agricultural sources marked. Scale bar: 0 to 3 km.]

Page 5
Page 6

2. Principal Component Analysis and/or Factor Analysis (PCA)

PCA is a multivariate geostatistical technique that examines the underlying pattern or relationships among a large number of variables. It is used to get information about inter-relationships among a set of variables.

PCA groups the variables into a smaller and more meaningful set of factors.

Page 7

How do we determine the number of factors to be retained?

We use Kaiser’s criterion, also known as the eigenvalue rule of > 1.

We also use Cattell’s scree plot.

It plots the eigenvalues. Looking for the point where the plot becomes horizontal, Cattell recommends retaining all the factors above this point.

Factors with eigenvalues of 1 or greater contribute the most variance in the data sets.
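As a small illustration of the eigenvalue rule, the sketch below keeps only the components whose eigenvalue exceeds 1. The eigenvalues are hypothetical (six standardized variables, so they sum to 6), and the function name is invented for this example.

```python
def kaiser_retain(eigenvalues, threshold=1.0):
    """Return (component number, eigenvalue) pairs with eigenvalue > threshold."""
    return [(i + 1, ev) for i, ev in enumerate(eigenvalues) if ev > threshold]

# Hypothetical eigenvalues of a 6 x 6 correlation matrix, largest first
eigenvalues = [2.90, 1.40, 1.05, 0.40, 0.15, 0.10]
retained = kaiser_retain(eigenvalues)
for number, ev in retained:
    print(f"Component {number}: eigenvalue {ev}")
```

A scree plot shows the same list graphically: the eigenvalues are plotted in decreasing order and the analyst looks for the point where the curve flattens.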

Page 8
Page 9

Important parameters in a factor have high factor loadings. Liu et al. (2003) suggest the following classification of loadings:

0 to 0.4: Low loading

0.5 to 0.7: Moderate loading

> 0.7: High loading
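The classification can be written as a small helper. Two points are assumptions of this sketch rather than statements from the slides: the sign of a loading is ignored (the absolute value is classified), and loadings that fall in the gap between 0.4 and 0.5 are treated as low. The three sample values come from the PC1 column of the loadings table in this transcript.

```python
def classify_loading(loading):
    """Label a factor loading as low, moderate or high by its absolute size."""
    size = abs(loading)          # assumption: the sign is ignored
    if size > 0.7:
        return "high"
    if size >= 0.5:
        return "moderate"
    return "low"                 # assumption: 0.4 to 0.5 is counted as low

# PC1 loadings for Pb, EC and TS from the loadings table
for name, value in [("Pb", 0.960), ("EC", -0.659), ("TS", 0.042)]:
    print(name, value, classify_loading(value))
```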

Page 10

Parameters   Unit          PC1     PC2     PC3     PC4     PC5
Pb           mg L-1      0.960  -0.111   0.105   0.060  -0.098
Cd           mg L-1      0.953  -0.101   0.145  -0.049  -0.040
Cr           mg L-1      0.940  -0.037   0.148  -0.049  -0.060
Hg           mg L-1      0.854  -0.070   0.020  -0.086  -0.299
Fe           mg L-1      0.706   0.172   0.327  -0.353  -0.078
EC           µS/cm      -0.659  -0.139  -0.181   0.094  -0.144
Ni           mg L-1      0.620  -0.014  -0.331   0.602  -0.234
BOD5         mg L-1      0.537   0.594  -0.114  -0.399   0.399
DS           mg L-1     -0.156   0.835   0.557   0.074  -0.167
TS           mg L-1      0.042   0.670   0.514   0.133  -0.036
pH           -          -0.418   0.260   0.633   0.217  -0.073
DO           mg L-1      0.166  -0.583  -0.617  -0.017   0.240
COD          mg L-1      0.332   0.527   0.565   0.187   0.090
Turbidity    NTU         0.342  -0.009   0.126   0.788   0.191
Hardness     mg L-1      0.178  -0.257   0.367   0.162   0.809
Eigenvalue                5.53    2.21    1.96    1.42    1.13
% Variance               36.91   14.79   13.06    9.49    7.57
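The last two rows of the table are linked. Assuming the PCA was run on standardized variables (a correlation matrix), which the slides do not state, each of the 15 parameters contributes one unit of variance, so a component's share is its eigenvalue divided by 15. The sketch below checks this against the reported figures; the small differences are consistent with the eigenvalues having been rounded to two decimals.

```python
# Eigenvalues and % variance as reported in the table
eigenvalues = {"PC1": 5.53, "PC2": 2.21, "PC3": 1.96, "PC4": 1.42, "PC5": 1.13}
reported_pct = {"PC1": 36.91, "PC2": 14.79, "PC3": 13.06, "PC4": 9.49, "PC5": 7.57}

n_variables = 15  # number of parameters listed in the table

# Assumption: total variance equals the number of standardized variables
computed_pct = {pc: 100.0 * ev / n_variables for pc, ev in eigenvalues.items()}
cumulative_reported = sum(reported_pct.values())

for pc in eigenvalues:
    print(pc, "computed", round(computed_pct[pc], 2), "reported", reported_pct[pc])
print("Cumulative variance (reported):", round(cumulative_reported, 2))
```

Summing the reported percentages shows that the five retained components together account for about 81.8% of the total variance.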

Page 11

3. Multiple Linear Regression (MLR)

MLR is used to fit a model to the data and use it to predict the value of the dependent variable (Y, the DV) from one or more independent variables (IVs).

It predicts an outcome from one or several predictors.

A mathematical technique, the least squares method (LSM), is used to establish the line that best describes the data.

Friday, November 28, 2014 11

Page 12

Regression analysis is used to derive a prediction equation:

$$Y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_p x_p$$

Where:

Y = dependent variable

x_1, ..., x_p = independent variables

b_0 = Y-intercept

b_1, ..., b_p = regression coefficients
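To make the least squares step concrete, the following sketch estimates the intercept and regression coefficients by solving the normal equations with plain Python. The data are hypothetical and were generated exactly from Y = 2 + 0.5 x1 - 1.5 x2, so the fit should recover those values. The function names are illustrative; in practice a statistics package would be used.

```python
def fit_mlr(X, y):
    """Least-squares estimates [b0, b1, ..., bp] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]      # leading 1 carries the intercept b0
    k = len(rows[0])
    # Augmented normal equations: (X'X | X'y)
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [v / scale for v in M[col]]
        for r in range(k):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][k] for i in range(k)]

def predict(coefficients, x):
    """Apply Y = b0 + b1*x1 + ... + bp*xp to one observation."""
    return coefficients[0] + sum(b * xi for b, xi in zip(coefficients[1:], x))

# Hypothetical data generated from Y = 2 + 0.5*x1 - 1.5*x2
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [-0.5, 1.5, -2.5, -0.5, -3.0]

b = fit_mlr(X, y)
print("b0, b1, b2 =", [round(v, 3) for v in b])
print("Prediction for x1=6, x2=2:", round(predict(b, (6, 2)), 3))
```

The error raised for a zero pivot is the extreme case of collinearity: when one predictor is an exact linear combination of the others, the coefficients cannot be determined at all.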

Page 13

Before interpreting the results of MLR, the assumptions of regression analysis need to be checked, i.e. normality, linearity and multicollinearity (Berry, 1993).


Page 14

The normal P-P plot of the regression standardized residuals showed that all observed values fall roughly along the straight line. This indicates that the residuals are from a normally distributed population.
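The points of a normal P-P plot can be computed directly, which shows what "falling along the straight line" means. This is a sketch under stated assumptions: the residuals are hypothetical, and the observed cumulative probability uses the plotting position (i + 0.5) / n, which is only one of several conventions and may differ from the one used by the software that produced the slide.

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pp_points(residuals):
    """Return (observed, expected) cumulative probabilities, one pair per residual."""
    n = len(residuals)
    mean = sum(residuals) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in residuals) / (n - 1))
    standardized = sorted((r - mean) / sd for r in residuals)
    # Assumption: plotting position (i + 0.5) / n for the observed probability
    return [((i + 0.5) / n, normal_cdf(z)) for i, z in enumerate(standardized)]

# Hypothetical regression residuals
residuals = [-1.8, -0.9, -0.4, -0.1, 0.2, 0.5, 1.0, 1.5]
points = pp_points(residuals)
largest_gap = max(abs(observed - expected) for observed, expected in points)

for observed, expected in points:
    print(round(observed, 3), round(expected, 3))
print("Largest departure from the diagonal:", round(largest_gap, 3))
```

When the residuals are close to normal, each observed probability is close to its expected probability, so the points lie near the diagonal and the largest gap is small.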


Page 15

Assumptions of the linear regression model

Collinearity/Multicollinearity

A problem that occurs when IVs are highly correlated with one another, which makes it difficult to determine the contribution of each IV. It is assessed using:

Tolerance value

Variance Inflation Factor (VIF)

Condition index

Page 16

a. Tolerance

This is the amount of variability in an IV that is not explained by the other IVs. A small tolerance value (smaller than 0.10) indicates high multicollinearity.

b. Variance Inflation Factor (VIF)

This is the inverse of the tolerance, so its minimum value is 1.0. Corresponding to the tolerance cutoff of 0.10, a VIF greater than 10 indicates high multicollinearity.
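The link between the two statistics can be shown with a short sketch. It covers only the simplest case of two IVs, where the variability of one IV explained by the other is the squared Pearson correlation between them, so tolerance = 1 - r^2 and VIF = 1 / tolerance. The concentrations are hypothetical and the function names are invented for the example. With more than two IVs, r^2 is replaced by the R^2 from regressing each IV on all the others.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long lists."""
    n = len(x)
    numerator = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    denominator = math.sqrt(
        (n * sum(a * a for a in x) - sum(x) ** 2)
        * (n * sum(b * b for b in y) - sum(y) ** 2)
    )
    return numerator / denominator

def tolerance_and_vif(x1, x2):
    """Tolerance and VIF for a model with exactly two IVs (assumption)."""
    r_squared = pearson_r(x1, x2) ** 2
    tolerance = 1.0 - r_squared
    return tolerance, 1.0 / tolerance

# Hypothetical concentrations of two metals that rise and fall together
metal_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
metal_b = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9]

tolerance, vif = tolerance_and_vif(metal_a, metal_b)
print("Tolerance:", round(tolerance, 4))
print("VIF:", round(vif, 1))
print("High multicollinearity:", tolerance < 0.10)
```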

Page 17

c. Condition index statistics

Condition Index (CI) is a measure of the relative amount of variance associated with an eigenvalue. A large CI indicates a high degree of collinearity.

A CI value greater than 15 indicates a possible problem and an index greater than 30 suggests a serious problem with collinearity (Kutner et al., 2004).
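A brief sketch of how condition indices are usually obtained: each index is the square root of the largest eigenvalue divided by the eigenvalue in question. That formula is the standard textbook definition and is not given on the slide. The eigenvalues below are hypothetical, and the thresholds of 15 and 30 are the ones quoted above.

```python
import math

def condition_indices(eigenvalues):
    """Square root of (largest eigenvalue / each eigenvalue)."""
    largest = max(eigenvalues)
    return [math.sqrt(largest / ev) for ev in eigenvalues]

def collinearity_flag(ci):
    """Apply the 15 and 30 thresholds quoted from Kutner et al. (2004)."""
    if ci > 30:
        return "serious problem"
    if ci > 15:
        return "possible problem"
    return "no concern"

# Hypothetical eigenvalues for a model with a constant and three IVs
eigenvalues = [3.70, 0.28, 0.016, 0.004]
indices = condition_indices(eigenvalues)
flags = [collinearity_flag(ci) for ci in indices]

for ev, ci, flag in zip(eigenvalues, indices, flags):
    print(ev, round(ci, 2), flag)
```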

Page 18

R = 0.986

R2 = 0.971

Model summary (Change Statistics: R Square Change, F Change, df1, df2 and Sig. F Change)

Model  R      R Square  Adjusted R Square  SE of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change  Durbin-Watson
1      0.986  0.971     0.840              2.331               0.971            7.382     15   5    0.018          2.651


Page 19

Model            Unstandardized Beta  Std. Error  Standardized Beta  t       Sig.   Tolerance  VIF
1 (Constant)     102.748              39.602                         2.594   0.018
  Iron mg/l      0.438                0.127       0.778              3.449   0.000  0.250      15.897
  Mercury mg/l   2.442                1.906       3.500              1.281   0.000  0.333      1304.69
  Chromium mg/l  -0.852               0.672       -3.188             -1.267  0.000  0.290      1105.85
  Cadmium mg/l   -5.695               2.019       -11.900            -2.821  0.000  0.540      3110.806
  Lead mg/l      3.719                1.317       12.358             2.823   0.001  0.889      3350.478

Estimates of coefficients for the model


From the table, the largest beta coefficient is 3.719 (lead), so this variable makes the strongest unique contribution to explaining the DV.

Page 20

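4. Pearson’s Product Moment Correlation Analysis

Identifies significant relationships between pairs of variables (bivariate relationships). The correlation coefficient r is computed as:

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$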

r value         Interpretation
0.0 to 0.29     Negligible or little correlation
0.3 to 0.49     Low correlation
0.5 to 0.69     Moderate or marked correlation
0.7 to 0.89     High correlation
0.9 to 1.00     Very high correlation

Table 2: Guilford's rule of thumb for interpreting correlation analysis (r)
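The computational formula for r and the rule of thumb in Table 2 can be combined in a short sketch. The x and y values are hypothetical. Classifying by the absolute value of r, so that negative correlations are labelled by their strength, is an assumption of this example.

```python
import math

def pearson_r(x, y):
    """Pearson's r from the sums of x, y, xy, x squared and y squared."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def guilford_label(r):
    """Interpret r using the bands in Table 2 (absolute value, by assumption)."""
    size = abs(r)
    if size >= 0.9:
        return "Very high correlation"
    if size >= 0.7:
        return "High correlation"
    if size >= 0.5:
        return "Moderate or marked correlation"
    if size >= 0.3:
        return "Low correlation"
    return "Negligible or little correlation"

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print("r =", round(r, 4), "->", guilford_label(r))
```

For these data the sums are n = 5, sum of x = 15, sum of y = 20, sum of xy = 66, sum of x squared = 55 and sum of y squared = 86, giving r = 30 / sqrt(1500), or about 0.77, which falls in the "High correlation" band.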

Page 21

               Headache  Fever    Backpain  JointPain  StickInjuries  Scabies  Rashes   Catarrh  Cough    Breathprob  Diarrhoea  EyeProblem  StomachPain
Headache       1         0.505    0.547     .788**     0.575          .679*    0.308    0.191    -0.419   0.268       0.043      0.184       -0.049
Fever                    1        0.624     0.525      0.571          .786**   0.183    0.498    0.085    .686*       0.204      .762*       0.619
Backpain                          1         0.537      .862**         .849**   .701*    0.543    0.156    .823**      0.056      0.344       0.352
JointPain                                   1          0.551          .672*    0.246    0.181    -0.246   0.422       -0.443     0.344       -0.171
StickInjuries                                          1              .827**   0.452    0.369    0.301    .778**      -0.083     0.207       0.064
Scabies                                                               1        0.58     0.389    -0.087   .816**      0.023      0.57        0.367
Rashes                                                                         1        0.23     -0.058   0.528       -0.047     0.134       0.324
Catarrh                                                                                 1        0.308    0.441       0.18       .656*       0.437
Cough                                                                                            1        0.314       -0.221     -0.012      0.054
Breathprob                                                                                                1           -0.141     0.534       0.346
Diarrhoea                                                                                                             1          0.058       0.624
EyeProblem                                                                                                                       1           0.602
StomachPain                                                                                                                                  1

Page 22

Thank you