acp langat river_2013
TRANSCRIPT
-
7/26/2019 ACP Langat River_2013
1/16
The Langat River Water Quality Index Based on Principal
Component Analysis
Zalina Mohd Ali (a,d), Noor Akma Ibrahim (a), Kerrie Mengersen (b), Mahendran Shitan (a) and Hafizan Juahir (c)
(a) Institute for Mathematical Research, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor DE, Malaysia
(b) School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
(c) Faculty of Environmental Studies, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor DE, Malaysia
(d) School of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor DE, Malaysia
Abstract. The River Water Quality Index (WQI) is calculated using an aggregation function of six water quality sub-indices, together with their relative importance or weights. The formula is used by the Department of Environment (DOE) to indicate the general status of rivers in Malaysia. The six selected water quality variables used in the formula are suspended solids (SS), biochemical oxygen demand (BOD), ammoniacal nitrogen (AN), chemical oxygen demand (COD), dissolved oxygen (DO) and pH. The sub-index calculations, determined by quality rating curves, and their weights were based on expert opinions. However, the use of sub-indices and the relative importance established in the formula is very subjective in nature and does not consider the inter-relationships among the variables. These relationships are important because of the multi-dimensional and complex characteristics of river water. Therefore, a well-known multivariate technique, Principal Component Analysis (PCA), was proposed to re-calculate the water quality index, specifically for the Langat River, based on the inter-relationship approach. The application of this approach is not well studied in river water quality index development in Malaysia. Hence, the approach in this study is relevant and important, given that the first river water quality index development took place in 1981. The PCA results showed that the weights obtained rank the relative importance of particular variables differently from the classical approach used in WQI-DOE. Based on the new weights, the Langat River water quality index was calculated, and the comparison between both indexes is also discussed in this paper.
Keywords: Water Quality Index (WQI), Principal Component Analysis (PCA), Langat River
PACS: 92.40.qc, 92.40.Cy, 92.40.qh
INTRODUCTION
Water quality is generally described according to biological, chemical and physical properties [1]. Based on these properties, the quality of water can be expressed via a numerical index (i.e. the Water Quality Index (WQI)) by combining measurements of selected water quality variables. The selected water quality variables were identified with respective weights, and the determining processes were based on personal evaluation, namely opinion-gathering techniques [2]. The weighting assigned to the selected variables was based on the relative importance given by the experts [3]. This weight determination technique has also been retained by other researchers and agencies, including the Malaysian Department of Environment (DOE) [4]. The selected variables, together with their respective weights, are applied to calculate the water quality index in all rivers in Malaysia. Due to the varying characteristics of each river, the weights for water quality variables may differ between rivers. Therefore, it is clear that the existing weights of the selected variables, as per DOE, are subjective in nature, and no detailed studies have been done to determine the weights objectively. Objective weights can be obtained by using multivariate statistical techniques such as principal component analysis (PCA).
Principal component analysis (PCA) is a statistical method used to determine a new set of artificial variables, namely principal components (PCs). These are linear combinations of the original variables. The method is also known as a variable reduction procedure. It obtains a small number of components that explain most of the variability of the original data set. In river water quality data analysis, PCA has been utilized to characterize
Proceedings of the 20th National Symposium on Mathematical Sciences
AIP Conf. Proc. 1522, 1322-1336 (2013); doi: 10.1063/1.4801283 2013 AIP Publishing LLC 978-0-7354-1150-0/$30.00
and evaluate freshwater quality for different seasons [5], to divide selected water quality variables into groups [6] and to identify factors, based on the compositional patterns of water quality variables, that influence particular regions [7].
PCA has also been used for water quality index development. The water quality index calculation is based on modified un-rotated principal components and provides valuable insights into the relationship between water quality and biological community composition [8]. In addition, the water quality index can also be calculated as the weighted sum of all principal component scores [9] or the sum of selected principal component scores [10]. The PCA approach was also found to be useful in developing a coastal marine eutrophication index based on the criterion that the first principal component accounts for more than 50% of the whole variation [11]. In that study, no data transformations were considered and data were randomly selected from three different coastal areas.
In river water quality studies, data with no gaps were used in all of the index developments based on PCA. In addition, selected imputation techniques have been used to estimate missing values [10]. However, there is a large number of missing data at all sampling sites in this study (refer to Figure 1). As a result, the traditional PCA approach (i.e. without normality assumptions or outlier detection) with complete case analysis for data dimension reduction may result in misleading conclusions in determining the index. Due to this problem, we have not considered unobserved data in this study. Since missing data were not imputed, we have limited the analysis to dealing with water quality data using a parametric approach (i.e. assuming that the distribution of water quality measurements is approximately multivariate normal with no extreme outliers). We firmly believe that the PCA approach works better under the multivariate normality assumption and that the assumptions made will be useful in future inferences. We may consider PCA with missing data values in our next analysis, and we propose that the extreme values (outliers) should be considered in a different analysis or retained in the same analysis using robust PCA. Instead of using the principal component scores, researchers have also used information from rotated principal components to develop an index [12]. In addition, researchers have summarized information from rotated principal components in order to calculate positive relative weights that were used in the index calculation [13]. However, negative relative weights are also important, especially in river water quality, as they show the natural effect between the variables. Therefore, in this study, we examine the possibility of developing a PCA index, taking into account six physicochemical variables, and suggest new scales for assessing water quality on the proposed index. The proposed index is based on new statistical weights defined by the variables' statistical importance from the combination of more than one PC, with consideration given to negative PC loadings (i.e. correlations between the PCs and standardized data). The proposed methodology is applied to the Langat River in Selangor, Malaysia. The results obtained are compared with the existing PCs method, and the relationship between the new index and DOE-WQI is also discussed.
FIGURE 1. Plots of dissolved oxygen (DO) in the lower stream of the Langat River from September 1995 to December 2007
METHODS
Data and Monitoring Sites
In this analysis, data for Langat River was collected based on the availability of recorded data from 1995-2007.
Five main monitoring stations were selected, as illustrated in Table 1, and the location of the selected stations is
shown in Figure 2. Data from 2008-2009 was used as an independent dataset for the validation of the proposed PCA index.
TABLE (1). DOE sampling stations at the study area
FIGURE 2. Locations of the selected sampling stations
Based on the collected data, the water quality status of the Langat River can be evaluated using the Water Quality Index (WQI) calculation. The WQI is formed as a weighted sum of six selected water quality variables, namely: Suspended Solids (SS), Biochemical Oxygen Demand (BOD), Ammonia Nitrogen (AN), Chemical Oxygen Demand (COD), Dissolved Oxygen (DO) and pH [4]. These variables were selected by a panel of experts as the variables that give some indication of the water quality level or water quality index of a river. The relative importance or weights determined by the experts are shown in Table 3. The weight for each variable indicates the relative importance of that variable in determining the water quality index in Malaysia, and details are discussed in Mustapha (1981) [14].
TABLE (3). The relative importance determined by the experts
Water Quality Variables        Weights
Dissolved Oxygen (%)           0.22
Biochemical Oxygen Demand      0.19
Chemical Oxygen Demand         0.16
Ammonia Nitrogen               0.15
Suspended Solids               0.16
Potential of Hydrogen          0.12
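As a minimal sketch of the weighted-sum aggregation described above, the snippet below combines six sub-index values with the Table 3 weights. The function name `doe_wqi`, the assumption that each sub-index lies on a 0-100 scale, and the example sub-index values are illustrative only; the rating curves that produce the sub-indices are not reproduced here.

```python
# Expert-opinion weights from Table 3; they sum to 1.
DOE_WEIGHTS = {
    "DO": 0.22, "BOD": 0.19, "COD": 0.16,
    "AN": 0.15, "SS": 0.16, "pH": 0.12,
}


def doe_wqi(sub_indices):
    """Return the weighted sum of the six sub-indices (assumed 0-100 each)."""
    missing = set(DOE_WEIGHTS) - set(sub_indices)
    if missing:
        raise ValueError(f"missing sub-indices: {sorted(missing)}")
    return sum(weight * sub_indices[name] for name, weight in DOE_WEIGHTS.items())


# Hypothetical sub-index values for one sample (not taken from the paper).
example_sub_indices = {
    "DO": 80.0, "BOD": 70.0, "COD": 65.0,
    "AN": 60.0, "SS": 75.0, "pH": 95.0,
}
example_wqi = doe_wqi(example_sub_indices)
```

Because the weights sum to one, the resulting index stays on the same scale as the sub-indices.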
Data Treatment
PCA was applied to the physicochemical variables defined by the DOE in Table 3. In this study, the PCA approach requires variables to conform to a normal distribution. Some variables showed values that were either too low or too high, with high skewness and kurtosis in the original data. The results in Table 4 show that DO (%) and pH tested as normal, while the other water quality variables were identified as non-normally distributed. All the non-normal variables were log10-transformed. This method is very common for environmental data [15]. After the selected water quality data was transformed, univariate normality for each variable was checked, and the results are shown in Table 5. This is normal practice among researchers as part of data screening. Detection of the
DOE Station  Station  DOE Station  Distance From  Grid Reference          Location
Number       Number   Code         Estuary (Km)
2814602      1        IL01         4.19           2°52.027' 101°26.241'   Air Tawar Village
2815603      2        IL02         33.49          2°48.952' 101°30.780'   Telok Datuk, near Banting town
2817641      3        IL03         63.43          2°51.311' 101°40.882'   Bridge at Dengkil Village
2917642      4        IL05         86.94          2°59.533' 101°47.219'   Kajang Bridge
3118647      5        IL07         113.99         3°09.953' 101°50.926'   Bridge at Batu 18
presence of outliers in water quality data is also important because outliers affect the normality of the data. Detailed discussion on the evaluation of water quality data for outliers can be found in Robinson et al. [16]. The univariate approach treats the variables as independent and uncorrelated, so checking each variable separately may not be very helpful given the relationships that exist among water quality variables. To the best of our knowledge, most researchers in previous studies did not discuss in detail the existence of outliers in water quality data in their analyses. Apart from that, outliers were retained in the multivariate analysis, as stated in Gazzaz et al. [17]. Since outlier detection is also important in the analysis, this commonly used practice may not be very beneficial in confirming multivariate normality. If the variables are shown to be multivariate normal, then each variable is univariate normal as well. However, univariate normal variables are not necessarily multivariate normal [18]. Therefore, the data must be screened for multivariate outliers, and this can be achieved by using the Mahalanobis distance with an associated p-value threshold of 0.001. The data can be regarded as multivariate normal if the plot of the distances forms a straight-line pattern or the associated p-values are greater than 0.001. If the associated p-value is smaller than the threshold, an influential outlier has been detected. We should stress that the influential outliers detected from the Mahalanobis distance were excluded from our study. This is due to the sensitivity of multivariate techniques to the presence of outliers [18].
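The screening rule above can be sketched as follows. This is an illustration under stated assumptions, not the authors' SAS or SPSS workflow: the helper names `chi2_sf_even` and `mahalanobis_screen` are hypothetical, the squared distance is compared with a chi-square distribution whose degrees of freedom equal the number of variables, and the closed-form tail probability used here is only valid for an even number of variables (six in this study). The demonstration data are simulated.

```python
import math

import numpy as np


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; exact closed form for even df."""
    if df % 2 != 0:
        raise ValueError("this closed form needs an even number of variables")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def mahalanobis_screen(data, alpha=0.001):
    """Return squared Mahalanobis distances, p-values and outlier flags."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(data, rowvar=False))
    # Row-wise quadratic form: d2[i] = centered[i] @ inv_cov @ centered[i]
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    p_values = np.array([chi2_sf_even(v, data.shape[1]) for v in d2])
    return d2, p_values, p_values < alpha


# Simulated six-variable data with one deliberately extreme row appended.
rng = np.random.default_rng(0)
demo = np.vstack([rng.normal(size=(200, 6)), [8, -8, 8, -8, 8, -8]])
d2, p_values, is_outlier = mahalanobis_screen(demo)
```

Rows flagged in `is_outlier` would be the candidates for removal before the PCA is run.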
TABLE (4). Descriptive Statistics (n=466) of the untransformed selected Water Quality Variables in the Langat River
Variables  Min     Max      Mean    Standard   Median  Trimmed Mean  Skewness  Kurtosis
                                    Deviation          (5%)
DO         0.00    116.20   61.48   27.48      64.65   62.22         -0.36     -0.71
BOD        0.30    80.80    5.71    7.27       4.00    4.66          4.91      35.86
COD        0.50    3850.00  49.70   185.46     33.00   35.86         18.93     382.73
AN         0.0050  14.30    1.17    1.30       0.78    1.035         3.17      22.56
SS         0.50    5010.00  294.43  490.19     157.00  220.89        5.15      38.51
PH         3.60    8.34     6.66    0.75       6.84    6.72          -1.40     2.55
Principal Component Analysis
PCA was applied to the standardized, log-transformed physicochemical variables after multivariate normality was confirmed. The descriptive statistics of the multivariate normal data are shown in Table 6 and visualized in Figure 3. Standardization minimizes the differences in scale between measurement units and variances [19]. All six PCs were derived from the SAS PRINCOMP procedure and from the default settings of the SAS PROC FACTOR procedure. The results from both procedures were the same, and additional information on the factor loadings based on PCA was obtained from the PROC FACTOR procedure. The PCA scores can be calculated by using equation 1.
$$PC_{ij} = e_{i1} Z_{j1} + e_{i2} Z_{j2} + \dots + e_{i6} Z_{j6} \tag{1}$$
where PC is the principal component score, e is the component loading or weight obtained from the eigenvector, Z is the standardized transformed data, i is the component number and j is the sample number. In PCA, the first PC captures the largest variability of the original data set and is obtained from the linear combination of the variables with maximal variance. The second PC is the linear combination with the next largest variability and is uncorrelated with (orthogonal to) the first component. All PCs are arranged in decreasing order of importance according to their variability. Usually, the WQI was calculated as the weighted sum of site scores from all possible axes where, theoretically, the number of possible axes is equal to the number of variables used. Instead of using all the PCs to calculate the WQI score as suggested by Chow-Fraser [9], or selected PCs with eigenvalues of 1 or greater as proposed by Kaiser in 1958 [20], we calculated the index based on the idea of positive weight loadings in rotated PCs by Hurdlikova and Fischer [13], with consideration given to the negative weight loadings in un-rotated PCs.
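Equation 1 is a dot product between one eigenvector and one standardized sample. The sketch below evaluates it for the first PC using the PC1 eigenvector reported in Table 10; the function name `pc_score` and the standardized sample values are hypothetical and chosen only to show the sign behaviour.

```python
# PC1 eigenvector entries as reported in Table 10.
PC1_LOADINGS = {
    "DO": -0.45, "BOD": 0.40, "COD": 0.44,
    "AN": 0.43, "SS": 0.43, "PH": -0.27,
}


def pc_score(loadings, z_values):
    """Equation 1: sum over variables of loading times standardized value."""
    return sum(loadings[name] * z_values[name] for name in loadings)


# Hypothetical standardized sample: low DO and pH, elevated pollutants.
polluted_z = {"DO": -1.0, "BOD": 1.0, "COD": 1.0, "AN": 1.0, "SS": 1.0, "PH": -0.5}
polluted_pc1 = pc_score(PC1_LOADINGS, polluted_z)
```

With these loadings, a sample with depleted DO and elevated pollutants receives a positive PC1 score, and a sample at the mean of every variable scores zero.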
TABLE (5). Descriptive Statistics (n=453) of the transformed selected Water Quality Variables
Variables  Min    Max     Mean   Standard   Median  Trimmed Mean  Skewness  Kurtosis
                                 Deviation          (5%)
DO         0.00   116.20  61.53  27.26      64.70   62.28         -0.37     -0.65
BOD        -0.30  1.91    0.57   0.37       0.60    0.56          0.29      0.05
COD        0.60   2.22    1.51   0.24       1.52    1.52          -0.23     1.44
AN         -2.30  1.16    -0.32  0.81       -0.11   -0.27         -1.29     0.90
SS         -0.30  3.70    2.06   0.71       2.20    2.10          -0.92     1.07
PH         4.05   8.34    6.67   0.71       6.83    6.72          -1.20     1.81
TABLE (6). Descriptive Statistics (n=446) of the transformed selected Water Quality Variables
Variables  Min    Max     Mean   Standard   Median  Trimmed Mean  Skewness  Kurtosis
                                 Deviation          (5%)
DO         0.00   116.20  61.6   27.33      64.95   62.36         -0.37     -0.65
BOD        -0.30  1.91    0.58   0.37       0.60    0.57          0.31      0.05
COD        0.70   2.22    1.52   0.23       1.52    1.52          -0.07     1.20
AN         -2.30  1.16    -0.32  0.80       -0.11   -0.26         -1.3      0.95
SS         -0.30  3.70    2.06   0.70       2.2     2.1           -0.94     1.16
PH         4.05   8.34    6.69   0.68       6.84    6.74          -1.12     1.64
FIGURE 3. Probability plots of Mahalanobis Distance before Outliers Deletion (Left) and after Outliers Deletion (Right)
RESULTS
Correlation Analysis
The Pearson correlation matrices for the variables are given in Tables 7-8, and most of the paired water quality variables show similar results before and after outlier deletion. However, the relationships for SS-BOD, SS-COD, COD-DO and COD-BOD differ slightly before and after outlier deletion. A significant positive correlation was found for most of the paired water quality variables, apart from the negative correlations between DO and the pollutants. A significant positive correlation was also found between the two indicators DO and PH. A weak positive correlation between BOD and PH shows that these two variables may be redundant.
TABLE (7). The Pearson correlation matrix of the Untransformed Selected Water Quality Variables (N=466)
Variables  DO              BOD            COD             AN              SS
BOD        -0.355 (0.000)
COD        -0.427 (0.000)  0.532 (0.000)
AN         -0.504 (0.000)  0.460 (0.000)  0.315 (0.000)
SS         -0.488 (0.000)  0.411 (0.000)  0.474 (0.000)   0.464 (0.000)
PH         0.508 (0.000)   0.007 (0.875)  -0.104 (0.024)  -0.283 (0.000)  -0.272 (0.000)
* corresponding p-values in brackets.
TABLE (8). The Pearson correlation matrix of the Transformed Selected Water Quality Variables (N=446)
Variables  DO              BOD            COD             AN              SS
BOD        -0.366 (0.000)
COD        -0.501 (0.000)  0.629 (0.000)
AN         -0.502 (0.000)  0.480 (0.000)  0.388 (0.000)
SS         -0.484 (0.000)  0.385 (0.000)  0.513 (0.000)   0.476 (0.000)
PH         0.530 (0.000)   0.010 (0.836)  -0.182 (0.000)  -0.289 (0.000)  -0.285 (0.000)
* corresponding p-values in brackets.
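Since PCA on standardized variables is an eigen-decomposition of their correlation matrix, the Table 8 correlations are enough to sketch that step. The snippet below is an illustration using NumPy rather than the SAS procedures named in the paper; the helper name `correlation_matrix` is hypothetical, and because the published correlations are rounded to three decimals, the eigenvalues it produces should only be expected to agree approximately with the reported ones.

```python
import numpy as np

VARIABLES = ["DO", "BOD", "COD", "AN", "SS", "PH"]

# Lower triangle of Table 8, one list per row (BOD, COD, AN, SS, PH).
LOWER_TRIANGLE = [
    [-0.366],
    [-0.501, 0.629],
    [-0.502, 0.480, 0.388],
    [-0.484, 0.385, 0.513, 0.476],
    [0.530, 0.010, -0.182, -0.289, -0.285],
]


def correlation_matrix(lower):
    """Build the full symmetric matrix with ones on the diagonal."""
    size = len(lower) + 1
    corr = np.eye(size)
    for row, values in enumerate(lower, start=1):
        for col, value in enumerate(values):
            corr[row, col] = corr[col, row] = value
    return corr


corr = correlation_matrix(LOWER_TRIANGLE)
eigenvalues, eigenvectors = np.linalg.eigh(corr)  # ascending order
order = np.argsort(eigenvalues)[::-1]             # largest first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
n_retained = int((eigenvalues > 1.0).sum())       # eigenvalue-greater-than-1 rule
```

Note that the sign of each eigenvector returned by `eigh` is arbitrary, so a column may appear with all signs flipped relative to a published table.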
River Water Quality Index Development
The eigenvalues and contributions of the principal components are listed in Table 9. The eigenvalues of the first, second and third principal components were 3.05, 1.15 and 0.60 respectively, and the corresponding cumulative contributions were 51%, 70% and 80%. Only the results for the first and second principal components are discussed (i.e. those with eigenvalues greater than 1). The eigenvectors (loadings) for each axis are shown in Table 10. The loadings indicate the relative importance of each variable within the individual axes. The importance can be determined from the absolute magnitude of the eigenvector loadings. No specific rules were used for picking out the loadings; in this study, loadings greater than 0.40 were treated as large. The eigenvectors were then used to determine the latent variables (i.e. PC scores that signify water quality scores) [10]. When normalized, the scores give the values of the water quality indices under the normal distribution. The calculated indices are areas under the curve and may be regarded as a degree of pollution [12]. The values are then multiplied by 100 to give a range of 0-100, with zero representing good water quality. Conversely, other researchers used the sum of weighted PC scores to obtain water quality indices with zero representing low water quality [9]. The number of variables entered into the PCA gives the same number of possible PCs. In our case, since six variables were entered, six possible axes or PCs were fitted. From Table 10, the first PC appears to have large positive loadings on COD, AN and SS and a large negative loading on DO. This means that these four variables determine more of the variance explained by the first PC. The second PC has large loadings on BOD and pH, and these variables determine more of the variance explained by the second PC. The results also appear consistent with the correlation analysis.
TABLE (9). Summary of eigenvalues produced by PCA using Standardized Values of Six Water-Quality Variables
PC Axis  Eigenvalue  Proportion Of Variation  Cumulative Proportion Of
                     Explained                Variation Explained
1        3.05        0.51                     0.51
2        1.15        0.19                     0.70
3        0.60        0.10                     0.80
4        0.55        0.09                     0.89
5        0.35        0.06                     0.95
6        0.29        0.05                     1.00
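The proportion and cumulative columns of Table 9 follow directly from the eigenvalues. A short sketch, assuming each proportion is the eigenvalue divided by the sum of the six reported eigenvalues (the function name `explained_variance` is illustrative):

```python
EIGENVALUES = [3.05, 1.15, 0.60, 0.55, 0.35, 0.29]  # Table 9


def explained_variance(eigenvalues):
    """Return per-axis and cumulative proportions of variation explained."""
    total = sum(eigenvalues)
    proportions = [value / total for value in eigenvalues]
    cumulative = []
    running = 0.0
    for share in proportions:
        running += share
        cumulative.append(running)
    return proportions, cumulative


proportions, cumulative = explained_variance(EIGENVALUES)

# Axes kept under the eigenvalue-greater-than-1 rule used in the text.
retained_axes = [axis for axis, value in enumerate(EIGENVALUES, start=1) if value > 1.0]
```

Rounded to two decimals, the computed columns reproduce the values in Table 9, and only the first two axes pass the eigenvalue threshold.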
TABLE (10). Eigenvectors produced by PCA using Standardized Values of Six Water-Quality Variables
Variables  PC1    PC2     PC3    PC4     PC5    PC6
DO         -0.45  0.31    0.133  0.24    0.77   0.16
BOD        0.40   0.53    0.01   -0.36   0.26   -0.61
COD        0.44   0.29    -0.54  -0.084  0.13   0.64
AN         0.43   0.005   0.82   -0.08   0.06   0.37
SS         0.43   -0.003  -0.04  0.87    0.03   -0.22
PH         -0.27  0.73    0.14   0.193   -0.56  0.11
Results of the un-rotated PCA/FA for the correlation coefficients (i.e. loadings between the two principal components and the standardized water quality variables) are shown in Table 11. The loadings are classified as strong or high for absolute loading values of >0.75, moderate for values of 0.75-0.50 and weak or low for values of 0.50-0.30 [21]. A high loading between the first PC and a variable indicates that the variable is related to the maximum amount of variation in the dataset. A strong association between the second PC and a variable indicates that the variable is responsible for the next largest variation in the data perpendicular to the first PC. The squared loading of a variable on a principal component is the proportion of variance in that variable explained by the principal component, and the sum of the squared loadings for each principal component equals its eigenvalue (the variation explained, VE, in Table 11). The normalized squared component loadings were then used to group the variables, from the highest loading to the lowest, into temporary loadings [22]. The first temporary loadings include DO (with a weight
0.21), BOD (0.16), COD (0.20), AN (0.18) and SS (0.18), and the second temporary loading is formed by pH (0.54, from a component loading of 0.79). Subsequently, the actual weights were obtained by multiplying the temporary loading of each variable by the weight of its principal component, i.e. the proportion of the explained variance: 0.73 = 3.05/(3.05+1.15) for the first PC and 0.27 for the second. Finally, preserving the negative sign of the DO loading in the actual weights, the new PCA weights can be determined by normalizing the actual weights. The negative weights (as well as the positive weights) should be maintained in calculating the index as long as the sum of the weights is greater than zero.
The negative loading on DO shows the natural effect between DO and the other pollutant variables. Negative loadings on DO were also reported in other studies [6, 7, 15, 17, 23]. These are suspected to have come from domestic wastewater, wastewater treatment plants, industries and agricultural activities. Hence, it is clear that an increase in organic matter will decrease the DO [7]. In another study, DO was eliminated from the WQI calculation due to its negative weights or loadings [24]. However, we carried the negative sign of DO in the component loadings over to the new weights because of the natural effect (i.e. the influence of DO in determining the quality of water in the presence of the pollutants). For comparability, the final PCA weights were rescaled to sum to one (i.e. normalization of the actual weights). The weights determined were based on the effect that decreasing or increasing the water quality variable has on the quality of water. For instance, the negative sign of DO shows the effect of decreasing DO: the stronger the effect, the higher the DO score (i.e. the negative weight of DO multiplied by the standardized value of DO) will be in a polluted area compared to a clean area (refer to Figure 4). Conversely, the new positive weights of the pollutants (i.e. BOD, COD, AN and SS) show the increasing effect of the pollutants on the river water. The higher the effect of the pollutants, the higher the pollutant score will be. Positive BOD and COD are related to anthropogenic pollution sources, and were expected to come from point sources of pollution such as sewage treatment plants and industrial effluents. Positive BOD and AN also represent the influence of organic pollutants from point sources (such as discharge from wastewater treatment plants, domestic wastewater and industrial effluent), while positive COD and SS were explained as erosion from upland areas during rainfall events and soil cultivation. The presence of SS also reflects discharge from urban development areas involving land clearing, or more specifically surface runoff sources. Detailed information on the pollution sources can be found in Juahir et al. [7] and Mohd Nasir et al. [15].
On the other hand, the weights determined by DOE-WQI show the relative importance of the variables in determining the quality of water. This means that the higher the weight, the more important the variable. However, the natural effect of the water quality variables, which is based on the inter-relationship between the variables, is not clear from the DOE-WQI weights. Therefore, we firmly believe the new weights obtained in this study will give beneficial information on the status of the river from a different perspective.
TABLE (11). Summary of Component Loadings and New PCA Weights
Variables                  Component       Squared component     Temporary  Actual   PCA
                           loadings        loadings (normalized) loadings   Weights  Weights
                           PC 1    PC 2    PC 1    PC 2
DO                         -0.79   0.33    0.21    0.09          0.21       -0.15    -0.29
BOD                        0.69    0.57    0.16    0.28          0.16       0.11     0.22
COD                        0.78    0.32    0.20    0.09          0.20       0.14     0.28
AN                         0.74    0.01    0.18    0.00          0.18       0.13     0.25
SS                         0.75    0.00    0.18    0.00          0.18       0.13     0.26
PH                         -0.48   0.79    0.08    0.54          0.54       0.15     0.29
Variation Explained, VE    3.05    1.15
Proportion of VE           0.73    0.27
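The chain from component loadings to PCA weights in Table 11 can be traced with a few lines of arithmetic. The sketch below is one reading of the procedure, not a definitive implementation: it assumes each variable is assigned to the PC on which its absolute loading is largest, that its temporary loading is the squared loading normalized within that PC, and that the sign of the loading on the assigned PC is carried into the actual weight. The function name `pca_weights` is hypothetical, and because the inputs are the rounded published loadings, the output matches Table 11 only to within rounding.

```python
# Un-rotated component loadings (PC1, PC2) from Table 11.
LOADINGS = {
    "DO": (-0.79, 0.33), "BOD": (0.69, 0.57), "COD": (0.78, 0.32),
    "AN": (0.74, 0.01), "SS": (0.75, 0.00), "PH": (-0.48, 0.79),
}
EIGENVALUES = (3.05, 1.15)  # variation explained by PC1 and PC2


def pca_weights(loadings, eigenvalues):
    """Return signed PCA weights that sum to one."""
    n_pcs = len(eigenvalues)
    share = [value / sum(eigenvalues) for value in eigenvalues]
    squared_totals = [sum(row[k] ** 2 for row in loadings.values()) for k in range(n_pcs)]
    actual = {}
    for name, row in loadings.items():
        # Assumption: assign the variable to the PC with the largest |loading|.
        k = max(range(n_pcs), key=lambda idx: abs(row[idx]))
        temporary = row[k] ** 2 / squared_totals[k]
        sign = -1.0 if row[k] < 0 else 1.0
        actual[name] = sign * temporary * share[k]
    total = sum(actual.values())
    if total <= 0:
        raise ValueError("sum of signed actual weights must be positive")
    return {name: value / total for name, value in actual.items()}


weights = pca_weights(LOADINGS, EIGENVALUES)
```

Under these assumptions BOD is assigned to PC1 even though its normalized squared loading is larger on PC2, which is consistent with the temporary loading of 0.16 shown for BOD in Table 11.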
The PCA approach used in this study summarizes the relative importance based on the inter-relationship between the variables. The results in Table 11 show that all variables have a similar influence in determining the quality of water. In detail, PH and DO have the highest relative importance and BOD the lowest. The negative statistical weight of DO maintains the natural relationship between DO and the other pollutants. The positive statistical weight of PH also shows the natural effect of this variable on the quality of water. A study by Mamun et al. [25] suggested that pH should be monitored to assess the suitability of water for other usages. In the expert opinion approach in Table 3, on the other hand, DO (%) is claimed to have the highest relative importance, followed by BOD, with PH having the lowest. The weight assigned in DOE-WQI was in accordance with its relative importance in the overall quality of surface water for general purposes. Hence, we may consider that the strong negative weights on DO in the new WQI formula are related to the organic pollution suspected as coming
from domestic wastewater, wastewater treatment plants, industries, agricultural activities and forest areas [7]. Therefore, the following expression is used to calculate the PCA-WQI.
$$\text{PCA-WQI} = -0.29\,Z_{DO} + 0.22\,Z_{BOD} + 0.28\,Z_{COD} + 0.25\,Z_{AN} + 0.26\,Z_{SS} + 0.29\,Z_{PH} \tag{2}$$
The PCA-WQI values in this study lie in the interval between -3 and 3. The PCA-WQI score was transformed to the percentage of the normal score to give the PCA-WQI final score. The final score is more comparable with DOE-WQI, as the index is in the range of 0-100. The relationship between DOE-WQI and PCA-WQI in Figure 4 clearly indicates that the better the quality of the water, the lower the PCA-WQI value.
FIGURE 4. Plot of DOE-WQI versus PCA-WQI values
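A sketch of the PCA-WQI calculation and its conversion to a 0-100 final score is given below. Several points are assumptions rather than details confirmed by the text: variables are standardized with the Table 6 means and standard deviations, BOD, COD, AN and SS are log10-transformed first, the weights are those in Table 11 with the negative sign on DO, and the "percentage of the normal score" is taken to be 100 times the standard normal cumulative distribution evaluated at the score. The two samples are hypothetical.

```python
import math

# PCA weights from Table 11 (negative sign on DO).
PCA_WEIGHTS = {"DO": -0.29, "BOD": 0.22, "COD": 0.28, "AN": 0.25, "SS": 0.26, "PH": 0.29}

# (mean, standard deviation) of the transformed variables, Table 6.
TABLE_6 = {
    "DO": (61.6, 27.33), "BOD": (0.58, 0.37), "COD": (1.52, 0.23),
    "AN": (-0.32, 0.80), "SS": (2.06, 0.70), "PH": (6.69, 0.68),
}
LOG10_VARIABLES = {"BOD", "COD", "AN", "SS"}


def pca_wqi(sample):
    """Weighted sum of standardized (and, where needed, log10) values."""
    score = 0.0
    for name, weight in PCA_WEIGHTS.items():
        value = math.log10(sample[name]) if name in LOG10_VARIABLES else sample[name]
        mean, sd = TABLE_6[name]
        score += weight * (value - mean) / sd
    return score


def final_score(score):
    """Assumed mapping: 100 times the standard normal CDF of the score."""
    return 100.0 * 0.5 * (1.0 + math.erf(score / math.sqrt(2.0)))


# Hypothetical raw samples (DO in % saturation, pH in pH units).
clean_sample = {"DO": 95.0, "BOD": 1.0, "COD": 10.0, "AN": 0.05, "SS": 10.0, "PH": 7.0}
polluted_sample = {"DO": 20.0, "BOD": 15.0, "COD": 80.0, "AN": 4.0, "SS": 600.0, "PH": 6.5}
clean_index = final_score(pca_wqi(clean_sample))
polluted_index = final_score(pca_wqi(polluted_sample))
```

Under these assumptions a score of zero maps to 50, cleaner samples fall toward 0 and more polluted samples toward 100, which matches the direction described for Figure 4.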
Validation Analysis
In this study, we proposed new scores for the individual variables that preserve the same pattern as the first two PC scores. Both PC scores were calculated based on the weighted principal component scores as suggested by Chow-Fraser [9], since the results were more encouraging for our data. In Figure 5, the same pattern was found between the PC1 score and the new scores for the first important variables (i.e. DO, COD and SS). However, there was a slight difference in the BOD and AN patterns. A similar pattern was also found between the PC2 score and the new score for PH. A general WQI score was then calculated as: (i) the weighted sum of six PCs, (ii) the weighted sum of two PCs, and (iii) the sum based on the new PCA-WQI weights. The averages of the general scores (except the score in 1995 at Station 1) were calculated and plotted as shown in Figure 6. The plots show similar patterns in the scores (except for data in 1995 at Station 2). The results confirmed the potential of the third method, i.e. the new WQI score, which is easier to calculate than the existing methods using the weighted sum of six or two PCs. The new WQI values were clustered into groups using hierarchical agglomerative cluster analysis (HACA) (i.e. Ward's method with Euclidean distance as a measure of similarity). The calculation can be easily performed in the SPSS package, and the results are summarized in Table 12 below.
TABLE (12). Summary of the New PCA-WQI based on HACA
Group  Min    Max    Mean   SD     Status
1      27     74     53.57  13.36  Slightly Polluted
2      75.00  99.00  84.68  6.87   Polluted
3      1.00   25.00  9.70   7.10   Clean
FIGURE 5. Plot of the PC Scores and the Individual Variables Scores
We re-classified the PCA-WQI index range and slightly modified the range for the slightly polluted status, so that it is comparable with the DOE-WQI findings based on the expert opinion (EO) approach shown in Table 13. The new groups with selected ranges for PCA-WQI (refer to Table 14) were used in the validation on independent data from 2008-2009.
TABLE (13). Summary of the DOE-WQI based on EO
Group  Min    Max    Mean   SD    Status
1      81.00  96.00  90.36  4.13  Clean
2      60.00  80.00  69.15  5.42  Slightly Polluted
3      16.00  59.00  48.53  9.45  Polluted
TABLE (14). Index range of Water Quality based on PC-WQI and DOE-WQI
PC-WQI                           DOE-WQI
Index Range  Status              Index Range  Status
1-25 (3)     Clean               81-100 (1)   Clean
26-74 (2)    Slightly Polluted   60-80 (2)    Slightly Polluted
75-99 (1)    Polluted            0-59.4 (3)   Polluted
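The index ranges in Table 14 translate into two simple classification rules. The following sketch is illustrative: the function names are hypothetical, and the handling of values that fall between the printed ranges (for example a DOE-WQI between 59.4 and 60) is an assumption. The sample pairs are taken from Table 16.

```python
def classify_pca_wqi(index):
    """Status from the PC-WQI ranges in Table 14 (low values are clean)."""
    if index <= 25:
        return "Clean"
    if index <= 74:
        return "Slightly Polluted"
    return "Polluted"


def classify_doe_wqi(index):
    """Status from the DOE-WQI ranges in Table 14 (high values are clean)."""
    if index >= 81:
        return "Clean"
    if index >= 60:  # assumption: anything below 60 counts as polluted
        return "Slightly Polluted"
    return "Polluted"


# (PC, DOE) general scores from Table 16: 1995 Station 1, 1995 Station 5,
# 1997 Station 1 and 1995 Station 4.
pairs = [(56, 62), (2, 93), (88, 44), (74, 52)]
statuses = [(classify_pca_wqi(pc), classify_doe_wqi(doe)) for pc, doe in pairs]
```

The first three pairs fall in the same status under both indices, while the last pair illustrates a disagreement near the boundary of the slightly polluted range.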
The descriptive statistics of the transformed variables for 2008-2009 (Table 15) show all values in the range of the original values from the data set illustrated in Table 6. The results permitted us to use the standardized transformed data from 2008-2009 in the new WQI calculation.
TABLE (15). Descriptive Statistics (n=96) of the transformed selected Water Quality Variables for 2008-2009
Variables  Min    Max     Mean   Standard Deviation  Skewness  Kurtosis
DO         38.50  104.80  72.71  15.80               -0.05     -0.63
BOD        0.00   1.18    0.58   0.28                -0.32     -0.21
COD        0.30   1.85    1.35   0.28                -1.08     1.60
AN         -2.30  0.46    -0.62  0.87                -1.07     -0.16
SS         0.00   3.18    2.15   0.63                -1.27     1.83
PH         5.84   7.71    6.83   0.38                -0.47     -0.07
FIGURE 6. Plot of the general scores of WQI for three different methods at station 1-5
Combining the results from both methods, we summarized the general WQI score in Table 16. We then plotted the data from Table 16, as shown in Figure 7. The plots confirmed the inverse relationship between both methods.
TABLE (16). Summary of general scores for PCA-WQI and DOE-WQI in Station 1-5
       Station 1      Station 2      Station 3      Station 4      Station 5
Year   PC  DOE  N     PC  DOE  N     PC  DOE  N     PC  DOE  N     PC  DOE  N
1995   56  62   1     53  43   2     57  61   2     74  52   2     2   93   2
1996   77  46   3     63  48   4     62  60   4     86  35   3     11  92   2
1997   88  44   2     95  33   2     68  53   4     82  40   4     9   87   2
1998   56  58   4     74  39   5     80  48   6     80  49   6     9   90   5
1999   49  62   6     44  61   6     57  64   6     72  53   6     4   93   5
2000   58  60   5     64  43   6     64  54   5     73  53   6     17  88   6
2001   57  56   6     69  48   9     64  54   9     68  62   9     6   92   6
2002   44  67   5     55  51   11    66  60   12    65  61   12    2   94   6
2003   36  68   6     54  58   12    49  69   12    64  65   12    9   92   6
2004   41  75   6     57  63   12    61  67   12    68  61   12    5   90   6
2005   26  76   6     53  64   12    53  70   12    73  59   12    5   93   6
2006   28  74   6     46  72   11    56  72   12    63  69   12    9   92   6
2007   32  77   6     49  72   12    50  75   12    61  69   12    10  91   6
2008   37  73   6     45  70   12    36  76   12    53  68   12    5   93   6
2009   38  72   6     51  66   12    45  72   12    54  68   12    6   92   6
Classification Analysis
A cross-tabulation of the status from both methods was performed, as shown in Tables 17 and 18. The results show a high percentage of samples in the same category, indicating the ability of the new PCA-WQI to place samples in the same group as DOE-WQI, or vice versa. Overall, 81.2% of the original data set is classified in the same group, and 91.7% of the independent data set.
TABLE (17). Cross-tabulation analysis of PCA-WQI and DOE-WQI status from 1995-2007
                                 DOE-WQI
PCA-WQI            Clean  Slightly Polluted  Polluted  Total  Percentage of Same Agreement
Clean              73     18                 2         93     78.5
Slightly Polluted  1      199                53        253    78.7
Polluted           0      10                 90        100    90.0
Total              74     227                145       446
Percentage of
Same Agreement     98.6   87.7               62.1             81.2
TABLE (18). Cross-tabulation analysis of PCA-WQI and DOE-WQI status from 2008-2009
                                             DOE-WQI
PCA-WQI                        Clean   Slightly Polluted   Polluted   Total   Percentage of Same Agreement
Clean                          6       0                   0          6       100.0
Slightly Polluted              1       65                  0          66      98.5
Polluted                       0       7                   17         24      70.8
Total                          7       72                  17         96
Percentage of Same Agreement   85.7    90.3                100.0              91.7
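The agreement percentages in Tables 17 and 18 follow directly from the cell counts. The sketch below recomputes them: the diagonal cells are the observations given the same status by both indices. The function name agreement and the list-of-lists layout are our own illustrative choices, not the authors' code.

```python
# Cell counts from Tables 17 and 18.
# Rows: PCA-WQI status; columns: DOE-WQI status.
# Order in both directions: Clean, Slightly Polluted, Polluted.
table17 = [[73, 18, 2], [1, 199, 53], [0, 10, 90]]  # 1995-2007
table18 = [[6, 0, 0], [1, 65, 0], [0, 7, 17]]       # 2008-2009


def agreement(table):
    """Return (overall %, per-row %, per-column %) of same-status cases.

    Hypothetical helper; assumes a square table with no empty row or column.
    """
    size = len(table)
    total = sum(sum(row) for row in table)
    # Diagonal cells are cases where both indices give the same status.
    diagonal = [table[i][i] for i in range(size)]
    row_pct = [100 * diagonal[i] / sum(table[i]) for i in range(size)]
    col_pct = [
        100 * diagonal[i] / sum(row[i] for row in table) for i in range(size)
    ]
    return 100 * sum(diagonal) / total, row_pct, col_pct


overall17, rows17, cols17 = agreement(table17)
overall18, rows18, cols18 = agreement(table18)
print(round(overall17, 1), [round(p, 1) for p in rows17])
print(round(overall18, 1), [round(p, 1) for p in rows18])
```

For Table 17, for example, the overall figure is (73 + 199 + 90) / 446, which rounds to the 81.2% quoted in the text.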
FIGURE 7. Plots of the general scores of PCA-WQI and DOE-WQI in stations 1-5
CONCLUSIONS
The aim of this study was to develop a water quality index procedure for the Langat River based on an established method, PCA, using water quality data measured during 1995-2007. After a thorough evaluation, the distribution of the water quality measurements was assumed to be approximately multivariate normal with no extreme outliers. Unobserved data were not considered in this study. All selected water quality data were then analyzed using PCA. The results show strong positive loadings on BOD and COD, a relationship that represents influences from non-point sources (NPS) such as agricultural activities and forest areas. The strong negative loadings on DO are related to high levels of organic matter consuming large amounts of oxygen. On the other hand, the positive loadings on pH reflect natural effects on the Langat River water body; pH varied only slightly over the study period, whereas the other variables varied widely in the same river.
The loadings were then re-calculated to derive new statistical weights. Because the new weights are a modification of the variable loadings, they make the WQI calculation simpler and easier to handle. To validate the new weights, the PCA-WQI was compared with other existing PC-based methods. We found that the new weights used in the PCA-WQI calculation generated scores fairly similar to those of the existing methods, which use the weighted sum of all PCs or of selected PCs. The new PCA-WQI also shows an inverse relationship with the DOE-WQI: the better the quality of the water, the lower the PCA-WQI value. The results for water quality status in this study are also consistent with Kambe et al. [10] and Coletti et al. [26]. We also defined a new index range for river water quality status. Based on this range, the new PCA-WQI and the DOE-WQI were found to agree well: 81.2% of the original data set (1995-2007) and 91.7% of the independent data set (2008-2009) were classified in the same group. Thus, the proposed PCA-WQI calculation is both simple and sound, and the methodology can be applied to other rivers in Malaysia.
ACKNOWLEDGMENTS
We would like to thank the Malaysian Department of Environment for supplying the data on which this work was based.
REFERENCES
1. S.E. Cooke, S.M. Ahmed and N.D. Macalphine, Introductory Guide to Surface Water Quality Monitoring in Agriculture, Alberta: Alberta Agriculture, Food and Rural Development (2005).
2. R. Brown, N. McClelland, R. Deininger and R. Tozer, Water and Sewage Works, 339-343 (1970).
3. D.G. Smith, Water Research 10, 1237-1244 (1990).
4. Department of Environment, DOE, Malaysian Environmental Quality Reports, Kuala Lumpur: Ministry of Science, Technology and Environment (1997).
5. A.Z. Garizi, V. Sheikh and A. Sadoddin, International Journal of Environmental Science and Technology 8, 581-592 (2011).
6. P.T.M. Hanh, S. Sthiannopkao, D. The Ba. and K-W. Kim, J Environ Eng-ASCE 137, 273-283 (2011).
7. H. Juahir, M.S. Zain, M. Yusoff, T. Tengku Hanidza, A. Mohd Armi and M. Toriman, Environmental Monitoring and Assessment 173, 625-641 (2010).
8. R. Mahmood, J.J. Messer, F.J. Nemanich, C.I. Liff and D.B. George, Reports, Paper 231.
9. P. Chow-Fraser, Development of the Wetland Water Quality Index (WQI) to Assess Effects of Basin-Wide Land-Use Alteration on Coastal Marshes of the Laurentian Great Lakes in Coastal wetlands of the Laurentian Great Lakes: health, habitat and indicators, edited by T.P. Simon and P.M. Stewart, Indiana Biological Survey, Bloomington, IN. Chapter 5. 2006, pp. 137-166.
10. J. Kambe, T. Aoyama, A. Yamauchi and U. Nagashima, Journal of Computer Chemistry Japan 6, 19-26 (2007).
11. I. Primpas, G. Tsirtsis, M. Karydis and G.D. Kokkoris, Ecological Indicators 10, 178-183 (2010).
12. B.N. Lohani, M. Asce and G. Todino, Journal of Environmental Engineering 110, 1163-1176 (1984).
13. L. Hudrlíková and J. Fischer, Journal of Applied Mathematics 4, 291-298 (2011).
14. N. Mustapha, "Indices for Water Quality Assessment in a River", Master Thesis, Asian Institute of Technology, 1981.
15. M.F. Mohd Nasir, M.S. Samsudin, I. Mohamad, M.R.A. Awaluddin, M.A. Mansor, H. Juahir and N. Ramli, World Applied Sciences Journal 14, 73-82 (2011).
16. R.B. Robinson, M. ASCE., C.D. Cox and K.M.A. Odom, Journal of Environmental Engineering 131, 651-65 (2005).
17. N.M. Gazzaz, M.K. Yusoff, M.F. Ramli, A.Z. Aris and H. Juahir, Marine Pollution Bulletin 64, 688-698 (2012).
18. J.J. Hair, R. Anderson, R. Tatham and W. Black, Multivariate Data Analysis, US: Pearson Prentice Hall, 2005.
19. I. Gupta, S. Dhage and R. Kumar, Indian J. Mar. Sci. 38, 170-177 (2009).
20. H.F. Kaiser, Psychometrika 23, 187-200 (1958).
21. Y. Ouyang, P. Nkedi-Kizza, Q.T. Wu, D. Shinde and C.H. Huang, Water Research 40, 3800-3810 (2006).
22. G. Nicoletti, S. Scarpetta and O. Boylaud, Economics department working papers. No. 26, ECO/WKP(99)18 (2000).
23. J. Zhao, G. Fu, K. Lei and Y. Li, Journal of Environmental Sciences 23, 1460-1471 (2011).
24. P. Debels, R. Figueroa, R. Urrutia, R. Barra and X. Niell, Environ Monit Assess 110, 301-322 (2005).
25. A.A. Mamun and A. Idris, Revised Water Quality Indices for the Protection of Rivers in Malaysia, Twelfth International Water Technology Conference, IWTC12 2008, Alexandria, Egypt, 2008, pp. 1687-1698.
26. C. Coletti, R. Testezlaf, T.A.P. Ribeiro, R.T.G. de Souza and D.A. Pereira, Revista Brasileira de Engenharia Agrícola e Ambiental 14, 517-522 (2010).
Copyright of AIP Conference Proceedings is the property of American Institute of Physics and its content may
not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written
permission. However, users may print, download, or email articles for individual use.