
  • 7/26/2019 ACP Langat River_2013

    1/16

    The Langat River Water Quality Index Based on Principal Component Analysis

    Zalina Mohd Ali (a,d), Noor Akma Ibrahim (a), Kerrie Mengersen (b), Mahendran Shitan (a) and Hafizan Juahir (c)

    (a) Institute for Mathematical Research, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor DE, Malaysia

    (b) School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia

    (c) Faculty of Environmental Studies, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor DE, Malaysia

    (d) School of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor DE, Malaysia

    Abstract. The River Water Quality Index (WQI) is calculated with an aggregation function of six water quality sub-indices, each combined with its relative importance (weight). The Department of Environment (DOE) uses this formula to indicate the general status of rivers in Malaysia. The six selected water quality variables in the formula are suspended solids (SS), biochemical oxygen demand (BOD), ammoniacal nitrogen (AN), chemical oxygen demand (COD), dissolved oxygen (DO) and pH. Both the sub-indices, determined by quality rating curves, and their weights were based on expert opinion. However, the sub-indices and relative importance established in the formula are highly subjective and do not consider the inter-relationships among the variables. These relationships are important because river water data are multi-dimensional and complex. Therefore, a well-known multivariate technique, Principal Component Analysis (PCA), is proposed to re-calculate the water quality index for the Langat River based on the inter-relationships among the variables. This approach has not been well studied in river water quality index development in Malaysia. Hence, the approach in this study is relevant and important, given that the first river water quality index was developed in 1981. The PCA results showed that the new weights rank the relative importance of particular variables differently from the classical approach used in the DOE-WQI. Based on the new weights, the Langat River water quality index was calculated, and the two indices are compared in this paper.

    Keywords: Water Quality Index (WQI), Principal Component Analysis (PCA), Langat River

    PACS: 92.40.qc, 92.40.Cy, 92.40.qh

    INTRODUCTION

    Water quality is generally described according to its biological, chemical and physical properties [1]. Based on these properties, the quality of water can be expressed as a numerical index, the Water Quality Index (WQI), by combining measurements of selected water quality variables. The selected variables and their respective weights were identified through personal evaluation, namely opinion-gathering techniques [2], and the weight assigned to each variable reflected the relative importance given to it by experts [3]. This weight-determination technique has also been adopted by others, including the Malaysian Department of Environment (DOE) [4]. The selected variables, together with their weights, are applied to calculate the water quality index for all rivers in Malaysia. Because each river has its own characteristics, the appropriate weights for the water quality variables may differ from river to river. The existing DOE weights are therefore subjective in nature, and no detailed studies have been done to determine the weights objectively. Objective weights can instead be obtained with multivariate statistical techniques such as principal component analysis (PCA).

    Principal component analysis (PCA) is a statistical method that derives a new set of artificial variables, namely principal components (PCs), which are linear combinations of the original variables. The method is also known as a variable reduction procedure: it obtains a small number of components that account for most of the variability in the original data set. In river water quality data analysis, PCA has been used to characterize and evaluate freshwater quality in different seasons [5], to divide selected water quality variables into groups [6] and to identify factors, based on the compositional patterns of water quality variables, that influence particular regions [7].

    Proceedings of the 20th National Symposium on Mathematical Sciences

    AIP Conf. Proc. 1522, 1322-1336 (2013); doi: 10.1063/1.4801283 © 2013 AIP Publishing LLC 978-0-7354-1150-0/$30.00

    1322

    PCA has also been used for water quality index development. One such index is calculated from modified un-rotated principal components and provides valuable insights into the relationship between water quality and biological community composition [8]. A water quality index can also be calculated as the weighted sum of all principal component scores [9] or as the sum of selected principal component scores [10]. The PCA approach was also found useful for developing a coastal marine eutrophication index based on the first principal component, which accounted for more than 50% of the total variation [11]. In that study, no data transformations were considered, and the data were randomly selected from three different coastal areas.

    In river water quality studies, all PCA-based index developments used data with no gaps, and in some cases selected imputation techniques were used to estimate missing values [10]. However, there is a large number of missing data at all sampling sites in this study (refer to Figure 1). As a result, the traditional PCA approach (i.e. without checking normality assumptions or detecting outliers) with complete case analysis for data dimension reduction may lead to misleading conclusions when determining the index. Because of this problem, we have not considered unobserved data in this study. Since missing data were not imputed, we limited the analysis to a parametric approach (i.e. assuming that the distribution of the water quality measurements is approximately multivariate normal with no extreme outliers). We firmly believe that the PCA approach works better under the multivariate normality assumption and that this assumption will be useful for future inference. We may consider PCA with missing values in future work, and we propose that extreme values (outliers) be treated in a separate analysis or retained in the same analysis by using robust PCA.

    Instead of using principal component scores, researchers have also used information from rotated principal components to develop an index [12]. Other researchers summarized information from rotated principal components to calculate positive relative weights, which were then used in the index calculation [13]. However, negative relative weights are also important, especially in river water quality, as they reflect the natural effect between the variables. Therefore, in this study we examine the possibility of developing a PCA index that takes into account six physicochemical variables, and we suggest new scales for assessing water quality with the proposed index. The proposed index is based on new statistical weights, defined by the statistical importance of the variables across more than one PC, with consideration given to negative PC loadings (i.e. the correlations between the PCs and the standardized data). The proposed methodology is applied to the Langat River in Selangor, Malaysia. The results are compared with the existing PC methods, and the relationship between the new index and the DOE-WQI is also discussed.

    FIGURE 1. Plots of dissolved oxygen (DO) in the lower stream of the Langat River from September 1995 to December 2007

    METHODS

    Data and Monitoring Sites

    In this analysis, data for the Langat River were collected based on the availability of recorded data from 1995 to 2007. Five main monitoring stations were selected, as listed in Table 1, and the locations of the selected stations are shown in Figure 2. Data from 2008-2009 were used as an independent dataset to validate the proposed PCA index.

    TABLE (1). DOE sampling stations in the study area

    DOE Station Number | Station Number | DOE Station Code | Distance From Estuary (km) | Grid Reference | Location
    2814602 | 1 | IL01 | 4.19 | 2°52.027' N, 101°26.241' E | Air Tawar Village
    2815603 | 2 | IL02 | 33.49 | 2°48.952' N, 101°30.780' E | Telok Datuk, near Banting town
    2817641 | 3 | IL03 | 63.43 | 2°51.311' N, 101°40.882' E | Bridge at Dengkil Village
    2917642 | 4 | IL05 | 86.94 | 2°59.533' N, 101°47.219' E | Kajang Bridge
    3118647 | 5 | IL07 | 113.99 | 3°09.953' N, 101°50.926' E | Bridge at Batu 18

    FIGURE 2. Locations of the selected sampling stations

    Based on the collected data, the water quality status of the Langat River can be evaluated with the Water Quality Index (WQI). The WQI is formed as a weighted sum of the sub-indices of six selected water quality variables, namely: Suspended Solids (SS), Biochemical Oxygen Demand (BOD), Ammonia Nitrogen (AN), Chemical Oxygen Demand (COD), Dissolved Oxygen (DO) and pH [4]. These variables were selected by a panel of experts as the variables that give an indication of the water quality level, or water quality index, of a river. The relative importance (weights) determined by the experts is shown in Table 3. The weight of each variable indicates its relative importance in determining the water quality index in Malaysia; details are discussed in Mustapha (1981) [14].

    TABLE (3). The relative importance determined by the experts

    Water Quality Variable | Weight
    Dissolved Oxygen (%) | 0.22
    Biochemical Oxygen Demand | 0.19
    Chemical Oxygen Demand | 0.16
    Ammonia Nitrogen | 0.15
    Suspended Solids | 0.16
    Potential of Hydrogen (pH) | 0.12

    Data Treatment

    PCA was applied to the physicochemical variables defined by the DOE in Table 3. In this study, the PCA approach requires the variables to follow a normal distribution. Some variables showed extremely low or high values, and the original data had high skewness and kurtosis. The results in Table 4 show that DO (%) and pH were tested as normal, whereas the other water quality variables were identified as non-normally distributed. All non-normal variables were log10-transformed, a very common method for environmental data [15]. After the selected water quality data were transformed, the univariate normality of each variable was checked; the results are shown in Table 5. This is normal practice among researchers as part of data screening.

    Detecting outliers in water quality data is also important because of their effect on the normality of the data distribution. A detailed discussion of evaluating water quality data for outliers can be found in Robinson et al. [16]. Initially,

    the univariate approach in water quality treats the variables as if they were independent and uncorrelated. Checking each variable separately may not be very helpful, given the relationships that exist among water quality variables. To the best of our knowledge, most researchers in previous studies did not discuss the existence of outliers in water quality data in detail. Moreover, outliers were retained in the multivariate analysis in Gazzaz et al. [17]. Since outlier detection is also important in the analysis, this commonly used practice may not be very helpful in confirming multivariate normality. If the variables are shown to be multivariate normal, they are also univariate normal; however, univariate normal variables are not necessarily multivariate normal [18]. Therefore, the data must be screened for multivariate outliers, which can be done with the Mahalanobis distance and its associated p-value threshold of 0.001. The data are consistent with multivariate normality if the distance plot forms a straight-line pattern or the associated p-value is greater than 0.001. If the associated p-value is smaller than the threshold, an influential outlier is detected. We stress that the influential outliers detected from the Mahalanobis distance were excluded from our study, because multivariate techniques are sensitive to the presence of outliers [18].
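    The screening described above (log10-transforming the skewed variables, then flagging multivariate outliers by Mahalanobis distance at p < 0.001) can be sketched in Python with NumPy. This is a minimal illustration, not the SAS/SPSS workflow used in the study: the column order, the chi-square distribution with six degrees of freedom used to convert squared distances into p-values, and the synthetic data (including one planted extreme sample) are all assumptions made for the example.

```python
import math

import numpy as np

# Assumed (hypothetical) column order: DO, BOD, COD, AN, SS, pH.
# Only the non-normal variables (BOD, COD, AN, SS) are log10-transformed.
LOG10_COLUMNS = [1, 2, 3, 4]


def log10_transform(X, columns=LOG10_COLUMNS):
    """Return a copy of X with the selected (strictly positive) columns log10-transformed."""
    Y = np.array(X, dtype=float)
    Y[:, columns] = np.log10(Y[:, columns])
    return Y


def skewness(x):
    """Moment-based sample skewness (no small-sample correction)."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.mean(d**3) / np.mean(d**2) ** 1.5)


def chi2_sf_df6(x):
    """P(X > x) for a chi-square variable with 6 degrees of freedom.

    For even degrees of freedom k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k/2-1} (x/2)**i / i!, here with k = 6.
    """
    h = x / 2.0
    return math.exp(-h) * (1.0 + h + h * h / 2.0)


def mahalanobis_screen(Y, alpha=0.001):
    """Flag rows whose squared Mahalanobis distance has a chi-square(6) p-value below alpha."""
    diff = Y - Y.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(Y, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)  # squared distance of each row
    p = np.array([chi2_sf_df6(v) for v in d2])
    return np.flatnonzero(p < alpha), d2, p


# Synthetic example data (not the Langat River measurements).
rng = np.random.default_rng(0)
n = 300
X = np.column_stack([
    rng.normal(60, 25, n),        # DO (%), left untransformed
    rng.lognormal(1.5, 0.8, n),   # BOD
    rng.lognormal(3.5, 0.6, n),   # COD
    rng.lognormal(-0.7, 1.0, n),  # AN
    rng.lognormal(5.0, 1.0, n),   # SS
    rng.normal(6.7, 0.7, n),      # pH, left untransformed
])
X[0] = [0.0, 500.0, 5000.0, 50.0, 50000.0, 3.0]  # planted extreme sample
Y = log10_transform(X)
flagged, d2, p = mahalanobis_screen(Y)
```

    Rows listed in `flagged` would be removed before the PCA, mirroring the exclusion of influential outliers described in the text.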

    TABLE (4). Descriptive Statistics (n=466) of the untransformed selected Water Quality Variables in the Langat River

    Variable | Min | Max | Mean | Standard Deviation | Median | Trimmed Mean (5%) | Skewness | Kurtosis
    DO | 0.00 | 116.20 | 61.48 | 27.48 | 64.65 | 62.22 | -0.36 | -0.71
    BOD | 0.30 | 80.80 | 5.71 | 7.27 | 4.00 | 4.66 | 4.91 | 35.86
    COD | 0.50 | 3850.00 | 49.70 | 185.46 | 33.00 | 35.86 | 18.93 | 382.73
    AN | 0.0050 | 14.30 | 1.17 | 1.30 | 0.78 | 1.035 | 3.17 | 22.56
    SS | 0.50 | 5010.00 | 294.43 | 490.19 | 157.00 | 220.89 | 5.15 | 38.51
    PH | 3.60 | 8.34 | 6.66 | 0.75 | 6.84 | 6.72 | -1.40 | 2.55

    Principal Component Analysis

    PCA was applied to the standardized, log-transformed physicochemical variables after multivariate normality had been established. The multivariate normality results are shown descriptively in Table 6 and visually in Figure 3. Standardization minimizes the differences in measurement units and variances between variables [19]. All six PCs were derived with the SAS PRINCOMP procedure and with the default settings of the SAS PROC FACTOR procedure. Both procedures gave the same results, and additional information on the factor loadings based on PCA was obtained from PROC FACTOR. The PC scores can be calculated with equation (1).

    $$PC_{ij} = e_{i1} Z_{1j} + e_{i2} Z_{2j} + \cdots + e_{i6} Z_{6j} \qquad (1)$$

    where PC is the principal component score, e is the component loading (weight) obtained from the eigenvector, Z is the standardized transformed data, i is the component number and j is the sample number; the second subscript of e and the first subscript of Z index the six variables. In PCA, the first PC captures the largest variability in the original data set and is the linear combination of the variables with maximal variance. The second PC is the linear combination with the next largest variability and is uncorrelated with (orthogonal to) the first component. All PCs are arranged in decreasing order of importance according to their variability. Usually, the WQI is calculated as the weighted sum of the site scores on all possible axes, where in theory the number of possible axes equals the number of variables used. Instead of using all the PCs to calculate the WQI score as suggested by Chow-Fraser [9], or only the PCs with eigenvalues of 1 or greater as proposed by Kaiser in 1958 [20], we calculated the index based on the idea of positive weight loadings in rotated PCs by Hurdlikova and Fischer [13], with consideration given to the negative weight loadings in un-rotated PCs.
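    The PCA steps described above can be sketched with NumPy on synthetic data: standardization, eigendecomposition of the correlation matrix, PC scores as in equation (1), component loadings and the eigenvalue-of-1 (Kaiser) count. This is an illustration under assumptions (synthetic data; eigenvector signs left as NumPy returns them, which are arbitrary), not a reproduction of the SAS PRINCOMP output.

```python
import numpy as np


def pca_standardized(X):
    """PCA on the correlation matrix of X (rows = samples, columns = variables).

    Returns the standardized data Z, eigenvalues in decreasing order, the matching
    eigenvectors (one column per PC), the PC scores and the component loadings.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each variable
    R = np.corrcoef(Z, rowvar=False)                   # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # symmetric eigendecomposition (ascending)
    order = np.argsort(eigvals)[::-1]                  # sort PCs by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Equation (1): the score of sample j on PC i is sum_k e_ik * Z_kj,
    # i.e. row j, column i of Z @ eigvecs.
    scores = Z @ eigvecs
    # Loadings = correlations between variables and PCs = eigenvector * sqrt(eigenvalue).
    loadings = eigvecs * np.sqrt(eigvals)
    return Z, eigvals, eigvecs, scores, loadings


rng = np.random.default_rng(1)
common = rng.normal(size=(200, 1))           # shared factor to induce correlation
X_demo = common + rng.normal(size=(200, 6))  # six synthetic, correlated variables
Z, eigvals, eigvecs, scores, loadings = pca_standardized(X_demo)
n_kaiser = int((eigvals >= 1.0).sum())       # Kaiser criterion: eigenvalue >= 1
```

    The variance of each column of `scores` equals the corresponding eigenvalue, which is why the eigenvalues in Table 9 can be read as the variance carried by each PC.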


    TABLE (5). Descriptive Statistics (n=453) of the transformed selected Water Quality Variables

    Variable | Min | Max | Mean | Standard Deviation | Median | Trimmed Mean (5%) | Skewness | Kurtosis
    DO | 0.00 | 116.20 | 61.53 | 27.26 | 64.70 | 62.28 | -0.37 | -0.65
    BOD | -0.30 | 1.91 | 0.57 | 0.37 | 0.60 | 0.56 | 0.29 | 0.05
    COD | 0.60 | 2.22 | 1.51 | 0.24 | 1.52 | 1.52 | -0.23 | 1.44
    AN | -2.30 | 1.16 | -0.32 | 0.81 | -0.11 | -0.27 | -1.29 | 0.90
    SS | -0.30 | 3.70 | 2.06 | 0.71 | 2.20 | 2.10 | -0.92 | 1.07
    PH | 4.05 | 8.34 | 6.67 | 0.71 | 6.83 | 6.72 | -1.20 | 1.81

    TABLE (6). Descriptive Statistics (n=446) of the transformed selected Water Quality Variables

    Variable | Min | Max | Mean | Standard Deviation | Median | Trimmed Mean (5%) | Skewness | Kurtosis
    DO | 0.00 | 116.20 | 61.6 | 27.33 | 64.95 | 62.36 | -0.37 | -0.65
    BOD | -0.30 | 1.91 | 0.58 | 0.37 | 0.60 | 0.57 | 0.31 | 0.05
    COD | 0.70 | 2.22 | 1.52 | 0.23 | 1.52 | 1.52 | -0.07 | 1.20
    AN | -2.30 | 1.16 | -0.32 | 0.80 | -0.11 | -0.26 | -1.3 | 0.95
    SS | -0.30 | 3.70 | 2.06 | 0.70 | 2.2 | 2.1 | -0.94 | 1.16
    PH | 4.05 | 8.34 | 6.69 | 0.68 | 6.84 | 6.74 | -1.12 | 1.64

    FIGURE 3. Probability plots of Mahalanobis distance before outlier deletion (left) and after outlier deletion (right)

    RESULTS

    Correlation Analysis

    The Pearson correlation matrices for the variables are given in Tables 7 and 8, and most pairs of water quality variables show similar results in both. However, the SS-BOD, SS-COD, COD-DO and COD-BOD relationships differ slightly before and after outlier deletion. A significant positive correlation was found for most pairs of water quality variables, except for the negative correlations between DO and the other pollutants. A significant positive correlation was also found between the two indicators DO and pH. The weak, non-significant positive correlation between BOD and pH indicates that the relationship between these two variables may be negligible.

    TABLE (7). The Pearson correlation matrix of the Untransformed Selected Water Quality Variables (N=466)

    Variable | DO | BOD | COD | AN | SS
    BOD | -0.355 (0.000)
    COD | -0.427 (0.000) | 0.532 (0.000)
    AN | -0.504 (0.000) | 0.460 (0.000) | 0.315 (0.000)
    SS | -0.488 (0.000) | 0.411 (0.000) | 0.474 (0.000) | 0.464 (0.000)
    PH | 0.508 (0.000) | 0.007 (0.875) | -0.104 (0.024) | -0.283 (0.000) | -0.272 (0.000)

    * Corresponding p-values in brackets.


    TABLE (8). The Pearson correlation matrix of the Transformed Selected Water Quality Variables (N=446)

    Variable | DO | BOD | COD | AN | SS
    BOD | -0.366 (0.000)
    COD | -0.501 (0.000) | 0.629 (0.000)
    AN | -0.502 (0.000) | 0.480 (0.000) | 0.388 (0.000)
    SS | -0.484 (0.000) | 0.385 (0.000) | 0.513 (0.000) | 0.476 (0.000)
    PH | 0.530 (0.000) | 0.010 (0.836) | -0.182 (0.000) | -0.289 (0.000) | -0.285 (0.000)

    * Corresponding p-values in brackets.

    River Water Quality Index Development

    The eigenvalues and contributions of the principal components are listed in Table 9. The eigenvalues of the first, second and third principal components were 3.05, 1.15 and 0.60, respectively, explaining 51%, 19% and 10% of the total variance (cumulatively 51%, 70% and 80%). Only the first and second principal components are discussed, since only their eigenvalues were greater than 1. The eigenvectors (loadings) for each axis are shown in Table 10. The loadings indicate the relative importance of each variable within an individual axis, judged by the absolute magnitude of the eigenvector loading. No specific rule was used to pick out the loadings; in this study, loadings with absolute values greater than 0.40 were treated as large. The eigenvectors were then used to determine the latent variables (i.e. the PC scores, which represent water quality scores [10]). When normalized, the scores give the values of the water quality indices under the normal distribution. The calculated indices are areas under the normal curve and may be regarded as a degree of pollution [12]. These values are then multiplied by 100 to give a range of 0-100, with zero representing good water quality. Conversely, other researchers used the sum of weighted PC scores to obtain water quality indices with zero representing low water quality [9]. The number of variables entered into the PCA determines the number of possible PCs; since six variables were entered, six axes (PCs) were fitted. From Table 10, the first PC has large positive loadings on COD, AN and SS and a large negative loading on DO, meaning that these four variables account for most of the variance explained by the first PC. The second PC has large loadings on BOD and pH, so these two variables account for most of the variance explained by the second PC. These results are consistent with the correlation analysis.
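    As a check on Tables 9 and 10, the short Python sketch below recomputes the variance proportions from the reported eigenvalues, applies the eigenvalue-of-1 (Kaiser) rule, selects the eigenvector entries whose absolute value exceeds 0.40, and converts the first two eigenvectors into component loadings using the standard relation loading = eigenvector × sqrt(eigenvalue). Because the inputs are rounded to two or three decimals, the results match the tables only to within rounding.

```python
import math
from itertools import accumulate

VARIABLES = ["DO", "BOD", "COD", "AN", "SS", "pH"]
EIGENVALUES = [3.05, 1.15, 0.60, 0.55, 0.35, 0.29]    # Table 9
EIGVEC_PC1 = [-0.45, 0.40, 0.44, 0.43, 0.43, -0.27]   # Table 10, PC1
EIGVEC_PC2 = [0.31, 0.53, 0.29, 0.005, -0.003, 0.73]  # Table 10, PC2

# The rounded eigenvalues sum to 5.99 (6 in theory for six standardized variables).
total = sum(EIGENVALUES)
proportion = [ev / total for ev in EIGENVALUES]
cumulative = list(accumulate(proportion))
retained = [i + 1 for i, ev in enumerate(EIGENVALUES) if ev >= 1.0]  # Kaiser criterion


def large_loadings(eigvec, threshold=0.40):
    """Variables whose absolute eigenvector entry exceeds the threshold."""
    return [v for v, e in zip(VARIABLES, eigvec) if abs(e) > threshold]


# Component loadings (correlations with the PCs), comparable with Table 11.
loadings_pc1 = [e * math.sqrt(EIGENVALUES[0]) for e in EIGVEC_PC1]
loadings_pc2 = [e * math.sqrt(EIGENVALUES[1]) for e in EIGVEC_PC2]
```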

    TABLE (9). Summary of eigenvalues produced by PCA using Standardized Values of Six Water-Quality Variables

    PC Axis | Eigenvalue | Proportion of Variation Explained | Cumulative Proportion of Variation Explained
    1 | 3.05 | 0.51 | 0.51
    2 | 1.15 | 0.19 | 0.70
    3 | 0.60 | 0.10 | 0.80
    4 | 0.55 | 0.09 | 0.89
    5 | 0.35 | 0.06 | 0.95
    6 | 0.29 | 0.05 | 1.00

    TABLE (10). Eigenvectors produced by PCA using Standardized Values of Six Water-Quality Variables

    Variable | PC1 | PC2 | PC3 | PC4 | PC5 | PC6
    DO | -0.45 | 0.31 | 0.133 | 0.24 | 0.77 | 0.16
    BOD | 0.40 | 0.53 | 0.01 | -0.36 | 0.26 | -0.61
    COD | 0.44 | 0.29 | -0.54 | -0.084 | 0.13 | 0.64
    AN | 0.43 | 0.005 | 0.82 | -0.08 | 0.06 | 0.37
    SS | 0.43 | -0.003 | -0.04 | 0.87 | 0.03 | -0.22
    PH | -0.27 | 0.73 | 0.14 | 0.193 | -0.56 | 0.11

    The results of the un-rotated PCA/FA correlation coefficients (i.e. the loadings between the two principal components and the standardized water quality variables) are shown in Table 11. Absolute loadings are classified as strong (high) for values > 0.75, moderate for 0.50-0.75 and weak (low) for 0.30-0.50 [21]. A high loading between the first PC and a variable indicates that the variable is related to the maximum amount of variation in the data set. A strong association between the second PC and a variable indicates that the variable is responsible for the next largest variation in the data, perpendicular to the first PC. The squared loading of a variable on a principal component is the proportion of that variable's variance explained by the component, and the sum of the squared loadings over all variables equals the variance explained by the component (its eigenvalue). The squared loadings, normalized to sum to one within each component, were then used to group the variables, from the highest to the lowest loading, into temporary loadings [22]. The first temporary loading includes DO (with a weight of 0.21), BOD (0.16), COD (0.20), AN (0.18) and SS (0.18), and the second temporary loading is formed by pH (0.54). The actual weights were then obtained by multiplying each variable's temporary loading by the proportion of variance explained by its component: 0.73 = 3.05/(3.05 + 1.15) for the first component and 0.27 for the second. Finally, preserving the negative sign of the DO loading in its actual weight, the new PCA weights were determined by normalizing the actual weights. Negative weights (as well as positive weights) should be retained when calculating the index as long as the sum of the weights is greater than zero.
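    The weighting steps can be sketched in Python, starting from the two-component loadings and eigenvalues in Table 11. Two details are our assumptions rather than statements in the text: each variable is assigned to the component on which it has the largest absolute loading (this reproduces the temporary loadings in Table 11), and the sign carried into the actual weight is the sign of the loading on that assigned component. Because the inputs are rounded to two decimals, the resulting weights agree with Table 11 only to about ±0.01.

```python
# Table 11 component loadings: variable -> (PC1 loading, PC2 loading).
LOADINGS = {
    "DO": (-0.79, 0.33),
    "BOD": (0.69, 0.57),
    "COD": (0.78, 0.32),
    "AN": (0.74, 0.01),
    "SS": (0.75, 0.00),
    "pH": (-0.48, 0.79),
}
EIGENVALUES = (3.05, 1.15)  # variance explained by PC1 and PC2


def pca_weights(loadings, eigenvalues):
    """Signed, normalized PCA weights from un-rotated component loadings."""
    n_pc = len(eigenvalues)
    # Column sums of squared loadings, used to normalize squared loadings to sum to one.
    col_ss = [sum(load[k] ** 2 for load in loadings.values()) for k in range(n_pc)]
    # Proportion of the retained variance explained by each PC (0.73 and 0.27 here).
    prop = [ev / sum(eigenvalues) for ev in eigenvalues]
    actual = {}
    for var, load in loadings.items():
        k = max(range(n_pc), key=lambda c: abs(load[c]))  # assumed assignment rule
        temporary = load[k] ** 2 / col_ss[k]              # normalized squared loading
        sign = 1.0 if load[k] >= 0 else -1.0              # keep the sign (negative for DO)
        actual[var] = sign * temporary * prop[k]
    total = sum(actual.values())
    if total <= 0:
        raise ValueError("weights can only be normalized when their sum is positive")
    return {var: w / total for var, w in actual.items()}


weights = pca_weights(LOADINGS, EIGENVALUES)
```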

    The negative loading on DO reflects the natural effect between DO and the other pollutant variables. Negative loadings on DO were also reported in other studies [7, 6, 23, 15, 17]; they are suspected to result from domestic wastewater, wastewater treatment plants, industries and agricultural activities, since an increase in organic matter decreases DO [7]. In another study, DO was eliminated from the WQI calculation because of its negative weight (loading) [24]. However, we preserved the negative sign of DO from the component loadings in the new weights because of this natural effect (i.e. the influence of DO in determining water quality in the presence of pollutants). For comparability, the final PCA weights were rescaled to sum to one (i.e. normalization of the actual weights). The weights reflect the effect of a decrease or increase in each water quality variable on the quality of water. For instance, the negative sign of DO reflects the effect of decreasing DO: the stronger this effect, the higher the DO score (i.e. the negative DO weight multiplied by the standardized DO value) will be in polluted areas compared with clean areas (refer to Figure 4). Conversely, the new positive weights of the pollutants (BOD, COD, AN and SS) reflect the increasing effect of the pollutants on river water: the higher the effect of the pollutants, the higher the pollutant score. Positive BOD and COD are related to anthropogenic pollution and were expected to come from point sources such as sewage treatment plants and industrial effluents. Positive BOD and AN also represent the influence of organic pollutants from point sources (such as discharges from wastewater treatment plants, domestic wastewater and industrial effluent), while positive COD and SS were attributed to erosion from upland areas during rainfall events and soil cultivation. SS also reflects discharges from urban development areas involving land clearing, i.e. surface runoff sources. Detailed information on the pollution sources can be found in Juahir et al. [7] and Mohd Nasir et al. [15].

    On the other hand, the weights in the DOE-WQI show the relative importance of the variables in determining the quality of water: the higher the weight, the more important the variable. However, the natural effect of the water quality variables, which is based on the inter-relationships between them, is not apparent from the DOE-WQI weights. We therefore firmly believe that the new weights obtained in this study provide useful information on the status of the river from a different perspective.
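    The contrast between the two weighting schemes can be made concrete with a small Python sketch. It computes a DOE-WQI as the weighted sum of six sub-indices using the Table 3 weights, and ranks the variables by absolute weight under both the DOE and the PCA schemes. The sub-index values are hypothetical inputs, since the DOE rating curves that produce them are not reproduced in this paper.

```python
DOE_WEIGHTS = {"DO": 0.22, "BOD": 0.19, "COD": 0.16, "AN": 0.15, "SS": 0.16, "pH": 0.12}  # Table 3
PCA_WEIGHTS = {"DO": -0.29, "BOD": 0.22, "COD": 0.28, "AN": 0.25, "SS": 0.26, "pH": 0.29}  # Table 11


def doe_wqi(sub_indices):
    """Weighted sum of the six DOE sub-indices (each assumed to be on a 0-100 scale)."""
    return sum(w * sub_indices[v] for v, w in DOE_WEIGHTS.items())


def rank_by_importance(weights):
    """Variables ordered by absolute weight, largest first (ties keep dictionary order)."""
    return sorted(weights, key=lambda v: abs(weights[v]), reverse=True)


# Hypothetical sub-index values for one sample (not taken from the paper).
example = {"DO": 85.0, "BOD": 70.0, "COD": 75.0, "AN": 80.0, "SS": 65.0, "pH": 95.0}
example_wqi = doe_wqi(example)
doe_ranking = rank_by_importance(DOE_WEIGHTS)
pca_ranking = rank_by_importance(PCA_WEIGHTS)
```

    The two rankings put DO first in both schemes but move pH from last place (DOE) to joint first (PCA) and BOD from second place to last, which is the difference in ranking discussed in the text.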

    TABLE (11). Summary of Component Loadings and New PCA Weights

    Variable | Component Loading PC1 | Component Loading PC2 | Squared Component Loading (normalized) PC1 | Squared Component Loading (normalized) PC2 | Temporary Loading | Actual Weight | PCA Weight
    DO | -0.79 | 0.33 | 0.21 | 0.09 | 0.21 | -0.15 | -0.29
    BOD | 0.69 | 0.57 | 0.16 | 0.28 | 0.16 | 0.11 | 0.22
    COD | 0.78 | 0.32 | 0.20 | 0.09 | 0.20 | 0.14 | 0.28
    AN | 0.74 | 0.01 | 0.18 | 0.00 | 0.18 | 0.13 | 0.25
    SS | 0.75 | 0.00 | 0.18 | 0.00 | 0.18 | 0.13 | 0.26
    PH | -0.48 | 0.79 | 0.08 | 0.54 | 0.54 | 0.15 | 0.29
    Variation Explained (VE) | 3.05 | 1.15
    Proportion of VE | 0.73 | 0.27

    The PCA approach used in this study summarizes the relative importance of the variables based on their inter-relationships. Table 11 shows that all variables have a similar influence in determining water quality; the highest relative importance (in absolute value) is for pH and DO, and the lowest is for BOD. The negative statistical weight of DO preserves the natural relationship between DO and the other pollutants, and the positive statistical weight of pH likewise reflects the natural effect of this variable on water quality. Mamun et al. [25] suggested that pH should be monitored to assess the suitability of water for other uses. In the expert opinion approach (Table 3), on the other hand, DO (%) has the highest relative importance, followed by BOD, with pH the lowest. The weights assigned in the DOE-WQI follow the relative importance of each variable for the overall quality of surface water for general purposes. Hence, the strong negative weight on DO in the new WQI formula may be related to organic pollution suspected to come from domestic wastewater, wastewater treatment plants, industries, agricultural activities and forest areas [7]. Therefore, the following expression is used to calculate the PCA-WQI.

    $$\text{PCA-WQI} = -0.29\,Z_{DO} + 0.22\,Z_{BOD} + 0.28\,Z_{COD} + 0.25\,Z_{AN} + 0.26\,Z_{SS} + 0.29\,Z_{pH} \qquad (2)$$

    In this study, the PCA-WQI values range between -3 and 3. Each PCA-WQI score was converted into a percentage of the normal distribution (i.e. the area under the normal curve multiplied by 100) to give the final PCA-WQI score. The final score is more comparable with the DOE-WQI, since both indices range from 0 to 100. The relationship between the DOE-WQI and the PCA-WQI in Figure 4 clearly indicates that the better the water quality, the lower the PCA-WQI value.
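    A minimal Python sketch of equation (2) and the conversion to a 0-100 score follows. Assumptions: the standardization uses the means and standard deviations of Table 6 (with BOD, COD, AN and SS in log10 units, DO and pH untransformed), and the raw score is treated directly as a standard normal deviate whose cumulative probability, multiplied by 100, gives the final score. The authors may have standardized the raw scores first, so this illustrates the idea rather than reproducing the published values.

```python
import math

# Table 6 means and standard deviations (BOD, COD, AN, SS in log10 units).
MEANS = {"DO": 61.6, "BOD": 0.58, "COD": 1.52, "AN": -0.32, "SS": 2.06, "pH": 6.69}
SDS = {"DO": 27.33, "BOD": 0.37, "COD": 0.23, "AN": 0.80, "SS": 0.70, "pH": 0.68}
WEIGHTS = {"DO": -0.29, "BOD": 0.22, "COD": 0.28, "AN": 0.25, "SS": 0.26, "pH": 0.29}  # Eq. (2)
LOG10_VARS = {"BOD", "COD", "AN", "SS"}


def standard_normal_cdf(x):
    """Area under the standard normal curve to the left of x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def pca_wqi(sample):
    """Return (raw Eq. (2) score, final 0-100 score) for one sample of raw measurements."""
    raw = 0.0
    for var, w in WEIGHTS.items():
        value = math.log10(sample[var]) if var in LOG10_VARS else sample[var]
        z = (value - MEANS[var]) / SDS[var]  # standardized value Z
        raw += w * z
    return raw, 100.0 * standard_normal_cdf(raw)


# Usage: a sample sitting exactly at the Table 6 means, and a hypothetical polluted sample.
at_means = {v: 10 ** MEANS[v] if v in LOG10_VARS else MEANS[v] for v in MEANS}
polluted = {"DO": 20.0, "BOD": 20.0, "COD": 100.0, "AN": 5.0, "SS": 1000.0, "pH": 6.69}
raw_mean, final_mean = pca_wqi(at_means)
raw_polluted, final_polluted = pca_wqi(polluted)
```

    A sample at the mean of every variable lands at the midpoint of the scale, while low DO combined with high BOD, COD, AN and SS pushes the final score towards 100, consistent with higher PCA-WQI values indicating poorer water quality.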

    FIGURE 4. Plot of DOE-WQI versus PCA-WQI values

    Validation Analysis

    In this study, we proposed new scores for the individual variables that preserve the same pattern as the scores of the first two PCs. Both PC scores were calculated as weighted principal component scores, as suggested by Chow-Fraser [9], since this gave more encouraging results for our data. Figure 5 shows the same pattern between the PC1 score and the new scores for the most important variables of the first PC (i.e. DO, COD and SS), although the BOD and AN patterns differ slightly. A similar pattern was also found between the PC2 score and the new score for pH. A general WQI score was then calculated as (i) the weighted sum of six PCs, (ii) the weighted sum of two PCs and (iii) the sum of the new weighted PCA-WQI. The average general scores (except the score in 1995 at Station 1) were calculated and plotted in Figure 6. The plots show similar patterns in the scores (except for the 1995 data at Station 2). These results confirm the potential of the third method, i.e. the new WQI score, which is easier to calculate than the existing methods based on the weighted sum of six or two PCs.
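    Methods (i) and (ii) can be sketched as follows. We assume, for illustration only, that the weighted sum of PC scores weights each PC by its share of the variance explained (its eigenvalue divided by the sum of the eigenvalues of the PCs used); the PC scores below are hypothetical.

```python
def weighted_pc_score(pc_scores, eigenvalues):
    """Weighted sum of one sample's PC scores, each PC weighted by its share of the
    variance explained (eigenvalue / sum of the eigenvalues of the PCs used)."""
    if len(pc_scores) != len(eigenvalues):
        raise ValueError("need exactly one eigenvalue per PC score")
    total = sum(eigenvalues)
    return sum(s * ev / total for s, ev in zip(pc_scores, eigenvalues))


EIGENVALUES = [3.05, 1.15, 0.60, 0.55, 0.35, 0.29]  # Table 9
sample_scores = [1.2, -0.4, 0.3, 0.0, -0.1, 0.2]    # hypothetical PC scores for one sample

six_pc = weighted_pc_score(sample_scores, EIGENVALUES)          # method (i): all six PCs
two_pc = weighted_pc_score(sample_scores[:2], EIGENVALUES[:2])  # method (ii): first two PCs
```

    Method (iii) needs no PC scores at all, only the standardized variables and the six fixed weights of equation (2), which is what makes it easier to apply to new data.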

    The new WQI values were grouped with hierarchical agglomerative clustering analysis (HACA), using Ward's method with Euclidean distance as the measure of similarity. The calculation can be done easily in the SPSS package, and the results are summarized in Table 12.
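    For readers without SPSS, Ward's criterion applied to a one-dimensional set of index values can be sketched in pure Python. This is a naive illustration with hypothetical scores, not the SPSS implementation: at each step it merges the two clusters whose union increases the within-cluster sum of squares the least, which for one-dimensional Euclidean data is n_a n_b / (n_a + n_b) times the squared difference of the cluster means.

```python
def ward_clusters(values, n_clusters):
    """Agglomerative clustering of 1-D values with Ward's criterion.

    Starts from singletons and repeatedly merges the pair of clusters whose merger
    gives the smallest increase in the within-cluster sum of squares:
        delta = n_a * n_b / (n_a + n_b) * (mean_a - mean_b) ** 2
    Naive O(n^3) search, adequate only for small inputs.
    """
    clusters = [[float(v)] for v in values]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = clusters[a], clusters[b]
                ma, mb = sum(ca) / len(ca), sum(cb) / len(cb)
                delta = len(ca) * len(cb) / (len(ca) + len(cb)) * (ma - mb) ** 2
                if best is None or delta < best[0]:
                    best = (delta, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    # Return the clusters sorted internally and ordered by their means.
    return sorted((sorted(c) for c in clusters), key=lambda c: sum(c) / len(c))


wqi_values = [5, 8, 10, 50, 55, 60, 85, 90, 95]  # hypothetical final PCA-WQI scores
groups = ward_clusters(wqi_values, 3)
```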

    TABLE (12). Summary of the New PCA-WQI based on HACA

    Group | Min | Max | Mean | SD | Status
    1 | 27 | 74 | 53.57 | 13.36 | Slightly Polluted
    2 | 75.00 | 99.00 | 84.68 | 6.87 | Polluted
    3 | 1.00 | 25.00 | 9.70 | 7.10 | Clean


    FIGURE 5. Plot of the PC Scores and the Individual Variables Scores


    We re-classified the PCA-WQI index ranges, slightly modifying the range for the slightly polluted status so that it is comparable with the DOE-WQI findings based on the expert opinion (EO) approach shown in Table 13. The new groups with the selected PCA-WQI ranges (Table 14) were used to validate the index on the independent 2008-2009 data.

    TABLE (13). Summary of the DOE-WQI based on EO

    Group | Min | Max | Mean | SD | Status
    1 | 81.00 | 96.00 | 90.36 | 4.13 | Clean
    2 | 60.00 | 80.00 | 69.15 | 5.42 | Slightly Polluted
    3 | 16.00 | 59.00 | 48.53 | 9.45 | Polluted

    TABLE (14). Index ranges of water quality based on PCA-WQI and DOE-WQI

    PCA-WQI Index Range | PCA-WQI Status | DOE-WQI Index Range | DOE-WQI Status
    1-25 (3) | Clean | 81-100 (1) | Clean
    26-74 (2) | Slightly Polluted | 60-80 (2) | Slightly Polluted
    75-99 (1) | Polluted | 0-59.4 (3) | Polluted
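    The ranges in Table 14 translate into simple classification rules, sketched below in Python. How scores falling between the listed ranges (for example 25.5, 59.7 or 80.5) are treated is our assumption; the checks use the group means reported in Tables 12 and 13.

```python
def pca_wqi_status(score):
    """Status from the PCA-WQI ranges in Table 14 (higher score = more polluted)."""
    if score <= 25:
        return "Clean"
    if score < 75:  # assumed cut-off between 74 and 75
        return "Slightly Polluted"
    return "Polluted"


def doe_wqi_status(score):
    """Status from the DOE-WQI ranges in Table 14 (higher score = cleaner)."""
    if score >= 81:
        return "Clean"
    if score >= 60:  # scores between 59.4 and 60 are treated as polluted (assumption)
        return "Slightly Polluted"
    return "Polluted"
```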

    The descriptive statistics of the transformed variables for 2008-2009 (Table 15) show values within the range of the original data set (Table 6). This allowed us to use the standardized transformed data from 2008-2009 in the new WQI calculation.

    TABLE (15). Descriptive Statistics (n=96) of the transformed selected Water Quality Variables for 2008-2009

    Variable | Min | Max | Mean | Standard Deviation | Skewness | Kurtosis
    DO | 38.50 | 104.80 | 72.71 | 15.80 | -0.05 | -0.63
    BOD | 0.00 | 1.18 | 0.58 | 0.28 | -0.32 | -0.21
    COD | 0.30 | 1.85 | 1.35 | 0.28 | -1.08 | 1.60
    AN | -2.30 | 0.46 | -0.62 | 0.87 | -1.07 | -0.16
    SS | 0.00 | 3.18 | 2.15 | 0.63 | -1.27 | 1.83
    PH | 5.84 | 7.71 | 6.83 | 0.38 | -0.47 | -0.07


    FIGURE 6. Plot of the general WQI scores for three different methods at Stations 1-5


    Combining the results from both methods, we summarized the general WQI scores in Table 16 and plotted these data in Figure 7. The plots confirm the inverse relationship between the two indices: a low PCA-WQI score corresponds to a high DOE-WQI score.

    TABLE (16). Summary of general scores for PCA-WQI and DOE-WQI at Stations 1-5
              Station 1       Station 2       Station 3       Station 4       Station 5
    Year    PC  DOE   N     PC  DOE   N     PC  DOE   N     PC  DOE   N     PC  DOE   N
    1995    56   62   1     53   43   2     57   61   2     74   52   2      2   93   2
    1996    77   46   3     63   48   4     62   60   4     86   35   3     11   92   2
    1997    88   44   2     95   33   2     68   53   4     82   40   4      9   87   2
    1998    56   58   4     74   39   5     80   48   6     80   49   6      9   90   5
    1999    49   62   6     44   61   6     57   64   6     72   53   6      4   93   5
    2000    58   60   5     64   43   6     64   54   5     73   53   6     17   88   6
    2001    57   56   6     69   48   9     64   54   9     68   62   9      6   92   6
    2002    44   67   5     55   51  11     66   60  12     65   61  12      2   94   6
    2003    36   68   6     54   58  12     49   69  12     64   65  12      9   92   6
    2004    41   75   6     57   63  12     61   67  12     68   61  12      5   90   6
    2005    26   76   6     53   64  12     53   70  12     73   59  12      5   93   6
    2006    28   74   6     46   72  11     56   72  12     63   69  12      9   92   6
    2007    32   77   6     49   72  12     50   75  12     61   69  12     10   91   6
    2008    37   73   6     45   70  12     36   76  12     53   68  12      5   93   6
    2009    38   72   6     51   66  12     45   72  12     54   68  12      6   92   6
    PC = PCA-WQI score; DOE = DOE-WQI score; N = number of observations.
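    One way to put a number on the inverse relationship is the Pearson correlation between the two yearly score series at each station. This check is not part of the original analysis. The sketch below uses the Table 16 scores as printed and, as a simplification, ignores the number of observations N behind each yearly value; the names `TABLE_16` and `pearson` are illustrative.

```python
from math import sqrt

# Yearly general scores 1995-2009 from Table 16: (PCA-WQI, DOE-WQI) per station.
TABLE_16 = {
    1: ([56, 77, 88, 56, 49, 58, 57, 44, 36, 41, 26, 28, 32, 37, 38],
        [62, 46, 44, 58, 62, 60, 56, 67, 68, 75, 76, 74, 77, 73, 72]),
    2: ([53, 63, 95, 74, 44, 64, 69, 55, 54, 57, 53, 46, 49, 45, 51],
        [43, 48, 33, 39, 61, 43, 48, 51, 58, 63, 64, 72, 72, 70, 66]),
    3: ([57, 62, 68, 80, 57, 64, 64, 66, 49, 61, 53, 56, 50, 36, 45],
        [61, 60, 53, 48, 64, 54, 54, 60, 69, 67, 70, 72, 75, 76, 72]),
    4: ([74, 86, 82, 80, 72, 73, 68, 65, 64, 68, 73, 63, 61, 53, 54],
        [52, 35, 40, 49, 53, 53, 62, 61, 65, 61, 59, 69, 69, 68, 68]),
    5: ([2, 11, 9, 9, 4, 17, 6, 2, 9, 5, 5, 9, 10, 5, 6],
        [93, 92, 87, 90, 93, 88, 92, 94, 92, 90, 93, 92, 91, 93, 92]),
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


for station, (pca, doe) in TABLE_16.items():
    # A negative r means higher PCA-WQI scores go with lower DOE-WQI scores.
    print(f"Station {station}: r = {pearson(pca, doe):+.2f}")
```

    All five coefficients come out negative, in line with the inverse relationship seen in the plots.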

    Classification Analysis

    A cross-tabulation of the status assigned by both methods was performed, as shown in Tables 17 and 18. A high percentage of observations fall into the same category, indicating that the new PCA-WQI largely assigns the same status as the DOE-WQI, and vice versa. Overall, 81.2% of the original (1995-2007) data set and 91.7% of the independent (2008-2009) data set are classified in the same group.

    TABLE (17). Cross-tabulation analysis of PCA-WQI and DOE-WQI status from 1995-2007
                          DOE-WQI
    PCA-WQI               Clean  Slightly Polluted   Polluted   Total   Agreement (%)
    Clean                    73                 18          2      93            78.5
    Slightly Polluted         1                199         53     253            78.7
    Polluted                  0                 10         90     100            90.0
    Total                    74                227        145     446
    Agreement (%)          98.6               87.7       62.1                    81.2

    TABLE (18). Cross-tabulation analysis of PCA-WQI and DOE-WQI status from 2008-2009
                          DOE-WQI
    PCA-WQI               Clean  Slightly Polluted   Polluted   Total   Agreement (%)
    Clean                     6                  0          0       6           100.0
    Slightly Polluted         1                 65          0      66            98.5
    Polluted                  0                  7         17      24            70.8
    Total                     7                 72         17      96
    Agreement (%)          85.7               90.3      100.0                    91.7
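    The agreement percentages in Tables 17 and 18 follow directly from the counts: each diagonal cell divided by its row total, each diagonal cell divided by its column total, and the diagonal sum divided by the grand total. The short sketch below reproduces them. It assumes that rows hold the PCA-WQI status and columns the DOE-WQI status in both tables (Table 18 labels this layout explicitly); the overall figure does not depend on that orientation. The names are illustrative.

```python
# Rows: PCA-WQI status; columns: DOE-WQI status; order Clean, Slightly Polluted, Polluted.
TABLE_17 = [[73, 18, 2], [1, 199, 53], [0, 10, 90]]  # 1995-2007
TABLE_18 = [[6, 0, 0], [1, 65, 0], [0, 7, 17]]       # 2008-2009


def agreement(table):
    """Return (row %, column %, overall %) of same-category agreement."""
    k = len(table)
    diagonal = [table[i][i] for i in range(k)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    row_pct = [100 * d / t for d, t in zip(diagonal, row_totals)]
    col_pct = [100 * d / t for d, t in zip(diagonal, col_totals)]
    overall = 100 * sum(diagonal) / sum(row_totals)
    return row_pct, col_pct, overall


for name, table in (("Table 17", TABLE_17), ("Table 18", TABLE_18)):
    rows, cols, overall = agreement(table)
    print(name, [round(p, 1) for p in rows], [round(p, 1) for p in cols], round(overall, 1))
```

    Rounded to one decimal place, the output matches the printed tables, including the overall 81.2% for 1995-2007 and 91.7% for 2008-2009.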


    FIGURE 7. Plots of the general scores of PCA-WQI and DOE-WQI in stations 1-5


    CONCLUSIONS

    The aim of this study was to develop a water quality index procedure for the Langat River based on an established method, PCA, using water quality data measured during 1995-2007. After a thorough evaluation, the distribution of the water quality measurements was assumed to be approximately multivariate normal with no extreme outliers. Unobserved (missing) data were not considered in this study. All selected water quality data were then analyzed using PCA. The results show strong positive loadings on BOD and COD, a relationship representing influences from non-point sources (NPS) such as agricultural activities and forest areas. The strong negative loading on DO is related to high levels of organic matter consuming large amounts of oxygen. On the other hand, the positive loading on pH reflects natural influences on the Langat River water: pH varied only slightly in the Langat River over the study period, whereas the other variables varied considerably.
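    The text does not state how the outlier screening was carried out. One common multivariate check, shown here only as an illustration, compares each sample's squared Mahalanobis distance from the sample mean with a chi-square cutoff. With six variables, the 0.999 quantile of the chi-square distribution with 6 degrees of freedom is about 22.46. Because the mean and covariance are estimated from the same data, this cutoff is only approximate. The function name, the cutoff and the synthetic data are assumptions of this sketch.

```python
import numpy as np

# Approximate 0.999 quantile of chi-square with 6 degrees of freedom (six variables).
CHI2_6DF_999 = 22.458


def mahalanobis_outliers(X: np.ndarray, cutoff: float = CHI2_6DF_999):
    """Return (indices of flagged rows, squared Mahalanobis distances).

    X has one row per sample and one column per variable.
    """
    centered = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    # d2[i] = centered[i] @ inv_cov @ centered[i]
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return np.flatnonzero(d2 > cutoff), d2


# Synthetic example: 200 samples of 6 standard-normal variables plus one planted outlier.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
X[0] = 10.0
flagged, d2 = mahalanobis_outliers(X)
print(flagged)
```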

    The loadings were then re-calculated to derive new statistical weights. Because these weights are based on a modification of the variable loadings, they make the WQI calculation simpler and easier to handle. To validate the new weights, the PCA-WQI was compared with existing PC-based methods. We found that the new weights generated PCA-WQI scores fairly similar to those of the existing methods that use the weighted sum of all PCs or of selected PCs. The new PCA-WQI also shows an inverse relationship with the DOE-WQI: the better the water quality, the lower the PCA-WQI value. The water quality status results in this study are also consistent with Kambe et al. [10] and Coletti et al. [26]. We also defined a new index range for river water quality status. Using this range, the PCA-WQI placed observations in the same status group as the DOE-WQI in a high percentage of cases (81.2% of the original data set and 91.7% of the independent data set). Thus, the proposed PCA-WQI calculation is both simple and sound, and the methodology can be applied to other rivers in Malaysia.
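    For comparison, the existing approach mentioned above, an index built as a weighted sum of principal component scores, can be sketched generically. This is not the study's PCA-WQI. The sketch assumes standardized variables, weights each component by its share of explained variance, optionally keeps only the leading components, and runs on synthetic data; the study's loading-based weights and any rescaling to an index scale are not reproduced. Because the sign of a principal component is arbitrary, the sketch orients each component so that its largest loading is positive. The function name `weighted_pc_index` is hypothetical.

```python
from typing import Optional

import numpy as np


def weighted_pc_index(X: np.ndarray, n_components: Optional[int] = None) -> np.ndarray:
    """Variance-weighted sum of principal component scores (generic sketch).

    X has one row per sample and one column per variable.
    """
    # Standardize columns, so the covariance of Z is the correlation matrix of X.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    # eigh returns eigenvalues in ascending order; sort them in descending order.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if n_components is not None:
        # Keep only the leading components ("selected PCs").
        eigvals, eigvecs = eigvals[:n_components], eigvecs[:, :n_components]
    # Fix the arbitrary sign of each component: make its largest |loading| positive.
    cols = np.arange(eigvecs.shape[1])
    eigvecs = eigvecs * np.sign(eigvecs[np.argmax(np.abs(eigvecs), axis=0), cols])
    scores = Z @ eigvecs               # PC scores, one column per component
    weights = eigvals / eigvals.sum()  # share of the retained variance
    return scores @ weights


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))           # 50 synthetic samples of 6 variables
print(weighted_pc_index(X).shape, weighted_pc_index(X, n_components=2).shape)
```

    Passing `n_components=None` corresponds to the "all PCs" variant and a small integer to the "selected PCs" variant.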

    ACKNOWLEDGMENTS

    We would like to thank the Malaysian Department of Environment for supplying the data on which this work was based.

    REFERENCES

    1. S.E. Cooke, S.M. Ahmed and N.D. Macalphine, Introductory Guide to Surface Water Quality Monitoring in Agriculture, Alberta: Alberta Agriculture, Food and Rural Development (2005).

    2. R. Brown, N. McClelland, R. Deininger and R. Tozer, Water and Sewage Works, 339-343 (1970).

    3. D.G. Smith, Water Research 10, 1237-1244 (1990).

    4. Department of Environment (DOE), Malaysian Environmental Quality Reports, Kuala Lumpur: Ministry of Science, Technology and Environment (1997).

    5. A.Z. Garizi, V. Sheikh and A. Sadoddin, International Journal of Environmental Science and Technology 8, 581-592 (2011).

    6. P.T.M. Hanh, S. Sthiannopkao, D. The Ba and K.-W. Kim, Journal of Environmental Engineering (ASCE) 137, 273-283 (2011).

    7. H. Juahir, M.S. Zain, M. Yusoff, T. Tengku Hanidza, A. Mohd Armi and M. Toriman, Environmental Monitoring and Assessment 173, 625-641 (2010).

    8. R. Mahmood, J.J. Messer, F.J. Nemanich, C.I. Liff and D.B. George, Reports, Paper 231.

    9. P. Chow-Fraser, "Development of the Wetland Water Quality Index (WQI) to Assess Effects of Basin-Wide Land-Use Alteration on Coastal Marshes of the Laurentian Great Lakes", in Coastal Wetlands of the Laurentian Great Lakes: Health, Habitat and Indicators, edited by T.P. Simon and P.M. Stewart, Bloomington, IN: Indiana Biological Survey, 2006, Chapter 5, pp. 137-166.

    10. J. Kambe, T. Aoyama, A. Yamauchi and U. Nagashima, Journal of Computer Chemistry, Japan 6, 19-26 (2007).

    11. I. Primpas, G. Tsirtsis, M. Karydis and G.D. Kokkoris, Ecological Indicators 10, 178-183 (2010).

    12. B.N. Lohani and G. Todino, Journal of Environmental Engineering 110, 1163-1176 (1984).

    13. L. Hudrlíková and J. Fischer, Journal of Applied Mathematics 4, 291-298 (2011).

    14. N. Mustapha, "Indices for Water Quality Assessment in a River", Master's thesis, Asian Institute of Technology, 1981.

    15. M.F. Mohd Nasir, M.S. Samsudin, I. Mohamad, M.R.A. Awaluddin, M.A. Mansor, H. Juahir and N. Ramli, World Applied Sciences Journal 14, 73-82 (2011).

    16. R.B. Robinson, C.D. Cox and K.M.A. Odom, Journal of Environmental Engineering 131, 651-65 (2005).


    17. N.M. Gazzaz, M.K. Yusoff, M.F. Ramli, A.Z. Aris and H. Juahir, Marine Pollution Bulletin 64, 688-698 (2012).

    18. J.J. Hair, R. Anderson, R. Tatham and W. Black, Multivariate Data Analysis, US: Pearson Prentice Hall, 2005.

    19. I. Gupta, S. Dhage and R. Kumar, Indian Journal of Marine Sciences 38, 170-177 (2009).

    20. H.F. Kaiser, Psychometrika 23, 187-200 (1958).

    21. Y. Ouyang, P. Nkedi-Kizza, Q.T. Wu, D. Shinde and C.H. Huang, Water Research 40, 3800-3810 (2006).

    22. G. Nicoletti, S. Scarpetta and O. Boylaud, Economics Department Working Papers No. 26, ECO/WKP(99)18 (2000).

    23. J. Zhao, G. Fu, K. Lei and Y. Li, Journal of Environmental Sciences 23, 1460-1471 (2011).

    24. P. Debels, R. Figueroa, R. Urrutia, R. Barra and X. Niell, Environmental Monitoring and Assessment 110, 301-322 (2005).

    25. A.A. Mamun and A. Idris, "Revised Water Quality Indices for the Protection of Rivers in Malaysia", Twelfth International Water Technology Conference, IWTC12 2008, Alexandria, Egypt, 2008, pp. 1687-1698.

    26. C. Coletti, R. Testezlaf, T.A.P. Ribeiro, R.T.G. de Souza and D.A. Pereira, Revista Brasileira de Engenharia Agrícola e Ambiental 14, 517-522 (2010).


    Copyright of AIP Conference Proceedings is the property of American Institute of Physics and its content may

    not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written

    permission. However, users may print, download, or email articles for individual use.