e analysis of annual maximum rainfall events

18
water Article Eect of Multicollinearity on the Bivariate Frequency Analysis of Annual Maximum Rainfall Events Chulsang Yoo and Eunsaem Cho * School of Civil, Environmental and Architectural Engineering, College of Engineering, Korea University, Seoul 02841, Korea; [email protected] * Correspondence: [email protected]; Tel.: +82-10-4547-2681 Received: 21 March 2019; Accepted: 26 April 2019; Published: 29 April 2019 Abstract: A rainfall event, simplified by a rectangular pulse, is defined by three components: the rainfall duration, the total rainfall depth, and mean rainfall intensity. However, as the mean rainfall intensity can be calculated by the total rainfall depth divided by the rainfall duration, any two components can fully define the rainfall event (i.e., one component must be redundant). The frequency analysis of a rainfall event also considers just two components selected rather arbitrarily out of these three components. However, this study argues that the two components should be selected properly or the result of frequency analysis can be significantly biased. This study fully discusses this selection problem with the annual maximum rainfall events from Seoul, Korea. In fact, this issue is closely related with the multicollinearity in the multivariate regression analysis, which indicates that as interdependency among variables grows the variance of the regression coecient also increases to result in the low quality of resulting estimate. The findings of this study are summarized as follows: (1) The results of frequency analysis are totally dierent according to the selected two variables out of three. (2) Among three results, the result considering the total rainfall depth and the mean rainfall intensity is found to be the most reasonable. (3) This result is fully supported by the multicollinearity issue among the correlated variables. The rainfall duration should be excluded in the frequency analysis of a rainfall event as its variance inflation factor is very high. Keywords: annual maximum rainfall event; bivariate frequency analysis; copula; multicollinearity 1. Introduction The bivariate frequency analysis (BFA) is used for the quantification of the probabilistic characteristic of two correlated variables. This analysis has been conducted in a wide range of study areas. De Haan and De Ronde [1] evaluated the risk of wave overtopping by conducting the BFA on the still water level and the wave height. Yue [2] assessed the applicability of the bivariate exponential distribution for two univariate data that follow the exponential distribution. Yue and Wang [3] compared the Gumbel mixed model and the Gumbel logistic model, which are used for the BFA. In terms of hydrology and water resources, the BFA has been conducted mostly for processes like the rainfall event, drought, and river discharge. In the case of the rainfall event, two variables among several characteristics, such as the mean rainfall intensity, rainfall duration, total rainfall depth, and maximum rainfall intensity, are selected to perform the analysis [4,5]. For the drought, two variables among the characteristics of the drought event, such as the average drought severity, drought duration, total drought severity, and maximum drought severity, are usually used for the analysis [68]. The analysis of the river discharge is also similar [911]. Bivariate probability distribution functions (PDFs) have been generally used for the BFA. The bivariate exponential distribution, bivariate gamma distribution, and bivariate extremal distribution have been frequently used regarding hydrology and water resources [1215]. The use of the Water 2019, 11, 905; doi:10.3390/w11050905 www.mdpi.com/journal/water

Upload: others

Post on 02-Oct-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

water

Article

Effect of Multicollinearity on the Bivariate FrequencyAnalysis of Annual Maximum Rainfall Events

Chulsang Yoo and Eunsaem Cho *

School of Civil, Environmental and Architectural Engineering, College of Engineering, Korea University,Seoul 02841, Korea; [email protected]* Correspondence: [email protected]; Tel.: +82-10-4547-2681

Received: 21 March 2019; Accepted: 26 April 2019; Published: 29 April 2019�����������������

Abstract: A rainfall event, simplified by a rectangular pulse, is defined by three components:the rainfall duration, the total rainfall depth, and mean rainfall intensity. However, as the meanrainfall intensity can be calculated by the total rainfall depth divided by the rainfall duration, any twocomponents can fully define the rainfall event (i.e., one component must be redundant). The frequencyanalysis of a rainfall event also considers just two components selected rather arbitrarily out of thesethree components. However, this study argues that the two components should be selected properlyor the result of frequency analysis can be significantly biased. This study fully discusses this selectionproblem with the annual maximum rainfall events from Seoul, Korea. In fact, this issue is closelyrelated with the multicollinearity in the multivariate regression analysis, which indicates that asinterdependency among variables grows the variance of the regression coefficient also increases toresult in the low quality of resulting estimate. The findings of this study are summarized as follows:(1) The results of frequency analysis are totally different according to the selected two variables out ofthree. (2) Among three results, the result considering the total rainfall depth and the mean rainfallintensity is found to be the most reasonable. (3) This result is fully supported by the multicollinearityissue among the correlated variables. The rainfall duration should be excluded in the frequencyanalysis of a rainfall event as its variance inflation factor is very high.

Keywords: annual maximum rainfall event; bivariate frequency analysis; copula; multicollinearity

1. Introduction

The bivariate frequency analysis (BFA) is used for the quantification of the probabilistic characteristicof two correlated variables. This analysis has been conducted in a wide range of study areas. De Haanand De Ronde [1] evaluated the risk of wave overtopping by conducting the BFA on the still water leveland the wave height. Yue [2] assessed the applicability of the bivariate exponential distribution fortwo univariate data that follow the exponential distribution. Yue and Wang [3] compared the Gumbelmixed model and the Gumbel logistic model, which are used for the BFA.

In terms of hydrology and water resources, the BFA has been conducted mostly for processeslike the rainfall event, drought, and river discharge. In the case of the rainfall event, two variablesamong several characteristics, such as the mean rainfall intensity, rainfall duration, total rainfall depth,and maximum rainfall intensity, are selected to perform the analysis [4,5]. For the drought, twovariables among the characteristics of the drought event, such as the average drought severity, droughtduration, total drought severity, and maximum drought severity, are usually used for the analysis [6–8].The analysis of the river discharge is also similar [9–11].

Bivariate probability distribution functions (PDFs) have been generally used for the BFA.The bivariate exponential distribution, bivariate gamma distribution, and bivariate extremal distributionhave been frequently used regarding hydrology and water resources [12–15]. The use of the

Water 2019, 11, 905; doi:10.3390/w11050905 www.mdpi.com/journal/water

Water 2019, 11, 905 2 of 18

bivariate probability distribution, such as bivariate exponential distribution [16] and bivariate gammadistribution [17], is only valid when two variables follow the same PDF. Unfortunately, this assumptionis not valid in most cases of BFA. For example, the PDF for the mean rainfall intensity of a rainfallevent is different from that for rainfall duration. The former is generally modeled by the log-normaldistribution, but the exponential distribution is used for the latter [18]. In this case, the copula can be asound option [19] and many examples where the copula is applied can also be found in hydrologyand water resources [7,12,18,20]. De Michele et al. [12] analyzed the flood peak and flood volume datafrom Northern Italy using bivariate copulas to check adequacy of dam spillway. Zhang and Singh [18]derived bivariate distribution of rainfall variables using copula and tested the applicability on the datafrom the Amite river basin in Louisiana, United States. Mirabbasi et al. [7] applied bivariate copulason drought duration and severity data from Iran and conducted drought frequency analysis.

The objective of this study is to provide a guideline about the selection of two variables for the BFAof rainfall events. ‘Rainfall event’ is a basic unit for characterizing storm. In the case of ‘independentrainfall event’ is defined as a rainfall event which is not correlated with any other rainfall events. It ispossible to define interdependent rainfall event by separating rainfall events with correlation time,which means the minimum time required to make rainfall event independent. This correlation time isalso referred to Inter-event Time Definition (IETD).

In Korea, 10 h is generally applied as the IETD [21,22]. However, this study considered variousIETD based on frequency of independent rainfall event to determine reasonable IETD. As a result,the 12 h of IETD was effective to separate independent rainfall event. This IETD was also considered inthis study to separate the independent rainfall events. Before separating the rainfall event, a thresholdvalue was applied to distinguish the rain/no rain condition. In this study, 1 mm/hour was consideredfor this purpose.

It is generally accepted that the structure of a rainfall event can be simplified with the rectangularpulse model [23–27]. In the rectangular pulse model, a rainfall event is quantified by three components;the rainfall duration, the total rainfall depth and mean rainfall intensity. Here, the mean rainfallintensity can be calculated by the total rainfall depth divided by the rainfall duration. Thus, any twocomponents can fully define the rainfall event and one component must be redundant (Figure 1).

Water 2019, 11, 905 2 of 20

Bivariate probability distribution functions (PDFs) have been generally used for the BFA. The

bivariate exponential distribution, bivariate gamma distribution, and bivariate extremal distribution

have been frequently used regarding hydrology and water resources [12–15]. The use of the

bivariate probability distribution, such as bivariate exponential distribution [16] and bivariate

gamma distribution [17], is only valid when two variables follow the same PDF. Unfortunately, this

assumption is not valid in most cases of BFA. For example, the PDF for the mean rainfall intensity of

a rainfall event is different from that for rainfall duration. The former is generally modeled by the

log-normal distribution, but the exponential distribution is used for the latter [18]. In this case, the

copula can be a sound option [19] and many examples where the copula is applied can also be found

in hydrology and water resources [7,12,18,20]. De Michele et al. [12] analyzed the flood peak and

flood volume data from Northern Italy using bivariate copulas to check adequacy of dam spillway.

Zhang and Singh [18] derived bivariate distribution of rainfall variables using copula and tested the

applicability on the data from the Amite river basin in Louisiana, United States. Mirabbasi et al. [7]

applied bivariate copulas on drought duration and severity data from Iran and conducted drought

frequency analysis.

The objective of this study is to provide a guideline about the selection of two variables for the

BFA of rainfall events. ‘Rainfall event’ is a basic unit for characterizing storm. In the case of

‘independent rainfall event’ is defined as a rainfall event which is not correlated with any other

rainfall events. It is possible to define interdependent rainfall event by separating rainfall events

with correlation time, which means the minimum time required to make rainfall event independent.

This correlation time is also referred to Inter-event Time Definition (IETD).

In Korea, 10 hours is generally applied as the IETD [21,22]. However, this study considered

various IETD based on frequency of independent rainfall event to determine reasonable IETD. As a

result, the 12 hours of IETD was effective to separate independent rainfall event. This IETD was

also considered in this study to separate the independent rainfall events. Before separating the

rainfall event, a threshold value was applied to distinguish the rain/no rain condition. In this study,

1 mm/hour was considered for this purpose.

It is generally accepted that the structure of a rainfall event can be simplified with the

rectangular pulse model [23–27]. In the rectangular pulse model, a rainfall event is quantified by

three components; the rainfall duration, the total rainfall depth and mean rainfall intensity. Here,

the mean rainfall intensity can be calculated by the total rainfall depth divided by the rainfall

duration. Thus, any two components can fully define the rainfall event and one component must be

redundant (Figure 1).

(a) (b)

Figure 1. Simplification of an observed rainfall event (a) by a rectangular pulse (b).

However, a way to properly select the two components of a rainfall event has not been

established yet and the way that the random selection of two variables may affect the results of the

Figure 1. Simplification of an observed rainfall event (a) by a rectangular pulse (b).

However, a way to properly select the two components of a rainfall event has not been establishedyet and the way that the random selection of two variables may affect the results of the BFA hasnot been confirmed. Therefore, many researchers had performed the BFA by selecting two variablesthrough their subjective decision [13,20,28,29]. For example, Fu and Butler [20] selected the rainfallduration and the total rainfall depth of a rainfall event to perform the BFA. Jain et al. [29] selected the

Water 2019, 11, 905 3 of 18

rainfall duration and the mean rainfall intensity for the same BFA. Park et al. [13] used the total rainfalldepth and the mean rainfall intensity for the BFA.

In fact, the authors assume that the selection issue of the two variables for the BFA is closelyrelated with the multicollinearity issue in regression analysis. The dependency among predictors,which is generally called interdependency, deteriorates the quality of estimate result [30]. Thus, the twovariables should be selected to minimize this problem. This manuscript is composed of a total offive sections including introduction and conclusions. Theoretical background of the multicollinearityproblem and the BFA using copula is summarized in Section 2. As an application, the annual maximumrainfall events observed in Seoul, Korea are analyzed in Sections 3 and 4. Finally, the conclusions anddiscussion are given in Section 5.

2. Theoretical Background

2.1. Multicollinearity

2.1.1. Multicollinearity Problem in Regression Analysis

Regression analysis is conducted to find a relation between two or more variables, of which atleast one is subject to random variation. For example, when a linear model is concerned between thepredictors X and response Y, the least square method leads to the estimates

β̂ = (XTX)−1

XTY (1)

and the variance-covariance matrix of the estimates is given as follows [31]

var(β̂) = σ2(XTX)−1

(2)

Multicollinearity is assumed here as an interdependency condition in X. Multicollinearity can thusbe defined as a lack of independence or a presence of interdependence. It is generally evaluated by thecorrelation matrix (XTX). As interdependence among predictors grows, the correlation matrix (XTX)

approaches singularity and elements of the inverse matrix (XTX) become infinite. As an extremecase, when perfect linear dependence within a predictor set is secured, a perfect singularity on (XTX)

is resulted and lead a completely indeterminate set of parameter estimates β̂. As elements of theinverse matrix (XTX) becomes infinite, variances of regression coefficients also become infinite and thevariance can be allocated completely arbitrarily among predictors [32,33]. Similarly, the large varianceson regression coefficients produced by interdependent variables can lead the low quality of resultingparameter estimates [33,34]. It becomes almost impossible to distinguish the independent contributionof a predictor to the variance of the response.

In the process of analyzing several variables, the multicollinearity is also an important factorthat distorts the result. Highly correlated variables are the main cause of the multicollinearityproblem. It is known that the effect of the multicollinearity distortion increases with the increasingof the degree of correlation [32,35–37]. If highly correlated variables are included in the multilinearregression equation, the predicted result can be distorted due to the multicollinearity problem of thesevariables. This problem is generally evaluated through a quantification of the variance inflation froma variable [38–40]. The variance inflation factor is defined as a measure to quantify the level of themulticollinearity in a multiple regression analysis. The variance inflation factor can be calculatedas follows:

VIFi =1

1−R2i

(3)

where Ri is the coefficient of determination that indicates how much the variable can be explainedby other variables. Commonly, a variable with a variance factor that is higher than four will cause amulticollinearity problem [41–43].

Water 2019, 11, 905 4 of 18

2.1.2. Possible Multicollinearity Issue in Frequency Analysis

Frequency analysis is a statistical method used to predict how often certain values of a variablephenomenon occurs. In hydrology, it is a tool for determining design rainfalls and/or design discharges,which are used as input for designing hydraulic structures. It is also possible to determine the returnperiod of a flood event by analyzing the rainfall or discharge time series data.

The connection between the multicollinearity and the frequency analysis can be found in theso-called rainfall intensity formula. In fact, the rainfall intensity formula is derived for estimating therainfall intensity under the given return period and rainfall duration. The general form of the rainfallintensity formula is as follows [44]:

I =a + b log T

Dn + c(4)

where I is the rainfall intensity (mm/hour), D is the rainfall duration (hour), T is the return period(year), and a, b, c and n are the parameters representing the climate of the study area.

The relation between the return period and rainfall intensity can easily be found in the rainfallintensity formula. First, this equation is solved for T as follows:

b log T = I × (Dn + c) − a (5)

orlog T = α+ β× I (6)

where α and β are constants. Above equation says that log(T) can be expressed by a linear function of Iin case that the rainfall duration D is fixed.

It is also possible to derive a similar equation for the total rainfall depth R = I ×D. That is,

b log T = I × (Dn + c) − a = I × (DDn−1 + c) − a = R×Dn−1 + c× I − a (7)

Under the condition that the rainfall intensity I and rainfall duration D are given, the followinglinear relation can be derived:

log T = α+ β×R (8)

However, this linear relation assumption is somewhat excessive as the total rainfall depth Ris generally proportional to the rainfall duration D. Log(T) may increase rather exponentially as Rincreases. The relation between the log(T) and D is also similar with that of log(T) and R. As can befound in Equation (4), their relation is not linear.

However, at this moment, the authors assume that their relations are all linear. The authorsalso realize that this linear assumption is somewhat immoderate or excessive, but we believe thislinearization assumption can be used effectively for explaining the possible existence of multicollinearityissue in the multivariate frequency analysis of rainfall events. By adding all these possible relations,the authors assumed that log(T) is expressed as a multiple linear function of I, D and R.

log T = α+ β1 × I + β2 ×D + β3 ×R (9)

It is thus obvious that the possible interdependence among three predictors can cause a significantmulticollinearity problem even in the frequency analysis. That is, the multicollinearity can cause thesame problem as in the regression analysis and hinder the effective estimation of design rainfall or thereturn period of a given rainfall event based on frequency analysis.

2.2. Copula

In this study, bivariate frequency analysis was conducted by applying copula. Sklar [19] introducedthe copula as a tool to derive joint probability distribution (PDF) from several marginal PDFs. Any kind

Water 2019, 11, 905 5 of 18

of joint PDF can be derived by applying copula on various marginal PDFs. More information aboutcopula is summarized in [45–47].

This study considered Clayton copula [48], Frank copula [49], Gumbel-Hougaard copula [50], andGaussian copula [51] for the analysis. Among these copula, Clayton, Frank and Gumbel-Hougaardare classified as Archimedean copula family and Gaussian is in the class of elliptical copula family.These copulas are known for easy application and have many references in hydrology area [7,12,18,20].Only one parameter, which is referred as θ, is required to derive the joint PDF of copula considered inthis study.

Parameter θ for copula can be estimated by calculating Kendall’s τ. Kendall’s τ is the statisticindicating the rank correlation between two target variables. The equation to calculate Kendall’s τ canbe expressed as Equation (10).

Kendall′s τ =In − Jn

nC2(10)

In Equation (10), a total number of target data is n. If two target data are defined as (Xi, Yi)

and (X j, Y j), In is the number of cases that (Xi −X j)(Yi −Y j) > 0 and Jn is the number of cases that(Xi −X j)(Yi −Y j) < 0. nC2 is the 2-combination from a given set n.

Each copula has different relationship between Kendall’s τ and parameter θ. For Clayton copula,the relationship Kendall’s τ and parameter θ is rather simply defined as τ = θ

θ+2 . Gumbel-Hougaardcopula also has simple relationship as τ = 1− 1

θ . In the case of Gaussian copula, the relationship isderived as τ = 2

πarcsinθ. The relationship for Frank copula is far complicated. It is defined as follows:

τ = 1−4θ+ 4

D1(θ)

θ(11)

where D1(θ) can be calculated by∫ θ

0t

et−1 dt.There is a valid range of parameter θ for each copula and only the parameter θ in the valid

range can be applied for the analysis. The parameter θ for the Clayton copula must be defined in[0, ∞). The valid range for the Gumbel-Hougaard copula is [1, ∞). In the case of Gaussian copula,the parameter θ is valid within [−1, 1]. Frank copula can be applied with any parameter θ except forthe zero.

Equations (12)–(15) are the joint PDFs of the copula considered in this study. Respectively,Equations (12)–(15) are for Clayton, Frank, Gumbel-Hougaard and Gaussian. Here, µ1 and µ2 are thecumulative probabilities of the bivariate data. These values can be calculated using the marginal PDF.In the case of Φ−1, it is the inverse function of the standard normal distribution.

CClayton(u1, u2) = (u−θ1 + u−θ2 − 1)−1/θ

(12)

CFrank(u1, u2) = −1θ

ln[1 +(exp(−θu1) − 1)(exp(−θu2) − 1)

exp(−θ) − 1] (13)

CGumbel−Houhaard(u1, u2) = exp(−((− ln(u1))θ + (− ln(u2))

θ )1/θ

) (14)

CGaussian(u1, u2) =

∫ Φ−1(u1)

−∞

∫ Φ−1(u2)

−∞

1

2π√

1− θ2exp(−

s2 + w2− 2swθ

2(1− θ2))dsdw (15)

Among four copulas, this study selected an optimal one for the analysis. Since there are threepairs of bivariate data, three optimal copulas had to be determined. This study estimated the Akaikeinformation criterion (AIC) to select best copula for analysis. Theoretically, the AIC may not be appliedto test a goodness-of-fit of model, however, this statistic can be used to determine the optimal modelamong many other admissible models available [18].

Generally, the maximized likelihood statistic is required to calculate the AIC. Alternatively, themean square error (MSE) between the cumulative probability from copula and that from empirical

Water 2019, 11, 905 6 of 18

equation can be used to estimate the AIC [52,53]. This study calculated the MSE to estimate the AIC ofeach copula. For empirical cumulative probability, the Gringorten plotting position formula [54] wasused. Then, the MSE can be calculated by the mean of sum of squared error. Finally, the AIC can beestimated using the following Equation (16).

AIC = n log(MSE) + 2(PAR) (16)

In Equation (16), n is the number of data, and PAR is the number of fitted parameter. The AICcalculated by Equation (16) is the penalty statistic. That is, the model with lower AIC can be consideredas better model for given data. Therefore, this study determined the optimal copula by selecting thecopula with the lowest AIC.

In this study, the optimal copula determined by the lowest AIC was additionally tested with theKolmogorov-Smirnov (K-S) statistic. The K-S statistic is equal to the maximum absolute differencebetween the cumulative probability from copula and that from empirical equation [55]. If the calculatedK-S statistic is lower than the critical value of the K-S test, the goodness-of-fit of the copula is guaranteed.For the critical value of the K-S test, the significance level of 5% was considered.

Exceedance probability of bivariate data can be calculated with the determined optimal copula.Finally, the return period of bivariate data can then be estimated using the calculated exceedanceprobability. This study considered three types of return period [12,56]. The type of return period isdefined based on the method used to calculate the exceedance probability. First, Equation (17) showsthe return period of an OR case. An OR case return period is calculated with the probability that eitherone or two variables exceed the threshold.

Tor =1

1−C(FX(xi), FY(yi))(17)

where C(FX(xi), FY(yi)) is the joint cumulative probability calculated by the copula. Next, an ANDcase of return period is defined as Equation (18). In this case, the exceedance probability is calculatedwith the probability that both two variables exceed the threshold.

Tand =1

1− FX(xi) − FY(yi) + C(FX(xi), FY(yi))(18)

In Equation (18), FX(xi) and FY(yi) are the cumulative probabilities calculated by their marginalprobability distributions. Finally, Equation (19) is for the COND case of the return period. This returnperiod is based on the event of X > x given Y = y.

TX|Y =1

1− ∂∂y C(FX(xi), FY(yi))

(19)

In this study, the bivariate frequency analysis was conducted to calculate an AND case returnperiod. In addition, the return period for OR and COND cases were considered to validate the resultsof the bivariate frequency analysis.

3. Annual Maximum Rainfall Events in Seoul, Korea

In this study, the annual maximum rainfall events of Seoul from 1961 to 2010 were secured forbivariate frequency analysis [57]. Freund’s bivariate exponential mixture distribution was appliedto select the annual maximum rainfall events. In this case, the total rainfall depth and meanrainfall intensity were used for frequency analysis. The parameters of Freund’s bivariate exponentialdistribution were estimated annually. It is generally known that the annually estimated parameters arebetter to consider the different rainfall characteristics of wet and dry years [58]. Since the number ofindependent rainfall events is more than 30 every year, there was no serious problem in estimating the

Water 2019, 11, 905 7 of 18

parameters annually. Based on frequency analysis result, annual maximum rainfall event with thehighest return period for each year was determined.

The basic statistics of the rainfall duration, total rainfall depth, and mean rainfall intensity of therainfall events are summarized in Table 1. The mean of the rainfall duration was 21.7 h and that ofthe total rainfall depth was 172.1 mm. For the mean rainfall intensity, the mean was calculated to be12.3 mm/hour. In the case of the standard deviations, they were calculated to be 19.0 h, 102.9 mm,7.7 mm/hour for the rainfall duration, total rainfall depth, and mean rainfall intensity, respectively.The ranges of the rainfall duration, total rainfall depth, and mean rainfall intensity were 2 to 94 h,39.4 mm to 446.0 mm and 2.9 mm/hour to 32.5 mm/hour, respectively.

Table 1. The means, standard deviations, and ranges of three components of the annual maximumrainfall events observed in Seoul, Korea.

Mean Standard Deviation Range

Rainfall duration (hour) 21.7 19.0 2.0–94.0Total rainfall depth (mm) 172.1 102.9 39.4–446.0

Mean rainfall intensity (mm/hour) 12.3 7.7 2.9–32.5

Before conducting statistical analysis including frequency analysis, it is essential to check thehomogeneity and stationarity of the data. In this study, the Bartlett’s test has been conducted toinvestigate the homogeneity of the components of independent rainfall events. Further, the authorsfound that the null hypothesis of homogeneity was rejected with p-value less than 0.001. For thisreason, t-test in this study was conducted under unequal variances. Moreover, the Box-Pierce andLjung-Box test were applied to examine the null hypothesis of independence among rainfall events.The p-values for both Box-Pierce and Ljung-Box test were all estimated to be higher than 0.05 at lags upto 10. Based on this result, the authors could assume the independence of the rainfall events consideredin this study. Finally, in the case of stationarity, the Augmented Dickey-Fuller test was applied on eachcomponent of the independent rainfall events. As a result, all of three components were figured out tobe stationary with the p-value less than 0.05. The bivariate frequency analysis in this study could thusbe proceeded under assumption of stationarity.

The return period of each annual maximum rainfall event was calculated using the rainfallintensity formula (RIF) suggested by Bernard [59].

I =KTx

Db(20)

where I is the rainfall intensity (mm/hour), D is the rainfall duration (hour), T is the return period(year), and K, x and b are the parameters representing the climate of the study area. The parameters K, xand b were determined as 8.0, 0.5 and 0.5, respectively, to agree with the statistical characteristics of theobserved annual maximum rainfall events observed in Seoul, Korea [57]. With these parameters, thereturn period T for each annual maximum rainfall event could be calculated by solving Equation (20).

Figure 2 shows the basic components of annual maximum rainfall events and the calculatedreturn periods. The longest return period was calculated to be 129.7 years for the rainfall eventobserved in 1972, whose duration was 24.0 h and total rainfall depth 446.0 mm (i.e., mean rainfallintensity 18.6 mm/hour). Among 50 annual maximum rainfall events considered in this study, 4 eventswere found to have return periods less than 10 years (i.e., T < 10), 30 events less than 30 years (i.e.,10 < T < 30), 8 events less than 50 years (i.e., 30 < T < 50), and the remaining 8 events longer than50 years (i.e., T > 50).

Water 2019, 11, 905 8 of 18

Water 2019, 11, 905 8 of 20

study, 4 events were found to have return periods less than 10 years (i.e., T < 10), 30 events less than

30 years (i.e., 10 < T < 30), 8 events less than 50 years (i.e., 30 < T < 50), and the remaining 8 events

longer than 50 years (i.e., T > 50).

Figure 2. Variation of the basic components of the annual rainfall events observed in Seoul Korea and

their return periods.

4. Evaluation of Multicollinearity Problem with Observed Data

4.1. Results of Bivariate Frequency Analysis

First, the Kendall’s was calculated to estimate the parameter of the copula. The Kendall’s

for the total rainfall depth and rainfall duration was 0.57 and that for the mean rainfall intensity and

the rainfall duration was −0.61. In the case of the total rainfall depth and the mean rainfall intensity,

Kendall’s was −0.18. As expected, a positive correlation was estimated between the total rainfall

depth and the rainfall duration and a negative correlation was estimated between the mean rainfall

intensity and the rainfall duration. These three Kendall’s in this study were also found to be

statistically significant based on the p-value of the t-test. The p-value for total rainfall depth and

rainfall duration was estimated to be lower than 0.001. It was 0.0020 for the mean rainfall intensity

Figure 2. Variation of the basic components of the annual rainfall events observed in Seoul Korea andtheir return periods.

4. Evaluation of Multicollinearity Problem with Observed Data

4.1. Results of Bivariate Frequency Analysis

First, the Kendall’s τ was calculated to estimate the parameter of the copula. The Kendall’s τfor the total rainfall depth and rainfall duration was 0.57 and that for the mean rainfall intensityand the rainfall duration was −0.61. In the case of the total rainfall depth and the mean rainfallintensity, Kendall’s τ was −0.18. As expected, a positive correlation was estimated between the totalrainfall depth and the rainfall duration and a negative correlation was estimated between the meanrainfall intensity and the rainfall duration. These three Kendall’s τ in this study were also found tobe statistically significant based on the p-value of the t-test. The p-value for total rainfall depth andrainfall duration was estimated to be lower than 0.001. It was 0.0020 for the mean rainfall intensityand the rainfall duration, and it was also lower than 0.001 for the total rainfall depth and the meanrainfall intensity.

Water 2019, 11, 905 9 of 18

Table 2 shows the parameter θ estimated using the relationship between the parameter θ and theKendall’s τ. In this table, NA represents the invalid estimate of the parameter θ because it is beyondthe valid range. Especially, the valid range of the parameter θ for the Clayton and Gumbel-Hougaardmodels is θ ≥ 1.

Table 2. The parameter θ estimated for each data pair with different copulas.

Copula Model Total Rainfall Depthand Rainfall Duration

Mean Rainfall Intensityand Rainfall Duration

Total Rainfall Depth andMean Rainfall Intensity

Clayton 2.65 NA NAFrank 7.17 −1.19 −1.02

Gumbel-Hougaard 2.32 NA NAGaussian 0.78 −0.82 −0.28

This study determined the PDF for each variable of the annual maximum rainfall event to conductthe BFA. As a result, the exponential distribution, the Gumbel distribution and the gamma distributionwere chosen for the rainfall duration, total rainfall depth and mean rainfall intensity, respectively;these are frequently used in the modeling of rainfall events [60–62]. Each probability distributionwas goodness-of-fit tested by the Kolmogorov-Smironov test computed via a Monte-Carlo procedure.The p-value for the rainfall duration with exponential distribution was calculated as 0.5216, and thatfor the total rainfall depth with Gumbel distribution was 0.7800. Finally, the p-value for the meanrainfall intensity with gamma distribution was 0.7023. The high p-values for these three probabilitydistributions indicate that they cannot be rejected at 0.05 significance level.

Before the optimal copula is determined, this study conducted the goodness-of-fit test using thefunction of “gofCopula()” provided by the R package “copula” [63]. The p-values for the three cases ofbivariate analysis could be calculated as Table 3. As can be seen in this table, the p-values of all copulasin this study were high enough to confirm the goodness-of-fit. That is, all copulas were found to bestatistically significant.

Table 3. P-values calculated for three bivariate data pairs of the annual maximum rainfall eventsobserved in Seoul, Korea.

Case Clayton Frank Gumbel-Hougaard Gaussian

Total rainfall depth andrainfall duration 0.4461 0.6009 0.5120 0.9306

Mean rainfall intensity andrainfall duration NA 0.3841 NA 0.6199

Total rainfall depth andmean rainfall intensity NA 0.3012 NA 0.3412

Next, this study estimated MSE and the AIC to select optimal copula for each bivariate data pair.Estimated MSE and AIC are summarized in Table 4. The optimal copula was determined as the copulawith the lowest MSE and AIC. Therefore, the optimal copula for the total rainfall depth and the rainfallduration was found to be Clayton copula. In the case of the mean rainfall intensity and the rainfallduration, the Gaussian copula was selected as the optimal copula. For the total rainfall depth and themean rainfall intensity, the Frank copula was chosen.

Figure 3 is the scatter plot indicating the cumulative probability calculated by the Gringortenplotting position formula and the optimal copula. In Figure 3, dotted lines represent the critical valueto examine the goodness-of-fit of copula. In this study, the critical value was calculated to be 0.192for n = 50 and 5% of the significance level. All the points from three bivariate data pairs found to bewithin the upper and lower limit of K-S test. Therefore, all the optimal copula in this study can beregarded as verified one under K-S test.

Water 2019, 11, 905 10 of 18

Table 4. The mean square errors (MSEs) and Akaike information criterions (AICs) calculated for threebivariate data pairs of the annual maximum rainfall events observed in Seoul, Korea.

Case Clayton Frank Gumbel-Hougaard Gaussian

MSETotal rainfall depth and rainfall duration 0.000352 0.000916 0.0216 0.000881

Mean rainfall intensity and rainfall duration NA 0.00449 NA 0.00161Total rainfall depth and mean rainfall intensity NA 0.00132 NA 0.00168

AICTotal rainfall depth and rainfall duration −156.206 −149.911 −81.323 −150.757

Mean rainfall intensity and rainfall duration NA −115.382 NA −137.620Total rainfall depth and mean rainfall intensity NA −141.971 NA −136.681

Water 2019, 11, 905 10 of 20

Table 4. The mean square errors (MSEs) and Akaike information criterions (AICs) calculated for

three bivariate data pairs of the annual maximum rainfall events observed in Seoul, Korea.

Case Clayton Frank Gumbel-

Hougaard Gaussian

MSE

Total rainfall depth and

rainfall duration 0.000352 0.000916 0.0216 0.000881

Mean rainfall intensity

and rainfall duration NA 0.00449 NA 0.00161

Total rainfall depth and

mean rainfall intensity NA 0.00132 NA 0.00168

AIC

Total rainfall depth and

rainfall duration −156.206 −149.911 −81.323 −150.757

Mean rainfall intensity

and rainfall duration NA −115.382 NA −137.620

Total rainfall depth and

mean rainfall intensity NA −141.971 NA −136.681

Figure 3 is the scatter plot indicating the cumulative probability calculated by the Gringorten

plotting position formula and the optimal copula. In Figure 3, dotted lines represent the critical

value to examine the goodness-of-fit of copula. In this study, the critical value was calculated to be

0.192 for n = 50 and 5% of the significance level. All the points from three bivariate data pairs found

to be within the upper and lower limit of K-S test. Therefore, all the optimal copula in this study can

be regarded as verified one under K-S test.

(a)

Water 2019, 11, 905 11 of 20

(b)

(c)

Figure 3. K-plot and Kolmogorov-Smirnov (K-S) test criteria of three optimal copula models for the

annual maximum rainfall events observed in Seoul, Korea. (a) Clayton copula model; (b) Gaussian

copula model; (c) Frank copula model.

The BFA was then conducted with the optimal copula model that was selected for each

bivariate data of the annual maximum rainfall events. As a result, the AND return period andT was

estimated for each rainfall event, which was then compared with the return period calculated by the

RIF (Equation (20)). In Figure 4, the return periods estimated using the BFA are presented by contour

lines and those calculated using the RIF by solid circles. In addition, for easier comparison, different

darkness was applied to fill the circle. For example, white color was applied to those rainfall events

whose return periods were calculated to be less than 10 years by the RIF (i.e., T < 10). The light grey

was applied to those rainfall events whose return periods were less than 30 years (i.e., 10 < T < 30)

and the dark grey was applied to those rainfall events whose return periods were less than 50 years

(i.e., 30 < T < 50). Finally, black color was applied to those rainfall event whose return period was

calculated to be longer than 50 years (i.e., T > 50).

Figure 3. K-plot and Kolmogorov-Smirnov (K-S) test criteria of three optimal copula models for theannual maximum rainfall events observed in Seoul, Korea. (a) Clayton copula model; (b) Gaussiancopula model; (c) Frank copula model.

Water 2019, 11, 905 11 of 18

The BFA was then conducted with the optimal copula model that was selected for each bivariatedata of the annual maximum rainfall events. As a result, the AND return period Tand was estimatedfor each rainfall event, which was then compared with the return period calculated by the RIF(Equation (20)). In Figure 4, the return periods estimated using the BFA are presented by contourlines and those calculated using the RIF by solid circles. In addition, for easier comparison, differentdarkness was applied to fill the circle. For example, white color was applied to those rainfall eventswhose return periods were calculated to be less than 10 years by the RIF (i.e., T < 10). The light greywas applied to those rainfall events whose return periods were less than 30 years (i.e., 10 < T < 30)and the dark grey was applied to those rainfall events whose return periods were less than 50 years(i.e., 30 < T < 50). Finally, black color was applied to those rainfall event whose return period wascalculated to be longer than 50 years (i.e., T > 50).Water 2019, 11, 905 12 of 20

(a)

(b)

(c)

Figure 4. Comparison of return periods estimated by the bivariate frequency analysis (BFA) and

those calculated by the rainfall intensity formula (RIF). (a) Total rainfall depth and rainfall duration;

(b) Mean rainfall intensity and rainfall duration; (c) Total rainfall depth and mean rainfall intensity.

Figure 4. Comparison of return periods estimated by the bivariate frequency analysis (BFA) and thosecalculated by the rainfall intensity formula (RIF). (a) Total rainfall depth and rainfall duration; (b) Meanrainfall intensity and rainfall duration; (c) Total rainfall depth and mean rainfall intensity.

Water 2019, 11, 905 12 of 18

Figure 4 shows the way that the return period of a rainfall event can be very different from thatwhich has been estimated from the BFA. As the definitions of the two return periods in this figure arenot identical, the results of frequency analysis can also be different [30,64]. Still, the authors expectedto find a certain extent of consistency between the return period calculated by the RIF and the returnperiod estimated by the BFA. However, this expectation was not fulfilled in the case where the totalrainfall depth and the rainfall duration were considered as the two variables, as can be seen Figure 4a.The return periods of the rainfall events with the same return period calculated by the RIF could betotally different. That is, there are so many cases where the return period estimated by BFA is totallydifferent from the return period calculated by the RIF. This finding indicates that the result of the BFAfor the total rainfall depth and the rainfall duration does not reflect the return period calculated bythe RIF.

The return period calculated by the BFA in Figure 4b,c is somewhat proportional to the returnperiod calculated by the RIF. This is more vivid in Figure 4c for which the total rainfall depth and themean rainfall intensity were considered. This trend is still valid, even though it is not strong, for thecase where the mean rainfall intensity and the rainfall duration were considered (Figure 4b). Obviously,when the rainfall duration is involved, the proportional trend between the return period calculated bythe RIF and the return period estimated by the BFA becomes weak.

4.2. Effect of Multicollinearity on the Estimated Return Periods

The authors assumed that the results of the frequency analysis (i.e., the return period) for the samerainfall event should be the same regardless of the selected two variables (out of three). For example,this study considered a rainfall event with its duration of 2 h and rainfall intensity of 50 mm/hour(i.e., the total rainfall depth is 100 mm in 2 h). Among three variables, the rainfall intensity, rainfallduration and total rainfall depth, one is redundant as it is calculated by the other two. Now, we canselect any two variables for the bivariate frequency analysis. Regardless of the two variables selected,it is expected to obtain a similar result of bivariate frequency analysis (i.e., the return period) for thegiven rainfall event.

To confirm this assumption, scatter plots were made to compare the return period calculated byRIF and the return period estimated by the BFA (Figure 5). In this figure, the x-axis represents thereturn period calculated by the RIF on a linear scale and the y-axis represents the resulting returnperiod estimated by the BFA on a logarithmic scale. The authors tested all the linear, semi-log andlog-log graphs and found that the semi-log was the best to search a possible relation between thetwo variables.

Water 2019, 11, 905 13 of 20

Figure 4 shows the way that the return period of a rainfall event can be very different from that

which has been estimated from the BFA. As the definitions of the two return periods in this figure

are not identical, the results of frequency analysis can also be different [30,64]. Still, the authors

expected to find a certain extent of consistency between the return period calculated by the RIF and

the return period estimated by the BFA. However, this expectation was not fulfilled in the case

where the total rainfall depth and the rainfall duration were considered as the two variables, as can

be seen Figure 4a. The return periods of the rainfall events with the same return period calculated by

the RIF could be totally different. That is, there are so many cases where the return period estimated

by BFA is totally different from the return period calculated by the RIF. This finding indicates that

the result of the BFA for the total rainfall depth and the rainfall duration does not reflect the return

period calculated by the RIF.

The return period calculated by the BFA in Figure 4b,c is somewhat proportional to the return

period calculated by the RIF. This is more vivid in Figure 4c for which the total rainfall depth and the

mean rainfall intensity were considered. This trend is still valid, even though it is not strong, for the

case where the mean rainfall intensity and the rainfall duration were considered (Figure 4b).

Obviously, when the rainfall duration is involved, the proportional trend between the return period

calculated by the RIF and the return period estimated by the BFA becomes weak.

4.2. Effect of Multicollinearity on the Estimated Return Periods

The authors assumed that the results of the frequency analysis (i.e., the return period) for the

same rainfall event should be the same regardless of the selected two variables (out of three). For

example, this study considered a rainfall event with its duration of 2 hours and rainfall intensity of

50 mm/hour (i.e., the total rainfall depth is 100 mm in 2 hours). Among three variables, the rainfall

intensity, rainfall duration and total rainfall depth, one is redundant as it is calculated by the other

two. Now, we can select any two variables for the bivariate frequency analysis. Regardless of the two

variables selected, it is expected to obtain a similar result of bivariate frequency analysis (i.e., the

return period) for the given rainfall event.

To confirm this assumption, scatter plots were made to compare the return period calculated by

RIF and the return period estimated by the BFA (Figure 5). In this figure, the x-axis represents the

return period calculated by the RIF on a linear scale and the y-axis represents the resulting return

period estimated by the BFA on a logarithmic scale. The authors tested all the linear, semi-log and

log-log graphs and found that the semi-log was the best to search a possible relation between the

two variables.

(a)

Figure 5. Cont.

Water 2019, 11, 905 13 of 18Water 2019, 11, 905 14 of 20

(b)

(c)

Figure 5. Scatter plots of return periods estimated by the BFA and those calculated by the RIF. (a)

Total rainfall depth and rainfall duration; (b) Mean rainfall intensity and rainfall duration; (c) Total

rainfall depth and mean rainfall intensity.

When the three scatter plots were compared, firstly it was found that the three results of the

BFA are not consistent at all. The overall result is very dependent on the two variables that are

considered in the frequency analysis. Even though the two variables were selected from the same

rainfall event data, the results of the frequency analysis are totally different from each other. A closer

look at this figure shows that, as can be seen in Figure 5a, the return periods that were estimated

from the analysis of the total rainfall depth and the rainfall duration do not show any obvious

increasing trends with respect to the return period. That is, the return period calculated by the RIF

did not affect the return period. A rather consistent increasing trend was, however, found in Figure

5b,c. Between the cases that are shown in Figure 5b,c, Figure 5c more effectively shows the way that

the return period estimated by the BFA is more similar with the return period calculated by the RIF

and that the range of the return period estimated by the BFA is also much smaller than that of Figure

5b.

The above findings are also summarized in Table 5 for a comparison. As shown in Table 5, the

means of the three cases are very different from one another. The mean for the case where the total

rainfall depth and the mean rainfall intensity were considered is the lowest at approximately 31.1

Figure 5. Scatter plots of return periods estimated by the BFA and those calculated by the RIF. (a) Totalrainfall depth and rainfall duration; (b) Mean rainfall intensity and rainfall duration; (c) Total rainfalldepth and mean rainfall intensity.

When the three scatter plots were compared, firstly it was found that the three results of the BFAare not consistent at all. The overall result is very dependent on the two variables that are consideredin the frequency analysis. Even though the two variables were selected from the same rainfall eventdata, the results of the frequency analysis are totally different from each other. A closer look at thisfigure shows that, as can be seen in Figure 5a, the return periods that were estimated from the analysisof the total rainfall depth and the rainfall duration do not show any obvious increasing trends withrespect to the return period. That is, the return period calculated by the RIF did not affect the returnperiod. A rather consistent increasing trend was, however, found in Figure 5b,c. Between the cases thatare shown in Figure 5b,c, Figure 5c more effectively shows the way that the return period estimated bythe BFA is more similar with the return period calculated by the RIF and that the range of the returnperiod estimated by the BFA is also much smaller than that of Figure 5b.

The above findings are also summarized in Table 5 for a comparison. As shown in Table 5,the means of the three cases are very different from one another. The mean for the case where the totalrainfall depth and the mean rainfall intensity were considered is the lowest at approximately 31.1 years,

Water 2019, 11, 905 14 of 18

but it is rather high at approximately 34.1 years for the case for which the total rainfall depth andthe rainfall duration were considered. The result is extreme for the case for which the mean rainfallintensity and the rainfall duration were considered, which shows that the mean return period is higherthan 100 years. Even in the case for which the total rainfall depth and the mean rainfall intensity wereconsidered, the range of the estimated return periods is rather absurd.

Table 5. The mean and range of the return period (years) for annual maximum rainfall events observedin Seoul, Korea.

Case Mean Range

Total rainfall depth and rainfall duration 34.1 1.1–1105.0Mean rainfall intensity and rainfall duration 118.2 2.1–1289.4

Total rainfall depth and mean rainfall intensity 31.1 1.3–629.6

In fact, above results could be expected by comparing the variance inflation factors estimated forthe three components that are considered in this study. The variance inflation factor of the rainfallduration was estimated as 4.87 and that of the total rainfall depth was estimated as 3.65. In the caseof the mean rainfall intensity, the variance inflation factor is 1.81. As a result, the rainfall duration ishighly vulnerable to the multicollinearity problem because its variance inflation factor is higher thanfour. It could also be concluded that the BFA for which the total rainfall depth and the mean rainfallintensity are used is more reasonable than the use of the bivariate data including the rainfall duration.In case of considering the rainfall duration, high uncertainty about the estimation of return periodcannot be avoided.

However, a note of caution is due here. In fact, several ties (i.e., repeated values) are observed forthe rainfall duration. As discussed in [65], on the one hand, an issue of model identifiability may arise(both considering the marginals and the copula at play), and on the other hand, the assessment of thereturn periods may be affected. Several randomization techniques have been outlined in [66,67] inorder to deal with ties, but in the present work we pass over the issue.

The explanation in the previous paragraph could also be confirmed by the regression analysisbased on Equation (9). First, the regression equation derived was as follows:

log T = −0.74 + 0.0031D + 0.0045R + 0.068I (21)

The significance of the above equation was found to be very high (p-value < 2× 10−16). However,the significance of those predictors varied. The p-values for I and R were found to be much smallerthan that of D. This result indicates that there is some problem in relating the return period with allthose three predictors.

The authors thus considered the multicollinearity issue to decrease the number of predictors.One variable, the rainfall duration, was excluded in this step as its variance inflation factor was veryhigh to be 4.87. In general, the variable with its variance inflation factor higher than 4 is excluded inthe multivariate regression analysis [43–45]. Now the regression equation with two predictors, thetotal rainfall depth R and the mean rainfall intensity I, was derived again.

log T = −0.68 + 0.0049R + 0.064I (22)

The significance for above Equation (22) was still very high (p-value < 2× 10−16). Furthermore,this equation did not contain any variable statistically insignificant. This result indicates thatthe multicollinearity issue was removed. Furthermore, these results indicate that it is importantin the multivariate frequency analysis of rainfall events to consider the multicollinearity issue.This multicollinearity among predictors can hinder the effective estimation of the return period of agiven rainfall event.

Water 2019, 11, 905 15 of 18

Additionally, the choice of the method to calculate return period has a significant influence on theoutcome [68]. For this reason, the OR case return period and the COND case return period were alsoestimated using total rainfall depth and mean rainfall intensity.

Figure 6 shows the estimated OR case return period and conditional return period with the returnperiod calculated by the RIF. In the case of OR case return period, there were many results overlappedby other results. Still, there was a consistency between the OR case return periods calculated by BFAand those calculated by the RIF. There was also a similar tendency of results for the case of conditionalreturn period. Therefore, the results of OR case return period and conditional return period found tosupport the result of this study derived by considering multicollinearity.

Water 2019, 11, 905 16 of 20

Additionally, the choice of the method to calculate return period has a significant influence on

the outcome [68]. For this reason, the OR case return period and the COND case return period were

also estimated using total rainfall depth and mean rainfall intensity.

Figure 6 shows the estimated OR case return period and conditional return period with the

return period calculated by the RIF. In the case of OR case return period, there were many results

overlapped by other results. Still, there was a consistency between the OR case return periods

calculated by BFA and those calculated by the RIF. There was also a similar tendency of results for

the case of conditional return period. Therefore, the results of OR case return period and conditional

return period found to support the result of this study derived by considering multicollinearity.

(a) (b)

Figure 6. Scatter plots of OR case and Conditional return periods estimated by the BFA and those

calculated by the RIF. (a) OR case return period; (b) Conditional return period.

In the study from Kroll and Song [69], the regression model results with/without

multicollinearity issue have been compared based on the prediction ability. The variables with the

high variance inflation factor have been screened for the regression model. As a result. it was

concluded that the negative effect of multicollinearity could be magnified without screening, as the

size of sample (or degree of freedom) decreased. In this study, the authors suggest selecting the

variables having small variation inflation factor for the bivariate frequency analysis. Actually, this

conclusion about excluding variables with the multicollinearity problem well agrees with the

conclusion from Kroll and Song [69].

5. Conclusions

A rainfall event simplified by a rectangular pulse is defined by three components: the rainfall

duration, the total rainfall depth, and mean rainfall intensity. However, as the mean rainfall

intensity can be calculated by the total rainfall depth divided by the rainfall duration, any two

components can fully define the rainfall event (i.e., one component must be redundant). In this

study, the selection issue of the two variables for the BFA was discussed. This issue is possibly

associated with the multicollinearity in multivariate regression analysis. That is, high

interdependency among predictors cause the increase of the variance of the regression coefficient

and finally lead to the low quality of resulting estimate. To verify this assumption, the annual

maximum rainfall events collected in Seoul, Korea was analyzed as an example. The BFA was

repeated three times with different pairs of the two variables among the three components of the

rainfall events, i.e., rainfall duration, total rainfall depth, and mean rainfall intensity. Finally, the

return periods of the rainfall events were estimated and compared with those calculated by the RIF.

Figure 6. Scatter plots of OR case and Conditional return periods estimated by the BFA and thosecalculated by the RIF. (a) OR case return period; (b) Conditional return period.

In the study from Kroll and Song [69], the regression model results with/without multicollinearityissue have been compared based on the prediction ability. The variables with the high variance inflationfactor have been screened for the regression model. As a result. it was concluded that the negativeeffect of multicollinearity could be magnified without screening, as the size of sample (or degree offreedom) decreased. In this study, the authors suggest selecting the variables having small variationinflation factor for the bivariate frequency analysis. Actually, this conclusion about excluding variableswith the multicollinearity problem well agrees with the conclusion from Kroll and Song [69].

5. Conclusions

A rainfall event simplified by a rectangular pulse is defined by three components: the rainfallduration, the total rainfall depth, and mean rainfall intensity. However, as the mean rainfall intensitycan be calculated by the total rainfall depth divided by the rainfall duration, any two components canfully define the rainfall event (i.e., one component must be redundant). In this study, the selectionissue of the two variables for the BFA was discussed. This issue is possibly associated with themulticollinearity in multivariate regression analysis. That is, high interdependency among predictorscause the increase of the variance of the regression coefficient and finally lead to the low quality ofresulting estimate. To verify this assumption, the annual maximum rainfall events collected in Seoul,Korea was analyzed as an example. The BFA was repeated three times with different pairs of thetwo variables among the three components of the rainfall events, i.e., rainfall duration, total rainfalldepth, and mean rainfall intensity. Finally, the return periods of the rainfall events were estimated andcompared with those calculated by the RIF.

As a result of the BFA of the annual maximum rainfall events, the return period of a rainfallevent was estimated very differently for each event among the three cases that are considered in

Water 2019, 11, 905 16 of 18

this study. For example, return period of the rainfall event observed in 2010 range from 6.6 years to295.0 years depending on the two variables selected. This dependency of the resulted return period onthe selected variable pair could be explained effectively using the multicollinearity among the variables.Among the three variables that are considered in this study, the rainfall duration caused the mostserious multicollinearity problem. As a result, the resulting return period of the BFA could be mostbiased when the rainfall duration was considered in the BFA. This result indicates that the result of theBFA is most realistic when the total rainfall depth and the mean rainfall intensity were considered.The return periods estimated by the BFA are very consistent with return periods calculated by the RIF.

In conclusion, the result of the BFA with the total rainfall depth and the mean rainfall intensitywas confirmed as the most reasonable case among the three cases for which different pairs of thetwo variables were considered. This result also completely agreed with the multicollinearity issueamong the correlated predictors. Similar results with this study are expected in other regions, unlessthe correlation structure among mean rainfall intensity, rainfall duration and the total rainfall depthis totally different. However, it is also true that the result can vary a bit region by region, which isobviously dependent upon the regional climate.

Author Contributions: C.Y. had done conceptualization and writing review and editing E.C. had done datacuration and visualization and writing original draft preparation.

Funding: This work is supported by the Korea Environmental Industry & Technology Institute (KEITI) grantfunded by the Ministry of Environment (Grant RE201902084) and a grant (19CTAP-C143641-02) from Infrastructureand transportation technology promotion research Program funded by Ministry of Land Infrastructure andTransport of Korean government.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. De Haan, L.; De Ronde, J. Sea and wind: Multivariate extremes at work. Extremes 1998, 1, 7. [CrossRef]2. Yue, S. Applicability of the Nagao–Kadoya bivariate exponential distribution for modeling two correlated

exponentially distributed variates. Stoch. Environ. Res. Risk Assess. 2001, 15, 244–260. [CrossRef]3. Yue, S.; Wang, C. A comparison of two bivariate extreme value distributions. Stoch. Environ. Res. Risk Assess.

2004, 18, 61–66. [CrossRef]4. Yue, S. Joint probability distribution of annual maximum storm peaks and amounts as represented by daily

rainfalls. Hydrol. Sci. J. 2000, 45, 315–326. [CrossRef]5. Herr, H.D.; Krzysztofowicz, R. Generic probability distribution of rainfall in space: The bivariate model.

J. Hydrol. 2005, 306, 234–263. [CrossRef]6. Nadarajah, S. A bivariate pareto model for drought. Stoch. Environ. Res. Risk Assess. 2009, 23, 811–822.

[CrossRef]7. Mirabbasi, R.; Fakheri-Fard, A.; Dinpashoh, Y. Bivariate drought frequency analysis using the copula method.

Theor. Appl. Climatol. 2012, 108, 191–206. [CrossRef]8. Yoo, J.; Kwon, H.-H.; Kim, T.-W.; Ahn, J.-H. Drought frequency analysis using cluster analysis and bivariate

probability distribution. J. Hydrol. 2012, 420, 102–111. [CrossRef]9. Zhang, L.; Singh, V. Bivariate flood frequency analysis using the copula method. J. Hydrol. Eng. 2006, 11,

150–164. [CrossRef]10. Grimaldi, S.; Serinaldi, F. Asymmetric copula in multivariate flood frequency analysis. Adv. Water Resour.

2006, 29, 1155–1167. [CrossRef]11. Chebana, F.; Ouarda, T.B. Depth-based multivariate descriptive statistics with hydrological applications.

J. Geophys. Res. Atmos. 2011, 116, D10120. [CrossRef]12. De Michele, C.; Salvadori, G.; Canossi, M.; Petaccia, A.; Rosso, R. Bivariate statistical approach to check

adequacy of dam spillway. J. Hydrol. Eng. 2005, 10, 50–57. [CrossRef]13. Park, C.-S.; Yoo, C.-S.; Jun, C.-H. Bivariate Rainfall Frequency Analysis and Rainfall-runoff Analysis for

Independent Rainfall Events. J. Korea Water Resour. Assoc. 2012, 45, 713–727. [CrossRef]14. Yoon, P.; Kim, T.-W.; Yoo, C. Rainfall frequency analysis using a mixed GEV distribution: A case study for

annual maximum rainfalls in South Korea. Stoch. Environ. Res. Risk Assess. 2013, 27, 1143–1153. [CrossRef]

Water 2019, 11, 905 17 of 18

15. De Michele, C.; Salvadori, G. A generalized Pareto intensity-duration model of storm rainfall exploiting2-copulas. J. Geophys. Res. 2003, 108. [CrossRef]

16. Bacchi, B.; Becciu, G.; Kottegoda, N.T. Bivariate exponential model applied to intensities and durations ofextreme rainfall. J. Hydrol. 1994, 155, 225–236. [CrossRef]

17. Yue, S. A bivariate gamma distribution for use in multivariate flood frequency analysis. Hydrol. Process.2001, 15, 1033–1045. [CrossRef]

18. Zhang, L.; Singh, V.P. Bivariate rainfall frequency distributions using Archimedean copulas. J. Hydrol. 2007,332, 93–109. [CrossRef]

19. Sklar, M. Fonctions de repartition an dimensions et leurs marges. Publ. Inst. Stat. Univ. Paris 1959, 8, 229–231.20. Fu, G.; Butler, D. Copula-based frequency analysis of overflow and flooding in urban drainage systems.

J. Hydrol. 2014, 510, 49–58. [CrossRef]21. Lee, D.R.; Jeong, S.M. A SpatialTemporal Characteristics of Rainfall in the Han River Basin. J. Korean Assoc.

Hydrol. Sci. 1992, 25, 75–85.22. Yoo, C.; Park, C. Comparison of Annual Maximum Rainfall Series and Annual Maximum Independent

Rainfall Event Series. J. Korea Water Resour. Assoc. 2012, 45, 431–444. [CrossRef]23. Rodriguez-Iturbe, I.; De Power, B.F.; Valdes, J. Rectangular pulses point process models for rainfall: Analysis

of empirical data. J. Geophys. Res. Atmos. 1987, 92, 9645–9656. [CrossRef]24. Onof, C.; Wheater, H.S. Modelling of British rainfall using a random parameter Bartlett-Lewis rectangular

pulse model. J. Hydrol. 1993, 149, 67–95. [CrossRef]25. Smithers, J.; Pegram, G.; Schulze, R. Design rainfall estimation in South Africa using Bartlett–Lewis

rectangular pulse rainfall models. J. Hydrol. 2002, 258, 83–99. [CrossRef]26. Yoo, C.; Kim, D.; Kim, T.-W.; Hwang, K.-N. Quantification of drought using a rectangular pulses Poisson

process model. J. Hydrol. 2008, 355, 34–48. [CrossRef]27. Burton, A.; Fowler, H.; Blenkinsop, S.; Kilsby, C. Downscaling transient climate change using a Neyman–Scott

Rectangular Pulses stochastic rainfall model. J. Hydrol. 2010, 381, 18–32. [CrossRef]28. Gräler, B.; van den Berg, M.; Vandenberghe, S.; Petroselli, A.; Grimaldi, S.; De Baets, B.; Verhoest, N.

Multivariate return periods in hydrology: A critical and practical review focusing on synthetic designhydrograph estimation. Hydrol. Earth Syst. Sci. 2013, 17, 1281–1296. [CrossRef]

29. Jain, A.; Vinnarasi, R.; Dhanya, C. Intensity-Duration-Frequency Relationship using Event Based Approach.J. Agroecol. Nat. Resour. Manag. 2015, 2, 178–180.

30. Chong, I.G.; Jun, C.-H. Performance of some variable selection methods when multicollinearity is present.Chemom. Intell. Lab. Syst. 2005, 78, 103–112. [CrossRef]

31. Silvey, S. Multicollinearity and imprecise estimation. J. R. Stat. Soc. Ser. B Methodol. 1969, 31, 539–552. [CrossRef]32. Farrar, D.E.; Glauber, R.R. Multicollinearity in regression analysis: The problem revisited. Rev. Econ. Stat.

1967, 49, 92–107. [CrossRef]33. Cortina, J.M. Interaction, nonlinearity, and multicollinearity: Implications for multiple regression. J. Manag.

1993, 19, 915–922. [CrossRef]34. Wheeler, D.; Tiefelsdorf, M. Multicollinearity and correlation among local regression coefficients in

geographically weighted regression. J. Geogr. Syst. 2005, 7, 161–187. [CrossRef]35. Harvey, A. Some comments on multicollinearity in regression. Appl. Stat. 1977, 26, 188–191. [CrossRef]36. Morrow-Howell, N. The M word: Multicollinearity in multiple regression. Soc. Work Res. 1994, 18, 247–251.

[CrossRef]37. Viswesvaran, C. Multiple Regression in Behavioral Research: Explanation and Prediction. Pers. Psychol.

1998, 51, 223.38. Allison, P.D. Multiple Regression: A Primer; Pine Forge Press: Newbury Park, CA, USA, 1999.39. Kutner, M.; Nachtsheim, C.; Neter, J.J. Applied Linear Regression Models; McGraw-Hill: New York, NY, USA, 2004.40. Robinson, C.; Schumacker, R.E. Interaction effects: Centering, variance inflation factor, and interpretation

issues. Mult. Linear Regres. Viewp. 2009, 35, 6–11.41. Demuth, S.; Steffensmeier, D. Ethnicity effects on sentence outcomes in large urban courts: Comparisons

among White, Black, and Hispanic defendants. Soc. Sci. Q. 2004, 85, 994–1011. [CrossRef]42. O’brien, R.M. A caution regarding rules of thumb for variance inflation factors. Qual. Quant. 2007, 41,

673–690. [CrossRef]

Water 2019, 11, 905 18 of 18

43. Gilliam, J.; Chatterjee, S.; Grable, J. Measuring the perception of financial risk tolerance: A tale of two measures.J. Financ. Counsel. Plan. 2010, 21, 1–14.

44. Lee, W.H.; Park, S.D.; Choi, S.Y. A derivation of the typical probable rainfall intensity formula in Korea.J. Korean Soc. Civ. Eng. 1993, 13, 115–120.

45. Nelsen, R.B. An Introduction to Copulas; Springer Science & Business Media: Berlin, Germany, 2007.46. Genest, C.; Favre, A.-C. Everything you always wanted to know about copula modeling but were afraid

to ask. J. Hydrol. Eng. 2007, 12, 347–368. [CrossRef]47. Salvadori, G.; De Michele, C.; Kottegoda, N.T.; Rosso, R. Extremes in Nature: An Approach Using Copulas;

Springer Science & Business Media: Berlin, Germany, 2007.48. Clayton, D.G. A model for association in bivariate life tables and its application in epidemiological studies of

familial tendency in chronic disease incidence. Biometrika 1978, 65, 141–151. [CrossRef]49. Frank, M.J. On the simultaneous associativity of F (x, y) and x+y− F (x, y). Aequ. Math. 1979, 19, 194–226.

[CrossRef]50. Hutchinson, T.P.; Lai, C.D. Continuous Bivariate Distributions Emphasising Applications; Rumsby Scientific

Publishing: Adelaide, Australia, 1990.51. Schmidt, R.; Stadtmüller, U. Non-parametric estimation of tail dependence. Scand. J. Stat. 2006, 33, 307–335.

[CrossRef]52. Brooks, C.; Burke, S.P. Information criteria for GARCH model selection. Eur. J. Financ. 2003, 9, 557–580.

[CrossRef]53. Juneja, V.K.; Melendres, M.V.; Huang, L.; Gumudavelli, V.; Subbiah, J.; Thippareddi, H. Modeling the effect

of temperature on growth of Salmonella in chicken. Food Microbiol. 2007, 24, 328–335. [CrossRef]54. Gringorten, I.I. A plotting rule for extreme probability paper. J. Geophys. Res. 1963, 68, 813–814. [CrossRef]55. Yevjevich, V. Probability and Statistics in Hydrology; Water Resources Publications, LLC: Highlands Ranch, CO,

USA, 1972.56. Salvadori, G.; De Michele, C. On the use of copulas in hydrology: Theory and practice. J. Hydrol. Eng. 2007,

12, 369–380. [CrossRef]57. Yoo, C.; Park, M.; Kim, H.J.; Jun, C. Comparison of annual maximum rainfall events of modern rain gauge data

(1961–2010) and Chukwooki data (1777–1910) in Seoul, Korea. J. Water Clim. Chang. 2018, 9, 58–73. [CrossRef]58. Park, M.; Yoo, C.; Kim, H.; Jun, C. Bivariate frequency analysis of annual maximum rainfall event series in

Seoul, Korea. J. Hydrol. Eng. 2013, 19, 1080–1088. [CrossRef]59. Bernard, M.M. Formulas for rainfall intensities of long duration. Trans. Am. Soc. Civ. Eng. 1932, 96, 592–606.60. Brutsaert, W. Hydrology: An Introduction; Cambridge University Press: Cambridge, UK, 2005.61. Gupta, S.K. Modern Hydrology and Sustainable Water Development; John Wiley & Sons: Hoboken, NJ, USA, 2011.62. Naghettini, M. Fundamentals of Statistical Hydrology; Springer: Berlin, Germany, 2017.63. Genest, C.; Rémillard, B.; Beaudoin, D. Goodness-of-fit tests for copulas: A review and a power study.

Insur. Math. Econ. 2009, 44, 199–213. [CrossRef]64. Salvadori, G.; Durante, F.; De Michele, C. On the return period and design in a multivariate framework.

Hydrol. Earth Syst. Sci. 2011, 15, 3293–3305. [CrossRef]65. Pappada, R.; Durante, F.; Salvadori, G. Quantification of the environmental structural risk with spoiling ties:

Is randomization worthwhile? Stoch. Environ. Res. Risk Assess. 2017, 31, 2483–2497. [CrossRef]66. De Michele, C.; Salvadori, G.; Vezzoli, R.; Pecora, S. Multivariate assessment of droughts: Frequency analysis

and Dynamic Return Period. Water Resour. Res. 2013, 49, 6985–6994. [CrossRef]67. Salvadori, G.; Tomasicchio, G.R.; D’Alessandro, F. Practical guidelines for multivariate analysis and design

in coastal and off-shore engineering. Coast. Eng. 2014, 88, 1–14. [CrossRef]68. Brunner, M.I.; Seibert, J.; Favre, A.C. Bivariate return periods and their importance for flood peak and volume

estimation. Wiley Interdiscip. Rev. Water 2016, 3, 819–833. [CrossRef]69. Kroll, C.N.; Song, P. Impact of multicollinearity on small sample hydrologic regression models. Water Resour. Res.

2013, 49, 3756–3769. [CrossRef]

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open accessarticle distributed under the terms and conditions of the Creative Commons Attribution(CC BY) license (http://creativecommons.org/licenses/by/4.0/).