
A Study of Fuzzy Linear Regression

Dr. Jann-Huei Jinn

Department of Statistics

Grand Valley State University

Allendale, Michigan, 49401

and

Dr. Chwan-Chin Song and Mr. J. C. Chao

Department of Applied Mathematics

National Cheng-Chi University

Taipei, Taiwan, R.O.C.

1. Introduction

We often use regression analysis to model the relationship between dependent

(response) and independent (explanatory) variables. In traditional regression analysis,

residuals are assumed to be due to random errors. Thus, statistical techniques are applied

to perform estimation and inference in regression analysis. However, the residuals are sometimes due to indefiniteness in the model structure or to imprecise observations. The uncertainty in this type of regression model is fuzziness rather than randomness. Since Zadeh (1965) proposed fuzzy sets, fuzziness has received increasing attention and fuzzy data analysis has become increasingly important.

To account for fuzziness in regression analysis, Tanaka et al. (1982) first proposed the fuzzy linear regression (FLR) model. They considered the parameter estimation of FLR models under two factors, namely the degree of fit and the vagueness of the model, and transformed the estimation problems into linear programming (LP) problems based on these two factors. Since Tanaka's approach does not provide a measure of best fit by residuals under fuzzy considerations, Diamond (1988) proposed the fuzzy least-squares approach, a fuzzy extension of ordinary least squares based on a newly defined distance on the space of fuzzy numbers. In general, fuzzy regression methods can be roughly divided into two categories: the first is based on Tanaka's LP approach, and the second on the fuzzy least-squares approach.

In Section 2, we introduce fuzzy numbers and their operations, a simple distance formula, two fuzzy linear regression models, and their least-squares estimates. In Section 3, we introduce LR-type fuzzy numbers, the nonsymmetrical doubly linear adaptive fuzzy regression model, Yang and Ko's distance formula, and least-squares estimates that depend on the membership functions. In Section 4, we apply traditional methods for detecting possible outliers and influential points to derive the leverage values, residuals, and Cook's distance formula for the fuzzy linear regression models.

In Section 5, we use the theoretical results of the previous sections to analyze Tanaka's (1987) data.

The derivations of some important formulas are given in the appendices.

2. Introduction to Fuzzy Linear Regression

2.1 Fuzzy Number and Its Operation

Fuzzy data is a natural type of data, like non-precise data or data with a source of

uncertainty not caused by randomness. This kind of data is easy to find in natural

language, social science, psychometrics, environments and econometrics, etc. Fuzzy

numbers have been used to represent fuzzy data. These are also used to model fuzziness

of data.

Let $\Re$ be the one-dimensional Euclidean space with norm denoted by $|\cdot|$. A fuzzy number is an upper semicontinuous convex function $F:\Re\to[0,1]$ with $\{x\in\Re : F(x)=1\}$ non-empty. In other words, a fuzzy number $A$ is defined as a convex normalized fuzzy set of the real line $\Re$ such that there exists exactly one $x_0\in\Re$ with $F(x_0)=1$, and its membership function $F(x)$ is piecewise continuous.

Definition 2.1 (Zimmermann [pp. 62-63])

Let $L$ (and $R$) be decreasing shape functions from $\Re^{+}$ to $[0,1]$ with $L(0)=1$; $L(x)<1$ for all $x>0$; $L(x)>0$ for all $x<1$; $L(1)=0$ (or $L(x)>0$ for all $x$ and $L(+\infty)=0$). Then a fuzzy number $A$ is called of LR-type if, for $m$, $\alpha>0$, $\beta>0$ in $\Re$,

$$A(x)=\begin{cases} L\left(\dfrac{m-x}{\alpha}\right), & x\le m,\\ R\left(\dfrac{x-m}{\beta}\right), & x\ge m,\end{cases}$$

where $m$ is called the center (mean or mode) value of $A$, and $\alpha$ and $\beta$ are called the left and right spreads, respectively. Symbolically, $A$ is denoted by $(m,\alpha,\beta)_{LR}$. If $\alpha=\beta$, then $A=(m,\alpha,\alpha)_{LR}$ is called a symmetrical fuzzy number, denoted by $A=(m,\alpha)_{LR}$. For instance, the membership function of the most commonly used LR fuzzy number, the triangular fuzzy number, is

$$A(x)=\begin{cases} 1-\dfrac{m-x}{\alpha}, & m-\alpha\le x\le m,\\ 1-\dfrac{x-m}{\beta}, & m\le x\le m+\beta.\end{cases}$$

As another example, the exponential fuzzy number has membership function

$$A(x)=\begin{cases} \exp\left(-\left(\dfrac{m-x}{s}\right)^{n}\right), & x\le m,\\ \exp\left(-\left(\dfrac{x-m}{s}\right)^{n}\right), & x\ge m,\end{cases}$$

where $s$ is the spread.

Definition 2.2 (Dubois (1980))

Let $A=(m_a,\alpha_a,\beta_a)_{LR}$ and $B=(m_b,\alpha_b,\beta_b)_{LR}$ be two LR-type fuzzy numbers. Then, by the extension principle, the following operations are defined:

1. $A+B=(m_a+m_b,\ \alpha_a+\alpha_b,\ \beta_a+\beta_b)_{LR}$

2. $\lambda A=\lambda(m_a,\alpha_a,\beta_a)_{LR}=(\lambda m_a,\ \lambda\alpha_a,\ \lambda\beta_a)_{LR}$ when $\lambda>0$

3. $\lambda A=\lambda(m_a,\alpha_a,\beta_a)_{LR}=(\lambda m_a,\ -\lambda\beta_a,\ -\lambda\alpha_a)_{RL}$ when $\lambda<0$

4. $-B=(-m_b,\ \beta_b,\ \alpha_b)_{RL}$

5. $A-B=(m_a,\alpha_a,\beta_a)_{LR}-(m_b,\alpha_b,\beta_b)_{LR}=(m_a-m_b,\ \alpha_a+\beta_b,\ \beta_a+\alpha_b)_{LR}$

Definition 2.3 (a Euclidean distance formula)

Let $A=(m_a,\alpha_a)_{LR}$ and $B=(m_b,\alpha_b)_{LR}$ be two symmetrical fuzzy numbers. The distance between $A$ and $B$ is defined as

$$D=\sqrt{(m_a-m_b)^2+(\alpha_a-\alpha_b)^2}\qquad(2.1)$$

Let $A=(m_a,\alpha_a,\beta_a)_{LR}$ and $B=(m_b,\alpha_b,\beta_b)_{LR}$ be two LR-type fuzzy numbers. The distance between $A$ and $B$ is defined as

$$D=\sqrt{w_m(m_a-m_b)^2+w_\alpha(\alpha_a-\alpha_b)^2+w_\beta(\beta_a-\beta_b)^2}\qquad(2.2)$$

where $w_m>0$, $w_\alpha>0$, and $w_\beta>0$ are arbitrary weights.
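As a small illustration of Definition 2.3, both distance formulas can be evaluated directly from the parameter tuples that identify each fuzzy number. The following Python sketch (the function names and the use of plain tuples are our own choices, not part of the original text) computes (2.1) and (2.2):

```python
import math

def euclidean_distance(A, B):
    """Distance (2.1) between symmetrical fuzzy numbers A = (m_a, alpha_a), B = (m_b, alpha_b)."""
    (ma, aa), (mb, ab) = A, B
    return math.sqrt((ma - mb) ** 2 + (aa - ab) ** 2)

def weighted_lr_distance(A, B, w_m=1.0, w_alpha=1.0, w_beta=1.0):
    """Weighted distance (2.2) between LR-type fuzzy numbers A = (m_a, alpha_a, beta_a)."""
    (ma, aa, ba), (mb, ab, bb) = A, B
    return math.sqrt(w_m * (ma - mb) ** 2 + w_alpha * (aa - ab) ** 2 + w_beta * (ba - bb) ** 2)

# Example: distance between the fuzzy values (96, 42) and (93.20, 44.62)
print(euclidean_distance((96.0, 42.0), (93.20, 44.62)))
```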

2.2 General Fuzzy Linear Regression Model

Consider the following general fuzzy linear regression model and call it Model I:

$$y_i = A_0 + A_1 x_{i1} + A_2 x_{i2} + \cdots + A_p x_{ip},\qquad i=1,2,\ldots,n\qquad(2.3)$$

where the $x_{ij}$ are real numbers, $y_i=[c_i-s_i,\ c_i+s_i]$ are fuzzy numbers with center (mean or mode) value $c_i$ and spread $s_i$, and $A_m=[a_m-r_m,\ a_m+r_m]$ are the fuzzy regression parameters, which have the same membership function as $y_i$. How should we estimate $A_m$ if the distance between two fuzzy numbers is undefined? We may treat $y_{Li}=c_i-s_i$ and $y_{Ri}=c_i+s_i$ as the left and right end points of the sample data, respectively. For the left-end data points $\{(y_{Li},x_{i1},x_{i2},\ldots,x_{ip}),\ i=1,2,\ldots,n\}$ (and similarly for the right-end data points $\{(y_{Ri},x_{i1},x_{i2},\ldots,x_{ip}),\ i=1,2,\ldots,n\}$) we may use the linear regression model $y=\beta_0+\beta_1x_1+\cdots+\beta_px_p$ to obtain the following estimates, respectively:

$$\hat y_{Li}=\hat L_0+\hat L_1x_{i1}+\hat L_2x_{i2}+\cdots+\hat L_px_{ip},\qquad i=1,2,\ldots,n$$
$$\hat y_{Ri}=\hat R_0+\hat R_1x_{i1}+\hat R_2x_{i2}+\cdots+\hat R_px_{ip},\qquad i=1,2,\ldots,n$$

Then $\hat A_m=[\hat a_m-\hat r_m,\ \hat a_m+\hat r_m]$, where $\hat a_m=\dfrac{\hat L_m+\hat R_m}{2}$ and $\hat r_m=\dfrac{\hat R_m-\hat L_m}{2}$.
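For concreteness, a minimal sketch of this endpoint approach is given below, assuming numpy is available, that X already includes a column of ones, and that c and s are arrays of observed centers and spreads; the function name endpoint_fit is ours, not part of the original text.

```python
import numpy as np

def endpoint_fit(X, c, s):
    """Fit ordinary least squares to the left (c - s) and right (c + s) endpoints
    and recombine the coefficient vectors into centers a_hat and spreads r_hat."""
    yL, yR = c - s, c + s
    L_hat, *_ = np.linalg.lstsq(X, yL, rcond=None)   # left-endpoint coefficients
    R_hat, *_ = np.linalg.lstsq(X, yR, rcond=None)   # right-endpoint coefficients
    a_hat = (L_hat + R_hat) / 2.0                    # centers of the fuzzy coefficients
    r_hat = (R_hat - L_hat) / 2.0                    # spreads of the fuzzy coefficients
    return a_hat, r_hat
```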

Estimating the regression parameters $A_m$ in this way does not exploit the membership function's description of the data; the fuzzy concept is not used at all in the estimation of the parameters. In order to obtain more appropriate estimates of the fuzzy regression parameters $A_m$, the least-squares method and the distance between two fuzzy numbers should be considered.

Based on Definition 2.3, we can use the ordinary least-squares method to estimate the fuzzy parameters in the general fuzzy linear regression model ((2.3), Model I). Assuming that $y_i=(c_i,s_i)$ and $A_m=(a_m,r_m)$ have the same membership function, after an appropriate translation we can make all $x_{ij}>0$. Then (2.3) can be expressed as

$$(c_i,s_i)=(a_0,r_0)+(a_1,r_1)x_{i1}+(a_2,r_2)x_{i2}+\cdots+(a_p,r_p)x_{ip}.$$

According to the Euclidean distance formula (2.1), the least-squares estimates of $a_i$ and $r_i$ are the values of $a_i$, $r_i$ which minimize $D^2$, where

$$D^2=\sum_{i=1}^{n}\Big[\big(c_i-(a_0+a_1x_{i1}+\cdots+a_px_{ip})\big)^2+\big(s_i-(r_0+r_1x_{i1}+\cdots+r_px_{ip})\big)^2\Big].$$

Let $\lVert v\rVert$ denote the length of a vector $v$; then, using vector and matrix expressions, $D^2$ can be rewritten as $D^2=\lVert Xa-C\rVert^2+\lVert Xr-S\rVert^2$, where $X$ is an $n\times(p+1)$ design matrix, $a=(a_0,a_1,\ldots,a_p)'$, $r=(r_0,r_1,\ldots,r_p)'$, $C=(c_1,c_2,\ldots,c_n)'$, and $S=(s_1,s_2,\ldots,s_n)'$. Setting $\partial D^2/\partial a=0$ and $\partial D^2/\partial r=0$, the solutions of $a$ and $r$ which minimize $D^2$ are

$$\hat a=(X'X)^{-1}X'C,\qquad \hat r=(X'X)^{-1}X'S.\qquad(2.4)$$

This method regresses the centers and the spreads separately, and the resulting estimates are not related to the membership functions. However, in the real data analyses presented later, this method provided better results in the estimation of the fuzzy parameter values.
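In matrix form, (2.4) is simply two ordinary least-squares fits sharing one design matrix. The following short sketch assumes numpy and uses Tanaka's data from Table 1 (the variable names are ours); it should reproduce, up to rounding, the estimates $\hat a$ and $\hat r$ reported in Table 2.

```python
import numpy as np

def model_I_fit(X, c, s):
    """Least-squares estimates (2.4): regress centers and spreads on the same design matrix."""
    Z = np.linalg.solve(X.T @ X, X.T)   # (X'X)^{-1} X'
    return Z @ c, Z @ s                 # a_hat, r_hat

# Tanaka's (1987) data (Table 1): intercept column plus three predictors
X = np.column_stack([np.ones(10),
                     [3, 14, 7, 11, 7, 8, 3, 12, 10, 9],
                     [5, 8, 1, 7, 12, 15, 9, 15, 5, 7],
                     [9, 3, 4, 3, 15, 10, 6, 11, 8, 4]])
c = np.array([96, 120, 52, 106, 189, 194, 107, 216, 108, 103], dtype=float)
s = np.array([42, 47, 33, 45, 79, 65, 42, 78, 52, 44], dtype=float)
a_hat, r_hat = model_I_fit(X, c, s)
```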

2.3 Symmetrical Doubly Linear Adaptive Fuzzy Regression Model

Under the structure of Model I, if we use the Euclidean distance formula and the least-squares method to fit linear regressions to the centers and to the spreads separately, the resulting estimates treat the centers and spreads as unrelated. However, D'Urso and Gastaldi (2000) argue that the dynamics of the spreads depend to some extent on the magnitude of the (estimated) centers. They therefore proposed the doubly linear adaptive fuzzy regression model (call it Model II) to obtain the parameter estimates.

They considered symmetrical fuzzy numbers with triangular membership functions, where a fuzzy number $y_i=(c_i,s_i)$ is completely identified by the two parameters $c$ (center) and $s$ (left and right spread). Model II is defined as follows:

$$C=C^{*}+\varepsilon_c,\qquad C^{*}=Xa\qquad(2.5)$$
$$S=S^{*}+\varepsilon_s,\qquad S^{*}=C^{*}b+\mathbf{1}d\qquad(2.6)$$

where $X$ is an $n\times(p+1)$ matrix containing the input variables (data matrix); $a=(a_0,a_1,\ldots,a_p)'$ is a column vector containing the regression parameters of the first model (referred to as the core regression model); $C=(c_1,c_2,\ldots,c_n)'$ and $C^{*}=Xa$ are the vector of the observed centers and the vector of the interpolated centers, respectively, both of dimension $n\times1$; $S=(s_1,s_2,\ldots,s_n)'$ and $S^{*}$ are the vector of the assigned spreads and the vector of the interpolated spreads, respectively, both of dimension $n\times1$; $\mathbf{1}$ is an $n\times1$ vector of all 1's; and $b$ and $d$ are the regression parameters of the second regression model (referred to as the spread regression model).

The above model is thus based on two linear models: the first interpolates the centers of the fuzzy observations, and the second yields the spreads by building another linear model on top of the first. Observe that the predictor variables $X$ enter Eq. (2.6) through the interpolated centers. The model is hence capable of taking into account possible linear relations between the size of the spreads and the magnitude of the estimated centers. This is often the case in real-world applications, where dependence between centers and spreads is likely (for instance, the uncertainty or fuzziness associated with a measurement could depend on its magnitude).

D'Urso and Gastaldi used the Euclidean distance formula (2.1) and the least-squares method to obtain the estimates of $a$, $b$, and $d$ such that the value of $D^2$ is minimized, where

$$D^2=\lVert C-C^{*}\rVert^2+\lVert S-S^{*}\rVert^2$$
$$=C'C-2a'X'C+(1+b^2)\,a'X'Xa+S'S-2b\,a'X'S-2d\,\mathbf{1}'S+2bd\,a'X'\mathbf{1}+nd^2.$$

Setting $\partial D^2/\partial a=0$, $\partial D^2/\partial b=0$, and $\partial D^2/\partial d=0$, they obtained the following equations:

$$\frac{\partial D^2}{\partial a}=0=-X'C+(1+b^2)X'Xa-b\,X'S+bd\,X'\mathbf{1}$$
$$\frac{\partial D^2}{\partial b}=0=a'X'Xa\,b-a'X'S+d\,a'X'\mathbf{1}$$
$$\frac{\partial D^2}{\partial d}=0=-\mathbf{1}'S+b\,a'X'\mathbf{1}+nd\qquad(2.7)$$

Based on the equations in (2.7), they obtained the following least-squares iterative solutions of $a$, $b$, and $d$:

$$a=\frac{1}{1+b^2}(X'X)^{-1}X'\big(C+b(S-\mathbf{1}d)\big)$$
$$b=(a'X'Xa)^{-1}\big(a'X'S-a'X'\mathbf{1}\,d\big)$$
$$d=\frac{1}{n}\big(\mathbf{1}'S-\mathbf{1}'Xa\,b\big)\qquad(2.8)$$

The recursive solutions of $a$, $b$, and $d$ are derived as follows. From the first equation of (2.7) we can easily obtain $a=\frac{1}{1+b^2}(X'X)^{-1}X'\big(C+b(S-\mathbf{1}d)\big)$; substituting it into the second and third equations of (2.7), we obtain

$$\big(C'\hat S-nd\bar C\big)b^2+\big(C'\hat C-S'\hat S+2nd\bar S-nd^2\big)b+\big(nd\bar C-C'\hat S\big)=0\qquad(2.9)$$

and

$$b\bar C-\bar S+d=0\qquad(2.10)$$

where $\hat C=X(X'X)^{-1}X'C$, $\hat S=X(X'X)^{-1}X'S$, $\bar C=\frac{1}{n}\mathbf{1}'C$, and $\bar S=\frac{1}{n}\mathbf{1}'S$.

From (2.10) we obtain $d=\bar S-b\bar C$; substituting this back into (2.9), we obtain a simplified quadratic equation in $b$, $M_1b^2+M_2b+M_3=0$, where

$$M_1=C'\hat S-n\bar C\bar S,\qquad M_2=C'\hat C-n\bar C^2-S'\hat S+n\bar S^2,\qquad M_3=n\bar C\bar S-C'\hat S.$$

By solving the quadratic equation in $b$, we obtain

$$\hat b=\frac{-M_2\pm\sqrt{M_2^2-4M_1M_3}}{2M_1},$$

and the corresponding solutions

$$\hat d=\bar S-\hat b\bar C,\qquad \hat a=\frac{1}{1+\hat b^2}(X'X)^{-1}X'\big(C+\hat b(S-\mathbf{1}\hat d)\big).$$

The least-squares estimates are obtained by substituting these two candidate sets of $a$, $b$, and $d$ into $D^2$ and keeping the set for which $D^2$ is smaller. Based on the equations for $a$, $b$, and $d$, we can conclude that, whatever the membership function of the response fuzzy number $y_i=(c_i,s_i)$, the estimates of the parameters are the same. Therefore, these least-squares estimates do not take other possible shapes of fuzzy numbers into account.
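The closed-form solution above is straightforward to code. The sketch below, assuming numpy and using our own function and variable names, computes $M_1$, $M_2$, $M_3$, tries both roots of the quadratic, and keeps the root with the smaller $D^2$, as described in the text.

```python
import numpy as np

def model_II_fit(X, c, s):
    """Least-squares estimates of Model II (2.5)-(2.6) under the Euclidean distance (2.1)."""
    n = len(c)
    H = X @ np.linalg.solve(X.T @ X, X.T)          # hat matrix
    c_hat, s_hat = H @ c, H @ s
    c_bar, s_bar = c.mean(), s.mean()
    M1 = c @ s_hat - n * c_bar * s_bar
    M2 = c @ c_hat - n * c_bar**2 - s @ s_hat + n * s_bar**2
    M3 = n * c_bar * s_bar - c @ s_hat
    best = None
    for b in np.roots([M1, M2, M3]):               # the two candidate values of b
        b = float(np.real(b))
        d = s_bar - b * c_bar
        a = np.linalg.solve(X.T @ X, X.T @ (c + b * (s - d))) / (1.0 + b**2)
        D2 = np.sum((c - X @ a) ** 2) + np.sum((s - b * (X @ a) - d) ** 2)
        if best is None or D2 < best[0]:
            best = (D2, a, b, d)
    return best[1:]                                # (a_hat, b_hat, d_hat)
```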

3. LR-Type Fuzzy Linear Regression

3.1 Nonsymmetrical Doubly Linear Adaptive Fuzzy Regression Model

When we have numerical (crisp) explanatory variables $X_j$ ($j=1,2,\ldots,k$) and an LR fuzzy dependent variable $Y\equiv(c,p,q)$ (where $c$ is the center and $p$ and $q$ are, respectively, the left and right spreads), a model capable of incorporating the possible influence of the magnitude of the centers on the spreads can be considered (D'Urso and Gastaldi, 2000, 2001, 2002). If the fuzzy response numbers $y_i=[c_i-p_i,\ c_i+q_i]$ are nonsymmetrical with triangular membership functions, D'Urso (2003) proposed a fuzzy regression model (call it Model III), expressed in matrix form as:

$$C=C^{*}+\varepsilon,\qquad C^{*}=Xa\qquad(3.1)$$
$$P=P^{*}+\lambda,\qquad P^{*}=C^{*}b+\mathbf{1}d\qquad(3.2)$$
$$q=q^{*}+\rho,\qquad q^{*}=C^{*}g+\mathbf{1}h\qquad(3.3)$$

where $X$ is an $n\times(k+1)$ matrix containing the vector $\mathbf{1}$ concatenated with the $k$ crisp input variables; $C$, $C^{*}$ are $n\times1$ vectors of the observed centers and interpolated centers, respectively; $P$, $P^{*}$ are $n\times1$ vectors of the observed left spreads and interpolated left spreads, respectively; $q$, $q^{*}$ are $n\times1$ vectors of the observed right spreads and interpolated right spreads, respectively; $a$ is a $(k+1)\times1$ vector of regression parameters for the regression model for $C$; $b,d,g,h$ are regression parameters for the regression models for $P$ and $q$; $\mathbf{1}$ is an $n\times1$ vector of all 1's; and $\varepsilon,\lambda,\rho$ are $n\times1$ vectors of residuals.

This model is based on three sub-models. The first one interpolates the centers of the

fuzzy data, the other two sub-models are built over the first one and yield the spreads.

This formulation allows the model to consider possible relations between the size of the

spreads and the magnitude of the estimated centers, as is often necessary in real case

studies. Model III can be called a nonsymmetrical doubly linear adaptive fuzzy regression

model.

D'Urso used the weighted distance formula (2.2) and the least-squares method to obtain the estimates of $a$ and $b,d,g,h$ such that the value of $D^2$ is minimized, where

$$D^2=\pi_c\lVert C-C^{*}\rVert^2+\pi_p\lVert P-P^{*}\rVert^2+\pi_q\lVert q-q^{*}\rVert^2$$
$$=\pi_c\big(C'C-2a'X'C+a'X'Xa\big)+\pi_p\big(P'P-2b\,a'X'P-2d\,\mathbf{1}'P+b^2a'X'Xa+2bd\,a'X'\mathbf{1}+nd^2\big)$$
$$\quad+\pi_q\big(q'q-2g\,a'X'q-2h\,\mathbf{1}'q+g^2a'X'Xa+2gh\,a'X'\mathbf{1}+nh^2\big)\qquad(3.4)$$

and $\pi_c$, $\pi_p$, $\pi_q$ are arbitrary positive weights.

Recursive solutions to the above system are found by setting the partial derivatives with respect to the parameters $a$ and $b,d,g,h$ equal to zero:

$$a=\frac{1}{\pi_c+\pi_pb^2+\pi_qg^2}(X'X)^{-1}X'\big[\pi_cC+\pi_pb(P-\mathbf{1}d)+\pi_qg(q-\mathbf{1}h)\big]$$
$$b=(a'X'Xa)^{-1}\big(a'X'P-a'X'\mathbf{1}\,d\big)$$
$$d=\frac{1}{n}\big(\mathbf{1}'P-\mathbf{1}'Xa\,b\big)$$
$$g=(a'X'Xa)^{-1}\big(a'X'q-a'X'\mathbf{1}\,h\big)$$
$$h=\frac{1}{n}\big(\mathbf{1}'q-\mathbf{1}'Xa\,g\big)\qquad(3.5)$$

where $\hat a,\hat b,\hat d,\hat g,\hat h$ denote the iterative least-squares estimates obtained at the end of the iterative process. The optimization procedure does not guarantee attainment of the global minimum, only of a local one. For this reason, it is suggested to initialize the iterative algorithm from several different starting points in order to check the stability of the solution. Based on the equations for $\hat a,\hat b,\hat d,\hat g,\hat h$, we can conclude that these estimates of the parameters are not related to the membership function of the response fuzzy number.
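A rough sketch of this iterative scheme follows, assuming numpy; the function name, the fixed number of sweeps, and the simple initialization are our own choices and, as noted above, several starting points should be tried in practice because only a local minimum is guaranteed.

```python
import numpy as np

def model_III_fit(X, c, p, q, pi_c=1.0, pi_p=1.0, pi_q=1.0, n_iter=200):
    """Iterative least-squares updates (3.5) for Model III under the weighted distance (2.2)."""
    n = len(c)
    XtX = X.T @ X
    a = np.linalg.solve(XtX, X.T @ c)              # start from the center-only fit
    b = d = g = h = 0.0
    for _ in range(n_iter):
        rhs = pi_c * c + pi_p * b * (p - d) + pi_q * g * (q - h)
        a = np.linalg.solve(XtX, X.T @ rhs) / (pi_c + pi_p * b**2 + pi_q * g**2)
        Ca = X @ a                                 # interpolated centers C* = Xa
        denom = Ca @ Ca                            # a'X'Xa
        b = (Ca @ p - d * Ca.sum()) / denom
        d = (p.sum() - b * Ca.sum()) / n
        g = (Ca @ q - h * Ca.sum()) / denom
        h = (q.sum() - g * Ca.sum()) / n
    return a, b, d, g, h
```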

3.2 Yang and Ko’s Distance Formula

Under the structure of Models I, II, and III with the Euclidean distance, none of the least-squares estimates are able to reflect the possible effect of the membership function of the fuzzy response numbers. In this section, we adopt Yang and Ko's (1996) distance formula in order to find least-squares estimates that are related to the membership function of the fuzzy response numbers.

Definition 3.1 (Yang and Ko's distance formula (1996))

Let $F_{LR}(\Re)$ denote the set of all LR-type fuzzy numbers. Define a new type of distance for any $A=(m_a,\alpha_a,\beta_a)_{LR}$, $B=(m_b,\alpha_b,\beta_b)_{LR}$ in $F_{LR}(\Re)$ as follows:

$$d_{LR}^2(A,B)=(m_a-m_b)^2+\big((m_a-l\alpha_a)-(m_b-l\alpha_b)\big)^2+\big((m_a+r\beta_a)-(m_b+r\beta_b)\big)^2\qquad(3.6)$$

where $l=\int_0^1L^{-1}(\omega)\,d\omega$ and $r=\int_0^1R^{-1}(\omega)\,d\omega$.

Yang and Ko (1996) also proved that $(F_{LR}(\Re),d_{LR})$ is a complete metric space. If $A$ and $B$ are symmetrical LR-type fuzzy numbers, then $l=r$ and

$$d_{LR}^2(A,B)=3(m_a-m_b)^2+2l^2(\alpha_a-\alpha_b)^2.$$

If $A$ and $B$ are symmetrical triangular fuzzy numbers, then $l=\int_0^1L^{-1}(\omega)\,d\omega=\int_0^1(1-x)\,dx=\frac{1}{2}$. If $A$ and $B$ are exponential-type fuzzy numbers, then $l=\int_0^1L^{-1}(\omega)\,d\omega=\int_0^1(-\ln x)^{1/m}\,dx=\Gamma\!\left(1+\frac{1}{m}\right)$. Compared with the distance formulas (2.1) and (2.2), the distance formula (3.6) avoids the subjective choice of the weights ($w_m>0$, $w_\alpha>0$, and $w_\beta>0$).
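For reference, the value of $l$ used throughout the rest of the paper can be computed directly; the following small Python sketch (function name ours) covers the two membership shapes mentioned above.

```python
from math import gamma

def l_value(shape, m=2.0):
    """l = integral over [0,1] of L^{-1}(w) dw: 1/2 for triangular memberships,
    Gamma(1 + 1/m) for exponential memberships with exponent m."""
    if shape == "triangular":
        return 0.5
    if shape == "exponential":
        return gamma(1.0 + 1.0 / m)
    raise ValueError("unknown membership shape")

print(l_value("exponential", m=2.0))   # about 0.886
```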

3.3 The Least Squares Estimates (Based on Yang and Ko’s Distance)

In this section, we consider LR-type response fuzzy numbers and use the distance formula (3.6) to find the least-squares estimates of the regression parameters. Under the structure of Model I, if we have symmetrical LR-type fuzzy response numbers $y_i=(c_i,s_i)_{LR}$, then $l=r$ in (3.6). The sum of squared errors $D^2$ can be expressed in vector form as

$$D^2=\lVert Xa-C\rVert^2+\lVert(Xa-lXr)-(C-lS)\rVert^2+\lVert(Xa+lXr)-(C+lS)\rVert^2$$
$$=3a'X'Xa-6a'X'C+3C'C+2l^2r'X'Xr-4l^2r'X'S+2l^2S'S.$$

Setting $\partial D^2/\partial a=0$ and $\partial D^2/\partial r=0$ gives

$$\frac{\partial D^2}{\partial a}=0=6X'Xa-6X'C,\qquad\frac{\partial D^2}{\partial r}=0=4l^2X'Xr-4l^2X'S,$$

and the solutions of $a$ and $r$ which minimize $D^2$ are

$$\hat a=(X'X)^{-1}X'C,\qquad\hat r=(X'X)^{-1}X'S.\qquad(3.7)$$

Therefore, under the structure of Model I, whether we use the distance formula (2.1) or (3.6) we obtain the same least-squares estimates, and they are not related to the membership functions.

Next, let us consider Model II (D'Urso and Gastaldi (2000), the doubly linear adaptive fuzzy regression model). The sum of squared errors $D^2$ can be expressed in vector form as

$$D^2=\lVert Xa-C\rVert^2+\lVert\big(Xa-l(Xab+\mathbf{1}d)\big)-(C-lS)\rVert^2+\lVert\big(Xa+l(Xab+\mathbf{1}d)\big)-(C+lS)\rVert^2$$
$$=3a'X'Xa-6a'X'C+3C'C+2l^2b^2a'X'Xa+4l^2bd\,a'X'\mathbf{1}-4l^2b\,a'X'S+2l^2nd^2-4l^2nd\bar S+2l^2S'S.$$

Setting $\partial D^2/\partial a=0$, $\partial D^2/\partial b=0$, and $\partial D^2/\partial d=0$, after lengthy, tedious, and complicated calculations (see Appendix A.1) we obtain the following least-squares estimates of $a$, $b$, and $d$:

$$\hat b=\frac{-K_2\pm\sqrt{K_2^2-4K_1K_3}}{2K_1},\qquad \hat d=\bar S-\hat b\bar C,$$
$$\hat a=\frac{1}{3+2l^2\hat b^2}(X'X)^{-1}X'\big(3C+2l^2\hat b(S-\mathbf{1}\hat d)\big)\qquad(3.8)$$

where $K_1=2l^2(C'\hat S-n\bar C\bar S)$, $K_2=3(C'\hat C-n\bar C^2)-2l^2(S'\hat S-n\bar S^2)$, $K_3=3(n\bar C\bar S-C'\hat S)$, with $\hat C=X(X'X)^{-1}X'C$, $\hat S=X(X'X)^{-1}X'S$, $\bar C=\frac{1}{n}\mathbf{1}'C$, and $\bar S=\frac{1}{n}\mathbf{1}'S$.

The least-squares estimates are obtained by substituting the two candidate sets of $a$, $b$, and $d$ into $D^2$ and keeping the set for which $D^2$ is smaller. Based on the equations for $a$, $b$, and $d$, we can conclude that these least-squares estimates do depend on the membership function of the response fuzzy number $y_i=(c_i,s_i)_{LR}$.
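A sketch of (3.8) under our own naming conventions is given below; it mirrors the Euclidean Model II solver but weights the centers by 3 and the spreads by $2l^2$, and again both roots of the quadratic are compared on $D^2$.

```python
import numpy as np

def model_II_yang_ko_fit(X, c, s, l):
    """Least-squares estimates (3.8) of Model II under Yang and Ko's distance,
    for a given l (1/2 for triangular, Gamma(1 + 1/m) for exponential memberships)."""
    n = len(c)
    H = X @ np.linalg.solve(X.T @ X, X.T)
    c_hat, s_hat = H @ c, H @ s
    c_bar, s_bar = c.mean(), s.mean()
    K1 = 2 * l**2 * (c @ s_hat - n * c_bar * s_bar)
    K2 = 3 * (c @ c_hat - n * c_bar**2) - 2 * l**2 * (s @ s_hat - n * s_bar**2)
    K3 = 3 * (n * c_bar * s_bar - c @ s_hat)
    best = None
    for b in np.roots([K1, K2, K3]):
        b = float(np.real(b))
        d = s_bar - b * c_bar
        a = np.linalg.solve(X.T @ X, X.T @ (3 * c + 2 * l**2 * b * (s - d))) / (3 + 2 * l**2 * b**2)
        fit_c, fit_s = X @ a, b * (X @ a) + d
        D2 = 3 * np.sum((fit_c - c) ** 2) + 2 * l**2 * np.sum((fit_s - s) ** 2)
        if best is None or D2 < best[0]:
            best = (D2, a, b, d)
    return best[1:]
```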

Under the structure of Model III (D'Urso (2003)), and considering nonsymmetrical LR-type response fuzzy numbers, the sum of squared errors $D^2$ can be expressed in vector form as

$$D^2=\lVert Xa-C\rVert^2+\lVert\big(Xa-l(Xab+\mathbf{1}d)\big)-(C-lP)\rVert^2+\lVert\big(Xa+r(Xag+\mathbf{1}h)\big)-(C+rq)\rVert^2$$
$$=3a'X'Xa-6a'X'C+3C'C$$
$$\quad-2l\big(b\,a'X'Xa+d\,a'X'\mathbf{1}-a'X'P-b\,a'X'C-nd\bar C+C'P\big)$$
$$\quad+2r\big(g\,a'X'Xa+h\,a'X'\mathbf{1}-a'X'q-g\,a'X'C-nh\bar C+C'q\big)$$
$$\quad+l^2\big(b^2a'X'Xa+2bd\,a'X'\mathbf{1}-2b\,a'X'P+nd^2-2nd\bar P+P'P\big)$$
$$\quad+r^2\big(g^2a'X'Xa+2gh\,a'X'\mathbf{1}-2g\,a'X'q+nh^2-2nh\bar q+q'q\big)$$

where $\bar C=\frac{1}{n}\mathbf{1}'C$, $\bar P=\frac{1}{n}\mathbf{1}'P$, and $\bar q=\frac{1}{n}\mathbf{1}'q$. Setting $\partial D^2/\partial a=0$, $\partial D^2/\partial b=0$, $\partial D^2/\partial d=0$, $\partial D^2/\partial g=0$, and $\partial D^2/\partial h=0$, after lengthy, tedious, and complicated calculations we obtain the following equations:

$$(6-4lb+4rg+2l^2b^2+2r^2g^2)X'Xa-6X'C+2lb\,X'C-2rg\,X'C+2l\,X'P-2l^2b\,X'P-2r\,X'q-2r^2g\,X'q-2ld\,X'\mathbf{1}+2l^2bd\,X'\mathbf{1}+2rh\,X'\mathbf{1}+2r^2gh\,X'\mathbf{1}=0$$
$$2l\,a'X'C-2l^2a'X'P-2l\,a'X'Xa+2l^2b\,a'X'Xa+2l^2d\,a'X'\mathbf{1}=0$$
$$2l\,\mathbf{1}'C-2l^2\mathbf{1}'P-2l\,\mathbf{1}'Xa+2l^2b\,\mathbf{1}'Xa+2l^2nd=0$$
$$-2r\,a'X'C-2r^2a'X'q+2r\,a'X'Xa+2r^2g\,a'X'Xa+2r^2h\,a'X'\mathbf{1}=0$$
$$-2r\,\mathbf{1}'C-2r^2\mathbf{1}'q+2r\,\mathbf{1}'Xa+2r^2g\,\mathbf{1}'Xa+2r^2nh=0$$

Since these equations are too complicated to solve in closed form for $a,b,d,g,h$, we simply list the following recursive equations and use mathematical software to find possible solutions:

$$a=\frac{1}{3-2lb+2rg+l^2b^2+r^2g^2}(X'X)^{-1}\Big[(3-lb+rg)X'C+(l^2b-l)X'P+(r+r^2g)X'q+\big(ld(1-lb)-rh(1+rg)\big)X'\mathbf{1}\Big]$$
$$b=\frac{1}{l\,a'X'Xa}\big(a'X'Xa-a'X'C+l\,a'X'P-ld\,a'X'\mathbf{1}\big)$$
$$d=\frac{1}{ln}\big(l\,\mathbf{1}'P+\mathbf{1}'Xa-\mathbf{1}'C-lb\,\mathbf{1}'Xa\big)$$
$$g=\frac{1}{r\,a'X'Xa}\big(a'X'C-a'X'Xa+r\,a'X'q-rh\,a'X'\mathbf{1}\big)$$
$$h=\frac{1}{rn}\big(r\,\mathbf{1}'q+\mathbf{1}'C-\mathbf{1}'Xa-rg\,\mathbf{1}'Xa\big)$$

From the above equations it is clear that these least-squares estimates are related to the membership function of the response fuzzy number $y_i=(c_i,p_i,q_i)_{LR}$.

4. Diagnostics of Outliers and Influences

4.1 Diagnostics of Outliers and Influences in the Linear Regression Model

Although a residual analysis is useful in assessing model fit, departures from the

regression model are often hidden by the fitting process. For example, there may be

“outliers” in either the response or explanatory variables that can have a considerable

effect on the analysis. Observations that significantly affect inferences drawn from the

data are said to be influential. Methods for assessing influence are typically based on the

change in the vector of parameter estimates when observations are deleted.

The leverage $h_{jj}=x_j'(X'X)^{-1}x_j$ is associated with the $j$th data point and measures, in the space of the explanatory variables, how far the $j$th observation is from the other $n-1$ observations. For a data point with high leverage, $h_{jj}$ approaches 1 ($0\le h_{jj}\le1$), indicating a possible outlier. The residuals $e_i=y_i-\hat y_i$ are used to detect possible outliers in the response variable $y$, where $\hat y_i$ is the $i$th predicted $y$ value. A large value of $e_i$ indicates that the $i$th data point could be an outlier. One may also use

$$e_{(i)}=y_i-\hat y_{(i)}=\frac{e_i}{1-h_{ii}}$$

to detect possible outliers, where $\hat y_{(i)}$ is the predicted $y$ value when the $i$th observation is dropped from the analysis. A large value of $e_{(i)}$ also indicates that the $i$th data point could be an outlier.

In traditional linear regression analysis, one may use Cook's distance,

$$CD_i=\frac{\lVert\hat Y-\hat Y_{(i)}\rVert^2}{ks^2}=\frac{e_i^2}{ks^2}\,\frac{h_{ii}}{(1-h_{ii})^2},$$

to detect possible influential data points, where $\hat Y_{(i)}$ is the predicted $Y$ vector when the $i$th observation is dropped from the analysis, $k$ is the number of parameters, and $s^2=\sum_{i=1}^{n}e_i^2/(n-k)$ is the mean square error. A large value of $CD_i$ indicates that the $i$th data point could be an influential observation. One advantage of Cook's distance is that its value is not affected by the measurement units used for the explanatory and response variables.
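These classical diagnostics can be computed in a few lines; the sketch below, assuming numpy and an ordinary least-squares setting, returns the leverages, residuals, deleted residuals, and Cook's distances (the function name is ours).

```python
import numpy as np

def regression_diagnostics(X, y):
    """Leverages h_ii, residuals e_i, deleted residuals e_i/(1 - h_ii), and Cook's distances."""
    n, k = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
    h = np.diag(H)                          # leverages
    e = y - H @ y                           # residuals
    e_del = e / (1.0 - h)                   # deleted residuals
    s2 = np.sum(e ** 2) / (n - k)           # mean square error
    cook = e ** 2 * h / (k * s2 * (1.0 - h) ** 2)
    return h, e, e_del, cook
```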

4.2 Diagnostics of Outliers and Influences in the Fuzzy Linear Regression Model

In this section, we consider Model I (see (2.3)) and derive the corresponding formulas for $e_i$, $e_{(i)}$, and $CD_i$ to detect possible outliers and influential data points. For Model II (see (2.5) and (2.6)) and Model III (see (3.1), (3.2), and (3.3)), we were not able to derive formulas for $e_i$, $e_{(i)}$, and $CD_i$ to detect possible outliers and influential data points.

Based on the Euclidean distance, we obtain (see the derivations in Appendix A.2)

$$e_i^2=(c_i-x_i'\hat a)^2+(s_i-x_i'\hat r)^2=(e_i^c)^2+(e_i^s)^2\qquad(4.1)$$

$$e_{(i)}^2=(c_i-x_i'\hat a_{(i)})^2+(s_i-x_i'\hat r_{(i)})^2=\left(\frac{e_i}{1-h_{ii}}\right)^2\qquad(4.2)$$

where $e_i^c=c_i-x_i'\hat a$ is the residual of the center of the fuzzy number, $e_i^s=s_i-x_i'\hat r$ is the residual of the spread of the fuzzy number, and $\hat a$ and $\hat r$ are defined in (2.4).

Similarly, based on Yang and Ko's distance, we obtain (see the derivations in Appendix A.2)

$$e_i^2=d_{LR}^2(y_i,\hat y_i)=3(e_i^c)^2+2l^2(e_i^s)^2\qquad(4.3)$$

$$e_{(i)}^2=d_{LR}^2(y_i,\hat y_{(i)})=3\left(\frac{e_i^c}{1-h_{ii}}\right)^2+2l^2\left(\frac{e_i^s}{1-h_{ii}}\right)^2=\left(\frac{e_i}{1-h_{ii}}\right)^2\qquad(4.4)$$

From (4.2) and (4.4), the relation between $e_i$ and $e_{(i)}$ is the same as in the general linear regression model. That is, a large value of $e_{(i)}$ indicates that the $i$th data point could be an outlier.

In order to derive a formula similar to Cook's distance in the fuzzy environment, we need to define a new type of distance between fuzzy vectors. Let $F_{LR}(\Re)$ denote the set of all LR-type fuzzy numbers, and let $\tilde F_{LR}(\Re)=\{X=(X_1,X_2,\ldots,X_k)' : X_i\in F_{LR}(\Re)\}$ be the set of all fuzzy $k$-dimensional vectors. Based on the distance defined on $F_{LR}(\Re)$, we can define a new distance on $\tilde F_{LR}(\Re)$.

Lemma 4.1 Let $d: F_{LR}(\Re)\times F_{LR}(\Re)\to\Re$ be a metric. For any two fuzzy vectors $X=(X_1,X_2,\ldots,X_k)'$ and $Y=(Y_1,Y_2,\ldots,Y_k)'$ in $\tilde F_{LR}(\Re)$, define

$$\tilde d_{LR}(X,Y)=\Big(\sum_{i=1}^{k}d^2(X_i,Y_i)\Big)^{1/2}.\qquad(4.5)$$

Then $\tilde d_{LR}$ is a metric on $\tilde F_{LR}(\Re)$, and if $d$ is a complete metric then so is $\tilde d_{LR}$ (see the proof in Appendix A.3).

When $d$ is the simple (Euclidean) metric, define Cook's distance $CD_i$ as

$$CD_i=\frac{\tilde d_{LR}^2(\hat Y,\hat Y_{(i)})}{ks^2}=\frac{\lVert X\hat a-X\hat a_{(i)}\rVert^2+\lVert X\hat r-X\hat r_{(i)}\rVert^2}{ks^2};$$

then we obtain (see the derivation in Appendix A.4)

$$CD_i=\frac{e_i^2}{ks^2}\,\frac{h_{ii}}{(1-h_{ii})^2}\qquad(4.6)$$

where $s^2=\sum_{i=1}^{n}e_i^2/(n-k)$ and $e_i^2=(e_i^c)^2+(e_i^s)^2$.
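Formula (4.6) only needs the leverages and the center and spread residuals from the Model I fit. A brief sketch follows, assuming numpy (the function name is ours); applied to Tanaka's data it should reproduce the $h_{ii}$, $e_i^2$, $e_{(i)}^2$, and $CD_i$ columns of Table 2 up to rounding.

```python
import numpy as np

def fuzzy_cook_distances(X, c, s):
    """Leverages, squared residuals (4.1), deleted squared residuals (4.2) and
    Cook's distances (4.6) for Model I under the Euclidean distance."""
    n, k = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)
    h = np.diag(H)
    ec, es = c - H @ c, s - H @ s               # center and spread residuals
    e2 = ec ** 2 + es ** 2                      # (4.1)
    e2_del = e2 / (1.0 - h) ** 2                # (4.2)
    s2 = e2.sum() / (n - k)
    cook = e2 * h / (k * s2 * (1.0 - h) ** 2)   # (4.6)
    return h, e2, e2_del, cook
```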

When $d$ is Yang and Ko's metric, define Cook's distance $CD_i$ as

$$CD_i=\frac{\tilde d_{LR}^2(\hat Y,\hat Y_{(i)})}{ks^2}=\frac{1}{ks^2}\Big\{\lVert X\hat a-X\hat a_{(i)}\rVert^2+\lVert(X\hat a-lX\hat r)-(X\hat a_{(i)}-lX\hat r_{(i)})\rVert^2+\lVert(X\hat a+lX\hat r)-(X\hat a_{(i)}+lX\hat r_{(i)})\rVert^2\Big\};$$

then we obtain (see Appendix A.4)

$$CD_i=\frac{e_i^2}{ks^2}\,\frac{h_{ii}}{(1-h_{ii})^2}\qquad(4.7)$$

where $s^2=\sum_{i=1}^{n}e_i^2/(n-k)$ and $e_i^2=3(e_i^c)^2+2l^2(e_i^s)^2$.

Although formulas (4.6) and (4.7) look the same, the values of $e_i^2$ and $s^2$ are different. In general, $s^2$ in (4.7) is larger than $s^2$ in (4.6); therefore the Cook distance calculated from (4.6) is larger than the Cook distance calculated from (4.7). From (4.6) and (4.7) we see that $CD_i$ is affected by the leverage value $h_{ii}$ and the residual $e_i$, just as in traditional regression analysis.

Since we were not able to derive formulas analogous to (4.1)-(4.4) for Models II and III, the best we can do is to delete one data point at a time and recalculate the values of $e_{(i)}$, $CD_i$, etc.

5. Data Analysis

In this section, we use Tanaka's (1987) data (see Table 1) to illustrate the theoretical results obtained in the previous sections. The data set contains three independent variables, one fuzzy response variable, and ten data points. We consider only exponential fuzzy response values. The advantage of using an exponential membership function is that we only need to choose an appropriate value of $m$ (note: $m$ here is the shape parameter of the exponential membership function, so that $l=\Gamma(1+1/m)$) to reflect the distribution of the response variable. If the values of the response variable tend to fall outside the interval of the existing data, we choose a smaller $m$ value; otherwise, we choose a larger $m$ value to describe the membership function. Since we were not able to derive the least-squares estimates for Model III and we consider only the exponential membership function, we use Models I and II for the data analysis. Tables 2-11 show the results of using the Euclidean distance, Yang and Ko's distance, and different $m$ values. Each table contains the least-squares estimates, the sum of squared residuals, the leverage values $h_{ii}$, the values of $e_i^2$ and $e_{(i)}^2$, and the Cook distance $CD_i$. Since, under the simple Euclidean distance formula, the $m$ value does not affect the results of Models I and II, we only report the results for $m=2$ in that case (see Tables 2 and 3).

Table 1: Tanaka’s Data (1987)

Case #   Predictors $x_{i1}$   $x_{i2}$   $x_{i3}$   Fuzzy Response Variable $Y_i=(c_i,r_i)$

1 3 5 9 (96,42)

2 14 8 3 (120,47)

3 7 1 4 (52,33)

4 11 7 3 (106,45)

5 7 12 15 (189,79)

6 8 15 10 (194,65)

7 3 9 6 (107,42)

8 12 15 11 (216,78)

9 10 5 8 (108,52)

10 9 7 4 (103,44)

Table 2: Model I, m=2, Least-Squares Estimates Under Euclidean Distance

Case #   $(c_i,s_i)$   $(\hat c_i,\hat s_i)$   $h_{ii}$   $e_i^2$   $e_{(i)}^2$   $CD_i$

1 (96,42) (93.20, 44.62) 0.40 14.69 41.25 0.25

2 (120,47) (122.48, 49.13) 0.43 10.67 32.44 0.21

3 (52,33) (49.36, 32.11) 0.41 7.75 21.90 0.13

4 (106,45) (104.82, 43.01) 0.26 5.35 9.75 0.04

5 (189,79) (191.79, 76.71) 0.55* 13.06 63.57* 0.52*

6 (194,65) (193.64, 67.67) 0.39 7.25 19.38 0.11

7 (107,42) (109.77, 40.85) 0.60* 9.08 55.55* 0.50*

8 (216,78) (211.65, 77.08) 0.42 19.73 58.34* 0.37

9 (108,52) (110.89, 53.24) 0.37 9.91 25.12 0.14

10 (103,44) (103.36, 42.58) 0.18 2.14 3.22 0.01

$\hat a=(-1.39,\,3.25,\,7.92,\,5.03)'$,  $\hat r=(8.01,\,1.64,\,1.20,\,2.85)'$,  $\sum e_i^2=99.63$

Table 3: Model II, m=2, Least-Squares Estimates Under Euclidean Distance

Case #   $(c_i,s_i)$   $(\hat c_i,\hat s_i)$   $h_{ii}$   $e_i^2$   $e_{(i)}^2$   $CD_i$

1 (96,42) (93.86, 42.38) 0.40 4.71 11.08 0.36

2 (120,47) (122.04, 50.63) 0.43 17.34 31.44 0.38

3 (52,33) (50.11, 29.56) 0.41 15.41 92.87* 0.66*

4 (106,45) (104.13, 45.38) 0.26 3.65 6.55 0.35

5 (189,79) (193.31, 71.51) 0.55* 74.67* 90.38* 0.38

6 (194,65) (192.58, 71.30) 0.39 41.67 118.34* 0.63*

7 (107,42) (108.12, 46.55) 0.60* 21.97 38.00 0.39

8 (216,78) (211.71, 76.90) 0.42 19.64 78.55* 0.61*

9 (108,52) (112.48, 47.86) 0.37 37.44 82.31* 0.51

10 (103,44) (102.67, 44.96) 0.18 1.02 5.34 0.34

$\hat a=(-3.14,\,3.43,\,7.62,\,5.40)'$,  $\hat b=0.29$,  $\hat d=14.88$,  $\sum e_i^2=236.98$

Table 4: Model I, m=1.2, Least-Squares Estimates Under Yang and Ko’s Distance

Case #   $(c_i,s_i)$   $(\hat c_i,\hat s_i)$   $h_{ii}$   $e_i^2$   $e_{(i)}^2$   $CD_i$

1 (96,42) (93.20, 44.62) 0.40 35.61 100.02 0.24

2 (120,47) (122.48, 49.13) 0.43 26.43 80.41 0.20

3 (52,33) ( 49.36, 32.11) 0.41 22.26 62.94 0.15

4 (106,45) (104.82, 43.01) 0.26 11.17 31.59 0.03

5 (189,79) (191.79, 76.71) 0.55* 32.71 159.18* 0.51*

6 (194,65) (193.64, 67.67) 0.39 12.99 34.72 0.08

7 (107,42) (109.77, 40.85) 0.60* 25.61 156.77* 0.55*

8 (216,78) (211.65, 77.08) 0.42 58.16 171.95* 0.42

9 (108,52) (110.89, 53.24) 0.37 27.84 70.53 0.16

10 (103,44) (103.36, 42.58) 0.18 3.95 5.94 0.01

$\hat a=(-1.39,\,3.25,\,7.92,\,5.03)'$,  $\hat r=(8.01,\,1.64,\,1.20,\,2.85)'$,  $\sum e_i^2=256.73$

Table 5: Model II, m=1.2, Least-Squares Estimates Under Yang and Ko’s Distance

Case #   $(c_i,s_i)$   $(\hat c_i,\hat s_i)$   $h_{ii}$   $e_i^2$   $e_{(i)}^2$   $CD_i$

1 (96,42) (94.91, 37.94) 0.40 32.79 60.26 0.02

2 (120,47) (122.32, 49.77) 0.43 29.66 80.87 0.04

3 (52,33) (52.54, 19.65) 0.41 316.42* 360.47* 0.08

4 (106,45) (105.01, 42.30) 0.26 15.89 24.84 0.01

5 (189,79) (191.09, 79.46) 0.55* 13.48 63.15 0.04

6 (194,65) (190.68, 79.29) 0.39 394.20* 536.55* 0.07

7 (107,42) (108.98, 44.01) 0.60* 18.96 104.09 0.07

8 (216,78) (209.07, 87.22) 0.42 294.63* 498.27* 0.13*

9 (108,52) (112.82, 45.67) 0.37 140.58 207.55 0.05

10 (103,44) (103.52, 41.69) 0.18 10.51 13.72 0.002

$\hat a=(1.28,\,3.30,\,7.41,\,5.19)'$,  $\hat b=0.43$,  $\hat d=-3.03$,  $\sum e_i^2=1267.13$

Table 6: Model I, m=2, Least-Squares Estimates Under Yang and Ko’s Distance

Case #   $(c_i,s_i)$   $(\hat c_i,\hat s_i)$   $h_{ii}$   $e_i^2$   $e_{(i)}^2$   $CD_i$

1 (96,42) (93.20, 44.62) 0.40 34.24 96.18 0.23

2 (120,47) (122.48, 49.13) 0.43 25.53 77.68 0.20

3 (52,33) (49.36, 32.11) 0.41 22.10 62.49 0.15

4 (106,45) (104.82, 43.01) 0.26 10.38 18.91 0.03

5 (189,79) (191.79, 76.71) 0.55* 31.66 154.08* 0.51*

6 (194,65) (193.64, 67.67) 0.39 11.57 30.93 0.07

7 (107,42) (109.77, 40.85) 0.60* 25.35 155.17* 0.55*

8 (216,78) (211.65, 77.08) 0.42 57.99 171.46* 0.43

9 (108,52) (110.89, 53.24) 0.37 27.53 69.75 0.16

10 (103,44) (103.36, 42.58) 0.18 3.55 5.34 0.01

$\hat a=(-1.39,\,3.25,\,7.92,\,5.03)'$,  $\hat r=(8.01,\,1.64,\,1.20,\,2.85)'$,  $\sum e_i^2=249.91$

Table 7: Model II, m=2, Least-Squares Estimates Under Yang and Ko’s Distance

Case #   $(c_i,s_i)$   $(\hat c_i,\hat s_i)$   $h_{ii}$   $e_i^2$   $e_{(i)}^2$   $CD_i$

1 (96,42) (94.76, 37.78) 0.40 32.73 61.86 0.02

2 (120,47) (122.33, 49.76) 0.43 28.30 77.95 0.04

3 (52,33) (52.28, 19.29) 0.41 295.52* 336.34* 0.09

4 (106,45) (105.00, 42.22) 0.26 15.12 23.74 0.01

5 (189,79) (191.12, 79.67) 0.55* 14.15 67.62 0.06

6 (194,65) (190.93, 79.59) 0.39 362.74* 493.27* 0.07

7 (107,42) (109.07, 43.99) 0.60* 19.09 107.17 0.08

8 (216,78) (209.27, 87.57) 0.42 279.81* 476.79* 0.13*

9 (108,52) (112.65, 45.54) 0.37 130.24 194.90 0.05

10 (103,44) (103.59, 41.60) 0.18 10.06 13.27 0.002

$\hat a=(1.11,\,3.29,\,7.45,\,5.17)'$,  $\hat b=0.43$,  $\hat d=-3.45$,  $\sum e_i^2=1187.75$

Table 8: Model I, m=3, Least-Squares Estimates Under Yang and Ko’s Distance

Case #   $(c_i,s_i)$   $(\hat c_i,\hat s_i)$   $h_{ii}$   $e_i^2$   $e_{(i)}^2$   $CD_i$

1 (96,42) (93.20, 44.62) 0.40 34.41 96.65 0.23

2 (120,47) (122.48, 49.13) 0.43 25.64 78.01 0.20

3 (52,33) (49.36, 32.11) 0.41 22.12 62.54 0.15

4 (106,45) (104.82, 43.01) 0.26 10.48 19.08 0.03

5 (189,79) (191.79, 76.71) 0.55* 31.79 154.70* 0.51*

6 (194,65) (193.64, 67.67) 0.39 11.74 31.39 0.07

7 (107,42) (109.77, 40.85) 0.60* 25.38 155.36* 0.55*

8 (216,78) (211.65, 77.08) 0.42 58.01 171.52* 0.43

9 (108,52) (110.89, 53.24) 0.37 27.57 69.85 0.16

10 (103,44) (103.36, 42.58) 0.18 3.60 5.41 0.01

$\hat a=(-1.39,\,3.25,\,7.92,\,5.03)'$,  $\hat r=(8.01,\,1.64,\,1.20,\,2.85)'$,  $\sum e_i^2=250.74$

Table 9: Model II, m=3, Least-Squares Estimates Under Yang and Ko’s Distance

Case #   $(c_i,s_i)$   $(\hat c_i,\hat s_i)$   $h_{ii}$   $e_i^2$   $e_{(i)}^2$   $CD_i$

1 (96,42) (94.78, 37.79) 0.40 32.73 61.65 0.02

2 (120,47) (122.33, 49.76) 0.43 28.46 78.30 0.04

3 (52,33) (52.31, 19.34) 0.41 298.04* 339.26* 0.08

4 (106,45) (105.00, 42.23) 0.26 15.21 23.87 0.01

5 (189,79) (191.11, 79.64) 0.55* 14.06 67.06 0.05

6 (194,65) (190.90, 79.55) 0.39 366.55* 498.51* 0.07

7 (107,42) (109.06, 43.99) 0.60* 19.08 102.80 0.08

8 (216,78) (209.24, 87.52) 0.42 281.59* 479.11* 0.13*

9 (108,52) (112.67, 45.56) 0.37 131.49 196.43 0.05

10 (103,44) (103.59, 41.61) 0.18 10.11 13.32 0.002

$\hat a=(1.13,\,3.29,\,7.44,\,5.17)'$,  $\hat b=0.43$,  $\hat d=-3.39$,  $\sum e_i^2=1197.32$

Table 10: Model I, m=10, Least-Squares Estimates Under Yang and Ko’s Distance

Case #   $(c_i,s_i)$   $(\hat c_i,\hat s_i)$   $h_{ii}$   $e_i^2$   $e_{(i)}^2$   $CD_i$

1 (96,42) (93.20, 44.62) 0.40 35.88 100.80 0.24

2 (120,47) (122.48, 49.13) 0.43 26.62 80.97 0.20

3 (52,33) (49.36, 32.11) 0.41 22.29 63.03 0.15

4 (106,45) (104.82, 43.01) 0.26 11.33 20.64 0.03

5 (189,79) (191.79, 76.71) 0.55* 32.92 160.21* 0.51*

6 (194,65) (193.64, 67.67) 0.39 13.28 35.49 0.08

7 (107,42) (109.77, 40.85) 0.60* 25.67 157.10* 0.54*

8 (216,78) (211.65, 77.08) 0.42 58.19 172.06* 0.42

9 (108,52) (110.89, 53.24) 0.37 27.90 70.69 0.15

10 (103,44) (103.36, 42.58) 0.18 4.02 6.06 0.01

$\hat a=(-1.39,\,3.25,\,7.92,\,5.03)'$,  $\hat r=(8.01,\,1.64,\,1.20,\,2.85)'$,  $\sum e_i^2=258.10$

Table 11: Model II, m=10, Least-Squares Estimates Under Yang and Ko’s Distance

Case #   $(c_i,s_i)$   $(\hat c_i,\hat s_i)$   $h_{ii}$   $e_i^2$   $e_{(i)}^2$   $CD_i$

1 (96,42) (94.93, 37.97) 0.40 32.80 59.96 0.01

2 (120,47) (122.31, 49.77) 0.43 29.93 81.46 0.04

3 (52,33) (52.59, 19.71) 0.41 320.68* 365.36* 0.08

4 (106,45) (105.01, 42.31) 0.26 16.05 25.07 0.01

5 (189,79) (191.09, 79.43) 0.55* 13.36 62.29 0.04

6 (194,65) (190.63, 79.23) 0.39 400.57* 545.32* 0.07

7 (107,42) (108.96, 44.02) 0.60* 18.93 103.48 0.07

8 (216,78) (209.03, 87.16) 0.42 297.67* 502.71* 0.13*

9 (108,52) (112.86, 45.70) 0.37 142.67 210.11 0.05

10 (103,44) (103.59, 41.70) 0.18 10.61 13.82 0.002

$\hat a=(1.31,\,3.30,\,7.40,\,5.19)'$,  $\hat b=0.43$,  $\hat d=-2.96$,  $\sum e_i^2=1283.29$

5.1 Discussion

From Tables 2 and 3, the estimates of the centers and spreads under Model I are better than those under Model II. In theory, if we use Yang and Ko's distance, the estimates of the centers and spreads under Model II should be affected by the value of $m$. However, based on Tables 5, 7, 9, and 11, we found that different $m$ values do not affect the estimates very much.

In theory, the distance formula and the $m$ values do not affect the estimates of the Model I parameters, but they do affect the parameter estimates of Model II. Based on Tables 3 and 5, the choice of distance formula has the larger effect on the parameter estimates of Model II.

Cases #5 and #7 have larger leverage values $h_{ii}$; they are possible outliers with respect to the predictors. In Model I, based on the values of $e_i$, there appear to be no outliers in the response variable. However, based on the values of $e_{(i)}$ in Tables 2, 4, 6, and 8, cases #5, #7, and #8 are possible outliers in the response variable. In Model II under the Euclidean distance, Table 3 shows that cases #3, #5, #6, #8, and #9 are the five possible outliers in the response variable, but under Yang and Ko's distance, Tables 5, 7, 9, and 11 show that only cases #3, #6, and #8 are possible outliers in the response variable.

Under Model I, based on Tables 2, 4, 6, 8, and 10, cases #5 and #7 have larger $CD_i$ values and are influential observations. Under Model II with the Euclidean distance, Table 3 shows that cases #3, #6, and #8 have larger $CD_i$ values. However, under Model II with Yang and Ko's distance, only case #8 has a large $CD_i$ value and is an influential point (see Tables 5, 7, 9, and 11).

If we use an exponential membership function for our fuzzy numbers together with Yang and Ko's distance, how should we choose the $m$ value for fuzzy linear regression under Model II? The simplest rule is to choose the $m$ value for which the residual sum of squares, $\sum e_i^2$, is smallest. Based on Tables 5, 7, 9, and 11, the best choice is $m=2$.

APPENDIX

A.1: The derivation of $\hat a$, $\hat b$, and $\hat d$ in (3.8)

$$D^2=\lVert Xa-C\rVert^2+\lVert\big(Xa-l(Xab+\mathbf{1}d)\big)-(C-lS)\rVert^2+\lVert\big(Xa+l(Xab+\mathbf{1}d)\big)-(C+lS)\rVert^2$$
$$=3a'X'Xa-6a'X'C+3C'C+2l^2b^2a'X'Xa+4l^2bd\,a'X'\mathbf{1}-4l^2b\,a'X'S+2l^2nd^2-4l^2nd\bar S+2l^2S'S.$$

Setting $\partial D^2/\partial a=0$, $\partial D^2/\partial b=0$, and $\partial D^2/\partial d=0$, we obtain

$$\frac{\partial D^2}{\partial a}=0=3X'Xa-3X'C+2l^2b^2X'Xa+2l^2bd\,X'\mathbf{1}-2l^2b\,X'S\qquad(A.1.1)$$
$$\frac{\partial D^2}{\partial b}=0=a'X'Xa\,b+a'X'\mathbf{1}\,d-a'X'S\qquad(A.1.2)$$
$$\frac{\partial D^2}{\partial d}=0=a'X'\mathbf{1}\,b+nd-\mathbf{1}'S\qquad(A.1.3)$$

From (A.1.1), we obtain

$$a=\frac{1}{3+2l^2b^2}(X'X)^{-1}\big(3X'C+2l^2b\,X'S-2l^2bd\,X'\mathbf{1}\big);$$

substituting this into (A.1.2) and (A.1.3), we obtain

$$6l^2\big(C'\hat S-nd\bar C\big)b^2+\big(9C'\hat C-6l^2S'\hat S+12l^2nd\bar S-6l^2nd^2\big)b+9\big(nd\bar C-C'\hat S\big)=0\qquad(A.1.4)$$
$$b\bar C+d-\bar S=0\qquad(A.1.5)$$

where $\hat C=X(X'X)^{-1}X'C$, $\hat S=X(X'X)^{-1}X'S$, $\bar C=\frac{1}{n}\mathbf{1}'C$, and $\bar S=\frac{1}{n}\mathbf{1}'S$.

From (A.1.5) we obtain $d=\bar S-b\bar C$, and substituting this into (A.1.4) we obtain a quadratic equation in $b$, $K_1b^2+K_2b+K_3=0$, whose solution is

$$\hat b=\frac{-K_2\pm\sqrt{K_2^2-4K_1K_3}}{2K_1}.$$

A.2: The derivation of (4.2), (4.3), and (4.4)

I. Based on the Euclidean distance formula, we have $e_{(i)}^2=(c_i-x_i'\hat a_{(i)})^2+(s_i-x_i'\hat r_{(i)})^2$. Since

$$\hat a_{(i)}=\hat a-\frac{e_i^c}{1-h_{ii}}(X'X)^{-1}x_i\qquad\text{and}\qquad\hat r_{(i)}=\hat r-\frac{e_i^s}{1-h_{ii}}(X'X)^{-1}x_i,$$

we have

$$e_{(i)}^2=\Big(c_i-x_i'\hat a+\frac{e_i^c}{1-h_{ii}}x_i'(X'X)^{-1}x_i\Big)^2+\Big(s_i-x_i'\hat r+\frac{e_i^s}{1-h_{ii}}x_i'(X'X)^{-1}x_i\Big)^2$$
$$=\Big(\frac{e_i^c}{1-h_{ii}}\Big)^2+\Big(\frac{e_i^s}{1-h_{ii}}\Big)^2=\Big(\frac{e_i}{1-h_{ii}}\Big)^2.$$

II. Based on Yang and Ko's distance formula, we have

$$e_i^2=(c_i-x_i'\hat a)^2+\big[(c_i-ls_i)-x_i'(\hat a-l\hat r)\big]^2+\big[(c_i+ls_i)-x_i'(\hat a+l\hat r)\big]^2$$
$$=3(c_i-x_i'\hat a)^2+2l^2(s_i-x_i'\hat r)^2=3(e_i^c)^2+2l^2(e_i^s)^2,$$

$$e_{(i)}^2=(c_i-x_i'\hat a_{(i)})^2+\big[(c_i-ls_i)-x_i'(\hat a_{(i)}-l\hat r_{(i)})\big]^2+\big[(c_i+ls_i)-x_i'(\hat a_{(i)}+l\hat r_{(i)})\big]^2$$
$$=3(c_i-x_i'\hat a_{(i)})^2+2l^2(s_i-x_i'\hat r_{(i)})^2=3\Big(\frac{e_i^c}{1-h_{ii}}\Big)^2+2l^2\Big(\frac{e_i^s}{1-h_{ii}}\Big)^2=\Big(\frac{e_i}{1-h_{ii}}\Big)^2.$$

A.3: Proof of Lemma 4.1

In order to prove that $\tilde d_{LR}$ is a metric, we need to verify the following three properties:

1. For all $X,Y\in\tilde F_{LR}(\Re)$, $\tilde d_{LR}(X,Y)\ge0$, and $\tilde d_{LR}(X,Y)=0$ implies $X=Y$.

2. For all $X,Y\in\tilde F_{LR}(\Re)$, $\tilde d_{LR}(X,Y)=\tilde d_{LR}(Y,X)$.

3. For all $X,Y,Z\in\tilde F_{LR}(\Re)$, $\tilde d_{LR}(X,Y)\le\tilde d_{LR}(X,Z)+\tilde d_{LR}(Z,Y)$.

Since $d$ is a metric, it is easy to show that properties 1 and 2 are satisfied. To show that property 3 holds:

$$\tilde d_{LR}^2(X,Y)=\sum_{i=1}^{k}d^2(X_i,Y_i)\le\sum_{i=1}^{k}\big(d(X_i,Z_i)+d(Z_i,Y_i)\big)^2$$
$$\le\sum_{i=1}^{k}d^2(X_i,Z_i)+\sum_{i=1}^{k}d^2(Z_i,Y_i)+2\sqrt{\sum_{i=1}^{k}d^2(X_i,Z_i)}\sqrt{\sum_{i=1}^{k}d^2(Z_i,Y_i)}=\big(\tilde d_{LR}(X,Z)+\tilde d_{LR}(Z,Y)\big)^2.$$

Therefore, $\tilde d_{LR}(X,Y)\le\tilde d_{LR}(X,Z)+\tilde d_{LR}(Z,Y)$.

Assume now that $(F_{LR}(\Re),d_{LR})$ is a complete metric space. Let $\{X^m\}_{m=1}^{\infty}$ be a Cauchy sequence in $\tilde F_{LR}(\Re)$; that is, for every $\varepsilon>0$ there exists $l\in\mathbb{N}$ such that $m,m'>l$ implies $\tilde d_{LR}(X^m,X^{m'})<\varepsilon$. Then, for all $m,m'>l$,

$$d(X_j^m,X_j^{m'})\le\sqrt{\sum_{i=1}^{k}d^2(X_i^m,X_i^{m'})}=\tilde d_{LR}(X^m,X^{m'})<\varepsilon.$$

Hence, for every $1\le j\le k$, $\{X_j^m\}_{m=1}^{\infty}$ is a Cauchy sequence in $F_{LR}(\Re)$, so there exists $X_j\in F_{LR}(\Re)$ such that $X_j^m\to X_j$. Let $X=(X_1,X_2,\ldots,X_k)'$. Since $X_j^m\to X_j$, for every $\varepsilon>0$ there exists $n_j\in\mathbb{N}$ such that for $m>n_j$ we have $d(X_j^m,X_j)<\varepsilon/k$, $j=1,2,\ldots,k$. Let $n=\max\{n_1,n_2,\ldots,n_k\}$. Then, for every $m>n$,

$$\tilde d_{LR}(X^m,X)=\sqrt{\sum_{i=1}^{k}d^2(X_i^m,X_i)}<\varepsilon.$$

That is, $X^m\to X$.

A.4: The derivation of equations (4.6) and (4.7)

Under the Euclidean distance:

$$CD_i=\frac{1}{ks^2}\tilde d_{LR}^2(\hat Y,\hat Y_{(i)})=\frac{1}{ks^2}\sum_{j=1}^{n}d^2(\hat y_j,\hat y_{j(i)})
=\frac{1}{ks^2}\sum_{j=1}^{n}\Big[(x_j'\hat a-x_j'\hat a_{(i)})^2+(x_j'\hat r-x_j'\hat r_{(i)})^2\Big].$$

Since $x_j'\hat a-x_j'\hat a_{(i)}=h_{ji}\,e_i^c/(1-h_{ii})$, $x_j'\hat r-x_j'\hat r_{(i)}=h_{ji}\,e_i^s/(1-h_{ii})$, and $\sum_{j=1}^{n}h_{ji}^2=h_{ii}$ (the hat matrix is symmetric and idempotent), this becomes

$$CD_i=\frac{1}{ks^2}\left[\left(\frac{e_i^c}{1-h_{ii}}\right)^2h_{ii}+\left(\frac{e_i^s}{1-h_{ii}}\right)^2h_{ii}\right]=\frac{e_i^2}{ks^2}\,\frac{h_{ii}}{(1-h_{ii})^2}.$$

Under Yang and Ko's distance:

$$CD_i=\frac{1}{ks^2}\tilde d_{LR}^2(\hat Y,\hat Y_{(i)})=\frac{1}{ks^2}\sum_{j=1}^{n}d_{LR}^2(\hat y_j,\hat y_{j(i)})$$
$$=\frac{1}{ks^2}\sum_{j=1}^{n}\Big\{(x_j'\hat a-x_j'\hat a_{(i)})^2+\big[(x_j'\hat a-l\,x_j'\hat r)-(x_j'\hat a_{(i)}-l\,x_j'\hat r_{(i)})\big]^2+\big[(x_j'\hat a+l\,x_j'\hat r)-(x_j'\hat a_{(i)}+l\,x_j'\hat r_{(i)})\big]^2\Big\}$$
$$=\frac{1}{ks^2}\left[3\left(\frac{e_i^c}{1-h_{ii}}\right)^2h_{ii}+2l^2\left(\frac{e_i^s}{1-h_{ii}}\right)^2h_{ii}\right]=\frac{e_i^2}{ks^2}\,\frac{h_{ii}}{(1-h_{ii})^2}.$$

REFERENCES

1. Diamond, P. (1988), "Fuzzy Least Squares", Information Sciences 46, 141-157.

2. Dubois, D. and Prade, H. (1980), "Fuzzy Sets and Systems: Theory and Applications", Academic Press, New York.

3. Zimmermann, H. J. (1996), "Fuzzy Set Theory and Its Applications", Kluwer Academic Publishers, Dordrecht.

4. Draper, N. R. and Smith, H. (1980), "Applied Regression Analysis", Wiley, New York.

5. D'Urso, P. and Gastaldi, T. (2000), "A Least-squares Approach to Fuzzy Linear Regression Analysis", Computational Statistics and Data Analysis 34, 427-440.

6. D'Urso, P. (2003), "Linear Regression Analysis for Fuzzy/Crisp Input and Fuzzy/Crisp Output Data", Computational Statistics and Data Analysis 42, 47-72.

7. Tanaka, H. (1987), "Fuzzy Data Analysis by Possibilistic Linear Models", Fuzzy Sets and Systems 24, 363-375.

8. Tanaka, H., Uejima, S., and Asai, K. (1982), "Fuzzy Linear Regression Model", IEEE Trans. Systems, Man, and Cybernetics 12, 903-907.

9. Xu, R. and Li, C. (2001), "Multidimensional Least-squares Fitting with a Fuzzy Model", Fuzzy Sets and Systems 119, 215-223.

10. Yang, M. S. and Ko, C. H. (1996), "On a Class of c-numbers Clustering Procedures for Fuzzy Data", Fuzzy Sets and Systems 84, 49-60.

11. Yang, M. S. and Liu, H. H. (2003), "Fuzzy Least-squares Algorithms for Interactive Fuzzy Linear Regression Models", Fuzzy Sets and Systems 135, 305-316.

12. Peña, D. (2005), "A New Statistic for Influence in Linear Regression", Technometrics 47(1), 1-12.