This article was downloaded by: [Case Western Reserve University]
On: 30 October 2014, At: 07:26
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Communications in Statistics - Theory and Methods
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lsta20

Estimation of the Correlation Coefficient Using Bivariate Ranked Set Sampling with Application to the Bivariate Normal Distribution
MOHAMMAD FRAIWAN AL-SALEH (a) & HANI M. SAMAWI (a)
(a) Department of Statistics, Yarmouk University, Irbid, Jordan
Published online: 15 Feb 2007.

To cite this article: MOHAMMAD FRAIWAN AL-SALEH & HANI M. SAMAWI (2005) Estimation of the Correlation Coefficient Using Bivariate Ranked Set Sampling with Application to the Bivariate Normal Distribution, Communications in Statistics - Theory and Methods, 34:4, 875-889, DOI: 10.1081/STA-200054382

To link to this article: http://dx.doi.org/10.1081/STA-200054382

Taylor & Francis makes every effort to ensure the accuracy of all the information (the "Content") contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.

This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions


Communications in Statistics - Theory and Methods, 34: 875–889, 2005
Copyright © Taylor & Francis, Inc.
ISSN: 0361-0926 print/1532-415X online
DOI: 10.1081/STA-200054382

Multivariate Analysis

Estimation of the Correlation Coefficient Using Bivariate Ranked Set Sampling with Application to the Bivariate Normal Distribution

MOHAMMAD FRAIWAN AL-SALEH AND HANI M. SAMAWI

Department of Statistics, Yarmouk University, Irbid, Jordan

Bivariate ranked set sampling (BVRSS) was introduced by Al-Saleh and Zheng (2002) as a bivariate version of the ordinary ranked set sampling (RSS). The procedure can be used when we deal with two characteristics simultaneously. In this article, the BVRSS procedure is used to estimate the correlation coefficient between two variables. The proposed estimators are compared to other existing estimators based on a bivariate simple random sample (BVSRS). The case of the bivariate normal distribution is considered in detail.

Keywords: Bivariate normal distribution; Bivariate ranked set sampling; Modified maximum likelihood estimator; Ranked set sampling.

Mathematics Subject Classification 62G05.

1. Introduction

Ranked set sampling (RSS) was first suggested by McIntyre (1952) as a method for estimating pasture yields. The RSS procedure consists of drawing $m$ simple random samples (SRS) of size $m$ each from a population, and ranking each of them by judgment with respect to (w.r.t.) the characteristic of interest. Then, for $i = 1, 2, \ldots, m$, the $i$th smallest observation from the $i$th set is chosen for actual quantification. The RSS consists of these $m$ selected units. In practice, $m$ should be small (2, 3, or 4). If a sample of larger size is needed, then the entire cycle may be repeated $r$ times to produce an RSS of size $n = rm$. Let $\hat{\mu}_{RSS}$ be the sample average of an RSS of size $n = rm$, and $\hat{\mu}_{SRS}$ be the sample average of an SRS of the same size. It was shown by Takahasi and Wakimoto (1968) that the relative efficiency (RE) of $\hat{\mu}_{RSS}$ w.r.t. $\hat{\mu}_{SRS}$ satisfies the following relation:

Received April 14, 2004; Accepted August 27, 2004
Address correspondence to Mohammad Fraiwan Al-Saleh, Department of Statistics, Yarmouk University, Irbid, Jordan; E-mail: [email protected]


$$1 \le RE(\hat{\mu}_{RSS}, \hat{\mu}_{SRS}) = \frac{\mathrm{Var}(\hat{\mu}_{SRS})}{\mathrm{Var}(\hat{\mu}_{RSS})} \le \frac{m+1}{2},$$

where the upper bound is achieved if and only if the distribution is uniform. For more about RSS, see Kaur et al. (1995) and Patil et al. (1999). For recent works, see Samawi and Al-Saleh (2004), Al-Saleh and Zheng (2003), and Zheng and Al-Saleh (2002).
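The upper bound can be checked exactly in the uniform case. The sketch below is illustrative only: the function name `re_uniform` is hypothetical, and it assumes one cycle ($r = 1$), perfect ranking, and the standard variance formula $i(m+1-i)/((m+1)^2(m+2))$ for the $i$th order statistic of $m$ independent U(0,1) draws.

```python
from fractions import Fraction


def re_uniform(m):
    """Exact RE of the RSS mean w.r.t. the SRS mean for U(0,1), one cycle of set size m."""
    # Var of the i-th order statistic of m iid U(0,1) draws.
    order_stat_vars = [
        Fraction(i * (m + 1 - i), (m + 1) ** 2 * (m + 2)) for i in range(1, m + 1)
    ]
    # The RSS mean averages m independent order statistics, one per rank.
    var_rss = sum(order_stat_vars) / m ** 2
    # The SRS mean averages m iid U(0,1) draws, each with variance 1/12.
    var_srs = Fraction(1, 12) / m
    return var_srs / var_rss


for m in (2, 3, 4):
    print(m, re_uniform(m))  # the ratio equals (m + 1) / 2
```

Exact rational arithmetic is used so that the equality with $(m+1)/2$ is not blurred by rounding.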

Estimating multiple characteristics using RSS has been considered by few authors. McIntyre (1952) suggested applying the RSS procedure to a single selected characteristic and taking one's chance regarding the performance of the method for the other characteristics. Takahasi (1970) studied the behavior of RSS under random allocations, where in each set a unit was selected at random for quantification and its rank was also recorded. It was required that each rank was quantified at least once. Patil et al. (1994) investigated two different approaches to deal with multiple characteristics: the first is as in McIntyre (1952), and the second is based on the concept of size-biased permutation.

A new RSS plan for multiple characteristics was introduced by Al-Saleh and Zheng (2002). For simplicity, the method was introduced for two characteristics and referred to as bivariate ranked set sampling (BVRSS). The procedure will be outlined in the next section.

In this article, the BVRSS procedure is used for estimating the correlation coefficient between two variables. This estimation problem using RSS was first considered by Stokes (1977, 1980). Stokes (1980) has shown that the maximum likelihood estimator (MLE) of the correlation coefficient $\rho$ of the bivariate normal distribution using RSS is asymptotically as efficient as that obtained from SRS. The bivariate data here consist of an RSS of the first variable ($Y$) together with their concomitants $X$. Stokes (1980) proposed an RSS modification that involves only the extreme order statistics of one component and the corresponding concomitant order statistics for the other component. For this modification, the MLE of $\rho$ turned out to be more efficient than that obtained from an SRS.

The rest of the article is organized as follows. In Sec. 2, we introduce some terminology and important basic results about BVRSS. In Sec. 3, the nonparametric estimation of $\rho$ is considered as well as the modified MLE, when all parameters of the distribution except $\rho$ are known. In Sec. 4, the estimation of $\rho$ when all parameters are unknown is addressed. The case of the bivariate normal distribution is taken as an example and treated in detail.

2. Terminology and Some Basic Results

Suppose $(X, Y)$ is a bivariate random vector with the joint probability density function (pdf) $f_{X,Y}(x, y)$. To obtain a BVRSS sample, we follow the five steps described below (Al-Saleh and Zheng, 2002):

1. For a given set size $m$, a random sample of size $m^4$ is identified from the population and randomly allocated into $m^2$ pools of size $m^2$ each, where each pool is a square matrix with $m$ rows and $m$ columns.

2. In the first pool, identify by judgment the minimum value w.r.t. the first characteristic, for each of the $m$ rows.

3. For the $m$ minima obtained in Step 2, choose the one that corresponds to the (judgment) minimum value with respect to the second characteristic, for actual quantification. This pair, which corresponds to the label (1,1), is the first element of the BVRSS sample.

4. Repeat Steps 2 and 3 for the second pool, but in Step 3, the one that corresponds to the second minimum value w.r.t. the second characteristic is chosen for actual quantification. This pair corresponds to the label (1,2).

5. The process continues until the pair with the label $(m, m)$ is obtained from the $m^2$th (last) pool.

This process produces a BVRSS sample of size $m^2$. If a sample of larger size is required, then the whole process can be repeated $r$ times until the required size $n = rm^2$ is achieved.
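The five steps above can be sketched in code for simulation purposes. This is a minimal illustration, not the authors' implementation: the function name `bvrss_sample` is hypothetical, the population is assumed to be standard bivariate normal with correlation `rho`, and judgment ranking is assumed perfect (ranks are taken from the actual values).

```python
import numpy as np


def bvrss_sample(m, rho, rng):
    """Draw one cycle (m*m pairs) of a BVRSS from N2(0, 0, 1, 1, rho), perfect ranking."""
    cov = [[1.0, rho], [rho, 1.0]]
    pairs = []
    # One fresh pool of m x m units for every label (rank on X, rank on Y).
    for x_rank in range(m):
        for y_rank in range(m):
            pool = rng.multivariate_normal([0.0, 0.0], cov, size=(m, m))
            # Step 2 (generalized): in each row keep the unit whose X has rank x_rank.
            cols = np.argsort(pool[:, :, 0], axis=1)[:, x_rank]
            row_picks = pool[np.arange(m), cols]  # shape (m, 2)
            # Step 3 (generalized): among these m units keep the one whose Y has rank y_rank.
            chosen = row_picks[np.argsort(row_picks[:, 1])[y_rank]]
            pairs.append(chosen)
    return np.array(pairs)  # column 0 holds X, column 1 holds Y


rng = np.random.default_rng(0)
sample = bvrss_sample(3, 0.5, rng)
print(sample.shape)  # (9, 2): m^2 quantified pairs out of m^4 identified units
```

Calling the function $r$ times and stacking the results gives a sample of size $n = rm^2$.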

Note that although $m^4$ units are identified for the BVRSS sample (for $r = 1$), only $m^2$ are chosen for actual quantification. However, all $m^4$ units contribute information to the $m^2$ quantified units. It was pointed out by Al-Saleh and Zheng (2002) that a general area of application of BVRSS is when a variable of interest is linked jointly to two concomitant variables that can be easily ranked without actual quantification.

It should be mentioned here, as noted by the referee, that this procedure may require extra cost for screening and ranking compared to univariate RSS and SRS. It is emphasized, however, that the actual quantification is only done on the last elements; all rankings in between are done visually or by using rough but cheap quantitative or qualitative information. The extra cost should not be so large that it dramatically reduces the advantage of the method over RSS and SRS. A high cost for obtaining a BVRSS sample may lead to a cost-inefficient scheme. The decision to use this scheme instead of RSS or SRS will rest on whether the efficiency gained is enough to compensate for the extra work needed (see Al-Saleh, 2004 and Mode et al., 1999).

Assume we have a random sample of $m^2$ square pools of size $m^2$ each. The elements of each pool are assumed to be randomly divided into $m$ sets of size $m$. Denote the values of the two characteristics of the elements in the $k$th pool by $\{(X^k_{ij}, Y^k_{ij}),\ i = 1, \ldots, m,\ j = 1, \ldots, m\}$, $k = 1, \ldots, m^2$, where $X^k_{ij}$ is the $j$th element of the $i$th row in the $k$th pool for the first characteristic and $Y^k_{ij}$ is defined similarly for the second characteristic. Let $X^k_{i(j)}$ be the $j$th minimum of the elements in the $i$th row in the $k$th pool, where $i = 1, \ldots, m$, $j = 1, \ldots, m$, and $k = (j-1)m+1, \ldots, jm$, and let $Y^k_{i[j]}$ be the corresponding value of $Y$. Finally, let $Y^k_{(i)[j]}$ be the $i$th minimum of the elements $Y^k_{i[j]}$, $i = 1, \ldots, m$, and $X^k_{[i](j)}$ be the corresponding value of $X$. Then the BVRSS sample consists of the $m^2$ pairs $(X^k_{[i](j)}, Y^k_{(i)[j]})$, where $i = 1, \ldots, m$, $j = 1, \ldots, m$, and $k = (j-1)m+i$, which are independent but not identically distributed. Note that round brackets on subscripts are used to indicate that the ordering is perfect, while square brackets are used to indicate that the ordering is w.r.t. the perceived ranks induced by the other variable (concomitant variable). Obviously, the index $k$ can be dropped.

Let the joint pdfs of the random vectors $(X_{i(j)}, Y_{i[j]})$ and $(X_{[i](j)}, Y_{(i)[j]})$ be denoted by $f_{X_{i(j)},Y_{i[j]}}(x, y)$ and $f_{X_{[i](j)},Y_{(i)[j]}}(x, y)$, respectively. The marginal densities and conditional densities can be denoted similarly. Then from Al-Saleh and Zheng (2002) we have the following identities:

$$f_{X_{i(j)},Y_{i[j]}}(x, y) = f_{X_{(j)}}(x)\, f_{Y|X}(y|x), \tag{1}$$


$$\sum_{j=1}^{m}\sum_{i=1}^{m} f_{X_{i(j)},Y_{i[j]}}(x, y) = m^2 f_{X,Y}(x, y), \tag{2}$$

$$f_{X_{[i](j)},Y_{(i)[j]}}(x, y) = f_{Y_{(i)[j]}}(y)\, \frac{f_{X_{(j)}}(x)\, f_{Y|X}(y|x)}{f_{Y_{[j]}}(y)}, \tag{3}$$

$$\sum_{j=1}^{m}\sum_{i=1}^{m} f_{X_{[i](j)},Y_{(i)[j]}}(x, y) = m^2 f_{X,Y}(x, y), \tag{4}$$

$$f_X(x) = \frac{1}{m^2}\sum_{j=1}^{m}\sum_{i=1}^{m} f_{X_{[i](j)}}(x), \tag{5}$$

$$f_Y(y) = \frac{1}{m^2}\sum_{j=1}^{m}\sum_{i=1}^{m} f_{Y_{(i)[j]}}(y), \tag{6}$$

where $f_{X_{(j)}}$ is the density of the $j$th order statistic of a random sample of size $m$ from the marginal density $f_X$, $f_{Y|X}(y|x)$ is the conditional density of $Y$ given $X$, and $f_{Y_{[j]}}$ is the density of the concomitant $Y_{[j]}$ of $X_{(j)}$. Clearly, if $X$ and $Y$ are independent, then $(X_{[i](j)}, Y_{(i)[j]})$ has the same distribution as $(X_{i(j)}, Y_{(i)j})$, where $\{X_{i(j)},\ j = 1, \ldots, m,\ i = 1, \ldots, m\}$ and $\{Y_{(i)j},\ j = 1, \ldots, m,\ i = 1, \ldots, m\}$ are $m$ cycles of univariate RSS of set size $m$ from $f_X$ and $f_Y$, respectively. Hence, in this case, the BVRSS is equivalent to two univariate RSS, one for each characteristic, each of size $n = m \times m = m^2$. On the other hand, if $X$ and $Y$ are perfectly correlated with correlation coefficient $\rho = 1$, then $(X_{[i](j)}, Y_{(i)[j]})$ has the same distribution as $(X_{(i)(j)}, Y_{(i)(j)})$, where $\{X_{(i)(j)},\ j = 1, \ldots, m\}$ and $\{Y_{(i)(j)},\ j = 1, \ldots, m\}$ are equivalent to one cycle of univariate RSS of set size $m$ from $f_{X_{(i)}}$ and $f_{Y_{(i)}}$, respectively, for each $i = 1, \ldots, m$. The case of $\rho = -1$ can be dealt with similarly. In fact, if $\rho = -1$, then $(X_{[i](j)}, Y_{(i)[j]})$ has the same distribution as $(X_{(m-i+1)(j)}, Y_{(i)(m-j+1)})$. Thus, the two cases of negative and positive $\rho$ are equivalent.

3. BVRSS Estimator of $\rho$: All Other Parameters Are Known

3.1. Nonparametric Estimation

Assume that $(X_{[i](j)}, Y_{(i)[j]})$, $i = 1, \ldots, m$, $j = 1, \ldots, m$, is a BVRSS sample of size $m^2$ from the joint pdf $f_{X,Y}(x, y)$. Let $\mu_X$, $\mu_Y$, $\sigma^2_X$, $\sigma^2_Y$, and $\rho$ be, respectively, the mean of $X$, the mean of $Y$, the variance of $X$, the variance of $Y$, and the correlation coefficient between $X$ and $Y$. We assume that the BVRSS sample can be obtained without error in the judgment ranking. The interest is in estimating the correlation coefficient $\rho$. First, we consider the case when all the parameters except $\rho$ are known. Without loss of generality, let $\mu_X = \mu_Y = 0$ and $\sigma^2_X = \sigma^2_Y = 1$. This implies that $\rho = \mathrm{Cov}(X, Y) = E(XY)$. This assumption is quite strong and rarely valid in practice. However, it is usually hoped that if the estimator in this narrow case has good properties, then an analog of the estimator in the general case will roughly retain these properties. If the estimator is not good in this special case, then there is no need to derive its analog in the general case.

Under no distributional assumption, the natural estimator to consider for $\rho$ is the nonparametric estimator given by

$$\hat{\rho}_{BVRSS} = \frac{\sum_{j=1}^{m}\sum_{i=1}^{m} X_{[i](j)} Y_{(i)[j]}}{m^2}. \tag{7}$$


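From identity (4) above we have

$$
\begin{aligned}
E(XY) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f(x, y)\,dx\,dy \\
&= \frac{1}{m^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \sum_{j=1}^{m}\sum_{i=1}^{m} xy\, f_{X_{[i](j)},Y_{(i)[j]}}(x, y)\,dx\,dy \\
&= \frac{1}{m^2}\sum_{j=1}^{m}\sum_{i=1}^{m} E(X_{[i](j)} Y_{(i)[j]}).
\end{aligned}
\tag{8}
$$

Hence, $E(\hat{\rho}_{BVRSS}) = E(XY) = \rho$. Therefore, $\hat{\rho}_{BVRSS}$ is an unbiased estimator of $\rho$. Moreover,

$$
\begin{aligned}
\mathrm{Var}(XY) = E(XY - \rho)^2 &= \frac{1}{m^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \sum_{j=1}^{m}\sum_{i=1}^{m} (xy - \rho)^2 f_{X_{[i](j)},Y_{(i)[j]}}(x, y)\,dx\,dy \\
&= \frac{1}{m^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \sum_{j=1}^{m}\sum_{i=1}^{m} (xy - \mu_{ij})^2 f_{X_{[i](j)},Y_{(i)[j]}}(x, y)\,dx\,dy \\
&\quad + \frac{1}{m^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \sum_{j=1}^{m}\sum_{i=1}^{m} (\mu_{ij} - \rho)^2 f_{X_{[i](j)},Y_{(i)[j]}}(x, y)\,dx\,dy \\
&= \frac{\sum_{j=1}^{m}\sum_{i=1}^{m} \mathrm{Var}(X_{[i](j)} Y_{(i)[j]})}{m^2} + \frac{1}{m^2}\sum_{j=1}^{m}\sum_{i=1}^{m} (\mu_{ij} - \rho)^2 \\
&= m^2\,\mathrm{Var}(\hat{\rho}_{BVRSS}) + \frac{1}{m^2}\sum_{j=1}^{m}\sum_{i=1}^{m} (\mu_{ij} - \rho)^2,
\end{aligned}
\tag{9}
$$

where $\mu_{ij} = E(X_{[i](j)} Y_{(i)[j]})$. Hence,

$$\mathrm{Var}(\hat{\rho}_{BVRSS}) = \mathrm{Var}(\hat{\rho}_{BVSRS}) - \frac{1}{m^4}\sum_{j=1}^{m}\sum_{i=1}^{m} (\mu_{ij} - \rho)^2, \tag{10}$$

where $\hat{\rho}_{BVSRS}$ is the corresponding estimator based on a BVSRS of size $m^2$ from $f(x, y)$. The efficiency of $\hat{\rho}_{BVRSS}$ w.r.t. $\hat{\rho}_{BVSRS}$ is defined by

$$\mathrm{eff}(\hat{\rho}_{BVRSS}; \hat{\rho}_{BVSRS}) = \frac{\mathrm{Var}(\hat{\rho}_{BVSRS})}{\mathrm{Var}(\hat{\rho}_{BVRSS})}.$$

Based on the above observations, we have the following lemma.

Lemma 3.1.

(i) $\hat{\rho}_{BVRSS}$ is an unbiased estimator of $\rho$.

(ii) $\mathrm{eff}(\hat{\rho}_{BVRSS}; \hat{\rho}_{BVSRS}) > 1$. More specifically, we have

$$\mathrm{eff}(\hat{\rho}_{BVRSS}; \hat{\rho}_{BVSRS}) = \frac{1}{1 - \frac{1}{m^2\,\mathrm{Var}(XY)}\sum_{j=1}^{m}\sum_{i=1}^{m} (\mu_{ij} - \rho)^2}. \tag{11}$$

Assume that we have $r$ copies of BVRSS of size $m^2$, i.e., $n = rm^2$; then we have $r$ copies of $\hat{\rho}_{BVRSS}$, say $\hat{\rho}_{BVRSS,h}$, $h = 1, 2, \ldots, r$. In this case the BVRSS estimator is defined as

$$\hat{\rho}_{BVRSS,r} = \frac{1}{r}\sum_{h=1}^{r} \hat{\rho}_{BVRSS,h}. \tag{12}$$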


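Note that $\hat{\rho}_{BVRSS,h}$, $h = 1, 2, \ldots, r$, are independent and identically distributed (iid), with common mean $\rho$ and variance $\mathrm{Var}(\hat{\rho}_{BVRSS})$ given by (10). Hence we may state the following asymptotic properties of $\hat{\rho}_{BVRSS,r}$.

Lemma 3.2. For a fixed set size $m$:

(i) $\hat{\rho}_{BVRSS,r}$ is a strongly consistent estimator of $\rho$, i.e.,

$$\hat{\rho}_{BVRSS,r} \xrightarrow{\ \text{w.p.}\,1\ } \rho \quad \text{as } r \to \infty.$$

(ii) $\hat{\rho}_{BVRSS,r}$ is asymptotically normal, i.e.,

$$\sqrt{r}\,(\hat{\rho}_{BVRSS,r} - \rho) \xrightarrow{\ \text{in dist}\ } N(0, \sigma^2),$$

where $\sigma^2$ is given by (10).

In the worst case of BVRSS, when $\rho = 0$, $(X_{[i](j)}, Y_{(i)[j]})$ has the same distribution as $(X_{i(j)}, Y_{(i)j})$. But $X_{1(j)}, \ldots, X_{m(j)}$ are iid with the same density as that of $X_{(j)}$ (the $j$th order statistic of a random sample from $f_X$). Similarly, $Y_{(i)1}, \ldots, Y_{(i)m}$ are iid with the same density as $Y_{(i)}$ (the $i$th order statistic of a random sample from $f_Y$). Hence, $\mu_{ij} = E(X_{[i](j)} Y_{(i)[j]}) = E(X_{(j)} Y_{(i)}) = E(X_{(j)})\,E(Y_{(i)}) = \mu_j \mu_i$, where $\mu_j$ denotes the mean of the $j$th order statistic of a sample of size $m$ from the standardized marginal distribution. Moreover, $\mathrm{Var}(XY) = 1$, and therefore (11) reduces to

$$\mathrm{eff}_0(\hat{\rho}_{BVRSS}; \hat{\rho}_{BVSRS}) = \frac{1}{1 - \left(\frac{1}{m}\sum_{i=1}^{m} \mu_i^2\right)^2}. \tag{13}$$

From Takahasi and Wakimoto (1968), $0 < \frac{1}{m}\sum_{i=1}^{m} \mu_i^2 \le \frac{m-1}{m+1}$, with equality if and only if the marginal densities are uniform. Therefore, we have the following inequality for the worst value of the efficiency:

$$0 < \mathrm{eff}_0(\hat{\rho}_{BVRSS}; \hat{\rho}_{BVSRS}) \le \frac{1}{m}\left(\frac{m+1}{2}\right)^2. \tag{14}$$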
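As a numerical illustration of (13) and (14), the sketch below evaluates $\mathrm{eff}_0$ when both marginals are standard normal. The function names are hypothetical, and the order-statistic means are obtained by a plain trapezoidal rule on $[-10, 10]$, which is an implementation choice of this sketch rather than part of the article.

```python
import math


def normal_order_stat_means(m, lo=-10.0, hi=10.0, steps=20001):
    """E(X_(i)), i = 1..m, for m iid N(0,1) draws, by trapezoidal integration."""
    h = (hi - lo) / (steps - 1)
    means = [0.0] * m
    for k in range(steps):
        x = lo + k * h
        w = h if 0 < k < steps - 1 else h / 2  # trapezoid end-point weights
        phi = math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
        cdf = 0.5 * (1 + math.erf(x / math.sqrt(2)))
        for i in range(1, m + 1):
            # Density of the i-th order statistic in a sample of size m.
            dens = m * math.comb(m - 1, i - 1) * cdf ** (i - 1) * (1 - cdf) ** (m - i) * phi
            means[i - 1] += w * x * dens
    return means


def eff0_normal(m):
    """Efficiency (13) at rho = 0 with standard normal marginals."""
    s = sum(mu * mu for mu in normal_order_stat_means(m)) / m
    return 1.0 / (1.0 - s * s)


for m in (2, 3, 4, 5):
    print(m, round(eff0_normal(m), 4), "upper bound:", (m + 1) ** 2 / (4 * m))
```

For $m = 2$ the order-statistic means are $\pm 1/\sqrt{\pi}$, so the value agrees with $1/(1 - 1/\pi^2)$.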

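Similarly, in the best case of BVRSS, when $\rho = 1$, $(X_{[i](j)}, Y_{(i)[j]})$ has the same distribution as $(X_{(i)(j)}, Y_{(i)(j)})$. Hence, $\mu_{ij} = E(X_{[i](j)} Y_{(i)[j]}) = E(X^2_{(i)(j)})$ and $\mathrm{Var}(XY) = \mathrm{Var}(X^2)$. Therefore (11) reduces to

$$\mathrm{eff}_1(\hat{\rho}_{BVRSS}; \hat{\rho}_{BVSRS}) = \frac{1}{1 - \frac{1}{m^2\,\mathrm{Var}(X^2)}\sum_{j=1}^{m}\sum_{i=1}^{m} \left(E(X^2_{(i)(j)}) - 1\right)^2}. \tag{15}$$

Note. It can be easily seen that $\hat{\rho}_{BVRSS}$ and $\hat{\rho}_{BVSRS}$ can assume values outside the valid range of $\rho$. To overcome this difficulty, the two estimators can be modified as follows:

$$
\tilde{\rho}_{BVRSS} =
\begin{cases}
-1 & \text{if } \hat{\rho}_{BVRSS} \le -1, \\
\hat{\rho}_{BVRSS} & \text{if } -1 < \hat{\rho}_{BVRSS} < 1, \\
1 & \text{if } \hat{\rho}_{BVRSS} \ge 1.
\end{cases}
$$

$\tilde{\rho}_{BVSRS}$ can be written similarly.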
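A small sketch of the nonparametric estimator (7) together with the truncation in the Note is given here. The function name `rho_hat_nonparametric` is hypothetical, and the inputs are assumed to be already standardized with the known means and variances.

```python
import numpy as np


def rho_hat_nonparametric(x, y):
    """Return (raw, truncated) estimates of rho from standardized paired data."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    raw = float(np.mean(x * y))                 # average of the products, as in (7)
    truncated = float(np.clip(raw, -1.0, 1.0))  # force the estimate into [-1, 1]
    return raw, truncated


# Hypothetical measurements from one BVRSS cycle with m = 2 (four pairs).
x = [1.2, -0.4, 0.3, 1.9]
y = [0.8, -1.1, 0.9, 2.2]
print(rho_hat_nonparametric(x, y))
```

The same function applies to a BVSRS, since the two estimators differ only in how the pairs were collected.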


3.2. Modified Maximum Likelihood Estimator of $\rho$

Assume again that we have $r$ copies of the BVRSS denoted by $(X_{[i](j),h}, Y_{(i)[j],h})$, $h = 1, 2, \ldots, r$. Then the log-likelihood function, up to an additive constant, is given by

$$\sum_{h=1}^{r}\sum_{j=1}^{m}\sum_{i=1}^{m} \log f(x_{[i](j),h}, y_{(i)[j],h}) + \sum_{h=1}^{r}\sum_{j=1}^{m}\sum_{i=1}^{m} \log\left(\frac{f_{Y_{(i)[j]}}(y_{(i)[j],h})}{f_{Y_{[j]}}(y_{(i)[j],h})}\right). \tag{$*$}$$

Then, under suitable regularity conditions, the MLE of $\rho$ satisfies the maximum likelihood equation

$$\sum_{h=1}^{r}\sum_{j=1}^{m}\sum_{i=1}^{m} \frac{f'(x_{[i](j),h}, y_{(i)[j],h})}{f(x_{[i](j),h}, y_{(i)[j],h})} + \sum_{h=1}^{r}\sum_{j=1}^{m}\sum_{i=1}^{m} \frac{\left(\dfrac{f_{Y_{(i)[j]}}(y_{(i)[j],h})}{f_{Y_{[j]}}(y_{(i)[j],h})}\right)'}{\left(\dfrac{f_{Y_{(i)[j]}}(y_{(i)[j],h})}{f_{Y_{[j]}}(y_{(i)[j],h})}\right)} = 0,$$

where the derivative is with respect to the unknown parameter $\rho$. The above equation can be easily simplified to

$$\sum_{h=1}^{r}\sum_{j=1}^{m}\sum_{i=1}^{m} \frac{f'(x_{[i](j),h}, y_{(i)[j],h})}{f(x_{[i](j),h}, y_{(i)[j],h})} + \sum_{h=1}^{r}\sum_{j=1}^{m}\sum_{i=1}^{m} \left[\frac{f'_{Y_{(i)[j]}}(y_{(i)[j],h})}{f_{Y_{(i)[j]}}(y_{(i)[j],h})} - \frac{f'_{Y_{[j]}}(y_{(i)[j],h})}{f_{Y_{[j]}}(y_{(i)[j],h})}\right] = 0. \tag{16}$$

Clearly, (16) is more difficult to solve than the maximum likelihood equation obtained from BVSRS, i.e.,

$$\sum_{i=1}^{n} \frac{f'(x_i, y_i)}{f(x_i, y_i)} = 0. \tag{17}$$

A modified MLE can be obtained by solving a modification of (16). One modification is to replace the second term of (16) by its expectation. This modification was suggested by Mehrotra and Nanda (1974), who estimated parameters of normal and gamma distributions based on Type II censored data. Zheng and Al-Saleh (2002) considered this modification in the usual RSS, to obtain a modified MLE based on ranked set samples. Now, using the fact that under some regularity conditions

$$\frac{d}{d\rho}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X_{[i](j)},Y_{(i)[j]}}(x, y)\,dx\,dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{\partial}{\partial\rho} f_{X_{[i](j)},Y_{(i)[j]}}(x, y)\,dx\,dy = 0, \tag{18}$$

we can see easily that the maximum likelihood equation is an unbiased estimating equation, i.e., if we denote (16) above by $w(X, Y; \rho) = 0$, then $E(w(X, Y; \rho)) = 0$. Furthermore, using (18) again and (4), it can be easily shown that the expectation of the first term of (16) is zero. Hence, the expectation of the second term of (16) is also zero. Therefore, the modified likelihood equation can be simplified to

$$\sum_{h=1}^{r}\sum_{j=1}^{m}\sum_{i=1}^{m} \frac{f'(x_{[i](j),h}, y_{(i)[j],h})}{f(x_{[i](j),h}, y_{(i)[j],h})} = 0. \tag{19}$$

Also, (18) and (4) imply that this modified likelihood equation is still an unbiased estimating equation. The solution of (19), denoted by $\hat{\rho}^*_{BVRSS}$, which will be referred to as the MMLE, has the same expression as the MLE using BVSRS (the solution of (17)), with the BVSRS data replaced by the BVRSS data. Thus, as noted by Zheng and Al-Saleh (2002), it is very easy to obtain the MMLE using BVRSS when the MLE using BVSRS has a closed-form expression, or if the iterative procedure to solve for the MLE using BVSRS has already been programmed. Note also that, even though the derivations above are obtained for one parameter, very minor changes are needed to deal with general parameters (see Sec. 4).

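Using notation similar to that of Zheng and Al-Saleh (2002), let

$$T_r(\rho) = \frac{1}{r}\sum_{h=1}^{r}\sum_{j=1}^{m}\sum_{i=1}^{m} \frac{f'(X_{[i](j),h}, Y_{(i)[j],h})}{f(X_{[i](j),h}, Y_{(i)[j],h})} = \frac{1}{r}\sum_{h=1}^{r} g(Z_h, W_h; \rho),$$

where $g(Z_h, W_h; \rho) = \sum_{j=1}^{m}\sum_{i=1}^{m} \frac{f'(X_{[i](j),h}, Y_{(i)[j],h})}{f(X_{[i](j),h}, Y_{(i)[j],h})}$. Then we have the following lemma.

Lemma 3.3. Assume that $g$ is continuous in $\rho$, $0 < \int_{\mathbb{R}}\int_{\mathbb{R}} g^2(x, y) f(x, y)\,dx\,dy < \infty$, and $\frac{d}{d\rho}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{\partial}{\partial\rho} f(x, y)\,dx\,dy = 0$. Let $n = rm^2$; then as $n \to \infty$ with $m$ fixed, we have

$$\sqrt{n}\,T_r(\rho) \xrightarrow{\ D\ } N(0, \Delta(\rho)),$$

where $\Delta(\rho) = m^4 I(\rho) - m^2 \sum_{j=1}^{m}\sum_{i=1}^{m} \left(E\left(\frac{f'(X_{[i](j),1}, Y_{(i)[j],1})}{f(X_{[i](j),1}, Y_{(i)[j],1})}\right)\right)^2$ and $I(\rho)$ is the Fisher information number given by $I(\rho) = E\left[\frac{\partial}{\partial\rho}\log f(X, Y)\right]^2$.

Proof. Clearly, $E(T_r(\rho)) = 0$. Also we have

$$
\begin{aligned}
\mathrm{Var}(T_r(\rho)) &= \frac{1}{r} E\left(\left(\sum_{j=1}^{m}\sum_{i=1}^{m} \frac{f'(X_{[i](j),1}, Y_{(i)[j],1})}{f(X_{[i](j),1}, Y_{(i)[j],1})}\right)^2\right) \\
&= \frac{1}{r}\sum_{j=1}^{m}\sum_{i=1}^{m} E\left(\frac{f'(X_{[i](j),1}, Y_{(i)[j],1})}{f(X_{[i](j),1}, Y_{(i)[j],1})}\right)^2 \\
&\quad + \frac{1}{r}\mathop{\sum\sum}_{(i,j) \ne (i',j')} E\left(\frac{f'(X_{[i](j),1}, Y_{(i)[j],1})}{f(X_{[i](j),1}, Y_{(i)[j],1})}\right) E\left(\frac{f'(X_{[i'](j'),1}, Y_{(i')[j'],1})}{f(X_{[i'](j'),1}, Y_{(i')[j'],1})}\right) \\
&= \frac{m^2}{r} I(\rho) - \frac{1}{r}\sum_{j=1}^{m}\sum_{i=1}^{m} \left(E\left(\frac{f'(X_{[i](j),1}, Y_{(i)[j],1})}{f(X_{[i](j),1}, Y_{(i)[j],1})}\right)\right)^2.
\end{aligned}
$$

The normality follows from the fact that $g(Z_h, W_h; \rho)$, $h = 1, \ldots, r$, are iid random variables.

Let $\rho_0$ be the true value of $\rho$; then by Taylor's expansion of $T_r(\hat{\rho}^*_{BVRSS})$ around $\rho_0$, we have

$$T_r(\hat{\rho}^*_{BVRSS}) - T_r(\rho_0) = T'_r(\rho_0)(\hat{\rho}^*_{BVRSS} - \rho_0) + \frac{1}{2} T''_r(\tilde{\rho})(\hat{\rho}^*_{BVRSS} - \rho_0)^2,$$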


where $\tilde{\rho}$ is a number between $\hat{\rho}^*_{BVRSS}$ and $\rho_0$. However, $T_r(\hat{\rho}^*_{BVRSS}) = 0$; therefore, we have

$$\sqrt{n}\,(\hat{\rho}^*_{BVRSS} - \rho_0) = -\frac{\sqrt{n}\,T_r(\rho_0)}{T'_r(\rho_0) + \frac{1}{2} T''_r(\tilde{\rho})(\hat{\rho}^*_{BVRSS} - \rho_0)}.$$

Under some regularity conditions, from the strong consistency of $\hat{\rho}^*_{BVRSS}$, the fact that $T'_r(\rho_0) \to -m^2 I(\rho_0)$ in probability (see Zheng and Al-Saleh, 2002), and Lemma 3.3, we can conclude the following.

Lemma 3.4. With the above setup, and under the usual regularity conditions, if $\rho_0$ is the true value of $\rho$, then we have

$$\sqrt{n}\,(\hat{\rho}^*_{BVRSS} - \rho_0) \xrightarrow{\ D\ } N\left(0, \frac{\Delta(\rho_0)}{m^4 I^2(\rho_0)}\right). \tag{20}$$

Hence, the asymptotic efficiency of the MMLE based on a BVRSS of size $n$ w.r.t. the MLE based on a BVSRS of size $n$ is given by

$$\mathrm{eff}(\hat{\rho}^*_{BVRSS}; \hat{\rho}^*_{BVSRS}) = \frac{1}{1 - \frac{1}{I(\rho_0)\,m^2}\sum_{j=1}^{m}\sum_{i=1}^{m} \left(E\left(\frac{f'(X_{[i](j),1}, Y_{(i)[j],1})}{f(X_{[i](j),1}, Y_{(i)[j],1})}\right)\right)^2}. \tag{21}$$

Clearly,

$$\mathrm{eff}(\hat{\rho}^*_{BVRSS}; \hat{\rho}^*_{BVSRS}) \ge 1.$$

For more properties of this modification of the MLE, see Bhattacharyya (1985) and Zheng and Al-Saleh (2002).

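3.3. Example: Bivariate Normal Distribution

Assume that $(X_{[i](j)}, Y_{(i)[j]})$, $i = 1, \ldots, m$, $j = 1, \ldots, m$, is a BVRSS sample of size $m^2$ from the joint pdf

$$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_X}{\sigma_X}\right)^2 + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2 - 2\rho\left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right)\right]\right\}, \tag{22}$$

where $\mu_X$, $\mu_Y$, $\sigma^2_X$, $\sigma^2_Y$, and $\rho$ are, respectively, the mean of $X$, the mean of $Y$, the variance of $X$, the variance of $Y$, and the correlation between $X$ and $Y$. Denote this bivariate normal density by $N_2(\mu_X, \mu_Y, \sigma^2_X, \sigma^2_Y, \rho)$. If $\mu_X = \mu_Y = 0$ and $\sigma^2_X = \sigma^2_Y = 1$, then the density reduces to $N_2(0, 0, 1, 1, \rho)$. Using (3) above, the joint density of $(X_{[i](j)}, Y_{(i)[j]})$ can be written as

$$
\begin{aligned}
f_{X_{[i](j)},Y_{(i)[j]}}(x, y) &= f_{Y_{(i)[j]}}(y)\,\frac{f_{X_{(j)}}(x)\, f_{Y|X}(y|x)}{f_{Y_{[j]}}(y)} \\
&= m\binom{m-1}{i-1}\left[F_{Y_{[j]}}(y)\right]^{i-1}\left[1 - F_{Y_{[j]}}(y)\right]^{m-i} \\
&\quad \times m\binom{m-1}{j-1}\left[F_X(x)\right]^{j-1}\left[1 - F_X(x)\right]^{m-j} f_X(x)\, f_{Y|X}(y|x),
\end{aligned}
\tag{23}
$$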


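where $f_{Y_{(i)[j]}}$ is the density of the $i$th order statistic of a SRS of size $m$ from the density of the concomitant random variable $Y_{[j]}$, $F_{Y_{[j]}}$ is the distribution function of $Y_{[j]}$, and the density of $Y_{[j]}$ is given by

$$f_{Y_{[j]}}(y) = \int_{-\infty}^{\infty} f_{X_{(j)}}(x)\, f_{Y|X}(y|x)\,dx.$$

Here $f_X$ is the density of the standard normal random variable, $F_X$ is its distribution function, and $f_{Y|X}$ is the conditional density, which is normal with mean $\rho x$ and variance $1 - \rho^2$.

For the smallest, and most practical, set size (i.e., $m = 2$), we have

$$
\begin{aligned}
f_{Y_{[2]}}(y) &= 2\phi(y)\,\frac{1}{\sqrt{1-\rho^2}}\int_{-\infty}^{\infty} \Phi(x)\,\phi\left(\frac{x - \rho y}{\sqrt{1-\rho^2}}\right)dx = 2\phi(y)\,\Phi\left(\frac{\rho y}{\sqrt{2-\rho^2}}\right), \\
f_{Y_{[1]}}(y) &= 2\phi(y)\,\Phi\left(\frac{-\rho y}{\sqrt{2-\rho^2}}\right) = f_{Y_{[2]}}(-y),
\end{aligned}
\tag{24}
$$

where $\phi$ and $\Phi$ are the density and cumulative distribution function of the standard normal distribution, respectively. Hence,

$$
\begin{aligned}
f_{X_{[1](1)},Y_{(1)[1]}}(x, y) &= 4\Phi(-x) f(x, y)\left(1 - \int_{-\infty}^{y} 2\phi(z)\,\Phi\left(\frac{-\rho z}{\sqrt{2-\rho^2}}\right)dz\right) \\
&= 4\Phi(-x) f(x, y) - f_{X_{[2](1)},Y_{(2)[1]}}(x, y), \\
f_{X_{[1](2)},Y_{(1)[2]}}(x, y) &= 4\Phi(x) f(x, y)\left(1 - \int_{-\infty}^{y} 2\phi(z)\,\Phi\left(\frac{\rho z}{\sqrt{2-\rho^2}}\right)dz\right) \\
&= 4\Phi(x) f(x, y) - f_{X_{[2](2)},Y_{(2)[2]}}(x, y).
\end{aligned}
\tag{25}
$$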
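The closed form in (24) can be checked numerically against its defining integral. The sketch below is only a sanity check under the standard bivariate normal assumptions of this section: the function names are hypothetical, and the integral is approximated by a trapezoidal rule on $[-10, 10]$.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def f_y2_closed(y, rho):
    """Closed form in (24) for the density of the concomitant of the larger X (m = 2)."""
    return 2 * phi(y) * Phi(rho * y / math.sqrt(2 - rho * rho))


def f_y2_numeric(y, rho, lo=-10.0, hi=10.0, steps=4001):
    """Integral of f_{X(2)}(x) f_{Y|X}(y|x) over x, with f_{X(2)}(x) = 2 Phi(x) phi(x)."""
    s = math.sqrt(1 - rho * rho)  # conditional standard deviation of Y given X
    h = (hi - lo) / (steps - 1)
    total = 0.0
    for k in range(steps):
        x = lo + k * h
        w = h if 0 < k < steps - 1 else h / 2  # trapezoid end-point weights
        total += w * 2 * Phi(x) * phi(x) * phi((y - rho * x) / s) / s
    return total


for rho in (0.2, 0.5, 0.9):
    print(rho, f_y2_closed(0.7, rho), f_y2_numeric(0.7, rho))
```

The two columns agree to many decimal places for each $\rho$, which supports the stated closed form.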

Using these joint densities, it can be shown easily that

$$\mu_{1,1} = \mu_{2,2}, \qquad \mu_{1,2} = \mu_{2,1} = 2\rho - \mu_{2,2}. \tag{26}$$

Hence, for the nonparametric estimator, we have

$$\mathrm{eff}(\hat{\rho}_{BVRSS}; \hat{\rho}_{BVSRS}) = \frac{1}{1 - \frac{1}{1+\rho^2}(\mu_{2,2} - \rho)^2}. \tag{27}$$

Note that $\mathrm{Var}(XY) = 1 + \rho^2$ is an increasing function of $|\rho|$ with minimum value 1 at $\rho = 0$. This means that as $|\rho|$ gets large, the estimation of $\rho$ gets more difficult. It can be shown that $\mu_{2,2} = E(X_{[2](2)} Y_{(2)[2]})$ is given by

$$\mu_{2,2} = \frac{(1-\rho^2)(2-\rho^2)}{\pi\sqrt{4-\rho^2}} + \int_{-\infty}^{\infty} 2\rho\, y^2 F_{Y_{[2]}}(y)\, f_{Y_{[2]}}(y)\,dy. \tag{28}$$

For $\rho = 0$, $\mu_{2,2} = \frac{1}{\pi}$, and hence $\mathrm{eff}(\hat{\rho}_{BVRSS}; \hat{\rho}_{BVSRS}) = \frac{1}{1 - \frac{1}{\pi^2}} = 1.1127$, while as $\rho \to 1$,

$$\mu_{2,2} \to \int_{-\infty}^{\infty} 4y^2\,\Phi^3(y)\,\phi(y)\,dy = 1 + 12\int_{-\infty}^{\infty} y\,\Phi^2(y)\,\phi^2(y)\,dy = 1 + \frac{\sqrt{3}}{\pi},$$

and hence $\mathrm{eff}(\hat{\rho}_{BVRSS}; \hat{\rho}_{BVSRS}) \to \frac{1}{1 - \frac{3}{2\pi^2}} = 1.1792$. For other positive values of $\rho$, the second term is calculated numerically. It can also be checked easily that the function $\mu_{2,2} - \rho$ is symmetric in $\rho$, and hence the efficiency depends on $\rho$ only through $|\rho|$. Table 1 gives the values of $\mathrm{eff}(\hat{\rho}_{BVRSS}; \hat{\rho}_{BVSRS})$ for some selected values of $\rho$ and $m$. It can be seen from the table that $\mathrm{eff}(\hat{\rho}_{BVRSS}; \hat{\rho}_{BVSRS})$ is increasing in $\rho$. For $m = 2$, the lower and upper bounds are given by the following inequality:

$$\frac{\pi^2}{\pi^2 - 1} \le \mathrm{eff}(\hat{\rho}_{BVRSS}; \hat{\rho}_{BVSRS}) \le \frac{2\pi^2}{2\pi^2 - 3}. \tag{29}$$
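The two bounds in (29) follow from plugging the limiting values of $\mu_{2,2}$ into (27). The short sketch below reproduces the arithmetic; the helper name `eff_m2` is hypothetical.

```python
import math


def eff_m2(mu22, rho):
    """Efficiency (27) of the nonparametric BVRSS estimator for m = 2."""
    return 1.0 / (1.0 - (mu22 - rho) ** 2 / (1.0 + rho ** 2))


lower = eff_m2(1.0 / math.pi, 0.0)                    # rho = 0: mu22 = 1/pi
upper = eff_m2(1.0 + math.sqrt(3.0) / math.pi, 1.0)   # rho -> 1: mu22 -> 1 + sqrt(3)/pi
print(round(lower, 4), round(upper, 4))  # 1.1127 1.1792
```

Both numbers agree with the closed forms $\pi^2/(\pi^2 - 1)$ and $2\pi^2/(2\pi^2 - 3)$.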

As expected, the efficiency is increasing in $m$. However, in practice $m$ should be kept small, less than four say; otherwise errors in ranking can reduce the gain in efficiency.

Now, the MMLE of $\rho$ based on a BVRSS of size $n = rm^2$ is the solution of the equation

$$\sum_{h=1}^{r}\sum_{j=1}^{m}\sum_{i=1}^{m} \frac{\rho(\rho^2 - 1) - (1+\rho^2)\,x_{[i](j),h}\,y_{(i)[j],h} + \rho\,(x^2_{[i](j),h} + y^2_{(i)[j],h})}{(1-\rho^2)^2} = 0. \tag{30}$$

The solution of the above cubic equation can be found; see Johnson and Kotz (1972). As $m \to \infty$ and $r \to \infty$, one can show that the probability that the above equation has more than one root tends to zero; see Dickson (1939) and Stokes (1980).
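Multiplying (30) by $(1-\rho^2)^2$ and summing over the $n$ pairs gives the cubic $n\rho^3 - S_{xy}\rho^2 + (S_{xx} + S_{yy} - n)\rho - S_{xy} = 0$, where $S_{xy} = \sum xy$, $S_{xx} = \sum x^2$, and $S_{yy} = \sum y^2$. The sketch below solves it numerically. It is one possible implementation, not the method of Johnson and Kotz (1972): the function name `mmle_rho` is hypothetical, the data are assumed standardized with known means and variances, and when several real roots lie in $(-1, 1)$ the one with the largest bivariate normal log-likelihood is kept.

```python
import numpy as np


def mmle_rho(x, y):
    """Solve the cubic form of (30) for rho from standardized paired data."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sxy = np.sum(x * y)
    sq = np.sum(x ** 2 + y ** 2)
    # n*rho^3 - sxy*rho^2 + (sq - n)*rho - sxy = 0
    roots = np.roots([n, -sxy, sq - n, -sxy])
    real = roots[np.abs(roots.imag) < 1e-8].real
    candidates = real[np.abs(real) < 1.0]
    if candidates.size == 0:
        raise ValueError("no root strictly inside (-1, 1)")

    def loglik(r):
        # N2(0, 0, 1, 1, r) log-likelihood, up to an additive constant.
        return -0.5 * n * np.log(1 - r * r) - (sq - 2 * r * sxy) / (2 * (1 - r * r))

    return float(max(candidates, key=loglik))


rng = np.random.default_rng(1)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=20000)
print(mmle_rho(z[:, 0], z[:, 1]))  # close to the true value 0.5
```

For the MMLE, `x` and `y` would hold the quantified BVRSS pairs pooled over all $r$ cycles; passing a BVSRS instead gives the usual MLE, since the two estimators share the same estimating equation.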

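Since $I(\rho) = \frac{1+\rho^2}{(1-\rho^2)^2}$, the asymptotic efficiency of $\hat{\rho}^*_{BVRSS}$ with respect to the usual MLE of $\rho$ based on a BVSRS of size $n = rm^2$, $\hat{\rho}^*_{BVSRS}$, is

$$AE(\hat{\rho}^*_{BVRSS}; \hat{\rho}^*_{BVSRS}) = \frac{1}{1 - \frac{1}{m^2(1+\rho^2)(1-\rho^2)^2} E_m}, \tag{31}$$

where

$$E_m = \sum_{j=1}^{m}\sum_{i=1}^{m} \left[E\left(\rho^3 - \rho - (1+\rho^2)\,X_{[i](j)} Y_{(i)[j]} + \rho\,(X^2_{[i](j)} + Y^2_{(i)[j]})\right)\right]^2.$$

Table 1. $\mathrm{eff}(\hat{\rho}_{BVRSS}; \hat{\rho}_{BVSRS})$ for $m = 2, 3, 4, 5$

| $\rho$ | $m$ | $\mathrm{eff}(\hat{\rho}_{BVRSS}; \hat{\rho}_{BVSRS})$ |
|---|---|---|
| 0.2 | 2 | 1.1118 |
| 0.5 | 2 | 1.1344 |
| 0.9 | 2 | 1.1699 |
| 0.2 | 3 | 1.3056 |
| 0.5 | 3 | 1.3592 |
| 0.9 | 3 | 1.5267 |
| 0.2 | 4 | 1.5105 |
| 0.5 | 4 | 1.6106 |
| 0.9 | 4 | 1.9411 |
| 0.2 | 5 | 1.7181 |
| 0.5 | 5 | 1.8632 |
| 0.9 | 5 | 2.3776 |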


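Table 2. $\mathrm{eff}(\hat{\rho}^*_{BVRSS}; \hat{\rho}^*_{BVSRS})$

| $\rho$ | $m$ | $AE(\hat{\rho}^*_{BVRSS}; \hat{\rho}^*_{BVSRS})$ |
|---|---|---|
| 0.2 | 2 | 1.0998 |
| 0.5 | 2 | 1.0512 |
| 0.9 | 2 | 1.0025 |
| 0.2 | 3 | 1.2602 |
| 0.5 | 3 | 1.1349 |
| 0.9 | 3 | 1.0086 |
| 0.2 | 4 | 1.4313 |
| 0.5 | 4 | 1.2260 |
| 0.9 | 4 | 1.0175 |
| 0.2 | 5 | 1.6041 |
| 0.5 | 5 | 1.3195 |
| 0.9 | 5 | 1.0303 |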

Table 2 contains the values of $AE(\hat{\rho}^{*}_{BVRSS}, \hat{\rho}^{*}_{BVSRS})$ for some selected values of $m$ and $\rho$. As expected, the asymptotic efficiency is increasing in $m$ for each fixed $\rho$. However, for each fixed $m$, the efficiency is decreasing in $\rho$. It can be seen from Eqs. (16) and (19) that the amount of information lost by using the modified MLE depends on the density $f_{Y_{[j]}}(y) = \int_{-\infty}^{\infty} f_{X_{(j)}}(x)\, f_{Y|X}(y|x)\, dx$. Equivalently, the lost information about $\rho$ depends on $f_{Y|X}(y|x)$. The loss is maximum when $\rho = 0$ and minimum when $\rho = 1$. This would be a justification of the behavior of the efficiency as $\rho$ increases.
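The density of the concomitant, $\int f_{X_{(j)}}(x)\, f_{Y|X}(y|x)\, dx$, can be evaluated numerically to see how it changes with $\rho$. The sketch below assumes standard normal margins, so that $Y \mid X = x$ is $N(\rho x, 1 - \rho^2)$, and uses a plain trapezoid rule. The function names and the integration limits are illustrative assumptions.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def order_stat_pdf(x, j, m):
    """Density of the j-th order statistic of m i.i.d. N(0, 1) draws."""
    c = math.factorial(m) / (math.factorial(j - 1) * math.factorial(m - j))
    return c * Phi(x) ** (j - 1) * (1.0 - Phi(x)) ** (m - j) * phi(x)


def concomitant_pdf(y, j, m, rho, half_width=8.0, steps=4000):
    """Trapezoid approximation of the concomitant density; needs |rho| < 1."""
    s = math.sqrt(1.0 - rho * rho)  # conditional sd of Y given X = x
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        x = -half_width + k * h
        w = 0.5 if k in (0, steps) else 1.0
        total += w * order_stat_pdf(x, j, m) * phi((y - rho * x) / s) / s
    return total * h


# At rho = 0 the ranking on X carries no information about Y,
# so the concomitant density coincides with the N(0, 1) density.
print(concomitant_pdf(0.3, 1, 3, 0.0), phi(0.3))
# For a strong correlation it moves towards the order-statistic density.
print(concomitant_pdf(1.0, 3, 3, 0.9), order_stat_pdf(1.0, 3, 3))
```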

Also, the relative efficiency of the estimator is simulated in the case of finite sample size. Table 3 contains the efficiency for $r = 1$ and $m = 3$, 4, and 5, for some selected values of $\rho$. The simulation was based on 5000 replications from the bivariate normal distribution BVN(0, 0, 1, 1, $\rho$). The bias is also calculated for the two competing estimators. It can be seen from the table that the relative efficiency behaves similarly to the asymptotic one. The bias of both estimators is relatively negligible.

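Table 3. $\mathrm{eff}(\hat{\rho}^{*}_{BVRSS}, \hat{\rho}^{*}_{BVSRS})$, $r = 1$

$\rho$    $m$    $\mathrm{Eff}(\hat{\rho}^{*}_{BVRSS}, \hat{\rho}^{*}_{BVSRS})$    $|\mathrm{Bias}(\hat{\rho}^{*}_{BVRSS})|$    $|\mathrm{Bias}(\hat{\rho}^{*}_{BVSRS})|$
0.2    3    1.14    0.0251    0.0415
0.5    3    1.20    0.0118    0.0142
0.9    3    1.13    0.0006    0.0005
0.2    4    1.44    0.0007    0.0109
0.5    4    1.34    0.0019    0.0101
0.9    4    1.03    0.0007    0.0002
0.2    5    1.51    0.0057    0.0097
0.5    5    1.50    0.0061    0.0003
0.9    5    1.04    0.0000    0.0002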


Another example that may be of interest, and was suggested by the referee, is the bivariate exponential distribution (Downton's form). In this distribution, the marginals of the two variates are exponential, so the marginal densities are not symmetric, as in the normal case, but skewed. The conditional distribution of one of the two variates given the other is not exponential, so conditionally they are not linearly related as in the case of the bivariate normal distribution. The estimation of the parameters of this distribution using different RSS plans is now under consideration.

4. Estimation of $\rho$: All Parameters Are Unknown

Assume that all the parameters, $\mu_X$, $\mu_Y$, $\sigma_X^2$, $\sigma_Y^2$, and $\rho$, are unknown. In the BVSRS, the MLEs of the parameters based on $\{(X_i, Y_i),\ i = 1, \ldots, n\}$ are the solution of the maximum likelihood equation (17), where the derivative in this case is with respect to the vector $(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)$. The solution in the case of the bivariate normal distribution is well known. If we have a BVRSS from a bivariate normal distribution, then the likelihood as well as the maximum likelihood equations are complicated and do not appear to be easily solvable. Using the modified MLE eases the problem. In this case, it is easy to see that (19) is still valid, where the derivative is with respect to $(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)$; i.e., to find the modified MLE for each of the five parameters, we need to solve five modified maximum likelihood equations. The solution of (19) is the same as that of (17) except that the BVSRS data is replaced by the MBVRSS data. Thus, the modified MLE of $\rho$ is

$$\hat{\rho}^{**}_{BVRSS} = \frac{\sum_{h=1}^{r}\sum_{j=1}^{m}\sum_{i=1}^{m} \left(X_{[i](j),h} - \bar{X}_{BVRSS}\right)\left(Y_{(i)[j],h} - \bar{Y}_{BVRSS}\right)}{\sqrt{\sum_{h=1}^{r}\sum_{j=1}^{m}\sum_{i=1}^{m} \left(X_{[i](j),h} - \bar{X}_{BVRSS}\right)^2 \; \sum_{h=1}^{r}\sum_{j=1}^{m}\sum_{i=1}^{m} \left(Y_{(i)[j],h} - \bar{Y}_{BVRSS}\right)^2}}, \tag{32}$$

where

$$\bar{X}_{BVRSS} = \frac{1}{n}\sum_{h=1}^{r}\sum_{j=1}^{m}\sum_{i=1}^{m} X_{[i](j),h}, \qquad \bar{Y}_{BVRSS} = \frac{1}{n}\sum_{h=1}^{r}\sum_{j=1}^{m}\sum_{i=1}^{m} Y_{(i)[j],h}.$$
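Estimator (32) is the ordinary sample correlation coefficient applied to the quantified BVRSS pairs, so a finite-sample comparison with BVSRS is easy to sketch by simulation. The code below is a rough illustration under stated assumptions. The BVRSS scheme coded in `bvrss_cycle` follows our reading of the procedure of Al-Saleh and Zheng (2002): for each rank pair, a fresh pool of $m$ sets of size $m$ is drawn, the pair with the $i$-th smallest $X$ is kept from each set, and the kept pair with the $j$-th smallest $Y$ is quantified. Efficiency is taken as a ratio of mean squared errors, and ranking is assumed perfect. All function names are hypothetical.

```python
import math
import random


def draw_pair(rng, rho):
    """One (x, y) draw from BVN(0, 0, 1, 1, rho)."""
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    return z1, rho * z1 + math.sqrt(1 - rho**2) * z2


def bvrss_cycle(rng, m, rho):
    """One cycle (r = 1) of the assumed BVRSS scheme: m*m quantified pairs."""
    sample = []
    for i in range(m):
        for j in range(m):
            kept = []
            for _ in range(m):
                # Tuples sort on x first, so row[i] has the i-th smallest X.
                row = sorted(draw_pair(rng, rho) for _ in range(m))
                kept.append(row[i])
            kept.sort(key=lambda p: p[1])  # rank the kept pairs on Y
            sample.append(kept[j])
    return sample


def pearson(pairs):
    """Sample correlation coefficient, as in (32)."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
    sxx = sum((p[0] - mx) ** 2 for p in pairs)
    syy = sum((p[1] - my) ** 2 for p in pairs)
    return sxy / math.sqrt(sxx * syy)


def simulated_efficiency(m, rho, reps=2000, seed=0):
    """MSE(BVSRS estimate) / MSE(BVRSS estimate), both with n = m*m pairs."""
    rng = random.Random(seed)
    mse_rss = mse_srs = 0.0
    for _ in range(reps):
        rss = bvrss_cycle(rng, m, rho)
        srs = [draw_pair(rng, rho) for _ in range(m * m)]
        mse_rss += (pearson(rss) - rho) ** 2
        mse_srs += (pearson(srs) - rho) ** 2
    return mse_srs / mse_rss


print(simulated_efficiency(3, 0.5))
```

Because the scheme, the efficiency definition, and the replication count are assumptions, the output should be read as indicative only and not as a reproduction of the published tables.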

Table 4 contains some simulated values of the efficiency in the case of the bivariate normal distribution. We observed that the efficiency is not affected by the values of the other unknown parameters, $\mu_X$, $\mu_Y$, $\sigma_X^2$, $\sigma_Y^2$, and hence we based our simulation on the values $\mu_X = \mu_Y = 0$ and $\sigma_X^2 = \sigma_Y^2 = 1$. It can be seen from Table 4 that $\hat{\rho}^{**}_{BVRSS}$ is substantially more efficient than $\hat{\rho}^{**}_{BVSRS}$, the usual MLE from BVSRS. We have also observed that the bias in all the cases we considered is negligible. Note that for fixed $m$, the efficiency increases in $\rho$. For small values of $\rho$, the efficiency is increasing in $m$, while for large values of $\rho$, it is decreasing in $m$.

Table 4. $\mathrm{eff}(\hat{\rho}^{**}_{BVRSS}, \hat{\rho}^{**}_{BVSRS})$, $r = 1$

$\rho$    $m$    $\mathrm{Eff}(\hat{\rho}^{**}_{BVRSS}, \hat{\rho}^{**}_{BVSRS})$
0.2    2    1.1135
0.5    2    1.2108
0.9    2    2.0764
0.2    3    1.2677
0.5    3    1.4429
0.9    3    2.1232
0.2    4    1.5164
0.5    4    1.5839
0.9    4    1.8845
0.2    5    1.7300
0.5    5    1.7325
0.9    5    1.8754

Stokes (1980) proposed two other approaches to obtain a bivariate RSS for the purpose of estimating $\rho$ in the case of the bivariate normal distribution. In the first approach, the bivariate RSS sample consists of the $Y$'s together with their concomitant $X$'s. This approach turned out to be inefficient in estimating $\rho$ compared to BVSRS. In the second approach, the BVRSS is obtained by including

only the extreme $Y$'s together with their concomitant $X$'s. Estimation of $\rho$ based on a BVRSS obtained by this second approach was found to be substantially more efficient than the corresponding estimators based on BVSRS. The author investigated the case when all parameters are unknown. The BVRSS in this case consists of $n/2$ order statistics from each extreme. The asymptotic efficiency of the MMLE of $\rho$, $\hat{\rho}_s$, w.r.t. $\hat{\rho}^{**}_{BVSRS}$ is given in Table 5, which is taken from Stokes (1980). Though the efficiency is high, the author raised a concern about the non-robustness of the procedure to deviations from bivariate normality. Stokes' BVRSS procedure is unbalanced in the sense that it chooses only the two extremes. The current procedure is balanced because it takes all ranks into consideration. Thus, we believe that the current procedure is likely to be robust to deviations from normality. Indeed, as we have seen in previous sections, this procedure has some favorable properties regardless of the underlying distribution.

Table 5. Efficiency of $\hat{\rho}_s$ w.r.t. $\hat{\rho}^{**}_{BVSRS}$

$\rho$    $m$    $r$    $\mathrm{Eff}(\hat{\rho}_s, \hat{\rho}^{**}_{BVSRS})$
0.00    2    5    1.00
0.50    2    5    0.83
0.75    2    5    0.40
0.00    5    2    1.75
0.50    5    2    1.65
0.75    5    2    1.37

5. Concluding Remarks

The use of bivariate ranked set sampling (BVRSS) is limited to situations where ranking of a small number of units by judgment can be done with negligible ranking errors. When BVRSS is applicable, the usual estimators obtained are more efficient than their random sampling counterparts, even for very small set sizes. We believe that the balanced BVRSS considered in this article may not be the optimal plan. Searching for an optimal plan for specific types of bivariate distributions is possible future work. The procedure should next be investigated for other bivariate distributions, especially those for which the two variables may be conditionally nonlinearly related (for example, the bivariate exponential distribution).

References

Al-Saleh, M. F. (2004). Steady state ranked set sampling and parametric estimation. J. Statist. Plann. Infer. 123:83–95.

Al-Saleh, M. F., Zheng, G. (2002). Estimation of bivariate characteristics using ranked set sampling. Austral. New Zealand J. Statist. 44:221–232.

Al-Saleh, M. F., Zheng, G. (2003). Controlled sampling using ranked set sampling. J. Nonparametric Statist. 15:505–516.

Bhattacharyya, G. K. (1985). The asymptotic of maximum likelihood and related estimators based on type II censored data. J. Amer. Statist. Assoc. 80:398–404.

Dickson, L. E. (1939). New First Course in the Theory of Equations. New York: John Wiley & Sons.

Johnson, N. L., Kotz, S. (1972). Continuous Multivariate Distributions. New York: John Wiley & Sons.

Kaur, A., Patil, G. P., Sinha, A. K., Taillie, C. (1995). Ranked set sampling: an annotated bibliography. Environ. Ecol. Statist. 2:25–54.

McIntyre, G. A. (1952). A method for unbiased selective sampling using ranked sets. Austral. J. Agricul. Res. 3:385–390.

Mehrotra, K. G., Nanda, P. (1974). Unbiased estimation of parameters by order statistics in the case of censored samples. Biometrika 61:601–606.

Mode, N., Conquest, L., Marker, D. (1999). Ranked set sampling for ecological research: accounting for the total costs of sampling. Environmetrics 10:179–194.

Patil, G. P., Sinha, A. K., Taillie, C. (1994). Ranked set sampling for multiple characteristics. Intern. J. Ecology Environ. Sci. 20:357–373.

Patil, G. P., Sinha, A. K., Taillie, C. (1999). Ranked set sampling: a bibliography. Environ. Ecol. Statist. 6:91–98.

Samawi, H., Al-Saleh, M. F. (2004). On bivariate ranked set sampling for quantile estimation and quantile interval estimation using ratio estimator. Commun. Statist. Theor. Meth. 33(8):1801–1819.

Stokes, S. L. (1977). Ranked set sampling with concomitant variables. Commun. Statist. Theor. Meth. 6:1207–1211.

Stokes, S. L. (1980). Inferences on the correlation coefficient in bivariate normal population from ranked set sampling. J. Amer. Statist. Assoc. 75:989–995.

Takahasi, K. (1970). Practical note on estimation of population means based on samples stratified by means of ordering. Ann. Inst. Statist. Math. 22:421–428.

Takahasi, K., Wakimoto, K. (1968). On unbiased estimates of the population mean based on the sample stratified by means of ordering. Ann. Inst. Statist. Math. 20:1–31.

Zheng, G., Al-Saleh, M. F. (2002). Modified maximum likelihood estimators based on ranked set sampling. Ann. Inst. Statist. Math. 54:641–658.
