estimation of population parameters in survey...


Upload: buinguyet

Post on 11-May-2018


Chapter-I

Introduction

For a phenomenon that varies over a continuous (or even a large finite) spatial domain, it is seldom feasible to observe every potential datum of a study variable associated with that phenomenon. Statistical sampling theory and the design of experiments are therefore important parts of statistics, since they allow inference about the study variable to be made from a subset, or sample, of the population of potential data. Both have a long history. Spatial sampling refers to the sampling of geo-referenced or spatially labelled phenomena. In the spatial context, interest usually lies in predicting the study variable at unsampled sites. For this purpose, it is necessary to collect information about the population with respect to some characteristics. For example, in agricultural surveys, data are collected on a portion of the land under different crops in order to estimate food production. Most government and non-government bodies regularly collect information about the total population and its distribution by geography, sex and age for future planning. In business, information is likewise required on the role and character of wholesale, retail and service traders. Given a predictand together with its predictor, a best sampling plan or network refers to the choice of locations at which to sample the phenomenon so as to be optimal according to a given criterion. In practice, optimal sampling plans may be extremely difficult to achieve, but good sampling plans can at least be obtained by constructing a good sampling frame, which can be formed using an appropriate technique from the design of experiments.

The main objective of the thesis is to construct the best sample survey designs and the best experimental designs for estimating the parameters of interest and for testing the hypotheses under study, respectively. This has been done by developing an appropriate procedure for selecting the best sample from the population and the best partially balanced incomplete block designs under the given constraints. The best estimators and the best test statistics have been constructed for estimating the parameters and testing the hypotheses on the basis of the sample having minimum extraneous variation.

The design of experiments plays a vital role in statistical research. It covers planning the experiment, obtaining pertinent information from it about the population under study, analysing the data statistically and drawing valid inferences. Considering the statistical analysis before conducting the experiment forces the experimenter to plan the design more carefully. Observations from a carefully planned experiment on homogeneous experimental material give valid inferences about the treatments under study without requiring a complex design. In most practical situations, however, the experimental material is not homogeneous, so block designs are used extensively to reduce the heterogeneity among the experimental units. Block designs are used in many fields of research because they are practicable. When the number of treatments under study is large, the block size becomes very large and the purpose of blocking is defeated: it is often impracticable to obtain one complete replicate of units that is relatively homogeneous, so heterogeneity enters the designed experiment and reduces the discriminating power of the tests of significance. Hence the precision of the block design is adversely affected if the treatments are large in number. To maintain homogeneity within the blocks, the experimenter must either cut down the number of treatments, which means losing information on the treatments sacrificed, or use an incomplete block design. Incomplete block designs are used in such situations because no treatment has to be dropped, which is why they are popular and widely used in agriculture, animal husbandry and allied fields. Among the class of binary, equi-replicated designs with equal block sizes, balanced incomplete block (BIB) designs are the most efficient. However, it is difficult to develop BIB designs for all the parametric combinations needed in various experimental situations. In such situations, partially balanced incomplete block (PBIB) designs are generally used. PBIB designs are based on the concept of an association scheme defined on the treatments. We first define the terminology used in the thesis.

Population: The collection of identifiable and distinguishable units, in a given region at a particular point or period of time, on which observations can be made or from which the required statistical information can be ascertained according to a well-defined procedure is termed a population. Mathematically, a population P is defined as

$P = \{U_1, U_2, \ldots, U_N\}$

where $U_j$, $j = 1, 2, \ldots, N$, denote the distinguishable and identifiable units of the population.

Sample: A sample of size n, selected by some procedure from a given finite population P of size N, is defined as a finite sequence

$s = \{U_{j_1}, U_{j_2}, \ldots, U_{j_n}\}$, $j_i \in \{1, 2, \ldots, N\}$,

of n units.

Sample Space: The collection of all possible samples is known as the sample space and is denoted by S = {s}.

Sampling Design: Let P(s) denote the probability measure of sample $s \in S$. The duplet (S, P(s)) is known as a sampling design and is denoted by $p_s$.

Parameter: Let y denote the real-valued variable, defined on each unit of the population, about which conclusions are to be drawn, and let $Y_j$, $j = 1, 2, \ldots, N$, denote the value of y on the jth unit of the population. Any function of $Y_1, Y_2, \ldots, Y_N$ is known as a population parameter, or simply a parameter. For example, the customary parameters of interest in survey sampling are:

Population Total: $Y = \sum_{j=1}^{N} Y_j$

Population Mean: $\bar{Y} = N^{-1} \sum_{j=1}^{N} Y_j$

Population Variance: $\sigma_y^2 = N^{-1} \sum_{j=1}^{N} (Y_j - \bar{Y})^2$

Population Coefficient of Variation: $C_y = \sigma_y / \bar{Y}$, etc.
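The customary parameters can be computed directly when the whole population is known. The following is a minimal sketch in Python, assuming the population variance uses the divisor N; the function name and the five population values are hypothetical and serve only as an illustration.

```python
def population_parameters(values):
    """Return the total, mean, variance (divisor N) and coefficient of
    variation of a finite population given as a list of Y_j values."""
    n_units = len(values)
    total = sum(values)
    mean = total / n_units
    variance = sum((y - mean) ** 2 for y in values) / n_units
    cv = variance ** 0.5 / mean
    return {"total": total, "mean": mean, "variance": variance, "cv": cv}

# Hypothetical population of N = 5 units (illustrative values only)
Y = [2.0, 4.0, 4.0, 4.0, 6.0]
params = population_parameters(Y)
```

In a survey these quantities are unknown and have to be estimated from a sample; the sketch only fixes what is being estimated.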

Statistic and Estimator: Any function of the sample values of variable y that does not depend on unknown parameters is known as a statistic. A statistic used for estimating the parameter of interest is termed an estimator of that parameter.

Sampling Strategy: Let $t_s$ be an estimator of the parameter $\theta$ under a sampling design $p_s$. The duplet $(p_s, t_s)$ is known as a sampling strategy for estimating $\theta$. The mean square error of the sampling strategy $(p_s, t_s)$ is denoted by $MSE(p_s, t_s)$ or $MSE(t_s)$ and is defined as

$MSE(p_s, t_s) = E_{p_s}(t_s - \theta)^2$

where $E_{p_s}$ denotes the expectation under the sampling design $p_s$.

A sampling strategy $(p_s, t_s^{(1)})$ for estimating $\theta$ is said to be better than another sampling strategy $(p_s, t_s^{(2)})$ for estimating $\theta$ if

$MSE(t_s^{(1)}) < MSE(t_s^{(2)})$

The efficiency of the sampling strategy $(p_s, t_s^{(1)})$ over the sampling strategy $(p_s, t_s^{(2)})$ for estimating $\theta$ is defined as

$E = \dfrac{MSE(t_s^{(2)})}{MSE(t_s^{(1)})}$

Association Scheme: An abstract relationship defined on v symbols (treatments) is called an m-class association scheme ($m \ge 2$) if the following conditions are satisfied:

1. Any two symbols $\alpha$ and $\beta$ are either 1st, 2nd, …, or mth associates, the relation of association being symmetrical, i.e. if the symbol $\alpha$ is the ith associate of the symbol $\beta$, then $\beta$ is the ith associate of $\alpha$.

2. Each symbol $\alpha$ has $n_i$ ith associates, the number $n_i$ being independent of $\alpha$.

3. If any two symbols $\alpha$ and $\beta$ are ith associates, then the number of symbols which are simultaneously jth associates of $\alpha$ and kth associates of $\beta$ is $p^i_{jk}$ and is independent of the pair of ith associates $\alpha$ and $\beta$.

The numbers v, $n_i$ and $p^i_{jk}$ (i, j, k = 1, 2, …, m) are called the parameters of the m-class association scheme. The $p^i_{jk}$ are represented by m×m matrices as follows:

$P_i = (p^i_{jk})_{m \times m} = \begin{pmatrix} p^i_{11} & p^i_{12} & \cdots & p^i_{1m} \\ p^i_{21} & p^i_{22} & \cdots & p^i_{2m} \\ \vdots & \vdots & & \vdots \\ p^i_{m1} & p^i_{m2} & \cdots & p^i_{mm} \end{pmatrix}$; i = 1, 2, 3, …, m
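The parameters of a small association scheme can be obtained by direct counting. The following sketch assumes a two-class group divisible scheme on v = 6 treatments arranged in two groups of three, where treatments in the same group are first associates and the rest are second associates; the function names and the treatment labelling 0, 1, …, 5 are choices made here for illustration.

```python
from itertools import product

def gd_class(a, b, group_size):
    """Associate class of two distinct treatments in a group divisible scheme:
    first associates share a group, second associates do not."""
    return 1 if a // group_size == b // group_size else 2

def scheme_parameters(v, group_size):
    """Return (n, p) with n[i] = n_i and p[(i, j, k)] = p^i_{jk}, asserting
    that the counts do not depend on the treatment or pair chosen."""
    t = range(v)
    n, p = {}, {}
    for i in (1, 2):
        counts = {sum(1 for b in t if b != a and gd_class(a, b, group_size) == i)
                  for a in t}
        assert len(counts) == 1          # n_i is the same for every treatment
        n[i] = counts.pop()
    for i, j, k in product((1, 2), repeat=3):
        counts = {
            sum(1 for c in t if c not in (a, b)
                and gd_class(a, c, group_size) == j
                and gd_class(b, c, group_size) == k)
            for a in t for b in t
            if a != b and gd_class(a, b, group_size) == i
        }
        assert len(counts) == 1          # p^i_{jk} is the same for every pair
        p[(i, j, k)] = counts.pop()
    return n, p

n, p = scheme_parameters(v=6, group_size=3)
P1 = [[p[(1, j, k)] for k in (1, 2)] for j in (1, 2)]
P2 = [[p[(2, j, k)] for k in (1, 2)] for j in (1, 2)]
```

For this scheme the counting gives n_1 = 2 and n_2 = 3, with P1 = [[1, 0], [0, 3]] and P2 = [[0, 2], [2, 0]]; both matrices are symmetric.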

The condition $p^i_{jk} = p^i_{kj}$ was taken as part of the definition of the association scheme until Bose and Mesner (1959) showed that it was redundant. For a given association scheme on v symbols (treatments), Bose and Mesner (1959) defined PBIB designs as follows:

If we have an association scheme with m associate classes and given parameters, then a PBIB design with m associate classes is obtained if the v symbols (treatments) are arranged into b sets (blocks) of size k (k < v) such that

1. Every symbol occurs at most once in a set (block).

2. Every symbol occurs in exactly r sets (blocks).

3. If two symbols $\alpha$ and $\beta$ are ith associates, then they occur together in $\lambda_i$ sets (blocks), the number $\lambda_i$ being independent of the particular pair of ith associates $\alpha$ and $\beta$.

Conventionally, the numbers v, b, r, k, $\lambda_i$ (i = 1, 2, …, m) are called the parameters of the design. If every symbol is taken as its own 0th associate, then

$n_0 = 1$, $p^0_{ij} = n_i \delta_{ij}$, $p^i_{0k} = \delta_{ik}$, $\lambda_0 = r$,

where $\delta_{ij}$ is defined as

$\delta_{ij} = 1$ when i = j, and $\delta_{ij} = 0$ otherwise.

The following relations between the parameters of the association scheme and of the PBIB designs are well known [Ch. Raghavarao (1971) p. 123-124]:

$vr = bk$, $\sum_{i=1}^{m} n_i = v - 1$, $\sum_{i=1}^{m} n_i \lambda_i = r(k - 1)$,

$\sum_{k=1}^{m} p^i_{jk} = n_j - \delta_{ij}$, $n_i p^i_{jk} = n_j p^j_{ik} = n_k p^k_{ij}$
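The first three relations, vr = bk, n_1 + … + n_m = v − 1 and n_1λ_1 + … + n_mλ_m = r(k − 1), can be checked numerically on a small design. The sketch below assumes a hypothetical group divisible design on v = 4 treatments with groups {0, 1} and {2, 3}, whose blocks are the four pairs of treatments taken from different groups; the helper name and the design are illustrative and are not taken from the thesis.

```python
from itertools import combinations

def pbib_parameters(blocks, v, assoc):
    """Return (b, k, r, lam, n) for a block design, asserting the PBIB conditions."""
    b = len(blocks)
    sizes = {len(set(blk)) for blk in blocks}
    reps = {sum(t in blk for blk in blocks) for t in range(v)}
    assert len(sizes) == 1 and len(reps) == 1   # equal block sizes, equal replication
    k, r = sizes.pop(), reps.pop()
    lam = {}
    for s, t in combinations(range(v), 2):
        together = sum(s in blk and t in blk for blk in blocks)
        # lambda_i must be the same for every pair of i-th associates
        assert lam.setdefault(assoc(s, t), together) == together
    n = {}
    for i in lam:
        counts = {sum(1 for t in range(v) if t != s and assoc(s, t) == i)
                  for s in range(v)}
        assert len(counts) == 1                 # n_i must not depend on the treatment
        n[i] = counts.pop()
    return b, k, r, lam, n

# Hypothetical design: groups {0, 1} and {2, 3}, blocks are the cross-group pairs
v = 4
blocks = [(0, 2), (0, 3), (1, 2), (1, 3)]
assoc = lambda s, t: 1 if s // 2 == t // 2 else 2
b, k, r, lam, n = pbib_parameters(blocks, v, assoc)

relation_1 = v * r == b * k
relation_2 = sum(n.values()) == v - 1
relation_3 = sum(n[i] * lam[i] for i in n) == r * (k - 1)
```

Here b = 4, k = 2, r = 2, λ_1 = 0, λ_2 = 1, n_1 = 1 and n_2 = 2, and all three relations hold.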

For m = 2, Bose and Shimamoto (1952) classified the known PBIB designs into group divisible (GD), simple (SI), triangular ($T_2$), Latin square type ($L_i$) and cyclic designs.

The main problem in a sample survey is to estimate the population parameters of the characteristics under study with the desired precision and accuracy within the available resources, and to develop procedures for selecting the sampling units. Survey sampling techniques are used by private and government bodies in the state, industry, business, scientific institutions, public organizations and international agencies. Market research, demography and epidemiological studies depend heavily on the sampling approach. Survey sampling provides reliable information on rates of births and deaths, population growth, incidence of diseases, nutritional standards, levels of education and living conditions, which is needed in planning the improvement of the social and economic life of the people. Survey sampling thus plays a vital role in maintaining and developing the economic and social welfare of the people of a country. It is also widely used in research studies in the sciences and the social sciences. A sample survey may also become a necessity in dealing with characteristics for which serious biases or non-sampling errors are expected when special precautionary measures cannot be taken during the collection and tabulation of data. Sampling theory also plays an important role in developing test statistics for testing hypotheses.

Just as the interest in sampling theory lies in finding estimators (statistics) based on the selected sample, the main interest in the design of experiments lies in finding the best possible combination of treatments under the given conditions, so that we obtain a group of treatments that is significantly more effective than any other set of treatments over a predetermined experimental area. In the 1920s, Fisher introduced the concept of design of experiments, obtaining a set of treatments that is significantly effective under the condition that the experimental units are relatively homogeneous; this gave rise to the completely randomized design (CRD). The condition of homogeneity is usually not justified except in some types of laboratory experiments. To overcome this problem, first the randomized complete block design (RCBD) and later the Latin square design (LSD) were introduced. In these designs blocking is complete, in the sense that all treatments under investigation occur in each block of an RCBD and in each row and column of an LSD. When the number of treatments under study is large, the size of the block, row or column increases and, as a result, variation enters among the experimental units within it, so the units within a block no longer remain relatively homogeneous. As explained above, partially balanced incomplete block designs are generally used in such situations.

In most survey sampling situations, information may be available on more than one variable defined on each unit of the population, and these variables may be highly correlated with each other. One of them is the variable under study, y, and the others are termed auxiliary variables. The information on auxiliary variables may already be available in one form or another, or can be made available at moderate cost by diverting a part of the survey resources. For example, in many repetitive surveys, the value of the same variable on a previous occasion, say the last census, is usually taken as the value of the auxiliary variable. It is widely accepted that the efficiency of the usual estimators of a population parameter can be increased by using such information judiciously. In whatever form the information on the auxiliary variable(s) is available, one may utilize it to devise sampling strategies that are better than those in which no auxiliary information is used. In survey sampling, the information on auxiliary variables may be utilized at the following three stages:

i. At the stage of planning or designing the survey, i.e. in stratifying the population.

ii. At the stage of sample selection, i.e. in selecting the units for the sample with unequal probabilities, with probability proportional to some measure of size of the units based on the auxiliary variables.

iii. At the stage of estimation, i.e. in the use of ordinary ratio, product, difference and regression estimators etc.

For estimating the parameters of interest, ratio and ratio-type estimators are widely used when the correlation between the study variable and the auxiliary variable is positive, and product and product-type estimators when it is negative. Among the three estimators of the population mean, namely the ratio estimator, the product estimator and the mean per unit estimator, one should use the

Ratio estimator when $\rho > \dfrac{C_x}{2C_y}$

Product estimator when $\rho < -\dfrac{C_x}{2C_y}$

Mean per unit estimator when $-\dfrac{C_x}{2C_y} \le \rho \le \dfrac{C_x}{2C_y}$

where $C_y$ and $C_x$ are the coefficients of variation of variables y and x respectively and $\rho$ is the correlation coefficient between them.
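The choice among the three estimators can be written as a small decision function. The sketch below assumes the usual first-order comparison, in which the ratio estimator is preferred when the correlation exceeds C_x/(2C_y) and the product estimator when it falls below −C_x/(2C_y); the function name and the numerical inputs are hypothetical.

```python
def preferred_estimator(rho, c_x, c_y):
    """Choose among the ratio, product and mean per unit estimators of the
    population mean by comparing rho with C_x / (2 C_y)."""
    threshold = c_x / (2 * c_y)
    if rho > threshold:
        return "ratio"
    if rho < -threshold:
        return "product"
    return "mean per unit"

# Illustrative values: C_x = 0.2 and C_y = 0.25 give a threshold of 0.4
choice_positive = preferred_estimator(0.9, 0.2, 0.25)
choice_negative = preferred_estimator(-0.8, 0.2, 0.25)
choice_weak = preferred_estimator(0.1, 0.2, 0.25)
```

In practice the correlation and the coefficients of variation are rarely known exactly, so the rule is usually applied with values taken from a pilot study or a previous survey.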

The importance of stratification in survey sampling is well known. Sometimes it becomes necessary to stratify the population to obtain valid, accurate and efficient results about the parameter. To increase the efficiency of the estimators, the population is stratified in such a way that the units belonging to the same stratum (block) are as homogeneous as possible with respect to the study variable y. Let x be an auxiliary variable, denoting the size of the units, that is highly correlated with y. If the distribution of y is completely known, the boundary points of the strata are determined from it so that the variance of the estimator is minimum for a given allocation. Otherwise, knowledge of the distribution of the auxiliary variable x may be used for the same purpose.

In practice, the bigger units in the population are expected to contribute more than the smaller ones. A selection procedure giving higher selection probabilities to bigger units would therefore be more efficient than one giving equal selection probabilities to all units. Hansen and Hurwitz (1943) were the first to discuss selection of units with replacement with probability proportional to a size measure based on an auxiliary variable. Later, in 1949, they showed that sampling with probability proportional to the square root of the size of the unit is more efficient than sampling with probability proportional to size (PPS) under certain conditions. Horvitz and Thompson (1952) discussed the theory of sampling with probability proportional to size without replacement. Lahiri (1951) suggested a method of selecting a sample under a PPS scheme with replacement. Later, Rao, Hartley and Cochran (1962) described a very simple method of selecting a sample under a PPS scheme without replacement. Sampling schemes for selecting units without replacement with inclusion probabilities proportional to their sizes (IPPS) have been discussed by several authors, such as Goodman and Kish (1950), Narain (1951), Midzuno (1952), Sen (1952), Yates and Grundy (1953), Godambe (1955), Hanurav (1962), Hartley and Rao (1962), Brewer (1963), Fellegi (1963), Rao (1965b), Durbin (1967), Hanurav (1967), Sampford (1967), Das and Mohanty (1973), Singh (1978) etc. Further, Mohanty (1978) and Prasad and Srivenkataramana (1980) suggested suitable transformations to improve the efficiency of the Horvitz-Thompson estimator under the Midzuno sampling scheme. Singh and Srivastava (1980) gave two sampling schemes for selecting units with unequal probabilities under which the usual regression estimator and some regression-type estimators become unbiased. Later, Nigam et al (1984), Singh and Singal (1986) and Jhajj and Srivastava (1983) also defined other PPS sampling schemes based on auxiliary variables. Mukhopadhyay (1991) and Mangat and Singh (1992-93) reviewed the literature on sampling schemes with varying probabilities without replacement and discussed them critically with respect to their stability, efficiency and simplicity. Vasantha et al (1996) used ranks in unequal probability sampling and examined them for sample selection and stratification.
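As an illustration of selection with probability proportional to size, the sketch below implements one draw by Lahiri's method (choose a unit at random and accept it with probability equal to its size divided by the largest size) together with the Hansen-Hurwitz estimator of the population total for a with-replacement sample. The sizes, the study values and the function names are hypothetical; integer sizes are assumed.

```python
import random

def lahiri_draw(sizes, rng):
    """Draw one unit with probability proportional to size (Lahiri's method):
    choose a unit at random, then accept it with probability size / max(sizes)."""
    max_size = max(sizes)
    while True:
        j = rng.randrange(len(sizes))
        if rng.randint(1, max_size) <= sizes[j]:
            return j

def hansen_hurwitz_total(y_sample, p_sample):
    """Estimate the population total from a with-replacement PPS sample
    as the average of y_i / p_i over the draws."""
    return sum(yi / pi for yi, pi in zip(y_sample, p_sample)) / len(y_sample)

# Hypothetical population in which y is exactly proportional to size
sizes = [10, 20, 30, 40]
y = [5.0, 10.0, 15.0, 20.0]
p = [s / sum(sizes) for s in sizes]       # selection probabilities

rng = random.Random(0)
draws = [lahiri_draw(sizes, rng) for _ in range(20000)]
sample = draws[:10]                       # a PPS sample of 10 draws
estimate = hansen_hurwitz_total([y[j] for j in sample], [p[j] for j in sample])
```

When y is exactly proportional to the size measure, every ratio y_i / p_i equals the population total, so the estimator has zero variance. This is the motivation for choosing a size measure that is highly correlated with the study variable.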

The problem of improving upon the conventional unbiased estimators (based on the study variable only) of population parameters by using information on one or more auxiliary variables has received considerable attention from statisticians in survey sampling theory and practice. Whenever there is high positive correlation between variables y and x, and the population mean $\bar{X}$ of x is known in advance, ratio and ratio-type estimators are widely used. For the case of negative correlation, the use of product estimators has been advocated by Goodman (1960), Murthy (1964), Srivastava (1966b) etc. Watson (1937) was the first to develop a regression estimator, for estimating the average area of the leaves on a plant by taking leaf weight as the auxiliary variable. The ratio and difference estimators were initially developed by Cochran (1940, 1942) and Hansen et al (1953) respectively. The main drawback of ratio-type and regression-type estimators is that exact expressions (in terms of the moments of the bivariate population) for their biases and mean square errors cannot be obtained. For large samples, the contribution of the biases (being of order $n^{-1}$) to the mean square errors is negligible, so the precision of these estimators is judged on the basis of their asymptotic variances. It has also been shown that, to the first order of approximation, the regression estimator is never less efficient than the ratio, product and mean per unit estimators. However, Johnson (1950) has shown that the ratio estimator is sometimes better than the regression estimator for small samples in certain types of populations.

Srivenkataramana and Tracy (1981) suggested a product method of estimation that applies a simple transformation to the auxiliary variable when variables y and x are positively correlated; its advantage is that the resulting estimator has exact expressions for its bias and mean square error. Later, a number of ratio-type estimators and their generalizations were proposed for estimating parameters such as a ratio of ratios, a difference of two ratios, the mean, a ratio of means, the variance, the coefficient of variation, the correlation coefficient etc. by several authors, viz. Kish and Hess (1959), Yates (1960), Smith (1966), Walsh (1970), Mohanty and Das (1971), Cochran (1977), Kulkarni (1978), Rai and Sahai (1980), Srivenkataramana (1980), Sisodia and Dwivedi (1981), Singh and Shukla (1987), Pal and Mishra (1988), Rao (1988), Sampath and Durairajan (1988), Prasad (1989), Bisht and Sisodia (1990) etc. Jain (1987) defined a new kind of estimator of the population mean using information on an auxiliary variable, which is better than the conventional regression, ratio and mean per unit estimators for a wide range of the correlation coefficient between y and x. Kumar and Dayal (1998) considered ratio and regression estimators of the population mean under sampling on two occasions, taking samples on both occasions, and compared their efficiencies.

Hartley and Ross (1954) defined an unbiased ratio-type estimator and obtained an upper bound for the ratio of the bias of the ratio estimator to its standard error. Cochran (1977) showed that the bias may safely be regarded as negligible relative to the standard error when the coefficient of variation of $\bar{x}$ is less than 0.1. A general method for reducing the bias from the first order ($n^{-1}$) to the second order ($n^{-2}$) was first developed by Quenouille (1956) and is known as the jackknife technique. The jackknife technique was used for reducing the bias of the ratio estimator to order $n^{-2}$ by several authors, such as Durbin (1959), Goodman (1960), Rao (1965a) etc. The ratio estimator has also been made unbiased by selecting the sampling units with PPS by some authors, such as Midzuno (1950, 52), Lahiri (1951), Sen (1952, 53) etc. Selection and estimation procedures that provide unbiased ratio estimators for a certain general class of population parameters were considered by Nanjamma et al (1959). Contributions towards making existing estimators unbiased were made by Mickey (1959), Murthy and Nanjamma (1959), Goodman (1960), Pascual (1961), Beale (1962), Rao (1964), Tin (1965), Rao (1966), Shukla (1976), Sahoo and Swain (1980) etc.
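The jackknife idea can be illustrated on the ratio of sample means. The sketch below assumes the delete-one form of the technique, in which the full-sample ratio is combined with the average of the ratios obtained by leaving out one unit at a time; the function name and the data are hypothetical.

```python
def jackknife_ratio(y, x):
    """Delete-one jackknife version of the ratio r = sum(y) / sum(x):
    n * r - (n - 1) * (average of the n leave-one-out ratios)."""
    n = len(y)
    sum_y, sum_x = sum(y), sum(x)
    full_ratio = sum_y / sum_x
    leave_one_out = [(sum_y - yi) / (sum_x - xi) for yi, xi in zip(y, x)]
    return n * full_ratio - (n - 1) * sum(leave_one_out) / n

# Exactly proportional data: every ratio equals 2, so the jackknife returns 2
r_proportional = jackknife_ratio([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0])

# Hypothetical non-proportional sample of n = 3 units
r_sample = jackknife_ratio([3.0, 5.0, 8.0], [1.0, 2.0, 3.0])
```

Multiplying the jackknifed ratio by the known population mean of x gives the corresponding estimator of the population mean of y.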

In some situations, information on more than one auxiliary variable highly correlated with the study variable y is available without any appreciable increase in the cost of the survey, and the auxiliary variables may themselves be correlated with each other. The theory of two-phase sampling with a regression estimator using several auxiliary variables was discussed by Ghosh (1947) and Seal (1951, 53). Using prior knowledge on q auxiliary variables, all highly positively correlated with y, Olkin (1958) generalized the usual ratio estimator. He also extended this estimator to stratified random sampling. Goswami and Sukhatme (1965) extended the multivariate ratio estimator to multistage sampling. Using multi-auxiliary information in different forms, several estimators of the parameters of interest have been proposed by different authors, such as Williams (1963), Raj (1965), Srivastava (1965, 66a, 67), Singh (1965, 67a), Shukla (1965, 66), Smith (1966), Rao and Mudholkar (1967), Khan and Tripathi (1967), John (1969), Tripathi (1970, 76, 78, 87), Adhvaryu (1975), Chakrabarty (1979), Sahai et al (1980), Agarwal (1980), Agarwal and Kumar (1980), Srivastava and Jhajj (1983a), Kiregyera (1984), Bedi (1985), Mukerjee et al (1987), Tankou and Dharmadhikari (1989), Kothwala and Gupta (1989), Tripathi and Behl (1991), Tuteja and Behl (1991), Tripathi and Ahmed (1995) etc. Dharmadhikari and Tankou (1991) defined an improved ratio-type estimator of the population mean using multi-auxiliary information, which generalizes the results of Agarwal (1980) and Agarwal and Kumar (1985). They showed that the improved ratio-type estimator is as efficient as the regression estimator up to the first order of approximation.

Most authors in the literature have attempted to generalize the existing estimators. Srivastava and Jhajj (1981) defined a class of estimators of $\bar{Y}$ using known information on the auxiliary variable x in the form of its population mean $\bar{X}$ and variance $S_x^2$ as

$\tilde{y}_t = \bar{y}\, t\left(\dfrac{\bar{x}}{\bar{X}}, \dfrac{s_x^2}{S_x^2}\right)$,

where $\bar{y}$, $\bar{x}$ and $s_x^2$ are the corresponding sample quantities and t(.,.) is a parametric function with t(1, 1) = 1 satisfying certain regularity conditions. They showed that the optimum estimators of the class $\tilde{y}_t$ are more efficient than the linear regression estimator when the relationship between the two variables y and x is non-linear. In the same paper, they also considered a wider class of estimators of $\bar{Y}$, defined as

$\tilde{y}_{g_w} = g_w\left(\bar{y}, \dfrac{\bar{x}}{\bar{X}}, \dfrac{s_x^2}{S_x^2}\right)$

such that $g_w(\bar{Y}, 1, 1) = \bar{Y}$ for all $\bar{Y}$, where the parametric function $g_w(.,.,.)$ satisfies certain regularity conditions.

Most of the estimators defined by different authors so far are particular members of the classes $\tilde{y}_t$ and $\tilde{y}_{g_w}$, such as the estimator suggested by Chakrabarty (1968), which is a linear combination of the mean per unit estimator and the ratio estimator, the Walsh (1970) estimator

11

Xxxy and similar estimators have been studied by Gupta (1971, 78), Reddy (1973,

74), Shah and Shah (1978), Das and Tripathi (1980), Das (1988) , the class of estimators defined

by Srivastava (1967 & 1971) etc.
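To make the first class concrete, some familiar estimators can be recovered by particular choices of the function t. The illustration below assumes the class has the form $\bar{y}\,t(u, v)$ with $u = \bar{x}/\bar{X}$ and $v = s_x^2/S_x^2$; the weight $W$ is a constant introduced here only for illustration.

```latex
\begin{align*}
t(u, v) = 1 &: \quad \tilde{y}_t = \bar{y} && \text{(mean per unit estimator)} \\
t(u, v) = u^{-1} &: \quad \tilde{y}_t = \bar{y}\,\bar{X}/\bar{x} && \text{(ratio estimator)} \\
t(u, v) = u &: \quad \tilde{y}_t = \bar{y}\,\bar{x}/\bar{X} && \text{(product estimator)} \\
t(u, v) = (1 - W) + W u^{-1} &: \quad \tilde{y}_t = (1 - W)\,\bar{y} + W\,\bar{y}\,\bar{X}/\bar{x} && \text{(linear combination)}
\end{align*}
```

Each of these choices satisfies t(1, 1) = 1, as the definition of the class requires.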

By considering the size of the population units with respect to the auxiliary variable, Jhajj and Srivastava (1983) defined a class of estimators of $\bar{Y}$ under PPS sampling. Srivastava and Jhajj (1983a) generalized the class of estimators defined by Srivastava and Jhajj (1981) to the case in which information on more than one auxiliary variable is available. For estimating the population variance $S_y^2$ using information on the population mean $\bar{Z}$ and variance $S_z^2$, Jhajj et al (2005) defined a class of chain estimators of $S_y^2$ for a tri-variate distribution in which variable z is correlated with x and x, in turn, is correlated with y. Jhajj et al (2006) defined two unbiased estimators of the population mean of the study variable by applying a linear transformation to the auxiliary variable, using its extreme values along with its mean in the population, which are generally available in practice. Recently, Jhajj and Walia (2012) proposed a generalized difference-cum-ratio type estimator for estimating the population mean using the double sampling technique. Several authors, viz. Fuller (2002), Van Hees (2002), Hald (2003), Singh and Espejo (2003), Singh and Rani Rashmi (2005), Sahoo et al (2005, 2006), Chao and Chiang (2006), Singh and Priyanka (2007) etc., have also defined sampling strategies for estimating parameters using ratio, ratio-type, regression and regression-type estimators.

The concept of partially balanced incomplete block designs was introduced by Bose and Nair (1939). Bose and Mesner (1959) defined partially balanced incomplete block designs through association schemes and association matrices, giving the concept of association scheme a sound mathematical footing for the study of these designs. They also obtained many interesting results on the combinatorial properties of association schemes. Using the properties of association matrices, Kusumoto (1967) gave the necessary and sufficient condition under which (m+1) symmetric matrices are the association matrices of an association scheme with m associate classes. Several association schemes with more than two associate classes have also been introduced in the literature. Roy (1953-54) and Raghavarao (1960) generalized the two-associate-class group divisible (GD) association scheme to an m-associate-class GD scheme. Vartak (1955, 1959) defined the Kronecker product designs and three-associate-class partially balanced incomplete block designs based on the rectangular association scheme. Hinkelmann and Kempthorne (1963) and Hinkelmann (1964) generalized the rectangular association scheme to the extended group divisible association scheme.

Other well-known association schemes with more associate classes are the three-associate-class extended triangular scheme of John (1966), the four-associate-class right-angular and generalized right-angular association schemes of Tharthare (1963, 1967) and some higher-class association schemes of Adhikary (1966, 1967). Kageyama (1972) gave necessary and sufficient conditions for partially balanced incomplete block designs with m associate classes, constructed by the Kronecker product of some balanced incomplete block designs, to be reducible to partially balanced incomplete block designs with $m_i$ associate classes, where $m_i < m$. Kusumoto and Surendran (1968) defined Kronecker product association schemes in terms of the Kronecker product of the association matrices of two association schemes. Peter (1977) introduced two series of semi-regular group divisible designs. The first has $v = s^N + s^{N-1}$, m = s + 1, k = v/s and the second has v = 6t + 6, m = 3, k = 2t + 2. Both series have $\lambda_1 = \lambda_2 - 1$.

Several authors, viz. Dey and Midha (1974), Dey et al. (1974), Aggarwal (1975, 1977), Saha and

Gauri (1976), Dey (1977), Peter and John (1977), John and Turner (1977), Kageyama (1980),

Aggarwal and Singh (1981), Bhagwandas and Parihar (1980, 1982), Mohan (1983), Nair (1980),

Singla (1983), Bhagwandas et al. (1985), Dey and Nigam (1985), Kageyama and Mohan (1985),

Banerjee et al. (1987), Puri et al. (1987), Banerjee and Kageyama (1988), Kageyama et al. (1989),

Mitra et al. (2002), Sinha et al. (2002a), Sinha et al. (2002b), Kageyama and Sinha (2003), Bagchi

(2004), Garg (2005, 2008), Sinha and Kageyama (2006) and Garg and Gurinder Pal (2011),

constructed different designs under different situations. Aggarwal (1975) gave Modified Latin

Square (MLS) type partially balanced incomplete block designs using a new association scheme

with three associate classes by slightly modifying the definition of the $L_i(S)$ association scheme

given by Bose and Shimamoto (1952) for $s^2$ treatments. Banerjee et al. (1987) constructed two

associate class PBIB designs having a group divisible or $L_2$ type association scheme by using

patterned matrices and methods of taking unions of sets of blocks. Sinha et al. (2002a) gave some

methods of construction of group divisible and nested group divisible designs and defined resolvable

rectangular designs. Sinha et al. (2002b) constructed rectangular designs, some of which were

E-optimal designs. A series of E-optimal designs, two series of group divisible designs and a new

cyclic solution of a group divisible design were also obtained by them. Kageyama and Sinha

(2003) described some new patterned methods of construction of rectangular designs and nested

balanced incomplete block designs. Bagchi (2004) presented a general procedure for

construction of group divisible designs and rectangular designs by utilizing resolvable and

“almost resolvable” balanced incomplete block designs. As a special case, E-optimal

group divisible designs with $\lambda_3 = \lambda_2 - 1 = \lambda_1 + 1$ were obtained. Garg (2005) obtained new partially balanced

incomplete block designs with two (group divisible designs) and three (rectangular designs)

associate classes following certain patterns for their block structures by using the incidence

matrices of certain balanced incomplete block designs. Garg (2008) introduced a new

association scheme with three associate classes for $v = m^2$ treatments by modifying the

Modified Latin Square association scheme of Aggarwal (1975). Further, Garg (2010) defined a

new association scheme named Pseudo New Modified Latin Square type.

So far, a large number of such designs have been defined by various authors. However, there can

still be a situation in which an experimenter will find no design from the designs already defined

that will serve his purpose. This may happen because such lists may not include a design with the

particular parametric combination that may suit him or an available design may provide an

undesirably large number of accuracies with which the individual contrasts are estimated.

The present thesis has been divided into seven chapters.

The first chapter is introductory; the importance of the topic, definitions, terminology used and

literature relating to the topic are given in detail.

In the second chapter, for estimating the population mean Y of the variable under study y, two

ratio type estimators have been proposed by making a linear transformation of the auxiliary

variable x; these are unbiased up to the first order of approximation. The expressions for the biases

and variances of the proposed estimators have been obtained up to the second order of

approximation. It has been found that, up to the first order of approximation, the proposed estimators are

unbiased and as efficient as the existing ones. The comparison of the proposed

estimators with the existing ones has also been made up to the second order of approximation w.r.t.

their mean square errors. The results have also been illustrated numerically and show that the

proposed estimators are almost unbiased and the decrease in efficiency is negligible as compared to

the existing ones.
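As background for the bias discussed above, the following minimal sketch computes the exact bias of the classical ratio estimator (sample mean of y times the population mean of x divided by the sample mean of x) by enumerating every simple random sample drawn without replacement. It does not implement the transformed estimators proposed in the thesis, whose form is not given here; the population values and function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def mean(values):
    """Exact arithmetic mean as a Fraction."""
    return Fraction(sum(values), len(values))


def ratio_estimate(sample_y, sample_x, pop_mean_x):
    """Classical ratio estimate of the population mean of y."""
    return mean(sample_y) / mean(sample_x) * pop_mean_x


def exact_bias(y, x, n):
    """Bias of the ratio estimate over all SRSWOR samples of size n."""
    pop_mean_x = mean(x)
    estimates = [
        ratio_estimate([y[i] for i in s], [x[i] for i in s], pop_mean_x)
        for s in combinations(range(len(y)), n)
    ]
    return mean(estimates) - mean(y)


# Hypothetical population of five units (illustrative values only).
x = [2, 4, 6, 8, 10]
y = [5, 9, 12, 18, 21]
print(float(exact_bias(y, x, 3)))
```

When y is exactly proportional to x, every sample returns the true mean and the bias is zero; otherwise the classical estimator is generally biased, which is the motivation for almost unbiased variants.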

In the third chapter, following the same idea as in Chapter II, an almost unbiased estimator of

the population mean has been proposed when the relationship between the variable under study and

the auxiliary variable is completely linear. The expressions for the bias and mean square error of the

proposed estimator have been obtained up to the second order of approximation and it has been shown

that the proposed estimator is unbiased and as efficient as the regression estimator up to

the first order of approximation. The comparison of the proposed estimator with the regression estimator

has also been made up to the second order of approximation w.r.t. their biases and mean square

errors. The results have also been illustrated numerically.
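For reference, the usual linear regression estimator against which the proposed estimator is compared can be sketched as follows. This is the textbook form (sample mean of y plus the sample slope times the difference between the population and sample means of x), not the thesis's proposed estimator; the data and names are hypothetical.

```python
from fractions import Fraction


def mean(values):
    """Exact arithmetic mean as a Fraction."""
    return Fraction(sum(values), len(values))


def regression_estimate(sample_y, sample_x, pop_mean_x):
    """Linear regression estimate of the population mean of y.

    Raises ZeroDivisionError if all sampled x values are equal.
    """
    y_bar, x_bar = mean(sample_y), mean(sample_x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(sample_x, sample_y))
    s_xx = sum((xi - x_bar) ** 2 for xi in sample_x)
    slope = s_xy / s_xx  # the (n - 1) divisors cancel in the ratio
    return y_bar + slope * (pop_mean_x - x_bar)


# Hypothetical population with an exactly linear relation y = 3 + 2x.
x = [1, 2, 3, 4, 5, 6]
y = [3 + 2 * v for v in x]
sample = [0, 2, 5]
estimate = regression_estimate([y[i] for i in sample], [x[i] for i in sample], mean(x))
print(estimate)  # 10
```

With an exactly linear relation the estimate equals the population mean of y for any sample with at least two distinct x values, which is the situation singled out in the third chapter.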

In the fourth chapter, starting from the association matrices of a group divisible partially

balanced incomplete block design with two associate classes, we obtain some new PBIB designs

with two, three and four associate classes following certain juxtaposition patterns by using some

association matrices and some other types of well known matrices. For the purpose of

completeness, the association scheme of the GD design and one new association scheme have been

discussed. Methods of construction of regular group divisible (RGD) and singular group divisible

designs, and of designs with three and four associate classes following the new association scheme, have also been

discussed and are supported by illustrations. The efficiencies of various treatment contrasts

and average efficiency factor have been calculated for the purpose of comparison.

Lastly, it has been shown that the new designs, constructed as well as listed, are desirable and of

much statistical interest because some of them are more efficient than the existing ones.
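The association matrices of the group divisible scheme that serve as the starting point can be written down directly. The sketch below assumes v = mn treatments numbered 0 to v - 1 and split into m consecutive groups of size n, with two distinct treatments being first associates when they lie in the same group and second associates otherwise. The function name and the choice m = 3, n = 2 are hypothetical.

```python
def gd_association_matrices(m, n):
    """Return (B1, B2) for the group divisible scheme with m groups of size n."""
    v = m * n
    group = [t // n for t in range(v)]  # group label of each treatment
    b1 = [[1 if i != j and group[i] == group[j] else 0 for j in range(v)]
          for i in range(v)]
    b2 = [[1 if group[i] != group[j] else 0 for j in range(v)]
          for i in range(v)]
    return b1, b2


b1, b2 = gd_association_matrices(3, 2)  # hypothetical sizes: 3 groups of 2
# Row sums give the numbers of first and second associates: n - 1 and n(m - 1).
print(sum(b1[0]), sum(b2[0]))  # 1 4
```

Together with the identity matrix, B1 and B2 add up to the all-ones matrix, which is a quick check that every pair of treatments falls in exactly one associate class.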

In the fifth chapter, following the work done by Ramakrishnan (1956) and Mohan (1983),

some new series of triangular and four associate class PBIB designs with two replications have

been constructed by using the dualization technique. The efficiencies of the constructed designs have

been calculated for the purpose of comparison. An effort has also been made to illustrate the

results numerically.
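Dualization interchanges the roles of treatments and blocks, so a design with block size two has a dual with two replications. The following minimal sketch uses a textbook example, assumed here for illustration and not necessarily one of the series constructed in the thesis: the design whose blocks are all pairs of n symbols. Its dual is a triangular type design in which treatments (pairs) sharing a symbol occur together once and disjoint pairs never occur together.

```python
from itertools import combinations


def dual(blocks, v):
    """Dual design: original block j becomes treatment j, original treatment i becomes block i."""
    return [[j for j, block in enumerate(blocks) if i in block] for i in range(v)]


n = 5  # hypothetical number of symbols
pairs = [set(p) for p in combinations(range(n), 2)]  # all-pairs design, block size 2
dual_blocks = dual(pairs, n)

# Replication of each new treatment and the distinct concurrence counts.
replications = [sum(t in block for block in dual_blocks) for t in range(len(pairs))]
concurrences = {
    sum(a in block and b in block for block in dual_blocks)
    for a, b in combinations(range(len(pairs)), 2)
}
print(len(dual_blocks), replications, sorted(concurrences))
# 5 [2, 2, 2, 2, 2, 2, 2, 2, 2, 2] [0, 1]
```

The dual has n(n - 1)/2 treatments in n blocks of size n - 1, each treatment replicated twice.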

In the sixth chapter, the application of design of experiments in sampling has been discussed. For

estimating the population mean, an efficient sampling strategy has been proposed under a controlled

sampling design. A balanced incomplete block design, which is a special case of the partially balanced

incomplete block design, is used for preparing the sampling frame, which gives a quite high

probability of selection of preferred samples as compared to non-preferred samples. The

expressions for the bias and variance of the proposed sampling strategy have been obtained. It has been

shown that the efficiency of the estimator under the given design remains the same as in the case of

the simple random sampling design but the probability of selection of non-preferred samples

becomes less in the proposed sampling design as compared to the conventional simple random

sampling design.
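The reason a balanced incomplete block design can preserve the efficiency of simple random sampling can be checked numerically. The sketch below assumes the simplest version of the idea, in which the blocks of a (7, 3, 1) BIBD are the only admissible samples and one block is drawn with equal probability; it then compares the first and second order inclusion probabilities with those of simple random sampling without replacement. The design, the population size and the function name are illustrative assumptions, and the thesis's own strategy may assign probabilities differently.

```python
from fractions import Fraction
from itertools import combinations

# Blocks of the (7, 3, 1) BIBD (Fano plane); each block is one admissible sample.
BLOCKS = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5),
          (1, 4, 6), (2, 3, 6), (2, 4, 5)]
N, n = 7, 3  # population size and sample size


def inclusion_probabilities(blocks, population_size):
    """First and second order inclusion probabilities when one block is drawn uniformly."""
    p = Fraction(1, len(blocks))
    first = {i: sum(p for b in blocks if i in b) for i in range(population_size)}
    second = {(i, j): sum(p for b in blocks if i in b and j in b)
              for i, j in combinations(range(population_size), 2)}
    return first, second


first, second = inclusion_probabilities(BLOCKS, N)
srs_first = Fraction(n, N)                        # n / N
srs_second = Fraction(n * (n - 1), N * (N - 1))   # n(n - 1) / (N(N - 1))
print(set(first.values()) == {srs_first},
      set(second.values()) == {srs_second})  # True True
```

Because both sets of inclusion probabilities match, the sample mean has the same expectation and variance as under simple random sampling, while only 7 of the 35 possible samples of size 3 can ever be selected.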

In the seventh chapter, a study of primary hypertension among the euglycemic patients of

ischemic heart disease has been carried out. The work is related to the prediction of hypertension.

Two separate multiple linear regression models have been used, taking systolic blood

pressure and diastolic blood pressure as dependent variables and body mass index, triglyceride,

low density lipoprotein-cholesterol and high density lipoprotein-cholesterol as independent

variables (independent risk factors), to study essential hypertension among the euglycemic

patients of ischemic heart disease. Adequacy of the models has been verified using analysis of

variance (F-test), histogram, Q-Q plot and box-and-whisker plot. In this chapter, an effort has been

made to analyze the data and interpret them with the help of the statistical software package SPSS 14.0.