
8/2/2019 Conversion of Ordinal Attitudinal Scales

Quant Mark Econ. DOI 10.1007/s11129-011-9116-1

Conversion of ordinal attitudinal scales: An inferential Bayesian approach

Michael Evans · Zvi Gilula · Irwin Guttman

Received: 22 February 2011 / Accepted: 6 November 2011
© Springer Science+Business Media, LLC 2011

Abstract The need for scale conversion may arise whenever an attitude of individuals is measured by independent entrepreneurs, each using an ordinal scale of its own with possibly different numbers of (arbitrary) ordinal categories. Such situations are quite common in the marketing realm. The conversion of a score of an individual measured on one scale into an estimated score on a similar scale with a different range is the concern of this paper. An inferential Bayesian approach is adopted to analyze the situation where we believe the scale with fewer categories can be obtained by collapsing the finer scale. This leads to inferences concerning rules for the conversion of scales. Further, we propose a method for testing the validity of such a model. The use of the proposed methodology is exemplified on real data from surveys concerning performance evaluation and satisfaction.

Keywords Ordinal scales · Collapsing scales · Scale conversions · Bayesian inference

    JEL Classification C11

M. Evans · I. Guttman
Department of Statistics, University of Toronto, Toronto, ON, Canada M5S 3G3
e-mail: [email protected]

Z. Gilula (B)
Department of Statistics, Hebrew University, Jerusalem 91905, Israel
e-mail: [email protected]

I. Guttman
Department of Statistics, State University of New York, Buffalo, NY 14214, USA
e-mail: [email protected]


    1 Introduction

The need for scale conversion may arise whenever an attitude of individuals is measured by independent entrepreneurs, each using an ordinal scale of his/her own with a possibly different number of (arbitrary) ordinal categories. In the marketing realm, it is not uncommon to encounter situations where satisfaction of customers is measured independently by several competing marketing research companies. Each company has its own satisfaction scale, and they all use (non-identical) independent samples of customers from the same population.

Although scales may differ by the number of ordinal categories used, they may all share the same structure, where the lowest category refers to very dissatisfied and the last category refers to most satisfied. Another, not uncommon, situation arises when a company that used to measure the satisfaction of its own customers by an R-category scale is now employing a scale of only K < R categories for a variety of reasons (this has been the case with several universities measuring student satisfaction with courses attended).

The problem of how many ordinal categories an attitudinal scale should have is an old one; see for instance Miller (1956) and the classical paper of Green and Rao (1970). Although some compelling scientific arguments are mentioned in the literature as to some optimal aspects of ordinal scales, there is no universal consensus as to the number of categories to use. As a result, situations as described above occur quite frequently.

It might be of interest to be able to answer the question: can we systematically convert a score of an individual measured on a given scale into an estimated score on a similar-in-nature scale but with a different number of categories? An attempt to answer this question is the concern of this paper.

The methodology proposed in this paper assumes the following:

(a) All the underlying scales measure the same kind of attitude (satisfaction, say). We do NOT propose to convert a satisfaction score measured by one scale to an agreement score measured by another scale.

(b) All the underlying scales are ordinal-categorical scales (and not interval or ratio scales). Ordinality of categories reflects the ordinal nature of an attitude as a concept (more, most, better, best, etc.).

(c) All samples measured on these scales come from the same population and no individual is sampled more than once.

Suppose then that we have two scales, labelled I and II, that are presumably measuring the same phenomenon. Scale I classifies responses into $R$ ordered values and scale II classifies responses into $K$ ordered values, and suppose that $R > K$.


Let $X \in \{1, \ldots, R\}$ and $Y \in \{1, \ldots, K\}$ denote the values taken by a randomly selected population element on scale I and scale II respectively. Let $p_i = P(X = i)$, for $i = 1, \ldots, R$, and $q_j = P(Y = j)$, for $j = 1, \ldots, K$, denote the distributions of these variables over the population in question. Of course, these distributions are unknown and we suppose that data have been collected from sampling studies to make inferences about these distributions. Let $(f_1, \ldots, f_R)$, with $\sum_{i=1}^R f_i = n$, denote the counts from the sampling study involving scale I and $(g_1, \ldots, g_K)$, with $\sum_{j=1}^K g_j = m$, denote the counts from the sampling study involving scale II.

Our problem is concerned with determining how we should combine the results from the two studies. We take the point of view that our aim should be collapsing, in some fashion, scale I into scale II and then combining our inferences. In the typical application it is not at all clear how this collapsing should take place. We would like this collapsing to depend on the data and proceed according to sound inferential principles rather than follow some ad hoc procedure.

In Section 2 we formulate a model for the collapsing. This is perhaps the simplest model for investigating the possibility that two scales are in effect identical, as the coarser scale is obtained from the finer scale by a direct mapping of whole categories on the finer scale to whole categories on the coarser scale. Of course, we could imagine a much more complicated relationship where categories on the coarser scale are split, and our approach does not address this possibility. So a conclusion that there is no collapsing is not an assertion that the two scales are not effectively identical. We note, however, that a model for the more elaborate situation will require more assumptions, as our model only needs the assumption of two independent samples. Furthermore, inferences for the more involved model will still require an assessment of whether or not a collapsing is feasible, and we believe that this will entail using the methods we describe here. We make some more comments on this in Section 7.

We find it most convenient to place our approach in a Bayesian context, and include discussion of appropriate priors for this problem in Section 2. Also in Section 2 we discuss a computational approach for implementing a Bayesian analysis and provide a small artificial example. In Section 3 we discuss the basic Bayesian inferences that follow from the model and provide the results of a simulation study to demonstrate the effectiveness of our approach. In Section 4 we show how inferences about the true collapsing lead to rules for converting one scale to another and to an assessment of the uncertainty associated with such rules. In Section 5 we consider methods for checking the model specified in Section 2. In Section 6 we apply these results to a problem of some practical importance, and we draw some conclusions in Section 7. We note that we do not compare our approach to any existing methodology as, surprisingly, there does not appear to be any literature on this problem.


    2 The model, the prior, and the posterior

Given that we have observed $(f_1, \ldots, f_R)$ and $(g_1, \ldots, g_K)$, the likelihood for the unknown $p$ and $q$ is given by

$$L(p, q \mid f, g) = \prod_{i=1}^{R} p_i^{f_i} \prod_{j=1}^{K} q_j^{g_j}, \qquad (1)$$

where $p \in S_R$, $q \in S_K$, and $S_N$ denotes the $(N-1)$-dimensional simplex. Let $\pi_R$ denote a prior on $p$. Typically, we will take this prior to be a uniform prior, i.e., $\text{Dirichlet}_R(1, \ldots, 1)$, to reflect noninformativity, but we will also consider a general $\text{Dirichlet}_R(a_1, \ldots, a_R)$.

Now suppose that $p$ is given. Of course, we do not know these values, but we will effectively put a prior on the set of all possible collapsings assuming that we know $p$. In other words, we will specify the prior hierarchically by specifying a prior for the collapsings given the value of $p$, and then place a marginal prior on $p$ as we have already specified.

First we develop a model for the collapsing. Let $(P_1, \ldots, P_K)$ denote an ordered partition of $\{1, \ldots, R\}$ into $K$ disjoint subsets with $\bigcup_{i=1}^K P_i = \{1, \ldots, R\}$ such that $P_i \neq \emptyset$ for all $i$ and, if $1 \le i < j \le K$, then $P_i = \{s, s+1, \ldots, t\}$, $P_j = \{u, u+1, \ldots, v\}$ with $1 \le s \le t < u \le v \le R$. The set $P_i$ consists of those categories on scale I that are collapsed to form category $i$ on scale II. Note that we have restricted to those collapsings for which no null sets are allowed in the partition, because we know that we can observe observations in each category of scale II. Note that the number of such partitions $(P_1, \ldots, P_K)$ equals the number of solutions of $z_1 + \cdots + z_K = R$ in positive integers $z_1, \ldots, z_K$. Therefore, there are $\binom{R-1}{K-1}$ such partitions. Typically, for the applications we have in mind, this number will not be too large.

It turns out to be convenient for tabulation purposes to denote an ordered

partition $(P_1, \ldots, P_K)$ by the notation $(i_1, \ldots, i_K)$, where $1 \le i_1 < i_2 < \cdots < i_K = R$ and $i_j$ is the largest element of $P_j$; we write $T_{R,K}$ for the set of all such partitions. Under the collapsing model, the scale II probabilities are $q_j = \sum_{l \in P_j} p_l$. For the prior on the collapsings, given $p$, we take

$$\pi(P_1, \ldots, P_K \mid p) \propto \prod_{j=1}^{K} \Big( \sum_{l \in P_j} p_l \Big)^{b_j}, \qquad (5)$$

where $b_j > -1$, $j = 1, \ldots, K$, and $p \sim \text{Dirichlet}_R(a_1, \ldots, a_R)$. Then we have that

$$\pi(P_1, \ldots, P_K) \propto \prod_{j=1}^{K} \frac{\Gamma\big(b_j + \sum_{l \in P_j} a_l\big)}{\Gamma\big(\sum_{l \in P_j} a_l\big)} \qquad (6)$$

and

$$\pi(P_1, \ldots, P_K \mid f, g) \propto \prod_{j=1}^{K} \frac{\Gamma\big(b_j + g_j + \sum_{l \in P_j} (f_l + a_l)\big)}{\Gamma\big(\sum_{l \in P_j} (f_l + a_l)\big)}. \qquad (7)$$

Note that the uniform prior on $(P_1, \ldots, P_K)$ corresponds to $b_1 = \cdots = b_K = 0$. If $a_1 = \cdots = a_R = 1$, i.e., we place a uniform prior on $p$, and $b_1 > 0$ while $b_2 = \cdots = b_K = 0$, then partitions that have more elements in $P_1$ will receive more prior weight than partitions with fewer elements in $P_1$. Similar interpretations can be made for other choices of the $b_j$, although they are difficult to describe precisely. It would seem, however, that if we want to emphasize partitions that place more elements in certain cells of the partition, then we should choose the corresponding $b_j$ relatively large. We will hereafter refer to the prior given in Eq. 5 by the $K$-tuple $[b_1, \ldots, b_K]$ whenever the prior on $p$ is uniform. So, for example, $[0, \ldots, 0]$ corresponds to the uniform prior on the set of partitions and the uniform prior on $S_R$.

We can systematically tabulate any prior and posterior via Eqs. 6 and 7 by using the method we have previously described for listing the elements of $T_{R,K}$. The following example illustrates this.

Table 1 Several priors on partitions in Example 2

partition \ prior   [0, 0, 0]   [10, 0, 0]   [10, 10, 0]   [1, 1, 1]   [-.5, 0, 0]
(1, 2, 5)           .167        .011         .004          .143        .229
(1, 3, 5)           .167        .011         .040          .190        .229
(1, 4, 5)           .167        .011         .239          .143        .229
(2, 3, 5)           .167        .121         .040          .190        .114
(2, 4, 5)           .167        .121         .438          .190        .114
(3, 4, 5)           .167        .725         .239          .143        .086


Table 2 Posterior probabilities on partitions in Example 2

partition \ prior   [0, 0, 0]   [10, 0, 0]   [10, 10, 0]   [1, 1, 1]   [-.5, 0, 0]
(1, 2, 5)           .007        .001         .000          .006        .007
(1, 3, 5)           .189        .018         .037          .185        .208
(1, 4, 5)           .181        .017         .162          .167        .199
(2, 3, 5)           .200        .176         .059          .208        .189
(2, 4, 5)           .398        .351         .714          .409        .376
(3, 4, 5)           .025        .437         .028          .024        .020

Example 2 (Tabulating priors and posteriors in Example 1.) In Table 1 we have tabulated several priors for the situation described in Example 1. Note that the left column lists the elements of $T_{5,3}$, while the first row specifies several priors. From this we can see that the larger the value of $b_i > 0$ is, the more prior weight is being placed on partitions that have more elements in $P_i$, while the closer $b_i$ is to $-1$, the more prior weight is being placed on partitions that place fewer elements in $P_i$.

Suppose we observe $(f_1, f_2, f_3, f_4, f_5) = (4, 3, 6, 3, 7)$ and $(g_1, g_2, g_3) = (7, 9, 7)$. Note that $(g_1, g_2, g_3)$ corresponds exactly to partition $(2, 4, 5)$. In Table 2 we have tabulated the posterior probabilities for the priors in Table 1. Notice that the correct partition $(2, 4, 5)$ is the posterior mode for each analysis except that using the prior $[10, 0, 0]$. This prior puts too much weight on $P_1$ having most of the collapsing, and so $(3, 4, 5)$ results as the mode. Still, for a relatively small amount of data the methodology is clearly on the right track in identifying suitable collapsings. Also worth noting is that the most incorrect collapsing, given by $(1, 2, 5)$, is always given low posterior probability. The benefit of having appropriate strong prior information is clearly demonstrated by the prior $[10, 10, 0]$.
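The tabulation of Eqs. 6 and 7 over $T_{R,K}$ can be sketched in a few lines of Python. This is a minimal sketch of our own (the function names are assumptions, not the authors' code), using log-gamma arithmetic for numerical stability:

```python
from itertools import combinations
from math import lgamma, exp, comb

def partitions(R, K):
    """All ordered partitions in T_{R,K}, encoded as (i_1, ..., i_K)
    where i_j is the largest element of the interval block P_j."""
    return [c + (R,) for c in combinations(range(1, R), K - 1)]

def blocks(part):
    """Recover the interval blocks P_1, ..., P_K from (i_1, ..., i_K)."""
    lo, out = 1, []
    for hi in part:
        out.append(range(lo, hi + 1))
        lo = hi + 1
    return out

def log_weight(part, a, b, f=None, g=None):
    """Unnormalized log prior (Eq. 6); with data f, g, log posterior (Eq. 7)."""
    total = 0.0
    for j, P in enumerate(blocks(part)):
        s = sum(a[l - 1] + (f[l - 1] if f else 0) for l in P)
        total += lgamma(b[j] + (g[j] if g else 0) + s) - lgamma(s)
    return total

def normalize(parts, logw):
    """Exponentiate and normalize a list of log weights."""
    mx = max(logw)
    w = [exp(x - mx) for x in logw]
    t = sum(w)
    return {p: x / t for p, x in zip(parts, w)}

R, K = 5, 3
parts = partitions(R, K)
assert len(parts) == comb(R - 1, K - 1)   # binomial(4, 2) = 6 partitions

a = [1] * R                # uniform Dirichlet_R prior on p
b = [0] * K                # uniform prior [0, 0, 0] on the partitions
f, g = [4, 3, 6, 3, 7], [7, 9, 7]

prior = normalize(parts, [log_weight(p, a, b) for p in parts])
post = normalize(parts, [log_weight(p, a, b, f, g) for p in parts])
mode = max(post, key=post.get)
print(mode, round(post[mode], 3))   # (2, 4, 5) 0.398
```

With the data of Example 2 and the uniform prior this recovers the posterior mode $(2, 4, 5)$ with posterior probability .398, in agreement with Table 2.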

    3 Inferences for collapsings

One possibility for inferences is to use those based on the highest posterior density (hpd) principle (see Box and Tiao 1973 for a discussion of various Bayesian inference terms). For this we record a $\gamma$-credible region of the form

$$B_\gamma(f, g) = \{(P_1, \ldots, P_K) : \pi(P_1, \ldots, P_K \mid f, g) > k_\gamma(f, g)\}$$

where $k_\gamma(f, g)$ is the largest value such that $\Pi(B_\gamma(f, g) \mid f, g) \ge \gamma$, with $\Pi(\cdot \mid f, g)$ denoting posterior measure. The size of such regions, say for $\gamma = .95$, then tells us something about the accuracy of our inferences. This gives a nested sequence of subsets, indexed by $\gamma$, with the smallest set containing only the posterior mode.

In general, inferences based on the highest posterior density principle suffer from a lack of invariance. For example, if we wanted to make inference about the continuous parameters $p$, or $q$ as derived from $p$ (see Section 4), then a


$\gamma$-hpd region does not transform appropriately under 1-1, smooth transformations. A class of inferences that are invariant under such transformations is given by the relative surprise principle (see Evans et al. 2006 and Evans and Shakhatreh 2008 for discussion of relative surprise inferences). A $\gamma$-relative surprise credible region for the true partition is of the form

$$C_\gamma(f, g) = \left\{ (P_1, \ldots, P_K) : \frac{\pi(P_1, \ldots, P_K \mid f, g)}{\pi(P_1, \ldots, P_K)} \ge k_\gamma(f, g) \right\}$$

and $k_\gamma(f, g)$ is the largest value such that $\Pi(C_\gamma(f, g) \mid f, g) \ge \gamma$. We see that the relative belief ratio (see Evans and Shakhatreh 2008) for $(P_1, \ldots, P_K)$, given by

$$RB(P_1, \ldots, P_K) = \pi(P_1, \ldots, P_K \mid f, g) / \pi(P_1, \ldots, P_K),$$

is a measure of how belief in the validity of the partition $(P_1, \ldots, P_K)$ has changed from a priori to a posteriori. Note that a relative belief ratio is similar to a Bayes factor, which is the ratio of the posterior odds to the prior odds, as both are measuring the change in belief from a priori to a posteriori. The relative belief ratio measures change in belief on the probability scale, while the Bayes factor measures change in belief on the odds scale.

Accordingly, $C_\gamma(f, g)$ contains all the partitions whose relative belief ratios are above some cut-off. The partition that maximizes the relative belief ratio is referred to as the least relative surprise estimate (LRSE) (Evans, Guttman and Swartz 2006), where the terminology "least" in LRSE is explained after Eq. 9. The computations of the LRSE and the credible regions $C_\gamma(f, g)$ are carried out in a fashion similar to computing a posterior mode and hpd regions. For the examples in this paper, we tabulated each possible value of the posterior density (using Eq. 7) and the prior density (using Eq. 6), tabulated the values of the relative belief ratio using these quantities, and obtained the LRSE by searching the list to find where this ratio is maximized. To find credible regions, namely, to compute $k_\gamma(f, g)$ in $C_\gamma(f, g)$, we proceeded by first ordering the partitions by the value of their relative belief ratios from largest to smallest. Starting with the LRSE, which has the largest relative belief ratio, we added partitions to the set $C_\gamma(f, g)$, going down the ordered list, until the sum of the posterior probabilities of the added partitions is just greater than or equal to $\gamma$. The relative belief ratio corresponding to the last partition added to $C_\gamma(f, g)$ gives the value of $k_\gamma(f, g)$. So the credible regions $C_\gamma(f, g)$ are nested, namely, $C_{\gamma_1}(f, g) \subseteq C_{\gamma_2}(f, g)$ whenever $\gamma_1 \le \gamma_2$, with the smallest nonempty one containing only the LRSE. Note that the LRSE partition is the same as the maximum a posteriori (MAP) partition, i.e., the mode of the posterior, whenever the prior $\pi(P_1, \ldots, P_K)$ is uniform, but otherwise these estimates can be different, as demonstrated in Example 3.
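The ordering-and-accumulation construction just described can be sketched as follows. This is our own minimal sketch, run on the rounded entries of Tables 1 and 2 for the uniform prior, with $\gamma = 0.99$ for illustration:

```python
def credible_region(prior, post, gamma):
    """Rank partitions by relative belief ratio RB = posterior / prior,
    then accumulate posterior mass down the ranking until it reaches
    gamma; the RB of the last partition added is the cut-off k_gamma."""
    rb = {p: post[p] / prior[p] for p in post}
    region, mass = [], 0.0
    for p in sorted(rb, key=rb.get, reverse=True):
        region.append(p)
        mass += post[p]
        if mass >= gamma:
            return region, rb[p]
    return region, 0.0

# Rounded prior ([0, 0, 0] column of Table 1) and posterior (Table 2).
parts = [(1, 2, 5), (1, 3, 5), (1, 4, 5), (2, 3, 5), (2, 4, 5), (3, 4, 5)]
prior = dict(zip(parts, [1 / 6] * 6))
post = dict(zip(parts, [.007, .189, .181, .200, .398, .025]))

region, k = credible_region(prior, post, 0.99)
print(region[0])        # (2, 4, 5) -- the LRSE heads the ranking
print(sorted(region))   # every partition except (1, 2, 5)
print(round(k, 3))      # 0.15
```

The resulting five-partition region (posterior content .993, cut-off 0.15) is the region reported in Example 3.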

    In the cited references various optimality properties have been proved forrelative surprise inferences in the class of all Bayesian inferences. Note, in


particular, that because relative surprise inferences are based on relative belief ratios, we can expect these inferences to be less dependent on the prior than, say, hpd inferences, as we are dividing the posterior by the prior. In fact, it can be proven that the relative belief ratio is independent of the choice of the marginal prior $\pi(P_1, \ldots, P_K)$.

Consider now inferences for Example 1.

Example 3 (Estimating the true partition in Example 2.) In Table 3 we have given the relative belief ratios $RB(P_1, \ldots, P_K)$ for the problem discussed in Example 2. These are obtained by dividing the entries in Table 2 by the corresponding entries in Table 1. Notice that the LRSE is given by the correct partition $(2, 4, 5)$ in every case, while the mode failed for the prior $[10, 0, 0]$. This is characteristic of relative surprise inferences, as the prior has less influence on these inferences than on other Bayesian inferences.

From the posterior probabilities in Table 2 and the relative belief ratios in Table 3, we can construct $C_\gamma(f, g)$. For example, when the prior is $[0, 0, 0]$, then

$$C_{.95}(f, g) = \{(1, 3, 5), (1, 4, 5), (2, 3, 5), (2, 4, 5), (3, 4, 5)\}, \qquad (8)$$

i.e., every partition but $(1, 2, 5)$ is included, and $k_{.95}(f, g) = 0.150$. This is not very informative, but then we do not have a lot of data. Actually, from Table 2, the posterior content of $C_{.95}(f, g)$ is 0.993 in this case.

Suppose we wished to assess the hypothesis that a certain partition was true, say

$$H_0 : (P_1, \ldots, P_K) = (P_1^0, \ldots, P_K^0),$$

i.e., we want to assess whether or not the actual collapsing is given by $(P_1^0, \ldots, P_K^0)$, as prescribed by some theory. This can be assessed by computing

$$\gamma^* = \inf\{\gamma : (P_1^0, \ldots, P_K^0) \in C_\gamma(f, g)\}$$

and concluding that we have evidence against $H_0$ when $\gamma^*$ is near 1. Equivalently, we can compute the relative surprise tail probability given by

$$\Pi\left( \frac{\pi(P_1, \ldots, P_K \mid f, g)}{\pi(P_1, \ldots, P_K)} \le \frac{\pi(P_1^0, \ldots, P_K^0 \mid f, g)}{\pi(P_1^0, \ldots, P_K^0)} \,\Big|\, f, g \right), \qquad (9)$$

Table 3 Relative belief ratios for partitions in Example 2

partition \ prior   [0, 0, 0]   [10, 0, 0]   [10, 10, 0]   [1, 1, 1]   [-.5, 0, 0]
(1, 2, 5)           0.042       0.091        0.000         0.042       0.031
(1, 3, 5)           1.132       1.636        0.925         0.974       0.908
(1, 4, 5)           1.084       1.546        0.678         1.168       0.869
(2, 3, 5)           1.198       1.455        1.475         1.095       1.658
(2, 4, 5)           2.383       2.901        1.630         2.153       3.298
(3, 4, 5)           0.150       0.603        0.117         0.168       0.233


i.e., the posterior probability of obtaining a relative belief ratio no larger than that obtained for the hypothesized value. Small values of Eq. 9 provide evidence against $(P_1^0, \ldots, P_K^0)$ in the sense that such a value is surprising in light of the data we have observed. Note that if we put $(P_1^0, \ldots, P_K^0)$ equal to the LRSE, then Eq. 9 equals 1, and so the LRSE is the least surprising partition.

Example 4 (Testing a partition in Example 2.) Suppose we want to test the null hypothesis $H_0$ that the true partition is $(2, 4, 5)$ when the prior is $[0, 0, 0]$. Then it is clear that the tail probability is 1 and we have no evidence against $H_0$. If we tested the null hypothesis $H_0$ that the true partition is $(1, 2, 5)$ when the prior is $[0, 0, 0]$, then the tail probability is $1 - 0.993 = 0.007$ and we have substantial evidence against $H_0$.
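For a discrete parameter, Eq. 9 reduces to summing the posterior probabilities of the partitions whose relative belief ratio does not exceed that of the hypothesized one. A sketch of our own on the rounded Example 2 numbers from Tables 1 and 2:

```python
def tail_probability(prior, post, p0):
    """Eq. 9 for a discrete parameter: the posterior probability of the
    partitions whose relative belief ratio does not exceed that of p0."""
    rb = {p: post[p] / prior[p] for p in post}
    return sum(post[p] for p in post if rb[p] <= rb[p0])

parts = [(1, 2, 5), (1, 3, 5), (1, 4, 5), (2, 3, 5), (2, 4, 5), (3, 4, 5)]
prior = dict(zip(parts, [1 / 6] * 6))                          # prior [0, 0, 0]
post = dict(zip(parts, [.007, .189, .181, .200, .398, .025]))  # Table 2

print(round(tail_probability(prior, post, (2, 4, 5)), 3))  # 1.0: no evidence against
print(round(tail_probability(prior, post, (1, 2, 5)), 3))  # 0.007: evidence against
```

The two printed values are the tail probabilities computed in Example 4.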

We consider now the general effectiveness of our methodology. For this we focus on the performance of the LRSE when there is a collapsing. It can be proved (see Evans and Jang 2011) that, for any problem where we are estimating a quantity $\psi$ that takes on one of finitely many values, the LRSE is a Bayes rule with respect to the loss function $I(a \ne \psi)/\pi(\psi)$, where $\pi(\psi)$ is the prior probability that $\Psi = \psi$ and $I(a \ne \psi) = 1$ when $a \ne \psi$ and is 0 otherwise. For this loss function, the prior risk of an estimator $\hat{\psi}(x)$ (the average loss under the sampling model for the data $x$ and the prior) is equal to $\sum_\psi M(\hat{\psi}(X) \ne \psi \mid \Psi = \psi)$, where the sum is taken over all the possible values of $\psi$ and $M(\hat{\psi}(X) \ne \psi \mid \Psi = \psi)$ is the conditional prior probability that the estimator doesn't equal $\psi$ given that $\Psi = \psi$. So selecting $\hat{\psi}$ to be the LRSE minimizes the sum of the conditional probabilities of making an error. By contrast, the posterior mode, or MAP estimator, minimizes the unconditional prior probability of making an error, namely, $\sum_\psi \pi(\psi) M(\hat{\psi}(X) \ne \psi \mid \Psi = \psi)$.

The advantage of the LRSE is that it treats all the possible values of $\psi$ equivalently and doesn't weight them by their prior probabilities. In cases where we inadvertently assign a relatively small prior probability to the true value of $\psi$, the LRSE will perform much better than the MAP estimator. Of course, they are equivalent when the prior weights are uniform. Note that it is equivalent to say that the LRSE maximizes $\sum_\psi M(\hat{\psi}(X) = \psi \mid \Psi = \psi)$, namely, the sum of the conditional probabilities of not making an error.

The following example examines the performance of the LRSE for the estimation of the correct collapsing by computing the probabilities $c_\psi = M(\psi_{LRSE}(X) = \psi \mid \Psi = \psi)$ for $\psi \in T_{R,K}$, where $\psi_{LRSE}$ is the LRSE partition and $X$ is comprised of $(f_1, \ldots, f_R)$ and $(g_1, \ldots, g_K)$. So $c_{(i_1, \ldots, i_K)}$ is the conditional prior probability, given that the true partition is the one specified by $(i_1, \ldots, i_K)$, that the LRSE will equal this partition. The larger the value of $c_{(i_1, \ldots, i_K)}$, the better is the performance of the LRSE.

Example 5 (A simulation study.) In Table 4 we present the values of $c_{(i_1, \ldots, i_K)}$ for several cases where $(p_1, \ldots, p_R) \sim \text{Dirichlet}(1, 1, \ldots, 1)$ and $(P_1, \ldots, P_K)$ has a uniform prior. Since $(f_1, \ldots, f_R) \sim \text{Multinomial}(n, p_1, \ldots, p_R)$, we have $(g_1, \ldots, g_K) \sim \text{Multinomial}\big(m, \sum_{i \in P_1} p_i, \ldots, \sum_{i \in P_K} p_i\big)$ for a specified


Table 4 Results of the simulation study in Example 5 of the performance of the LRSE of the true collapsing when using uniform priors

(R, K)   (m, n)       c(i1,...,iK)
(3, 2)   (5, 5)       c(1,3) = 0.755, c(2,3) = 0.734
(3, 2)   (5, 10)      c(1,3) = 0.756, c(2,3) = 0.756
(3, 2)   (20, 20)     c(1,3) = 0.836, c(2,3) = 0.833
(3, 2)   (20, 50)     c(1,3) = 0.852, c(2,3) = 0.853
(3, 2)   (500, 500)   c(1,3) = 0.961, c(2,3) = 0.961
(5, 3)   (10, 10)     c(1,2,5) = 0.611, c(1,3,5) = 0.340, c(1,4,5) = 0.560, c(2,3,5) = 0.403, c(2,4,5) = 0.337, c(3,4,5) = 0.602
(5, 3)   (500, 500)   c(1,2,5) = 0.929, c(1,3,5) = 0.826, c(1,4,5) = 0.879, c(2,3,5) = 0.887, c(2,4,5) = 0.824, c(3,4,5) = 0.929
(10, 3)  (50, 50)     min(c(i1,i2,10)) = 0.181, max(c(i1,i2,10)) = 0.644
(10, 3)  (500, 500)   min(c(i1,i2,10)) = 0.523, max(c(i1,i2,10)) = 0.872
(10, 7)  (50, 50)     min(c(i1,...,10)) = 0.145, max(c(i1,...,10)) = 0.603
(10, 7)  (500, 500)   min(c(i1,...,10)) = 0.516, max(c(i1,...,10)) = 0.885
(20, 2)  (50, 50)     min(c(i1,20)) = 0.211, max(c(i1,20)) = 0.702
(20, 2)  (500, 500)   min(c(i1,20)) = 0.476, max(c(i1,20)) = 0.853

Here c(i1,...,iK) is the conditional prior probability, given that the true partition is the one specified by (i1, ..., iK), that the LRSE will equal this partition; the max and min refer to the maximum and minimum values of these prior probabilities over all the possible partitions for a particular choice of R and K.

$(P_1, \ldots, P_K)$. Note that we estimated the $c_{(i_1, \ldots, i_K)}$ using Monte Carlo based on a sample of $10^4$. The estimates are felt to be accurate to about .02.

The case $(R, K) = (3, 2)$ is the simplest possibility, as $\#(T_{3,2}) = 2$. We see that even for very small sample sizes the LRSE is very accurate, as it is correct about 75% of the time when $(m, n) = (5, 5)$ and is correct 96% of the time when $(m, n) = (500, 500)$. This probability increases as we increase $n$ or $m$. Similar results are reported for the case $(R, K) = (5, 3)$, where $\#(T_{5,3}) = 6$. When $(R, K) = (10, 3)$ then $\#(T_{10,3}) = 36$, and when $(R, K) = (10, 7)$ then $\#(T_{10,7}) = 84$, so we cannot list all the $c_{(i_1, \ldots, i_K)}$ and have reported only the maximum and minimum values. Note that, when $(R, K) = (10, 3)$ and $(m, n) = (50, 50)$, the median value of $c_{(i_1, i_2, i_3)}$ is 0.305, and when $(m, n) = (500, 500)$ the median value of $c_{(i_1, i_2, i_3)}$ is 0.623; when $(R, K) = (10, 7)$ and $(m, n) = (50, 50)$, the median value of $c_{(i_1, \ldots, i_7)}$ is 0.268 and is 0.670 when $(m, n) = (500, 500)$.

For other priors on $p$ the results are similar. As one would expect, an informative prior (5) improves performance for those partitions that receive more prior mass, in the sense that $c_{(i_1, \ldots, i_K)}$ increases compared to when the uniform prior is used, while $c_{(i_1, \ldots, i_K)}$ drops for those partitions for which the prior probability is lower than with the uniform.
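The Monte Carlo scheme of Example 5 can be sketched as follows for the smallest case $(R, K) = (3, 2)$, $(m, n) = (5, 5)$. This is our own minimal reimplementation under the stated uniform priors (so the LRSE is the posterior mode), not the authors' code, and ties in the posterior are broken arbitrarily:

```python
import random
from math import lgamma

def blocks(part):
    """Interval blocks P_1, ..., P_K from the encoding (i_1, ..., i_K)."""
    lo, out = 1, []
    for hi in part:
        out.append(range(lo, hi + 1))
        lo = hi + 1
    return out

def log_post(part, f, g):
    """Log posterior (Eq. 7) with uniform priors (all a_l = 1, b_j = 0)."""
    t = 0.0
    for j, P in enumerate(blocks(part)):
        s = len(P) + sum(f[l - 1] for l in P)
        t += lgamma(g[j] + s) - lgamma(s)
    return t

def multinomial(n, probs, rng):
    """Draw multinomial counts by sampling n categories."""
    counts = [0] * len(probs)
    for i in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[i] += 1
    return counts

rng = random.Random(1)
R, n, m = 3, 5, 5
parts = [(1, 3), (2, 3)]
true = (1, 3)                       # true collapsing: P_1 = {1}, P_2 = {2, 3}
hits, reps = 0, 2000
for _ in range(reps):
    raw = [rng.gammavariate(1, 1) for _ in range(R)]
    p = [x / sum(raw) for x in raw]                  # p ~ Dirichlet(1, 1, 1)
    f = multinomial(n, p, rng)
    q = [sum(p[l - 1] for l in P) for P in blocks(true)]
    g = multinomial(m, q, rng)
    lrse = max(parts, key=lambda pt: log_post(pt, f, g))  # posterior mode here
    hits += (lrse == true)
print(round(hits / reps, 2))        # roughly 0.75 (Table 4: c_(1,3) = 0.755)
```

With 2000 replications the estimate lands near the 0.755 reported in Table 4, up to Monte Carlo error and tie-breaking.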

It should be noted that the required simulation time becomes exorbitant as $\#(T_{R,K})$ increases, as we need to estimate $c_{(i_1, \ldots, i_K)}$ for each $(i_1, \ldots, i_K)$, and this constrains the range of values of $R$ and $K$ for which we can obtain simulation results. There is another effect due to the increase in $\#(T_{R,K})$. As illustrated in Table 4, the greater $\#(T_{R,K})$ is, the more difficult it is for the LRSE to exactly equal the true partition. Perhaps this is partly due to the greater number of possibilities for a partition, but it also appears that there is an intrinsic complexity to a particular partition as well. This is illustrated in the case $(R, K) = (20, 2)$, $(m, n) = (50, 50)$, where we obtained the following values for the $c_{(i_1, i_2)}$, namely,

$$(c_{(1,20)}, c_{(2,20)}, \ldots, c_{(19,20)}) = (0.702, 0.375, 0.294, 0.257, 0.245, 0.244, 0.230, 0.212, 0.212, 0.214, 0.212, 0.212, 0.219, 0.227, 0.248, 0.268, 0.303, 0.370, 0.702).$$

We see that splits near the ends are easier to estimate than splits in the middle. A plausible conjecture for this is that there are simply more splits in or near the middle than at the ends.

We note also that the performance of the LRSE is undoubtedly much better from a practical point of view than the results in Table 4 indicate. This is because the results displayed in Table 4 make no reference to how close the LRSE is to the true partition. Clearly, if the LRSE differs from the true partition in only a few partition elements, then this could be viewed as satisfactory performance. It isn't completely clear, however, what to use as the measure of distance between partitions in general. For example, when $(R, K) = (20, 2)$, $(m, n) = (50, 50)$, we obtained 0.540 for the conditional probability that the LRSE was in $\{(9, 20), (10, 20), (11, 20)\}$, given that the true partition is $(10, 20)$, which is much higher than the conditional probability of 0.214 recorded for the LRSE being exactly equal to $(10, 20)$. Further, we obtained 0.756 for the conditional probability that the LRSE is in $\{(8, 20), (9, 20), (10, 20), (11, 20), (12, 20)\}$, given that the true partition is $(10, 20)$. While a measure of distance seems obvious in this case, for a general choice of $(R, K)$ more study is required to determine an appropriate distance measure. Accordingly, we leave the study of this aspect of the performance of the LRSE to a further investigation, but note that the simulations indicate that the LRSE should do very well from this enlarged perspective.

    4 Conversion of scales

A problem of major interest is how we should convert one scale to another when we believe that one scale is a collapsing of the other. In one direction it seems quite clear how to do this.

Once we have selected an estimate of the collapsing, this immediately provides a conversion rule from the finer to the coarser scale, i.e., from scale I to scale II. To assess the uncertainty in this rule we can look at the rules that would arise from each of the partitions in a $\gamma$-credible region for the collapsing. We illustrate this in the following example.

    Example 6 (Converting from the finer to the coarser scale in Example 2.) InExample 3 we showed that the LRSE partition is always (2, 4, 5) for all thepriors specified in Example 2. The conversion rule from scale I to scale II, using


Table 5 Conversion of scale I to scale II in Example 2 using the LRSE collapsing

    Scale I Scale II

    1 1

    2 1

    3 2

    4 2

    5 3

the LRSE partition, is as shown in Table 5. For example, individuals assigned a score of 1 or 2 on scale I are assigned a score of 1 on scale II.

To assess the uncertainty entailed in this conversion rule, we now present, in Table 6, the full set of conversion rules obtained via the .95-relative surprise region (8) given in Example 3, when using the prior $[0, 0, 0]$. Of course, values 1 and 5 on scale I must always convert to 1 and 3 on scale II. From the table we see that there is little variation in converting 3 on scale I but more variation for both 2 and 4; for example, four of the five partitions in the .95-credible region convert 3 to a 2.

A conversion rule for expanding scale II to scale I is more problematic. To see this, suppose we have selected a rule to convert from the finer to the coarser scale. Then, if we wanted to convert from the coarser to the finer scale, we would have multiple possible assignments for any category that corresponds to a proper collapsing on the finer scale. A number of rules are possible and we consider two: one deterministic, called the maximal rule, while the other corresponds to a random assignment, called the random rule. These rules are based on having chosen a specific collapsing as represented by the partition $(P_1, \ldots, P_K)$.

    Table 6 A .95 credible region (based on the partitions in Eq. 8) for conversion of scale I to scale II in Example 2 when using the uniform prior

    Scale I   Scale II   Scale II   Scale II   Scale II (LRSE)   Scale II
       1          1          1          1             1             1
       2          2          2          1             1             1
       3          2          2          2             2             1
       4          3          2          3             2             2
       5          3          3          3             3             3


    Table 7 Possible conversions of scale II to scale I in Example 2 when using the uniform prior and based on partitions in the .95-relative surprise region

    Scale II   Scale I   Scale I   Scale I   Scale I (LRSE)   Scale I
       1          1         1         1            1             3
       2          3         3         3            3             4
       3          5         5         5            5             5

    Logically, it makes sense to base the estimates p̂_l on the conditional posterior of p given that the particular collapsing in question holds. Fixing the partition (P_1, ..., P_K), the likelihood for p is given by Eq. 2 as a function of p. Now suppose that we have placed a Dirichlet_R(a_1, ..., a_R) prior on p to start. The conditional prior distribution of q is then Dirichlet_K(Σ_{l∈P_1} a_l, ..., Σ_{l∈P_K} a_l) and, from Eq. 2, the conditional posterior distribution of q is Dirichlet_K(g_1 + Σ_{l∈P_1}(f_l + a_l), ..., g_K + Σ_{l∈P_K}(f_l + a_l)). Given q, the conditional prior of (p_{l_i}/q_i, ..., p_{u_i}/q_i), corresponding to P_i = (l_i, l_i + 1, ..., u_i), is Dirichlet_{u_i−l_i+1}(a_{l_i}, ..., a_{u_i}) and the conditional posterior is Dirichlet_{u_i−l_i+1}(f_{l_i} + a_{l_i}, ..., f_{u_i} + a_{u_i}). So the conditional LRSE of p_l/q_i, for l satisfying l_i ≤ l ≤ u_i, is given by f_l/(f_{l_i} + ··· + f_{u_i}), which we note does not involve the prior. The maximal rule then converts i on scale II to the value l satisfying l_i ≤ l ≤ u_i which maximizes f_l/(f_{l_i} + ··· + f_{u_i}). The random rule converts i on scale II to the value l on scale I, satisfying l_i ≤ l ≤ u_i, with probability f_l/(f_{l_i} + ··· + f_{u_i}).

    We illustrate this in the following example.

    Example 7 (Converting from the coarser to the finer scale in Example 2.) In Table 7 we give the results of converting scale II to scale I, using the maximal rule, for each of the partitions in Eq. 8. This again gives us a sense of the uncertainties involved in this conversion.

    We note that Table 6 is based on the .95-credible region (8) that we obtained for the collapsing, and so can be interpreted as a .95-credible region for the true conversion from scale I to scale II. This is not the case, however, for Table 7, as these conversions are obtained by a somewhat ad hoc, albeit intuitively reasonable, rule. There is an element of nonidentifiability in converting from scale II to scale I.

    In Table 8 we record the conditional distributions corresponding to the random rule for the LRSE partition in Example 2. We can see from this that the choice between 1 and 2 on scale I for a 1 on scale II is somewhat uncertain.

    Table 8 Conversion probabilities for converting scale II to scale I in Example 2 based on the LRSE partition

    Scale II \ Scale I     1     2     3     4     5
       1                  4/7   3/7    0     0     0
       2                   0     0    2/3   1/3    0
       3                   0     0     0     0     1
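Both rules are driven by the conditional LRSEs f_l/(f_{l_i} + ··· + f_{u_i}). A sketch (function name ours), with hypothetical scale I counts chosen so that the resulting probabilities reproduce the rows of Table 8:

```python
from fractions import Fraction

def expansion_rules(f, partition):
    """For a fixed collapsing, return the maximal rule (coarse -> fine
    category) and the random rule (coarse -> conditional distribution
    over the block), both built from the LRSEs f_l / (block total)."""
    maximal, random_rule = {}, {}
    for i, block in enumerate(partition, start=1):
        total = sum(f[l - 1] for l in block)
        probs = {l: Fraction(f[l - 1], total) for l in block}
        random_rule[i] = probs
        maximal[i] = max(probs, key=probs.get)  # argmax over the block
    return maximal, random_rule

f = [4, 3, 2, 1, 6]   # hypothetical scale I counts
maximal, random_rule = expansion_rules(f, [(1, 2), (3, 4), (5,)])
print(maximal)          # {1: 1, 2: 3, 3: 5}
print(random_rule[1])   # {1: Fraction(4, 7), 2: Fraction(3, 7)}
```

The maximal rule here matches the LRSE column of Table 7, and the conditional distributions match Table 8.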


    5 Model checking

    A question of some interest is whether or not the model for collapsings that we have developed in Section 2 makes sense in light of the observed data (f_1, ..., f_R) and (g_1, ..., g_K). This is effectively model checking and should be carried out before we proceed to make inferences about an appropriate collapsing.

    One approach to model checking is based on the minimal sufficient statistic for the sampling model given by Eq. 2. Once this statistic is determined, we use the conditional distribution of a discrepancy statistic, given the value of the minimal sufficient statistic, to assess the model via a P-value. Unfortunately, in this problem it is very difficult to determine the minimal sufficient statistic, let alone determine the conditional distribution.

    An alternative approach might be to select a discrepancy statistic, such as Pearson's chi-squared test statistic, to compare the observed g_i with the estimated expected values based on the observed f, for a particular partition (P_1, ..., P_K), and do this for every possible partition. This, however, entails the use of (R − 1 choose K − 1) dependent chi-squared tests, and it is not at all clear how to combine these to compute an appropriate P-value.

    We adopt an alternative strategy for model checking here. Suppose we are satisfied that samples have been correctly obtained from the relevant population, so we feel confident that the underlying multinomial model leading to likelihood (1) is correct. Now suppose that we have no information that would suggest that scale II is a collapsing of scale I. In such a case it makes sense to put independent priors on p and q and then use the posterior to make inferences about these quantities. Suppose we use the same prior on p as prescribed in Section 2, namely, p ~ Dirichlet_R(a_1, ..., a_R), independent of q ~ Dirichlet_K(1, ..., 1). Although other choices are possible for the prior on q, a noninformative prior would seem to make sense when we are testing a model.

    Now consider the null hypothesis H0: q is a collapsing of p, where we will consider H0 as a subset of S_R × S_K. Notice that, provided we place a continuous prior on S_R × S_K, H0 is a subset having prior probability equal to 0. The consequence of this is that we can think of the Dirichlet_R(a_1, ..., a_R) × Dirichlet_K(1, ..., 1) prior as being effectively on the complement H0^c, while the prior on H0 is as described in Section 2. The prior on H0 only comes into play if we agree that H0 makes sense and, to assess this, we use the prior on the complement H0^c. This shows that there is no conflict in our prior assessments, as the prior on p is the same.

    To assess H0 we proceed as described in Evans et al. (1993, 1997). For an arbitrary point (p, q) ∈ S_R × S_K, let d_{H0}(p, q) be a measure of the distance of (p, q) from H0. For example, we will use

        d_{H0}(p, q) = min_{(P_1, ..., P_K)} Σ_{i=1}^{K} ( q_i − Σ_{l ∈ P_i} p_l )²,    (10)


    i.e., least-squares distance, although other choices are possible and in fact may prove more suitable for detecting specific types of deviations from the model of a collapsing existing. Provided the amount of data is large enough, however, the use of Eq. 10 will detect deviations from model correctness.
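Eq. 10 can be evaluated by minimizing over all (R − 1 choose K − 1) collapsings; a minimal brute-force sketch (function name ours):

```python
from itertools import combinations

def d_H0(p, q):
    """Least-squares distance (Eq. 10) of (p, q) from H0: the minimum,
    over all collapsings of the R fine categories into K consecutive
    blocks, of sum_i (q_i - sum_{l in P_i} p_l)^2."""
    R, K = len(p), len(q)
    best = float("inf")
    for cuts in combinations(range(1, R), K - 1):
        bounds = (0,) + cuts + (R,)
        d = sum((q[i] - sum(p[bounds[i]:bounds[i + 1]])) ** 2
                for i in range(K))
        best = min(best, d)
    return best

# If q is exactly a collapsing of p, the distance is 0:
print(d_H0([0.125, 0.25, 0.25, 0.25, 0.125], [0.375, 0.5, 0.125]))  # 0.0
```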

    Now we measure the concentration of the prior and posterior distributions of (p, q) about H0 by the prior and posterior distributions of d_{H0}(p, q), respectively. Naturally, if H0 is true then we should see that the posterior distribution of d_{H0}(p, q) concentrates much more closely about 0 than the prior distribution of d_{H0}(p, q). To assess this formally, let π_{d_{H0}} and π_{d_{H0}}(· | f, g) be the prior and posterior densities of d_{H0}(p, q). Then to assess H0 we compute the relative surprise tail probability

        Π( π_{d_{H0}}(d_{H0}(p, q) | f, g) / π_{d_{H0}}(d_{H0}(p, q)) ≤ π_{d_{H0}}(0 | f, g) / π_{d_{H0}}(0) | f, g ),    (11)

    where Π(· | f, g) is the posterior of (p, q) based on the Dirichlet_R(a_1, ..., a_R) × Dirichlet_K(1, ..., 1) prior. Then π_{d_{H0}}(0 | f, g)/π_{d_{H0}}(0) is the relative belief ratio of H0, and this measures how the data have changed beliefs in H0. An alternative to H0 is given by a nonzero value of d_{H0}, and so we see that Eq. 11 is the posterior probability that an alternative to H0 does not have a relative belief ratio greater than that of H0. As such, when Eq. 11 is small there are alternatives that have a larger relative belief ratio than H0 with high posterior probability, and this is evidence against H0. Note that Π(· | f, g) is the Dirichlet_R(a_1 + f_1, ..., a_R + f_R) × Dirichlet_K(1 + g_1, ..., 1 + g_K) measure.

    It is straightforward to sample from both the posterior and prior distributions of (p, q), and from such Monte Carlo samples we can construct

    Fig. 1 The prior (dashed) and the posterior (solid) densities of d_{H0} in Example 8. [Plot not reproduced; axes: squared distance vs. density.]


    Fig. 2 The relative belief ratios of d_{H0} in Example 8. [Plot not reproduced; axes: squared distance vs. relative belief.]

    estimates of π_{d_{H0}} and π_{d_{H0}}(· | f, g) and estimate (11). The only difficulty lies in evaluating (10) for each sample, and we simply do this by brute force. Provided (R − 1 choose K − 1) is not large, this is feasible. We illustrate this procedure in the following example.

    Example 8 (Testing the model in Example 2.) In Fig. 1 we have plotted the prior and posterior densities of d_{H0}(p, q) when p ~ Dirichlet_R(1, ..., 1) and q ~ Dirichlet_K(1, ..., 1), based upon Monte Carlo samples of size N = 10^5 from the prior and the posterior. Note that the prior and posterior densities are very similar, reflecting the fact that we have a very small amount of data in this example. Irrespective of that, it is the difference between these distributions that is of importance, as represented by the relative belief ratio π_{d_{H0}}(d_{H0}(p, q) | f, g)/π_{d_{H0}}(d_{H0}(p, q)) graphed in Fig. 2. From both plots it would appear that the posterior is concentrating closer to H0 than the prior. The relative belief ratio peaks at 0, where it equals π_{d_{H0}}(0 | f, g)/π_{d_{H0}}(0) = 2.794. Therefore, Eq. 11 equals 1.00 and we have no evidence against H0, i.e., no evidence against the hypothesis that scale I has been collapsed into scale II, and we can now proceed to inference about the specific collapsing, as we have already discussed.

    Table 9 Car satisfaction data for scale I

    Category    1    2    3    4    5
    Count      19   19  107  225  257


    Table 10 Car satisfaction data for scale II

    Category    1    2    3
    Count       5  381  115

    6 Real data examples

    We now consider some practical examples.

    Example 9 (Satisfaction with cars survey.) This example concerns a pan-European survey on driver satisfaction with their cars. The survey was done in 2001 by TNS, a well-known European marketing analytics company. Data were given to Zvi Gilula on condition that, if used for publication, no disclosure of the make of the underlying cars is permitted. The survey was carried out on 10 different compact cars made in Europe and Japan and was done independently in 5 countries (Italy, Germany, Spain, UK, and France). The same 10-category scale was originally used and then was adapted for each country to allow ranking of cars by their distribution of satisfaction, following the methodology reported by Yakir and Gilula (1999). In France, a 5-category scale was used while in Germany a 3-category scale was used.

    The following satisfaction data were obtained based on samples of 627 in France and 501 in Germany, on a car that is not made in either country (Tables 9 and 10).

    In Fig. 3 we have plotted the prior and posterior densities for the squared distance used in model checking. In this case (11) is effectively 0 and we have

    Fig. 3 The prior (dashed) and the posterior (solid) densities of d_{H0} in Example 9. [Plot not reproduced; axes: squared distance vs. density.]


    Table 11 Student evaluation data for scale I

    Category   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
    Count      3  3  0  0  3  0  3  0  3  0  3  0  6  6  6  0 12  6 12 39

    strong evidence against a collapsing as we have defined it. This suggests that French and German respondents have different attitudes (possibly cultural) towards this car, or at least that a more sophisticated model for the collapsing is needed to explain the results, if indeed their attitudes are the same.
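This is already visible in the raw data: no collapsing of the French sample proportions comes close to the German ones. A back-of-the-envelope evaluation of Eq. 10 at the empirical proportions (our own check, not from the paper):

```python
from itertools import combinations

f, g = [19, 19, 107, 225, 257], [5, 381, 115]   # Tables 9 and 10
p = [x / sum(f) for x in f]
q = [x / sum(g) for x in g]
R, K = len(p), len(q)
# Minimum of Eq. 10 over all (4 choose 2) = 6 collapsings of scale I:
d = min(sum((q[i] - sum(p[b[i]:b[i + 1]])) ** 2 for i in range(K))
        for cuts in combinations(range(1, R), K - 1)
        for b in [(0,) + cuts + (R,)])
print(round(d, 3))  # 0.073 -- far from 0
```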

    Example 10 (Student evaluation data.) Student evaluation of instructors at the Hebrew University has been done for years on a 20-category scale. In the year 2000, the provost established a professional committee to re-examine the entire evaluation methodology and come up with recommendations for improvement. One major recommendation was to replace the original scale by an 11-category scale. An experiment was carried out by one of us (Gilula) on second year BA students in psychology taking a course in research methods. The class had about 140 students and 105 of them filled in the regular 20-category form. An anonymous sample of 43 of them later filled in the 11-category form. Although the two underlying samples are not fully independent, we used our proposed methodology for illustrative purposes. The data appear in Tables 11 and 12.

    There are (19 choose 10) = 92,378 possible collapsings. With this number the computation time for the model checking is considerable, but the computations for inference about the true collapsing are still very fast. In Fig. 4 we have plotted the prior and posterior densities for the squared distance used in model checking. We see that the posterior has concentrated much more closely about 0 than the prior. In fact Eq. 11 here is 0.967, so there is clearly no evidence against the model.
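The quoted count of possible collapsings can be verified directly:

```python
from math import comb

# Collapsing 20 ordered categories into 11 means choosing the 10
# interior cut points among the 19 gaps between adjacent categories.
print(comb(19, 10))  # 92378
```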

    We then proceeded to make inference about which of the possible collapsings was appropriate. Experience with these surveys led to the hypothesis that an appropriate collapsing was given by (4, 5, 7, 8, 10, 11, 13, 15, 17, 19, 20). In Fig. 5 we have plotted the relative belief ratios for each of the collapsings, where they are labelled sequentially according to the ordering we discussed in Example 1. The hypothesized collapsing corresponded to #86388, with relative belief ratio equal to 45.477, while the LRSE corresponds to collapsing #81260, with relative belief ratio equal to 47.987. So the hypothesized collapsing is well supported and in fact Eq. 11 equaled .991 when we tested the null hypothesis given by this collapsing.

    Table 12 Student evaluation data for scale II

    Category   1  2  3  4  5  6  7  8  9 10 11
    Count      2  1  1  0  1  1  3  6  5  7 16
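Reading the hypothesized collapsing (4, 5, 7, 8, 10, 11, 13, 15, 17, 19, 20) as the upper endpoints of the 11 blocks (our interpretation of the notation), the collapsed Table 11 counts track the Table 12 proportions quite closely:

```python
f = [3, 3, 0, 0, 3, 0, 3, 0, 3, 0, 3, 0, 6, 6, 6, 0, 12, 6, 12, 39]  # Table 11
g = [2, 1, 1, 0, 1, 1, 3, 6, 5, 7, 16]                               # Table 12
upper = [4, 5, 7, 8, 10, 11, 13, 15, 17, 19, 20]
lower = [1] + [u + 1 for u in upper[:-1]]
collapsed = [sum(f[l - 1:u]) for l, u in zip(lower, upper)]
print(collapsed)  # [6, 3, 3, 0, 3, 3, 6, 12, 12, 18, 39]
# Compare proportions on the two scales:
print([round(c / sum(f), 2) for c in collapsed])
print([round(x / sum(g), 2) for x in g])
```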


    7 Conclusions

    We have formalized the problem of collapsing one categorical ordinal scale into another as an inference problem. Inferences then proceed by placing a prior on the parameter for the finer scale and a prior on the set of collapsings, and applying Bayesian methodology. We have developed a test to assess whether or not a collapsing makes sense for a given data set. Furthermore, we have discussed how one can convert one scale into another and assess the uncertainty in such a conversion. The authors will be happy to provide the R code that was used to implement the calculations.

    Encouraged both by the reviewers' fruitful comments and by our initial results, we plan to extend this research to address the following issues.

    1. The neutrality constraint. In many (if not most) cases, the attitudinal scale has an odd number of categories to allow for a neutral category as the middle category. It is not uncommon to have such a scale collapsed into a coarser scale having an even number of categories without an explicit neutral category. In such cases it is not clear whether such a category can be collapsed over or must be retained. This adds an interesting constraint not addressed by the methodology discussed here. We are now investigating a new approach that utilizes stochastic ordering which, we hope, can accommodate this constraint.

    2. Cut-point modeling. Ordinality of scales has traditionally prompted the use of latent variable models with censoring (e.g., Rossi et al. 2001). Typically, to use such an approach, more data are needed per individual, such as several attitudinal responses per individual. Such data are not easily disclosed, and we are currently looking for cooperative sources.

    3. More than two scales. The problem of finding an optimal conversion mechanism for 3 or more scales is challenging. As with log-linear models, this will require an extended methodology.

    4. Time dependence. Sometimes surveys are administered at different points of time, which may affect the multinomial probabilities underlying the scales considered. We are investigating this interesting avenue.

    We conclude this paper with the hope that this research will stimulate interest among marketing scientists.

    Acknowledgements The authors thank the Editor and two referees for many helpful comments.

    References

    Box, G. E. P., & Tiao, G. C. (1973). Bayesian inference in statistical analysis. New York: John Wiley and Sons.

    Evans, M., Gilula, Z., & Guttman, I. (1993). Computational issues in the Bayesian analysis of categorical data: Loglinear and Goodman's RC model. Statistica Sinica, 3, 391–406.


    Evans, M., Gilula, Z., Guttman, I., & Swartz, T. (1997). Bayesian analysis of stochastically ordered distributions of categorical variables. Journal of the American Statistical Association, 92(437), 208–214.

    Evans, M., Guttman, I., & Swartz, T. (2006). Optimality and computations for relative surprise inferences. Canadian Journal of Statistics, 34(1), 113–129.

    Evans, M., & Jang, G. H. (2011). Inferences from prior-based loss functions. Technical Report No. 1104, Dept. of Statistics, U. of Toronto.

    Evans, M., & Shakhatreh, M. (2008). Optimal properties of some Bayesian inferences. Electronic Journal of Statistics, 2, 1268–1280.

    Green, P. E., & Rao, V. R. (1970). Rating scales and information recovery – How many scales and response categories to use? Journal of Marketing, 34, 33–39.

    Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81–97.

    Rossi, P. E., Gilula, Z., & Allenby, G. M. (2001). Overcoming scale usage heterogeneity: A Bayesian hierarchical approach. Journal of the American Statistical Association, 96(453), 20–31.

    Yakir, B., & Gilula, Z. (1999). An insightful comparison of brands on an arbitrary ordinal scale. Journal of the Marketing Research Society, 40, 249–264.