t02-measuring the involvement construct


Measuring the Involvement Construct

A bipolar a



THE JOURNAL OF CONSUMER RESEARCH

been tested for internal reliability, stability, or validity.

Hence a standardized, general, valid, and multiple-item

measure of involvement should be useful.

BACKGROUND AND CRITERIA

FOR MEASURING INVOLVEMENT

A measure of involvement, independent of the behavior that results from involvement, would allow the researcher to use the same measure across various research studies. This measure should also be sensitive to the proposed areas that affect a person's involvement level. These areas might be classified into three categories (Bloch and Richins 1983; Houston and Rothschild 1978):

1. Personal: inherent interests, values, or needs that motivate one toward the object

2. Physical: characteristics of the object that cause differentiation and increase interest

3. Situational: something that temporarily increases relevance or interest toward the object

In Houston and Rothschild's (1978) framework, different situations and different people are two factors that lead to various levels of involvement. Houston and Rothschild integrate physical characteristics of the product as part of the situational factor. Consistent with Bloch and Richins (1983), the present article separates the physical from the situational and allows the same physical object to be subjected to different levels of involvement given different situations.

The evidence for the three factors (physical, personal, or situational) that influence the consumer's level of involvement or response to products, advertising, and purchase decisions is found in the literature. For example, Wright (1974) found that variation in the type of media (print versus audio) influenced the response given to the same message (physical). Lastovicka and Gardner (1978a) demonstrated that the same product has different involvement levels across people (personal), and Clarke and Belk (1978) demonstrated that different purchase situations for the same products cause differences in search and evaluation or raise the level of involvement (situational). Based on this prior reasoning, a measure of involvement might be developed that would pick up differences across people, objects, and situations.

Different types of scales were pretested before selecting a measurement approach that seemed to be generalizable across all product categories. First, a series of vignettes was developed to represent involvement. The vignettes were similar to scenarios found in Lastovicka and Gardner (1978b). Problems arose with developing enough generalizable scenarios for a reliable scale. Likert scale items posed a problem because items that seemed to be appropriate for frequently purchased goods did not seem to apply to durable goods and vice versa.

The most effective and generalizable type of scale appeared to be a semantic differential type (Osgood, Suci, and Tannenbaum 1957). The Semantic Differential consists of a series of bipolar items, each measured on a seven-point rating scale. It is easy to administer and score, takes only a few minutes to complete, and is applicable to a wide array of objects. The descriptors or phrases easily relate across product categories and can be appropriate to other domains, such as purchase decisions or advertisements. (However, the main focus of this article and scale development is involvement with products.) The steps taken to develop the measure were:

1. Define the construct to be measured.

2. Generate items that pertain to the construct.

3. Judge the content validity of generated items (item reduction).

4. Determine the internal reliability of items judged to have content validity (item reduction).

5. Determine the stability of internally reliable items over time (item reduction).

6. Measure the content validity of the 20 selected items as a whole.

7. Measure the criterion-related validity, which is the ability of the scale to discriminate among different products for the same people and different situations for the same product and same people.

8. Test the construct validity or theoretical value of the scale by gathering data and testing whether the scale discriminates on self-reported behavior.

DEFINING THE CONSTRUCT

This article will adopt the general view of involvement that focuses on personal relevance (Greenwald and Leavitt 1984; Krugman 1967; Mitchell 19

Rothschild 1984). In the advertising domain, involvement is manipulated by making the ad relevant: the receiver is personally affected, and hence motivated to respond to the ad (e.g., Petty and Cacioppo 1981). In product class research, the concern is with the relevance of the product to the needs and values of the consumer. In purchase decision research, the concern is that the decision is relevant, and hence that the consumer will be motivated to make a careful purchase decision (e.g., Clarke and Belk 1978). Although each is a different domain of research, in general, high involvement means personal relevance (Greenwald and Leavitt 1984).

In this study, the definition of involvement used for the purposes of scale development was:

A person's perceived relevance of the object based on inherent needs, values, and interests.

This definition recognized past definitions of involvement (e.g., Engel and Blackwell 1982; Krugman 19



nition may be applied to advertisements, products,

7) in advertising focused on persona connections.

efined involvement with advertising as

s perception of the relevancy of the ad con-

of purchase and involvement inter-

red to response involvement and defined it as a func-

enduring involvement or a need derived from a

e in the individual's hierarchy of needs.

ITEM GENERATION AND

CONTENT VALIDITY

A semantic differential scale was to be developed

e earlier definition of involvement. Thus, a

The first step was to judge the pro-

68

word pairs was tested in two phases: (1) initial

ion of poor word pairs, and (2)

fin r

udging of the

Three expert judges (senior Ph.D. candidates in con-

ith "advertisement;" and third, replacing the

ve of involvement, (2) somewhat representative

nt. Word pairs that were not rated as representative

Word pairs that were dropped included traditional

tudes used in the psychology and mar-

of involvement. The judges decided

scale that represent the low end of involvement were generally not negative (as they would be if measuring attitudes) but rather were "who cares" descriptors, e.g., unimportant, unexciting, doesn't matter, of no concern.

Five new judges then rated the remaining 43 word pairs using the same procedure. Only 23 items were consistently rated as representing the involvement construct (80 percent agreement over products, purchase decisions, and advertisements for each word pair). This meant that at least 12 of the possible 15 judgments for each word pair (five judges over three objects) had to be rated as representative of the involvement construct. Agreement across judges and within each area for the 23 word pairs was as follows: advertisements, 84 percent; products, 87 percent; and purchase decisions, 7

percent.
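The 80 percent agreement criterion described above can be checked with a little arithmetic. The following is a minimal sketch, not the author's procedure: the function names and the example word pair are hypothetical, and only the figures (five judges, three objects, 80 percent) come from the text.

```python
from typing import Sequence


def min_agreeing_judgments(n_judges: int, n_objects: int, threshold_percent: int) -> int:
    """Smallest number of 'representative' judgments that meets the threshold."""
    total = n_judges * n_objects
    # Integer ceiling of threshold_percent% of total, avoiding float rounding.
    return (threshold_percent * total + 99) // 100


def percent_agreement(judgments: Sequence[bool]) -> float:
    """Share of judgments rating a word pair as representative, in percent."""
    return 100.0 * sum(judgments) / len(judgments)


# Five judges over three objects gives 15 judgments per word pair.
needed = min_agreeing_judgments(n_judges=5, n_objects=3, threshold_percent=80)

# Hypothetical word pair: 12 of its 15 judgments are "representative".
example = [True] * 12 + [False] * 3
retained = sum(example) >= needed
```

Under these assumptions a word pair with 12 favorable judgments out of 15 just meets the criterion and would be retained.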

Twenty-three was assumed to be too low a number of items with which to start data collection (French and Michael 1966; Nunnally 1978). Thus, seven additional items were added to the item pool to raise the initial number to 30 (five of these seven were eventually dropped). For example, trivial-grand (45 percent agreement) was changed to trivial-fundamental, and inspiring-discouraging (55 percent agreement) was changed to inspiring-uninspiring and returned to the list. Therefore, a thirty-item scale emerged from the content validity phase that trained and knowledgeable judges agreed measured involvement over three domains: products, advertisements, and purchase decisions. However, this study focused on, and further validation procedures were carried out on, involvement with products.

INTERNAL SCALE RELIABILITY

The next task was to administer the 30 items as a scale over different product categories to measure the internal consistency or inter-item correlation. Two product classes (watches and athletic shoes) were selected because they were thought to be used by the subjects. One hundred and fifty-two undergraduate psychology students completed the scale during class time. Approximately half of the subjects filled out the scale pertaining to athletic shoes and the other half filled out the scale pertaining to watches. The results show that for both product categories, 26 bipolar items had an item-to-total score correlation of 0.50 or more, and a Cronbach alpha level of 0.95.
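The two statistics reported above can be illustrated on a toy data set. This is a sketch under stated assumptions: the response matrix is invented, the function names are hypothetical, and the item-to-total correlation is computed against the total that includes the item itself, since the text does not say whether a corrected version was used.

```python
from statistics import mean, variance
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; each row is one respondent, each column one item."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def item_total_correlations(rows: Sequence[Sequence[float]]) -> List[float]:
    """Correlation of each item with the (uncorrected) total score."""
    totals = [sum(r) for r in rows]
    return [pearson([r[j] for r in rows], totals) for j in range(len(rows[0]))]


# Invented responses: 5 respondents x 4 seven-point items.
toy = [
    [7, 6, 7, 6],
    [5, 5, 6, 5],
    [3, 4, 3, 4],
    [2, 2, 1, 3],
    [6, 7, 6, 6],
]
alpha = cronbach_alpha(toy)
item_total = item_total_correlations(toy)
# Items below the 0.50 cutoff mentioned in the text would be candidates to drop.
weak_items = [j for j, r in enumerate(item_total) if r < 0.50]
```

In this toy matrix every item tracks the total closely, so no item falls below the cutoff.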

Six adjective pairs with relatively low item-to-total correlations were dropped; interestingly, most of these adjective pairs had been returned earlier to the item pool. Factor analyses, using varimax rotation with squared multiple correlations in the diagonals for factor extraction, were carried out over both products to check if the items selected for deletion loaded onto one particular dimension or were amorphous across factors. For both watches and athletic shoes, one factor explained the major variation in the data, accounting for 70.3 percent and 69.3 percent of the common variance, respectively (eigenvalues 13.3 and 13.2). Watches had two more factors, accounting for 11.6 percent and 5.6 percent of the common variance (eigenvalues 2.2 and 1.1), and athletic shoes had three more factors, accounting for 11.7 percent, 5.9 percent, and 5.7 percent of the common variance (eigenvalues 2.2, 1.2, and 1.1).

The results of the factor analyses showed that the items selected for deletion did not load together on any unique factor across either product category. Since the first factor accounts for approximately 70 percent of the variance, and none of the remaining items had a loading of zero or less on that first dimension, the scale development continued on the assumption of a simple linear combination of the individual items (Comrey 1973). The assumption is that no individual item is sufficient, and that it is the scale taken as a whole that tends to measure the involvement construct (Nunnally 1978).

TEST-RETEST RELIABILITY

Test-retest reliability of the remaining 24 items was examined over two new subject samples and four new product categories. Sixty-eight psychology students initially rated calculators and mouthwash. Forty-five MBA students rated breakfast cereals and red wine. The order of the products was counterbalanced: half of the subjects in each group rated one product category first, and the other half rated the other product category first. The scales were administered during class time and took about five minutes to complete.

Three weeks later the scales were administered over the same product categories to the same subjects. Thirteen psychology students and 19 MBA subjects were lost to attrition; thus, 55 psychology students and 26 MBA students were used to measure test-retest reliability. The average Pearson correlation between Time 1 and Time 2 on the 24 items was 0.90. Individual item-to-item correlations ranged from 0.31 to 0.93. Four additional items with average test-retest correlations below 0.60 were deleted. The resulting twenty-item involvement score test-retest correlations for each product were as follows: calculators, r = 0.88; mouthwash, r = 0.89; breakfast cereals, r = 0.88; and red wine, r = 0.93. These product categories were also tested for internal scale reliability. The Cronbach alpha ranged from 0.95 to 0.97 over the four products.
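The item screening described above, which drops items whose test-retest correlation falls below 0.60, can be sketched as follows. The data are invented and the function names are hypothetical; the sketch also uses a single product, whereas the article averaged each item's correlation over products.

```python
from statistics import mean
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between Time 1 and Time 2 values."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def unstable_items(time1: Sequence[Sequence[float]],
                   time2: Sequence[Sequence[float]],
                   cutoff: float = 0.60) -> List[int]:
    """Indices of items whose test-retest correlation is below the cutoff.

    Rows are subjects (same order at both times); columns are items.
    """
    n_items = len(time1[0])
    flagged = []
    for j in range(n_items):
        r = pearson([row[j] for row in time1], [row[j] for row in time2])
        if r < cutoff:
            flagged.append(j)
    return flagged


# Invented ratings: 5 subjects x 3 items, three weeks apart.
t1 = [[7, 6, 2], [5, 5, 6], [3, 4, 4], [2, 2, 5], [6, 7, 3]]
t2 = [[6, 6, 5], [5, 4, 3], [3, 3, 6], [1, 2, 2], [7, 6, 4]]

to_drop = unstable_items(t1, t2)

# Test-retest correlation of the total scale score.
total_r = pearson([sum(r) for r in t1], [sum(r) for r in t2])
```

Here the third item (index 2) reverses direction between administrations, so it is the only one flagged.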

Therefore, a twenty-item scale emerged from the internal reliability and stability phases of scale development for products. Twenty items allowed an adequate sampling of the possible items that represent involvement with products and yet was long enough to ensure a high level of reliability.¹ On a practical level, the scale fits neatly on one page and only takes a few moments to complete. The scale was then counterbalanced so that ten random items were reverse scored. Since each bipolar item was rated on a seven-point scale, the total possible score ranged from a low of 20 to a high of 140. The scale was named the Personal Involvement Inventory (PII) and is listed in Appendix A.

SECOND CONTENT VALIDITY

A second measure of content validity was obtained from the open-ended responses of 45 MBA students over three product categories: 35mm cameras, red wine, and breakfast cereals. After completing the scales for each product, subjects answered the following open-ended question:

Now we would like you to state, in your own words, why you rated each product category as you did.

Subjects were then divided into three groups (high, medium, or low) for each product class according to their scale scores.² Examples of the open-ended responses appear in the Exhibit.

Two expert judges (senior Ph.D. candidates in consumer behavior) blind to the scale scores evaluated the total set of open-ended responses. For each product category, the judges sorted the comments into three groups indicative of low involvement, medium involvement, and high involvement with the product category, based on how well the responses represented involvement, as defined earlier.

Interjudge reliability on the classification of the responses was 80 percent agreement for 35mm cameras, 84 percent agreement for red wines, and 80 percent agreement for breakfast cereals. Classifications on which the two expert judges did not agree were then given to

¹ Although the current analyses do not suggest what the reliability is for subsets of the scale items, the case may be that a smaller number of items would be almost as reliable as the 20 items. The problem of reducing the scale to fewer items lies in deciding which items to select as subsets, since individual items differed in their reliability across product categories. A subset of items that may approach the reliability of the 20 items for one product may not approach the same reliability for another product. This variation is evident in that the test-retest total score correlation ranged from 0.88 to 0.93 over products, and test-retest for the 20 individual items ranged from 0.44 to 0.93 over various products. The twenty-item measure should outperform any subset of the scale; besides, decreasing the number of items would not really make the scale any easier to administer, but may serve to decrease the domain of items judged as being representative of involvement and also lower the reliability of the scale. Researchers who may use this scale are warned not to haphazardly reduce the number of items.

² The classification of subjects into low, medium, and high scores was based on an overall distribution developed over 3 product categories (Table 3) and several hundred subjects. All scores were tabulated on the PII scale range presented in the Figure. Subjects whose PII scores fell into the bottom 25 percent of the overall distribution were classified as having low involvement with the product. Subjects whose PII scores fell into the middle 50 percent of the distribution were classified as having medium involvement, and subjects whose PII scores were in the top 25 percent of the distribution were classified as having high involvement with the product. For development of this classification scheme see Appendix B.
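Scoring and grouping as described above can be sketched in a few lines. Several things here are assumptions for illustration only: which ten items are reverse scored (Appendix A is not reproduced here, so `HYPOTHETICAL_REVERSED` is invented), the function names, and the use of the Exhibit's score cutoffs (less than 70 for low, greater than 110 for high) as stand-ins for the quartile boundaries.

```python
from typing import FrozenSet, Sequence

# Hypothetical choice of the ten reverse-scored items (0-based positions).
HYPOTHETICAL_REVERSED: FrozenSet[int] = frozenset(range(0, 20, 2))


def score_pii(responses: Sequence[int],
              reversed_items: FrozenSet[int] = HYPOTHETICAL_REVERSED) -> int:
    """Total score for 20 seven-point items, flipping reverse-scored ones."""
    if len(responses) != 20:
        raise ValueError("expected 20 item responses")
    total = 0
    for i, r in enumerate(responses):
        if not 1 <= r <= 7:
            raise ValueError("each response must be between 1 and 7")
        # A reverse-scored item maps 1..7 onto 7..1.
        total += 8 - r if i in reversed_items else r
    return total


def classify(score: int, low_cut: int = 70, high_cut: int = 110) -> str:
    """Group a total score as low, medium, or high involvement."""
    if score < low_cut:
        return "low"
    if score > high_cut:
        return "high"
    return "medium"


# Response patterns that sit at the two ends of the scale.
lowest = [7 if i in HYPOTHETICAL_REVERSED else 1 for i in range(20)]
highest = [1 if i in HYPOTHETICAL_REVERSED else 7 for i in range(20)]
```

With these definitions the two extreme response patterns reproduce the stated score range of 20 to 140.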



EXHIBIT

OPEN-ENDED RESPONSES ON CONTENT VALIDITY

35mm Cameras

High involvement for cameras (score greater than 110).

a. Subject 1. Cameras are important, but not essential. They provide a creative and historical outlet for me.

b. Subject 12. Cameras interest me and are an important hobby to me.

Low involvement for cameras (scores less than 70).

a. Subject 17. Because I never use 35mm cameras and am not extremely interested in them.

b. Subject 37. It's a nice product to have but not a high priority. I have several but as I recall, none of the purchases was an involved purchase.

Red Wine

High involvement for red wine (score greater than 110).

a. Subject 22. Red wine adds a lot to the appropriate meals.

b. Subject 6. I have always wanted to know more about wines and enjoy it when people I know teach me about them.

Low involvement for red wine (score less than 70).

a. Subject 20. I'm not interested in wines nor do I particularly appreciate the mystique that surrounds wines, in general.

b. Subject 36. OK for socials and getting drunk.

Breakfast cereals

High involvement for breakfast cereals (score greater than 110).

a. Subject 27. I eat cereal, healthy efficient 'wake up America.' Cereal is good for you.

b. Subject 8. Because they are diet foods.

Low involvement for breakfast cereals (score less than 70).

a. Subject 3. I think breakfast cereals are a sham. I only eat grapenuts. It infuriates me to see breakfast cereals advertised to be eaten with toast, juice, etc. What's the use, jaw exercise? I refuse to buy cereal for my child.

b. Subject 31. I eat cereal for convenience; it is easy and fast. I have no interest in them nor am I fascinated with them.

e to classify. The categories of responses, as

presented in Table 1. These data indicate

responses from the subjects, thus adding

ditional modicum of support to the validity of the

CRITERION-RELATED VALIDITY

Criterion-related validity is demonstrated by com-

ne or more external variables that provide a direct

was the simple ordering or classification of prod-

Twenty-one products classified in other studies as

TABLE 1

RELATIONSHIP BETWEEN THE SCALE SCORES AND THE OPEN-ENDED RESPONSES

                          Judges' ratings
Scale scores        Low    Medium    High    (Total)

35mm Cameras
  Low                 7        1       0       (8)
  Medium              4       12       7      (23)
  High                0        4      10      (14)
  (Total)          (11)     (17)    (17)      (45)
  Collapsed for chi-square
  Low + Medium       11       13       7
  High                0        4      10

Red wine
  Low                12        1       0      (13)
  Medium              8        9       8      (25)
  High                0        1       6       (7)
  (Total)          (20)     (11)    (14)      (45)
  Collapsed for chi-square
  Low                12        1       0
  Medium + High       8       10      14

Breakfast cereals
  Low                19        3       0      (22)
  Medium              9        9       1      (19)
  High                0        2       2       (4)
  (Total)          (28)     (14)     (3)      (45)
  Collapsed for chi-square
  Low                19        3       0
  Medium + High       9       11       3

χ² = 10.4, df = 2, p < 0.01.

χ² = 17.0, df = 2, p



138) = 39.9, p



TABLE 2

RELATIONSHIP BETWEEN CONSTRUCT VALIDITY STATEMENTS AND LOW, MEDIUM, OR HIGH PII SCORES: MEANS, STANDARD DEVIATIONS, AND CORRELATIONS

Construct validity statements

1. I would be interested in reading information about how the product is made.
2. I would be interested in reading the Consumer Reports article about this product.
3. I have compared product characteristics among brands.
4. I think there are a great deal of differences among brands.
5. I have a most-preferred brand of this product.

Instant coffee
Statement    Low (32)       Medium (12)    High (12)      r
1            3.28  (2.0)    4.42 (2.3)     4.25 (2.3)     .30=
2            3.00' (1.8)    4.75 (2.3)     4.92 (2.3)     .47"
3            2.59* (1.8)    3.42 (2.1)     5.25 (2.0)     .52=
4            3.94' (0.6)    4.67 (1.1)     6.33 (.8)      .63=
5            2.88' (1.9)    4.83 (1.8)     6.17 (1.7)     .68=

Laundry detergent
Statement    Low (4)        Medium (28)    High (25)      r
1            1.25' (.5)     4.04 (1.7)     4.48 (2.4)     .37=
2            2.75* (2.9)    4.46 (2.0)     5.00 (2.1)     .33=
3            1.75  (1.5)    4.36 (1.8)     4.80 (2.4)     .42=
4            2.25" (1.0)    4.00 (1.7)     5.20 (2.1)     .42=
5            2.50" (3.0)    4.68 (1.6)     5.44 (1.9)     .42=

Color television
Statement    Low (9)        Medium (26)    High (12)      r
1            3.56  (2.1)    4.00 (1.9)     4.23 (2.1)     .14
2            4.56  (1.9)    4.65 (1.9)     5.36 (2.1)     .27=
3            3.11' (1.9)    3.85 (1.9)     4.59 (2.3)     .23
4            4.11' (1.2)    4.85 (1.5)     5.73 (1.7)     .33=
5            2.56' (1.4)    4.77 (1.7)     5.55 (1.9)     .50=

The construct validity statements are measured on a seven-point scale: (1) strongly disagree to (7) strongly agree.

r = Pearson correlation between PII score and response to construct validity question.

p < 0.01.

p < 0.05.

Low scores significantly different than high scores, p < 0.01.

Low scores significantly different than high scores, p < 0.05.

Numbers in parentheses in Table heading are numbers of subjects in each group. Numbers in parentheses in Table body are standard deviations.

color television because it may be their responsibility to do the family laundry. If this is true, they would value the product's benefits and they would be likely to be interested in the quality of the product because they need the product to perform their household duties. Televisions, however, may not fall under their responsibility for maintenance or interest them as much. Electronics, solid state, or color tuning may not be relevant to them. And if a television does not affect them personally, housewives might have relatively low involvement with this product.

The results of the MANOVA for all five statements were significant for the three products (instant coffee F(10, 98) = 6.56, p < 0.001; laundry detergent F(10, 100) = 2.34, p < 0.05; and color television F(10, 100) = 2.00,

p



is made. To tap this dimension, subjects were asked the extent to which they agreed with the statement "I have compared product characteristics among brands of ______." For all products, the high scorers had significantly greater agreement with the statement than low scorers.

Perception of brand differences. The next proposition tested was that high involvement scorers would perceive greater differences among brands in the product class than low involvement scorers. This proposition stems from writings of Robertson (1976), who suggests that high involvement implies that beliefs about product attributes are strongly held, whereas low involvement individuals do not hold strong beliefs about product attributes. Thus, the strength of the belief system to the attributes emphasizes the perception of differences among brands on the attributes where beliefs are strongly held. Subjects were asked to respond to the statement "I think there are a great deal of differences among brands of ______." High scorers always perceived greater differences (p < 0.01) among brands than low scorers in the product class.

Brand preferences. People highly involved in a product class were hypothesized to have a most preferred brand in the product category. The preference of a particular brand stems from the perception of differences among brands. Since high involvement implies perceiving greater differences about product attributes, then the consumer should have a greater preference based on that product differentiation. Again, over all three products, high scorers showed a significantly (p < 0.01) greater agreement with the statement "I have a most preferred brand of ______" than low scorers.

In conclusion, the various measures of construct validity used the correlation of two paper and pencil tests on the same subjects as evidence that the proposed scale does tap the construct of involvement, as applied to product categories. Although no one result is an excellent test of the scale, each finding adds to the weight of evidence that the scale is an acceptable measure of involvement, as applied to product categories.

FACTOR ANALYSES

OF THE PII

An investigation of the dimensionality of the twenty-item scale was carried out for each product category used in the scale development. The items were factor analyzed using varimax rotation with squared multiple correlations in the diagonal for factor extraction. The general pattern of results showed one main factor and (usually) one minor or residual factor for every product category. The major factor accounted for a range of common variance from 65 percent for jeans to 100 percent for instant coffee. Over all products, all items loaded positively on the first factor, which indicates that the assumption of a simple linear combination of the scale items was not violated.

SENSITIVITY TO SITUATIONAL

DIFFERENCES

The second content validity, the criterion validity, and the construct validity sections have demonstrated that the level of involvement with product categories varies greatly over individuals. For any product category, there seem to be individuals who have low involvement with the product and individuals who have high involvement with the product. Additionally, the average level of involvement varies across the different

products. For example, students rated bubble bath

on the PII and rated automobiles 122 on the PII. This demonstrates that different products are perceived differently by the same people. The scale is also proposed to be sensitive to different situations, a third factor that causes involvement, given the same people and the same products.

Previous studies by Clarke and Belk (1978) and Belk (1981) demonstrated that some purchase situations can be more involving than others. They found that the purchase of some previously uninvolving products as gifts can raise the level of involvement in the purchase decision. To investigate the possibility of rating purchase situations on the scale, the PII was administered over two purchase situations for wine to 41 members of the clerical and administrative staff used in the previous construct validity study.* Each subject rated two purchase situations: (1) the purchase of a bottle of wine for everyday consumption, and (2) the purchase of a bottle of wine for a special dinner party. The scale items were internally reliable for these purchase decisions: Cronbach alphas were 0.98 and 0.97 respectively, and

the item-to-total correlations were generally above 0.

For this data collection, these situations were counterbalanced across subjects. The mean scale score for the everyday consumption was 78 (s = 34), and for the special dinner party was 106 (s = 24). A related measures t-test was significant at t(40) = 5.42,

p
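The related measures t-test reported above compares each subject's two PII scores. A minimal sketch of that computation follows; the scores are invented for illustration and the function name is hypothetical. With 41 subjects the degrees of freedom are 40, matching the t(40) in the text.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(first: Sequence[float], second: Sequence[float]) -> Tuple[float, int]:
    """Related measures (paired) t statistic and its degrees of freedom.

    Each position holds the same subject's score in the two situations.
    """
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    # Standard error of the mean difference uses the sample standard deviation.
    se = stdev(diffs) / n ** 0.5
    return mean(diffs) / se, n - 1


# Invented PII scores for five subjects in the two purchase situations.
everyday = [60, 80, 75, 90, 50]
dinner_party = [95, 100, 110, 105, 90]

t_stat, df = paired_t(everyday, dinner_party)
```

A positive statistic here means the second situation (the special dinner party) was rated as more involving than the first.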
