Clinical Psychology Review, Vol. 15, No. 4, pp. 261-296, 1995. Copyright © 1995 Elsevier Science Ltd. Printed in the USA. All rights reserved.

0272-7356/95 $9.50 + .00 Pergamon

ASSESSMENT OF OBSESSIONS AND COMPULSIONS: RELIABILITY, VALIDITY,

AND SENSITIVITY TO TREATMENT EFFECTS

Steven Taylor

Department of Psychiatry University of British Columbia

ABSTRACT. Advances in the treatment of obsessive-compulsive disorder (OCD) require reliable and valid measures of sufficient sensitivity to detect treatment effects. The present article critically reviews the instruments used in OCD treatment-outcome research. Behavioral methods, self-report inventories, and observer-rated scales are reviewed with respect to content, reliability, validity, and sensitivity to treatment effects. The latter was determined by meta-analyses of trials of behavior therapy (exposure plus response prevention) and clomipramine. Little is known about the psychometric properties of behavioral assessment methods, and they are used increasingly less often in outcome research despite certain advantages. Self-report inventories tend to have acceptable reliability and validity, except for the SCL-90-R OC scale (and its predecessors), which has weak discriminant validity and appears to be essentially a measure of nonspecific distress. Little is known about the reliability and validity of most observer-rated scales, despite the fact that they are popular in treatment outcome research. All measures appear sensitive to treatment effects, although observer-rated scales tend to yield larger effect sizes than self-report measures. For treatment outcome research, the Yale-Brown Obsessive Compulsive Scale (YBOCS) appears to be the best available instrument in terms of range of obsessive-compulsive features assessed, reliability, validity, and sensitivity to treatment effects. Computer-administered and self-report versions of the YBOCS have been developed, which appear promising but require further evaluation. The effects of treatment may be best understood by using measures of specific symptoms rather than relying on global measures of symptom severity. The YBOCS can be readily used for this purpose. The article concludes by considering additional requirements for a comprehensive assessment of obsessions and compulsions.

ONCE CONSIDERED RARE, obsessive-compulsive disorder (OCD) is now recognized as one of the most common psychiatric disorders. It has been described as a "hidden epidemic" (Jenike, 1989, p. 539) with a lifetime prevalence of 2.3% in North America and similar rates of occurrence in other countries (Weissman et al., 1994). OCD is characterized by recurrent obsessions and/or compulsions of sufficient severity to be time consuming, cause marked distress, and interfere with daily functioning (American Psychiatric Association [APA], 1994). Obsessions are intrusive thoughts,

Correspondence should be addressed to Steven Taylor, Department of Psychiatry, 2255 Wesbrook Mall, University of British Columbia, Vancouver, B.C., Canada, V6T 2A1.



impulses, or images, such as repetitive thoughts of violence or contamination. Compulsions are repetitive, intentional behaviors that the person feels compelled to perform, often with a desire to resist. Compulsions are performed either in response to an obsession or according to certain rules, and are often intended to neutralize or prevent some feared event. However, either the compulsive activity is not connected in a realistic way to what it is designed to neutralize or prevent, or it is clearly excessive (APA, 1994). Common compulsions include excessive washing and checking.

Cognitive-behavioral therapies and certain pharmacotherapies are effective treatments of OCD (Abel, 1993; Cox, Swinson, Morrison, & Lee, 1993; van Balkom et al., 1994). However, there is much room for improvement in treatment efficacy. Investigators are continually searching for new treatments, and for optimal combinations of existing therapies. Advances in these endeavours require reliable, valid, and sensitive assessment instruments. In this article I critically review behavioral methods, self-report inventories, and observer-rated scales used in OCD treatment-outcome studies.

Should a comprehensive assessment of obsessions and compulsions include measures of so-called OC personality traits? It was once argued that OCD arises from traits of excessive parsimony, obstinacy, and orderliness (Salzman, 1968), which characterize obsessive-compulsive personality disorder (OCPD; APA, 1994). However, later research has shown that OCPD rarely precedes the occurrence of OCD, and occurs no more commonly in OCD than in other anxiety disorders. In OCD, as in other anxiety disorders, avoidant, dependent, and histrionic personality disorders are far more common than OCPD (for reviews see Baer & Jenike, 1992; Stein, Hollander, & Skodol, 1993; Taylor & Livesley, 1995). Accordingly, the present review will focus on measures of obsessive-compulsive (OC) symptoms, not on measures of OC personality traits.

The review commences with a statement of criteria for evaluating the psychometric properties of each instrument. Then I examine the range of phenomena assessed by the instruments, along with their reliabilities and validities. Following this, I review their abilities to detect treatment-related changes in OC symptoms.

CRITERIA FOR EVALUATION

Nunnally's (1978) criteria will be used to evaluate internal consistency; coefficient α ≥ .70 will be defined as acceptable and α ≥ .80 will be defined as good. Criterion-related (known-groups) validity will be examined by determining whether scores differ across diagnostic groups. For example, OC checkers should score higher than OC washers on measures of compulsive checking. In evaluating criterion-related validity, I will consider the reliability and validity of the procedures used to establish diagnostic status.
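To make the internal-consistency criterion concrete, the following Python sketch computes coefficient α with the standard variance-based formula and labels the result using the .70 and .80 cutoffs stated above. The function names and the toy data are hypothetical illustrations and are not taken from any study reviewed here.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Coefficient alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])                          # number of items
    items = list(zip(*rows))                  # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def rate_alpha(alpha):
    """Label alpha using the cutoffs adopted in this review."""
    if alpha >= 0.80:
        return "good"
    if alpha >= 0.70:
        return "acceptable"
    return "below acceptable"


# Toy data (hypothetical): 4 respondents x 3 items that rise in lockstep,
# so the items are perfectly consistent with one another.
toy = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
alpha = cronbach_alpha(toy)
print(round(alpha, 3), rate_alpha(alpha))
```

Real scales will rarely approach the ceiling shown by this toy example; the point is only to show how the cutoffs would be applied to a computed coefficient.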

Correlation coefficients are typically used to determine test-retest reliability and convergent and discriminant validity. Comparison of correlations across studies is complicated by the fact that statistical significance varies with sample size. To illustrate, two studies may find measures x and y are correlated .50. The correlation would be nonsignificant (p > .05) if Study 1 used a sample size of 11, and significant (p < .01) if Study 2 used a sample size of 22. Reliance on statistical significance would lead to the erroneous conclusion that the studies obtained inconsistent results. To circumvent this difficulty, I will use Cohen's (1988) scheme to evaluate the substantive significance of correlations: "Large" correlations are defined as those ≥ .50, "medium" correlations are from .30 to .49, and "small" correlations are between .10 and .29.
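The sample-size illustration can be checked numerically. The sketch below converts a Pearson r to a t statistic and integrates the t density with Simpson's rule, using only the standard library. It assumes a one-tailed test, which the text does not specify; under that assumption the stated thresholds hold for r = .50 (with a two-tailed test the n = 22 result would fall below .05 but not below .01). It also treats a correlation of exactly .50 as large, as in Cohen's scheme. All function names are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def one_tailed_p(r, n, steps=2000):
    """Upper-tail p value for a positive Pearson r from a sample of size n.

    Assumption: one-tailed test; steps must be even for Simpson's rule.
    """
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    h = t / steps
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3                 # area under the density between 0 and t
    return 0.5 - area


def cohen_label(r):
    """Size label for |r| under the scheme described above."""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "below small"


for n in (11, 22):
    print(n, round(one_tailed_p(0.50, n), 3), cohen_label(0.50))
```

Both studies receive the same substantive label even though only one reaches conventional significance, which is the reason for preferring effect-size labels when comparing studies.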



a number of different fear-related tasks. To illustrate, Rachman et al. (1979) asked each patient to carry out five tasks that usually gave rise to compulsive rituals. For each task an independent assessor scored the patient's performance (1 = task completed, 0 = task avoided), and scores were summed across tasks to yield a 0-5 avoidance score. The assessor also rated the patient's discomfort during each task, using a 0-8 scale (0 = no discomfort, 8 = extreme discomfort). Discomfort scores were summed to yield a 0-40 discomfort scale.
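The scoring of the five-task procedure described above can be sketched in a few lines of Python. The function name and the example ratings are hypothetical; only the coding rules (1 = completed, 0 = avoided; discomfort rated 0 to 8) come from the description. Note that, as coded, a higher summed performance score reflects more tasks completed.

```python
def score_bat(completed, discomfort):
    """Score a five-task BAT.

    completed  -- list of 1 (task completed) or 0 (task avoided)
    discomfort -- list of assessor ratings, 0 (none) to 8 (extreme)
    Returns (summed 0-5 performance score, summed 0-40 discomfort score).
    """
    if len(completed) != 5 or len(discomfort) != 5:
        raise ValueError("expected ratings for exactly five tasks")
    if any(c not in (0, 1) for c in completed):
        raise ValueError("task scores must be 0 or 1")
    if any(not 0 <= d <= 8 for d in discomfort):
        raise ValueError("discomfort ratings must be between 0 and 8")
    return sum(completed), sum(discomfort)


# Hypothetical patient: three tasks completed, two avoided.
print(score_bat([1, 1, 0, 0, 1], [2, 4, 8, 8, 3]))
```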

In the most recent effort to capture the complexity of OC-related fear and avoidance, Woody, Steketee, and Chambless (in press-a) developed a multitask BAT in which tasks consisted of several steps. For each patient the authors identified three tasks that were difficult or impossible for the patients to complete without significant anxiety or rituals (e.g., switching off electrical appliances without checking). Each task was broken down into 3-7 steps and the degree of avoidance and ritualizing was assessed on a 3-point scale (0 = no avoidance/rituals, 1 = partial avoidance/rituals, 2 = unable to do task). SUDS levels also were recorded.

Reliability and validity. Studies of phobia and agoraphobia have found BAT measures of fear and avoidance to have acceptable test-retest reliability and good convergent validity with other measures of fear and avoidance (for a review see Nietzel, Bernstein, & Russell, 1988). Kern (1983) reported good criterion-related validity of BAT measures of fear and avoidance for animal phobics. These results can be taken as evidence in support of the reliability and validity of the BATs used for OCs. However, this conclusion rests on the untested assumption that findings from phobics and agoraphobics generalize to OCs.

Unfortunately, there have been only two published studies of the psychometric properties of BATs for OCD. Woody et al. (in press-a), using a multistep-multitask BAT, obtained medium-sized correlations between BAT measures (fear and avoidance) and the Yale-Brown obsessive compulsive inventory (rs = .38 to .43). Woody, Steketee, and Chambless (in press-b) obtained small correlations between BAT fear and avoidance measures and the SCL-90-R OC scale (rs = -.02 to .26). Further studies, using a broader range of OC measures, are required before conclusions can be drawn about the convergent validity of the various versions of the BAT.

Comment. BATs have the advantage of providing in vivo measures of fear and avoidance in OCD. The BAT appears well suited for assessing fear and avoidance of contaminated stimuli associated with washing compulsions. It is more difficult to design BATs for patients with other types of compulsions, such as checking or ordering rituals (Steketee, 1993). Indeed, Rachman et al. (1971) were unable to devise BATs for 2 of their 10 OC patients. A further problem is that fear and avoidance can be situation specific; a compulsive washer might fear and avoid touching objects in his or her home (because of fear of contaminating the house), yet this person may fearlessly handle objects in other situations (Rachman, 1993; Rachman & Hodgson, 1980). When a BAT is conducted in the clinic it may fail to capture the severity of fear and avoidance that occurs in the patient's habitual environment. Some studies have assigned BATs as homework assignments (e.g., Cottraux et al., 1990), but this introduces the problem of determining whether the BAT was appropriately conducted.

There is no standardized protocol for administering the BAT. Indeed, several versions are available. Performance on the BAT varies with the perceived demands to perform the task (Nietzel et al., 1988). If the assessor strongly encourages the patient to approach the feared object, then this may not provide an accurate measure of naturally occurring avoidance. Low-demand BATs are likely to be better measures of such avoidance (Kern, 1983). Unfortunately, the published studies provide little information about the instructions given to OC patients, and so it is difficult to estimate the degree of demand placed on patients. Given these difficulties, along with the dearth of reliability and validity data, it is not surprising that many OCD investigators no longer use BATs (Emmelkamp, 1982; Foa, Steketee, & Milby, 1980).

Direct Observation

Direct observation of the frequency or duration of compulsive rituals has been used in several case studies. Mills, Agras, Barlow, and Mills (1973) assessed washing compulsions of OC inpatients by installing a device that recorded the number of times the patient approached and used the sink. For another inpatient, Mills et al. mounted a video camera in the patient's room to assess the duration of rituals associated with going to bed. Both methods were validated against ratings made by ward staff, and were sensitive to treatment effects. Turner and colleagues (Turner, Hersen, Bellack, Andrasik, & Capparell, 1980; Turner, Hersen, Bellack, & Wells, 1979) also used observational methods to assess rituals. Ward staff were trained to use time sampling procedures to assess rituals, and achieved good inter-rater reliability (rs = .87 to .99). These measures were sensitive to the effects of treatment.

Direct observation methods have not been used in controlled outcome trials, and would be difficult to use to monitor compulsions in outpatients. Their test-retest reliability, convergent validity, and discriminant validity remain to be determined.

Diary Methods

Diary methods are popular ways of assessing the frequency, duration, severity, and context of problematic behaviors. For example, panic attack diaries are popular measures in treatment-outcome studies of panic disorder (e.g., Clark et al., 1994) and have been used in studies of a variety of other disorders, including social phobia (Glass & Arnkoff, 1994), chronic pain (Philips, 1988), and insomnia (Lacks & Morin, 1992). Diary methods appear to be useful methods for assessing the frequency, duration, and situational determinants of obsessions and compulsions. Several studies have used these methods in OCD outcome studies (e.g., Boersma, Den Hengst, Dekker, & Emmelkamp, 1976; Foa, Steketee, & Milby, 1980; Hackman & McLean, 1975), but there appear to be no published data on their psychometric properties.

In their review of the assessment of obsessions and compulsions, Mavissakalian and Barlow (1981) noted that a major aim of treatment is to reduce the frequency and duration of obsessive



Discriminant validity. The HSCL OC scale has been found to have medium-to-large correlations with measures of various types of psychopathology, with rs ranging from .36 (MMPI Scale 7, Psychasthenia) to .52 (IPAT anxiety scale; Steketee & Doppelt, 1986). Three studies found the SCL-90 OC scale had medium-to-large correlations with measures of depression (rs = .41 to .81), anxiety (rs = .54 to .64), hostility/anger (rs = .43 to .65), and psychotic symptoms (r = .57; Clark & Friedman, 1988; Derogatis et al., 1976; Dinning & Evans, 1977). The SCL-90-R OC scale has been found to have large correlations with the SCL-90-R anxiety scale (r = .56) and SCL-90-R depression scale (r = .79). In summary, the OC scales of the HSCL, SCL-90, and SCL-90-R have medium-to-large correlations with measures of various types of psychopathology, including depression, anxiety, hostility/anger, and psychotic symptoms. These correlations tend to be larger than the convergent validity correlations, which indicates poor discriminative validity.

The SCL-90-R OC scale and predecessors have acceptable internal consistency and adequate test-retest reliability for periods up to at least 7 days. There have been few studies of criterion-related validity, and available findings offer mixed support. Convergent validity appears adequate, but discriminant validity is poor. The OC scales have medium-to-large correlations with a variety of psychopathologic measures, which suggests the OC scales are largely measures of general (nonspecific) distress. This possibility is supported by a review of the item content of the scales. The scales overemphasize nonspecific distress and underemphasize OC symptoms; half the items of the HSCL OC scale and 40% of items of the later versions refer to nonspecific symptoms found in several anxiety and mood disorders; that is, "your mind going blank," "trouble remembering things," "difficulty making decisions," and "trouble concentrating." A further problem is the scales confound the frequency of symptoms with the amount of distress evoked by them. This makes scores ambiguous; a high score indicates the symptom was present and evoked distress, but a low score may indicate low frequency, low distress, or both. Given these problems, the SCL-90-R OC scale (and predecessors) are not recommended as treatment outcome measures.

LEYTON OBSESSIONAL INVENTORY (LOI)

The LOI was developed to assess obsessionality in "house-proud" (perfectionistic) housewives (Cooper & McNeil, 1968), and subsequently used to assess clinical OC phenomena (Cooper, 1970). The inventory consists of 69 items that each describe an OC symptom (46 items) or so-called OC character trait (23 items). The subject completes the LOI by responding "yes" or "no" according to whether or not the items are self-descriptive. A subset of 39 items then are rerated to assess: (a) the amount of interference caused by the symptom or behavior described in the item (4-point scale), and (b) the degree to which the subject resists performing the activity described in the item (5-point scale). Resistance and interference ratings are made only for items endorsed with a "yes" response. Thus, the LOI contains four subscales: prevalence of OC symptoms, prevalence of OC traits, degree of interference, and degree of resistance. The present article is concerned with OC symptoms rather than putative OC character traits, and so I will focus on the symptom, interference, and resistance subscales.

The LOI originally had a postbox response format, where each item was printed on a separate card and the subject rated the self-descriptiveness of items by placing cards in boxes marked "yes" or "no." An assessor then instructed the subject to make resistance and interference ratings for select items. Separate versions of the LOI were devised for men and women, distinguished by minor differences in wording (Cooper, 1970). The postbox format proved cumbersome and time consuming, requiring 30-45 min per subject (Cooper & McNeil, 1968). The LOI was later converted to a self-report questionnaire, and the wording was revised to make a common version for both genders (Kazarian, Evans, & Lefave, 1977; Snowdon, 1980). Snowdon (1980) reported the postbox and questionnaire versions were highly correlated (r = .72). For most purposes the postbox and questionnaire versions are probably interchangeable, although treatment outcome studies favor the latter because of its ease of administration.

Reliability

Internal consistency. The symptom, interference, and resistance subscales have acceptable-to-good internal consistencies, with coefficients α ranging from .75 to .90 (Richter, Cox, & Direnfeld, 1994; Stanley et al., 1993).

Test-retest reliability. Kim et al. (1989) assessed a sample of OCs and obtained good 7-day test-retest reliabilities, with intraclass correlations ranging from .80 (interference subscale) to .83 (resistance subscale). Kim, Dysken, and Kuskowski (1990) administered the LOI three times over 14 days to another sample of OCs. The intraclass correlation for the total scale (sum of symptom and trait items) was .73. Intraclass correlations were .79 and .84 for the interference and resistance subscales, respectively. These results suggest the subscales have acceptable test-retest reliabilities, at least over a 14-day interval.

Validity

Criterion-related validity. Cooper (1970) and Millar (1980) found that OCs scored higher than normal controls on each of the symptom, interference, and resistance subscales. Kendell and DiScipio (1970) found that OCs scored higher than depressed patients on these subscales. Millar (1983) found that OCs scored higher than depressed patients on the interference and resistance subscales, but not on the symptom subscale. Stanley et al. (1993) used a structured interview - the Anxiety Disorders Interview Schedule-Revised (ADIS-R: DiNardo & Barlow, 1988) - to establish the diagnostic status of their patients. The ADIS-R has good reliability and validity for the diagnosis of DSM-III-R anxiety disorders (DiNardo, Moras, Barlow, Rapee, & Brown, 1993). Stanley et al. found OCs differed from patients with other anxiety disorders on the LOI symptom, interference, and resistance subscales. In summary, most studies were limited by the fact that the validity of the criterion (diagnostic status) is unknown, because diagnoses were based on chart reviews or unstructured interviews. Nevertheless, most studies support the criterion-related validity of the symptom, interference, and resistance subscales.

Convergent validity. The LOI symptom, interference, and resistance subscales tend to have large correlations (mean r = .62, range = .38 to .77) with other OC measures (i.e., SCL-90-R OC scale, Maudsley Obsessional Compulsive Inventory, Padua Inventory, and Yale-Brown Obsessive Compulsive Scale; Hodgson & Rachman, 1977; Kim et al., 1990; Kraaijkamp, Emmelkamp, & van den Hout, 1986; Richter et al., 1994; Sanavio, 1988; Stanley et al., 1993). This indicates good convergent validity.



Discriminant validity. Kendell and DiScipio (1970) found the LOI symptom subscale had large correlations (r = .53) with the neuroticism scale of the Eysenck Personality Inventory (EPI). Stanley et al. (1993) found the LOI subscales had moderate correlations (rs = .36 to .37) with the EPI neuroticism scale. The subscales had small correlations (rs < .27) with SCL-90-R scales assessing somatization, anxiety, phobia, depression, and interpersonal sensitivity. The LOI subscales had small-to-medium correlations with the SCL-90-R hostility, paranoid ideation, and psychoticism scales (rs = .29 to .42). There was little to distinguish the pattern of correlations of the LOI subscales. Richter et al. (1994) found the subscales had medium-to-large correlations (rs = .43 to .50) with the Hamilton depression scale. In all, the correlations between the LOI subscales and non-OC measures tend to be smaller than the convergent validity correlations. These findings support the discriminant validity of the LOI subscales.

Comment

The LOI subscales have acceptable psychometric properties, yet they also have several important drawbacks. The three subscales are highly intercorrelated with one another (mean r = .81, range = .70 to .91: Rachman et al., 1973; Richter et al., 1994; Stanley et al., 1993), which suggests it is redundant to use them all as outcome measures. The symptom subscale was developed to assess house-proud housewives and, therefore, contains many items concerned with cleanliness and tidiness. It has few items assessing other symptom domains. For example, only three items pertain to checking, which is a serious limitation because checking is one of the most common compulsions (APA, 1994). This means that OCs with checking rituals may obtain spuriously low scores on the LOI symptom subscale.

A further problem is the resistance subscale may yield misleading results because it confounds the intensity of resistance with the number of obsessions and compulsions reported by the person. That is, the resistance scale is constructed such that an item is rated for resistance only if the subject indicates that he or she experiences the symptom described in the item. Thus, high scores on the resistance scale can be obtained only from subjects endorsing a lot of symptoms. Although there can be no resistance unless the person has at least one OC symptom, this does not mean that resistance is naturally correlated with the number of obsessions and compulsions reported by the person. Indeed, the LOI resistance scale can produce a misleading picture of the patient's degree of resistance. To illustrate, two patients might equally struggle to resist their symptoms. If patient A has more symptoms than patient B, then patient A will obtain a higher score on the LOI resistance subscale, giving the misleading impression that patient A is exerting stronger resistance.
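The confound can be made concrete with a small Python sketch. It assumes, purely for illustration, that resistance is rated 0 to 4 on each endorsed item and that the subscale score is the sum of those ratings; the rating values, the number of items shown, and the function names are hypothetical and do not reproduce the inventory's actual scoring key.

```python
def resistance_total(ratings):
    """Summed resistance score; unendorsed items (None) are not rated."""
    return sum(r for r in ratings if r is not None)


def mean_resistance(ratings):
    """Average resistance per endorsed item, which ignores symptom count."""
    rated = [r for r in ratings if r is not None]
    return sum(rated) / len(rated) if rated else 0.0


# Both hypothetical patients resist every symptom they have equally hard (3),
# but patient A endorses six symptoms and patient B only two.
patient_a = [3, 3, 3, 3, 3, 3, None, None]
patient_b = [3, 3, None, None, None, None, None, None]

print(resistance_total(patient_a), resistance_total(patient_b))
print(mean_resistance(patient_a), mean_resistance(patient_b))
```

The summed scores differ threefold while the per-symptom averages are identical, which is the sense in which a summed score mixes resistance intensity with symptom count.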

The LOI resistance subscale also entails the questionable assumption that greater resistance is associated with more psychopathology. Indeed, it can be argued that greater resistance is associated with less psychopathology (Goodman, Price, Rasmussen, Mazure, Fleischmann, et al., 1989). Resistance to compulsions, for example, is a means of attaining mastery over symptoms and an important component of behavior therapy for OCD (Rachman & Hodgson, 1980; Steketee, 1993). Resisting obsessions also can lead to symptom reduction, to the extent that resistance involves refusing to act on one's obsessional fears; for example, refusing to avoid fear-evoking stimuli can lead to habituation of obsessional fears. Resistance by means of deliberately suppressing obsessions can (under certain conditions) lead to a paradoxical increase in obsession frequency (Salkovskis & Campbell, 1994). Despite this exception, it is generally found that if measures of resistance are not confounded with symptom prevalence, then the degree of resistance is negatively correlated with the severity of obsessions and compulsions (Goodman, Price, Rasmussen, Mazure, Fleischmann, et al., 1989; Woody et al., in press-a). Given the questionable assumptions underlying the construction of the LOI resistance subscale, it appears to be of dubious value in assessing resistance to obsessions and compulsions.

MAUDSLEY OBSESSIONAL COMPULSIVE INVENTORY (MOCI)

Hodgson and Rachman (1977) generated 65 true-false items to assess overt rituals and related obsessions. The items were administered to 50 OCs and 50 non-OC neurotics. The groups were discriminated by 30 items, which were retained to form the MOCI. The authors then administered the scale to 100 OCs and factor analyzed the responses. Five factors were obtained. Four were used to form the MOCI subscales: (a) washing (11 items), (b) checking (9 items), (c) obsessional slowness/repetition (7 items), and (d) doubting/conscientiousness (7 items). The fifth factor, which assessed obsessional rumination, had salient loadings for only two items and so it was disregarded.

The subscales are essentially symptom checklists; that is, their scores reflect the amount of time consumed by OC symptoms. To illustrate, a high score on the checking subscale indicates that the person spends a great deal of time checking and rechecking. A high score on the doubting/conscientiousness subscale indicates the person has serious doubts about whether he/she has performed tasks adequately, and a sense of incompleteness even when tasks are performed carefully (Rachman & Hodgson, 1980).

Relationships among Subscales

The MOCI subscales were developed because they corresponded to separate factors, and so conveyed unique (non-redundant) information. Factor analytic studies have replicated the washing, checking, and doubting/conscientiousness factors (Chan, 1990; Kraaijkamp et al., 1986; Rachman & Hodgson, 1980; Sanavio & Vidotto, 1985; Sternberger & Burns, 1990b), but only Kraaijkamp et al. (1986) found support for a factorially distinct slowness subscale. In most studies, items from the slowness subscale tended to load on other factors, such as the doubting/conscientiousness factor.

Although most of the subscales are conceptually and factorially distinct, this does not mean they are entirely unrelated. The doubting/conscientiousness subscale assesses doubts or uncertainties about the adequacy of one's actions. Such doubts can lead to the repetition of actions, such as repeated checking, washing, and slowness in completing tasks. Accordingly, the doubting/conscientiousness subscale is correlated with the checking subscale (mean r = .50) and has smaller but nontrivial correlations with the washing subscale (mean r = .27) and the slowness subscale (mean r = .21; Chan, 1990; Hodgson & Rachman, 1977; Richter et al., 1994).

Reliability

Internal consistency. Studies using clinical samples have generally obtained acceptable internal consistencies for the checking, cleaning, and doubting/conscientiousness subscales, with coefficients α ranging from .60 to .87 (Hodgson & Rachman, 1977; Kraaijkamp et al., 1986; Rachman & Hodgson, 1980; Richter et al., 1994). Studies of student samples yielded lower internal consistencies, ranging from .40 to .62 (Chan, 1990; Sanavio & Vidotto, 1985; Sternberger & Burns, 1990b). Lower αs may have been due to range restriction. Studies of clinical and nonclinical samples have generally found very low internal consistencies for the slowness subscale, with αs ranging from 0 to .44 (Chan, 1990; Kraaijkamp et al., 1986; Rachman & Hodgson, 1980; Sanavio & Vidotto, 1985). Very low αs for the slowness subscales may be due to item heterogeneity (see the Comment section below).

Test-retest reliability. Hodgson and Rachman (1977) examined the 4-week test-retest reliability for a sample of university students. Kendall's (1963) τ was used to examine the concordance between item responses across the retest interval. For the sum of MOCI items, test-retest reliability was found to be acceptable (τ = .8). Kraaijkamp et al. (1986) used the same procedure to examine the 4-week test-retest reliability in a mixed sample of OCs and depressed patients. Reliability was good (τ = .84) and MOCI total scores correlated .92 across the test-retest interval. Sternberger and Burns (1990b), using a sample of university students, found the 6-7 month test-retest reliability was acceptable for the MOCI total score (r = .69). In summary, the available data suggest the MOCI total score has acceptable test-retest reliability over a period of at least 6-7 months. Test-retest reliabilities of the subscales have yet to be reported.

Validity

Criterion-related validity. Hodgson, Rankin, and Stockwell (1979, unpublished, cited in Rachman & Hodgson, 1980) found the MOCI total scale discriminated between phobics and OCs. Kraaijkamp et al. (1986) found the MOCI total scale reliably discriminated OCs from normal controls, anorectics, and patients with non-OC anxiety disorders. Diagnoses were made from information obtained from the Present State Exam (Wing, Cooper, & Sartorius, 1974). The MOCI did not discriminate between OCs and depressed patients, although the latter may have been a chance result because comparisons were based on 89 OCs and only 6 depressives. Compared to normal controls and the combined psychiatric samples (anorexia, depression, and non-OC anxiety disorders), OCs had higher scores on all MOCI subscales, except the slowness subscale, where OCs and normals did not differ.

Hodgson and Rachman (1977) obtained retrospective therapist ratings of the severity of washing and checking rituals for OCs. Patients were diagnosed on the basis of unstructured clinical interviews. Patients were classified according to dichotomized therapist ratings ("slight or no problem" vs. "moderate or severe problem") for washing rituals and compared to dichotomized scores (low vs. high) on the MOCI washing subscale. The same type of classification was made for therapist ratings of checking rituals and the MOCI checking subscale. Concordance between therapist ratings and MOCI subscale scores was assessed by the γ coefficient (Goodman & Kruskal, 1963). Acceptable concordance (γ = .7) was obtained for both washing and checking classifications. Kraaijkamp et al. (1986) performed the same analysis for a sample of OCs, classified as either washers or checkers by two independent raters. The γ coefficient was calculated separately for each rater, and was .74 and .78 for checking rituals, and .85 and .89 for washing rituals. In sum, the results support the criterion-related validity of the MOCI total score and its washing, checking, and doubting/conscientiousness subscales. The only study of the slowness subscale (Kraaijkamp et al., 1986) failed to support its criterion-related validity.
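For readers unfamiliar with the Goodman and Kruskal (1963) gamma coefficient, the following Python sketch shows how it is computed from two dichotomized classifications: gamma is the difference between the numbers of concordant and discordant pairs divided by their sum, with tied pairs ignored. The patient data are hypothetical and are not the data of the studies described above.

```python
def goodman_kruskal_gamma(x, y):
    """Gamma = (C - D) / (C + D) over all pairs of cases; ties are ignored."""
    concordant = discordant = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical sample of 20 patients.
# therapist: 0 = slight or no problem, 1 = moderate or severe problem
# moci:      0 = low subscale score,   1 = high subscale score
therapist = [0] * 10 + [1] * 10
moci = [0] * 8 + [1] * 2 + [0] * 2 + [1] * 8

print(round(goodman_kruskal_gamma(therapist, moci), 2))
```

For a 2 x 2 table with cell counts a, b, c, and d, the same quantity reduces to (ad - bc) / (ad + bc), here (64 - 4) / (64 + 4).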

Convergent validity. The MOCI tends to have large correlations (mean r = .57, range = .23 to .77) with other OC measures (i.e., SCL-90-R OC scale and predecessors, subscales of the Leyton Obsessional Inventory, Compulsive Activity Checklist, Padua Inventory, and Yale-Brown Obsessive Compulsive Scale; Freund, Steketee, & Foa, 1987; Goodman et al., 1989b; Hodgson & Rachman, 1977; Kraaijkamp et al., 1986; Richter et al., 1994; Sanavio, 1988; Steketee & Doppelt, 1986; Steketee & Freund, 1993; Sternberger & Burns, 1990a, 1990b; van Oppen, 1992; van Oppen et al., 1995; Woody et al., in press-a, in press-b). These results support the convergent validity of the MOCI.

Discriminant validity. Chan (1990) found the MOCI correlated .54 with the Beck Depression Inventory, and Richter et al. (1994) found the MOCI correlated .41 with the Hamilton depression scale. Sternberger and Burns (1990b) found the MOCI had small-to-medium correlations with all SCL-90-R scales (rs = .26 to .36) except the SCL-90-R OC scale (r = .51). In general, the results show that correlations with non-OC measures tend to be lower than correlations with OC measures, which supports the discriminant validity of the MOCI.

Convergent and discriminant validity of the MOCI subscales. Several studies have examined the convergent and discriminant validities of the MOCI washing and checking subscales. The MOCI washing subscale has been found to have large correlations with the Padua Inventory contamination subscale (rs = .53 to .87) and small-to-medium correlations with the Padua checking subscale (rs = -.05 to .33) (Sternberger & Burns, 1990a; van Oppen, 1992; van Oppen et al., 1995). A similar pattern of results was obtained for the MOCI checking subscale, which had large correlations with the Padua checking subscale (rs = .62 to .84) and small-to-medium correlations with the Padua contamination subscale (rs = .24 to .35) (Sternberger & Burns, 1990a; van Oppen, 1992; van Oppen et al., 1995). The MOCI washing subscale has small-to-medium correlations with the MOCI checking subscale (rs = .25 to .46: Chan, 1990; Hodgson & Rachman, 1977; Sternberger & Burns, 1990b). These results indicate good convergent and discriminant validities of the MOCI washing and checking subscales.

Sher and colleagues (Sher, Frost, & Otto, 1983; Sher, Mann, & Frost, 1984) found that students with high scores on the MOCI checking subscale, compared to students with low scores, had higher scores on a self-report measure of the frequency of checking of everyday actions (e.g., checking lights and door locks). Frost and Sher (1989) administered the MOCI to a sample of college students 1 month before an exam. During the exam, students were asked to indicate how many times they checked their answers. The MOCI checking subscale was correlated .27 with checking frequency, whereas the other subscales were unrelated to checking frequency (rs = -.08 to .02).

The MOCI checking and washing subscales have medium-to-large correlations (rs = .30 to .51) with the Beck Depression Inventory and Hamilton depression scale (Chan, 1990; Richter et al., 1994). These tend to be lower than the convergent validity correlations, and so support the discriminant validity of the checking and washing subscales. There is insufficient information to evaluate the convergent and discriminant validity of the other subscales.

The MOCI total scale has generally acceptable psychometric properties, as do its washing and checking subscales. The other subscales require further investigation. Available evidence suggests the slowness subscale is in need of revision. The MOCI subscales were developed on the basis of factor analysis, and subsequent studies support the factorial distinction between all but the slowness subscale. The latter has poor internal consistency, which is not surprising given its item content. Two of its items are related to ruminations, two items refer to compulsive counting and the need for routine, and only three items make direct reference to obsessional slowness.

Although the MOCI total scale has adequate psychometric properties, it also has important limitations. The scale was developed to assess obsessions and compulsions associated with overt rituals (Hodgson & Rachman, 1977). Yet some items do not directly pertain to obsessions or compulsions (e.g., "Some numbers are extremely unlucky," "Neither of my parents was very strict during my childhood"). The MOCI assesses washing and checking compulsions, which are the most common types of compulsions (APA, 1994; Rachman & Hodgson, 1980), but does not assess other important compulsions such as hoarding and covert rituals. It provides a limited assessment of obsessional ruminations (two items).

The MOCI does not assess important parameters of OCD, such as interference and resistance to compulsions. Interference can be inferred only from the number of symptoms endorsed by the subject. Moreover, because the MOCI emphasizes cleaning and checking rituals, patients with these compulsions may obtain higher overall MOCI scores than patients with other, equally severe OC symptoms. This means it is possible that patients with moderate washing and checking compulsions may obtain higher MOCI scores than patients with severe obsessions or hoarding compulsions (Goodman, Price, Rasmussen, Mazure, Delgado, et al., 1989).

The MOCI could be improved by addressing these issues. The internal consistency of the slowness subscale could be enhanced by increasing the length of the scale by adding items central to the construct of obsessional slowness. The addition of subscale(s) to assess obsessions also would improve the coverage of the MOCI. The addition of resistance and interference subscales would enhance the breadth of assessment.
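The expected gain in reliability from lengthening a scale can be approximated with the Spearman-Brown prophecy formula. The sketch below is illustrative only: the starting reliability of .50 is a hypothetical value, not a reported estimate for the slowness subscale, and the formula assumes that the added items are as good as (parallel to) the existing ones.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing scale length by length_factor.

    Assumes the added items are parallel to (as good as) the existing ones.
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Hypothetical example: a 7-item subscale with reliability .50,
# lengthened to 14 items (length_factor = 2).
predicted = spearman_brown(0.50, 2)
print(round(predicted, 2))
```

Under these assumptions, doubling the number of items would raise the predicted reliability from .50 to about .67. Items that are less central to the construct than the existing ones would yield a smaller gain.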

COMPULSIVE ACTIVITY CHECKLIST (CAC)

The CAC was developed originally as a 62-item interviewer-administered schedule to assess the extent to which OC symptoms interfere with everyday activities (Philpott, 1975). Each item lists an activity (e.g., washing, dressing, using electrical appliances), which is rated on a 4-point scale, ranging from 0 (performance of activity within normal limits) to 3 (complete impairment). Impairment is rated according to four criteria: frequency, duration, avoidance, and oddity of behavior. To illustrate, a score of 3 would be given if (a) the activity takes three times longer than usual, (b) is three times as frequent as usual, (c) definitely appears very odd, or (d) avoidance markedly interferes with activity. Criteria for "normal" and "odd" behavior are left to the judgement of the interviewer. Interviewers are instructed to elicit concrete information to allow them to make a rating (e.g., "How long does it take you to brush your hair?").

The CAC has been revised several times, mainly by deleting items and changing to a self-report format. Marks, Hallam, Connolly, and Philpott (1977) developed clinician-rated and self-report versions, each containing 39 items. Freund et al. (1987) developed a 38-item observer-rated version, and Cottraux, Bouvard, Defayolle, and Messy (1988) developed an 18-item self-report version. Most recently, Steketee and Freund (1993) developed a 28-item self-report version. Each revision was intended to increase item homogeneity and discriminability of OCs from other populations, although as we will see, the versions have very similar psychometric properties. Instructions and the rating scale remained essentially unchanged. In summary, each version is a measure of global impairment due to obsessions or compulsions, taking into account duration, frequency, and avoidance.



Reliability

Internal consistency. Good internal consistency has been reported for the 37-item self-report CAC (α = .94: Cottraux et al., 1988) and for the 38-item observer-rated version (α = .91; Freund et al., 1987). Similar results were obtained for the 38-item self-report version (αs = .86 to .95: Sternberger & Burns, 1990b; Steketee & Freund, 1987) and for the 28-item self-report version (α = .87: Steketee & Freund, 1987). Internal consistency of the other, less popular versions has not been reported.
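For readers less familiar with the internal consistency coefficient reported here, the following is a minimal sketch of how coefficient α can be computed from a respondents-by-items table of ratings. The function name and the ratings are hypothetical; they are not CAC data from any of the studies cited.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a respondents-by-items table of ratings.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings on a 0-3 scale: 5 respondents x 4 items.
ratings = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 1, 3],
    [3, 2, 3, 3],
    [1, 0, 1, 0],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 2))
```

For these illustrative ratings the item variances sum to 5.1 and the variance of the total scores is 15.3, giving α of about .89.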

Interrater reliability and relationship between self-report and observer-rated versions. Marks, Stern, Mawson, Cobb, and McDonald (1980) had two independent assessors administer the 39-item CAC to a sample of OCs. Total scores correlated .95 between observers, and the observer-rated and self-report versions correlated .83. Freund et al. (1987) obtained moderate inter-rater agreement (r = .64) for the 38-item CAC. The mean CAC score, averaged across raters, correlated .94 with the 38-item self-report CAC. These results suggest the observer-rated CAC has adequate interrater reliability. The self-report and observer-rated versions are highly correlated. It is possible the correlations between observer-rated and self-report versions were inflated by criterion contamination. That is, patients may have rated their responses to the self-report version simply by recalling their responses to the observer-rated version.

Test-retest reliability. Freund et al. (1987) averaged CAC ratings from two interviewers to examine the test-retest reliability of the 38-item CAC. Test-retest reliability was .68 for a retest interval ranging from 5 to 60 days (mean = 37 days). Cottraux et al. (1988) administered the 37-item self-report CAC to a sample of normal controls and found the 1-month test-retest reliability was .62. Sternberger and Burns (1990b), using a sample of university students, obtained a 6-7 month test-retest reliability of .74. Extrapolating from these results, it seems likely that the self-report and observer-rated versions have good test-retest reliability over a period of weeks, if not months.

Validity

Criterion-related validity. Using the 37-item self-report CAC, Cottraux et al. (1988) found that OCs (diagnosed by an unspecified method) had higher scores than panic disordered patients, social phobics, and normal controls. Steketee and Freund (1993) compared OCs (diagnosed by an unspecified method) to patients with other anxiety disorders and to university students. OCs had significantly higher scores on 29 of 38 items of the self-report CAC. In the absence of information on the reliability and validity of the diagnoses, these findings offer only tentative support for the criterion-related validity of the CAC.

Convergent validity. The self-report and observer-rated CACs tend to have medium correlations (mean r = .40, range = .19 to .84) with other OC measures (i.e., SCL-90-R OC scale, Maudsley Obsessional Compulsive Inventory, Padua Inventory, and Likert scale ratings of symptom severity; Cottraux et al., 1988; Freund et al., 1987; Marks et al., 1980; Steketee & Freund, 1993; Sternberger & Burns, 1990b). These results support the convergent validity of the CAC.

Discriminant validity. Freund et al. (1987) found the 38-item observer-rated CAC had a medium correlation with the SCL-90-R OC scale (r = .38) and slightly smaller correlations with the other SCL-90-R scales (rs = .14 to .31). Foa et al. (1987) found the observer-rated CAC had medium correlations (rs = .33 to .47) with measures of depression (i.e., the Beck Depression Inventory and patient and observer-rated Likert measures of depression severity). To summarize, the observer-rated CAC has correlations with non-OC measures that tend to be similar in magnitude to correlations with OC measures. This indicates weak discriminant validity. The same conclusion probably holds for the self-report CAC, because the self-report and observer-rated CACs are highly correlated.

Comment

Since its appearance in the 1970s, the CAC has been through several revisions. The most popular are the 38-item self-report and observer-rated versions. The 28- to 38-item self-report and observer-rated versions have very similar psychometric properties. Test-retest reliability and internal consistency are good. Interrater reliability appears adequate. Criterion-related and convergent validities are acceptable, but discriminant validity appears weak. A further problem with the CAC is that it provides only an indirect measure of OC symptoms because it assesses only the degree of interference in everyday activities. It does not directly assess obsessions or compulsions. Moreover, scores on the CAC are ambiguous because they confound slowness, avoidance, and oddity of behavior. The lack of a structured interview is a further limitation for the observer-rated version because psychometric properties may depend on the skill and training experiences of the interviewer(s) rather than the properties inherent to the CAC.

OC SCALE FROM THE COMPREHENSIVE PSYCHOPATHOLOGICAL RATING SCALE (CPRS-OC)

The Comprehensive Psychopathological Rating Scale (CPRS, Asberg, Montgomery, Perris, Schalling, & Sedvall, 1978) is a set of 63 clinician-rated items that assess a range of psychiatric signs and symptoms. Each item defines a sign or symptom, which is rated on a 4-point (0-3) severity scale. Each point on the rating scale also is accompanied by a description. For example, a rating of 3 on the rituals item is indicated by "extensive rituals or checking habits that are time consuming and incapacitating." The interviewer is required to elicit sufficient information to rate each item, using an unstructured clinical interview. The CPRS-OC consists of eight items selected from the CPRS because a sample of 24 OCs scored higher on these items than on the remaining items (Thoren, Asberg, Cronholm, Jornestedt, & Traskman, 1980). The items are as follows: "rituals," "inner tension," "compulsive thoughts," "concentration difficulties," "worrying over trifles," "sadness," "lassitude," and "indecision." Four of these items also are included in the CPRS depression scale.

Reliability and Validity

The CPRS-OC has been used in several pharmacotherapy trials (Table 3), even though its psychometric properties are largely unknown. Internal consistency and test-retest reliability have yet to be examined. Thoren et al. (1980) reported moderate-to-high interrater correlations for individual items (rs = .30 to .93) and for the total score (r = .97). Criterion-related, convergent, and discriminant validities have yet to be examined.

Comment

There are several limitations to the CPRS-OC. Its psychometric properties are largely unknown, and only two of its eight items are specific to OCD: "compulsive thoughts" (obsessions) and "rituals." The remaining items are either features of depression ("lassitude," "concentration difficulties," "indecision," "sadness") or are nonspecific features of anxiety states ("worrying over trifles," "inner tension"). Insel et al. (1983) modified the CPRS-OC by deleting items assessing sadness, inner tension, and worry. The resulting 5-item scale still shares two items with the depression scale. Even in its revised form the CPRS-OC appears to be largely a measure of nonspecific distress.

LIKERT SCALES

Various single-item 9-point Likert scales have been developed to assess a variety of aspects of OCD, including global measures of severity of obsessions and compulsions, and specific scales, including measures of the degree of OC-related fear, degree of avoidance, time spent ritualizing, and severity of urges to ritualize (e.g., Emmelkamp, 1982; Foa et al., 1983, 1992). The scales may be rated by the patient or by an interviewer.

Reliability

Interrater reliability and relationship between self-report and observer-rated versions. Foa et al. (1983) obtained high inter-rater correlations for Likert measures of severity of obsessions and severity of compulsions (rs = .92 to .97). Cottraux et al. (1990) reported large correlations (rs = .74 to .89) between self-report and observer-rated versions of two types of Likert measures (OC-related anxiety/discomfort and duration of compulsions). Large correlations also have been obtained among the patients', therapists', and independent observers' ratings of a range of OC features, including main fear, avoidance, and compulsion severity (rs = .64 to .83; Foa, Steketee, Kozak, & Dugger, 1987). Thus, there is evidence of good interrater reliability, and high correlations between self-report and observer-rated Likert scales.

Test-retest reliability. Steketee, Freund, and Foa (1988) reported the test-retest reliability of Likert scales (assessing main fear, avoidance, general functioning, anxiety, and depression) ranged from .40 to .87 for self-report ratings, and .20 to .50 for observer ratings over a mean 60-day interval. These data suggest considerable variation in test-retest reliabilities. Unfortunately, reliabilities were not reported for individual scales (only the above-mentioned ranges were given), so it is not possible to identify which scales had the lowest reliability. In summary, the test-retest reliability of Likert scales requires further investigation.

Validity

Criterion-related validity. There have been no published studies of the criterion-related validity of these scales. It may be assumed that the scales should have good criterion-related validity because patients without OCD would have low (or zero) scores on items measuring global severity of obsessions, compulsions, etc. However, this assumption may not be warranted because unwanted intrusive thoughts often occur in people without OCD (Rachman & de Silva, 1978; Salkovskis & Harrison, 1984), and compulsion-like behaviors (e.g., excessive checking) can occur in patients with disorders other than OCD (e.g., generalized anxiety disorder; Craske, Rapee, Jackel, & Barlow, 1989).

Convergent validity. Likert measures of OC symptoms generally have moderate correlations (mean r = .32, range = .17 to .62) with other OC measures, including the SCL-90-R OC scale (and predecessors), Compulsive Activity Checklist, Yale-Brown Obsessive Compulsive Scale, and Padua Inventory (Cottraux et al., 1988; Foa et al., 1983; Freund et al., 1987; Steketee & Doppelt, 1986; van Oppen, Emmelkamp, van Balkom, & van Dyck, in press; Woody et al., in press-a, in press-b). This suggests that the Likert scales generally have acceptable convergent validity.

Discriminant validity. Foa et al. (1983) found that Likert measures of OC symptoms had small-to-medium correlations with HSCL measures of depression, somatization, anxiety, and interpersonal sensitivity (rs = .09 to .36). Foa et al. (1987) found Likert ratings of the patient's main fear and severity of compulsions had small correlations with the Beck Depression Inventory, and with an interview rating of depression (rs < .29), and small-to-medium correlations with a patient-rating of depression severity (rs = .28 to .30). In all, these results suggest adequate discriminant validity of the Likert measures.

Comment

Likert scales are popular because of their ease of administration and scoring. Multiple scales can be used to assess multiple aspects of OCD. Indeed, the Yale-Brown Obsessive Compulsive Scale (discussed below) can be regarded as a compilation of such scales. A limitation of Likert scales is that researchers using them have provided little information on the instructions accompanying the self-report versions, and no information on the questions asked by interviewers using the observer-rated versions. This makes it difficult to determine whether different investigators are administering the measures in the same way.

NIMH GLOBAL OBSESSIVE COMPULSIVE SCALE (GOCS)

The GOCS (Insel et al., 1983) is a single-item Likert-like measure of the overall severity of OC symptoms. It is a clinician-rated scale based on other NIMH global rating scales, such as the global measures of mania and depression (Murphy, Pickar, & Alterman, 1982). It differs from the Likert scales described in the previous section in two ways: the number of rating points (15 vs. 9), and the clustering of descriptors on the scale. The observer completes the GOCS by selecting one of 15 severity levels, ranging from 1 (minimal symptoms or within normal range) to 15 (very severe). Severity levels are clustered into five main groups (i.e., ratings of 1-3, 4-6, 7-9, 10-12, and 13-15), with detailed descriptors for each cluster. For example, ratings from 10-12 represent "severe obsessive-compulsive behavior," defined as symptoms that are "crippling to the patient, interfering so that daily activity is an active struggle. Patient may spend full time resisting symptoms. Requires much help from others to function."

Reliability and Validity

The GOCS has been used in several treatment outcome studies (Table 3) even though little is known about its psychometric properties. Inter-rater reliability has yet to be determined. Two studies have examined test-retest reliability. Kim et al. (1992) reported a 2-week intraclass correlation of .98, and Kim, Dysken, Kuskowski, and Hoover (1993) obtained a 2-week intraclass correlation of .87.

There have been no studies of criterion-related validity or discriminant validity. With regard to convergent validity, one study found the GOCS had a medium correlation (r = .33) with the SCL-90-R OC scale, and several studies obtained large correlations (mean r = .69, range = .63 to .77) with the YBOCS (Black, Kelly, Myers, & Noyes, 1990; Goodman, Price, Rasmussen, Mazure, Delgado et al., 1989; Kim et al., 1992, 1993). The convergent validity of the GOCS is promising, albeit in need of further evaluation. Correlations between the GOCS and YBOCS may have been spuriously inflated because in each case the scales were administered by the same interviewer. This means that ratings made on the YBOCS may have influenced those on the GOCS, or vice versa.

Comment

The GOCS has the advantage of being a simple 1-item scale, which, no doubt, accounts for its popularity in treatment outcome studies. However, little is known about its reliability and validity. GOCS ratings are based on unstructured clinical interviews, and so its psychometric properties may vary widely from one study to the next, depending on the adequacy of the interviews. The GOCS provides only a global assessment of OC symptoms, and fails to capture information about the severity of different types of OC symptoms.

YALE-BROWN OBSESSIVE-COMPULSIVE SCALE

The YBOCS is a semistructured interview designed to assess symptom severity and response to treatment for patients diagnosed with OCD (Goodman, Price, Rasmussen, Mazure, Fleishmann et al., 1989; Goodman, Price, Rasmussen, Mazure, Delgado et al., 1989; Goodman, Rasmussen et al., 1989). It consists of three parts:

1. Definitions and examples of obsessions and compulsions, which the interviewer reads to the patient.

2. A Symptom Checklist, containing over 50 common obsessions and compulsions, including obsessions about aggression, contamination, and counting, and compulsions about cleaning, checking, ordering, and hoarding. The interviewer asks the patient whether the symptoms are present currently or were present in the past. The interviewer then asks the patient to list the most prominent obsessions, compulsions, and OC-related avoidance behaviors.

3. The YBOCS proper, which consists of 10 core items and 11 investigational items. The latter are included on a provisional basis and require further evaluation. The core items assess five parameters of obsessions (items 1-5) and compulsions (items 6-10): (a) duration/frequency, (b) interference in social and occupational functioning, (c) associated distress, (d) degree of resistance, and (e) perceived control over obsessions or compulsions. Thus, the YBOCS assesses parameters of symptom severity independent of symptom content.

For the YBOCS proper, each core item is rated by the interviewer on a 5-point scale, ranging from 0 (none) to 4 (extreme). The rater must determine whether the patient is presenting with real obsessions or compulsions, and not symptoms of another disorder such as paraphilia. All items are accompanied by probe questions and by written definitions for each point on the 0-4 scales. Items are rated in terms of the average severity of each parameter over the past week. To illustrate, item 1 assesses the average time spent on all obsessions over the past week. The accompanying rating scale ranges from 0 (no obsessions) to 4 (extreme, greater than 8 hours/day or near constant intrusions). Scores on the 10 core items are summed to yield scores for the obsessions subscale, the compulsions subscale, and the total (10-item) YBOCS scale.
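The scoring rule just described can be sketched as follows. This is an illustrative sketch only, not an official scoring program: the function name, the input format (a list of the 10 core item ratings in order), and the example ratings are assumptions introduced here.

```python
def score_ybocs(core_items):
    """Sum the 10 YBOCS core item ratings into subscale and total scores.

    core_items: ratings for items 1-10 in order, each an integer from 0 to 4.
    Items 1-5 cover obsessions and items 6-10 cover compulsions.
    """
    if len(core_items) != 10:
        raise ValueError("expected ratings for exactly 10 core items")
    if any(rating not in range(5) for rating in core_items):
        raise ValueError("each rating must be an integer from 0 to 4")
    obsessions = sum(core_items[:5])
    compulsions = sum(core_items[5:])
    return {
        "obsessions": obsessions,
        "compulsions": compulsions,
        "total": obsessions + compulsions,
    }


# Hypothetical patient with extreme obsessions and no compulsions.
scores = score_ybocs([4, 4, 4, 4, 4, 0, 0, 0, 0, 0])
print(scores)
```

In this hypothetical case the obsessions subscale is at its maximum of 20, yet the total score is only 20 of a possible 40, which shows why the subscale scores can be more informative than the total for patients whose symptoms are confined to obsessions or to compulsions.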



The YBOCS investigational items assess the following: amount of time free of obsessions or compulsions, insight into the irrationality of obsessions and compulsions, avoidance, degree of indecisiveness, overvalued sense of personal responsibility, obsessional slowness/inertia, pathological doubting, global severity, overall response to treatment, and reliability of information obtained from the patient. They are rated by the interviewer on 0-4 or 0-6 scales, similar to those used for the core items.

YBOCS resistance items are rated such that greater resistance is associated with lower scores, because greater resistance is associated with less impairment in social and occupational functioning. This scoring rule is supported by the finding that resistance scores are correlated with less severe OC symptoms, as assessed by other YBOCS items (Goodman, Price, Rasmussen, Mazure, Fleishmann et al., 1989; Woody et al., in press-a).

In practice, most published treatment outcome studies used only the sum of the 10 core items. Scores on the obsession and compulsion subscales are infrequently used, and the Symptom Checklist has yet to be used as an outcome measure. In the following, the review is confined to the psychometric properties of the 10-item YBOCS because there is little or no available information on the properties of the Symptom Checklist or the investigational items. Accordingly, I will use the acronym YBOCS to refer to the scale formed by the sum of the 10 core items.

Reliability

Inter-rater reliability. Price, Goodman, Charney, Rasmussen, and Heninger (1987) obtained an intraclass correlation of .99 when the YBOCS was administered by two independent raters to 10 OCs. Goodman, Price, Rasmussen, Mazure, Fleishmann et al. (1989) assessed the inter-rater reliability of the YBOCS by having six trained raters evaluate videotaped interviews of six OCs. The intraclass correlation was .80. In a second study reported in the same article, four trained raters evaluated videotaped interviews of 40 OCs, yielding an intraclass correlation of .98. Jenike et al. (1990) used four raters to assess 40 OCs and obtained an intraclass correlation of .96 for the YBOCS. No information was presented on whether the ratings were based on audiotapes, videotapes, or live interviews. Woody et al. (in press-a) had an interviewer obtain YBOCS ratings from live interviews of 30 OCs, and then a second rater listened to audiotapes of the interviews. The intraclass correlation was .93.

The results of these studies suggest the YBOCS has excellent inter-rater reliability. However, it is possible that inter-rater reliability was spuriously inflated, at least to some degree. The reliability estimates were obtained by having one evaluator rerate taped interviews of another evaluator. This shows that one can score another's interview reliably, but not that one can administer the instrument reliably. It is quite a different task to reproduce a rater's score, based on a taped interview, than to interview the patient from scratch and obtain a score that matches that of another rater who also interviews the patient independently. If the original (criterion) rater makes the mistake of giving the patient actual rating categories to choose from (instead of the interviewer rating the categories), then extremely high reliabilities can be obtained on rerating if the patient's self-ascribed category is the rating. The irony, of course, is that just accepting the patient's self-rating, rather than doing the difficult work of evaluating the details of the symptoms, and assigning a rating, appears more reliable. Unfortunately, it appears to be common for evaluators using the YBOCS to make the mistake of giving patients the rating categories, despite their having received training to the contrary.

1 The author acknowledges, with thanks, an anonymous reviewer as the source of these comments.

Internal consistency. The YBOCS has acceptable-to-good internal consistency with coefficients α ranging from .69 to .91 (Goodman, Price, Rasmussen, Mazure, Fleishmann et al., 1989; Richter et al., 1994; Woody et al., in press-a).

Test-retest reliability. Kim et al. (1990, 1992, 1993) administered the YBOCS to three samples of OCs three times over a 2-week period. Intraclass correlations ranged from .81 to .97. Woody et al. (in press-a) administered the YBOCS to 24 OCs on two occasions over test-retest intervals ranging from 10 to 103 days (mean = 49 days). The intraclass correlation was .61, and was probably reduced because of the long retest interval. The findings suggest the YBOCS has good test-retest reliability over at least a 2-week interval.

Validity

Criterion-related validity. The YBOCS was intended for use with patients diagnosed with OCD, and so there has been only one study of its criterion-related validity. Rosenfeld, Dar, Anderson, Kobak, and Greist (1992) found that patients with OCD (method of diagnosis unspecified) had higher YBOCS scores than patients with other anxiety disorders and normal controls.

Convergent validity. The YBOCS tends to have large correlations (mean r = .51, range = .17 to .77) with other OC measures (i.e., anxiety and avoidance ratings from behavioral avoidance tests, SCL-90-R OC scale, subscales of the Leyton Obsessional Inventory, Maudsley Obsessional Compulsive Inventory, Likert scales of symptom severity, Global Obsessive Compulsive Scale; Black et al., 1990; Goodman, Price, Rasmussen, Mazure, Delgado et al., 1989; Kim et al., 1990, 1992; Richter et al., 1994; Woody et al., in press-a, in press-b). These results indicate that the YBOCS has good convergent validity.

Discriminant validity. Studies of discriminant validity have been less encouraging. The YBOCS has large correlations with the Hamilton depression scale (mean r = .64, range = .53 to .91) and large correlations with the Hamilton anxiety scale (mean r = .62, range = .47 to .85; Goodman, Price, Rasmussen, Mazure, Delgado et al., 1989; Hewlett, Vinogradov, & Agras, 1992; Price et al., 1987; Richter et al., 1994). These studies show that correlations between the YBOCS and measures of depression and general anxiety tend to be as large as the convergent validity correlations. This suggests the 10-item YBOCS has poor discriminant validity.

Comment

The YBOCS provides a comprehensive assessment of OC symptoms and their parameters. The core items have good interrater reliability and acceptable internal consistency. Although there is evidence of adequate convergent validity, the 10-item YBOCS has weak discriminant validity. The psychometric properties of the Symptom Checklist and investigational items remain to be investigated.

The Symptom Checklist requires the assessor to inquire about a wide range of obsessive and compulsive phenomena. This is important for a comprehensive assessment because patients may feel embarrassed or otherwise reluctant to discuss their obsessions and compulsions, and they may not mention these symptoms unless the interviewer directly asks about them. A shortcoming of the Symptom Checklist is that it provides a limited assessment of cognitive compulsions (e.g., repeating special words or phrases to undo disturbing thoughts). The Checklist was recently expanded by Foa and Kozak to assess these phenomena (personal communication, April, 1994).

The YBOCS provides separate scores to measure the severity of obsessions and compulsions. However, most outcome studies simply combine these into a total score. Kim et al. (1989) observed that if a patient has only obsessions or compulsions, the YBOCS total score may be spuriously low even if symptoms are severe. The use of subscales would provide more information about the effects of treatment (e.g., some treatments may have a greater effect on compulsions than obsessions) and would help circumvent the problem raised by Kim et al.

YBOCS interviews (including the Symptom Checklist, core items, and investigational items) are time consuming, requiring an average of 40 min per patient from a trained interviewer (Rosenfeld et al., 1992). Recently, Rosenfeld et al. (1992) developed a self-administered computerized version which was well received by patients and yielded comparable ratings to those obtained from the interview version (97% agreement).* Self-report versions also have been developed (Leckman, Walker, Goodman, Pauls, & Cohen, 1994; Warren, Zgourides, & Monto, 1993), although their psychometric properties remain to be determined.

CONTENT, RELIABILITY, AND VALIDITY: SUMMARY AND CONCLUSIONS

Behavioral Assessment

The psychometric properties of the assessment methods reviewed in this article are summarized in Table 1. As the table shows, little is known about the psychometric properties of behavioral assessment methods. Behavioral Avoidance Tests (BATs) have the advantage of providing in vivo measures of OC-related fear and avoidance. Unfortunately, these measures are sometimes difficult to construct, and often focus on external fear stimuli to the neglect of internal sources of fear (e.g., fear of having a bad thought). BATs also fail to assess covert avoidance (e.g., imagining a glove on one's hand while touching a contaminant). Although these limitations could be addressed by including self-report measures of such forms of fear and avoidance, BATs are used increasingly less often in treatment outcome studies (Emmelkamp, 1982; Foa et al., 1992).

Diary measures of naturally occurring target behaviors are popular in treatment-outcome studies of panic disorder (e.g., Clark et al., 1994), and have been used in studies of other disorders, including social phobia (Glass & Arnkoff, 1994) and chronic pain (Philips, 1988). Surprisingly, these methods are used infrequently in OCD outcome studies. The assessment of OCD would be advanced by the development and validation of such measures.

Direct observation methods have been used occasionally in case studies of inpatients (e.g., Mills et al., 1973). Although these methods are more difficult to apply to outpatients, it may be possible to have significant others make ratings of particular patient behaviors (e.g., the frequency or duration of handwashing). Such ratings, if found reliable and valid, would provide valuable information about the occurrence of OC symptoms in the patient's habitual environment. There have yet to be published studies of the feasibility of this approach.

*The computer-administered and interview versions were administered in counterbalanced order. Interversion agreement may have been inflated by criterion contamination. However, this would occur only if the interviewer asked the patient to make the ratings for each YBOCS item (which violates YBOCS protocol) and if the patient simply recalled the interview ratings when completing the computerized version (and vice versa when the versions were completed in reverse order). Rosenfeld et al. (1992) did not report whether protocol violations occurred, nor did they report assessing for such violations. Accordingly, it is possible that interversion agreement may have been inflated.

Self-Report Inventories

Self-report inventories are popular because of their ease of administration. They differ markedly in their breadth of measurement; some provide measures of different OC phenomena (e.g., the MOCI subscales) whereas others are simply global measures of symptom severity (e.g., the SCL-90-R OC scale). As summarized in Table 1, the inventories also differ in their psychometric properties. The SCL-90-R OC scale (and predecessors) has adequate reliability and convergent validity, but uncertain criterion-related validity and poor discriminant validity. The item content of the SCL-90-R OC scale and predecessors suggests they are essentially measures of nonspecific distress. This is consistent with their high correlations with measures of general psychopathology.

The LOI subscales have adequate reliability and validity. The MOCI total scale also has adequate psychometric properties. The MOCI subscales have adequate internal consistency, apart from the slowness subscale. The MOCI washing and checking subscales have adequate validities, whereas the validities of the other subscales remain to be evaluated. The self-report CAC has adequate psychometric properties, apart from questionable discriminant validity. The self-report Likert scales have adequate convergent and discriminant validity, although their other psychometric properties remain to be determined.

Some self-report inventories confound the assessment of important variables. Distress caused by symptoms and symptom frequency are confounded in the SCL-90-R OC scale (and predecessors). The LOI subscales are highly intercorrelated, which raises the question of whether there is any advantage to having separate symptom, interference, and resistance subscales. The high correlations arise, in part, from the fact that the assessment of interference and resistance is confounded with symptom prevalence.

Freund et al. (1987) claimed two advantages of the CAC over the MOCI: (a) the former uses a 4-point rather than a dichotomous scale, and so the CAC may be more sensitive to gradations in symptom severity; and (b) the CAC focuses on highly specific behaviors, with each point on the rating scale labeled with a detailed written description. The first point is unlikely to be correct because Dominguez, Jacobson, de la Gandara, Goldstein, and Steinbrook (1989) found that the original version of the MOCI correlated .96 with a revised MOCI that used a 4-point Likert rating. The advantage claimed in Freund et al.'s second point is also questionable, because the MOCI assesses specific OC symptoms. In comparison, the CAC does not directly assess OC symptoms; it merely assesses interference in everyday activities that may be due to obsessions, compulsions, or both. The CAC provides no indication as to the nature of the interference because its ratings confound slowness, avoidance, and oddity of behavior. This means that high scores on the CAC are ambiguous; they could arise from obsessional slowness, compulsive repeating, avoidance, and/or obsessional doubting and indecision.

In summary, in terms of breadth of measurement, reliability, and validity, there is much to recommend the MOCI over the other self-report measures. The MOCI total scale has comparable reliability and validity to other inventories. Compared to global measures of OC symptoms (e.g., the LOI Symptom subscale), the MOCI subscales permit a more detailed assessment of OC symptoms. The MOCI has the further advantage of not confounding the assessment of important variables. However, the MOCI has three main shortcomings: (a) it provides a limited assessment of obsessions, (b) the slowness subscale has weak psychometric properties, and (c) it provides no measure of symptom interference or resistance.

TABLE 1. Properties of Measures: Summary

Columns: Reliability (Internal Consistency, Interrater, Test-Retest a); Validity (Criterion-Related, Convergent, Discriminant).

Rows: Behavioral Approach Tests; Direct Observation Methods; Diary Methods; SCL-90-R OC & predecessors; LOI Subscales (Symptom, Resistance, Interference); MOCI (Total scale, Washing subscale, Checking subscale, Doubting subscale, Slowness subscale); CAC (Self-Report, Observer-Rated); CPRS-OC; Likert Scales; GOCS; YBOCS (10-item).

[Cell entries not recoverable from the source.]

a For at least 7 days.

Note. + = good or adequate; - = inadequate; ? = insufficient information; na = not applicable; SCL-90-R OC = OC subscale of Symptom Checklist-90, Revised; LOI = Leyton Obsessional Inventory; MOCI = Maudsley Obsessional Compulsive Inventory; CAC = Compulsive Activity Checklist; CPRS-OC = Comprehensive Psychopathological Rating Scale, OC subscale; GOCS = Global Obsessive Compulsive Scale; YBOCS = Yale-Brown Obsessive Compulsive Scale.

Observer-Rated Scales

The CPRS-OC, GOCS, and observer-rated CAC provide only global measures of symptom severity. The CPRS-OC and GOCS have been used in numerous treatment outcome studies (Table 3), despite the lack of data supporting their reliability and validity (Table 1). Although there are more data on the reliability and validity of the observer-rated Likert scales, each of these measures suffers the important limitation of being based on unstructured clinical interviews. As a consequence, the psychometric properties of these scales may vary with the skill and (unspecified) training experiences of the interviewer. This is less of a problem when the interviewer follows a structured interview protocol such as that used in the YBOCS.

The YBOCS yields a wealth of information on OC symptoms and their parameters. Each item is accompanied by detailed probe questions, which structure the interview and ensure that appropriate information is collected. The YBOCS has acceptable reliability and convergent validity, although discriminant validity is weak. It is unlikely that this is a weakness specific to the YBOCS, because the GOCS is highly correlated with the YBOCS, and so it may have similar problems with discriminant validity. Moreover, the item content of the CPRS-OC suggests it is a measure of nonspecific distress, and so it is likely to have even worse discriminant validity than the YBOCS. Apart from the time required to administer the YBOCS (approximately 40 min), it is generally superior to the other observer-rated scales covered in this review. The YBOCS (including the Symptom Checklist and investigational items) has advantages over self-report measures, including greater coverage (i.e., it assesses a range of OC symptoms and parameters), greater flexibility, and the opportunity for the interviewer to determine whether the patient is reporting OC symptoms or other phenomena, such as tics or paraphilic symptoms. Considering reliability, validity, and breadth of measurement, the YBOCS appears to be the best available observer-rated scale.

SENSITIVITY TO TREATMENT EFFECTS

Treatment outcome measures need to be more than reliable and valid; they also must be sensitive to changes in symptom severity. Behavior therapy (in vivo exposure plus response prevention) and clomipramine are established treatments for OCD, with their efficacy demonstrated on numerous outcome measures (Abel, 1993; Cox et al., 1993; van Balkom et al., 1994). Accordingly, studies of these therapies were used in meta-analyses of the sensitivity of OC measures.

Method

A meta-analysis was conducted separately for clomipramine and behavior therapy, using the procedures described by Wolf (1986). Studies were included if they (a) included samples of more than five subjects, (b) used one or more of the measures covered in this review, and (c) reported sufficient information to compute effect sizes. Suitable studies were located by searching the Psychological Abstracts and Medline databases, and by consulting recent treatment-outcome reviews (e.g., Abel, 1993; Cox et al., 1993; van Balkom et al., 1994). When necessary and feasible, authors of published reports were contacted in an effort to obtain information necessary to compute effect sizes. Studies using subsamples of larger studies were excluded unless they reported outcome measures that were not reported in the larger studies. Also excluded were studies that used combined pharmacological and behavioral treatment within a single therapy trial. Thirty-five suitable studies were identified, which provided 19 trials of clomipramine and 20 trials of behavior therapy (some studies reported more than one trial).

The effect size for each measure was computed according to the following formula: Effect size (ES) = $(M_{pre} - M_{post}) / SD_{pooled}$, where $M_{pre}$ and $M_{post}$ are the pre- and posttreatment means for a given treatment trial, and $SD_{pooled}$ is the mean of the pre- and posttreatment standard deviations. Hedges' (1982) correction was used to calculate mean effect sizes. This adjusts for differences in sample size by weighting each effect size according to the number of subjects it was based upon.
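The effect size formula above can be sketched in a few lines. The function names and the input values below are hypothetical. The weighted mean uses plain sample-size weights as a simplified stand-in for the weighting described in the text; it is an assumption for illustration, not Hedges' (1982) exact procedure, and it should not be expected to reproduce the adjusted means reported in the tables.

```python
def effect_size(m_pre, m_post, sd_pre, sd_post):
    """ES = (M_pre - M_post) / SD_pooled, with SD_pooled the mean of the two SDs."""
    sd_pooled = (sd_pre + sd_post) / 2.0
    if sd_pooled <= 0:
        raise ValueError("pooled standard deviation must be positive")
    return (m_pre - m_post) / sd_pooled


def weighted_mean_effect_size(effect_sizes, sample_sizes):
    """Sample-size-weighted mean effect size (simplified; an assumption here)."""
    if len(effect_sizes) != len(sample_sizes) or not effect_sizes:
        raise ValueError("need one sample size per effect size")
    total_n = sum(sample_sizes)
    return sum(es * n for es, n in zip(effect_sizes, sample_sizes)) / total_n


# Hypothetical trial: pretreatment mean 24, posttreatment mean 12, SDs 5 and 7.
print(effect_size(24.0, 12.0, 5.0, 7.0))  # 2.0

# Two hypothetical trials with 10 and 30 subjects.
print(weighted_mean_effect_size([1.0, 2.0], [10, 30]))  # 1.75
```

Weighting by the number of subjects means that the larger trial dominates the mean, so a single small study with an extreme effect size has limited influence on the summary value.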

Behavioral Assessment

Nine published treatment studies of behavior therapy or clomipramine used a BAT to assess treatment outcome. Only four studies provided enough information to compute effect sizes for behavior therapy, and only one for clomipramine. The results for behavior therapy are presented in Table 2. Here it can be seen that effect sizes for SUDS and avoidance varied markedly across studies, and effect sizes had no obvious relationship to number or duration of treatment sessions, sample size, or type of BAT. Overall, the findings suggest that BATs are sensitive to treatment effects. However, further research is required to determine which type of BAT is most sensitive to treatment effects.

Direct observation and diary methods also require further investigation. These measures appear sensitive to treatment effects in case studies and small open trials (e.g., Foa et al., 1980; Mills et al., 1973; Turner et al., 1979, 1980, 1985). However, the studies using these methods either did not meet criteria for inclusion in the meta-analysis, or they did not provide sufficient information to compute relevant effect sizes.

Self-Report Inventories and Observer-Rated Scales

Table 3 shows the mean effect sizes for self-report inventories and observer-rated scales, along with the number of trials on which the calculations were based. Before comparing the treatment sensitivity of the scales, it is necessary to determine whether the effect sizes of each measure were based on different amounts of treatment. For the behavior therapy trials, the amount of therapy per outcome trial was computed by multiplying the number of treatment sessions by the duration of each session. The inventories and scales were defined as independent variables and were compared, by means of a one-way ANOVA, in terms of the amount of therapy associated with them. The inventories and scales did not differ with regard to this variable, F(13,40) < 1.

For the clomipramine trials the amount of therapy was defined as the mean dose per patient multiplied by the number of weeks of treatment. This was used as a dependent variable in a one-way ANOVA, where the inventories and scales were independent

TABLE 2. Sensitivity to Treatment Effects: Behavioral Avoidance Tests

                             Number of                                          Effect Size
                             Treatment            Sample                     -----------------
Study                        Sessions a           Size    Type of BAT        SUDS    Avoidance

Cottraux et al. (1990)       up to 25             15      Multitask          1.87    1.64
                             (set as homework)
Foa et al. (1984)            17                   11      Single task        5.36    b
Rachman et al. (1979)        15                   10      Multitask          3.69    2.80
Woody et al. (in press-a)    20                   51      Multitask          1.03    1.09
Hedges adjusted mean                                                         1.50    1.34
SD                                                                           1.93    0.87

Note. SUDS = Subjective units of distress.
a All treatments were behavior therapy (exposure and response prevention).
b Not reported.


TABLE 3. Sensitivity to Treatment Effects: Self-Report Inventories and Observer-Rated Scales

[Table contents not recoverable from the source. Column headings: Instrument; Clomipramine (Number of Trials; Effect Size: Hedges Adjusted Mean, SD); Behavior Therapy (Exposure and Response Prevention) (Number of Trials; Effect Size: Hedges Adjusted Mean, SD); Grand Mean, All Trials.]

variables. Again, the inventories and scales did not differ on this variable, F(8,27) = 1.17, p > .1. Thus, the effect sizes obtained for the inventories and scales were not confounded by differences in the amount of treatment associated with each measure. This means the effect sizes could be directly compared to determine the relative sensitivity of the measures.
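The check on amount of treatment can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names, the measure labels, and every number in the example are hypothetical, and the F ratio is computed by hand from the usual between-group and within-group sums of squares rather than with any particular statistics package.

```python
def amount_of_behavior_therapy(sessions, hours_per_session):
    """Amount of therapy for one trial: number of sessions x session duration."""
    return sessions * hours_per_session


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variance; F is undefined")
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical trials: (sessions, hours per session) grouped by outcome measure.
trials_by_measure = {
    "measure_a": [(10, 1.0), (10, 2.0), (15, 2.0)],
    "measure_b": [(20, 1.0), (15, 2.0), (20, 2.0)],
    "measure_c": [(15, 2.0), (20, 2.0), (25, 2.0)],
}
groups = [
    [amount_of_behavior_therapy(s, h) for s, h in trials]
    for trials in trials_by_measure.values()
]
print(one_way_anova(groups))  # (3.0, 2, 6)
```

For the clomipramine trials the same function would apply, with the amount of therapy defined instead as mean dose per patient multiplied by weeks of treatment. A nonsignificant F supports comparing effect sizes across measures directly.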

In terms of Cohen's (1988) classification scheme, large effect sizes are > .80, and medium effects are .50 to .79. Table 3 shows the inventories and scales generally yielded medium-to-large effects, suggesting they all were sensitive to treatment effects. The OC scales from the SCL-90-R and HSCL produced the smallest effects. The other self-report inventories (LOI subscales, self-report CAC, and MOCI total scale) produced similar effect sizes to one another. MOCI subscales have been used as outcome measures in only one study (Mavissakalian, Jones, Olson, & Perel, 1990) and so it is difficult to gauge their sensitivities. Mavissakalian et al. found that all subscales were sensitive to the effects of clomipramine. The largest effect was for the checking subscale (effect size = 1.15), followed by the washing (1.00), doubting/conscientiousness (0.77), and slowness subscales (0.47). Although the effect sizes suggest the subscales are sensitive to treatment effects, they should be interpreted with caution because Mavissakalian et al. did not describe the types of obsessions and compulsions in their sample. If their sample was mostly patients with compulsive checking, then we would expect the checking subscale to have the largest effect size.
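The classification described above can be applied to the MOCI subscale effect sizes reported by Mavissakalian et al. The sketch below uses the cutoffs given in the text; treating a value of exactly .80 as large is an assumption made here, and the function name and the label "below medium" are hypothetical.

```python
def classify_effect(es):
    """Label an effect size using the cutoffs given in the text."""
    magnitude = abs(es)
    if magnitude >= 0.80:  # assumption: exactly .80 counts as large
        return "large"
    if magnitude >= 0.50:
        return "medium"
    return "below medium"


# MOCI subscale effect sizes for clomipramine, as reported in the text.
moci_subscales = {
    "checking": 1.15,
    "washing": 1.00,
    "doubting/conscientiousness": 0.77,
    "slowness": 0.47,
}
for name, es in moci_subscales.items():
    print(f"{name}: {es:.2f} ({classify_effect(es)})")
```

On these cutoffs the checking and washing subscales show large effects, doubting/conscientiousness a medium effect, and slowness falls just short of the medium range.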

As Table 3 suggests, observer-rated scales produced larger effect sizes than self-report scales for the clomipramine trials, t(36) = 7.14, p < .001, and there was a trend in this direction for behavior therapy trials, t(56) = 1.99, p < .052. Table 3 suggests the findings for the Likert scales were an exception to these results, since the effect sizes of self-report and observer-rated versions do not appear to differ. These impressions were supported by statistical analyses of the behavior therapy trials, which is where the Likert scales were used (Table 3). The scales were classified into four groups: (a) self-report inventories (SCL-90-R OC, LOI, MOCI, self-report CAC), (b) self-report Likert scales, (c) observer-rated Likert scales, and (d) other observer-rated scales (observer-rated CAC and YBOCS). The groups were used as independent variables and effect size was the dependent variable. The one-way ANOVA was significant, F(3,54) = 8.88, p < .001, and Newman-Keuls post hoc comparisons revealed that the effect size of the first group (self-report inventories) was significantly smaller than those of the other groups (p < .05), and that the other groups did not differ from one another (ps > .05).

Why do observer-rated scales generally yield larger effects? Lambert, Hatch, Kingston, and Edwards (1986) found similar results for measures of depression and suggested that trained observers might be better than patients at detecting changes in symptom severity. This advantage does not appear to be present for Likert scales. It is not clear why this occurred. Studies using Likert scales have provided little information on how the scales were administered. Apart from providing descriptors for the anchor points on the scales, the studies have provided no information on the time frame used to assess symptoms or other pertinent details. It may be that self-report Likert scales are more sensitive than self-report inventories because of their greater specificity; that is, the subject may be instructed to rate specific symptoms over a specific time period.

DISCUSSION

Current Status of the Assessment of Obsessions and Compulsions

The selection of measures for treatment outcome studies is based on multiple criteria. Among the most important are (a) content (range of phenomena assessed); (b) reliability and validity, and whether there is sufficient available information to evaluate these properties; and (c) sensitivity to changes in symptom severity. Some measures are popular in OCD treatment-outcome studies, yet have unknown psychometric properties (i.e., the CPRS-OC and GOCS; see Table 1). Some measures provide only global measures of OC symptoms (LOI symptom subscale and CAC) and others appear to be largely measures of nonspecific distress (SCL-90-R OC scale, its predecessors, and CPRS-OC). Some measures confound important variables (e.g., symptom prevalence and distress are confounded in the SCL-90-R OC scale; symptom prevalence and degree of resistance are confounded in the LOI resistance subscale; obsessional slowness, avoidance, and oddity of behavior are confounded in the CAC). When breadth of measurement, reliability, validity, and sensitivity to treatment effects are considered together, the YBOCS appears to be the best available measure for treatment outcome research.

Future Directions

Refining existing measures. Further research is needed to firmly establish the reliability and validity of many of the measures currently used in treatment outcome research. For example, studies of test-retest reliability have been confined to relatively short periods (days to weeks). For most measures, temporal stability (in the absence of treatment) over longer periods of time remains unknown. This is an important omission because OCD is a chronic d
