

Journal of Autism and Developmental Disorders. Vol. 29. No. 1. 1999

The TOM Test: A New Instrument For Assessing Theory of Mind in Normal Children and Children with Pervasive Developmental Disorders

Peter Muris,1,4 Pim Steerneman,2 Cor Meesters,3 Harald Merckelbach,1 Robert Horselenberg,1 Tanja van den Hogen,3 and Lieke van Dongen3

This article describes a first attempt to investigate the reliability and validity of the TOM test, a new instrument for assessing theory of mind ability in normal children and children with pervasive developmental disorders (PDDs). In Study 1, TOM test scores of normal children (n = 70) correlated positively with their performance on other theory of mind tasks. Furthermore, young children only succeeded on TOM items that tap the basic domains of theory of mind (e.g., emotion recognition), whereas older children also passed items that measure the more mature areas of theory of mind (e.g., understanding of humor, understanding of second-order beliefs). Taken together, the findings of Study 1 suggest that the TOM test is a valid measure. Study 2 showed for a separate sample of normal children (n = 12) that the TOM test possesses sufficient test-retest stability. Study 3 demonstrated for a sample of children with PDDs (n = 10) that the interrater reliability of the TOM test is good. Study 4 found that children with PDDs (n = 20) had significantly lower TOM test scores than children with other psychiatric disorders (e.g., children with Attention-deficit Hyperactivity Disorder; n = 32), a finding that underlines the discriminant validity of the TOM test. Furthermore, Study 4 showed that intelligence as indexed by the Wechsler Intelligence Scale for Children was positively associated with TOM test scores. Finally, in all studies, the TOM test was found to be reliable in terms of internal consistency. Altogether, results indicate that the TOM test is a reliable and valid instrument that can be employed to measure various aspects of theory of mind.

KEY WORDS: Theory of mind; pervasive developmental disorders; reliability.

INTRODUCTION

Recently, children's understanding of their own and others' mental states has been the focus of considerable interest. Research in this area is described under the general heading "theory of mind." Premack and Woodruff (1978) were the first to use the term to refer to the child's ability to ascribe thoughts, feelings, ideas, and intentions to others and to employ this ability to anticipate the behavior of others. According to Wellman (1990), theory of mind is a prerequisite for the understanding of the social environment and for engaging in socially competent behavior (see also Astington & Jenkins, 1995).

It has been proposed that autistic children are socially impaired precisely because they lack a theory of mind (Frith, 1989). In a series of studies, Baron-Cohen, Leslie, and Frith (1985, 1986) demonstrated that the ability of autistic children to attribute mental states to others is seriously impaired. These researchers found that about 80% of the autistic children were unable to correctly predict the ideas of others, whereas most mentally retarded and normal controls of lower mental age were able to do so.

1 Department of Psychology, University of Limburg, P.O. Box 616, 6200 MD Maastricht, The Netherlands.

2 South-Limburg Centre of Autism, c/o RIAGG-OZL, P.O. Box 165, 6400 AD Heerlen, The Netherlands.

3 Department of Experimental Abnormal Psychology, University of Limburg, P.O. Box 616, 6200 MD Maastricht, The Netherlands.

4 Address all correspondence to Peter Muris, Department of Psychology, University of Limburg, P.O. Box 616, 6200 MD Maastricht, The Netherlands.

0162-3257/99/0200-0067$16.00/0 © 1999 Plenum Publishing Corporation

Specific programs have been developed to train theory of mind skills in autistic children. For example, in a study by Ozonoff and Miller (1995), five autistic children received a training program in which they were not only taught specific interactional and conversational skills but also received explicit and systematic instruction regarding the underlying social-cognitive principles necessary to infer the mental states of others (i.e., theory of mind). Pre- and posttreatment assessment demonstrated that the trained children improved on a number of false belief tasks compared to control children who had received no treatment. Similar positive results were obtained by Swettenham (1996), Hadwin, Baron-Cohen, Howlin, and Hill (1996), Bowler, Strom, and Urquhart (1993), and Whiten, Irving, and Macintyre (1993). All these studies were successful in that autistic children who had received training were able to pass theory of mind tasks. Furthermore, in a recent study by Steerneman, Jackson, Pelzer, and Muris (1996), socially immature (but not autistic) children were given a social skills intervention program that incorporated theory of mind principles. Results showed that this type of training produced positive effects on theory of mind tests. Yet, it should be added that the treatment effects found in these studies do not always generalize to nonexperimental settings or to tasks in domains where children received no teaching (see, for a discussion of this issue, Slaughter & Gopnik, 1996).

Given the availability of reasonably successful treatment programs, theory of mind assessment instruments are important for two reasons. First, such instruments can be used to identify those children who display deficits in theory of mind. Second, such instruments can be employed to evaluate the efficacy of theory of mind training programs.

The assessment of theory of mind in children has been predominantly confined to so-called "false belief" tasks. Such tasks intend to test children's comprehension of another person's wrong belief. An example is the so-called Smarties test (e.g., Hogrefe, Wimmer, & Perner, 1986). During this test, children are presented with a Smarties box and asked what it contains. Children are highly familiar with these boxes and know that they usually contain Smarties, a desirable chocolate candy. When children give an answer in this sense, they are shown that the box actually contains a pencil. Next, children are told that another child will be asked what is in the box. They are then asked the crucial question: "What do you think the other child will say?" From their answer to this question, one can infer whether children are able to make a judgment about another person's false expectation. That is, an understanding of another individual's false belief (and presence of theory of mind) is demonstrated if children predict that another person will think that there are Smarties in the box. Conceptual difficulty with false belief attribution (and absence of theory of mind) is revealed if children assume that another person will think that there is a pencil in the box.

Several authors have argued that theory of mind is more than just the comprehension of false belief. For example, Perner and Wimmer (1985) have described two other types of belief that play a crucial role in children's understanding of social interactions: first-order beliefs that refer to what children think about real events (e.g., "Michael thinks that Sophie is angry") and second-order beliefs that pertain to what children think about other people's thoughts (e.g., "Michael thinks that Sophie thinks that he's angry with her").

Flavell, Miller, and Miller (1993) argue that children develop a theory of mind along five successive stages. During the first stage, children adopt the concept of mind, that is, they attribute needs, emotions, and other mental states to people and use cognitive terms such as "know," "remember," and "think." During the second stage, children acknowledge that the mind has connections to the physical world. More specifically, they understand that certain stimuli lead to certain mental states, that these mental states lead to behavior, and that mental states can be inferred from stimulus-behavior links. During the third stage, children recognize that the mind is separate from and differs from the physical world. For example, they realize that a person can think about an object even though the object is not physically present. During the fourth stage, children learn that the mind can represent objects and events accurately or inaccurately. Thus, a representation can be false with respect to a real object or event (e.g., in a false belief task), behavior can be false with respect to a mental state (e.g., when a sad person smiles), and two people's perceptual views or beliefs can differ (i.e., perspective taking). During the fifth and final stage, children learn to understand that the mind actively mediates the interpretation of reality. For instance, children recognize that prior experiences affect current mental states which in turn affect emotions and social inferences. According to Flavell et al. (1993), Stages 1-3 can best be regarded as theory of mind precursors. These authors assume that these stages "probably emerge in quick succession, for they are very closely related concepts having to do with the differentiation of, and relations between, the mind and the external world" (p. 101). The step from Stage 3 to 4, the emergence of a "real" theory of mind, probably comes more slowly (around the age of 6); Stage 5, the "more mature" theory of mind, would emerge still later.

Taken together, theory of mind refers to the child's capacity to analyze the behavior of others by recognizing the mental states (i.e., desires and beliefs) that underlie intentional behavior. Thus, theory of mind is a complex, developmental phenomenon, which certainly implies more than just the understanding of false belief. Obviously, there is a need for assessment tools that measure the developmental progression of theory of mind in a broader age range. One promising candidate in this respect is the Theory-of-Mind test (TOM test) designed by Steerneman (1994). The TOM test contains a variety of items that can be allocated to three subscales which correspond with the three main theory of mind stages as proposed by Flavell et al. (1993): (a) precursors of theory of mind (e.g., emotion recognition), (b) first manifestations of a real theory of mind (e.g., understanding of false belief), and (c) mature aspects of theory of mind (e.g., second-order beliefs). As a practical tool, the test provides information about the extent to which a child possesses social understanding, insight and sensibility, and the extent to which he or she takes the feelings and thoughts of others into account. The present article is concerned with the reliability and validity of the TOM test.

STUDY 1

The purpose of Study 1 was twofold. First, the construct validity of the TOM test was investigated. The TOM test intends to be a developmental scale. Therefore, it was anticipated that TOM test scores correlate positively with age. That is, as children grow older, their theory of mind develops, and hence they pass more TOM test items. Furthermore, one expects that younger children predominantly succeed on TOM items that tap the basic domains of theory of mind (e.g., emotion recognition), whereas older children should increasingly pass items that measure the more mature aspects of theory of mind (e.g., understanding of false belief, understanding of humor, second-order belief). A second purpose of Study 1 was to evaluate the concurrent validity of the TOM test. More specifically, its relationship with other, more traditional, indices of theory of mind and social development was examined.

Materials and Method

Subjects and Procedure

Seventy children (46 boys and 24 girls) recruited from a regular primary school ('De Driesprong' in Geleen, the Netherlands) participated in the study. The children ranged in age from 5 to 12 years. Ten children of each age level (i.e., 5, 6, 7, 8, 9, 10, and 11/12 years) were selected. All children were healthy, socially well-functioning, and none had learning difficulties. Thus, it can be assumed that they had normal intelligence. Children were tested at school in a private room with only the experimenter present. The assessment took place in two sessions. In one session, children underwent the TOM test. In another session, a series of alternative theory of mind or social development tasks was administered. The order of the sessions was counterbalanced within each age level group (i.e., half of the children started with the TOM test, while the other half first received the alternative battery of tests).

The New Theory of Mind Test

The TOM test comprises an interview that can be used in children between 5 and 12 years of age. The TOM test consists of vignettes, stories, and drawings about which the child has to answer a number of questions. The test lasts about 35 minutes and contains 78 items (i.e., questions). The TOM test contains three subscales: (a) precursors of theory of mind (i.e., TOM 1; 29 items; e.g., recognition of emotions, pretense), (b) first manifestations of a real theory of mind (i.e., TOM 2; 33 items; e.g., first-order belief, understanding of false belief), and (c) more advanced aspects of theory of mind (i.e., TOM 3; 16 items; e.g., second-order belief, understanding of humor). In the Appendix, examples of items of the three subscales are shown. Each TOM test item is scored as either failed (0) or passed (1). Accordingly, total TOM scores range between 0 and 78, with higher scores indicating a more mature theory of mind. TOM 1, TOM 2, and TOM 3 subscale scores vary between 0 and 29, 0 and 33, and 0 and 16, respectively.
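The scoring rule described above can be illustrated with a short sketch. It is not part of the TOM test materials: the function name `score_tom`, the dictionary layout, and the example child are assumptions made for illustration. Only the subscale sizes (29, 33, and 16 items) and the 0/1 item scoring are taken from the text.

```python
# Subscale sizes as stated in the text: 29 + 33 + 16 = 78 items.
SUBSCALE_SIZES = {"TOM 1": 29, "TOM 2": 33, "TOM 3": 16}


def score_tom(item_scores):
    """Sum 0/1 item scores into subscale scores and a total score.

    item_scores maps each subscale name to a list of item scores
    (0 = failed, 1 = passed). This layout is a hypothetical choice.
    """
    scores = {}
    for name, size in SUBSCALE_SIZES.items():
        items = item_scores[name]
        if len(items) != size:
            raise ValueError(f"{name} expects {size} items, got {len(items)}")
        if any(x not in (0, 1) for x in items):
            raise ValueError(f"{name} items must be scored 0 or 1")
        scores[name] = sum(items)
    scores["total"] = sum(scores[name] for name in SUBSCALE_SIZES)
    return scores


# Hypothetical child: passes 25 TOM 1 items, 20 TOM 2 items, and 4 TOM 3 items.
example = {
    "TOM 1": [1] * 25 + [0] * 4,
    "TOM 2": [1] * 20 + [0] * 13,
    "TOM 3": [1] * 4 + [0] * 12,
}
result = score_tom(example)
print(result)
```

Keeping the three subscales separate in the input makes it straightforward to report both the subscale scores and the total score used in the analyses.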

Alternative, More Traditional, Indices of Theory of Mind and Social Development

A number of alternative indices of theory of mind and social development were employed in the current study.


The Sally and Anne test (see Baron-Cohen et al., 1985) is a false belief task. It consists of a comic-strip story in which Sally and Anne are first introduced: Sally with a basket in front of her and Anne with a box. Next, Sally is shown placing a ball in the basket and leaving the room. Anne is then shown taking the ball from the basket and placing it in the box. Following this, Sally returns and children are asked: "Where will Sally look for her ball?" If the children point to the previous location of the ball, they pass the task because they acknowledge Sally's false belief (score = 1). If, however, they point to the ball's current location, they fail the task by not taking into account Sally's false belief (score = 0).

The Smarties test (Hogrefe et al., 1986) was used as an alternative false belief task (see Introduction). Scores on this test also vary between 0 (failed) and 1 (passed).

Two tests of emotion recognition (Spence, 1980), the "Test of perception of emotion from facial expression" and the "Test of perception of emotion from posture cues," were administered. Children were asked to identify four basic emotions (happiness, fear, anger, and sadness) on pictures showing facial expressions or bodily postures. Scores on each test range between 0 and 4.

The Social Interpretation Test (SIT; Vijftigschild, Berger, & Spaendonck, 1969) examines the child's ability to interpret social situations adequately. The test consists of a colored picture depicting a street in which a number of events take place. The child has to answer 9 questions about the picture (e.g., 'What has happened here?', 'Why is the ambulance driving in the street?'). The answers are registered and classified into 24 categories. For each category, 1 point is given. SIT test scores range between 0 and 24, with higher scores reflecting greater ability to interpret social situations.

The Picture Arrangement subtest of the Wechsler Intelligence Scale for Children-Revised (WISC-R; Wechsler, 1974) was used as a measure of social sensibility. This subtest asks children to order 12 series of 4 pictures in such a way that each series of pictures depicts a sensible story (range 0-12).

The Role Taking test (Selman & Byrne, 1974) taps role taking skills of children. The test comprises a story of a social dilemma (a young girl has to save a little cat from a high tree, although she has just promised her father not to climb in trees anymore). Children are questioned about this story. From their answers to these questions, one can derive the level of role taking: egocentric role taking (i.e., the child is not able to differentiate between his/her own point of view and that of others, level 0); subjective role taking (i.e., the child recognizes his own point of view and that of others, level 1); self-reflective role taking (i.e., the child is able to adopt another person's perspective, level 2); and reciprocal role taking (i.e., the child weighs his perspective against that of others and finds a solution for the social dilemma, level 3).

The John and Mary test (Perner & Wimmer, 1985) assesses children's understanding of second-order beliefs. The test is an acted story in which two characters (John and Mary) are independently informed about an object's (an ice cream van) unexpected transfer to a new location. Hence both John and Mary know where the van is, but there is a mistake in John's second-order belief about Mary's belief: "John thinks that Mary thinks that the van is still at the old place." Children's understanding of this second-order belief was tested by asking: 'Where does John think Mary will go for the ice cream?' Scores on this test are either 0 (failed) or 1 (passed).

Results and Discussion

General Results

Reliability of the TOM Test

The internal consistency of the TOM test was satisfactory, that is, Cronbach's alphas were .92 for the total TOM-scale, .84 for TOM 1, .86 for TOM 2, and .85 for TOM 3.
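For readers who wish to compute this type of coefficient, the following sketch applies the standard formula for Cronbach's alpha, alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores), to a children-by-items matrix of 0/1 scores. The function name and the small data set are hypothetical; the example does not reproduce the alphas reported here.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of rows, one list of item scores per child.

    Assumes every row has the same number of items (k >= 2) and that the
    total scores are not all identical (otherwise the ratio is undefined).
    """
    k = len(rows[0])
    # Sample variance of each item across children.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the children's total scores.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 children, 4 items ordered from easy to hard.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(data)
print(round(alpha, 2))  # 0.8
```

Because the same variance estimator is used for the items and for the totals, the choice between sample and population variance does not change the resulting coefficient.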

Age and Theory of Mind

Table I (right column) presents Pearson product-moment and point-biserial correlations between age, on the one hand, and theory of mind measures, on the other hand. As can be seen from this table, except for the Smarties test, all measures were positively and significantly associated with age. The absence of a connection between age and Smarties test performance is due to the fact that nearly all children in the present study, even the 5- to 6-year-olds, passed this test.

As expected, there was a robust correlation between TOM test and age: r(70) = .80, p < .001. Inspection of mean TOM scores per age level (see Table I) showed that theory of mind capability increased linearly as children grew older. This indicates that the TOM test has one crucial property of a developmental scale, namely, it is sensitive to maturation. With respect to this result, two further remarks are in order. To begin with, it should be noted that the most pronounced increase in theory of mind took place between ages 6 and 7. This is in line with the findings of previous studies showing that children of that age display marked improvement in their performance on more complicated theory of mind tasks (e.g., Perner & Wimmer, 1985). Second, the TOM test also proved suitable to index differential development of theory of mind in older age groups (i.e., in 9-10- and 11-12-year-old children). Note that a number of the alternative tasks tap an aspect of theory of mind that most normal children master at a relatively early age. For example, from age 7 onwards about 90% of the children successfully pass the John and Mary test, whereas from age 8 onwards most children recognize the four basic emotions from facial expression (see Table I). This indicates that these tests are less sensitive to index differential development of theory of mind in older age groups.

Table I. Mean Scores of Children on Theory of Mind and Social Development Measures for Different Age Levels, and Pearson Product-Moment and Point-Biserial Correlations Between Age and Various Measures

                             Age (in years)
                             5-6           7-8           9-10          11-12
Measure                      M      SD     M      SD     M      SD     M      SD     r with age
TOM test                     42.5   7.4    59.3   6.9    63.9   5.2    68.1   4.8    .80a
Emotion recognition-face     3.1    0.9    3.4    0.7    3.9    0.3    3.9    0.3    .50a
Emotion recognition-posture  2.4    1.1    2.7    1.2    3.4    0.9    3.7    0.7    .46a
Sally and Anne test          0.4    0.5    0.7    0.5    0.9    0.2    0.8    0.4    .48a
Smarties test                0.9    0.3    0.9    0.2    1.0    0.0    1.0    0.0    .25
Social Interpretation test   7.2    3.0    8.8    2.6    13.5   2.8    14.7   2.4    .74a
WISC-R picture arrangement   3.2    3.0    8.3    2.0    9.7    1.6    9.4    1.2    .72a
Role taking test             0.5    0.6    1.6    0.8    2.0    0.6    2.3    0.7    .73a
John and Mary test           0.4    0.5    0.9    0.3    0.9    0.3    0.9    0.3    .44a

a p < .05/9 (i.e., Bonferroni correction).

Construct Validity of the TOM Test

As the TOM test intends to measure three successive developmental stages of children's theory of mind (i.e., precursors of theory of mind, first manifestations of a real theory of mind, mature theory of mind), one would expect that young children predominantly succeed on items that index the precursors of theory of mind, while at the same time they fail to pass items that measure the more mature aspects of theory of mind. For older ages, one would predict that an increasing number of children succeed on items that tap the more advanced areas of theory of mind. To examine this issue, for each age level (i.e., 5, 6, 7, 8, 9, 10, and 11/12 years) success percentages of the three TOM subscales were calculated (i.e., number of passed items on a subscale divided by the total number of items of that subscale). Figure 1 shows mean success percentages on the three TOM subscales for the various age levels. A 3 (Subscales) × 7 (Age Levels) multivariate analysis of variance performed on these data revealed a significant effect of age, F(6, 63) = 32.1, p < .001, indicating that TOM test performance improves with age. Furthermore, a significant effect of subscale, Fhot(2, 62) = 133.2, p < .001, emerged due to the fact that success percentages on TOM 1 (i.e., precursors of theory of mind) were higher than those on TOM 3 (i.e., mature theory of mind), whereas success percentages of TOM 2 (i.e., first manifestations of a real theory of mind) were in between. Finally, the interaction of subscale with age also reached significance, Fhot(12, 122) = 2.3, p < .05. As can be seen, 7-year-old children succeeded on the vast majority of TOM 1 and TOM 2 items (>80%), indicating that most of these children have passed the first two stages of theory of mind development. Note also that the mean success percentage on TOM 3 items in 5-year-old children was only 23.8%, whereas in 11- to 12-year-old children a success percentage of more than 80% is reached. Thus, as expected, children acquire advanced aspects of theory of mind at a relatively later age (i.e., after they have learned the more basic principles of theory of mind).
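The success percentages described above (passed items on a subscale divided by the number of items of that subscale) can be sketched as follows. The function names, the record layout, and the two example children are hypothetical; only the subscale sizes and the definition of the percentage come from the text.

```python
SUBSCALE_SIZES = {"TOM 1": 29, "TOM 2": 33, "TOM 3": 16}


def success_percentages(subscale_scores):
    """Convert one child's subscale scores into success percentages."""
    return {
        name: 100.0 * subscale_scores[name] / size
        for name, size in SUBSCALE_SIZES.items()
    }


def mean_success_by_age(children):
    """Average the success percentages within each age level.

    children is a list of (age_level, subscale_scores) pairs.
    """
    grouped = {}
    for age, scores in children:
        grouped.setdefault(age, []).append(success_percentages(scores))
    return {
        age: {
            name: sum(p[name] for p in pcts) / len(pcts)
            for name in SUBSCALE_SIZES
        }
        for age, pcts in grouped.items()
    }


# Two hypothetical 5-year-olds.
children = [
    (5, {"TOM 1": 29, "TOM 2": 33, "TOM 3": 4}),
    (5, {"TOM 1": 0, "TOM 2": 0, "TOM 3": 4}),
]
means = mean_success_by_age(children)
print(means[5])
```

Expressing each subscale as a percentage puts the three subscales on a common scale, which is what allows them to be compared directly even though they differ in length.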

Concurrent Validity of the TOM Test

The relationships between TOM test and alternative indices of theory of mind were studied by means of Pearson product-moment correlations. In cases where dichotomous variables were involved, point-biserial correlations were used. As can be seen in Table II, most theory of mind indices are significantly correlated with each other.

Fig. 1. Mean success percentages on the three TOM subscales calculated per age level.

Table II. Pearson Product-Moment and Point-Biserial Correlations Between TOM Test and Alternative Theory of Mind and Social Development Measures

Variable                         TOM   1     2     3     4     5     6     7     TOMa
1. Emotion recognition-face      .55b  -                                         .34
2. Emotion recognition-posture   .46b  .27   -                                   .30
3. Sally and Anne test           .50b  .42b  .30   -                             .17
4. Smarties test                 .37b  .45b  .30   .16   -                       .29
5. Social Interpretation Test    .61b  .38b  .48b  .29   .10   -                 .22
6. WISC-R picture arrangement    .77b  .45b  .44b  .49b  .27   .55b  -           .30
7. Role taking test              .75b  .55b  .40b  .40b  .27   .57b  .63b  -     .40
8. John and Mary test            .55b  .44b  .23   .45b  .20   .29   .54b  .54b  .18

a To control for age effects, Pearson and point-biserial correlations were computed for each age level and then averaged. Mean correlations thus obtained are shown in this column.

b p < .05/36 (i.e., Bonferroni correction).

At first sight, it seems appropriate to compute correlations between TOM test and alternative indices of theory of mind while controlling for age (i.e., partial correlations). However, by selecting 10 children of each age level, the design of Study 1 capitalized on the developmental progression of theory of mind. Thus, controlling for age would imply the elimination of an intrinsically important factor in both TOM and alternative tests (i.e., the developmental progression of theory of mind). To circumvent this problem, Pearson and point-biserial correlations between TOM test and concurrent measures were computed for each age level separately. The means of these separate correlations are presented in the right column of Table II. As can be seen, correlations attenuated considerably. Nevertheless, the TOM test was still positively associated with concurrent theory of mind indices. This result suggests that, as intended, the TOM test covers a broad range of theory of mind aspects.
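The procedure of computing correlations within each age level and then averaging them can be sketched as follows. The function names and the data are hypothetical. Skipping age levels in which one of the measures has no variance (where a correlation is undefined) is an assumption of this sketch; the text does not state how such cases were handled.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; with a 0/1 variable this equals the point-biserial r.

    Returns None when either variable has no variance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def mean_within_level_r(records):
    """Average the per-age-level correlations between two measures.

    records is a list of (age_level, tom_score, other_score) tuples.
    Levels with an undefined correlation are skipped (an assumption).
    """
    by_level = {}
    for level, tom, other in records:
        xs, ys = by_level.setdefault(level, ([], []))
        xs.append(tom)
        ys.append(other)
    rs = [pearson_r(xs, ys) for xs, ys in by_level.values()]
    rs = [r for r in rs if r is not None]
    return sum(rs) / len(rs) if rs else None


# Hypothetical scores for two age levels; both measures rise with age.
records = [
    (5, 10, 1), (5, 12, 3), (5, 14, 2),
    (11, 30, 5), (11, 32, 7), (11, 34, 6),
]
pooled = pearson_r([r[1] for r in records], [r[2] for r in records])
within = mean_within_level_r(records)
print(round(pooled, 2), round(within, 2))
```

In this constructed example the pooled correlation is much higher than the averaged within-level correlation, because part of the pooled association reflects the fact that both measures increase with age.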

STUDY 2

Study 2 intended to investigate another aspect of the reliability of the TOM test, namely, its test-retest stability. To examine this issue, 12 normal primary school children were tested twice with the TOM test, 8 weeks apart.

Method

Subjects and Procedure

Twelve children (8 boys and 4 girls) varying in age between 5 and 12 years from a regular primary school (De Pater van de Geld in Waalwijk, the Netherlands) participated in the study. All children were healthy, normal-functioning children. Children were interviewed with the TOM test twice, 8 weeks apart. Both interviews were conducted by the same experimenter in a separate room at school.

Results and Discussion

Internal Consistency

Internal consistency of the TOM test appeared to be sufficient: Cronbach's alphas were .95 for the total score, .62 for TOM 1, .94 for TOM 2, and .77 for TOM 3.

Test-Retest Reliability

Table III shows demographic variables (age and sex) of the children as well as their total TOM test scores on both occasions. As can be seen, TOM test scores increased with age; the Pearson correlation was .88 (p < .001). Note further that most children slightly improved their score on Occasion 2. A paired t test showed that this improvement was significant, t(11) = 5.4, p < .01. Most important, test-retest reliability for the TOM test was satisfactory; intraclass correlation (ICC) coefficients were .99 (p < .001) for the total score, .80 (p < .005) for TOM 1, .98 (p < .001) for TOM 2, and .91 (p < .001) for TOM 3. These results indicate that the TOM test has sufficient test-retest stability and that the test can be used to measure children's development or improvement in theory of mind capability.

Table III. Demographic Variables of Normal Children in Study 2, and Their Total TOM Test Scores on Both Occasions

                     TOM test scores (8 weeks apart)
Child   Sex   Age    Occasion 1   Occasion 2
1       M     5      40           41
2       M     6      46           48
3       M     6      46           54
4       F     7      41           45
5       M     8      56           56
6       F     8      62           67
7       M     9      62           65
8       M     9      63           68
9       M     10     66           67
10      F     11     65           71
11      M     11     73           74
12      F     12     71           77
M                    60.5         64.4
SD                   10.7         10.4
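The paired t test used above to compare scores on the two occasions can be sketched as follows. The function name and the scores are hypothetical and are not the Study 2 data; the sketch only shows how the statistic and its degrees of freedom are obtained from the within-child differences.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, df) for a paired t test on two equally long score lists."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical scores of four children on two occasions.
occasion_1 = [10, 12, 14, 16]
occasion_2 = [11, 14, 15, 18]
t, df = paired_t(occasion_1, occasion_2)
print(round(t, 2), df)
```

Note that a significant mean improvement and a high test-retest coefficient are compatible: the first concerns the average level of the scores, the second the stability of the children's relative positions.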

STUDY 3

The results presented so far suggest that the TOM test can be used as a measure of the efficacy of theory of mind training programs in children with pervasive developmental disorders (PDDs). Yet, as the TOM test is based on an interview with the child, data about the interrater reliability are needed. Study 3 addressed this issue. Ten children with PDDs were tested with the TOM test. Two independent observers classified the reactions of the children to each TOM test item as either failed or passed.

Method

Subjects and Procedure

Ten children (all boys) with PDDs were randomly selected for the purpose of the present study. Age of the children ranged between 7 and 13 years. All children were treated in one of the AUTI-groups of the Pediatric Center Overbunde, Maastricht, The Netherlands. After extensive psychodiagnostic and psychiatric screening, the children were assigned a diagnosis of Autistic Disorder or Pervasive Developmental Disorder Not Otherwise Specified (PDDNOS). The children fulfilled the relevant DSM-III-R criteria (American Psychiatric Association, 1987). Diagnoses were made by a specialized, multidisciplinary team of professionals of the Center of Autism South-Limburg. The main demographic characteristics of the children are shown in Table IV.

Table IV. Demographic Characteristics of 10 Boys and TOM Test Scores as Obtained by Both Observers

                                                          TOM test score
Child   Age (years; months)   DSM-III-R diagnosisa   IQb    Observer 1   Observer 2   Kappac
1       13;3                  PDDNOS                 92     75           75           1.00
2       12;9                  PDDNOS                 93     70           70           1.00
3       10;11                 AD                     82     44           45           0.87
4       7;6                   AD                     86     32           33           0.98
5       8;1                   PDDNOS                 93     61           59           0.97
6       11;2                  PDDNOS                 119    71           71           1.00
7       10;8                  PDDNOS                 92     60           59           0.96
8       12;3                  PDDNOS                 97     69           68           0.90
9       6;9                   PDDNOS                 96     35           33           0.90
10      7;10                  PDDNOS                 92     40           38           0.95

a PDDNOS = pervasive developmental disorder not otherwise specified; AD = autistic disorder.
b As indexed by the WISC-R.
c Interrater reliability (Cohen's kappa).

Children were tested in a silent room with two experimenters present. Five children were tested by Experimenter 1, while Experimenter 2 observed from a distance. For the other five children, Experimenter 2 administered the TOM test, while Experimenter 1 observed. Both experimenters monitored the responses and reactions of the children on-line. They were not able to observe each other's registrations.

Results and Discussion

Internal Consistency

Internal consistency of the TOM test was good; Cronbach's alphas were .98 for the total score, .95 for TOM 1, .97 for TOM 2, and .95 for TOM 3.
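For readers who want to see how such a coefficient is obtained, the following is a minimal sketch of the standard Cronbach's alpha formula applied to pass/fail item scores. The function name and the small data set are hypothetical illustrations and are not taken from the study.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a children-by-items matrix of item scores.

    `scores` is a list of rows, one row per child; each row holds that
    child's item scores (0 = failed, 1 = passed).
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across children (the columns of the matrix).
    item_variances = [pvariance([row[i] for row in scores]) for i in range(n_items)]
    # Variance of the children's total scores.
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical pass/fail data: 5 children x 4 items (not study data).
example_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(example_scores)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Alpha rises when the item variances are small relative to the variance of the total score, that is, when children who pass one item also tend to pass the others.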

Interrater Reliability

Interrater reliability of the TOM test was examined by computing Cohen's kappa using scores of both observers for the 78 items of the test. Kappas were calculated for each child separately because this makes it possible to evaluate whether interrater reliability is affected by the level of theory of mind development of each child. As can be seen in the right-hand column of Table IV, the kappa values were high (i.e., all were .87 or higher). Furthermore, both observers produced a highly similar rank order of the children with regard to theory of mind; the Spearman rank correlation was .99, p < .001.
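The two agreement indices used here can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' procedure: the function names are hypothetical, the item-level ratings are invented (the article reports only total scores), and the total scores are those read from Table IV, with child 3's Observer 2 score read as 45.

```python
from math import sqrt


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items as 0 (failed) or 1 (passed)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal pass rate.
    pass_a = sum(rater_a) / n
    pass_b = sum(rater_b) / n
    expected = pass_a * pass_b + (1 - pass_a) * (1 - pass_b)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical item-level ratings for one child (10 items, not study data).
observer_1_items = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
observer_2_items = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
kappa = cohen_kappa(observer_1_items, observer_2_items)

# Total scores of the 10 children as read from Table IV
# (child 3's Observer 2 score is read as 45).
observer_1_totals = [75, 70, 44, 32, 61, 71, 60, 69, 35, 40]
observer_2_totals = [75, 70, 45, 33, 59, 71, 59, 68, 33, 38]
rho = spearman_rho(observer_1_totals, observer_2_totals)

print(f"kappa = {kappa:.2f}, Spearman rho = {rho:.2f}")
```

Kappa corrects raw agreement for the agreement two raters would reach by chance, which matters here because most children pass or fail long runs of items. The rank correlation computed from the Table IV totals rounds to the reported value of .99.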

Altogether, the results of Study 3 indicate that the interrater reliability of the TOM test is good.

STUDY 4

Study 4 examined the discriminant validity of the TOM test. Various studies have concluded that a substantial proportion of the children with PDDs exhibit deficits in theory of mind. In most of these studies, theory of mind deficits have been demonstrated by means of false belief tasks (Baron-Cohen et al., 1985; Eisenmajer & Prior, 1991; Leslie & Frith, 1988; Perner, Frith, Leslie, & Leekam, 1989; Prior, Dahlstrom, & Squires, 1990). To investigate whether the TOM test is able to detect this specific deficit in children with PDDs, Study 4 compared TOM test scores of children with autism and PDDNOS with those of children who suffered from other psychiatric disorders (i.e., Attention-deficit/Hyperactivity Disorder, Anxiety Disorder).

There is evidence to suggest that intelligence is a moderator variable in performance on theory of mind tests (see, for a review, Happé, 1995). For example, Happé (1994) investigated the WISC-R scores of autistic children who either passed or failed a false belief task. Her results showed that passers had significantly higher IQ scores than failers. Most researchers in this domain assume that it is especially verbal IQ that plays a role in the performance on false belief tasks (Happé, 1995). This may be relevant for the TOM test, as this test is essentially an interview instrument. Thus, it may well be the case that children's scores on this test are critically dependent on their verbal ability (i.e., language comprehension and/or expression ability). To examine this issue, WISC-R scores of the children in Study 4 were also obtained.

Table V. Demographic Characteristics and Mean TOM Test Scores for Children with Attention-deficit/Hyperactivity Disorder (ADHD), Children with an Anxiety Disorder (AnxD), and Children with a Pervasive Developmental Disorder (PDD)

Variable(a)   ADHD children   AnxD children   PDD children   F or χ²   p       Post hoc comparisons
              (n = 14)        (n = 18)        (n = 20)
Age           8.5 (0.9)       9.1 (1.9)       9.3 (2.4)      0.7       ns
Sex (m/f)     12/2            11/7            17/3           3.8       ns
TIQ           86.9 (7.1)      93.6 (12.7)     85.4 (12.9)    2.6       <.10    PDD<AnxD
VIQ           91.6 (12.0)     90.5 (11.9)     84.3 (16.1)    1.5       ns
PIQ           83.4 (9.1)      97.4 (14.3)     86.6 (10.9)    6.6       <.005   PDD<AnxD; ADHD<AnxD
TOM           61.1 (8.4)      58.9 (9.9)      39.1 (24.9)    9.2       <.001   PDD<AnxD; PDD<ADHD
TOM 1         23.5 (3.2)      23.1 (3.1)      16.9 (8.6)     7.2       <.005   PDD<AnxD; PDD<ADHD
TOM 2         27.5 (3.8)      26.7 (4.5)      16.8 (11.3)    10.9      <.001   PDD<AnxD; PDD<ADHD
TOM 3         9.5 (2.2)       8.5 (3.2)       4.9 (5.4)      6.4       <.005   PDD<AnxD; PDD<ADHD

(a) m = male; f = female; TOM = TOM total score; TOM 1 = precursors of theory of mind; TOM 2 = first manifestations of the 'real' theory of mind; TOM 3 = mature theory of mind. Levels of intelligence were measured with the WISC-R.

Method

Subjects and Procedure

The subjects of Study 4 consisted of three groups: a group of anxiety-disordered children, a group of children with Attention-deficit/Hyperactivity Disorder (ADHD), and a group of children with pervasive developmental disorders.

From the database (1996) of the children and youth section of the Community Mental Health Center, Eastern South-Limburg in Heerlen, The Netherlands, all children suffering from ADHD (n = 14) or an anxiety disorder (AnxD, i.e., obsessive-compulsive disorder, overanxious disorder, specific phobia, posttraumatic stress disorder, and separation anxiety disorder; n = 18) were selected. Children were classified on the basis of the DSM-III-R after extensive psychodiagnostic and psychiatric screening. As part of the intake procedure, all children completed the TOM test and the revised version of the Wechsler Intelligence Scale for Children (WISC-R; Wechsler, 1974).

Twenty high-functioning children with PDDs (i.e., 8 children with Autistic Disorder and 12 children with PDDNOS) also participated in Study 4. These children were chosen randomly from the database of the Center of Autism South-Limburg (see Study 3) and then interviewed with the TOM test. WISC-R scores of the PDD children were also available. Demographic characteristics (i.e., age, sex distribution, and IQ scores) of the three groups are shown in the upper part of Table V.

Results and Discussion

Internal Consistency

As in the previous studies, the internal consistency of the TOM test was satisfactory; Cronbach's alphas of the total scale and the various TOM subscales varied between .87 and .96 for the total group, .95 and .98 for the children with PDD, and .72 and .80 for psychiatric control children.

Discriminant Validity

The lower part of Table V shows mean TOM test scores for the three groups. Analyses of variance followed up by post-hoc t tests revealed that children with PDD had significantly lower TOM test scores than children with ADHD and AnxD.
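As a rough check on the analysis of variance for the TOM total score, the F statistic can be recomputed from group summaries alone. The sketch below assumes that the bracketed values in Table V are sample standard deviations; the function name is a hypothetical illustration, and the inputs are the rounded values printed in the table.

```python
def anova_from_summary(groups):
    """One-way ANOVA F statistic from per-group (n, mean, sd) summaries."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups sum of squares, recovered from the sample standard deviations.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# TOM total score: (n, mean, SD) per group as listed in Table V.
tom_total = [
    (14, 61.1, 8.4),   # ADHD
    (18, 58.9, 9.9),   # AnxD
    (20, 39.1, 24.9),  # PDD
]
f_value, df1, df2 = anova_from_summary(tom_total)
print(f"F({df1}, {df2}) = {f_value:.1f}")
```

Because the table entries are rounded, the recomputed F is close to, but need not equal, the reported value of 9.2.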

For this sample, the Pearson product-moment correlation between TOM test and age was only .24 (p < .10). Correlations between TOM test scores, on the one hand, and Total IQ, Verbal IQ, and Performance IQ, on the other hand, however, were all positive and significant; r(52)s were .58 (p < .001), .61 (p < .001), and .45 (p < .001), respectively. Thus, children with higher intelligence scores performed better on the TOM test.


To examine the unique contribution of the diagnosis Pervasive Developmental Disorder to TOM test performance, two additional analyses were performed. First, a multiple regression analysis (forward stepwise) was carried out with Diagnosis Autism, Diagnosis PDDNOS (both dummy variables), Verbal IQ, Performance IQ, and Age as the predictors, and TOM test score as the dependent variable. Results showed that Diagnosis Autism entered on the first step, r(52) = -.69, p < .001, accounting for 47.6% of the variance in TOM test scores. Verbal IQ (partial r = .32, p < .01), Age (partial r = .24, p < .05), and Diagnosis PDDNOS (partial r = -.23, p < .05) entered on the second, third, and fourth step of the regression equation, accounting for significant proportions of the variance (10.2, 5.8, and 4.4%, respectively). Second, an additional multiple regression analysis was performed while forcing Verbal IQ, Performance IQ, and Age into the equation at Step 1. Still, both Diagnosis Autism and Diagnosis PDDNOS contributed significantly to TOM test scores, partial rs being -.45 (p < .001) and -.22 (p < .05). Thus, even when controlling for IQ level and age, diagnoses still predicted TOM test performance; the more severe children's pervasive developmental disorder, the worse they performed on the TOM test.
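The percentage reported for the first regression step follows directly from the correlation coefficient, since a single predictor accounts for a proportion r squared of the variance in the criterion. The short sketch below illustrates this arithmetic; the helper name is hypothetical, and the correlations are those reported in the text.

```python
def variance_explained(r):
    """Percentage of variance shared by two variables with correlation r."""
    return 100 * r ** 2


# Step 1 of the stepwise regression: Diagnosis Autism, r = -.69.
step_1 = variance_explained(-0.69)
print(f"Diagnosis Autism: {step_1:.1f}% of the variance in TOM test scores")

# Zero-order correlations of TOM test scores with the three IQ measures.
iq_correlations = [("Total IQ", 0.58), ("Verbal IQ", 0.61), ("Performance IQ", 0.45)]
for label, r in iq_correlations:
    print(f"{label}: r = {r:.2f}, shared variance = {variance_explained(r):.0f}%")
```

The sign of the correlation drops out when squaring, so the negative coefficient for Diagnosis Autism still corresponds to a positive share of explained variance.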

Altogether, the results of Study 4 support the discriminant validity of the TOM test in that children with a PDD performed worse on the test than children with other psychiatric disorders. Furthermore, the findings indicate that this difference in TOM test performance is not carried by differences in intelligence. Even when controlling for intelligence, a significant and negative association between diagnoses of autism and PDDNOS, on the one hand, and TOM test performance, on the other hand, emerged.

GENERAL DISCUSSION

Theory of mind pertains to children's capacity to analyze the behavior of others by recognizing mental states (i.e., desires and beliefs) that underlie intentional and social behavior. Clearly, then, theory of mind consists of various aspects, such as the recognition of emotions, the assessment of how others think, and the understanding of the motives underlying behavior of others. The TOM test has been construed to measure this broad range of aspects from a developmental perspective. The test intends to tap three successive stages in the development of theory of mind: precursors of theory of mind, first manifestations of a real theory of mind, and more advanced aspects of theory of mind.

The current study was a first attempt to investigate the reliability and validity of the TOM test. The main results can be summarized as follows. To begin with, the TOM test was found to be a reliable instrument; internal consistency was good, test-retest reliability was sufficient, and interrater reliability was high. Second, TOM test scores increased with age, indicating that the test is sensitive to developmental progression. In line with this, young children only succeeded on TOM items that tap basic domains of theory of mind, whereas older children also passed items that measure the more advanced areas of theory of mind. Third, evidence was obtained that supports the concurrent validity of the TOM test. That is, TOM test scores correlated positively and significantly with the performance on several other theory of mind tasks (i.e., tests of emotion recognition, understanding of false and second-order beliefs, and role taking). Fourth and finally, children with a PDD performed worse on the test than children with other psychiatric disorders. This suggests that the TOM test possesses discriminant validity.

The TOM test can be used in three ways. First, the test can be employed to screen children for deficits in theory of mind. There is some evidence to suggest that a poorly developed theory of mind can have negative social-emotional consequences, even in normal children (Lalonde & Chandler, 1995). Consequently, an instrument that measures the maturity of children's theory of mind at different age levels is important. Second, because the TOM test is informative about the developmental phase of children's theory of mind, it enables clinicians to tailor their intervention to specific problems of each child. For example, when the TOM test indicates that a child even fails on items that measure precursors of theory of mind, it would be futile to teach this child understanding of false beliefs. Third, the TOM test can be used to evaluate the efficacy of theory of mind training programs.

Altogether, the present findings imply that the TOM test is a reliable and valid instrument that can be employed to screen the development of theory of mind in 5- to 12-year-old normal children, children with pervasive developmental disorders, and other socially immature children.

APPENDIX

Examples of TOM Test Items

Fig. A1. Picture of Example 1.
Fig. A2. Picture of Example 3.

Each question represents a TOM test item which is scored as either failed (0) or passed (1). The subscale to which each item belongs is mentioned between parentheses.
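The scoring rule just described can be sketched in a few lines. This is only an illustration under the assumption that a score is the number of passed items; the function name and the recorded answers are hypothetical, and the subscale labels follow the three questions of Example 2 in this appendix.

```python
from collections import Counter

# Each answer is recorded as (subscale, passed). The pass/fail values are
# hypothetical; the subscale labels are those of Example 2.
responses = [
    ("TOM 1", 1),  # Question 1: why is Pim crying?
    ("TOM 2", 1),  # Question 2: does father know why Pim is crying?
    ("TOM 2", 0),  # Question 3: does father know that Pim has bitten his lip?
]


def score_tom(responses):
    """Sum the 0/1 item scores per subscale and over all items."""
    totals = Counter()
    for subscale, passed in responses:
        if passed not in (0, 1):
            raise ValueError("items are scored 0 (failed) or 1 (passed)")
        totals[subscale] += passed
        totals["TOM total"] += passed
    return dict(totals)


scores = score_tom(responses)
print(scores)
```

Keeping the subscale label with each item makes it straightforward to report the TOM 1, TOM 2, and TOM 3 scores alongside the total.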

Example 1

Instruction: Take a look at this picture.
Question 1: What has happened? Can you tell something about it? (TOM 1)
Question 2: Who in this picture is afraid? (TOM 1)
Question 3: Why is this person afraid? (TOM 2)
Question 4: Who in this picture is happy? (TOM 1)
Question 5: Why is this person happy? (TOM 2)
Question 6: Who in this picture is sad? (TOM 1)
Question 7: Why is this person sad? (TOM 2)
Question 8: Who in this picture is angry? (TOM 1)
Question 9: Why is this person angry? (TOM 2)

Example 2

Instruction: I will read you a short story. Listen carefully.
Story: Pim is one year old. He's at home, playing on the ground. Mother has given him a piece of apple. Suddenly, Pim bites his lip and he starts to cry. He throws the piece of apple on the ground. Mother lifts Pim up, comforts him, and puts the piece of apple on the table. When father arrives at home, mother is on the phone. Father lifts Pim up and hugs him. Then he puts Pim back on the ground, and gives him the piece of apple which is still lying on the table. As soon as Pim sees the piece of apple, he starts to cry.
Question 1: Why is Pim crying when father gives him the piece of apple? (TOM 1)
Question 2: Does father know why Pim is crying? (TOM 2)
Question 3: Does father know that Pim has bitten his lip when he wanted to eat the apple? (TOM 2)

Example 3

Instruction: Take a look at this picture.
Question 1: What, do you think, is happening in this picture? (TOM 1)
Story: The two boys in the foreground gossip about the other boy. Suddenly, that boy approaches them and hears what they are saying. The two boys are startled.
Question 1: How does this boy feel? (point at the boy in the background) (TOM 1)
Question 2: How does this boy feel? (point at one of the boys in the foreground) (TOM 1)

Fig. A3. Picture of Example 4.

Example 4

Instruction: Take a look at this picture.
Question 1: What has happened in this picture? (TOM 1)
Question 2: How do you feel when you hurt yourself? (TOM 1)
Question 3: Can you see from the girl's face how she really feels? (TOM 2)
Question 4: Is it possible to look happy, when you have hurt yourself? (TOM 2)

Example 5

Instruction: Take a look at this picture.
Story: This is Ben. Ben wants to play with his bricks.
Question 1: Which box will Ben open to play with his bricks? (TOM 1)
Story: Ben opens the box of bricks, and surprisingly he finds out that it is filled with washing powder! He closes the box, and opens the other smaller box. There are his bricks! He takes out some bricks, and goes playing with them in his bedroom. Then his brother Tim is entering the room. Tim also wants to play with the bricks...
Question 2: Which box will Tim open to play with his bricks? (TOM 2)
Question 3: Do you know where the bricks really are? (TOM 2)

Example 6

Instruction: I will read you a short story. Listen carefully.
Story: Father and mother are at a birthday party. They only know a few people, and think the music is too loud. "Wow," says father, "It's a pleasure to be here!"
Question 1: What does father mean? (TOM 3)
Question 2: Why does father say: "It's a pleasure to be here!" (TOM 3)

Example 7

Question 1: Do as if you comb your hair. (TOM 1)
Question 2: Do as if you brush your teeth. (TOM 1)
Question 3: Do as if you are feeling cold. (TOM 1)
Question 4: How can I see that you are feeling cold? (TOM 2)
Question 5: Do as if you have a nasty drink. (TOM 1)
Question 6: How can I see that your drink is nasty? (TOM 2)
Question 7: Do as if you are scared. (TOM 1)
Question 8: How can I see that you are scared? (TOM 2)

Example 8

Instruction: Take a look at this picture.
Story: This is John. John often dreams. Sometimes he dreams about a new bike that he likes to have.
Question 1: Is John able to touch the bike that he dreams about? (TOM 1)
Story: Sometimes John has a frightening dream. Then he dreams about shadows.
Question 2: Does John really see these shadows with his eyes? (TOM 1)
Question 3: Can somebody else see the shadows or the bike of John's dreams? (TOM 1)

Fig. A4. Pictures of Example 5.

Example 9

Instruction: I will read you a short story. Listen carefully.
Story: It is summer. Will and Mike have their holidays. They go out for a bicycle ride. Suddenly, there is a downpour and they have to shelter in a bus station. There are two men in the bus station who also shelter from the rain. One of the men remarks: "Wow, we have nice weather today!"
Question 1: What does the man mean? (TOM 3)
Question 2: Is it true what the man says? (TOM 3)
Question 3: Why does the man say: "Wow, we have nice weather today!" (TOM 3)

REFERENCES

Astington, J. W., & Jenkins, J. M. (1995). Theory-of-mind development and social understanding. Cognition and Emotion, 9, 151-165.

American Psychiatric Association. (1987). Diagnostic and statistical manual of mental disorders (3rd ed., rev.). Washington, DC: Author.

Baron-Cohen, S., Leslie, A. M., & Frith, U. (1985). Does the autistic child have a 'theory of mind'? Cognition, 21, 37-46.

Baron-Cohen, S., Leslie, A. M., & Frith, U. (1986). Mechanical, behavioral and intentional understanding of picture stories in autistic children. British Journal of Developmental Psychology, 4, 113-125.

Bowler, D. M., Strom, E., & Urquhart, L. (1993). Elicitation of first-order "theory of mind" in children with autism. Paper presented at the SRCD Conference, New Orleans, LA.

Eisenmajer, R., & Prior, M. (1991). Cognitive linguistic correlates of "theory of mind" ability in autistic children. British Journal of Developmental Psychology, 9, 351-364.

Flavell, J. H., Miller, P. H., & Miller, S. (1993). Cognitive development. Englewood Cliffs, NJ: Prentice-Hall.

Frith, U. (1989). Autism: Explaining the enigma. Oxford: Blackwell.

Hadwin, J., Baron-Cohen, S., Howlin, P., & Hill, K. (1996). Can we teach children with autism to understand emotions, belief, or pretence? Development and Psychopathology, 8, 345-365.


Fig. A5. Picture of Example 8.

Happé, F. (1994). Wechsler IQ profile and theory of mind in autism: A research note. Journal of Child Psychology and Psychiatry, 35, 1461-1471.

Happé, F. (1995). The role of age and verbal ability in the theory-of-mind task performance of subjects with autism. Child Development, 66, 843-855.

Hogrefe, G. J., Wimmer, H., & Perner, J. (1986). Ignorance versus false belief: A developmental lag in attribution of epistemic states. Child Development, 57, 567-582.

Lalonde, C. E., & Chandler, M. J. (1995). False belief understanding goes to school: On the social-emotional consequences of coming early or late to a first theory of mind. Cognition and Emotion, 9, 167-185.

Leslie, A. M., & Frith, U. (1988). Autistic children's understanding of seeing, knowing and believing. British Journal of Developmental Psychology, 6, 315-324.

Ozonoff, S., & Miller, J. N. (1995). Teaching theory of mind: A new approach to social skills training for individuals with autism. Journal of Autism and Developmental Disorders, 25, 415-433.

Perner, J., Frith, U., Leslie, A. M., & Leekam, S. (1989). Exploration of the autistic child's theory of mind: Knowledge, belief and communication. Child Development, 60, 689-700.

Perner, J., & Wimmer, H. (1985). 'John thinks that Mary thinks that...': Attribution of second-order beliefs by 5- to 10-year-old children. Journal of Experimental Child Psychology, 39, 437-471.

Premack, D., & Woodruff, G. (1978). Does the chimpanzee have a theory of mind? Behavioural and Brain Sciences, 4, 515-526.

Prior, M., Dahlstrom, B., & Squires, T. (1990). Autistic children's knowledge of thinking and feeling states in other people. Journal of Child Psychology and Psychiatry, 31, 587-601.

Selman, R. L., & Byrne, D. F. (1974). A structural-developmental analysis of levels of role taking in middle childhood. Child Development, 45, 803-806.

Slaughter, V., & Gopnik, A. (1996). Conceptual coherence in the child's theory of mind: Training children to understand belief. Child Development, 67, 2967-2988.

Spence, S. (1980). Social skills training with children and adolescents: A counselor's manual. Windsor: NFER/Nelson.

Steerneman, P. (1994). Theory-of-mind screening-schaal [Theory-of-mind screening-scale]. Leuven/Apeldoorn: Garant.

Steerneman, P., Jackson, S., Pelzer, H., & Muris, P. (1996). Children with social handicaps: An intervention program using a theory-of-mind approach. Clinical Child Psychology and Psychiatry, 1, 251-263.

Swettenham, J. (1996). Can children with autism be taught to understand false belief using computers? Journal of Child Psychology and Psychiatry, 37, 157-165.

Vijftigschild, W., Berger, H. J. C., & van Spaendonck, J. A. S. (1969). Sociale Interpretatie Test [Social Interpretation Test]. Amsterdam: Swets & Zeitlinger.

Wechsler, D. (1974). Wechsler Intelligence Scale for Children (Rev.). New York: Psychological Corp.

Wellman, H. (1990). The child's theory of mind. Cambridge, MA: MIT Press.

Whiten, A., Irving, K., & Macintyre, K. (1993). Can three-year-olds and people with autism learn to predict the consequences of false belief? Paper presented at the British Psychological Society Developmental Section Annual Conference, Birmingham, UK.