Individual Differences in Absolute and Relative Metacomprehension Accuracy
Ruth H. Maki, Micheal Shields, Amanda Easton Wheeler, and Tammy Lowery Zacchilli
Texas Tech University
The authors investigated absolute and relative metacomprehension accuracy as a function of verbal
ability in college students. Students read hard texts, revised texts, or a mixed set of texts. They then
predicted their performance, took a multiple-choice test on the texts, and made posttest judgments about
their performance. With hard texts, students with lower verbal abilities were overconfident in predictions
of future performance, and students with higher verbal abilities were underconfident in judging past
performance. Revised texts produced overconfidence for predictions. Thus, absolute accuracy of
predictions and confidence judgments depended on students' abilities and text difficulty. In contrast, relative
metacomprehension accuracy as measured by gamma correlations did not depend on verbal ability or on
text difficulty. Absolute metacomprehension accuracy was much more dependent on types of materials
and verbal skills than was relative accuracy, suggesting that they may tap different aspects of
metacomprehension.
Keywords: reading comprehension, metacognition, judgment
People's ability to judge their own performance is one aspect
of metacognition, specifically metacognitive monitoring. Re-
searchers have measured the accuracy of metacognitive monitor-
ing in two fundamentally different ways. In the cognitive psychol-
ogy literature, the accuracy of metacognitive monitoring is usually
measured in a relative way, by correlating confidence judgments
and actual performance across a number of units, usually word
pairs, test questions, or texts (Maki & Serra, 1992; Nelson &
Dunlosky, 1991). Nelson (1984) recommended nonparametric
gamma as the measure of correlation, and since then, gamma has consistently been used in studies of metamemory (judgments about
future or past memory performance) and metacomprehension
(judgments about future or past performance over text). Gamma
shows whether units (word pairs, sentences, or texts) that receive
higher judgments are also the units that produce higher perfor-
mance and whether units that receive lower judgments produce
lower performance. Such a correlational measure indicates
whether students can discriminate materials within a set that lead
to poorer performance from materials that lead to higher perfor-
mance. However, gamma does not indicate whether participants'
judgments are higher than or lower than actual performance; that
is, it does not show overconfidence or underconfidence.
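As an illustrative sketch (not from the article, and with hypothetical data), the Goodman–Kruskal gamma described above can be computed by counting concordant and discordant pairs of (judgment, performance) values across a set of texts:

```python
from itertools import combinations

def goodman_kruskal_gamma(judgments, scores):
    """Goodman-Kruskal gamma: (concordant - discordant) / (concordant + discordant),
    over all pairs of units; pairs tied on either variable are ignored."""
    concordant = discordant = 0
    for (j1, s1), (j2, s2) in combinations(zip(judgments, scores), 2):
        product = (j1 - j2) * (s1 - s2)
        if product > 0:
            concordant += 1    # judgments and scores order the pair the same way
        elif product < 0:
            discordant += 1    # judgments and scores order the pair oppositely
    if concordant + discordant == 0:
        return None  # undefined, e.g., identical judgments for every text
    return (concordant - discordant) / (concordant + discordant)

# Six texts: predicted questions correct (0-6) vs. actual questions correct
predictions = [5, 3, 4, 2, 6, 3]
actual      = [4, 2, 5, 1, 6, 3]
print(goodman_kruskal_gamma(predictions, actual))  # ≈ .86
```

Note that gamma is undefined when a participant gives the same judgment to every text, which is why some participants are later excluded from the gamma analyses.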
Several other researchers have examined absolute accuracy in
judgments by investigating whether judgments match performance
exactly. In the educational psychology literature, bias has often
been used as a measure of metacognitive monitoring. Bias is the
signed difference between confidence judgments and actual per-
formance. A positive value indicates that judgments are higher
than performance, showing overconfidence. A negative value in-
dicates that judgments are lower than performance, showing un-
derconfidence. Bias measures whether individuals can estimate
their exact performance on materials. Individuals who perform
better on the criterion test usually show less overconfidence in
predictions than individuals who perform less well (Hacker, Bol,
Horgan, & Rakow, 2000; Jacobson, 1990; Maki, 1998). In two
other studies (Glover, 1989; Grabe, Bordages, & Petros, 1990), independent measures of ability rather than performance on the
test itself were used to define individual difference groups, and the
same pattern (i.e., less overconfidence in predictions for those with
higher ability) was found. In the social psychology literature,
Kruger and Dunning (1999; see also Dunning, Johnson, Ehrlinger,
& Kruger, 2003) have shown that people who have less knowledge
in a domain overestimate their performance to a large degree, and
people who have greater knowledge slightly underestimate their
performance. They have suggested that poor performers are "unskilled and unaware of it" (Kruger & Dunning, 1999, p. 1121).
Kruger and Dunning (1999) used posttest confidence judgments
exclusively in showing their "unskilled and unaware" effect. There
is only one study in the educational literature using posttest confidence judgments and individual differences. In that study,
Hacker et al. (2000) found that higher performers showed less
overconfidence than poorer performers.
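As a minimal sketch of the bias measure defined above (hypothetical numbers, not the article's data), bias is the mean signed difference between judged and actual percent correct:

```python
def bias(judgments_pct, performance_pct):
    """Mean signed difference between judged and actual percent correct.
    Positive values indicate overconfidence; negative values, underconfidence."""
    diffs = [j - p for j, p in zip(judgments_pct, performance_pct)]
    return sum(diffs) / len(diffs)

# A student predicts 4 of 6 correct (66.7%) on each of three texts
# but actually scores 50.0%, 66.7%, and 33.3%:
print(bias([66.7, 66.7, 66.7], [50.0, 66.7, 33.3]))  # positive, so overconfident
```

Unlike gamma, bias is defined even when all judgments are identical, which is one reason the two measures can behave differently across individuals.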
Individual differences have been more difficult to find with
relative measures (i.e., within-subjects correlations) of metacogni-
tion (Maki & McGuire, 2002). Maki, Jonas, and Kallod (1994)
found no relationship between verbal ability and prediction accu-
racy for texts, although they did find that better comprehenders
produced more accurate posttest judgments. Lin, Moore, and
Zabrucky (2001) found no relationship between verbal abilities
and several measures of pretest or posttest judgment accuracy.
Ruth H. Maki, Micheal Shields, Amanda Easton Wheeler, and Tammy
Lowery Zacchilli, Department of Psychology, Texas Tech University.
We thank Elizabeth Garza, David Cardenas, Michele Cristan, and Emily
Phillips for testing participants in these studies.
Correspondence concerning this article should be addressed to Ruth H.
Maki, Department of Psychology, Texas Tech University, Lubbock, TX
79409-2051. E-mail: [email protected]
Journal of Educational Psychology, 2005, Vol. 97, No. 4, 723–731. Copyright 2005 by the American Psychological Association. 0022-0663/05/$12.00 DOI: 10.1037/0022-0663.97.4.723
One reason that finding individual differences with relative
measures has been difficult might be that some measures of
relative metacognition accuracy may be unreliable. Thompson and
Mason (1996) asked students to judge their confidence in recog-
nizing studied photographs of faces and of words and also to judge
their confidence in answers to true–false questions. Using different
sets of materials, students made such confidence judgments twice with a 2-week interval. Thompson and Mason used gamma to
measure relative metamemory accuracy at each interval. By cor-
relating gammas at the two sessions, they measured the stability of
gamma over time and on alternate forms. They also calculated
separate gammas for the even and odd items to measure split-half
reliability. The split-half reliability for performance averaged .47,
and the split-half reliability for confidence ratings was .69. The
alternate forms correlations for performance and for confidence
ratings were similar to the split-half reliabilities. However, the
correlations of the gammas were much lower. The mean split-half
correlation for the gammas was .19, and the alternate forms reli-
ability was zero. Thus, the memory performance and the confi-
dence judgments showed some reliability, but relative
metamemory accuracy did not.
Kelemen, Frost, and Weaver (2000) investigated the reliability
of relative metacognitive accuracy for a number of different mea-
sures, including judgments of learning (predictions of future per-
formance on recently studied word pairs) and predictions of future
performance on recently read text. They studied stability over time
by testing each participant twice at a 1-week interval on each task,
and they examined stability over tasks. Consistent with Thompson
and Mason's (1996) results, they found that the memory performance and the judgments themselves showed some stability over
time and, to a lesser extent, across tasks, but relative metacognitive
accuracy (the gamma correlations) showed no stability. The unre-
liability of gamma may explain why it has been so difficult to find
stable individual differences in relative accuracy measures.
A lack of stability seems to be the norm for relative accuracy
judgments, but a measure of absolute accuracy, bias (the difference
between judged and actual performance), shows much better sta-
bility both across tasks and over time. Kelemen et al. (2000) found
that students who were overconfident or underconfident at one
time tended to be overconfident or underconfident at the second
time; a similar pattern was seen across tasks. Other studies have
also shown consistency across tasks for the amount of overconfi-
dence or underconfidence. West and Stanovich (1997) used very
different tasks: confidence judgments on answers to general
knowledge questions and confidence in a motor task. They also
found correlations in bias across these different tasks. Schraw and
his colleagues (Schraw, 1997; Schraw, Dunkle, Bendixen, & Roedel, 1995; Schraw & Roedel, 1994) have also found bias to be similar for individuals across tasks that differ in difficulty and in
content domain. The fact that measures of bias (overconfidence
and underconfidence) show better reliability than gamma means
that bias is more likely to show stable individual differences.
Some of the studies cited above have used predictions of future
performance, and others have used confidence in past perfor-
mance. In educational applications, predictions rather than posttest
confidence judgments are most important for controlling effective
study behavior. Both the ability to predict overall performance and
the ability to predict performance among different units (e.g., texts
or topics) are important. A student who is overconfident overall
may fail a test because he or she assumes that there is no need for
further study (a failure of absolute metacomprehension). That
same student may not know which specific topics need further
study and which have been mastered (a failure of relative meta-
comprehension). Thus, an inability to predict absolute test accu-
racy and relative test accuracy may result in underpreparation for
tests. There have been relatively few studies using absolute judgments to examine predictions of future test performance (cf.
Glover, 1989; Hacker et al., 2000). Predicting performance over
text material is most relevant to educational settings, but there are
no studies examining individual differences in both absolute pre-
diction accuracy and relative prediction accuracy with text
material.
In the present study, we investigated absolute and relative meta-
comprehension accuracy as a function of verbal ability and diffi-
culty of texts. We varied the difficulty of texts because the degree
of overconfidence among students with low skill (e.g., Grabe et al.,
1990; Hacker et al., 2000; Kruger & Dunning, 1999) should be
related to the difficulty of the test. Schraw and Roedel (1994)
found that overconfidence in absolute judgments was much greater
on more difficult tasks than on easier tasks. They found this result for texts of varying difficulty and also across tasks in different
domains (e.g., reading comprehension, probability judgments, and
spatial judgments). When different domains were used, Schraw
and Roedel reported that the specific type of task was less impor-
tant than overall task difficulty. However, they did not include
differences in participant skill in their studies.
We manipulated text difficulty by using hard texts and revised
versions of those texts, using revisions that were similar to those
studied by Burns (reported in Weaver, Bryant, & Burns, 1995).
Burns revised texts in three different ways. In the principled
revision, he followed the suggestions of Britton and Gulgoz
(1991), which are guided by Kintsch and van Dijk's (1978) model
of text comprehension. Burns's heuristic revision was done by an expert in text processing to make the original text more readable.
Finally, in the readability revision, he changed the surface struc-
ture of the text to match the readability statistics of the heuristic
version. Burns compared only the original texts, the readability
revisions, and the heuristic revisions in his first experiment. Par-
ticipants read texts, made predictions, and then answered questions
from the texts. Relative metacomprehension accuracy (gamma
correlations) was significantly higher for the heuristically revised
texts. In Burns's Experiment 2, in which four versions of the texts
were compared (original, principled revision, heuristic revision,
and readability revision), proportions correct and predictions dif-
fered across texts, but relative metacomprehension accuracy did
not differ. With these two conflicting results, it is unclear whether
revising texts affects relative metacomprehension accuracy.
We revised difficult texts, following both principled and readability guidelines in our revisions. Participants read and predicted
their performance on six texts: all hard texts, all revised versions
of these hard texts, or mixed texts in which half of the texts were
hard and half were revised. We thought that this mixed condition
might produce good relative accuracy if participants were sensitive
to the differences in text difficulty. However, Rawson and Dun-
losky (2002) mixed very short texts (two sentences each) that had
either high, medium, or low coherence. They did not find higher
relative accuracy for mixed sets than for pure sets. Thus, relative
accuracy may be insensitive to variations in difficulty of materials,
but Rawson and Dunlosky's texts may have been too short for
participants to notice the mix of text coherence. Measures of
relative accuracy may be sensitive to variations in difficulty across
longer texts if the variation in materials makes participants more
aware of the potential variation in their performance. Increased
variation in one's own performance may also increase absolute accuracy, especially among higher ability students.
In addition to the three text conditions (hard, revised, and mixed
texts), we divided students into low, medium, and high ability
groups on the basis of standardized test scores to study the effects
of individual differences in relative and absolute accuracy of
metacomprehension. We expected to see different patterns of
overconfidence and underconfidence (absolute accuracy) for the
different types of texts in participants of varying verbal abilities.
Whether we would see similar effects for relative accuracy of
metacomprehension was the primary question of interest.
Method
Participants
A total of 159 college student volunteers from general psychology
courses at a large public university participated for partial course credit.
There were 72 men and 87 women in the sample. The average age was 19.1 years
(with a range of 17 to 41). The sample was 76.1% European American,
13.2% Hispanic, and 11.7% other or unidentified. We randomly assigned
52 participants to the revised text condition, 51 participants to the hard text
condition, and 55 participants to the mixed text condition.
We used university records to determine students' gender, ethnicity, and
age, as well as their scores on the verbal portion of the Scholastic Achieve-
ment Test (SAT) or the English portion of the American College Test
(ACT). If these scores were not available, we used students' reading scores
on a state academic skills test. All but 1 participant in the mixed text
condition had scores for the SAT, the ACT, or the Texas Academic Skills
Program (TASP). Overall, the mean score for the verbal SAT was 522, and
the standard deviation was 72.98. The mean verbal ACT score was 21.3,
and the standard deviation was 4.39. We used TASP scores for only a few
students, so those means are not meaningful. Scores were converted to z
scores, using national (or state) norms for each test. Students were divided
into three ability groups on the basis of their verbal z scores. Of 157
participants having verbal scores, 54 students were in the lower ability
group with z scores of less than −.126 (SAT M = 438, SD = 36.44; ACT
M = 17.5, SD = .76), 57 students were placed in the medium ability group
with z scores between −.126 and .414 (SAT M = 522, SD = 13.51; ACT
M = 21.3, SD = .82), and 46 students were in the higher ability group with
z scores greater than .414 (SAT M = 592, SD = 38.22; ACT M = 27.6,
SD = 2.51).
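The grouping procedure can be sketched as follows. This is a hypothetical illustration: the cutoffs are those reported above (the sign on the lower cutoff, −.126, appears to have been dropped in reproduction and is assumed here), and the norm values in the example are invented, not the actual SAT norms:

```python
def verbal_z(score, norm_mean, norm_sd):
    """Convert a raw verbal test score to a z score against its norm group."""
    return (score - norm_mean) / norm_sd

def ability_group(z, low_cut=-0.126, high_cut=0.414):
    """Tripartite ability split using the study's cutoffs (lower sign assumed)."""
    if z < low_cut:
        return "low"
    if z <= high_cut:
        return "medium"
    return "high"

# Hypothetical example: an SAT verbal score of 592 against assumed norms
# of M = 500, SD = 110 (illustrative values only)
print(ability_group(verbal_z(592, 500, 110)))  # "high"
```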
Materials
For the hard texts, we used the six texts and the short practice text used
by Rawson, Dunlosky, and Thiede (2000) that were taken from a GRE
preparation manual (Branson, Selub, & Solomon, 1987). Because we
needed two short, hard practice texts, we shortened one of the texts used by
Glenberg and Epstein (1987) for use as a practice text. This text has produced low performance in our laboratory. We revised the texts to
increase readability by replacing low frequency words with high frequency
words; breaking long, complex sentences into shorter, simpler sentences;
eliminating embedded clauses; and changing passive sentences into active
sentences. In addition, we used two of the three principled revision rules
found by Britton and Gulgoz (1991) to be effective in improving text recall.
We used the same term for the same concept throughout the text, and we
replaced anaphoric references (e.g., "it") with the specific concept. One of
the original hard practice texts and its revised version is shown in Appen-
dix A. The characteristics of the six hard texts and the six revised texts as
analyzed by Microsoft Word can be seen in Table 1. Table 1 shows that in
comparison with the original hard texts, the revised texts had fewer words,
more sentences, fewer words per sentence, fewer passive sentences, higher
Flesch Reading Ease scores, and lower Flesch–Kincaid grade levels. We
also analyzed the coherence among sentences using latent semantic analysis (LSA).1 These are also shown in Table 1. Higher values indicate that
the concepts presented in sequential sentences have more overlap (with 1.0
as the highest possible score). The sentence-to-sentence coherence of the
revised texts was somewhat higher than the sentence-to-sentence coher-
ence of the hard texts. We also compared the semantic overlap of each
complete hard text with each complete revised text using LSA. Concepts in
the hard and revised texts were similar in that the observed LSA of .89 is
high.
For the mixed condition, half of the revised texts were grouped (Sets A
and B) and half of the hard texts were grouped (Sets C and D). One quarter
of the participants in the mixed group each read a combination of texts
designated as Sets AC, AD, BC, or BD. One revised and one hard practice
text was used for all participants in the mixed condition.
We used six multiple-choice test questions with five alternatives for each
text. For the hard texts, we used the questions that had been used by
1 LSA compares the relations among terms in a sentence with those of a
large text base. Similarities across pairs of sentences are then measured,
and an average sentence-to-sentence coherence score is obtained. Perfect
overlap across sentences produces LSA = 1.0. Texts with higher coherence
should be easier to comprehend (Foltz, Kintsch, & Landauer, 1998). We
selected the "General Reading up to First-Year College" corpus as the database.
Using more specialized databases for some texts (such as the psychology
corpus for the intelligence testing text) tended to lower the coherence
scores relative to using the larger database, but it did not change the pattern
of coherence for the hard versus revised texts.
Table 1
Mean Characteristics of Hard and Revised Texts (With Ranges in Parentheses)
Characteristics                 Hard texts              Revised texts
No. of words                    478.00 (358.00–601.00)  441.00 (347.00–604.00)
No. of sentences                20.00 (16.00–23.00)     31.00 (24.00–38.00)
Words/sentence                  24.05 (21.06–28.38)     14.21 (10.84–15.89)
% passive sentences             22.50 (3.70–50.00)      9.17 (0–18.00)
Flesch Reading Ease             37.50 (19.10–49.40)     50.92 (42.20–59.50)
Flesch–Kincaid grade level      11.71 (10.90–12.00)     9.80 (7.60–12.00)
LSA sentence coherence          0.24 (0.18–0.30)        0.27 (0.23–0.33)
Note. LSA = latent semantic analysis.
Rawson et al. (2000). Half of the test questions tapped details, and half
tapped more conceptual material. The questions were the same for the
revised texts, except that we changed some of the words and phrases in the
questions in the same way as we did for the revised texts. There were two
practice questions for each of the four practice texts. Examples of the hard and revised test questions can be seen in Appendix B.
Procedure
Participants came to the laboratory for one session, which lasted about
1 hr. They were seated in front of a computer in a small individual cubicle.
We used Inquisit software (2002) to control the presentation of stimuli and
trial events and to collect the data. First, participants read and responded to
two practice texts. Then, they read the six texts, presented in a randomized
order for each participant. Participants read one sentence at a time and
pressed the space bar for the next sentence in the text to appear on the
screen. After reading each text, students were asked to indicate the per-
centage of the text they thought they were able to comprehend, using the
following scale: 0%, 20%, 40%, 60%, 80%, and 100%. Participants then
judged how many of six test questions they thought they would get correct (predictions) for a given text, and they typed a number from 0 to 6. After
reading all six texts, students answered the six multiple-choice questions
for each text; texts were tested in a random order. After answering the six
questions for a text, students gave a posttest confidence judgment about
their test performance by typing in the number of test questions (0–6) they
thought they got correct for the text. This was repeated for questions from
each of the six texts. Students were then debriefed.
Results
Hard Versus Revised Texts
We analyzed the percent correct for the pure revised and the
pure hard text conditions to ensure that our revisions were successful.
The mean percent correct for the revised texts was 56.14%, and the mean percent correct for the hard texts was
48.91%. A 2 (text condition) × 3 (verbal ability level) analysis of
variance (ANOVA) showed that revised texts produced significantly
higher performance than hard texts, F(1, 97) = 10.79,
MSE = 175.69, p = .001, partial η² = .085.2 In addition, we
analyzed students' ratings of the percentage of the texts they
thought they were able to comprehend. The mean percentage for
revised texts was 65.33%, and the mean percentage for the hard
texts was 48.52%. Better perceived comprehension of revised texts
was significant in a 2 (text condition) × 3 (verbal ability)
ANOVA, F(1, 97) = 15.12, MSE = 464.53, p < .001, partial η² =
.120. Our revisions were successful both as measured objectively
in terms of percent correct and as measured subjectively in terms
of percentage comprehended.
We investigated the reliability of our measures by using Cronbach's
alpha for scores on the six texts in each condition. We
calculated Cronbach's alphas for percent correct, predictions, and
confidence judgments. For hard texts, the alphas were .712, .919,
and .858 for percent correct, predictions, and confidence judg-
ments, respectively. For the revised texts, the corresponding alphas
were .705, .931, and .893. Predictions and confidence judgments
showed good reliability across texts; test performance was more
variable.
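As an illustrative sketch of this reliability computation (the per-text scores below are hypothetical, not the study's data), Cronbach's alpha treats the texts as items and the participants as observations:

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items (e.g., texts), each a list of
    per-participant scores:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(item_scores)
    n = len(item_scores[0])

    def var(xs):  # sample variance with n - 1 in the denominator
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # Each participant's total score summed over the k items
    totals = [sum(item[p] for item in item_scores) for p in range(n)]
    return k / (k - 1) * (1 - sum(var(item) for item in item_scores) / var(totals))

# Hypothetical predictions (percent) from four participants on three texts
texts = [[60, 40, 80, 50], [70, 40, 90, 60], [65, 35, 85, 55]]
print(round(cronbach_alpha(texts), 3))  # → 0.993
```

High alpha here means participants who judge (or score) high on one text tend to do so on the others, which is the sense in which judgments were "reliable across texts."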
Absolute Accuracy With Hard Texts
We analyzed absolute and relative metacomprehension accuracy
for the hard texts separately from the revised texts. The analysis
with hard texts included data for texts in the pure hard text
condition and for the hard half of the texts in the mixed hard and
revised text condition.3 We converted number of test questions
correct for each text into percent correct. We also converted
predictions and confidence judgments, which had been given in
terms of numbers correct, into percent correct. The top half of
Table 2 shows the mean actual percent correct, predictions, and
confidence judgments for hard texts in the pure and mixed text
conditions. A 2 (text condition) × 3 (ability) × 3 (measure
2 Partial η² is an estimation of proportion of variance accounted for by an
effect, using a correction for error in the effect and using the corrected
effect plus error variance in the denominator (Keppel & Wickens, 2004, p.
233). The formula is: partial η² = df_effect(F_effect − 1) / [df_effect(F_effect − 1) + N]. It estimates effect
size in the population. For repeated measures designs, the N is
multiplied by the number of levels of the repeated factor. This produces a
conservative estimate of the population effect size because the formula
includes error variance from all sources in the denominator (Keppel &
Wickens, 2004, pp. 427–447).
3 The analyses of revised and hard texts were done separately so that the
mixed conditions could be compared with the pure conditions. Scores on
the hard half of the texts were used in the analysis of hard texts, and scores
on the revised texts were used in the revised ANOVA. Participants in the
mixed text group were, thus, included in both analyses, but the scores were
from a different set of texts in each analysis.
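The effect-size estimate quoted from Keppel and Wickens (2004) can be sketched directly (a hypothetical helper for illustration; the example numbers are illustrative, not a claim to reproduce the article's exact values):

```python
def partial_eta_squared(df_effect, f_effect, n, repeated_levels=1):
    """Population effect-size estimate:
    df_effect * (F - 1) / (df_effect * (F - 1) + N).
    For repeated measures designs, N is multiplied by the number of levels
    of the repeated factor, which yields a conservative estimate."""
    num = df_effect * (f_effect - 1)
    return num / (num + n * repeated_levels)

# e.g., a between-subjects effect with F(1, 97) = 10.79 and roughly
# 103 participants (illustrative N):
print(round(partial_eta_squared(1, 10.79, 103), 3))  # → 0.087
```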
Table 2
Mean Percent Performance, Predictions, and Posttest Confidence Judgments (With Standard Errors of the Mean in Parentheses) for
the Pure and Mixed Sets of Hard Texts and for Pure and Mixed Sets of Revised Texts as a Function of Verbal Ability
Texts               Pure text sets                              Mixed text sets
                    Predictions   Confidence    Correct         Predictions   Confidence    Correct
Hard texts
  Low ability       53.86 (3.92)  43.98 (3.46)  38.27 (3.92)    54.17 (4.16)  51.74 (3.67)  45.14 (4.16)
  Medium ability    45.66 (4.16)  46.18 (3.67)  49.13 (4.16)    47.09 (3.63)  45.24 (3.20)  52.38 (3.63)
  High ability      52.29 (4.03)  50.00 (3.56)  59.97 (4.04)    55.88 (4.03)  54.90 (3.56)  57.19 (4.04)
Revised texts
  Low ability       52.08 (3.51)  46.67 (2.96)  48.75 (3.32)    63.89 (3.92)  57.99 (3.31)  50.35 (3.71)
  Medium ability    61.11 (3.51)  59.03 (2.96)  57.78 (3.32)    62.70 (3.42)  54.23 (2.89)  55.82 (3.24)
  High ability      74.54 (4.53)  69.44 (3.82)  65.74 (4.28)    66.99 (3.81)  60.46 (3.21)  65.87 (3.60)
[percent correct, percentage prediction, and percentage posttest
confidence])4 mixed ANOVA showed a significant main effect of
verbal ability, F(2, 99) = 3.60, MSE = 169.87, p = .031, partial
η² = .047. Measure interacted with ability, F(4, 198) = 6.53,
MSE = 129.72, p < .001, partial η² = .066, but there were no
effects of pure versus mixed texts, largest F(1, 198) = 1.14,
MSE = 129.72, p = .338.
To understand the interactions, we conducted separate analyses
for low, medium, and high ability participants. Using planned
comparisons, we asked whether predictions and confidence judgments
differed from performance to see overconfidence and underconfidence.
Participants having low verbal skills gave predictions
(54.02%) that were higher than their performance (41.71%);
they were overconfident, F(1, 32) = 7.95, MSE = 645.56, p =
.008, partial η² = .093. However, their posttest confidence judgments
(47.86%) did not differ significantly from their percent
correct (41.71%), F(1, 32) = 2.80, MSE = 458.31, p = .104. The
predictions of medium-ability participants were about the same as
their percent correct (46.38% vs. 50.76%), F(1, 35) = 3.06,
MSE = 228.13, p = .089, but their confidence judgments showed
underconfidence in that they were significantly lower (45.71%)
than their percent correct (50.76%), F(1, 35) = 4.43, MSE =
209.01, p = .043, partial η² = .044. Predictions of high-ability
participants did not differ significantly from performance (54.08%
vs. 58.58%), F(1, 32) = 2.15, MSE = 319.56, p = .152, but their
confidence judgments (52.45%) were significantly lower than their
performance (58.58%), F(1, 32) = 6.03, MSE = 211.85, p = .020,
partial η² = .069. Thus, with hard texts, only low-ability participants
showed overconfidence, and this was only with predictions
and not with posttest confidence judgments. Both medium- and
high-ability participants showed underconfidence in their posttest
confidence judgments, but their predictions were accurate.
Absolute Accuracy With Revised Texts
The bottom half of Table 2 shows the mean percent correct,
predictions, and confidence judgments for the revised texts in the
pure and mixed conditions. A 2 (text condition) × 3 (verbal
ability) × 3 (measure [percent correct, predictions, and posttest
confidence judgments]) ANOVA showed that the main effects of
verbal ability, F(2, 100) = 10.99, MSE = 138.25, p < .001, partial
η² = .158, and measure, F(2, 200) = 10.50, MSE = 113.44, p <
.001, partial η² = .056, were significant. Planned comparisons
showed that percentage predictions were higher than percent correct
(63.04% vs. 57.35%), F(1, 100) = 12.01, MSE = 326.95,
partial η² = .049, p = .001, but that confidence judgments did not
differ from percent correct (57.96% vs. 57.35%; F < 1). These
effects did not interact with pure versus mixed text sets or with
ability (Fs < 1). Overall, students were overconfident before
taking the tests but were accurate after taking the tests on these
revised texts. This effect was not different for students of differing
verbal abilities.
Relative Accuracy: Performance and Judgment
Relationships
Table 3 shows the mean gammas for prediction and confidence
judgments in the pure hard, mixed, and pure revised conditions as
a function of verbal ability. Gamma is a nonparametric measure
that taps ordinal consistency between two measures (Goodman &
Kruskal, 1954). In this case, it measured the extent to which high
judgments were accompanied by high percentages correct for texts
within each individual. It ranges from −1.0 (showing a perfect
negative relationship) to +1.0 (showing a perfect positive relationship).
We calculated prediction gammas individually for each
participant using their predicted percent correct and their actual
percent correct for each of the six texts. We could not compute
prediction gammas for 9 participants because they gave the same
predictions to all six texts. One of these participants (with high
ability) read the hard texts, 5 participants (2 with high ability, 2
with medium ability, and 1 with low ability) read the revised texts,
and 3 participants (1 with high ability and 2 with low ability) read
the mixed texts. We calculated confidence judgment gammas
using the posttest judgments of percent correct and actual percent
correct across the six texts for each participant. We could not
compute gammas for 7 participants because they gave the same posttest judgments for all texts. Three of these (all low ability) read
the hard texts, and 4 participants read the mixed texts (1 high
ability, 1 medium ability, and 2 low ability).
4 We analyzed the data with performance, predictions, and posttest
confidence judgments as levels of an independent variable so that over-
confidence and underconfidence could be seen relative to performance. An
alternative analysis would be to use the signed difference between each
type of judgment and performance (i.e., bias). Analyzing bias shows
identical effects, but the derived negative and positive values do not show
the levels of judgments and performance as clearly as the analysis we used.
Table 3
Mean Prediction and Posttest Confidence Judgment Gamma Correlations (With Standard Errors
of the Mean in Parentheses) for the Hard, Mixed, and Revised Texts for the Verbal Ability
Groups
Ability group   Hard texts                  Mixed texts                 Revised texts
                Predictions   Confidence    Predictions   Confidence    Predictions   Confidence
Low             .34* (.14)    .43* (.16)    .23 (.18)     .38* (.16)    .20 (.15)     .28* (.12)
Medium          .42* (.12)    .62* (.07)    .36* (.14)    .44* (.14)    .02 (.13)     .27* (.12)
High            .35* (.11)    .64* (.09)    .33* (.14)    .44* (.14)    .48* (.16)    .55* (.14)
Mean            .36*          .57*          .31*          .42*          .20*          .36*
* p < .05.
We used single-sample t tests to determine whether each of
these correlations was significantly different from zero. All of the
confidence judgment gammas were greater than zero, and all of the
prediction gammas in the hard text conditions were significantly
greater than zero. For mixed sets of hard and revised texts, pre-
dictions were significantly correlated with performance for
medium- and high-ability participants but not for low-ability
participants. For the revised texts, only the high-ability participants
gave predictions that produced significant correlations with their
individual performance. The mean gammas across ability levels
are shown in the bottom row of Table 3. Each of these gammas
was significantly greater than zero. Each ability group produced
prediction gammas that were significantly greater than zero (.26
for low ability, .25 for medium ability, and .38 for high ability),
and they produced confidence judgment gammas that were signif-
icantly greater than zero (.36 for low ability, .42 for medium
ability, and .53 for high ability).
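Each of these single-sample t tests treats one gamma per participant as an observation and tests the group mean against zero. A minimal sketch of the computation, using invented gamma values rather than the study's data:

```python
import math
from statistics import mean, stdev

def one_sample_t(values, mu=0.0):
    """t statistic for a single-sample t test of H0: population mean = mu,
    with len(values) - 1 degrees of freedom."""
    n = len(values)
    return (mean(values) - mu) / (stdev(values) / math.sqrt(n))

# Invented prediction gammas for one ability group.
gammas = [0.42, 0.61, 0.35, 0.58, 0.29, 0.51, 0.44, 0.66]
t = one_sample_t(gammas)
print(f"t({len(gammas) - 1}) = {t:.2f}")
```

A significant positive t indicates that, on average, the group's judgments and test performance were reliably related.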
We analyzed these gammas in a 3 (pure hard, pure revised,
mixed text condition) × 3 (low, medium, and high ability) × 2
(type of judgment [predictions and confidence judgments]) mixed
ANOVA. The only significant effect was that confidence judgment
gammas were higher than prediction gammas, F(1, 133) = 7.80,
MSE = .209, p = .006, partial η² = .023. The effect of text
difficulty was not significant (F < 1), and the effect of ability was
not significant, F(2, 133) = 2.78, MSE = .373, p = .106. None of
the interactions approached significance; the largest F was for the
Verbal Ability × Text Condition interaction, F(4, 133) = 1.76,
MSE = .373, p = .176. In contrast to absolute metacomprehension
accuracy, relative metacomprehension accuracy did not show sig-
nificantly different patterns of individual differences for the dif-
ferent types of texts.
We correlated our measures of absolute metacomprehension
accuracy and relative metacomprehension accuracy. Table 4 shows
the Pearson r correlations among these measures. Prediction bias
(the signed difference between percentage predictions and percentage
performance) correlated with posttest bias, and prediction
gammas correlated with posttest gammas. However, bias did not
correlate with gamma either for predictions or posttest judgments.
This suggests that relative and absolute metacognitive accuracy
involve different processes.
Discussion
We found an interaction between students' ability levels and the
difference between their predictions and their performance with
the hard texts. Low-ability students were overconfident, especially
in their predictions, and high-ability students were underconfident,
especially in their posttest confidence judgments. Revised texts
produced better performance than hard texts. However, with re-
vised texts, overall predictions were higher than percent correct for
students of all ability levels. Thus, even though the revisions
improved test performance, revisions increased predictions of per-
formance to a greater extent so that students were overconfident.
This finding of more overconfidence with the easier, revised texts
contrasts with that of Schraw and Roedel (1994), who found
increasing overconfidence as the difficulty of tasks increased. In
their study, however, they did not divide participants by ability
level. Our effect of increased overconfidence for revised texts
compared with hard texts was especially true for higher ability
participants.
We found fairly strong relationships between the absolute accuracy
of metacognitive estimates (bias) and students' abilities
with the hard texts. On the other hand, we did not find a relation-
ship between relative metacomprehension accuracy and individual
differences (i.e., gammas did not differ significantly with verbal
ability). Furthermore, the accuracy of relative and absolute judg-
ments did not correlate, suggesting that these two types of metacognition
tap different processes. This conclusion is consistent
with that of Kelemen et al. (2000), who found no relationships
among different types of metacognitive accuracy.
One question about gamma as a measure of relative metacog-
nitive accuracy is whether it is reliable. Thompson and Mason
(1996) showed that gamma produced low split-half and low alter-
native forms reliability. Kelemen et al. (2000) found that relative
prediction accuracy for texts did not correlate in a test–retest
reliability design. Both of these studies suggest that gamma is
unreliable. If gamma is an unreliable measure, then finding rela-
tionships between this measure and other measures is unlikely.
However, in the present study, gamma produced a pattern of
effects that is consistent with the idea that it is measuring relative
metacomprehension accuracy. We found that pretest and posttest
gammas were correlated. In addition, posttest judgment gammas
were significantly higher than prediction gammas, as is usually
found when relative metacomprehension accuracy is tested (Maki,
1998). Overall, gammas for prediction accuracy and for posttest
accuracy were significantly greater than zero for each type of text
and for each ability group. Within text and ability groups, all of the
confidence judgment gammas were significantly greater than zero,
and most of the prediction gammas were greater than zero. If
gamma is an unreliable measure that results from chance variation,
consistent effects such as those described above should not be
found.
Assuming that gamma is measuring relative metacognitive ac-
curacy in a meaningful way, we can ask why relative and absolute
accuracy produced different patterns of results. We found a strong
relationship between bias in prediction judgments and bias in
posttest judgments. Others have also shown that absolute
confidence-judgment accuracy is consistent for individuals across
a number of content domains (Kelemen et al., 2000; Schraw, 1997;
Schraw et al., 1995; West & Stanovich, 1997). Schraw and Roedel
(1994) found increasing overconfidence with increased difficulty.
This suggests that individuals judge their performance levels as
fairly constant even when task difficulty varies. In addition, most
individuals think that they are at least average (Krueger & Mueller,
2002). To some extent, individuals must base their absolute judg-
ments on prior knowledge about the placement of their scores on
Table 4
Pearson r Correlations Between Metamemory Measures in
Hard, Revised, and Mixed Text Sets

Measure            Prediction gamma  Posttest bias  Posttest gamma
Prediction bias    .027              .832**         .102
Prediction gamma                     .008           .316**
Confidence bias                                     .153

** p < .01.
728 MAKI, SHIELDS, WHEELER, AND ZACCHILLI
a specific type of task relative to average scores (or on a consistent
misperception about their scores relative to what they think is the
average). Ehrlinger and Dunning (2003) reported that individuals'
chronic self-views about their ability on various tasks influenced
their posttest confidence judgments independently of their actual
performance. As shown in our study and in other studies (e.g.,
Kruger & Dunning, 1999), individuals do not adjust their performance
judgments upward or downward enough to account for
differences in their ability. In addition, they do not accurately
estimate changes in mean performance related to task difficulty.
This leads to errors in absolute judgments. With difficult tasks, the
poorer-performing individuals are overconfident and the higher-
performing individuals are underconfident.
The absolute level of judgments is irrelevant with relative mea-
sures of metacomprehension. Here, variance in judgments that
corresponds to variance in performance is critical. Participants have
to use their experience with the specific task in order to produce
accurate relative judgments across parts of that task. The low
levels of relative accuracy that are generally obtained in studies of
metacomprehension (see Maki, 1998, for a review) may result
from an inability of individuals to emphasize learning from specific
texts more than the perceived general ability to learn from
text. Judgments based entirely on general abilities would not match
variance in performance across specific texts at all. However,
individuals must use learning from specific texts to some extent
because correlations between judgments and performance are not
zero. In the present study, correlations between judgments and
performance were about equal for students of differing abilities.
This suggests that participants who are high or low in verbal ability
use specific learning from individual texts to about the same
degree in making their judgments.
One curious effect for relative judgment accuracy was that
students were not able to use heterogeneity of texts to produce
more accurate relative predictions for mixed sets of hard and
revised texts, although the two types of text did produce significant
differences in performance. Rawson and Dunlosky (2002) also
found that mixing levels of coherence among short texts did not
improve relative metacomprehension. In addition, the revisions did
not improve students' abilities to judge which texts would produce
higher and which would produce lower performance; that is,
relative metacomprehension accuracy was not improved with the
revisions. Prediction of relative performance on revised texts
seemed to be particularly difficult for low- and medium-ability
students. In fact, the gamma correlations for predictions were not
significantly different from zero for these groups, although the
overall analysis of variance did not reveal significant differences
among text conditions, and there was not a significant interaction
between ability and text condition.
Hard texts produced overconfidence in predictions for students
with low verbal abilities. Because these students had lower than
average performance when they predicted average performance,
they would study less than is necessary to achieve an average
amount of learning. Such students are likely to be disappointed in
their test performance although they gave accurate judgments after
taking the tests. This accuracy, however, came too late (after the
test instead of before the test). Students with medium and high
verbal abilities predicted their future performance accurately.
Thus, they should be able to study as long as is needed to meet the
criterion they have set for themselves. After taking the tests,
however, they were underconfident in their performance. This
should have no adverse effects. These average and above average
students should be surprised when their actual performance is
better than expected.
Students of all ability levels were overconfident in their predic-
tions of test performance over the revised texts. This has interesting
implications for education. Revising the texts produced higher
performance, so the revisions were successful. However, students
with medium and high abilities were overconfident with the re-
vised texts although they were quite accurate with hard texts. The
revisions made the texts seem easier than they actually were, so
that the medium- and higher-ability students looked more like the
low-ability students in terms of overconfidence in predictions.
The revisions did improve performance, but they also produced
overconfidence in the stronger students, and the revisions did not
improve relative metacomprehension accuracy. This raises an in-
teresting educational paradox: Revisions are clearly helpful for
learning, but they may be detrimental to metacomprehension.
References
Branson, M., Selub, M., & Solomon, L. (1987). How to prepare for the GRE. San Diego, CA: Harcourt Brace.
Britton, B. K., & Gulgoz, S. (1991). Using Kintsch's computational model to improve instructional text: Effects of repairing inference calls on recall and cognitive structures. Journal of Educational Psychology, 83, 329–345.
Dunning, D., Johnson, K., Ehrlinger, J., & Kruger, J. (2003). Why people fail to recognize their own incompetence. Current Directions in Psychological Science, 12, 83–87.
Ehrlinger, J., & Dunning, D. (2003). How chronic self-views influence (and potentially mislead) estimates of performance. Journal of Personality and Social Psychology, 84, 5–17.
Foltz, P. W., Kintsch, W., & Landauer, T. K. (1998). The measurement of textual coherence with latent semantic analysis. Discourse Processes, 25, 285–307.
Glenberg, A. M., & Epstein, W. (1987). Inexpert calibration of comprehension. Memory & Cognition, 15, 84–93.
Glover, J. (1989). Reading ability and the calibration of comprehension. Educational Research Quarterly, 13, 7–11.
Goodman, L. A., & Kruskal, W. H. (1954). Measures of association for cross classifications. Journal of the American Statistical Association, 49, 732–764.
Grabe, M., Bordages, W., & Petros, T. (1990). The impact of computer-supported study on student awareness of examination performance. Journal of Computer-Based Instruction, 27, 113–119.
Hacker, D. J., Bol, L., Horgan, D. D., & Rakow, E. A. (2000). Test prediction and performance in a classroom context. Journal of Educational Psychology, 92, 160–170.
Inquisit (Version 1.32) [Computer software]. (2002). Seattle, WA: Millisecond Software.
Jacobson, J. M. (1990). Congruence of pre-test predictions and posttest estimations with grades on short answer and essay tests. Educational Research Quarterly, 14, 41–47.
Kelemen, W. L., Frost, P. J., & Weaver, C. A., III. (2000). Individual differences in metacognition: Evidence against a general metacognitive ability. Memory & Cognition, 28, 92–107.
Keppel, G., & Wickens, T. D. (2004). Design and analysis: A researcher's handbook (4th ed., pp. 427–447). Upper Saddle River, NJ: Pearson Prentice Hall.
Kintsch, W., & van Dijk, T. A. (1978). Toward a model of text comprehension and production. Psychological Review, 85, 363–394.
Krueger, J., & Mueller, R. A. (2002). Unskilled, unaware, or both? The better-than-average heuristic and statistical regression predict errors in estimates of own performance. Journal of Personality and Social Psychology, 82, 180–188.
Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one's own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77, 1121–1134.
Lin, L.-M., Moore, D., & Zabrucky, K. M. (2001). An assessment of students' calibration of comprehension and calibration of performance using multiple measures. Reading Psychology, 22, 111–128.
Maki, R. H. (1998). Metacomprehension of text: Influence of absolute confidence level on bias and accuracy. In D. L. Medin (Ed.), The psychology of learning and motivation: Vol. 38 (pp. 223–248). San Diego, CA: Academic Press.
Maki, R. H., Jonas, D., & Kallod, M. (1994). The relationship between comprehension and metacomprehension ability. Psychonomic Bulletin & Review, 1, 126–138.
Maki, R. H., & McGuire, M. J. (2002). Metacognition for text: Implications for education. In T. J. Perfect & B. L. Schwartz (Eds.), Applied metacognition (pp. 39–67). Cambridge, United Kingdom: Cambridge University Press.
Maki, R. H., & Serra, M. (1992). The basis of test predictions for text material. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 116–126.
Nelson, T. O. (1984). A comparison of current measures of the accuracy of feeling-of-knowing predictions. Psychological Bulletin, 95, 109–133.
Nelson, T. O., & Dunlosky, J. (1991). When people's judgments of learning (JOLs) are extremely accurate at predicting subsequent recall: The delayed-JOL effect. Psychological Science, 2, 267–270.
Rawson, K. A., & Dunlosky, J. (2002). Are performance predictions for text based on ease of processing? Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 69–80.
Rawson, K. A., Dunlosky, J., & Thiede, K. W. (2000). The rereading effect: Metacomprehension accuracy improves across reading trials. Memory & Cognition, 28, 1004–1010.
Schraw, G. (1997). The effect of generalized metacognitive knowledge on test performance and confidence judgments. Journal of Experimental Education, 65, 135–146.
Schraw, G., Dunkle, M. E., Bendixen, L., & Roedel, T. D. (1995). Does a general monitoring skill exist? Journal of Educational Psychology, 87, 433–444.
Schraw, G., & Roedel, T. D. (1994). Test difficulty and judgment bias. Memory & Cognition, 22, 63–69.
Thompson, W. B., & Mason, S. E. (1996). Instability of individual differences in the association between confidence judgments and memory performance. Memory & Cognition, 24, 226–234.
Weaver, C. A., III, Bryant, D. S., & Burns, K. D. (1995). Comprehension monitoring: Extensions of the Kintsch and van Dijk model. In C. A. Weaver III & S. Mannes (Eds.), Discourse comprehension: Essays in honor of Walter Kintsch (pp. 177–193). Hillsdale, NJ: Erlbaum.
West, R. F., & Stanovich, K. E. (1997). The domain specificity and generality of overconfidence: Individual differences in performance estimation bias. Psychonomic Bulletin & Review, 4, 387–392.
Appendix A
Practice Texts
Hard Text
GLOBAL TEMPERATURE AND FLOODING
Scientific investigators of global climate change have warned that
there will occur substantial rises in worldwide sea levels if there is a rise
of several degrees in global temperature. The projected increase in
worldwide temperature is based on the observation that both individual
and corporate use of carbon dioxide-producing combustible fuels has
been on the rise since the middle of the last century. The carbon dioxide
is delivered into the earth's atmosphere where it acts somewhat like the
glass in a greenhouse, retaining radiant energy. The carbon dioxide
absorbs infrared heat radiation from the earth instead of allowing it to
escape into space. Trapping the infrared heat radiation in the air leads
to rising temperature. Even a rise of a few degrees of global temperature
may cause melting of the polar icecaps and considerable increases in the
height of oceans.
Revised Text
GLOBAL TEMPERATURE AND FLOODING
Scientists who study change in the world's climate warn that sea levels will
increase if the temperature increases throughout the world. An increase of several
degrees in temperature would make the sea levels go up quite a lot. The scientists
expect worldwide temperature to increase because people and companies use fuels
that make carbon dioxide. The amount of carbon dioxide released by these fuels
has been increasing since the middle 1800s. When carbon dioxide is released into
the air, it acts like the glass in a greenhouse. The carbon dioxide traps heat near the
surface of the earth. Carbon dioxide stops the heat from escaping into space.
Because the heat can't escape, the temperature of the earth is rising. If the world's
temperature goes up only a few degrees, the polar icecaps will melt. This will cause
a large increase in the height of the oceans.
5 The hard practice text was a short version of a text used by Glenberg
and Epstein (1987).
Appendix B
Test Questions for Practice Texts
Questions for Hard Texts
(Recalling facts) GLOBAL TEMPERATURE AND FLOODING
The projected increase in worldwide temperature is based on what
observation?
* A) both individual and corporate use of carbon dioxide-producing
combustible fuels has been increasing.
B) trapping of infrared radiation in the air is decreasing.
C) heat radiation is more likely to be trapped in the earth as sea levels
rise.
D) carbon dioxide has been decreasing in the earth's atmosphere.
E) more greenhouses have been built, increasing the amount of carbon
dioxide trapped in the atmosphere.
(Understanding the passage) GLOBAL TEMPERATURE AND
FLOODING
How would carbon dioxide cause a rise in global temperature?
* A) by absorbing and retaining infrared heat radiation coming from the
earth into the atmosphere.
B) by reflecting infrared heat energy back to the earth once it had come
into contact with the atmosphere.
C) the rise would come directly from heat being emitted from individual
and corporate use of carbon dioxide-producing fuels.
D) by intensifying the heat potential from the sun's rays when they
collide with carbon dioxide gases in the atmosphere.
E) by facilitating the movement of radiation into space.
Questions for Revised Text
(Recalling facts) GLOBAL TEMPERATURE AND FLOODING
[Revised]
The projected increase in worldwide temperature is based on what
observation?
* A) heat is more likely to be trapped by the sea as sea levels rise
B) the amount of heat trapped near the earth is decreasing
C) the amount of carbon dioxide in the earth's atmosphere has been
decreasing
D) individuals and companies have been using more fuels that produce
carbon dioxide
E) more greenhouses have been built, increasing the amount of carbon
dioxide trapped in the atmosphere
(Understanding the passage) GLOBAL TEMPERATURE AND
FLOODING [Revised]
How could carbon dioxide cause a rise in global temperature?
* A) by keeping heat close to the earths surface rather than letting it
escape into space
B) by reflecting heat energy back to the earth once it has escaped into
space
C) the temperature increase would come directly from heat being given
off from the use of carbon dioxide-producing fuels
D) by strengthening the heat from the sun's rays when the rays collide
with carbon dioxide gases in the atmosphere
E) by facilitating the movement of the heat into space
* denotes the correct response. The order of the alternatives was
randomized for each participant.
Received September 8, 2004
Revision received July 22, 2005
Accepted July 22, 2005