Individual Differences in Absolute and Relative Metacomprehension Accuracy
Ruth H. Maki, Micheal Shields, Amanda Easton Wheeler, and Tammy Lowery Zacchilli
Texas Tech University
The authors investigated absolute and relative metacomprehension accuracy as a function of verbal
ability in college students. Students read hard texts, revised texts, or a mixed set of texts. They then
predicted their performance, took a multiple-choice test on the texts, and made posttest judgments about
their performance. With hard texts, students with lower verbal abilities were overconfident in predictions
of future performance, and students with higher verbal abilities were underconfident in judging past
performance. Revised texts produced overconfidence for predictions. Thus, absolute accuracy of
predictions and confidence judgments depended on students' abilities and text difficulty. In contrast, relative
metacomprehension accuracy as measured by gamma correlations did not depend on verbal ability or on
text difficulty. Absolute metacomprehension accuracy was much more dependent on types of materials
and verbal skills than was relative accuracy, suggesting that they may tap different aspects of
metacomprehension.
Keywords: reading comprehension, metacognition, judgment
People's ability to judge their own performance is one aspect
of metacognition, specifically metacognitive monitoring. Re-
searchers have measured the accuracy of metacognitive monitor-
ing in two fundamentally different ways. In the cognitive psychol-
ogy literature, the accuracy of metacognitive monitoring is usually
measured in a relative way, by correlating confidence judgments
and actual performance across a number of units, usually word
pairs, test questions, or texts (Maki & Serra, 1992; Nelson &
Dunlosky, 1991). Nelson (1984) recommended nonparametric
gamma as the measure of correlation, and since then, gamma has consistently been used in studies of metamemory (judgments about
future or past memory performance) and metacomprehension
(judgments about future or past performance over text). Gamma
shows whether units (word pairs, sentences, or texts) that receive
higher judgments are also the units that produce higher perfor-
mance and whether units that receive lower judgments produce
lower performance. Such a correlational measure indicates
whether students can discriminate materials within a set that lead
to poorer performance from materials that lead to higher perfor-
mance. However, gamma does not indicate whether participants'
judgments are higher than or lower than actual performance; that
is, it does not show overconfidence or underconfidence.
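As an illustrative sketch (not from the article, and with hypothetical data), the Goodman–Kruskal gamma described above can be computed by counting concordant and discordant pairs of (judgment, performance) values across a set of texts:

```python
from itertools import combinations

def goodman_kruskal_gamma(judgments, scores):
    """Goodman-Kruskal gamma: (concordant - discordant) / (concordant + discordant),
    over all pairs of units; pairs tied on either variable are ignored."""
    concordant = discordant = 0
    for (j1, s1), (j2, s2) in combinations(zip(judgments, scores), 2):
        product = (j1 - j2) * (s1 - s2)
        if product > 0:
            concordant += 1    # judgments and scores order the pair the same way
        elif product < 0:
            discordant += 1    # judgments and scores order the pair oppositely
    if concordant + discordant == 0:
        return None  # undefined, e.g., identical judgments for every text
    return (concordant - discordant) / (concordant + discordant)

# Six texts: predicted questions correct (0-6) vs. actual questions correct
predictions = [5, 3, 4, 2, 6, 3]
actual      = [4, 2, 5, 1, 6, 3]
print(goodman_kruskal_gamma(predictions, actual))  # ≈ .86
```

Note that gamma is undefined when a participant gives the same judgment to every text, which is why some participants are later excluded from the gamma analyses.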
Several other researchers have examined absolute accuracy in
judgments by investigating whether judgments match performance
exactly. In the educational psychology literature, bias has often
been used as a measure of metacognitive monitoring. Bias is the
signed difference between confidence judgments and actual per-
formance. A positive value indicates that judgments are higher
than performance, showing overconfidence. A negative value in-
dicates that judgments are lower than performance, showing un-
derconfidence. Bias measures whether individuals can estimate
their exact performance on materials. Individuals who perform
better on the criterion test usually show less overconfidence in
predictions than individuals who perform less well (Hacker, Bol,
Horgan, & Rakow, 2000; Jacobson, 1990; Maki, 1998). In two
other studies (Glover, 1989; Grabe, Bordages, & Petros, 1990), independent measures of ability rather than performance on the
test itself were used to define individual difference groups, and the
same pattern (i.e., less overconfidence in predictions for those with
higher ability) was found. In the social psychology literature,
Kruger and Dunning (1999; see also Dunning, Johnson, Ehrlinger,
& Kruger, 2003) have shown that people who have less knowledge
in a domain overestimate their performance to a large degree, and
people who have greater knowledge slightly underestimate their
performance. They have suggested that poor performers are "unskilled and unaware of it" (Kruger & Dunning, 1999, p. 1121).
Kruger and Dunning (1999) used posttest confidence judgments
exclusively in showing their "unskilled and unaware" effect. There
is only one study in the educational literature using posttest confidence judgments and individual differences. In that study,
Hacker et al. (2000) found that higher performers showed less
overconfidence than poorer performers.
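As a minimal sketch of the bias measure defined above (hypothetical numbers, not the article's data), bias is the mean signed difference between judged and actual percent correct:

```python
def bias(judgments_pct, performance_pct):
    """Mean signed difference between judged and actual percent correct.
    Positive values indicate overconfidence; negative values, underconfidence."""
    diffs = [j - p for j, p in zip(judgments_pct, performance_pct)]
    return sum(diffs) / len(diffs)

# A student predicts 4 of 6 correct (66.7%) on each of three texts
# but actually scores 50.0%, 66.7%, and 33.3%:
print(bias([66.7, 66.7, 66.7], [50.0, 66.7, 33.3]))  # positive, so overconfident
```

Unlike gamma, bias is defined even when all judgments are identical, which is one reason the two measures can behave differently across individuals.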
Individual differences have been more difficult to find with
relative measures (i.e., within-subjects correlations) of metacogni-
tion (Maki & McGuire, 2002). Maki, Jonas, and Kallod (1994)
found no relationship between verbal ability and prediction accu-
racy for texts, although they did find that better comprehenders
produced more accurate posttest judgments. Lin, Moore, and
Zabrucky (2001) found no relationship between verbal abilities
and several measures of pretest or posttest judgment accuracy.
Ruth H. Maki, Micheal Shields, Amanda Easton Wheeler, and Tammy
Lowery Zacchilli, Department of Psychology, Texas Tech University.
We thank Elizabeth Garza, David Cardenas, Michele Cristan, and Emily
Phillips for testing participants in these studies.
Correspondence concerning this article should be addressed to Ruth H.
Maki, Department of Psychology, Texas Tech University, Lubbock, TX
79409-2051. E-mail: [email protected]
Journal of Educational Psychology, 2005, Vol. 97, No. 4, 723–731. Copyright 2005 by the American Psychological Association. 0022-0663/05/$12.00 DOI: 10.1037/0022-0663.97.4.723
One reason that finding individual differences with relative
measures has been difficult might be that some measures of
relative metacognition accuracy may be unreliable. Thompson and
Mason (1996) asked students to judge their confidence in recog-
nizing studied photographs of faces and of words and also to judge
their confidence in answers to true–false questions. Using different
sets of materials, students made such confidence judgments twice with a 2-week interval. Thompson and Mason used gamma to
measure relative metamemory accuracy at each interval. By cor-
relating gammas at the two sessions, they measured the stability of
gamma over time and on alternate forms. They also calculated
separate gammas for the even and odd items to measure split-half
reliability. The split-half reliability for performance averaged .47,
and the split-half reliability for confidence ratings was .69. The
alternate forms correlations for performance and for confidence
ratings were similar to the split-half reliabilities. However, the
correlations of the gammas were much lower. The mean split-half
correlation for the gammas was .19, and the alternate forms reli-
ability was zero. Thus, the memory performance and the confi-
dence judgments showed some reliability, but relative
metamemory accuracy did not.
Kelemen, Frost, and Weaver (2000) investigated the reliability
of relative metacognitive accuracy for a number of different mea-
sures, including judgments of learning (predictions of future per-
formance on recently studied word pairs) and predictions of future
performance on recently read text. They studied stability over time
by testing each participant twice at a 1-week interval on each task,
and they examined stability over tasks. Consistent with Thompson
and Mason's (1996) results, they found that the memory performance and the judgments themselves showed some stability over
time and, to a lesser extent, across tasks, but relative metacognitive
accuracy (the gamma correlations) showed no stability. The unre-
liability of gamma may explain why it has been so difficult to find
stable individual differences in relative accuracy measures.
A lack of stability seems to be the norm for relative accuracy
judgments, but a measure of absolute accuracy, bias (the difference
between judged and actual performance), shows much better sta-
bility both across tasks and over time. Kelemen et al. (2000) found
that students who were overconfident or underconfident at one
time tended to be overconfident or underconfident at the second
time; a similar pattern was seen across tasks. Other studies have
also shown consistency across tasks for the amount of overconfi-
dence or underconfidence. West and Stanovich (1997) used very
different tasks: confidence judgments on answers to general
knowledge questions and confidence in a motor task. They also
found correlations in bias across these different tasks. Schraw and
his colleagues (Schraw, 1997; Schraw, Dunkle, Bendixen, & Roedel, 1995; Schraw & Roedel, 1994) have also found bias to be similar for individuals across tasks that differ in difficulty and in
content domain. The fact that measures of bias (overconfidence
and underconfidence) show better reliability than gamma means
that bias is more likely to show stable individual differences.
Some of the studies cited above have used predictions of future
performance, and others have used confidence in past perfor-
mance. In educational applications, predictions rather than posttest
confidence judgments are most important for controlling effective
study behavior. Both the ability to predict overall performance and
the ability to predict performance among different units (e.g., texts
or topics) are important. A student who is overconfident overall
may fail a test because he or she assumes that there is no need for
further study (a failure of absolute metacomprehension). That
same student may not know which specific topics need further
study and which have been mastered (a failure of relative meta-
comprehension). Thus, an inability to predict absolute test accu-
racy and relative test accuracy may result in underpreparation for
tests. There have been relatively few studies using absolute judgments to examine predictions of future test performance (cf.
Glover, 1989; Hacker et al., 2000). Predicting performance over
text material is most relevant to educational settings, but there are
no studies examining individual differences in both absolute pre-
diction accuracy and relative prediction accuracy with text
material.
In the present study, we investigated absolute and relative meta-
comprehension accuracy as a function of verbal ability and diffi-
culty of texts. We varied the difficulty of texts because the degree
of overconfidence among students with low skill (e.g., Grabe et al.,
1990; Hacker et al., 2000; Kruger & Dunning, 1999) should be
related to the difficulty of the test. Schraw and Roedel (1994)
found that overconfidence in absolute judgments was much greater
on more difficult tasks than on easier tasks. They found this result for texts of varying difficulty and also across tasks in different
domains (e.g., reading comprehension, probability judgments, and
spatial judgments). When different domains were used, Schraw
and Roedel reported that the specific type of task was less impor-
tant than overall task difficulty. However, they did not include
differences in participant skill in their studies.
We manipulated text difficulty by using hard texts and revised
versions of those texts, using revisions that were similar to those
studied by Burns (reported in Weaver, Bryant, & Burns, 1995).
Burns revised texts in three different ways. In the principled
revision, he followed the suggestions of Britton and Gulgoz
(1991), which are guided by Kintsch and van Dijk's (1978) model
of text comprehension. Burns's heuristic revision was done by an expert in text processing to make the original text more readable.
Finally, in the readability revision, he changed the surface struc-
ture of the text to match the readability statistics of the heuristic
version. Burns compared only the original texts, the readability
revisions, and the heuristic revisions in his first experiment. Par-
ticipants read texts, made predictions, and then answered questions
from the texts. Relative metacomprehension accuracy (gamma
correlations) was significantly higher for the heuristically revised
texts. In Burns's Experiment 2, in which four versions of the texts
were compared (original, principled revision, heuristic revision,
and readability revision), proportions correct and predictions dif-
fered across texts, but relative metacomprehension accuracy did
not differ. With these two conflicting results, it is unclear whether
revising texts affects relative metacomprehension accuracy.
We revised difficult texts, following both principled and readability guidelines in our revisions. Participants read and predicted
their performance on six texts: all hard texts, all revised versions
of these hard texts, or mixed texts in which half of the texts were
hard and half were revised. We thought that this mixed condition
might produce good relative accuracy if participants were sensitive
to the differences in text difficulty. However, Rawson and Dun-
losky (2002) mixed very short texts (two sentences each) that had
either high, medium, or low coherence. They did not find higher
relative accuracy for mixed sets than for pure sets. Thus, relative
accuracy may be insensitive to variations in difficulty of materials,
but Rawson and Dunlosky's texts may have been too short for
participants to notice the mix of text coherence. Measures of
relative accuracy may be sensitive to variations in difficulty across
longer texts if the variation in materials makes participants more
aware of the potential variation in their performance. Increased
variation in one's own performance may also increase absolute accuracy, especially among higher ability students.
In addition to the three text conditions (hard, revised, and mixed
texts), we divided students into low, medium, and high ability
groups on the basis of standardized test scores to study the effects
of individual differences in relative and absolute accuracy of
metacomprehension. We expected to see different patterns of
overconfidence and underconfidence (absolute accuracy) for the
different types of texts in participants of varying verbal abilities.
Whether we would see similar effects for relative accuracy of
metacomprehension was the primary question of interest.
Method
Participants
A total of 159 college student volunteers from general psychology
courses at a large public university participated for partial course credit.
There were 72 men and 87 women in the sample. The average age was 19.1 years
(with a range of 17 to 41). The sample was 76.1% European American,
13.2% Hispanic, and 11.7% other or unidentified. We randomly assigned
52 participants to the revised text condition, 51 participants to the hard text
condition, and 55 participants to the mixed text condition.
We used university records to determine students' gender, ethnicity, and
age, as well as their scores on the verbal portion of the Scholastic Achieve-
ment Test (SAT) or the English portion of the American College Test
(ACT). If these scores were not available, we used students' reading scores
on a state academic skills test. All but 1 participant in the mixed text
condition had scores for the SAT, the ACT, or the Texas Academic Skills
Program (TASP). Overall, the mean score for the verbal SAT was 522, and
the standard deviation was 72.98. The mean verbal ACT score was 21.3,
and the standard deviation was 4.39. We used TASP scores for only a few
students, so those means are not meaningful. Scores were converted to z
scores, using national (or state) norms for each test. Students were divided
into three ability groups on the basis of their verbal z scores. Of 157
participants having verbal scores, 54 students were in the lower ability
group with z scores of less than −.126 (SAT M = 438, SD = 36.44; ACT
M = 17.5, SD = .76), 57 students were placed in the medium ability group
with z scores between −.126 and .414 (SAT M = 522, SD = 13.51; ACT
M = 21.3, SD = .82), and 46 students were in the higher ability group with
z scores greater than .414 (SAT M = 592, SD = 38.22; ACT M = 27.6,
SD = 2.51).
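The grouping procedure can be sketched as follows. This is a hypothetical illustration: the cutoffs are those reported above (the sign on the lower cutoff, −.126, appears to have been dropped in reproduction and is assumed here), and the norm values in the example are invented, not the actual SAT norms:

```python
def verbal_z(score, norm_mean, norm_sd):
    """Convert a raw verbal test score to a z score against its norm group."""
    return (score - norm_mean) / norm_sd

def ability_group(z, low_cut=-0.126, high_cut=0.414):
    """Tripartite ability split using the study's cutoffs (lower sign assumed)."""
    if z < low_cut:
        return "low"
    if z <= high_cut:
        return "medium"
    return "high"

# Hypothetical example: an SAT verbal score of 592 against assumed norms
# of M = 500, SD = 110 (illustrative values only)
print(ability_group(verbal_z(592, 500, 110)))  # "high"
```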
Materials
For the hard texts, we used the six texts and the short practice text used
by Rawson, Dunlosky, and Thiede (2000) that were taken from a GRE
preparation manual (Branson, Selub, & Solomon, 1987). Because we
needed two short, hard practice texts, we shortened one of the texts used by
Glenberg and Epstein (1987) for use as a practice text. This text has produced low performance in our laboratory. We revised the texts to
increase readability by replacing low frequency words with high frequency
words; breaking long, complex sentences into shorter, simpler sentences;
eliminating embedded clauses; and changing passive sentences into active
sentences. In addition, we used two of the three principled revision rules
found by Britton and Gulgoz (1991) to be effective in improving text recall.
We used the same term for the same concept throughout the text, and we
replaced anaphoric references (e.g., "it") with the specific concept. One of
the original hard practice texts and its revised version is shown in Appen-
dix A. The characteristics of the six hard texts and the six revised texts as
analyzed by Microsoft Word can be seen in Table 1. Table 1 shows that in
comparison with the original hard texts, the revised texts had fewer words,
more sentences, fewer words per sentence, fewer passive sentences, higher
Flesch Reading Ease scores, and lower Flesch–Kincaid grade levels. We
also analyzed the coherence among sentences using latent semantic analysis (LSA).1 These are also shown in Table 1. Higher values indicate that
the concepts presented in sequential sentences have more overlap (with 1.0
as the highest possible score). The sentence-to-sentence coherence of the
revised texts was somewhat higher than the sentence-to-sentence coher-
ence of the hard texts. We also compared the semantic overlap of each
complete hard text with each complete revised text using LSA. Concepts in
the hard and revised texts were similar in that the observed LSA of .89 is
high.
For the mixed condition, half of the revised texts were grouped (Sets A
and B) and half of the hard texts were grouped (Sets C and D). One quarter
of the participants in the mixed group each read a combination of texts
designated as Sets AC, AD, BC, or BD. One revised and one hard practice
text was used for all participants in the mixed condition.
We used six multiple-choice test questions with five alternatives for each
text. For the hard texts, we used the questions that had been used by
1 LSA compares the relations among terms in a sentence with those of a
large text base. Similarities across pairs of sentences are then measured,
and an average sentence-to-sentence coherence score is obtained. Perfect
overlap across sentences produces LSA = 1.0. Texts with higher coherence
should be easier to comprehend (Foltz, Kintsch, & Landauer, 1998). We
selected the "General Reading up to First-Year College" corpus as the database.
Using more specialized databases for some texts (such as the psychology
corpus for the intelligence testing text) tended to lower the coherence
scores relative to using the larger database, but it did not change the pattern
of coherence for the hard versus revised texts.
Table 1
Mean Characteristics of Hard and Revised Texts (With Ranges in Parentheses)
Characteristics                 Hard texts              Revised texts
No. of words                    478.00 (358.00–601.00)  441.00 (347.00–604.00)
No. of sentences                20.00 (16.00–23.00)     31.00 (24.00–38.00)
Words/sentence                  24.05 (21.06–28.38)     14.21 (10.84–15.89)
% passive sentences             22.50 (3.70–50.00)      9.17 (0–18.00)
Flesch Reading Ease             37.50 (19.10–49.40)     50.92 (42.20–59.50)
Flesch–Kincaid grade level      11.71 (10.90–12.00)     9.80 (7.60–12.00)
LSA sentence coherence          0.24 (0.18–0.30)        0.27 (0.23–0.33)
Note. LSA = latent semantic analysis.
Rawson et al. (2000). Half of the test questions tapped details, and half
tapped more conceptual material. The questions were the same for the
revised texts, except that we changed some of the words and phrases in the
questions in the same way as we did for the revised texts. There were two
practice questions for each of the four practice texts. Examples of the hard and revised test questions can be seen in Appendix B.
Procedure
Participants came to the laboratory for one session, which lasted about
1 hr. They were seated in front of a computer in a small individual cubicle.
We used Inquisit software (2002) to control the presentation of stimuli and
trial events and to collect the data. First, participants read and responded to
two practice texts. Then, they read the six texts, presented in a randomized
order for each participant. Participants read one sentence at a time and
pressed the space bar for the next sentence in the text to appear on the
screen. After reading each text, students were asked to indicate the per-
centage of the text they thought they were able to comprehend, using the
following scale: 0%, 20%, 40%, 60%, 80%, and 100%. Participants then
judged how many of six test questions they thought they would get correct (predictions) for a given text, and they typed a number from 0 to 6. After
reading all six texts, students answered the six multiple-choice questions
for each text; texts were tested in a random order. After answering the six
questions for a text, students gave a posttest confidence judgment about
their test performance by typing in the number of test questions (0–6) they
thought they got correct for the text. This was repeated for questions from
each of the six texts. Students were then debriefed.
Results
Hard Versus Revised Texts
We analyzed the percent correct for the pure revised and the
pure hard text conditions to ensure that our revisions were successful.
The mean percent correct for the revised texts was 56.14%, and the mean percent correct for the hard texts was
48.91%. A 2 (text condition) × 3 (verbal ability level) analysis of
variance (ANOVA) showed that revised texts produced significantly
higher performance than hard texts, F(1, 97) = 10.79,
MSE = 175.69, p = .001, partial η² = .085.2 In addition, we
analyzed students' ratings of the percentage of the texts they
thought they were able to comprehend. The mean percentage for
revised texts was 65.33%, and the mean percentage for the hard
texts was 48.52%. Better perceived comprehension of revised texts
was significant in a 2 (text condition) × 3 (verbal ability)
ANOVA, F(1, 97) = 15.12, MSE = 464.53, p < .001, partial η² =
.120. Our revisions were successful both as measured objectively
in terms of percent correct and as measured subjectively in terms
of percentage comprehended.
We investigated the reliability of our measures by using Cronbach's
alpha for scores on the six texts in each condition. We
calculated Cronbach's alphas for percent correct, predictions, and
confidence judgments. For hard texts, the alphas were .712, .919,
and .858 for percent correct, predictions, and confidence judg-
ments, respectively. For the revised texts, the corresponding alphas
were .705, .931, and .893. Predictions and confidence judgments
showed good reliability across texts; test performance was more
variable.
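As an illustrative sketch of this reliability computation (the per-text scores below are hypothetical, not the study's data), Cronbach's alpha treats the texts as items and the participants as observations:

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items (e.g., texts), each a list of
    per-participant scores:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(item_scores)
    n = len(item_scores[0])

    def var(xs):  # sample variance with n - 1 in the denominator
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # Each participant's total score summed over the k items
    totals = [sum(item[p] for item in item_scores) for p in range(n)]
    return k / (k - 1) * (1 - sum(var(item) for item in item_scores) / var(totals))

# Hypothetical predictions (percent) from four participants on three texts
texts = [[60, 40, 80, 50], [70, 40, 90, 60], [65, 35, 85, 55]]
print(round(cronbach_alpha(texts), 3))  # → 0.993
```

High alpha here means participants who judge (or score) high on one text tend to do so on the others, which is the sense in which judgments were "reliable across texts."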
Absolute Accuracy With Hard Texts
We analyzed absolute and relative metacomprehension accuracy
for the hard texts separately from the revised texts. The analysis
with hard texts included data for texts in the pure hard text
condition and for the hard half of the texts in the mixed hard and
revised text condition.3 We converted number of test questions
correct for each text into percent correct. We also converted
predictions and confidence judgments, which had been given in
terms of numbers correct, into percent correct. The top half of
Table 2 shows the mean actual percent correct, predictions, and
confidence judgments for hard texts in the pure and mixed text
conditions. A 2 (text condition) × 3 (ability) × 3 (measure
2 Partial η² is an estimation of proportion of variance accounted for by an
effect, using a correction for error in the effect and using the corrected
effect plus error variance in the denominator (Keppel & Wickens, 2004, p.
233). The formula is: partial η² = df_effect(F_effect − 1) / [df_effect(F_effect − 1) + N]. It estimates effect
size in the population. For repeated measures designs, the N is
multiplied by the number of levels of the repeated factor. This produces a
conservative estimate of the population effect size because the formula
includes error variance from all sources in the denominator (Keppel &
Wickens, 2004, pp. 427–447).
3 The analyses of revised and hard texts were done separately so that the
mixed conditions could be compared with the pure conditions. Scores on
the hard half of the texts were used in the analysis of hard texts, and scores
on the revised texts were used in the revised ANOVA. Participants in the
mixed text group were, thus, included in both analyses, but the scores were
from a different set of texts in each analysis.
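The effect-size estimate quoted from Keppel and Wickens (2004) can be sketched directly (a hypothetical helper for illustration; the example numbers are illustrative, not a claim to reproduce the article's exact values):

```python
def partial_eta_squared(df_effect, f_effect, n, repeated_levels=1):
    """Population effect-size estimate:
    df_effect * (F - 1) / (df_effect * (F - 1) + N).
    For repeated measures designs, N is multiplied by the number of levels
    of the repeated factor, which yields a conservative estimate."""
    num = df_effect * (f_effect - 1)
    return num / (num + n * repeated_levels)

# e.g., a between-subjects effect with F(1, 97) = 10.79 and roughly
# 103 participants (illustrative N):
print(round(partial_eta_squared(1, 10.79, 103), 3))  # → 0.087
```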
Table 2
Mean Percent Performance, Predictions, and Posttest Confidence Judgments (With Standard Errors of the Mean in Parentheses) for
the Pure and Mixed Sets of Hard Texts and for Pure and Mixed Sets of Revised Texts as a Function of Verbal Ability
Texts               Pure text sets                              Mixed text sets
                    Predictions   Confidence    Correct         Predictions   Confidence    Correct
Hard texts
  Low ability       53.86 (3.92)  43.98 (3.46)  38.27 (3.92)    54.17 (4.16)  51.74 (3.67)  45.14 (4.16)
  Medium ability    45.66 (4.16)  46.18 (3.67)  49.13 (4.16)    47.09 (3.63)  45.24 (3.20)  52.38 (3.63)
  High ability      52.29 (4.03)  50.00 (3.56)  59.97 (4.04)    55.88 (4.03)  54.90 (3.56)  57.19 (4.04)
Revised texts
  Low ability       52.08 (3.51)  46.67 (2.96)  48.75 (3.32)    63.89 (3.92)  57.99 (3.31)  50.35 (3.71)
  Medium ability    61.11 (3.51)  59.03 (2.96)  57.78 (3.32)    62.70 (3.42)  54.23 (2.89)  55.82 (3.24)
  High ability      74.54 (4.53)  69.44 (3.82)  65.74 (4.28)    66.99 (3.81)  60.46 (3.21)  65.87 (3.60)
[percent correct, percentage prediction, and percentage posttest
confidence])4 mixed ANOVA showed a significant main effect of
verbal ability, F(2, 99) = 3.60, MSE = 169.87, p = .031, partial
η² = .047. Measure interacted with ability, F(4, 198) = 6.53,
MSE = 129.72, p < .001, partial η² = .066, but there were no
effects of pure versus mixed texts, largest F(1, 198) = 1.14,
MSE = 129.72, p = .338.
To understand the interactions, we conducted separate analyses
for low, medium, and high ability participants. Using planned
comparisons, we asked whether predictions and confidence judgments
differed from performance to see overconfidence and underconfidence.
Participants having low verbal skills gave predictions
(54.02%) that were higher than their performance (41.71%);
they were overconfident, F(1, 32) = 7.95, MSE = 645.56, p =
.008, partial η² = .093. However, their posttest confidence judgments
(47.86%) did not differ significantly from their percent
correct (41.71%), F(1, 32) = 2.80, MSE = 458.31, p = .104. The
predictions of medium-ability participants were about the same as
their percent correct (46.38% vs. 50.76%), F(1, 35) = 3.06,
MSE = 228.13, p = .089, but their confidence judgments showed
underconfidence in that they were significantly lower (45.71%)
than their percent correct (50.76%), F(1, 35) = 4.43, MSE =
209.01, p = .043, partial η² = .044. Predictions of high-ability
participants did not differ significantly from performance (54.08%
vs. 58.58%), F(1, 32) = 2.15, MSE = 319.56, p = .152, but their
confidence judgments (52.45%) were significantly lower than their
performance (58.58%), F(1, 32) = 6.03, MSE = 211.85, p = .020,
partial η² = .069. Thus, with hard texts, only low-ability participants
showed overconfidence, and this was only with predictions
and not with posttest confidence judgments. Both medium- and
high-ability participants showed underconfidence in their posttest
confidence judgments, but their predictions were accurate.
Absolute Accuracy With Revised Texts
The bottom half of Table 2 shows the mean percent correct,
predictions, and confidence judgments for the revised texts in the
pure and mixed conditions. A 2 (text condition) × 3 (verbal
ability) × 3 (measure [percent correct, predictions, and posttest
confidence judgments]) ANOVA showed that the main effects of
verbal ability, F(2, 100) = 10.99, MSE = 138.25, p < .001, partial
η² = .158, and measure, F(2, 200) = 10.50, MSE = 113.44, p <
.001, partial η² = .056, were significant. Planned comparisons
showed that percentage predictions were higher than percent correct
(63.04% vs. 57.35%), F(1, 100) = 12.01, MSE = 326.95,
partial η² = .049, p = .001, but that confidence judgments did not
differ from percent correct (57.96% vs. 57.35%; F < 1). These
effects did not interact with pure versus mixed text sets or with
ability (Fs < 1). Overall, students were overconfident before
taking the tests but were accurate after taking the tests on these
revised texts. This effect was not different for students of differing
verbal abilities.
Relative Accuracy: Performance and Judgment
Relationships
Table 3 shows the mean gammas for prediction and confidence
judgments in the pure hard, mixed, and pure revised conditions as
a function of verbal ability. Gamma is a nonparametric measure
that taps ordinal consistency between two measures (Goodman &
Kruskal, 1954). In this case, it measured the extent to which high
judgments were accompanied by high percentages correct for texts
within each individual. It ranges from −1.0 (showing a perfect
negative relationship) to +1.0 (showing a perfect positive relationship).
We calculated prediction gammas individually for each
participant using their predicted percent correct and their actual
percent correct for each of the six texts. We could not compute
prediction gammas for 9 participants because they gave the same
predictions to all six texts. One of these participants (with high
ability) read the hard texts, 5 participants (2 with high ability, 2
with medium ability, and 1 with low ability) read the revised texts,
and 3 participants (1 with high ability and 2 with low ability) read
the mixed texts. We calculated confidence judgment gammas
using the posttest judgments of percent correct and actual percent
correct across the six texts for each participant. We could not
compute gammas for 7 participants because they gave the same posttest judgments for all texts. Three of these (all low ability) read
the hard texts, and 4 participants read the mixed texts (1 high
ability, 1 medium ability, and 2 low ability).
4 We analyzed the data with performance, predictions, and posttest
confidence judgments as levels of an independent variable so that over-
confidence and underconfidence could be seen relative to performance. An
alternative analysis would be to use the signed difference between each
type of judgment and performance (i.e., bias). Analyzing bias shows
identical effects, but the derived negative and positive values do not show
the levels of judgments and performance as clearly as the analysis we used.
Table 3
Mean Prediction and Posttest Confidence Judgment Gamma Correlations (With Standard Errors
of the Mean in Parentheses) for the Hard, Mixed, and Revised Texts for the Verbal Ability
Groups
Ability group   Hard texts                  Mixed texts                 Revised texts
                Predictions   Confidence    Predictions   Confidence    Predictions   Confidence
Low             .34* (.14)    .43* (.16)    .23 (.18)     .38* (.16)    .20 (.15)     .28* (.12)
Medium          .42* (.12)    .62* (.07)    .36* (.14)    .44* (.14)    .02 (.13)     .27* (.12)
High            .35* (.11)    .64* (.09)    .33* (.14)    .44* (.14)    .48* (.16)    .55* (.14)
Mean            .36*          .57*          .31*          .42*          .20*          .36*
* p < .05.
We used single-sample t tests to determine whether each of
these correlations was significantly different from zero. All of the
confidence judgment gammas were greater than zero, and all of the
prediction gammas in the hard text conditions were significantly
greater than zero. For mixed sets of hard and revised texts, pre-
dictions were significantly correlated with performance for
medium- and high-ability participants but not for low-ability
participants. For the revised texts, only the high-ability participants
gave predictions that produced significant correlations with their
individual performance. The mean gammas across ability levels
are shown in the bottom row of Table 3. Each of these gammas
was significantly greater than zero. Each ability group produced
prediction gammas that were significantly greater than zero (.26
for low ability, .25 for medium ability, and .38 for high ability),
and they produced confidence judgment gammas that were signif-
icantly greater than zero (.36 for low ability, .42 for medium
ability, and .53 for high ability).
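Each of these single-sample t tests treats one gamma per participant as an observation and tests the group mean against zero. A minimal sketch of the computation, using invented gamma values rather than the study's data:

```python
import math
from statistics import mean, stdev

def one_sample_t(values, mu=0.0):
    """t statistic for a single-sample t test of H0: population mean = mu,
    with len(values) - 1 degrees of freedom."""
    n = len(values)
    return (mean(values) - mu) / (stdev(values) / math.sqrt(n))

# Invented prediction gammas for one ability group.
gammas = [0.42, 0.61, 0.35, 0.58, 0.29, 0.51, 0.44, 0.66]
t = one_sample_t(gammas)
print(f"t({len(gammas) - 1}) = {t:.2f}")
```

A significant positive t indicates that, on average, the group's judgments and test performance were reliably related.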
We analyzed these gammas in a 3 (pure hard, pure revised,
mixed text condition) × 3 (low, medium, and high ability) × 2
(type of judgment [predictions and confidence judgments]) mixed
ANOVA. The only significant effect was that confidence judgment
gammas were higher than prediction gammas, F(1, 133) = 7.80,
MSE = .209, p = .006, partial η² = .023. The effect of text
difficulty was not significant (F < 1), and the effect of ability was
not significant, F(2, 133) = 2.78, MSE = .373, p = .106. None of
the interactions approached significance; the largest F was for the
Verbal Ability × Text Condition interaction, F(4, 133) = 1.76,
MSE = .373, p = .176. In contrast to absolute metacomprehension
accuracy, relative metacomprehension accuracy did not show sig-
nificantly different patterns of individual differences for the dif-
ferent types of texts.
We correlated our measures of absolute metacomprehension
accuracy and relative metacomprehension accuracy. Table 4 shows
the Pearson r correlations among these measures. Prediction bias
(the signed difference between percentage predictions and percentage
performance) correlated with posttest bias, and prediction
gammas correlated with posttest gammas. However, bias did not
correlate with gamma either for predictions or posttest judgments.
This suggests that relative and absolute metacognitive accuracy
involve different processes.
Discussion
We found an interaction between students' ability levels and the
difference between their predictions and their performance with
the hard texts. Low-ability students were overconfident, especially
in their predictions, and high-ability students were underconfident,
especially in their posttest confidence judgments. Revised texts
produced better performance than hard texts. However, with re-
vised texts, overall predictions were higher than percent correct for
students of all ability levels. Thus, even though the revisions
improved test performance, revisions increased predictions of per-
formance to a greater extent so that students were overconfident.
This finding of more overconfidence with the easier, revised texts
contrasts with that of Schraw and Roedel (1994), who found
increasing overconfidence as the difficulty of tasks increased. In
their study, however, they did not divide participants by ability
level. Our effect of increased overconfidence for revised texts
compared with hard texts was especially true for higher ability
participants.
We found fairly strong relationships between the absolute accuracy
of metacognitive estimates (bias) and students' abilities
with the hard texts. On the other hand, we did not find a relation-
ship between relative metacomprehension accuracy and individual
differences (i.e., gammas did not differ significantly with verbal
ability). Furthermore, the accuracy of relative and absolute judg-
ments did not correlate, suggesting that these two types of metacognition
tap different processes. This conclusion is consistent
with that of Kelemen et al. (2000), who found no relationships
among different types of metacognitive accuracy.
One question about gamma as a measure of relative metacog-
nitive accuracy is whether it is reliable. Thompson and Mason
(1996) showed that gamma produced low split-half and low alter-
native forms reliability. Kelemen et al. (2000) found that relative
prediction accuracy for texts did not correlate in a test–retest
reliability design. Both of these studies suggest that gamma is
unreliable. If gamma is an unreliable measure, then finding rela-
tionships between this measure and other measures is unlikely.
However, in the present study, gamma produced a pattern of
effects that is consistent with the idea that it is measuring relative
metacomprehension accuracy. We found that pretest and posttest
gammas were correlated. In addition, posttest judgment gammas
were significantly higher than prediction gammas, as is usually
found when relative metacomprehension accuracy is tested (Maki,
1998). Overall, gammas for prediction accuracy and for posttest
accuracy were significantly greater than zero for each type of text
and for each ability group. Within text and ability groups, all of the
confidence judgment gammas were significantly greater than zero,
and most of the prediction gammas were greater than zero. If
gamma is an unreliable measure that results from chance variation,
consistent effects such as those described above should not be
found.
Assuming that gamma is measuring relative metacognitive ac-
curacy in a meaningful way, we can ask why relative and absolute
accuracy produced different patterns of results. We found a strong
relationship between bias in prediction judgments and bias in
posttest judgments. Others have also shown that absolute
confidence-judgment accuracy is consistent for individuals across
a number of content domains (Kelemen et al., 2000; Schraw, 1997;
Schraw et al., 1995; West & Stanovich, 1997). Schraw and Roedel
(1994) found increasing overconfidence with increased difficulty.
This suggests that individuals judge their performance levels as
fairly constant even when task difficulty varies. In addition, most
individuals think that they are at least average (Krueger & Mueller,
2002). To some extent, individuals must base their absolute judg-
ments on prior knowledge about the placement of their scores on
Table 4
Pearson r Correlations Between Metamemory Measures in
Hard, Revised, and Mixed Text Sets

Measure            Prediction gamma  Posttest bias  Posttest gamma
Prediction bias    .027              .832**         .102
Prediction gamma                     .008           .316**
Confidence bias                                     .153

** p < .01.
728 MAKI, SHIELDS, WHEELER, AND ZACCHILLI
a specific type of task relative to average scores (or on a consistent
misperception about their scores relative to what they think is the
average). Ehrlinger and Dunning (2003) reported that individuals'
chronic self-views about their ability on various tasks influenced
their posttest confidence judgments independently of their actual
performance. As shown in our study and in other studies (e.g.,
Kruger & Dunning, 1999), individuals do not adjust their performance
judgments upward or downward enough to account for
differences in their ability. In addition, they do not accurately
estimate changes in mean performance related to task difficulty.
This leads to errors in absolute judgments. With difficult tasks, the
poorer-performing individuals are overconfident and the higher-
performing individuals are underconfident.
The absolute level of judgments is irrelevant with relative mea-
sures of metacomprehension. Here, variance in judgments that
corresponds to variance in performance is critical. Participants have
to use their experience with the specific task in order to produce
accurate relative judgments across parts of that task. The low
levels of relative accuracy that are generally obtained in studies of
metacomprehension (see Maki, 1998, for a review) may result
from an inability of individuals to emphasize learning from specific
texts more than the perceived general ability to learn from
text. Judgments based entirely on general abilities would not match
variance in performance across specific texts at all. However,
individuals must use learning from specific texts to some extent
because correlations between judgments and performance are not
zero. In the present study, correlations between judgments and
performance were about equal for students of differing abilities.
This suggests that participants who are high or low in verbal ability
use specific learning from individual texts to about the same
degree in making their judgments.
One curious effect for relative judgment accuracy was that
students were not able to use heterogeneity of texts to produce
more accurate relative predictions for mixed sets of hard and
revised texts, although the two types of text did produce significant
differences in performance. Rawson and Dunlosky (2002) also
found that mixing levels of coherence among short texts did not
improve relative metacomprehension. In addition, the revisions did
not improve students' abilities to judge which texts would produce
higher and which would produce lower performance; that is,
relative metacomprehension accuracy was not improved with the
revisions. Prediction of relative performance on revised texts
seemed to be particularly difficult for low- and medium-ability
students. In fact, the gamma correlations for predictions were not
significantly different from zero for these groups, although the
overall analysis of variance did not reveal significant differences
among text conditions, and there was not a significant interaction
between ability and text condition.
Hard texts produced overconfidence in predictions for students
with low verbal abilities. Because these students had lower than
average performance when they predicted average performance,
they would study less than is necessary to achieve an average
amount of learning. Such students are likely to be disappointed in
their test performance although they gave accurate judgments after
taking the tests. This accuracy, however, came too late (after the
test instead of before the test). Students with medium and high
verbal abilities predicted their future performance accurately.
Thus, they should be able to study as long as is needed to meet the
criterion they have set for themselves. After taking the tests,
however, they were underconfident in their performance. This
should have no adverse effects. These average and above average
students should be surprised when their actual performance is
better than expected.
Students of all ability levels were overconfident in their predic-
tions of test performance over the revised texts. This has interesting
implications for education. Revising the texts produced higher
performance, so the revisions were successful. However, students
with medium and high abilities were overconfident with the re-
vised texts although they were quite accurate with hard texts. The
revisions made the texts seem easier than they actually were, so
that the medium- and higher-ability students looked more like the
low-ability students in terms of overconfidence in predictions.
The revisions did improve performance, but they also produced
overconfidence in the stronger students, and the revisions did not
improve relative metacomprehension accuracy. This raises an in-
teresting educational paradox: Revisions are clearly helpful for
learning, but they may be detrimental to metacomprehension.
References
Branson, M., Selub, M., & Solomon, L. (1987). How to prepare for the GRE. San Diego, CA: Harcourt Brace.
Britton, B. K., & Gulgoz, S. (1991). Using Kintsch's computational model to improve instructional text: Effects of repairing inference calls on recall and cognitive structures. Journal of Educational Psychology, 83, 329–345.
Dunning, D., Johnson, K., Ehrlinger, J., & Kruger, J. (2003). Why people fail to recognize their own incompetence. Current Directions in Psychological Science, 12, 83–87.
Ehrlinger, J., & Dunning, D. (2003). How chronic self-views influence (and potentially mislead) estimates of performance. Journal of Personality and Social Psychology, 84, 5–17.
Foltz, P. W., Kintsch, W., & Landauer, T. K. (1998). The measurement of textual coherence with latent semantic analysis. Discourse Processes, 25, 285–307.
Glenberg, A. M., & Epstein, W. (1987). Inexpert calibration of comprehension. Memory & Cognition, 15, 84–93.
Glover, J. (1989). Reading ability and the calibration of comprehension. Educational Research Quarterly, 13, 7–11.
Goodman, L. A., & Kruskal, W. H. (1954). Measures of association for cross classifications. Journal of the American Statistical Association, 49, 732–764.
Grabe, M., Bordages, W., & Petros, T. (1990). The impact of computer-supported study on student awareness of examination performance. Journal of Computer-Based Instruction, 27, 113–119.
Hacker, D. J., Bol, L., Horgan, D. D., & Rakow, E. A. (2000). Test prediction and performance in a classroom context. Journal of Educational Psychology, 92, 160–170.
Inquisit (Version 1.32) [Computer software]. (2002). Seattle, WA: Millisecond Software.
Jacobson, J. M. (1990). Congruence of pre-test predictions and posttest estimations with grades on short answer and essay tests. Educational Research Quarterly, 14, 41–47.
Kelemen, W. L., Frost, P. J., & Weaver, C. A., III. (2000). Individual differences in metacognition: Evidence against a general metacognitive ability. Memory & Cognition, 28, 92–107.
Keppel, G., & Wickens, T. D. (2004). Design and analysis: A researcher's handbook (4th ed., pp. 427–447). Upper Saddle River, NJ: Pearson Prentice Hall.
Kintsch, W., & van Dijk, T. A. (1978). Toward a model of text comprehension and production. Psychological Review, 85, 363–394.
Krueger, J., & Mueller, R. A. (2002). Unskilled, unaware, or both? The better-than-average heuristic and statistical regression predict errors in estimates of own performance. Journal of Personality and Social Psychology, 82, 180–188.
Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one's own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77, 1121–1134.
Lin, L.-M., Moore, D., & Zabrucky, K. M. (2001). An assessment of students' calibration of comprehension and calibration of performance using multiple measures. Reading Psychology, 22, 111–128.
Maki, R. H. (1998). Metacomprehension of text: Influence of absolute confidence level on bias and accuracy. In D. L. Medin (Ed.), The psychology of learning and motivation: Vol. 38 (pp. 223–248). San Diego, CA: Academic Press.
Maki, R. H., Jonas, D., & Kallod, M. (1994). The relationship between comprehension and metacomprehension ability. Psychonomic Bulletin & Review, 1, 126–138.
Maki, R. H., & McGuire, M. J. (2002). Metacognition for text: Implications for education. In T. J. Perfect & B. L. Schwartz (Eds.), Applied metacognition (pp. 39–67). Cambridge, United Kingdom: Cambridge University Press.
Maki, R. H., & Serra, M. (1992). The basis of test predictions for text material. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 116–126.
Nelson, T. O. (1984). A comparison of current measures of the accuracy of feeling-of-knowing predictions. Psychological Bulletin, 95, 109–133.
Nelson, T. O., & Dunlosky, J. (1991). When people's judgments of learning (JOLs) are extremely accurate at predicting subsequent recall: The delayed-JOL effect. Psychological Science, 2, 267–270.
Rawson, K. A., & Dunlosky, J. (2002). Are performance predictions for text based on ease of processing? Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 69–80.
Rawson, K. A., Dunlosky, J., & Thiede, K. W. (2000). The rereading effect: Metacomprehension accuracy improves across reading trials. Memory & Cognition, 28, 1004–1010.
Schraw, G. (1997). The effect of generalized metacognitive knowledge on test performance and confidence judgments. Journal of Experimental Education, 65, 135–146.
Schraw, G., Dunkle, M. E., Bendixen, L., & Roedel, T. D. (1995). Does a general monitoring skill exist? Journal of Educational Psychology, 87, 433–444.
Schraw, G., & Roedel, T. D. (1994). Test difficulty and judgment bias. Memory & Cognition, 22, 63–69.
Thompson, W. B., & Mason, S. E. (1996). Instability of individual differences in the association between confidence judgments and memory performance. Memory & Cognition, 24, 226–234.
Weaver, C. A., III, Bryant, D. S., & Burns, K. D. (1995). Comprehension monitoring: Extensions of the Kintsch and van Dijk model. In C. A. Weaver III & S. Mannes (Eds.), Discourse comprehension: Essays in honor of Walter Kintsch (pp. 177–193). Hillsdale, NJ: Erlbaum.
West, R. F., & Stanovich, K. E. (1997). The domain specificity and generality of overconfidence: Individual differences in performance estimation bias. Psychonomic Bulletin & Review, 4, 387–392.
Appendix A
Practice Texts
Hard Text
GLOBAL TEMPERATURE AND FLOODING
Scientific investigators of global climate change have warned that
there will occur substantial rises in worldwide sea levels if there is a rise
of several degrees in global temperature. The projected increase in
worldwide temperature is based on the observation that both individual
and corporate use of carbon dioxide-producing combustible fuels has
been on the rise since the middle of the last century. The carbon dioxide
is delivered into the earth's atmosphere where it acts somewhat like the
glass in a greenhouse, retaining radiant energy. The carbon dioxide
absorbs infrared heat radiation from the earth instead of allowing it to
escape into space. Trapping the infrared heat radiation in the air leads
to rising temperature. Even a rise of a few degrees of global temperature
may cause melting of the polar icecaps and considerable increases in the
height of oceans.
Revised Text
GLOBAL TEMPERATURE AND FLOODING
Scientists who study change in the world's climate warn that sea levels will
increase if the temperature increases throughout the world. An increase of several
degrees in temperature would make the sea levels go up quite a lot. The scientists
expect worldwide temperature to increase because people and companies use fuels
that make carbon dioxide. The amount of carbon dioxide released by these fuels
has been increasing since the middle 1800s. When carbon dioxide is released into
the air, it acts like the glass in a greenhouse. The carbon dioxide traps heat near the
surface of the earth. Carbon dioxide stops the heat from escaping into space.
Because the heat can't escape, the temperature of the earth is rising. If the world's
temperature goes up only a few degrees, the polar icecaps will melt. This will cause
a large increase in the height of the oceans.
5 The hard practice text was a short version of a text used by Glenberg
and Epstein (1987).
Appendix B
Test Questions for Practice Texts
Questions for Hard Texts
(Recalling facts) GLOBAL TEMPERATURE AND FLOODING
The projected increase in worldwide temperature is based on what
observation?
* A) both individual and corporate use of carbon dioxide-producing
combustible fuels has been increasing.
B) trapping of infrared radiation in the air is decreasing.
C) heat radiation is more likely to be trapped in the earth as sea levels
rise.
D) carbon dioxide has been decreasing in the earth's atmosphere.
E) more greenhouses have been built, increasing the amount of carbon
dioxide trapped in the atmosphere.
(Understanding the passage) GLOBAL TEMPERATURE AND
FLOODING
How would carbon dioxide cause a rise in global temperature?
* A) by absorbing and retaining infrared heat radiation coming from the
earth into the atmosphere.
B) by reflecting infrared heat energy back to the earth once it had come
into contact with the atmosphere.
C) the rise would come directly from heat being emitted from individual
and corporate use of carbon dioxide-producing fuels.
D) by intensifying the heat potential from the sun's rays when they
collide with carbon dioxide gases in the atmosphere.
E) by facilitating the movement of radiation into space.
Questions for Revised Text
(Recalling facts) GLOBAL TEMPERATURE AND FLOODING
[Revised]
The projected increase in worldwide temperature is based on what
observation?
* A) heat is more likely to be trapped by the sea as sea levels rise
B) the amount of heat trapped near the earth is decreasing
C) the amount of carbon dioxide in the earth's atmosphere has been
decreasing
D) individuals and companies have been using more fuels that produce
carbon dioxide
E) more greenhouses have been built, increasing the amount of carbon
dioxide trapped in the atmosphere
(Understanding the passage) GLOBAL TEMPERATURE AND
FLOODING [Revised]
How could carbon dioxide cause a rise in global temperature?
* A) by keeping heat close to the earths surface rather than letting it
escape into space
B) by reflecting heat energy back to the earth once it has escaped into
space
C) the temperature increase would come directly from heat being given
off from the use of carbon dioxide-producing fuels
D) by strengthening the heat from the sun's rays when the rays collide
with carbon dioxide gases in the atmosphere
E) by facilitating the movement of the heat into space
* denotes the correct response. The order of the alternatives was
randomized for each participant.
Received September 8, 2004
Revision received July 22, 2005
Accepted July 22, 2005