Individual Differences in Metacomprehension


  • 8/12/2019 Diferente Individuale in Metacomprehensiune

    1/9

Individual Differences in Absolute and Relative Metacomprehension Accuracy

Ruth H. Maki, Micheal Shields, Amanda Easton Wheeler, and Tammy Lowery Zacchilli
Texas Tech University

The authors investigated absolute and relative metacomprehension accuracy as a function of verbal ability in college students. Students read hard texts, revised texts, or a mixed set of texts. They then predicted their performance, took a multiple-choice test on the texts, and made posttest judgments about their performance. With hard texts, students with lower verbal abilities were overconfident in predictions of future performance, and students with higher verbal abilities were underconfident in judging past performance. Revised texts produced overconfidence for predictions. Thus, absolute accuracy of predictions and confidence judgments depended on students' abilities and text difficulty. In contrast, relative metacomprehension accuracy as measured by gamma correlations did not depend on verbal ability or on text difficulty. Absolute metacomprehension accuracy was much more dependent on types of materials and verbal skills than was relative accuracy, suggesting that the two may tap different aspects of metacomprehension.

Keywords: reading comprehension, metacognition, judgment

People's ability to judge their own performance is one aspect of metacognition, specifically metacognitive monitoring. Researchers have measured the accuracy of metacognitive monitoring in two fundamentally different ways. In the cognitive psychology literature, the accuracy of metacognitive monitoring is usually measured in a relative way, by correlating confidence judgments and actual performance across a number of units, usually word pairs, test questions, or texts (Maki & Serra, 1992; Nelson & Dunlosky, 1991). Nelson (1984) recommended nonparametric gamma as the measure of correlation, and since then, gamma has consistently been used in studies of metamemory (judgments about future or past memory performance) and metacomprehension (judgments about future or past performance over text). Gamma shows whether units (word pairs, sentences, or texts) that receive higher judgments are also the units that produce higher performance and whether units that receive lower judgments produce lower performance. Such a correlational measure indicates whether students can discriminate materials within a set that lead to poorer performance from materials that lead to higher performance. However, gamma does not indicate whether participants' judgments are higher than or lower than actual performance; that is, it does not show overconfidence or underconfidence.
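As an illustration of the relative measure described above, Goodman-Kruskal gamma can be computed from concordant and discordant pairs of (judgment, performance) values. The sketch below is not from the article; the six-text data are invented for illustration.

```python
from itertools import combinations

def goodman_kruskal_gamma(judgments, scores):
    """Goodman-Kruskal gamma: (C - D) / (C + D), where C and D count
    concordant and discordant pairs; pairs tied on either variable are ignored."""
    concordant = discordant = 0
    for (j1, s1), (j2, s2) in combinations(zip(judgments, scores), 2):
        product = (j1 - j2) * (s1 - s2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        return None  # undefined, e.g., identical judgments for every text
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical participant: predicted and actual questions correct (out of 6)
# on six texts -- illustrative values only, not data from the study.
predictions = [4, 3, 5, 2, 4, 3]
actual      = [3, 2, 5, 1, 4, 4]
print(round(goodman_kruskal_gamma(predictions, actual), 3))  # -> 0.833
```

Note that a gamma near +1 means the participant ranks the texts well even if every judgment is numerically too high, which is exactly why gamma cannot reveal overconfidence or underconfidence.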

Several other researchers have examined absolute accuracy in judgments by investigating whether judgments match performance exactly. In the educational psychology literature, bias has often been used as a measure of metacognitive monitoring. Bias is the signed difference between confidence judgments and actual performance. A positive value indicates that judgments are higher than performance, showing overconfidence. A negative value indicates that judgments are lower than performance, showing underconfidence. Bias measures whether individuals can estimate their exact performance on materials. Individuals who perform better on the criterion test usually show less overconfidence in predictions than individuals who perform less well (Hacker, Bol, Horgan, & Rakow, 2000; Jacobson, 1990; Maki, 1998). In two other studies (Glover, 1989; Grabe, Bordages, & Petros, 1990), independent measures of ability rather than performance on the test itself were used to define individual difference groups, and the same pattern (i.e., less overconfidence in predictions for those with higher ability) was found. In the social psychology literature, Kruger and Dunning (1999; see also Dunning, Johnson, Ehrlinger, & Kruger, 2003) have shown that people who have less knowledge in a domain overestimate their performance to a large degree, and people who have greater knowledge slightly underestimate their performance. They have suggested that poor performers are "unskilled and unaware of it" (Kruger & Dunning, 1999, p. 1121).

Kruger and Dunning (1999) used posttest confidence judgments exclusively in showing their unskilled-and-unaware effect. There is only one study in the educational literature using posttest confidence judgments and individual differences. In that study, Hacker et al. (2000) found that higher performers showed less overconfidence than poorer performers.
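The absolute measure works differently: bias is the mean signed difference between judgments and performance, so its sign separates overconfidence from underconfidence. A minimal sketch with invented percent-scale values (not data from the study):

```python
def bias(judgments_pct, performance_pct):
    """Mean signed difference between judgments and performance (percent scale).
    Positive = overconfidence, negative = underconfidence, zero = calibrated."""
    diffs = [j - p for j, p in zip(judgments_pct, performance_pct)]
    return sum(diffs) / len(diffs)

# Hypothetical participant who consistently predicts 10 points too high.
predictions = [60.0, 50.0, 80.0, 30.0, 70.0, 50.0]
actual      = [50.0, 40.0, 70.0, 20.0, 60.0, 40.0]
print(bias(predictions, actual))  # -> 10.0 (overconfident)
```

This hypothetical participant ranks the six texts perfectly (so relative accuracy would be ideal) while still showing a bias of +10, which is the dissociation between the two kinds of measure at issue in the article.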

Individual differences have been more difficult to find with relative measures (i.e., within-subjects correlations) of metacognition (Maki & McGuire, 2002). Maki, Jonas, and Kallod (1994) found no relationship between verbal ability and prediction accuracy for texts, although they did find that better comprehenders produced more accurate posttest judgments. Lin, Moore, and Zabrucky (2001) found no relationship between verbal abilities and several measures of pretest or posttest judgment accuracy.

Ruth H. Maki, Micheal Shields, Amanda Easton Wheeler, and Tammy Lowery Zacchilli, Department of Psychology, Texas Tech University.

We thank Elizabeth Garza, David Cardenas, Michele Cristan, and Emily Phillips for testing participants in these studies.

Correspondence concerning this article should be addressed to Ruth H. Maki, Department of Psychology, Texas Tech University, Lubbock, TX 79409-2051. E-mail: [email protected]

Journal of Educational Psychology, 2005, Vol. 97, No. 4, 723-731. Copyright 2005 by the American Psychological Association. 0022-0663/05/$12.00 DOI: 10.1037/0022-0663.97.4.723



One reason that finding individual differences with relative measures has been difficult might be that some measures of relative metacognition accuracy may be unreliable. Thompson and Mason (1996) asked students to judge their confidence in recognizing studied photographs of faces and of words and also to judge their confidence in answers to true-false questions. Using different sets of materials, students made such confidence judgments twice with a 2-week interval. Thompson and Mason used gamma to measure relative metamemory accuracy at each interval. By correlating gammas at the two sessions, they measured the stability of gamma over time and on alternate forms. They also calculated separate gammas for the even and odd items to measure split-half reliability. The split-half reliability for performance averaged .47, and the split-half reliability for confidence ratings was .69. The alternate-forms correlations for performance and for confidence ratings were similar to the split-half reliabilities. However, the correlations of the gammas were much lower. The mean split-half correlation for the gammas was .19, and the alternate-forms reliability was zero. Thus, the memory performance and the confidence judgments showed some reliability, but relative metamemory accuracy did not.

Kelemen, Frost, and Weaver (2000) investigated the reliability of relative metacognitive accuracy for a number of different measures, including judgments of learning (predictions of future performance on recently studied word pairs) and predictions of future performance on recently read text. They studied stability over time by testing each participant twice at a 1-week interval on each task, and they examined stability over tasks. Consistent with Thompson and Mason's (1996) results, they found that the memory performance and the judgments themselves showed some stability over time and, to a lesser extent, across tasks, but relative metacognitive accuracy (the gamma correlations) showed no stability. The unreliability of gamma may explain why it has been so difficult to find stable individual differences in relative accuracy measures.

A lack of stability seems to be the norm for relative accuracy judgments, but a measure of absolute accuracy, bias (the difference between judged and actual performance), shows much better stability both across tasks and over time. Kelemen et al. (2000) found that students who were overconfident or underconfident at one time tended to be overconfident or underconfident at the second time; a similar pattern was seen across tasks. Other studies have also shown consistency across tasks for the amount of overconfidence or underconfidence. West and Stanovich (1997) used very different tasks: confidence judgments on answers to general knowledge questions and confidence in a motor task. They also found correlations in bias across these different tasks. Schraw and his colleagues (Schraw, 1997; Schraw, Dunkle, Bendixen, & Roedel, 1995; Schraw & Roedel, 1994) have also found bias to be similar for individuals across tasks that differ in difficulty and in content domain. The fact that measures of bias (overconfidence and underconfidence) show better reliability than gamma means that bias is more likely to show stable individual differences.

Some of the studies cited above have used predictions of future performance, and others have used confidence in past performance. In educational applications, predictions rather than posttest confidence judgments are most important for controlling effective study behavior. Both the ability to predict overall performance and the ability to predict performance among different units (e.g., texts or topics) are important. A student who is overconfident overall may fail a test because he or she assumes that there is no need for further study (a failure of absolute metacomprehension). That same student may not know which specific topics need further study and which have been mastered (a failure of relative metacomprehension). Thus, an inability to predict absolute test accuracy and relative test accuracy may result in underpreparation for tests. There have been relatively few studies using absolute judgments to examine predictions of future test performance (cf. Glover, 1989; Hacker et al., 2000). Predicting performance over text material is most relevant to educational settings, but there are no studies examining individual differences in both absolute prediction accuracy and relative prediction accuracy with text material.

In the present study, we investigated absolute and relative metacomprehension accuracy as a function of verbal ability and difficulty of texts. We varied the difficulty of texts because the degree of overconfidence among students with low skill (e.g., Grabe et al., 1990; Hacker et al., 2000; Kruger & Dunning, 1999) should be related to the difficulty of the test. Schraw and Roedel (1994) found that overconfidence in absolute judgments was much greater on more difficult tasks than on easier tasks. They found this result for texts of varying difficulty and also across tasks in different domains (e.g., reading comprehension, probability judgments, and spatial judgments). When different domains were used, Schraw and Roedel reported that the specific type of task was less important than overall task difficulty. However, they did not include differences in participant skill in their studies.

We manipulated text difficulty by using hard texts and revised versions of those texts, using revisions that were similar to those studied by Burns (reported in Weaver, Bryant, & Burns, 1995). Burns revised texts in three different ways. In the principled revision, he followed the suggestions of Britton and Gulgoz (1991), which are guided by Kintsch and van Dijk's (1978) model of text comprehension. Burns's heuristic revision was done by an expert in text processing to make the original text more readable. Finally, in the readability revision, he changed the surface structure of the text to match the readability statistics of the heuristic version. Burns compared only the original texts, the readability revisions, and the heuristic revisions in his first experiment. Participants read texts, made predictions, and then answered questions from the texts. Relative metacomprehension accuracy (gamma correlations) was significantly higher for the heuristically revised texts. In Burns's Experiment 2, in which four versions of the texts were compared (original, principled revision, heuristic revision, and readability revision), proportions correct and predictions differed across texts, but relative metacomprehension accuracy did not differ. With these two conflicting results, it is unclear whether revising texts affects relative metacomprehension accuracy.

We revised difficult texts, following both principled and readability guidelines in our revisions. Participants read and predicted their performance on six texts: all hard texts, all revised versions of these hard texts, or mixed texts in which half of the texts were hard and half were revised. We thought that this mixed condition might produce good relative accuracy if participants were sensitive to the differences in text difficulty. However, Rawson and Dunlosky (2002) mixed very short texts (two sentences each) that had either high, medium, or low coherence. They did not find higher relative accuracy for mixed sets than for pure sets. Thus, relative accuracy may be insensitive to variations in difficulty of materials, but Rawson and Dunlosky's texts may have been too short for participants to notice the mix of text coherence. Measures of relative accuracy may be sensitive to variations in difficulty across longer texts if the variation in materials makes participants more aware of the potential variation in their performance. Increased variation in one's own performance may also increase absolute accuracy, especially among higher ability students.

In addition to the three text conditions (hard, revised, and mixed texts), we divided students into low, medium, and high ability groups on the basis of standardized test scores to study the effects of individual differences in relative and absolute accuracy of metacomprehension. We expected to see different patterns of overconfidence and underconfidence (absolute accuracy) for the different types of texts in participants of varying verbal abilities. Whether we would see similar effects for relative accuracy of metacomprehension was the primary question of interest.

    Method

    Participants

A total of 159 college student volunteers from general psychology courses at a large public university participated for partial course credit. There were 72 men and 87 women in the sample. The average age was 19.1 (with a range of 17 to 41). The sample was 76.1% European American, 13.2% Hispanic, and 11.7% other or unidentified. We randomly assigned 52 participants to the revised text condition, 51 participants to the hard text condition, and 55 participants to the mixed text condition.

We used university records to determine students' gender, ethnicity, and age, as well as their scores on the verbal portion of the Scholastic Achievement Test (SAT) or the English portion of the American College Test (ACT). If these scores were not available, we used students' reading scores on a state academic skills test. All but 1 participant in the mixed text condition had scores for the SAT, the ACT, or the Texas Academic Skills Program (TASP). Overall, the mean score for the verbal SAT was 522, and the standard deviation was 72.98. The mean verbal ACT score was 21.3, and the standard deviation was 4.39. We used TASP scores for only a few students, so those means are not meaningful. Scores were converted to z scores, using national (or state) norms for each test. Students were divided into three ability groups on the basis of their verbal z scores. Of 157 participants having verbal scores, 54 students were in the lower ability group with z scores of less than .126 (SAT M = 438, SD = 36.44; ACT M = 17.5, SD = 0.76), 57 students were placed in the medium ability group with z scores between .126 and .414 (SAT M = 522, SD = 13.51; ACT M = 21.3, SD = 0.82), and 46 students were in the higher ability group with z scores greater than .414 (SAT M = 592, SD = 38.22; ACT M = 27.6, SD = 2.51).
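The grouping procedure above can be sketched as follows. The z-score cutoffs (.126 and .414) are the ones reported in the text; the national norm means and standard deviations below are illustrative assumptions, not values from the article.

```python
# Assumed norms: (mean, SD) for each test; placeholders, not the article's values.
NORMS = {
    "SAT": (500.0, 110.0),   # assumed national verbal SAT norms
    "ACT": (20.0, 5.0),      # assumed national English ACT norms
}

def verbal_z(score, test):
    """Convert a raw verbal score to a z score using that test's norms."""
    mean, sd = NORMS[test]
    return (score - mean) / sd

def ability_group(z):
    """Split on the z-score cutoffs reported in the article."""
    if z < 0.126:
        return "low"
    if z <= 0.414:
        return "medium"
    return "high"

# The low- and high-group SAT means reported above land where expected
# under these assumed norms.
print(ability_group(verbal_z(438, "SAT")))  # -> low
print(ability_group(verbal_z(592, "SAT")))  # -> high
```

Converting each test to z scores before grouping is what lets SAT, ACT, and TASP takers be placed on a common ability scale.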

    Materials

For the hard texts, we used the six texts and the short practice text used by Rawson, Dunlosky, and Thiede (2000) that were taken from a GRE preparation manual (Branson, Selub, & Solomon, 1987). Because we needed two short, hard practice texts, we shortened one of the texts used by Glenberg and Epstein (1987) for use as a practice text. This text has produced low performance in our laboratory. We revised the texts to increase readability by replacing low frequency words with high frequency words; breaking long, complex sentences into shorter, simpler sentences; eliminating embedded clauses; and changing passive sentences into active sentences. In addition, we used two of the three principled revision rules found by Britton and Gulgoz (1991) to be effective in improving text recall. We used the same term for the same concept throughout the text, and we replaced anaphoric references (e.g., it) with the specific concept. One of the original hard practice texts and its revised version is shown in Appendix A. The characteristics of the six hard texts and the six revised texts as analyzed by Microsoft Word can be seen in Table 1. Table 1 shows that in comparison with the original hard texts, the revised texts had fewer words, more sentences, fewer words per sentence, fewer passive sentences, higher Flesch Reading Ease scores, and lower Flesch-Kincaid grade levels. We also analyzed the coherence among sentences using latent semantic analysis (LSA).1 These are also shown in Table 1. Higher values indicate that the concepts presented in sequential sentences have more overlap (with 1.0 as the highest possible score). The sentence-to-sentence coherence of the revised texts was somewhat higher than the sentence-to-sentence coherence of the hard texts. We also compared the semantic overlap of each complete hard text with each complete revised text using LSA. Concepts in the hard and revised texts were similar in that the observed LSA of .89 is high.

For the mixed condition, half of the revised texts were grouped (Sets A and B) and half of the hard texts were grouped (Sets C and D). One quarter of the participants in the mixed group each read a combination of texts designated as Sets AC, AD, BC, or BD. One revised and one hard practice text was used for all participants in the mixed condition.

We used six multiple-choice test questions with five alternatives for each text. For the hard texts, we used the questions that had been used by Rawson et al. (2000). Half of the test questions tapped details, and half tapped more conceptual material. The questions were the same for the revised texts, except that we changed some of the words and phrases in the questions in the same way as we did for the revised texts. There were two practice questions for each of the four practice texts. Examples of the hard and revised test questions can be seen in Appendix B.

1 LSA compares the relations among terms in a sentence with those of a large text base. Similarities across pairs of sentences are then measured, and an average sentence-to-sentence coherence score is obtained. Perfect overlap across sentences produces LSA = 1.0. Texts with higher coherence should be easier to comprehend (Foltz, Kintsch, & Landauer, 1998). We selected the General Reading up to First-Year College corpus as the database. Using more specialized databases for some texts (such as the psychology corpus for the intelligence testing text) tended to lower the coherence scores relative to using the larger database, but it did not change the pattern of coherence for the hard versus revised texts.

Table 1
Mean Characteristics of Hard and Revised Texts (With Ranges in Parentheses)

Characteristic               Hard texts                Revised texts
No. of words                 478.00 (358.00-601.00)    441.00 (347.00-604.00)
No. of sentences             20.00 (16.00-23.00)       31.00 (24.00-38.00)
Words/sentence               24.05 (21.06-28.38)       14.21 (10.84-15.89)
% passive sentences          22.50 (3.70-50.00)        9.17 (0-18.00)
Flesch Reading Ease          37.50 (19.10-49.40)       50.92 (42.20-59.50)
Flesch-Kincaid grade level   11.71 (10.90-12.00)       9.80 (7.60-12.00)
LSA sentence coherence       0.24 (0.18-0.30)          0.27 (0.23-0.33)

Note. LSA = latent semantic analysis.

    Procedure

Participants came to the laboratory for one session, which lasted about 1 hr. They were seated in front of a computer in a small individual cubicle. We used Inquisit software (2002) to control the presentation of stimuli and trial events and to collect the data. First, participants read and responded to two practice texts. Then, they read the six texts, presented in a randomized order for each participant. Participants read one sentence at a time and pressed the space bar for the next sentence in the text to appear on the screen. After reading each text, students were asked to indicate the percentage of the text they thought they were able to comprehend, using the following scale: 0%, 20%, 40%, 60%, 80%, and 100%. Participants then judged how many of six test questions they thought they would get correct (predictions) for a given text, and they typed a number from 0 to 6. After reading all six texts, students answered the six multiple-choice questions for each text; texts were tested in a random order. After answering the six questions for a text, students gave a posttest confidence judgment about their test performance by typing in the number of test questions (0-6) they thought they got correct for the text. This was repeated for questions from each of the six texts. Students were then debriefed.

    Results

    Hard Versus Revised Texts

We analyzed the percent correct for the pure revised and the pure hard text conditions to ensure that our revisions were successful. The mean percent correct for the revised texts was 56.14%, and the mean percent correct for the hard texts was 48.91%. A 2 (text condition) × 3 (verbal ability level) analysis of variance (ANOVA) showed that revised texts produced significantly higher performance than hard texts, F(1, 97) = 10.79, MSE = 175.69, p = .001, partial ω2 = .085.2 In addition, we analyzed students' ratings of the percentage of the texts they thought they were able to comprehend. The mean percentage for revised texts was 65.33%, and the mean percentage for the hard texts was 48.52%. Better perceived comprehension of revised texts was significant in a 2 (text condition) × 3 (verbal ability) ANOVA, F(1, 97) = 15.12, MSE = 464.53, p < .001, partial ω2 = .120. Our revisions were successful both as measured objectively in terms of percent correct and as measured subjectively in terms of percentage comprehended.

We investigated the reliability of our measures by using Cronbach's alpha for scores on the six texts in each condition. We calculated Cronbach's alphas for percent correct, predictions, and confidence judgments. For hard texts, the alphas were .712, .919, and .858 for percent correct, predictions, and confidence judgments, respectively. For the revised texts, the corresponding alphas were .705, .931, and .893. Predictions and confidence judgments showed good reliability across texts; test performance was more variable.
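Cronbach's alpha treats the six texts as items on a scale: alpha = k/(k − 1) × (1 − Σ per-item variance / variance of totals). A minimal sketch with invented toy scores, not the study's data:

```python
def cronbach_alpha(texts):
    """texts: list of k score lists (one per text), aligned by participant.
    alpha = k/(k-1) * (1 - sum of per-text variances / variance of totals)."""
    k = len(texts)

    def variance(xs):  # population variance; consistent use is what matters
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    n = len(texts[0])
    totals = [sum(text[i] for text in texts) for i in range(n)]
    return k / (k - 1) * (1 - sum(variance(t) for t in texts) / variance(totals))

# Two perfectly parallel "texts" (three participants each) give alpha = 1.0.
print(cronbach_alpha([[1, 2, 3], [1, 2, 3]]))  # -> 1.0
```

High alphas for predictions and confidence judgments mean a participant who judges one text high tends to judge the others high too, which is consistency, not accuracy.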

    Absolute Accuracy With Hard Texts

We analyzed absolute and relative metacomprehension accuracy for the hard texts separately from the revised texts. The analysis with hard texts included data for texts in the pure hard text condition and for the hard half of the texts in the mixed hard and revised text condition.3 We converted number of test questions correct for each text into percent correct. We also converted predictions and confidence judgments, which had been given in terms of numbers correct, into percent correct. The top half of Table 2 shows the mean actual percent correct, predictions, and confidence judgments for hard texts in the pure and mixed text conditions. A 2 (text condition) × 3 (ability) × 3 (measure [percent correct, percentage prediction, vs. percentage posttest confidence])4 mixed ANOVA showed a significant main effect of verbal ability, F(2, 99) = 3.60, MSE = 169.87, p = .031, partial ω2 = .047. Measure interacted with ability, F(4, 198) = 6.53, MSE = 129.72, p < .001, partial ω2 = .066, but there were no effects of pure versus mixed texts, largest F(1, 198) = 1.14, MSE = 129.72, p = .338.

2 Partial ω2 is an estimate of the proportion of variance accounted for by an effect, using a correction for error in the effect and using the corrected effect plus error variance in the denominator (Keppel & Wickens, 2004, p. 233). The formula is: partial ω2 = df_effect(F_effect − 1) / [df_effect(F_effect − 1) + N]. It estimates effect size in the population. For repeated measures designs, the N is multiplied by the number of levels of the repeated factor. This produces a conservative estimate of the population effect size because the formula includes error variance from all sources in the denominator (Keppel & Wickens, 2004, pp. 427-447).

3 The analyses of revised and hard texts were done separately so that the mixed conditions could be compared with the pure conditions. Scores on the hard half of the texts were used in the analysis of hard texts, and scores on the revised texts were used in the revised ANOVA. Participants in the mixed text group were, thus, included in both analyses, but the scores were from a different set of texts in each analysis.

Table 2
Mean Percent Performance, Predictions, and Posttest Confidence Judgments (With Standard Errors of the Mean in Parentheses) for the Pure and Mixed Sets of Hard Texts and for Pure and Mixed Sets of Revised Texts as a Function of Verbal Ability

                    Pure text sets                               Mixed text sets
Texts               Predictions    Confidence     Correct        Predictions    Confidence     Correct

Hard texts
  Low ability       53.86 (3.92)   43.98 (3.46)   38.27 (3.92)   54.17 (4.16)   51.74 (3.67)   45.14 (4.16)
  Medium ability    45.66 (4.16)   46.18 (3.67)   49.13 (4.16)   47.09 (3.63)   45.24 (3.20)   52.38 (3.63)
  High ability      52.29 (4.03)   50.00 (3.56)   59.97 (4.04)   55.88 (4.03)   54.90 (3.56)   57.19 (4.04)

Revised texts
  Low ability       52.08 (3.51)   46.67 (2.96)   48.75 (3.32)   63.89 (3.92)   57.99 (3.31)   50.35 (3.71)
  Medium ability    61.11 (3.51)   59.03 (2.96)   57.78 (3.32)   62.70 (3.42)   54.23 (2.89)   55.82 (3.24)
  High ability      74.54 (4.53)   69.44 (3.82)   65.74 (4.28)   66.99 (3.81)   60.46 (3.21)   65.87 (3.60)
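The effect-size formula from Footnote 2 can be checked numerically. In the sketch below, N = 105 is an assumption inferred from the reported error degrees of freedom, not a figure stated in the text.

```python
def partial_omega_squared(f, df_effect, n):
    """Keppel & Wickens (2004) population effect-size estimate:
    df_effect * (F - 1) / (df_effect * (F - 1) + N)."""
    adjusted = df_effect * (f - 1)
    return adjusted / (adjusted + n)

# Main effect of verbal ability reported above: F(2, 99) = 3.60.
# N = 105 is assumed here (inferred from the error df), not stated in the text.
print(round(partial_omega_squared(3.60, 2, 105), 3))  # -> 0.047
```

The correction df_effect × (F − 1) shrinks the estimate toward zero when F is near 1, which is why this measure is more conservative than uncorrected eta squared.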

To understand the interactions, we conducted separate analyses for low, medium, and high ability participants. Using planned comparisons, we asked whether predictions and confidence judgments differed from performance to see overconfidence and underconfidence. Participants having low verbal skills gave predictions (54.02%) that were higher than their performance (41.71%); they were overconfident, F(1, 32) = 7.95, MSE = 645.56, p = .008, partial ω2 = .093. However, their posttest confidence judgments (47.86%) did not differ significantly from their percent correct (41.71%), F(1, 32) = 2.80, MSE = 458.31, p = .104. The predictions of medium-ability participants were about the same as their percent correct (46.38% vs. 50.76%), F(1, 35) = 3.06, MSE = 228.13, p = .089, but their confidence judgments showed underconfidence in that they were significantly lower (45.71%) than their percent correct (50.76%), F(1, 35) = 4.43, MSE = 209.01, p = .043, partial ω2 = .044. Predictions of high-ability participants did not differ significantly from performance (54.08% vs. 58.58%), F(1, 32) = 2.15, MSE = 319.56, p = .152, but their confidence judgments (52.45%) were significantly lower than their performance (58.58%), F(1, 32) = 6.03, MSE = 211.85, p = .020, partial ω2 = .069. Thus, with hard texts, only low-ability participants showed overconfidence, and this was only with predictions and not with posttest confidence judgments. Both medium- and high-ability participants showed underconfidence in their posttest confidence judgments, but their predictions were accurate.

    Absolute Accuracy With Revised Texts

The bottom half of Table 2 shows the mean percent correct, predictions, and confidence judgments for the revised texts in the pure and mixed conditions. A 2 (text condition) × 3 (verbal ability) × 3 (measure [percent correct, predictions, and posttest confidence judgments]) ANOVA showed that the main effects of verbal ability, F(2, 100) = 10.99, MSE = 138.25, p < .001, partial ω2 = .158, and measure, F(2, 200) = 10.50, MSE = 113.44, p < .001, partial ω2 = .056, were significant. Planned comparisons showed that percentage predictions were higher than percent correct (63.04% vs. 57.35%), F(1, 100) = 12.01, MSE = 326.95, partial ω2 = .049, p < .001, but that confidence judgments did not differ from percent correct (57.96% vs. 57.35%; F < 1). These effects did not interact with pure versus mixed text sets or with ability (Fs < 1). Overall, students were overconfident before taking the tests but were accurate after taking the tests on these revised texts. This effect was not different for students of differing verbal abilities.

Relative Accuracy: Performance and Judgment Relationships

    Table 3 shows the mean gammas for prediction and confidence

    judgments in the pure hard, mixed, and pure revised conditions as

    a function of verbal ability. Gamma is a nonparametric measure

    that taps ordinal consistency between two measures (Goodman &

    Kruskal, 1954). In this case, it measured the extent to which high

    judgments were accompanied by high percentages correct for textswithin each individual. It ranges from 1.0 (showing a perfect

    negative relationship) to 1.0 (showing a perfect positive rela-

    tionship). We calculated prediction gammas individually for each

    participant using their predicted percent correct and their actual

    percent correct for each of the six texts. We could not compute

    prediction gammas for 9 participants because they gave the same

    predictions to all six texts. One of these participants (with high

    ability) read the hard texts, 5 participants (2 with high ability, 2

    with medium ability, and 1 with low ability) read the revised texts,

    and 3 participants (1 with high ability and 2 with low ability) read

    the mixed texts. We calculated confidence judgment gammas

    using the posttest judgments of percent correct and actual percent

    correct across the six texts for each participant. We could not

compute gammas for 7 participants because they gave the same posttest judgments for all texts. Three of these (all low ability) read

    the hard texts, and 4 participants read the mixed texts (1 high

    ability, 1 medium ability, and 2 low ability).
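The per-participant gamma computation described above can be sketched in Python. This is an illustrative implementation of the Goodman-Kruskal statistic, not the authors' code, and the six-element lists are made-up values rather than data from the study:

```python
def goodman_kruskal_gamma(judgments, scores):
    """Goodman-Kruskal gamma: ordinal association between two paired
    lists (here, one participant's per-text judgments and percent correct)."""
    concordant = discordant = 0
    n = len(judgments)
    for i in range(n):
        for j in range(i + 1, n):
            dj = judgments[i] - judgments[j]
            ds = scores[i] - scores[j]
            if dj * ds > 0:        # pair ordered the same way on both measures
                concordant += 1
            elif dj * ds < 0:      # pair ordered oppositely
                discordant += 1
            # ties on either measure are ignored
    if concordant + discordant == 0:
        return None  # undefined, e.g., identical judgments for all six texts
    return (concordant - discordant) / (concordant + discordant)

# Six texts: hypothetical predicted vs. actual percent correct
predictions = [80, 60, 70, 50, 90, 65]
actual      = [75, 55, 60, 65, 85, 50]
print(goodman_kruskal_gamma(predictions, actual))  # prints 0.4666666666666667
```

The `None` branch corresponds to the participants noted above who gave the same prediction to all six texts, for whom no gamma could be computed.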

    4 We analyzed the data with performance, predictions, and posttest

    confidence judgments as levels of an independent variable so that over-

    confidence and underconfidence could be seen relative to performance. An

    alternative analysis would be to use the signed difference between each

    type of judgment and performance (i.e., bias). Analyzing bias shows

    identical effects, but the derived negative and positive values do not show

    the levels of judgments and performance as clearly as the analysis we used.
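The bias measure mentioned in Footnote 4 is simple to compute; the following sketch (illustrative, not the authors' code) uses the mean percentages reported below for the revised texts:

```python
# Bias: signed difference between a judgment and actual performance.
# Positive bias indicates overconfidence; negative bias, underconfidence.
def bias(judgment, percent_correct):
    return judgment - percent_correct

# Means for the revised texts: predictions 63.04%, posttest
# confidence 57.96%, percent correct 57.35%.
print(f"prediction bias: {bias(63.04, 57.35):+.2f}")  # prints +5.69
print(f"posttest bias:   {bias(57.96, 57.35):+.2f}")  # prints +0.61
```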

Table 3
Mean Prediction and Posttest Confidence Judgment Gamma Correlations (With Standard Errors of the Mean in Parentheses) for the Hard, Mixed, and Revised Texts for the Verbal Ability Groups

                     Hard texts                Mixed texts               Revised texts
Ability group   Predictions  Confidence   Predictions  Confidence   Predictions  Confidence
Low             .34* (.14)   .43* (.16)   .23 (.18)    .38* (.16)   .20 (.15)    .28* (.12)
Medium          .42* (.12)   .62* (.07)   .36* (.14)   .44* (.14)   .02 (.13)    .27* (.12)
High            .35* (.11)   .64* (.09)   .33* (.14)   .44* (.14)   .48* (.16)   .55* (.14)
Mean            .36*         .57*         .31*         .42*         .20*         .36*

* p < .05.

727  METACOMPREHENSION ACCURACY


    We used single-sample t tests to determine whether each of

    these correlations was significantly different from zero. All of the

    confidence judgment gammas were greater than zero, and all of the

    prediction gammas in the hard text conditions were significantly

    greater than zero. For mixed sets of hard and revised texts, pre-

    dictions were significantly correlated with performance for

medium- and high-ability participants but not for low-ability participants. For the revised texts, only the high-ability participants

    gave predictions that produced significant correlations with their

    individual performance. The mean gammas across ability levels

    are shown in the bottom row of Table 3. Each of these gammas

    was significantly greater than zero. Each ability group produced

    prediction gammas that were significantly greater than zero (.26

    for low ability, .25 for medium ability, and .38 for high ability),

    and they produced confidence judgment gammas that were signif-

    icantly greater than zero (.36 for low ability, .42 for medium

    ability, and .53 for high ability).

We analyzed these gammas in a 3 (pure hard, pure revised, mixed text condition) × 3 (low, medium, and high ability) × 2 (type of judgment [predictions and confidence judgments]) mixed ANOVA. The only significant effect was that confidence judgment gammas were higher than prediction gammas, F(1, 133) = 7.80, MSE = .209, p = .006, partial η² = .023. The effect of text difficulty was not significant (F < 1), and the effect of ability was not significant, F(2, 133) = 2.78, MSE = .373, p = .106. None of the interactions approached significance; the largest F was for the Verbal Ability × Text Condition interaction, F(4, 133) = 1.76, MSE = .373, p = .176. In contrast to absolute metacomprehension

    accuracy, relative metacomprehension accuracy did not show sig-

    nificantly different patterns of individual differences for the dif-

    ferent types of texts.

    We correlated our measures of absolute metacomprehension

    accuracy and relative metacomprehension accuracy. Table 4 shows

the Pearson r correlations among these measures. Prediction bias (the signed difference between percentage predictions and percentage performance) correlated with posttest bias, and prediction

    gammas correlated with posttest gammas. However, bias did not

    correlate with gamma either for predictions or posttest judgments.

    This suggests that relative and absolute metacognitive accuracy

    involve different processes.

    Discussion

    We found an interaction between students ability levels and the

    difference between their predictions and their performance with

    the hard texts. Low-ability students were overconfident, especially

    in their predictions, and high-ability students were underconfident,

    especially in their posttest confidence judgments. Revised texts

    produced better performance than hard texts. However, with re-

    vised texts, overall predictions were higher than percent correct for

    students of all ability levels. Thus, even though the revisions

    improved test performance, revisions increased predictions of per-

    formance to a greater extent so that students were overconfident.

This finding of more overconfidence with the easier, revised texts contrasts with that of Schraw and Roedel (1994), who found

    increasing overconfidence as the difficulty of tasks increased. In

    their study, however, they did not divide participants by ability

    level. Our effect of increased overconfidence for revised texts

    compared with hard texts was especially true for higher ability

    participants.

We found fairly strong relationships between the absolute accuracy of metacognitive estimates (bias) and students' abilities

    with the hard texts. On the other hand, we did not find a relation-

    ship between relative metacomprehension accuracy and individual

    differences (i.e., gammas did not differ significantly with verbal

    ability). Furthermore, the accuracy of relative and absolute judg-

    ments did not correlate, suggesting that these two types of meta-

cognition tap different processes. This conclusion is consistent with that of Kelemen et al. (2000), who found no relationships

    among different types of metacognitive accuracy.

    One question about gamma as a measure of relative metacog-

    nitive accuracy is whether it is reliable. Thompson and Mason

    (1996) showed that gamma produced low split-half and low alter-

    native forms reliability. Kelemen et al. (2000) found that relative

prediction accuracy for texts did not correlate in a test-retest

    reliability design. Both of these studies suggest that gamma is

    unreliable. If gamma is an unreliable measure, then finding rela-

    tionships between this measure and other measures is unlikely.

    However, in the present study, gamma produced a pattern of

    effects that is consistent with the idea that it is measuring relative

metacomprehension accuracy. We found that pretest and posttest gammas were correlated. In addition, posttest judgment gammas

    were significantly higher than prediction gammas, as is usually

    found when relative metacomprehension accuracy is tested (Maki,

    1998). Overall, gammas for prediction accuracy and for posttest

    accuracy were significantly greater than zero for each type of text

    and for each ability group. Within text and ability groups, all of the

    confidence judgment gammas were significantly greater than zero,

    and most of the prediction gammas were greater than zero. If

    gamma is an unreliable measure that results from chance variation,

    consistent effects such as those described above should not be

    found.

    Assuming that gamma is measuring relative metacognitive ac-

    curacy in a meaningful way, we can ask why relative and absolute

accuracy produced different patterns of results. We found a strong relationship between bias in prediction judgments and bias in

    posttest judgments. Others have also shown that absolute

    confidence-judgment accuracy is consistent for individuals across

    a number of content domains (Kelemen et al., 2000; Schraw, 1997;

    Schraw et al., 1995; West & Stanovich, 1997). Schraw and Roedel

    (1994) found increasing overconfidence with increased difficulty.

    This suggests that individuals judge their performance levels as

    fairly constant even when task difficulty varies. In addition, most

    individuals think that they are at least average (Krueger & Mueller,

    2002). To some extent, individuals must base their absolute judg-

    ments on prior knowledge about the placement of their scores on

Table 4
Pearson's r Correlations Between Metamemory Measures in Hard, Revised, and Mixed Text Sets

Measure             Prediction gamma   Posttest bias   Posttest gamma
Prediction bias     .027               .832**          .102
Prediction gamma                       .008            .316**
Confidence bias                                        .153

** p < .01.

    728 MAKI, SHIELDS, WHEELER, AND ZACCHILLI


    a specific type of task relative to average scores (or on a consistent

    misperception about their scores relative to what they think is the

average). Ehrlinger and Dunning (2003) reported that individuals' chronic self-views about their ability on various tasks influenced

    their posttest confidence judgments independently of their actual

    performance. As shown in our study and in other studies (e.g.,

Kruger & Dunning, 1999), individuals do not adjust their performance judgments upward or downward enough to account for

    differences in their ability. In addition, they do not accurately

    estimate changes in mean performance related to task difficulty.

    This leads to errors in absolute judgments. With difficult tasks, the

    poorer-performing individuals are overconfident and the higher-

    performing individuals are underconfident.

    The absolute level of judgments is irrelevant with relative mea-

    sures of metacomprehension. Here, variance in judgments that

    correspond to variance in performance is critical. Participants have

    to use their experience with the specific task in order to produce

    accurate relative judgments across parts of that task. The low

    levels of relative accuracy that are generally obtained in studies of

    metacomprehension (see Maki, 1998, for a review) may result

from an inability of individuals to emphasize learning from specific texts more than the perceived general ability to learn from

    text. Judgments based entirely on general abilities would not match

    variance in performance across specific texts at all. However,

    individuals must use learning from specific texts to some extent

    because correlations between judgments and performance are not

    zero. In the present study, correlations between judgments and

    performance were about equal for students of differing abilities.

    This suggests that participants who are high or low in verbal ability

    use specific learning from individual texts to about the same

    degree in making their judgments.

    One curious effect for relative judgment accuracy was that

    students were not able to use heterogeneity of texts to produce

more accurate relative predictions for mixed sets of hard and revised texts, although the two types of text did produce significant

    differences in performance. Rawson and Dunlosky (2002) also

    found that mixing levels of coherence among short texts did not

    improve relative metacomprehension. In addition, the revisions did

not improve students' abilities to judge which texts would produce

    higher and which would produce lower performance; that is,

    relative metacomprehension accuracy was not improved with the

    revisions. Prediction of relative performance on revised texts

    seemed to be particularly difficult for low- and medium-ability

    students. In fact, the gamma correlations for predictions were not

    significantly different from zero for these groups, although the

    overall analysis of variance did not reveal significant differences

    among text conditions, and there was not a significant interaction

between ability and text condition.

Hard texts produced overconfidence in predictions for students

    with low verbal abilities. Because these students had lower than

    average performance when they predicted average performance,

    they would study less than is necessary to achieve an average

    amount of learning. Such students are likely to be disappointed in

    their test performance although they gave accurate judgments after

    taking the tests. This accuracy, however, came too late (after the

    test instead of before the test). Students with medium and high

    verbal abilities predicted their future performance accurately.

    Thus, they should be able to study as long as is needed to meet the

    criterion they have set for themselves. After taking the tests,

    however, they were underconfident in their performance. This

    should have no adverse effects. These average and above average

    students should be surprised when their actual performance is

    better than expected.

    Students of all ability levels were overconfident in their predic-

    tions of test performance over the revised texts. This has interest-

ing implications for education. Revising the texts produced higher performance, so the revisions were successful. However, students

    with medium and high abilities were overconfident with the re-

    vised texts although they were quite accurate with hard texts. The

    revisions made the texts seem easier than they actually were, so

    that the medium- and higher-ability students looked more like the

    low-ability students in terms of overconfidence in predictions.

The revisions did improve performance, but they also produced

    overconfidence in the stronger students, and the revisions did not

    improve relative metacomprehension accuracy. This raises an in-

    teresting educational paradox: Revisions are clearly helpful for

    learning, but they may be detrimental to metacomprehension.

    References

Branson, M., Selub, M., & Solomon, L. (1987). How to prepare for the GRE. San Diego, CA: Harcourt Brace.

Britton, B. K., & Gulgoz, S. (1991). Using Kintsch's computational model to improve instructional text: Effects of repairing inference calls on recall and cognitive structures. Journal of Educational Psychology, 83, 329–345.

Dunning, D., Johnson, K., Ehrlinger, J., & Kruger, J. (2003). Why people fail to recognize their own incompetence. Current Directions in Psychological Science, 12, 83–87.

Ehrlinger, J., & Dunning, D. (2003). How chronic self-views influence (and potentially mislead) estimates of performance. Journal of Personality and Social Psychology, 84, 5–17.

Foltz, P. W., Kintsch, W., & Landauer, T. K. (1998). The measurement of textual coherence with latent semantic analysis. Discourse Processes, 25, 285–307.

Glenberg, A. M., & Epstein, W. (1987). Inexpert calibration of comprehension. Memory & Cognition, 15, 84–93.

Glover, J. (1989). Reading ability and the calibration of comprehension. Educational Research Quarterly, 13, 7–11.

Goodman, L. A., & Kruskal, W. H. (1954). Measures of association for cross classifications. Journal of the American Statistical Association, 49, 732–764.

Grabe, M., Bordages, W., & Petros, T. (1990). The impact of computer-supported study on student awareness of examination performance. Journal of Computer-Based Instruction, 27, 113–119.

Hacker, D. J., Bol, L., Horgan, D. D., & Rakow, E. A. (2000). Test prediction and performance in a classroom context. Journal of Educational Psychology, 92, 160–170.

Inquisit (Version 1.32) [Computer software]. (2002). Seattle, WA: Millisecond Software.

Jacobson, J. M. (1990). Congruence of pre-test predictions and posttest estimations with grades on short answer and essay tests. Educational Research Quarterly, 14, 41–47.

Kelemen, W. L., Frost, P. J., & Weaver, C. A., III. (2000). Individual differences in metacognition: Evidence against a general metacognitive ability. Memory & Cognition, 28, 92–107.

Keppel, G., & Wickens, T. D. (2004). Design and analysis: A researcher's handbook (4th ed., pp. 427–447). Upper Saddle River, NJ: Pearson Prentice Hall.

Kintsch, W., & van Dijk, T. A. (1978). Toward a model of text comprehension and production. Psychological Review, 85, 363–394.

Krueger, J., & Mueller, R. A. (2002). Unskilled, unaware, or both? The better-than-average heuristic and statistical regression predict errors in estimates of own performance. Journal of Personality and Social Psychology, 82, 180–188.

Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one's own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77, 1121–1134.

Lin, L.-M., Moore, D., & Zabrucky, K. M. (2001). An assessment of students' calibration of comprehension and calibration of performance using multiple measures. Reading Psychology, 22, 111–128.

Maki, R. H. (1998). Metacomprehension of text: Influence of absolute confidence level on bias and accuracy. In D. L. Medin (Ed.), The psychology of learning and motivation (Vol. 38, pp. 223–248). San Diego, CA: Academic Press.

Maki, R. H., Jonas, D., & Kallod, M. (1994). The relationship between comprehension and metacomprehension ability. Psychonomic Bulletin & Review, 1, 126–138.

Maki, R. H., & McGuire, M. J. (2002). Metacognition for text: Implications for education. In T. J. Perfect & B. L. Schwartz (Eds.), Applied metacognition (pp. 39–67). Cambridge, United Kingdom: Cambridge University Press.

Maki, R. H., & Serra, M. (1992). The basis of test predictions for text material. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 116–126.

Nelson, T. O. (1984). A comparison of current measures of the accuracy of feeling-of-knowing predictions. Psychological Bulletin, 95, 109–133.

Nelson, T. O., & Dunlosky, J. (1991). When people's judgments of learning (JOLs) are extremely accurate at predicting subsequent recall: The delayed-JOL effect. Psychological Science, 2, 267–270.

Rawson, K. A., & Dunlosky, J. (2002). Are performance predictions for text based on ease of processing? Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 69–80.

Rawson, K. A., Dunlosky, J., & Thiede, K. W. (2000). The rereading effect: Metacomprehension accuracy improves across reading trials. Memory & Cognition, 28, 1004–1010.

Schraw, G. (1997). The effect of generalized metacognitive knowledge on test performance and confidence judgments. Journal of Experimental Education, 65, 135–146.

Schraw, G., Dunkle, M. E., Bendixen, L., & Roedel, T. D. (1995). Does a general monitoring skill exist? Journal of Educational Psychology, 87, 433–444.

Schraw, G., & Roedel, T. D. (1994). Test difficulty and judgment bias. Memory & Cognition, 22, 63–69.

Thompson, W. B., & Mason, S. E. (1996). Instability of individual differences in the association between confidence judgments and memory performance. Memory & Cognition, 24, 226–234.

Weaver, C. A., III, Bryant, D. S., & Burns, K. D. (1995). Comprehension monitoring: Extensions of the Kintsch and van Dijk model. In C. A. Weaver III & S. Mannes (Eds.), Discourse comprehension: Essays in honor of Walter Kintsch (pp. 177–193). Hillsdale, NJ: Erlbaum.

West, R. F., & Stanovich, K. E. (1997). The domain specificity and generality of overconfidence: Individual differences in performance estimation bias. Psychonomic Bulletin & Review, 4, 387–392.


    Appendix A

    Practice Texts

    Hard Text

    GLOBAL TEMPERATURE AND FLOODING5

    Scientific investigators of global climate change have warned that

    there will occur substantial rises in worldwide sea levels if there is a rise

    of several degrees in global temperature. The projected increase in

    worldwide temperature is based on the observation that both individual

    and corporate use of carbon dioxide-producing combustible fuels has

    been on the rise since the middle of the last century. The carbon dioxide

is delivered into the earth's atmosphere where it acts somewhat like the

    glass in a greenhouse, retaining radiant energy. The carbon dioxide

    absorbs infrared heat radiation from the earth instead of allowing it to

    escape into space. Trapping the infrared heat radiation in the air leads

    to rising temperature. Even a rise of a few degrees of global temperature

    may cause melting of the polar icecaps and considerable increases in the

    height of oceans.

    Revised Text

GLOBAL TEMPERATURE AND FLOODING

Scientists who study change in the world's climate warn that sea levels will

    increase if the temperature increases throughout the world. An increase of several

    degrees in temperature would make the sea levels go up quite a lot. The scientists

expect worldwide temperature to increase because people and companies use fuels

    that make carbon dioxide. The amount of carbon dioxide released by these fuels

    has been increasing since the middle 1800s. When carbon dioxide is released into

the air, it acts like the glass in a greenhouse. The carbon dioxide traps heat near the

    surface of the earth. Carbon dioxide stops the heat from escaping into space.

Because the heat can't escape, the temperature of the earth is rising. If the world's temperature goes up only a few degrees, the polar icecaps will melt. This will cause

    a large increase in the height of the oceans.

    5 The hard practice text was a short version of a text used by Glenberg

    and Epstein (1987).

    Appendix B

    Test Questions for Practice Texts

    Questions for Hard Texts

    (Recalling facts) GLOBAL TEMPERATURE AND FLOODING

    The projected increase in worldwide temperature is based on what

    observation?

    * A) both individual and corporate use of carbon dioxide-producing

    combustible fuels has been increasing.

    B) trapping of infrared radiation in the air is decreasing.

    C) heat radiation is more likely to be trapped in the earth as sea levels

    rise.

D) carbon dioxide has been decreasing in the earth's atmosphere.

    E) more greenhouses have been built, increasing the amount of carbon

    dioxide trapped in the atmosphere.

    (Understanding the passage) GLOBAL TEMPERATURE AND

    FLOODING

    How would carbon dioxide cause a rise in global temperature?

    * A) by absorbing and retaining infrared heat radiation coming from the

    earth into the atmosphere.

    B) by reflecting infrared heat energy back to the earth once it had come

    into contact with the atmosphere.

C) the rise would come directly from heat being emitted from individual and corporate use of carbon dioxide-producing fuels.

D) by intensifying the heat potential from the sun's rays when they

    collide with carbon dioxide gases in the atmosphere.

    E) by facilitating the movement of radiation into space.

    Questions for Revised Text

    (Recalling facts) GLOBAL TEMPERATURE AND FLOODING

    [Revised]

    The projected increase in worldwide temperature is based on what

    observation?

    * A) heat is more likely to be trapped by the sea as sea levels rise

    B) the amount of heat trapped near the earth is decreasing

C) the amount of carbon dioxide in the earth's atmosphere has been decreasing
D) individuals and companies have been using more fuels that produce

    carbon dioxide

    E) more greenhouses have been built, increasing the amount of carbon

    dioxide trapped in the atmosphere

    (Understanding the passage) GLOBAL TEMPERATURE AND

    FLOODING [Revised]

    How could carbon dioxide cause a rise in global temperature?

* A) by keeping heat close to the earth's surface rather than letting it

    escape into space

    B) by reflecting heat energy back to the earth once it has escaped into

    space

    C) the temperature increase would come directly from heat being given

    off from the use of carbon dioxide-producing fuels

D) by strengthening the heat from the sun's rays when the rays collide with carbon dioxide gases in the atmosphere
E) by facilitating the movement of the heat into space

    * denotes the correct response. The order of the alternatives was ran-

    domized for each participant.

    Received September 8, 2004

    Revision received July 22, 2005

    Accepted July 22, 2005
