8/16/2019 (766992060) Benefits of Testing Memory; Best Practices and Boundary Conditions.
1 Benefits of testing memory
Best practices and boundary conditions
Henry L. Roediger, III, Pooja K. Agarwal, Sean H. K. Kang and Elizabeth J. Marsh
The idea of a memory test or of a test of academic achievement is often circumscribed. Tests within the classroom are recognized as important for the assignment of grades, and tests given for academic assessment or achievement have increasingly come to determine the course of children’s lives: score well on such tests and you advance, are placed in more challenging classes, and attend better schools. Against this widely acknowledged backdrop of the importance of testing in educational life (not just in the US, but all over the world), it would be difficult to justify the claim that testing is not used enough in educational practice. In fact, such a claim may seem to be ludicrous on the face of it. However, this is just the claim we will make in this chapter: Education in schools would greatly benefit from additional testing, and the need for increased testing probably increases with advancement in the educational system. In addition, students should use self-testing as a study strategy in preparing for their classes.
Now, having begun with an inflammatory claim - we need more testing in education - let us explain what we mean and back up our claims. First, we are not recommending increased use of standardized tests in education, which is usually what people think of when they hear the words "testing in education." Rather, we have in mind the types of assessments (tests, essays, exercises) given in the classroom or assigned for homework. The reason we advocate testing is that it requires students to retrieve information effortfully from memory, and such effortful retrieval turns out to be a wonderfully powerful mnemonic device in many circumstances.
Tests have both indirect and direct effects on learning (Roediger & Karpicke, 2006b). The indirect effect is that, if tests are given more frequently, students study more. Consider a college class in which there is only a midterm and a final exam compared to a similar class in which weekly quizzes are given every Friday, in addition to the midterm and the final. A large research program is not required to determine that students study more in the class with weekly quizzes than in the class without them. Yet tests also have a direct effect on learning; many studies have shown that students’ retrieval of information on tests greatly improves their later retention of the tested
material, either compared to a no-intervention control or even compared to a control condition in which students study the material for an equivalent amount of time to that given to students taking the test. That is, taking a test on material often yields greater gains than restudying material, as we document below. These findings have important educational implications, ones that teachers and professors have not exploited.
In this chapter, we first report selectively on findings from our lab on the critical importance of testing (or retrieval) for future remembering. Retrieval is a powerful mnemonic enhancer. However, testing does not lead to improvements under all possible conditions, so the remainder of our chapter will discuss qualifications and boundary conditions of test-enhanced learning, as we call our program (McDaniel, Roediger, & McDermott, 2007b).
story. The subjects were told that they should remember the pictures, because they would be tested on the names of the pictures (which were given in the story). The test was free recall, meaning that students were given a blank
affected performance a week later; three prior tests raised recall over 80% relative to the no-test condition (i.e., (32 − 17)/17 × 100). Looked at another way, immediately after study about 32 items could be recalled. If subjects took three tests just after recall, they could still recall 32 items a week later. The act of taking three tests essentially stopped the forgetting process in its tracks, so testing may be a mechanism to permit memories to consolidate or reconsolidate (Dudai, 2006).
Critics, however, could pounce on a potential flaw in the
Figure 1.2 Results from Roediger and Karpicke (2006a, Experiment 1). On the 5-minute delayed test, students who had repeatedly studied the material recalled it better than those who had studied it once and taken a test. Cramming (repeatedly reading) does work, at least at very short retention intervals. However, on the two delayed tests, the pattern reversed; studying and taking an initial test led to better performance on the delayed test than did studying the material twice.
modulate the magnitude of the testing effect, beginning with the format of tests.
THE FORMAT OF TESTS
The power of testing to increase learning and retention has been demonstrated in numerous studies using a diverse range of materials; but both study and test materials come in a multitude of formats. Although the use of true/false and multiple-choice exams is now commonplace in high school and college classrooms, there was a time (in the 1920s and 1930s) when these kinds of exams were a novelty and referred to as "new-type," in contrast to the more traditional essay exams (Ruch, 1929). Given the variety of test formats, one question that arises is whether all formats are equally efficacious in improving retention. If we want to provide evidence-based recommendations for educators to utilize testing as a learning tool, it is important to ascertain if particular types of tests are more effective than others.
In a study designed to examine precisely this issue, Kang, McDermott, and Roediger (2007) manipulated the formats of both the initial and final tests - multiple-choice (MC) or short answer (SA) - using a fully-crossed,
within-subjects design. Students read four short journal articles, and immediately afterwards they were given an MC quiz, an SA quiz, a list of statements to read, or a filler task. Feedback was given on quiz answers, and the quizzes and the list of statements all targeted the same critical facts. For instance, after reading an article on literacy acquisition, students in the SA condition generated an answer to
were significantly worse in the filler-task condition than the other three conditions, indicating that both testing (with feedback) and focused re-exposure aid retention of the target information. Importantly, only the initial SA condition produced significantly better final performance than the read-statements condition; the initial MC and read-statements conditions did not differ significantly. Retrieval is a potent memory modifier (Bjork, 1975). These results implicate the processes involved in actively producing information from memory as the causal mechanism underlying the testing effect. Regardless of the format of the final test, the initial test format that required more effortful retrieval (i.e., short answer) yielded the best final performance, and this condition was significantly better than having read the test answers in isolation.
Similar results from other studies provide converging evidence that effortful retrieval is crucial for the testing effect (Carpenter & DeLosh, 2006; Glover, 1989). Butler and Roediger (2007), for example, used art history video lectures to simulate classroom learning. After the lectures, students completed short answer or multiple-choice tests, or they read statements as in Kang et al. (2007). On a final SA test given 30 days later, Butler and Roediger found the same pattern of results: (1) retention of target facts was best when students were given an initial SA quiz, and (2) taking an initial MC test produced final performance equivalent to reading the test answers (without taking a test). As discussed in a later section of this chapter, these findings have been replicated in an actual college course (McDaniel, Anderson, Derbish, & Morrisette, 2007b).
Although most evidence suggests that tests that require effortful retrieval yield the most memorial benefits, it should be noted that this depends upon successful retrieval on the initial test (or the delivery of feedback). Kang et al. (2007) had another experiment identical to the one described earlier except that no feedback was provided on the initial tests.
subsequent tests. In a similar vein, it has been shown that items that elicit errors on a cued recall test have almost no chance of being recalled correctly at a later time unless feedback is given (Pashler, Cepeda,
Table 1.1 Mean proportion recalled in Agarwal et al.’s (2008) Experiment 2 on test formats. Proportion correct was greater for test conditions than study conditions; however, subjects predicted learning would be greater for study conditions relative to test conditions. Learning conditions that contained both testing and feedback, namely the closed-book test with feedback and the open-book test conditions, contributed to best performance on the delayed test.
                                     Proportion correct
Condition                            Initial test    One-week delayed test
Study 1×                                             .40
Study 2×                                             .50
Study 3×                                             .54
Closed-book test                     .67             .55
Closed-book test with feedback       .65             .66
Open-book test                       .8[?]           .66
Simultaneous answering               .83             .5[?]
Non-studied control                                  .6[?]
open-book test. Perhaps a difference between these two conditions will emerge with a longer delay for the final test, an outcome that has occurred in other experiments (e.g., Roediger & Karpicke, 2006a). Such a finding would probably further depend on how students approach the open-book tests (e.g., whether they attempt retrieval of an answer before consulting the study material for feedback, or whether they immediately search the study material in order to identify the target information). Future research in our lab will tackle this topic.
In summary, one reason why testing benefits memory is that it promotes active retrieval of information. Not all formats of tests are equal. Test formats that require more effortful retrieval (e.g., short answer) tend to produce a greater boost to learning and retention, compared to test formats that engage less effortful retrieval (e.g., multiple-choice). However, tests that are more effortful or challenging also increase the likelihood of retrieval failure, which has been shown to reduce the beneficial effect of testing. Therefore, to ameliorate low performance on the initial test, corrective feedback should be provided. The practical implications of these findings for improving learning in the classroom are straightforward: instead of giving students summary notes to read, teachers should implement more frequent testing (of important facts and concepts) - using test formats that entail effortful retrieval - and provide feedback to correct errors.
TESTING AND FEEDBACK
Broadly defined, feedback is information provided following a response or recollection, which informs the learner about the status of current performance, often leading to improvement in future performance (Roediger, Zaromb, & Butler, 2008). A great deal of laboratory and applied research has examined the conditions under which feedback is, and is not, effective in improving learning and performance.
& Marsh, 200[?]). Similarly, Butler, Karpicke, and Roediger (2008) suggested that feedback reinforces the association between a cue and its target response, increasing the likelihood that an initially low-confidence correct response will be produced on a final criterial test.
Regarding the best time to deliver feedback, there exists a great deal of debate and confusion, as immediate and delayed feedback are operationalized differently across studies. For instance, the term "immediate feedback" has been used to imply feedback given just after each test item or feedback provided immediately after a test. On the other hand, "delayed feedback" can take place anywhere from 8 seconds after an item to 2 days after the test (Kulik & Kulik, 1988), although in many educational settings the feedback may actually occur a week or more later.
Butler et al. (2007) investigated the effects of type and timing of feedback on long-term retention. Subjects read prose passages and completed an initial multiple-choice test. For some responses, they received standard feedback (i.e., the correct answer); for others they received feedback by answering until correct (labeled AUC, i.e., each response was labeled as correct or incorrect, and if incorrect they chose additional options until they answered the question correctly). Half of the subjects received the feedback immediately after each question, while the rest received the feedback after a 1-day delay. One week later, subjects completed a final cued recall test, and these data are shown in Figure 1.4. On the final test, delayed feedback led to substantially better performance than immediate feedback, while the standard feedback condition and the answer-until-correct feedback condition resulted in similar performance. Butler et al. discussed why some studies find immediate feedback to be more beneficial than delayed feedback (e.g., see Kulik & Kulik, 1988, for a review). Namely, this inconsistency might occur if learners do not fully process delayed feedback, which would be particularly likely in applied studies where less experimental control is present. That is, even if students receive delayed feedback they may not look at it or look only at feedback on questions that they missed.
Figure 1.4 Results from Butler, Karpicke, and Roediger (2007). On the final test, delayed feedback (FB) led to substantially better performance than immediate feedback, while the standard feedback condition and the answer-until-correct (AUC) feedback condition resulted in similar performance.
concentrate on others that have not yet been learned. The assumption is that learning to a criterion of one correct recitation means that the item is learned and that further practice on it will be for naught.
Karpicke and Roediger (2008) published a study that questions this common wisdom. In their study, students learned foreign language vocabulary in the form of Swahili-English word pairs. (Swahili was used because students were unfamiliar with the language, and yet the word forms were easily pronounceable for English speakers, such as mashua-boat.) Students learned 40 pairs under one of four conditions. In one condition, students studied and were tested on the 40 pairs in the usual multitrial learning situation favored by psychologists (study-test, study-test, study-test, study-test, labeled the ST condition). In a second condition, students received a similar first study-test cycle, but if they correctly recalled pairs on the test, these pairs were dropped from the next study trial. Thus, across the four trials, the study list got smaller and smaller as students recalled more items. However, in this condition, labeled SNT, students were tested on all 40 pairs during each test period. Thus, relative to the ST condition, the SNT condition involved fewer study opportunities but the same number of tests. In a third condition, labeled STN, students studied and were tested the same way as in the other conditions on the first trial, but after the first trial they repeatedly studied the pairs three more times, but once they had recalled a pair, it was dropped from the test. In this condition, the study sequence stayed the same on four occasions, but the number of items tested became smaller and smaller. Finally, in a fourth condition denoted SNTN, after the first study-test trial, items that were recalled were dropped both from the study and test phase of the experiment for the additional trials. In this case, then, the study list and the test sequence became shorter over trials. This last condition is most like standard advice for using flashcards - students studied and were tested on the pairs until they were recalled, and then they were dropped so that attention could be devoted to unlearned pairs.
Initial learning on the 40 pairs in the four conditions is shown in Figure 1.5, where it can be seen that all four conditions produced equivalent learning. The data in Figure 1.5 show cumulative performance, such that students were given credit the first time they recalled a pair and not again (for those conditions in which multiple recalls of a pair were required, ST and SNT). At the end of the learning phase, students were told that they would come back a week later to be tested again and were asked to predict how many pairs they would recall. Students in all four groups estimated that they would recall about 20 pairs, or 50% of the pairs, a week later. After all, the learning curves were equal, so why would we expect students’ judgments to differ?
Figure 1.6 shows the proportion of items recalled 1 week later, in each of the four conditions. Students in two conditions did very well (ST and SNT, around 80%) and in the other two conditions students did much more poorly (STN and SNTN, around 35%).
Figure 1.5 Initial cumulative learning performance from Karpicke and Roediger (2008). All four conditions produced equivalent learning.
In both the ST and SNT conditions, students were tested on all 40 pairs for all four trials. Note that in the ST condition students studied all 40 items four times, whereas in the SNT condition, the items were dropped from study. However, this reduced amount of study did not matter a bit for retention a week later. Students in the STN and SNTN conditions had only enough testing for each item to be recalled once and, without repeated retrieval, final recall was relatively poor. Once again, the condition with many more study opportunities (STN) did not lead to any appreciably better recall a week later than the condition that had minimal study opportunities (SNTN).
The bottom line from the Karpicke and Roediger (2008) experiment is that after students have retrieved a pair correctly once, repeated retrieval is the key to improved long-term retention. Repeated studying after this point does not much matter.
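The four conditions differ only in whether a pair leaves the study list, the test list, both, or neither once it has been recalled. A minimal Python sketch of that bookkeeping follows. The function name `exposures`, the `first_recall_trial` input, and the example learning curve (10 new pairs first recalled on each trial) are illustrative assumptions, not procedures or data from Karpicke and Roediger (2008). The sketch also assumes, in line with the equivalent learning curves reported, that the trial of first recall does not depend on condition.

```python
from typing import Dict, List, Set, Tuple

# Condition name -> (drop recalled pairs from study?, drop recalled pairs from test?)
CONDITIONS: Dict[str, Tuple[bool, bool]] = {
    "ST": (False, False),   # study and test everything on every trial
    "SNT": (True, False),   # recalled pairs dropped from study only
    "STN": (False, True),   # recalled pairs dropped from test only
    "SNTN": (True, True),   # recalled pairs dropped from both (flashcard advice)
}


def exposures(condition: str, items: List[str],
              first_recall_trial: Dict[str, int],
              n_trials: int = 4) -> Tuple[int, int]:
    """Return (study presentations, test presentations) summed over all trials.

    first_recall_trial maps each item to the 1-based trial on which it is
    first recalled (a hypothetical input chosen for illustration).
    """
    drop_study, drop_test = CONDITIONS[condition]
    recalled: Set[str] = set()
    n_study = n_test = 0
    for trial in range(1, n_trials + 1):
        study_list = [i for i in items if not (drop_study and i in recalled)]
        test_list = [i for i in items if not (drop_test and i in recalled)]
        n_study += len(study_list)
        n_test += len(test_list)
        # Pairs recalled on this trial count as recalled from the next trial on.
        recalled |= {i for i in test_list
                     if first_recall_trial.get(i, n_trials + 1) <= trial}
    return n_study, n_test


# Assumed learning curve: 40 pairs, 10 first recalled on each of 4 trials.
pairs = [f"pair{i}" for i in range(40)]
first_recall = {p: i // 10 + 1 for i, p in enumerate(pairs)}
for name in CONDITIONS:
    print(name, exposures(name, pairs, first_recall))
```

Under this assumed curve, ST yields 160 study and 160 test presentations, SNT 100 and 160, STN 160 and 100, and SNTN 100 and 100. The two conditions with good delayed recall are the two with 160 tests, regardless of how much studying took place.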
Recall that the students in the four conditions predicted that they would do equally well, and recall about 50% after a week. As can be seen in Figure 1.6, the students who were repeatedly tested actually outperformed their predictions, so they underestimated the power of testing. On the other hand, the students who did not have repeated testing overestimated how well they would do. In a later section, we return to the issue of what students know
Figure 1.6 Final learning results from Karpicke and Roediger (2008). Students in the ST and SNT conditions performed very well, and students in the STN and SNTN conditions did much more poorly. In both the ST and SNT conditions, students were tested on all 40 pairs for all four trials. Students in the STN and SNTN conditions had only enough testing for each item to be recalled once and, without repeated retrieval, recall was relatively poor. The condition with many more study opportunities (STN) did not lead to any appreciably better recall a week later than the condition that had minimal study opportunities (SNTN).
about the effects of testing and whether they use testing as a study strategy when left to their own devices.
Repeated testing seems to be great for consolidating information into long-term memory, but is there an optimal schedule for repeated testing? Landauer and Bjork (1978) argued that a condition they called expanding retrieval was optimal, or at least was better than two other schedules called massed practice and equal interval practice. To explain, let us stick with our foreign-language vocabulary learning example above, mashua-boat, and consider patterns in which three retrievals of the target might be carried out. In the immediate test condition, after the item has been presented, mashua would be presented three times in a row for boat to be recalled each time. This condition might provide good practice because, of course, performance on each test would be nearly perfect. (In fact, in most experiments, this massed testing condition leads to 98% or higher correct recall with paired-associates.)
The massed retrieval condition will be denoted 0-0-0 to indicate that the three retrievals occurred back to back, with no other study or test items between retrievals of the target.
A second condition is the equal interval schedule, in which tests are given after a delay from study and at equal intervals after that. So, in a 5-5-5 schedule, a pair like mashua-boat would be presented, then five other pairs or tests would occur, and then mashua-?? would be given as a cue. This same process would occur two more times. Although distributed retrieval could be beneficial relative to massed retrieval, just as distributed study practice is beneficial relative to massed practice, one looming problem occurs in the case of retrieval - if the first test is delayed, recall on the first test may be low and, as discussed above, low performance on a test can reduce or eliminate the power of the testing effect. To overcome this problem, Landauer and Bjork (1978) introduced the idea of expanding retrieval practice, to ensure a nearly errorless early retrieval with a quick first test while at the same time gaining advantages of distributed testing or practice. So, to continue with our example, in an expanding schedule of 1-4-10, students would be tested with mashua-?? after only 1 intervening item, then again after 4 intervening items, and then after 10 intervening items. The idea behind the expanding schedule is familiar to psychologists because it resembles the idea of shaping behavior by successive approximations (Skinner, 1953, Chapter 6); just as schedules of reinforcement (Ferster & Skinner, 1957) exist to shape behavioral responses, so schedules of retrieval may shape the ability to remember. If a student wants to be able to retrieve a vocabulary word long after study, the expanding retrieval schedule may help to shape its retrieval.
Landauer and Bjork (1978) conducted two experiments pitting massed, equal interval, and expanding interval schedules of retrieval against one another. For the latter conditions, they used 5-5-5 spacing and 1-4-10 spacing. Note that this comparison equates the average interval of spacing at
5. The materials in one experiment were fictitious first and last names, such that students were required to produce a last name when given the first name. Landauer and Bjork measured performance on the three initial tests and then on a final test given at the end of the experimental session. The results for the expanding and equal interval retrieval sequences are shown in Table 1.2 for the three initial tests and then the final, criterial, test. Expanding retrieval schedules were better than equal interval schedules, as Landauer and Bjork predicted, on both the initial three tests and then the final criterial test. The 7% advantage of expanding interval retrieval over equally spaced retrieval on the final test was small but statistically significant, and this is the comparison that the authors emphasized in the paper. They replicated the effect in a separate experiment with face-name pairs. However, note a curious fact about the data in Table 1.2: Over the four tests shown, performance drops steadily in the expanding interval condition (from .6[?] to .47) whereas in the equal interval condition performance is essentially flat (.42 to .40). This pattern suggests that on a more delayed final test, the curves might cross over and equal interval retrieval might prove superior to expanding retrieval.
Table 1.2 Mean proportion recalled in Landauer and Bjork’s (1978) experiment on schedules of testing; data are estimated from Figures 1.1 and 1.2. Expanding retrieval schedules were better than equal interval schedules on both the initial three tests and the final criterial test.
              Initial tests
Schedule      1        2       3       Final test
Expanding     .6[?]    .55     .50     .47
Equal         .42      .42     .43     .40
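The three schedule types can be written as the number of intervening items before each retrieval. The short Python sketch below checks the point made in the text that the 5-5-5 and 1-4-10 schedules share the same average spacing, and shows where each test would fall in the list. The names `SCHEDULES`, `mean_interval`, and `retrieval_positions` are hypothetical helpers introduced here, and counting list positions from a study event at position 0 is an assumption made for illustration.

```python
from typing import Dict, List, Sequence

# Number of intervening items before each of the three retrievals.
SCHEDULES: Dict[str, Sequence[int]] = {
    "massed": (0, 0, 0),
    "equal": (5, 5, 5),
    "expanding": (1, 4, 10),
}


def mean_interval(schedule: Sequence[int]) -> float:
    """Average number of intervening items per retrieval."""
    return sum(schedule) / len(schedule)


def retrieval_positions(schedule: Sequence[int]) -> List[int]:
    """List position of each test, with the study event at position 0.

    Each test occupies one position after its intervening items.
    """
    positions = []
    current = 0
    for gap in schedule:
        current += gap + 1
        positions.append(current)
    return positions


for name, schedule in SCHEDULES.items():
    print(name, mean_interval(schedule), retrieval_positions(schedule))
```

With these definitions, the equal and expanding schedules both average 5 intervening items and both place the third test at the same list position. They differ in how early the first test occurs, which is the feature the expanding schedule was designed to exploit.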
Strangely, for some years researchers did not investigate Landauer and Bjork’s (1978) intriguing findings, perhaps because they made such good sense. Most of the studies on retrieval schedules compared expanding and massed retrieval, but did not include the critical equal interval condition needed to compare expanding retrieval to another distributed schedule (e.g., Rea & Modigliani, 1985). All studies making the massed versus expanding retrieval comparison showed expanding retrieval to be more effective, and Balota, Duchek, and Logan (2007) have provided an excellent review of this literature. They show conclusively that massed testing is a poor strategy relative to distributed testing, despite the fact that massed testing produces very high performance on the initial tests (much higher than equal interval testing). Although this might seem commonplace to cognitive psychologists steeped in the literature of massed versus spaced presentation (the spacing effect), from a different perspective the outcome is surprising. Skinner (1958) promoted the notion of errorless retrieval as being the key to learning, and he implemented this approach into his teaching machines and programmed learning. However, current research shows that distributed retrieval is much more effective in promoting later performance than is massed retrieval, even though massed retrieval produces errorless performance.
On the other hand, when comparisons are made between expanding and equal interval schedules, the data are much less conclusive. The other main point established in the Balota et al. (2007) review is that no consistent evidence exists for the advantage of expanding retrieval schedules over equal interval testing sequences. A few studies after Landauer and Bjork’s (1978) seminal study obtained the effect, but the great majority did not. For example, Cull (2000) reported four experiments in which students learned difficult word pairs. Across experiments, he manipulated variables such as intertest interval, feedback or no feedback after the tests, and testing versus restudying the material. The general conclusion drawn from the four experiments was that distributed retrieval produced much better retention on a final test than did massed retrieval, but that it did not matter whether the schedule had uniform or expanded spacing of tests.
Recent research by Karpicke and Roediger (2007) and Logan and Balota
(2008) actually shows a more interesting pattern. On tests that occur a day or more after original learning, equal interval schedules of initial testing actually produce greater long-term retention than do expanding schedules (just the opposite of Landauer and Bjork’s findings). Recall the data in Table 1.2 and how the expanding retrieval testing condition showed a steady decline with repeated tests whereas the equal interval schedule showed essentially no decline. Because the final test in these studies occurred during the same session as initial learning, the retention interval for the final test was fairly short, leaving open the possibility that on a long-delayed test the functions would actually cross. This is just what both Karpicke and Roediger and Logan and Balota found.
Karpicke and Roediger (2007) had students learn word pairs taken from practice tests for the Graduate Record Exam (e.g., sobriquet-nickname, benison-blessing) and tests consisted of giving the first member of the pair and asking for the second. Their initial testing conditions were massed (0-0-0), expanding (1-5-9) and equal interval (5-5-5). In addition, they included two conditions in which students received only a single test after either 1 intervening pair or 5. The design and initial test results are shown on the left side of Table 1.3. Initial test performance was best in the massed condition, next in the expanding condition, and worst in the equally spaced condition, the usual pattern, and students in the single test condition recalled less after a delay of 5 intervening items than after 1. There are no surprises in the initial recall data. Half the students took a final test 10 minutes after the initial learning phase, whereas the rest received the final test 2 days later. These results are shown in the right side of Table 1.3. First consider data at the 10-minute delay. The top three rows show a very nice replication of the pattern reported by Landauer and Bjork (1978): Expanding retrieval in the initial phase produced better recall on the final test than the equal interval schedule, and both of these schedules were better than the massed retrieval
schedule. Also, the single-immediate test produced better delayed recall than the single-delayed test. However, the startling result in this experiment appeared for those subjects who took the test after a 2-day delay. Recall was now best in the equal interval condition (M = .45) relative to the expanding condition (M = .33), although both conditions still produced better performance than in the massed condition (M = .20). Interestingly, performance also reversed across the delay for the two single test conditions: recall was better in the single-immediate condition after 10 minutes, but was reliably better in the single-delayed condition after 2 days.
Table 1.3 Mean proportion recalled in Karpicke and Roediger’s (2007) experiment on schedules of testing. Expanding retrieval in the initial phase produced better recall than the equal interval schedule on the 10-minute delayed test, and both of these schedules were better than the massed retrieval schedule. However, after a 2-day delay, recall was best in the equal interval condition relative to the expanding condition, although both conditions still produced better performance than in the massed condition.
                          Initial tests             Final tests
Condition                 1        2       3        10 min    48 h
Massed (0-0-0)            .98      .98     .98      .47       .20
Expanding (1-5-9)         .78      .76     .77      .7[?]     .33
Equal (5-5-5)             .73      .73     .73      .62       .45
Single-immediate (1)      .8[?]                     .65       .22
Single-delayed (5)        .73                       .57       .30
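The reversal described above can be checked directly from the final-test columns of Table 1.3. The Python sketch below is only an arithmetic illustration. The names `FINAL_RECALL`, `forgetting`, `ranking_after_two_days`, and `single_test_reversal` are hypothetical, and the expanding condition's 10-minute value is left as `None` because it cannot be read reliably here.

```python
from typing import Dict, List, Optional, Tuple

# (10-minute recall, 2-day recall) as mean proportions from Table 1.3.
# None marks a value that is not legible and is deliberately not guessed.
FINAL_RECALL: Dict[str, Tuple[Optional[float], float]] = {
    "massed": (0.47, 0.20),
    "expanding": (None, 0.33),
    "equal": (0.62, 0.45),
    "single-immediate": (0.65, 0.22),
    "single-delayed": (0.57, 0.30),
}


def forgetting(condition: str) -> Optional[float]:
    """Drop in proportion recalled between the 10-minute and 2-day tests."""
    early, late = FINAL_RECALL[condition]
    if early is None:
        return None
    return round(early - late, 2)


def ranking_after_two_days() -> List[str]:
    """Conditions ordered from best to worst recall on the 2-day test."""
    return sorted(FINAL_RECALL, key=lambda c: FINAL_RECALL[c][1], reverse=True)


def single_test_reversal() -> bool:
    """True if the single-immediate test wins at 10 minutes but loses at 2 days."""
    immediate = FINAL_RECALL["single-immediate"]
    delayed = FINAL_RECALL["single-delayed"]
    return immediate[0] > delayed[0] and immediate[1] < delayed[1]


print(ranking_after_two_days())
print({c: forgetting(c) for c in FINAL_RECALL})
print(single_test_reversal())
```

On these values the equal interval condition leads after 2 days and loses less between the two tests (.17) than the massed condition (.27). The single-test conditions show the same crossover: the immediate test is ahead at 10 minutes and behind at 2 days.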
Karpicke and Roediger (2007) argued that, congenial as the idea is, expanding retrieval is not conducive to good long-term retention. Instead, what seems to be important for long-term retention is the difficulty of the first retrieval attempt.
half the statements to be right and half to be wrong, and normally false items are plausible in order to require rather fine discriminations. Similarly, for multiple-choice tests, students receive a question stem and then four possible completions, one of which is correct and three others that are erroneous (but again, statements that might be close to correct). Because erroneous information is presented on the tests, students might learn that incorrect information, especially if no feedback is given (as is often the case in college courses). If the test is especially difficult (meaning a large number of wrong answers are selected), the students may actually leave a test more confused about the material than when they walked in. However, even if conditions are such that students rarely commit errors, it might be that simply reading and carefully considering false statements on true/false tests and distractors on multiple-choice tests can lead later to erroneous knowledge. Several studies have shown that having people simply read statements (whether true or false) increases later judgments that the statements are true (Bacon, 1979; Begg, Armour, & Kerr, 1985; Hasher, Goldstein, & Toppino, 1977). This effect underlies the tactics of propagandists using the "big lie" technique by repeating a statement over and over until the populace believes it, and is also a favored tactic in most US presidential elections. If you repeat an untruth about an opponent often enough, the statement comes to be believed.
Remmers and Remmers (1926) first discussed the idea that incorrect information on tests might mislead students, when the new techniques of true/false and multiple-choice testing were introduced into education (Ruch, 1929). They called this outcome the negative suggestibility effect, although not much research was done on it for many years. Much later Toppino and his colleagues showed that statements presented as distractors on true/false and multiple-choice tests did indeed accrue truth value from their mere presentation, because these statements were judged as more true when mixed with novel statements in appropriately designed experiments (Toppino & Brochin, 1989; Toppino & Luipersbeck, 1993). In a similar vein, Brown (1988) and Jacoby and Hollingshead (1990) showed that exposing students to misspelled words increased misspelling of those words on a later oral test.
Roediger and Marsh (2005) asked whether giving a multiple-choice test (without feedback) would lead to a kind of misinformation effect (Loftus et al., 1978). That is, if students take a multiple-choice test on a subset of facts, and then take a short answer test on all facts, will prior testing increase intrusions of multiple-choice lures on the final test? Roediger and Marsh conducted a series of experiments to address this question, manipulating the difficulty of the material (and hence level of performance on the multiple-choice test) and the number of distractors given on the multiple-choice test. Three experiments were submitted using these tactics, but the editor asked us to drop our first two experiments (which established the phenomenon) and report only a third, control, experiment that showed that the negative suggestibility effect occurred under tightly controlled but not necessarily realistic conditions.
most interesting experiments (in our opinion) were not reported. Our first experiment was exploratory, just to make sure we could get the effects we sought. Of interest was whether selecting distractors on the multiple-choice test would lead to their intrusion on a later test.
multiple-choice performance, and this allowed us to see if any negative effects of testing were limited to conditions in which subjects made more errors on the multiple-choice test (i.e., difficult items and relatively many distractors). The interesting data are contained in Tables 1.5 and 1.6 (again, the top panels devoted to Experiment 1). Table 1.5 shows the proportion of short answer questions answered correctly ("bowling" in response to
Table 1.6 Proportion target incorrect answers on the cued recall test as a function of question difficulty and number of alternatives (including the correct answer) on the prior multiple-choice test. Non-guess responses are in parentheses (proportion correct not including those that received a "not sure" rating). The prior multiple-choice test led to more errors on the final test and this effect grew larger when more distractors had been presented on the multiple-choice test. Removing the "not sure" responses reduced the size of the negative suggestibility effect, but left the basic pattern intact.

                        Number of previous alternatives
                        Zero (not-tested)   Two             Three           Four
Experiment 1
  Easy questions        .08                 [illegible]     [illegible]     [illegible]
                        (.05)               (.06)           ([illegible])   (.08)
  Hard questions        [illegible]         .26             .34             .36
                        ([illegible])       (.20)           ([illegible])   (.24)
Experiment 2
  Read passages         .05                 [illegible]     [illegible]     [illegible]
                        ([illegible])       (.06)           (.05)           (.07)
  Non-read passages     [illegible]         .24             .25             .37
                        (.03)               ([illegible])   ([illegible])   ([illegible])

The data show clearly that taking a multiple-choice test can simultaneously enhance performance on a later cued recall test (a positive testing effect) and harm performance (a negative suggestibility effect). The former effect comes from questions answered correctly on the multiple-choice test, whereas the latter effect arises from errors committed on the multiple-choice test. In fact, 78% of the multiple-choice lure answers on the final test had been selected erroneously on the prior multiple-choice test. This result is noteworthy because it suggests that any negative effects of multiple-choice testing require selection of an incorrect answer, and that simply reading the lures (and then selecting the correct answer) is not problematic.
In Experiment 2 we examined whether students would show the same effects when learning from prose materials.
The multiple-choice data are displayed in the bottom part of Table 1.4. Again, the results are straightforward: Not surprisingly, performance on unread passages was lower than for read passages, and performance generally declined with the number of distractors (albeit more for unread than read passages). Once again, the manipulations succeeded in varying multiple-choice performance across a fairly wide range.
The consequences of multiple-choice testing can be seen in the bottom of Table 1.5, which shows the proportion of final short answer questions answered correctly. A positive testing effect occurred for both read and unread passages, although for unread passages the effect declined with the number of distractors on the prior multiple-choice test. Still, as in the first experiment, a positive testing effect was observed in all conditions, even when "not sure" responses were removed (the data in parentheses).
As can be seen in Table 1.6, the negative suggestibility effect also appeared in full force in Experiment 2, although it was greater for the nonread passages, with their corresponding higher rate of errors on the multiple-choice test than for the read passages. For the read passages, the error rate nearly doubled after the multiple-choice test, from 5% to 9%.
answers sounds more interesting when one thinks about the importance of endorsing multiple-choice lures for the negative suggestibility effect. One group of undergraduates was warned they would receive a penalty for wrong answers and that they should choose a "don't know" option if they were not reasonably sure of their answer. Another group was required to answer all of the multiple-choice questions. Both groups showed large positive testing effects, and smaller negative testing effects. Critically, the penalty instruction significantly reduced the negative testing effect, although it was still significant.
Research on negative suggestibility is just beginning, and only a few variables have been systematically investigated. Three classes of variables are likely to be interesting: ones that affect how likely subjects are to select multiple-choice lures (e.g., reading related material, a penalty for wrong answers on the MC test), ones that affect the likelihood that selected multiple-choice lures are integrated with related world knowledge (e.g., corrective feedback), and ones that affect monitoring at test (e.g., the warning against guessing on the final test used in Roediger & Marsh, 2005). The negative testing effect could change in size for any of these reasons. For example, consider one recent investigation involving the effects of adding a "none of the above" option to the MC test (Odegard & Koen, 2007).
constraints (e.g., time pressure), and online monitoring during the learning experience (i.e., subjective assessments of how well the material has been learned; Benjamin, 2007). In other words, a student's beliefs about learning and memory and his or her subjective evaluations during the learning experience are vital to effective learning (Dunlosky, Hertzog, Kennedy, & Thiede, 2005). In this section we shall discuss the metacognitive factors concomitant with testing, how testing can improve monitoring accuracy, as well as the use of self-testing as a study strategy by students.
Research on metacognition provides a framework for examining how students strategically monitor and regulate their learning. Monitoring refers to a person's subjective assessment of their cognitive processes, and control refers to the processes that regulate behavior as a consequence of monitoring (Nelson & Narens, 1990). One indirect way in which testing can enhance future learning is by allowing students to better monitor their learning (i.e., discriminate information that has been learned well from that which has not been learned). Enhanced monitoring in turn influences subsequent study behavior, such as having students channel their efforts towards less well-learned materials. A survey of college students' study habits revealed that students are generally aware of this function of testing (Kornell & Bjork, 2007). In response to the question "If you quiz yourself while you study, why do you do so?" 68% of respondents chose "To figure out how well I have learned the information I'm studying," while only 18% selected "I learn more that way than through rereading," suggesting that relatively few students view testing as a learning event (see too Karpicke, Butler, & Roediger, 2009).
To gain insight into subjects' monitoring abilities, researchers ask them to make judgments of learning (JOLs). Normally, during study, students predict their ability to remember the to-be-learned information at a later point in time (usually on a scale of 0–100%), and then these predictions are compared to their actual performance. Usually people are moderately accurate when making these predictions in laboratory paradigms (e.g., Arbuckle & Cuddy, 1969), but JOLs are inferential in nature and can be based on a variety of beliefs and cues (Koriat, 1997). The accuracy of one's metacognitive monitoring depends on the extent to which the beliefs and cues that one uses are diagnostic of future memory performance, and some of students' beliefs about learning are wrong. For example, subjects believe that items that are easily processed will be easy to retrieve later (e.g., Begg, Duft, Lalonde, Melnick, & Sanvito, 1989), whereas we have already discussed that more effortful retrieval is more likely to promote retention. Similarly, students tend to give higher JOLs after repeated study than after receiving initial tests on the to-be-remembered material, but actual final memory performance exhibits the opposite pattern (i.e., the testing effect; Agarwal et al., 2008; Kang, 2009a; Roediger & Karpicke, 2006a). Repeated studying of the material probably engenders greater processing fluency, which leads to an overestimation of one's future memory performance.
Students' incorrect beliefs about memory mean that they often engage in suboptimal learning strategies. For example, JOLs are often negatively correlated with study times during learning, meaning that students spend more time studying items that they feel are difficult and that they still need to master (Son & Metcalfe, 2000; although see Metcalfe & Kornell [2003] for conditions that produce an exception to this generalization). Not only is testing a better strategy, but sometimes substantial increases in study time are not accompanied by equivalent increases in performance, an outcome termed the labor-in-vain effect (Nelson & Leonesio, 1988).
Consider a study by Karpicke (in press) that examined subjects' strategies for learning Swahili–English word pairs. Critically, the experiment had repeated study–test cycles (multi-trial learning) and once subjects were able to correctly recall the English word (when cued with the Swahili word) they were given the choice of whether to restudy, test, or drop an item for the upcoming trial, with the goal of maximizing performance on a final test 1 week later. Subjects chose to drop the majority of items (60%), while about 25% and 15% of the items were selected for repeated testing and restudy, respectively. Subjects also made JOLs before making each choice, and items selected for restudy were subjectively the most difficult (i.e., lowest JOLs), dropped items were perceived to be the easiest, and items selected for testing were in between. As expected, final performance increased as a function of the proportion of items chosen to be tested, whereas there was no relationship between the proportion of items chosen for restudy and final recall. Finally, there was a negative correlation between the proportion of items dropped and final recall, indicating that subjects dropped items before they had firmly registered the pair.
These results suggest that learners often make suboptimal choices during learning, opting for strategies that do not maximize subsequent retention. Also, the tendency to drop items once they were recalled the first time reflects overconfidence and under-appreciation of the value of practicing retrieval. Follow-up research in our lab (Kang, 2009b) is investigating whether experiencing the testing effect (i.e., performing well on a final test for items previously tested, relative to items that were previously dropped or restudied) can induce learners to select preferentially self-testing study strategies that enhance future recall.
[...] JOLs suggest an important role for testing in improving monitoring accuracy. Delayed JOLs refer to ones solicited at some delay after the items have been studied, whereas immediate JOLs are solicited immediately after each item has been studied. Delayed JOLs are typically more accurate than immediate JOLs (e.g., Nelson & Dunlosky, 1991). This delayed JOL effect is obtained only under certain conditions, specifically when the JOLs are "cue-only" JOLs. This term refers to the situation in which studied items are A–B pairs and subjects are provided only with A when asked to make their prediction for later recall of the target B; the effect does not occur when JOLs are sought with intact cue-target pairs presented (Dunlosky & Nelson, 1992). One explanation for this finding is that subjects attempt retrieval of the target for cue-only delayed JOLs, and success or failure at retrieval then guides subjects' predictions (i.e., a high JOL is given if the target is successfully retrieved; if not then a low JOL is given). This enhanced ability to distinguish well-learned from less well-learned items, coupled with the testing effect on items retrieved successfully during the delayed JOL, has been proposed to account for the increased accuracy of delayed JOLs (Spellman & Bjork, 1992; Kelemen & [...]
[...] JOLs were not always accurate: Koriat and Bjork (2005) had subjects learn paired associates, including forward-associated pairs (e.g., cheddar–cheese), backwards-associated pairs (e.g., cheese–cheddar), and unrelated pairs. During learning, subjects were asked to judge how likely it was that they would remember the 2nd word in the pair. Subjects over-predicted their ability to remember the target in the backwards-associated pairs, and the authors dubbed this an "illusion of competence." On the first study–test cycle, subjects showed the same overconfidence for the backward-associated pairs, but JOLs and recall performance became better calibrated with further study–test opportunities. This finding suggests that prior test experience can enhance learners' sensitivity to retrieval conditions on a subsequent test, and can be a way to improve metacognitive monitoring.
The studies discussed in this section converge on the general conclusion that the majority of college students are unaware of the mnemonic benefit of testing: [...] engage in suboptimal study behavior. Even though college students might be expected to be expert learners (given their many years of schooling and experience preparing for exams), they often labor in vain (e.g., rereading the text) instead of employing strategies that contribute to robust learning and retention. Self-testing may be unappealing to many students because of the greater effort required compared to rereading, but this difficulty during learning turns out to be beneficial for long-term performance (Bjork, 1994). Therefore, the challenge for future research is to uncover conditions that encourage learners to set aside their naïve intuitions when studying and opt for retrieval-based strategies that yield lasting results.
APPLICATIONS OF TESTING IN CLASSROOMS
Recently, several journal articles have highlighted the importance of using tests and quizzes to improve learning in real educational situations. The notion of using testing to enhance student learning is not novel, however, as Gates employed this practice with elementary school students in 1917 (see too Jones [1923] and Spitzer [1939]). One cannot, however, assume that laboratory findings necessarily generalize to classroom situations, given that some laboratory parameters (e.g., relatively short retention intervals, tight experimental control) do not correspond well to naturalistic contexts. This distinction has garnered interest recently and we will outline a few studies that have evaluated the efficacy of test-enhanced learning within a classroom context.
Leeming (2002) adopted an "exam-a-day" procedure in two sections of Introductory Psychology and two sections of his summer Learning and Memory course, for a total of 22 to 24 exams over the duration of the courses. In comparable classes taught in prior semesters, students had received only four exams. Final retention was measured after 6 weeks. Leeming found significant increases in performance between the exam-a-day procedure and the four exam procedure in both the Introductory Psychology sections (80% vs. 74%) and Learning and Memory sections ([illegible] vs. [illegible]). In addition, the percentage of students who failed the course decreased following the exam-a-day procedure. Leeming's students also participated in a survey, and students in the exam-a-day sections reported increased interest and studying for class.
McDaniel et al. (2007a) described a study in an online Brain and Behavior course that used two kinds of initial test questions, short answer and multiple-choice, as well as a read-only condition. [...] examinations in multiple-choice format were given after 3 weeks of quizzes, and a final cumulative multiple-choice assessment at the end of the semester covered material from both units. Although facts targeted on the initial quizzes were repeated on the unit/final exams, the question stems were phrased differently so that the learning of concepts was assessed rather than memory for a prior test response. On the two unit exams, retention for quizzed material was significantly greater than that for non-quizzed material, regardless of the initial quiz format. On the final exam, however, only short answer (but not multiple-choice) initial quizzes produced a significant benefit above non-quizzed and read-only material. The results from this study provide further evidence of the strength of the testing effect in classroom settings, as well as replicating prior findings showing that short answer tests produce a greater testing effect than do multiple-choice tests (e.g., Butler & Roediger, 2007; Kang et al., 2007).
McDaniel and Sun (2009) replicated these findings in a more traditional college classroom setting, in which students took two short-answer quizzes per week. The quizzes were emailed to the students, who had to complete them by noon the next day. After emailing their quiz back to the professor, students received an email with the quiz questions and correct answers. Retention was measured on unit exams, composed of quizzed and non-quizzed material, and administered at the end of every week. Performance for quizzed material was significantly greater than performance on non-quizzed material.
Finally, Roediger, McDaniel, McDermott, and Agarwal (2009) conducted various test-enhanced learning experiments at a middle school in Illinois. The experiments were fully integrated into the classroom schedules and used material drawn directly from the school and classroom curriculum. In the first study, 6th grade social studies, 7th grade English, and 8th grade science students completed initial multiple-choice quizzes over half of the classroom material. The other half of the material served as the control material. The teacher in the class left the classroom during administration of quizzes, so she did not know the content of the quizzes and could not bias her instruction toward (or against) the tested material. The initial quizzes included a pre-test before the teacher reviewed the material in class, a post-test immediately following the teacher's lecture, and a review test a few days after the teacher's lecture. Upon completion of a 3- to 6-week unit, retention was measured on chapter exams composed of both quizzed and non-quizzed material. At all three grade levels, and in all three content areas, significant testing effects were revealed such that retention for quizzed material was greater than for non-quizzed material, even up to 9 months later (at the end of the school year). The results from 8th grade science, for example, can be seen in Figure 1.7.
This experiment was replicated with 6th grade social studies students who, instead of completing in-class multiple-choice quizzes, participated in games online using an interactive website at their leisure. This design was implemented in order to minimize the amount of class time required for a test-enhanced learning program. Despite being left to their own devices, students still performed better on quizzed material available online than non-quizzed material on their final chapter exams. Furthermore, in a subsequent experiment with 6th grade social studies students, a read-only condition was included, and performance for quizzed material was still significantly greater than read-only and non-quizzed material, even when the number of exposures was equated between the quizzed and read-only conditions.

Figure 1.7 Science results from Roediger, McDaniel, McDermott, and Agarwal (2009). Significant testing effects in a middle school setting were revealed such that retention for quizzed material was greater than for non-quizzed material, even up to 9 months later (at the end of the school year).

In sum, recent research is beginning to demonstrate the robust effects of testing in applied settings, including middle school and college classrooms. Future research extending to more content areas (e.g., math), age groups (e.g., elementary school students), methods of quizzing (e.g., computer-based and online), and types of material (e.g., application and transfer questions), we expect, will only provide further support for test-enhanced learning programs.
CONCLUSION
In this chapter, we have reviewed evidence supporting test-enhanced learning in the classroom and as a study strategy (i.e., self-testing) for improving student performance. Frequent classroom testing has both indirect and direct benefits. The indirect benefits are that students study for more time and with greater regularity when tests are frequent, because the specter of a looming test encourages studying. The direct benefit is that testing on material serves as a potent enhancer of retention for this material on future tests, either relative to no activity or even relative to restudying material. Providing correct answer feedback on tests and ensuring that students carefully process this feedback greatly enhances this testing effect. Feedback is especially important when initial test performance is low. Multiple tests produce a larger testing effect than does a single test. In addition, tests requiring production of answers (short answer or essay tests) produce a greater testing effect than do recognition tests (multiple-choice or true/false). The latter tests also have the disadvantage of exposing students to erroneous information, but giving feedback eliminates this problem. Test-enhanced learning is not limited to laboratory materials; it improves performance with educational materials (foreign language vocabulary, science passages) and in actual classroom settings (ranging from middle school classes in social studies, English, and science, to university classes in introductory psychology and biological bases of behavior).

[...] On the analysis of the factors of recall in the learning process. Psychological Monographs, 11, 159–177.
Agarwal, P. K., Karpicke, J. D., Kang, S. H. K., Roediger, H. L., & McDermott, K. B. (2008). Examining the testing effect with open- and closed-book tests. Applied Cognitive Psychology, 22, 861–876.
Arbuckle, T. Y., & Cuddy, L. L. (1969). Discrimination of item strength at time of presentation. Journal of Experimental Psychology, 81, 126–131.
Bacon, F. T. (1979). Credibility of repeated statements: Memory for trivia. Journal of Experimental Psychology: Human Learning and Memory, 5, 241–252.
Balota, D. A., Duchek, J. M., & Logan, J. M. (2007). Is expanded retrieval practice a superior form of spaced retrieval? A critical review of the extant literature. In J. S. Nairne (Ed.), The foundations of remembering: Essays in honor of Henry L. Roediger, III (pp. 83–105). New York: Psychology Press.
Bangert-Drowns, R. L., Kulik, C. C., Kulik, J. A., & Morgan, M. (1991). The instructional effect of feedback in test-like events. Review of Educational Research, 61, 213–238.
Begg, I., Armour, V., & Kerr, T. (1985). On believing what we remember. Canadian Journal of Behavioral Science, 17, 199–214.
Begg, I., Duft, S., Lalonde, P., Melnick, R., & Sanvito, J. (1989). Memory predictions are based on ease of processing. Journal of Memory and Language, 28, 610–632.
Benjamin, A. S. (2007). Memory is more than just remembering: Strategic control of encoding, accessing memory, and making decisions. In A. S. Benjamin & B. H. Ross (Eds.), The psychology of learning and motivation: Skill and strategy in memory use (Vol. 48, pp. 175–223). London: Academic Press.
Bjork, R. A. (1975). Retrieval as a memory modifier: An interpretation of negative recency and related phenomena. In R. L. Solso (Ed.), Information processing and cognition: The Loyola Symposium (pp. 123–144). New York: [...]
[...] (JOL) and the delayed-JOL effect. Memory and Cognition, 20, 374–380.
Fazio, L. K., & Marsh, E. J. (2009). Surprising feedback improves later memory. Psychonomic Bulletin and Review, 16, 88–92.
Feller, M. (1994). Open-book testing and education for the future. Studies in Educational Evaluation, 20, 235–238.
Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement. New York: Appleton-Century-Crofts.
Gates, A. I. (1917). Recitation as a factor in memorizing. Archives of Psychology, 6(40).
Glover, J. A. (1989). The "testing" phenomenon: Not gone but nearly forgotten. Journal of Educational Psychology, 81, 392–399.
Hasher, L., Goldstein, D., & Toppino, T. (1977). Frequency and the conference of referential validity. Journal of Verbal Learning and Verbal Behavior, 16, 107–112.
Jacoby, L. L., & Hollingshead, A. (1990). Reading student essays may be hazardous to your spelling: Effects of reading incorrectly and correctly spelled words. Canadian Journal of Psychology, 44, 345–358.
Jones, H. E. (1923). The effects of examination on the performance of learning. Archives of Psychology, 10, 1–70.
Kang, S. H. K. (2009a). Enhancing visuo-spatial learning: The benefit of retrieval practice. Manuscript under revision.
Kang, S. H. K. (2009b). The influence of test expectancy, test format and test experience on study strategy selection and long-term retention. Unpublished doctoral dissertation, [...], USA.
Kang, S. H. K., McDermott, K. B., & Roediger, H. L. (2007). Test format and corrective feedback modify the effect of testing on long-term retention. European Journal of Cognitive Psychology, 19, 528–558.
Karpicke, J. D. (in press). Metacognitive control and strategy selection: Deciding to practice retrieval during learning. Journal of Experimental Psychology: General.
Karpicke, J. D., Butler, A. C., & Roediger, H. L. (2009). Metacognitive strategies in student learning: Do students practise retrieval when they study on their own? Memory, 17, 471–479.
Karpicke, J. D., & Roediger, H. L. (2007). Expanding retrieval practice promotes short-term retention, but equally spaced retrieval enhances long-term retention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 704–719.
Karpicke, J. D., & Roediger, H. L. (2008). The critical importance of retrieval for learning. Science, 319, 966–968.
Kelemen, [...]
Leeming, F. C. (2002). The exam-a-day procedure improves performance in Psychology classes. Teaching of Psychology, 29, 210–212.
Loftus, E. F., Miller, D. G., & Burns, H. J. (1978). Semantic integration of verbal information into a visual memory. Journal of Experimental Psychology: Human Learning and Memory, 4, 19–31.
Logan, J. M., & Balota, D. A. (2008). Expanded vs. equal interval spaced retrieval practice: Exploring different schedules of spacing and retention interval in younger and older adults. Aging, Neuropsychology, and Cognition, 15, 257–280.
McDaniel, M. A., Anderson, J. L., Derbish, M. H., & Morrisette, N. (2007a). Testing the testing effect in the classroom. European Journal of Cognitive Psychology, 19, 494–513.
McDaniel, M. A., Roediger, H. L., III, & McDermott, K. B. (2007b). Generalizing test-enhanced learning from the laboratory to the classroom. Psychonomic Bulletin and Review, 14, 200–206.
McDaniel, M. A., & Sun, J. (2009). The testing effect: Experimental evidence in a college course. Manuscript under revision.
Marsh, E. J., Agarwal, P. K., & Roediger, H. L., III (2009). Memorial consequences of answering SAT II questions. Journal of Experimental Psychology: Applied, 15, 1–11.
Marsh, E. J., Roediger, H. L., III, Bjork, R. A., & Bjork, E. L. (2007). The memorial consequences of multiple-choice testing. Psychonomic Bulletin and Review, 14, 194–199.
Metcalfe, J., & Kornell, N. (2003). The dynamics of learning and allocation of study time to a region of proximal learning. Journal of Experimental Psychology: General, 132, 530–542.
Moreno, R. (2004). Decreasing cognitive load for novice students: Effects of explanatory versus corrective feedback in discovery-based multimedia. Instructional Science, 32, 99–113.
Nelson, T. O., & Dunlosky, J. (1991).
Nelson, T. O., & Narens, L. (1980). Norms of 300 general-information questions: Accuracy of recall, latency of recall, and feeling-of-knowing ratings. Journal of Verbal Learning and Verbal Behavior, 19, 338–368.
Nelson, T. O., & Narens, L. (1990). Metamemory: A theoretical framework and new findings. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 26, pp. 125–173). New York: Academic Press.
Odegard, T. N., & Koen, J. D. (2007). None of the above as a correct and incorrect alternative on a multiple-choice test: Implications for the testing effect. Memory, 15, 873–885.
Pashler, H., Cepeda, N. J.,
Remmers, H. H., & Remmers, E. M. (1926). The negative suggestion effect on true-false examination questions. Journal of Educational Psychology, 17, 52–56.
Richland, L. E., Kornell, N., & Kao, L. S. (2009). The pretesting effect: Do unsuccessful retrieval attempts enhance learning? Journal of Experimental Psychology: Applied, 15, 243–257.
Roediger, H. L., & Karpicke, J. D. (2006a). Test enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17, 249–255.
Roediger, H. L., & Karpicke, J. D. (2006b). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1, 181–210.
Roediger, H. L., McDaniel, M. A., McDermott, K. B., & Agarwal, P. K. (2009). Test-enhanced learning in the classroom: The Columbia Middle School project. Manuscript in preparation.
Roediger, H. L., & Marsh, E. J. (2005). The positive and negative consequences of multiple-choice testing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 1155–1159.
Roediger, H. L., Zaromb, F. M., & Butler, A. C. (2008). The role of repeated retrieval in shaping collective memory. In P. Boyer and J. V.