How multiple-choice items distort test takers' results in tests of structure in Thailand
A presentation by Mick Currie and Nanta Chiramanee
Prince of Songkla University, Hatyai
14th December 2007
Overview
The context of the study: Thailand
Asia and the EFL/ESL world
Previous studies
Our study: Subjects
Methodology
Findings
Implications
Your questions and comments
Thailand has not been able to develop widespread communicative skills in English among its population.
Studies have repeatedly found that many teachers do not teach English as a communicative skill.
e.g. Musigrunsi (2002), Prapaisit (2003), Thongsri (2005)
All identified tests as one of the reasons.
What is being tested? Communicative skills or grammar?
And how is it being tested?
Currie (2007): 97% of students interviewed had been tested in grammar; fewer than 60% had been tested in writing or speaking
Tests overwhelmingly used the multiple-choice format
Upshur and Palmer (cited in Canale and Swain, 1980): the measurement of linguistic accuracy in Thai students is not an accurate predictor of their ability to communicate
Knox (1996): “If the ability to communicate in English is to be taught (and more importantly to be learned) it is vital that this ability also be tested.”
Thailand is not alone.
Korea: Li (1998)
Japan: Gorsuch (2000)
- Multiple-choice university entrance examinations affect the way teachers teach and students want to learn
China: Liu (2007)
- Tests in China concentrate on ‘linguistic competence’
- Multiple choice is the main method in high-stakes tests taken annually by more than 8,000,000 students
Little research has been conducted into whether multiple-choice tests effectively assess linguistic knowledge
No published studies have compared stem-equivalent items in tests of structure
Rodriguez (2003): a meta-analysis of studies of construct equivalence; found high correlations between stem-equivalent items and identified differences in effects across domains
Pike (1979) compared constructed-response and multiple-choice formats by correlations of reliability (items not stem-equivalent)
Shohamy (1984) compared stem-equivalent items in reading
Cheng (2004) compared stem-equivalent items in listening
Both found large format effects induced by multiple-choice items
Our research methodology
Subjects: 152 first- and third-year students from Prince of Songkla University
Instruments: a short-answer test with 40 structure items
Three multiple-choice tests in 3-, 4- and 5-option formats
A post-test questionnaire: why did subjects change their answer on one selected item?
Procedure: all subjects sat the short-answer test
Groups of 52, 55 and 45 sat the 3-, 4- and 5-option multiple-choice tests respectively, 5 to 6 weeks later
Post-test questionnaire after the 2nd test
Our research methodology
Example of the item construction method: structure section, item #10
Constructed-response (short-answer) item: stem only
Man: ________ a bank in the university? Student: Yes, it’s opposite the science faculty.
Multiple-choice items (3, 4 and 5 options): stem and options
Man: ________ a bank in the university? Student: Yes, it’s opposite the science faculty.
Option | 3-option | 4-option | 5-option (number of subjects who gave this answer in the short-answer test)
a. | Is | Is | Is (19)
b. | Where is | Where | Is there* (18)
c. | Is there* | Where is | Where is (39)
d. | | Is there* | Where (17)
e. | | | Have (17)
* Expected response
Our research methodology
Analysis: comparison of subjects' scores between the two tests
Comparison of item performance
Direct comparison of subjects' responses (item by item) in the two tests
Controls: control group; control items (4-option) in all multiple-choice tests; criterion-referenced test data
1st-year subjects: O-net scores; 3rd-year subjects: mid-term test
Established: groups of equal ability and no practice effect
Our Findings:
Comparison of multiple-choice test scores with O-net scores
Items | O-net | Study | Correlation | t value
Control items | 44.69% | 44.00% | 0.493** | 0.538
Composite m/c | 44.69% | 42.80% | 0.710** | 2.083*
*significant at p < 0.05; **significant at p < 0.001 (df = 156) (df = 157)
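Both comparisons report a correlation together with a significance level. As a minimal sketch, assuming ordinary Pearson correlations tested against zero with df = n - 2 (the slides do not name the procedure), r and its t statistic can be computed as below; the function names and the five-student scores are hypothetical, not the study's data.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two paired score lists."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length lists with at least 3 scores")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r: float, n: int) -> tuple[float, int]:
    """t statistic for testing H0: rho = 0, with df = n - 2."""
    if abs(r) >= 1:
        raise ValueError("t is undefined for |r| = 1")
    df = n - 2
    return r * math.sqrt(df / (1 - r * r)), df


# Hypothetical O-net and multiple-choice percentages for five students
onet = [30.0, 42.5, 47.5, 55.0, 61.0]
mc = [28.0, 45.0, 40.0, 58.0, 60.0]
r = pearson_r(onet, mc)
t, df = r_to_t(r, len(onet))
print(f"r = {r:.3f}, t = {t:.3f}, df = {df}")
```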
Our Findings:
Measure | Short-answer test | 3-option m/choice test | 4-option m/choice test | 5-option m/choice test
Group n | 152 | 52 | 55 | 45
Mean score (31 items) | 19.38% | 52.23% | 46.45% | 45.10%
Min - max | 0 - 83.87% | 12.90 - 90.32% | 9.68 - 96.77% | 9.68 - 90.32%
Reliability (alpha) | 0.880 | 0.872 | 0.890 | 0.885
Mean item facility | 0.19 | 0.52 | 0.46 | 0.45
Mean discrimination | 0.45 | 0.45 | 0.48 | 0.54
Individual difference: 2nd test over 1st test | - | 33.33% (10.23 items) | 26.86% (8.33 items) | 25.74% (7.98 items)
Min - max increase | - | 3.23 - 67.74% | 6.45 - 51.61% | 0 - 51.61%
t value (1st/2nd test) | - | 20.646* (df = 51) | 18.050* (df = 54) | 15.206* (df = 51)
F value (ANOVA) on m/choice tests: 1.472 (df = 2 & 149)
*significant at p < 0.001
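The item statistics in this table are classical test-theory quantities. The sketch below shows one way to compute them from a 0/1 scored response matrix (rows are test takers, columns are items). It is an illustration under assumptions, not the authors' procedure: the slides do not say which discrimination index was used, so the upper-minus-lower index with 27% groups is assumed, and the function names and data are hypothetical. Note that mean item facility is simply the mean score expressed as a proportion, which is why 19.38% appears as 0.19.

```python
def item_facility(responses: list[list[int]]) -> list[float]:
    """Proportion of test takers answering each item correctly (0/1 scoring)."""
    n = len(responses)
    return [sum(row[i] for row in responses) / n for i in range(len(responses[0]))]


def item_discrimination(responses: list[list[int]], fraction: float = 0.27) -> list[float]:
    """Upper-minus-lower index (assumed): facility in the top-scoring group minus
    facility in the bottom-scoring group, each group being `fraction` of takers."""
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    upper, lower = item_facility(ranked[:k]), item_facility(ranked[-k:])
    return [u - l for u, l in zip(upper, lower)]


def cronbach_alpha(responses: list[list[int]]) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(responses[0])

    def variance(xs: list[float]) -> float:
        # Population variance; the same denominator is used for items and
        # totals, so the n vs n - 1 choice cancels out of the ratio.
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = sum(variance([row[i] for row in responses]) for i in range(k))
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_vars / total_var)


# Hypothetical 0/1 scores: 6 test takers x 4 items
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(item_facility(scores))
print(item_discrimination(scores))
print(round(cronbach_alpha(scores), 3))
```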
Figure: Individual scores: 1st & 2nd test (3-option, 4-option and 5-option groups; 1st test (short answer) vs 2nd test (multiple choice); score axis 0 to 30)
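The t values reported for each group compare every subject's 1st-test score with his or her 2nd-test score. Assuming these are paired (dependent-samples) t-tests, which the slides imply but do not name, t is the mean gain divided by its standard error, with df = n - 1. A minimal sketch; the function name and the scores out of 31 items are hypothetical.

```python
import math
import statistics


def paired_t(first: list[float], second: list[float]) -> tuple[float, int]:
    """Paired t-test on gains (second - first); returns (t, df) with df = n - 1."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need paired scores for at least two test takers")
    gains = [b - a for a, b in zip(first, second)]
    sd = statistics.stdev(gains)  # sample SD, n - 1 denominator
    if sd == 0:
        raise ValueError("all gains are identical; t is undefined")
    n = len(gains)
    return statistics.mean(gains) / (sd / math.sqrt(n)), n - 1


# Hypothetical raw scores (out of 31 items) for five test takers
short_answer = [4, 6, 9, 12, 15]
multiple_choice = [14, 15, 20, 21, 26]
t, df = paired_t(short_answer, multiple_choice)
print(f"t = {t:.3f}, df = {df}")
```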
Figure: Individual scores: 1st test vs 2nd test (x-axis: score in 1st test; y-axis: score in 2nd test; both 0 to 30; 3-option, 4-option and 5-option groups)
Our Findings:
Test group | Correlation: 1st test score with 2nd test score | Corrected for attenuation | Significance (p <) | df
3-option | 0.842 | 0.966 | 0.001 | 50
4-option | 0.880 | 0.980 | 0.001 | 53
5-option | 0.886 | exceeds 1 | 0.001 | 43
Test group | Correlation: 1st test score with increase in score | Corrected for attenuation | Significance (p <) | df
3-option | 0.133 | - | - | 50
4-option | 0.149 | - | - | 53
5-option | 0.426 | - | 0.01 | 43
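The "Corrected for attenuation" column applies Spearman's correction, which divides the observed correlation by the square root of the product of the two tests' reliabilities. The slides do not state which reliability estimates were used, so the inputs below are an assumption: with the whole-group alphas reported earlier (0.880 for the short-answer test, 0.885 for the 5-option test), the 5-option correlation of 0.886 corrects to about 1.004, in line with the "exceeds 1" entry. The same whole-group alphas give about 0.961 and 0.994 for the 3- and 4-option rows rather than the reported 0.966 and 0.980, so the authors' exact inputs evidently differed. The function name is hypothetical.

```python
import math


def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation: r_xy / sqrt(rel_x * rel_y).

    Results above 1 can occur when reliabilities are underestimated or the
    observed correlation is inflated by sampling error.
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# 5-option row, using the whole-group alphas as assumed reliability inputs
print(round(disattenuate(0.886, 0.880, 0.885), 3))
```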
Our Findings:
Code | 1st test response | 1st-test answer among 2nd-test options? | Response in 2nd test
A | no answer | - | incorrect
B | no answer | - | correct
C | incorrect | available | incorrect (different from 1st test)
D | incorrect | available | incorrect (same as 1st test)
E | incorrect | available | correct
F | correct | available | incorrect
G | correct | available | correct
H | incorrect | not available | incorrect
J | incorrect | not available | correct
K | acceptable | not available | correct
L | acceptable | not available | incorrect
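The code table above works as a small decision table. A minimal sketch of how a single item response could be assigned its code, assuming each 1st-test answer has already been judged "none", "incorrect", "correct" or "acceptable"; the function names, argument names and sample responses are hypothetical.

```python
from collections import Counter
from typing import Optional


def pattern_code(first: str, available: Optional[bool],
                 second_correct: bool, same_as_first: bool = False) -> str:
    """Assign one of the codes A-L to a single item response.

    first: 1st-test answer judged 'none', 'incorrect', 'correct' or 'acceptable'
    available: is the 1st-test answer among the 2nd-test options? (unused for 'none')
    second_correct: was the keyed option chosen in the 2nd test?
    same_as_first: for an incorrect 2nd-test choice, does it repeat the 1st answer?
    """
    if first == "none":
        return "B" if second_correct else "A"
    if first == "correct":  # the code table lists correct answers only as available
        return "G" if second_correct else "F"
    if first == "acceptable":  # the code table lists these only as not available
        return "K" if second_correct else "L"
    if first == "incorrect":
        if not available:
            return "J" if second_correct else "H"
        if second_correct:
            return "E"
        return "D" if same_as_first else "C"
    raise ValueError(f"unknown 1st-test judgement: {first!r}")


def pattern_percentages(codes: list[str]) -> dict[str, float]:
    """Percentage of responses falling into each pattern code."""
    counts = Counter(codes)
    return {code: 100 * n / len(codes) for code, n in sorted(counts.items())}


# Hypothetical responses: (1st-test judgement, available, 2nd correct, same answer)
responses = [
    ("incorrect", False, True, False),
    ("incorrect", True, False, True),
    ("correct", True, True, False),
    ("none", None, False, False),
]
codes = [pattern_code(*r) for r in responses]
print(codes)
print(pattern_percentages(codes))
```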
Our Findings:
Pattern | 3-option group | 4-option group | 5-option group | Overall
A | 2.85% | 2.58% | 2.94% | 2.78%
B | 1.99% | 1.88% | 1.36% | 1.76%
C | 5.89% | 11.61% | 13.05% | 10.08%
D | 13.40% | 11.67% | 12.69% | 12.56%
E | 11.66% | 11.03% | 10.97% | 11.23%
F | 2.23% | 2.76% | 2.72% | 2.57%
G | 14.45% | 14.43% | 14.19% | 14.37%
H | 22.52% | 24.16% | 22.65% | 23.15%
J | 22.21% | 17.07% | 17.06% | 18.82%
K | 1.67% | 1.99% | 1.22% | 1.66%
L | 1.12% | 0.82% | 1.15% | 1.02%
Correlations (df = 9)
3/4 option: 0.950***
3/5 option: 0.939***
4/5 option: 0.995***
ANOVAs (df = 2 & 149)
C: F = 13.078***
J: F = 5.757**
**significant at p < 0.01
***significant at p < 0.001
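The pattern correlations compare the three groups' profiles across the 11 patterns A to L, which matches the reported df = 9 (11 - 2). Computing ordinary Pearson correlations directly on the percentages in the table closely reproduces the reported 0.950, 0.939 and 0.995, so this is a plausible reading of how they were obtained, although the slides do not state it. The variable and function names are illustrative.

```python
import math
from itertools import combinations

# Percentages for patterns A-L (11 patterns), copied from the table above
profiles = {
    "3-option": [2.85, 1.99, 5.89, 13.40, 11.66, 2.23, 14.45, 22.52, 22.21, 1.67, 1.12],
    "4-option": [2.58, 1.88, 11.61, 11.67, 11.03, 2.76, 14.43, 24.16, 17.07, 1.99, 0.82],
    "5-option": [2.94, 1.36, 13.05, 12.69, 10.97, 2.72, 14.19, 22.65, 17.06, 1.22, 1.15],
}


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


profile_r = {pair: pearson_r(profiles[pair[0]], profiles[pair[1]])
             for pair in combinations(profiles, 2)}
for (a, b), r in profile_r.items():
    print(f"{a} vs {b}: r = {r:.3f} (df = {len(profiles[a]) - 2})")
```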
Our Findings:
3-option high
4-option high
5-option high
3-option mid.
4-option mid.
5-option mid.
3-option low
4-option low
4-option high 0.985**
5-option high 0.991** 0.988**
3-option mid. 0.675*
4-option mid. 0.430 0.866**
5-option mid. 0.512 0.860** 0.990**
3-option low 0.167 0.719*
4-option low 0.039 0.858** 0.970**
5-option low 0.062 0.866** 0.915** 0.974**
**significant at p < 0.001; df = 9
Our Findings:
High-ability subjects:
- Maintained more correct and fewer incorrect answers between tests
- Selected the correct response 3 times out of 4 when their answer from the 1st test was not among the options in the multiple-choice test
High- and middle-ability subjects:
- Were twice as likely as low-ability subjects to switch from their incorrect answer in the first test to the correct option
Low-ability subjects:
- Selected an incorrect response 3 times out of 4 when their answer from the 1st test was not among the options in the multiple-choice test
Our Findings:
Subjects in the 3-option group:
- Were more successful than subjects in the 4- and 5-option groups at selecting the correct response when their answer from the first test was not among the options (pattern J)
- Switched between incorrect options, when their original response was among the options, half as often as subjects in the 4- and 5-option groups (pattern C)
But: overall, the number of options had very little effect
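The F ratios for patterns C and J, like the F value of 1.472 for the multiple-choice test scores, come from one-way ANOVAs across the three option groups; df = 2 & 149 corresponds to k - 1 and N - k with k = 3 groups and N = 152 subjects. The per-subject data are not reported, so this minimal sketch of the F ratio uses hypothetical values and a hypothetical function name.

```python
def one_way_anova(groups: list[list[float]]) -> tuple[float, int, int]:
    """F = between-group mean square / within-group mean square; returns (F, df1, df2)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Hypothetical per-subject pattern-C percentages for three small groups
f, df1, df2 = one_way_anova([[4.0, 6.5, 7.0], [10.0, 12.5, 11.0], [13.0, 12.0, 14.5]])
print(f"F = {f:.3f}, df = {df1} & {df2}")
```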
Our Findings:
Why did the subjects change their answers between the two tests?
Knowledge
Learning
Cued recall
Test-taking strategy/technique
Blind guessing
Our Findings:
Lack of knowledge 2 4 2 1 6 1 1 2
Incorrect partial knowledge 7 1 16 11 6 12 6 17 2 1
Incorrect knowledge 3 1 3 6 3
Correct partial knowledge 1 1 1 1 1
Correct knowledge 2 1 1
Ineffective test-taking strategies 1 1 1
Ineffective blind guessing 1 1 1 1 1
Poor test technique 2 2
Incorrect partial knowledge
Ineffective learning
Correct partial knowledge
Effective learning
Ineffective cued recall
Effective cued recall
Ineffective test-taking strategies
Effective test-taking strategies
Ineffective blind guessing
Effective blind guessing
Poor test-taking technique
First test
Second test
What are we to make of these results?
Our Conclusions:
The multiple-choice format enabled the subjects to achieve higher scores than in the short-answer test
The improvements were only weakly correlated with 1st-test scores (r = 0.133, 0.149 and 0.426 for the 3-, 4- and 5-option groups), suggesting that language ability was not responsible for the improvement
The high correlations between the two tests (0.842 to 0.886) are misleading
Only around 27% of the 1st-test answers were chosen again from the multiple-choice options in the 2nd test
The remaining 73% of 2nd-test option selections were forced or induced by the test format
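The 27% and 73% figures can be traced to the overall column of the pattern table if the answers "chosen from the options" are read as patterns D (the same incorrect answer chosen again) and G (the correct answer chosen again). That reading is an inference, not something the slides state; under it, the arithmetic per group is:

```python
# Assumption: 1st-test answers chosen again = pattern D + pattern G
# (percentages copied from the pattern table in the findings)
kept = {
    "3-option": 13.40 + 14.45,
    "4-option": 11.67 + 14.43,
    "5-option": 12.69 + 14.19,
    "Overall": 12.56 + 14.37,
}
for group, pct in kept.items():
    print(f"{group}: {pct:.2f}% kept, {100 - pct:.2f}% forced or induced by the format")
```

For the overall column this gives 26.93% and 73.07%, matching the rounded figures above.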
What are we to make of these results?
Our Conclusions:
The multiple-choice tests grossly distorted the measurement of the test takers' performance
What was actually being measured was largely the subjects’ ability to deal appropriately with the multiple-choice format
Based on this study, the multiple-choice format should not be used in tests of language structure