1
Are all questions created equal?: Factors that influence cloze question difficulty.
Brooke Soden Hensler Carnegie Mellon University
(starting graduate school at Florida Center for Reading Research this Fall)
Joseph E. Beck Carnegie Mellon University
Funding: National Science Foundation
Society for the Scientific Study of Reading – July 2006
2
Why Look at Multiple Choice Cloze Questions?
Multiple choice cloze questions are widely used to assess comprehension
Problem: outcome measure is typically binary (little information about student).
Goal: use multiple choice cloze questions to…
  More accurately assess students
  Track student reading development
  Better understand what makes cloze questions hard
3
Project LISTEN’s Computer Reading Tutor (Mostow & Aist, 2001)
Automated
Students use throughout year
Accompanying paper standardized test scores (pre & post)
4
Student is reading a story aloud to the Reading Tutor…
5
A question appears…
*Reading Tutor reads both Question and Response Choices. (Mostow, et al., 2004)
6
Student resumes reading story aloud to the Reading Tutor…
7
Reading Tutor Advantages
Well-specified & unbiased question construction (randomly generated)
Questions automatically administered, scored, & recorded
Longitudinal collection over school year
Large N (students & questions)
8
How many Q’s from Whom? Data Description
81,175 Questions
1042 Students
Median number of questions answered per student: 11 (many students were infrequent users of the tutor)
2001-02 & 2002-03 School years
Diverse population in Pittsburgh area
9
Research Questions
Is a particular part of speech (e.g., nouns, verbs, etc.) more difficult for students?
  If nouns are learned first (Gentner, 1982; Golinkoff, et al., 2000), might students be more proficient at answering noun questions?
Which factors influence question difficulty?
How can we better assess students using multiple choice cloze questions?
  Vocabulary researchers have given partial credit for correct part of speech (e.g., Schwanenflugel, et al., 1997)
10
Approach
Build logistic regression model to predict individual question performance
  Terms in model: student identity, part of speech of answer, properties of question (e.g., question length)
Advantages of modeling approach
  Simultaneously estimates impact of question properties and student proficiency on question performance
  Makes use of all ~80k questions
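The modeling approach above can be illustrated with a minimal sketch. It assumes a plain logistic regression fitted by gradient ascent, with one indicator term per student plus a single question property (scaled question length). The data are synthetic, and the names `fit_logistic`, `TRUE_PROFICIENCY`, and `TRUE_LENGTH_EFFECT` are hypothetical; the slides do not say which software or exact specification was used.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(rows, labels, n_features, epochs=300, lr=2.0):
    """Fit weights by batch gradient ascent on the mean log-likelihood.

    Each row is a sparse list of (feature_index, value) pairs.
    """
    w = [0.0] * n_features
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * n_features
        for x, y in zip(rows, labels):
            err = y - sigmoid(sum(w[j] * v for j, v in x))
            for j, v in x:
                grad[j] += err * v
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w

# Synthetic data (an assumption for illustration, not the study's data)
random.seed(0)
N_STUDENTS = 3
TRUE_PROFICIENCY = [-0.5, 0.5, 1.5]   # hypothetical per-student Betas
TRUE_LENGTH_EFFECT = -1.5             # hypothetical: longer = harder

rows, labels = [], []
for _ in range(1500):
    s = random.randrange(N_STUDENTS)
    length = random.random()          # question length scaled to [0, 1]
    p = sigmoid(TRUE_PROFICIENCY[s] + TRUE_LENGTH_EFFECT * length)
    # Features: student indicator (index s) and question length (last index)
    rows.append([(s, 1.0), (N_STUDENTS, length)])
    labels.append(1 if random.random() < p else 0)

weights = fit_logistic(rows, labels, N_STUDENTS + 1)
student_betas = weights[:N_STUDENTS]   # one proficiency estimate per student
length_effect = weights[N_STUDENTS]    # shared effect of question length
print("student Betas:", [round(b, 2) for b in student_betas])
print("length effect:", round(length_effect, 2))
```

Because student indicators and question properties sit in the same model, each student's Beta is estimated while holding question difficulty constant, which is the advantage the slide describes.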
11
Effect of Parts of Speech
Nouns < Verbs < Adjectives < Adverbs
Significance levels shown for the differences: p < 0.001, p < 0.001, p < 0.05
12
Effect of Parts of Speech
Nouns < Verbs < Adjectives < Adverbs
(easier on the left, harder on the right)
Significance levels shown for the differences: p < 0.001, p < 0.001, p < 0.05
13
Impact of other Part of Speech terms
Difficulty factor                 Significance
Most Common Part of Speech        p < 0.01
# of Choices with Answer’s POS    p < 0.001
“Sally had to _______ her lips when she heard the news.” (cloud, purse, holds, magnificent)
“Henry read his _______ under the tree.” (cup, dog, book, hair)
14
Impact of other Part of Speech terms
Difficulty factor                 Significance
Most Common Part of Speech        p < 0.01
# of Choices with Answer’s POS    p < 0.001
“Henry read his _______ under the tree.” (cup, dog, book, hair)
“Sally had to _______ her lips when she heard the news.” (lamp, purse, beautiful, magnificent)
more common POS = easier
less common POS = harder
15
Impact of other Part of Speech terms
Difficulty factor                 Significance
Most Common Part of Speech        p < 0.01
# of Choices with Answer’s POS    p < 0.001
“Henry read his _______ under the tree.” (cup, dog, book, hair) (noun)
  more choices with correct POS = harder
“Sally had to _______ her lips when she heard the news.” (lamp, purse, beautiful, magnificent) (verb)
  fewer choices with correct POS = easier
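The part-of-speech terms on this slide can be computed along the following lines. This is only a sketch: `TOY_LEXICON` and `pos_features` are hypothetical, the lexicon entries are made up for the two example questions, and reading "# of Choices with Answer's POS" as a one-in-k guessing probability is an assumption rather than the authors' stated formula.

```python
# Hypothetical toy lexicon: each word maps to the parts of speech it can take,
# listed with the most common one first. A real system would use a POS-tagged
# dictionary or corpus instead.
TOY_LEXICON = {
    "lamp": ["noun"],
    "purse": ["noun", "verb"],
    "beautiful": ["adjective"],
    "magnificent": ["adjective"],
    "cup": ["noun", "verb"],
    "dog": ["noun", "verb"],
    "book": ["noun", "verb"],
    "hair": ["noun"],
}

def pos_features(answer, answer_pos, choices, lexicon=TOY_LEXICON):
    """Return POS-based difficulty features for one cloze question."""
    # Choices that can take the same part of speech as the correct answer
    same_pos = [c for c in choices if answer_pos in lexicon[c]]
    return {
        "is_most_common_pos": lexicon[answer][0] == answer_pos,
        "pos_confusability": len(lexicon[answer]),
        "choices_with_answer_pos": len(same_pos),
        # Assumed reading: chance of guessing right from POS information alone
        "pos_only_guess_prob": 1.0 / len(same_pos),
    }

sally = pos_features("purse", "verb", ["lamp", "purse", "beautiful", "magnificent"])
henry = pos_features("book", "noun", ["cup", "dog", "book", "hair"])
print(sally)
print(henry)
```

In the Sally question only "purse" can be a verb, so part of speech alone identifies the answer; in the Henry question all four choices are nouns, so part of speech gives no help.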
16
Impact of other terms
Difficulty factor     Significance
Question Length       p < 0.001
Deletion Location     p < 0.001
“We can _______ the stars in the sky despite the bright city lights around us.” (at, with, most, see)
“They rode their _______ .” (farmer, bikes, play, blue)
17
Impact of other terms
Difficulty factor     Significance
Question Length       p < 0.001
Deletion Location     p < 0.001
“We can _______ the stars in the sky despite the bright city lights around us.” (at, with, most, see)
“They rode their _______ .” (farmer, bikes, play, blue)
longer = harder
shorter = easier
18
Impact of other terms
Difficulty factor     Significance
Question Length       p < 0.001
Deletion Location     p < 0.001
“We can _______ the stars in the sky despite the bright city lights around us.” (at, with, most, see)
“They rode their _______ .” (farmer, bikes, play, blue)
blank earlier = harder
blank later = easier
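The two covariates on these slides are simple to compute from the question text. The sketch below assumes both are measured in characters and that the blank is written as seven underscores; the function names are hypothetical.

```python
BLANK = "_" * 7  # assumed marker for the deleted word

def question_length(sentence, choices):
    """Characters in the cloze sentence plus all response choices."""
    return len(sentence) + sum(len(c) for c in choices)

def deletion_location(sentence):
    """Proportion of the sentence (in characters) that precedes the blank."""
    return sentence.index(BLANK) / len(sentence)

long_q = f"We can {BLANK} the stars in the sky despite the bright city lights around us."
short_q = f"They rode their {BLANK} ."

long_len = question_length(long_q, ["at", "with", "most", "see"])
short_len = question_length(short_q, ["farmer", "bikes", "play", "blue"])
long_loc = deletion_location(long_q)
short_loc = deletion_location(short_q)

print(long_len, round(long_loc, 2))
print(short_len, round(short_loc, 2))
```

The first question is both longer and has its blank earlier, so both factors mark it as the harder of the two.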
19
Using model to assess student reading comprehension
Model estimates Beta parameter for each student
  Represents how well student did at answering cloze questions (controlling for difficulty factors)
  Should correlate with external comprehension measure
Compare Beta vs. percent correct for predicting WRMT comprehension composite*
  Student Beta: r = .644, p < .001
  Percent correct: r = .507, p < .001
  Reliability of difference in correlations, p < .01
Also provides check on validity of regression model
*N = 465, 1 extreme outlier was eliminated from analyses.
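The comparison above rests on Pearson correlations between each predictor and the external test score. A minimal sketch of that computation follows; the five data points are invented purely to exercise the function and are not taken from the study, and the test used for the difference between correlations is not shown because the slides do not name it.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative values for five hypothetical students
external_score = [1, 2, 3, 4, 5]              # stand-in for an external test
student_beta = [-1.0, -0.4, 0.1, 0.3, 1.2]    # stand-in for model Betas
percent_correct = [0.5, 0.7, 0.4, 0.8, 0.6]   # stand-in for raw percent correct

r_beta = pearson_r(student_beta, external_score)
r_pct = pearson_r(percent_correct, external_score)
print(round(r_beta, 3), round(r_pct, 3))
```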
20
Conclusions
Length of question, location of deleted word, and part of speech of correct answer affect question difficulty.
Logistic regression is a strong choice for analyzing cloze data.
Modeled this way, multiple-choice cloze questions assess students more accurately than raw percent correct (r = .644 vs. r = .507 with the WRMT comprehension composite).
21
Questions?
Nominated for Best Paper Award:
Soden Hensler, B., & Beck, J. E. (2006). Better student assessing by finding difficulty factors in a fully automated comprehension measure. Intelligent Tutoring Systems.
Brooke Soden Hensler
Joseph E. Beck
Project LISTEN & The Reading Tutor: http://www.cs.cmu.edu/~listen/
22
References
Gentner, D. (1981). Some interesting differences between verbs and nouns. Cognition and Brain Theory, 4(2).
Golinkoff, R.M., Hirsh-Pasek, K., Bloom, L., Smith, L. B., Woodward, A. L., Akhtar, N., Tomasello, M., & Hollich, G. (2000). Becoming a word learner: A debate on lexical acquisition. New York: Oxford University Press.
Mostow, J. & Aist, G. (2001). Evaluating tutors that listen: An overview of Project LISTEN. In K. Forbus & P. Feltovich (Eds.), Smart Machines in Education (pp. 169-234). Menlo Park, CA: MIT/AAAI Press.
Mostow, J., Beck, J. E., Bey, J., Cuneo, A., Sison, J., Tobin, B., & Valeri, J. (2004). Using automated questions to assess reading comprehension, vocabulary, and effects of tutorial interventions. Technology, Instruction, Cognition and Learning, 2, 97-134.
Schwanenflugel, P.J., Stahl, S. A., & McFalls, E. L. (1997). Partial word knowledge and vocabulary growth during reading comprehension. Journal of Literacy Research, 29(4).
23
Additional Slides
24
Terms in Model
Factors (term: description)
  Part of Speech: Simplified part of speech classification of the correct answer as Noun, Verb, Adjective, Adverb, or Function Word.
  Most Common Part of Speech: Whether or not the correct answer’s POS is the most common POS the word could take on.
  POS Confusability: The number of POS the correct answer can take on.
  Level of Difficulty: 4 levels of difficulty based on frequency in English or special annotation.
  Student Identity: Unique identification for each student.
Covariates (term: description)
  Question Length: Number of characters of the cloze question and the corresponding response choices.
  Deletion Location: Proportion of the sentence that is before the blank (location of word deletion).
  # Choices with Answer's POS: Probability that the student could have answered the question using only part of speech information.
25
Developmental Trends in Learning Parts of Speech
[Figure: likelihood of answering question correctly (y-axis) by Reading Proficiency (x-axis bins: <=2, 2…3, 3…4, 4…5, >=5); series: Nouns, Verbs, Adjectives, Adverbs, Function Words]
26
Developmental Trends in Learning Parts of Speech
[Figure: likelihood of answering question correctly (y-axis) by Reading Proficiency (x-axis bins: <=2, 2…3, 3…4, 4…5, >=5); series: Nouns, Verbs; p-values shown on the plot: p < .001, p = .71, p = .99, p = .52, p = .64]
27
Syntactic Awareness
[Figure: Relative Impact (y-axis, -0.6 to 0.1) by Reading Proficiency (x-axis bins: <=2, 2…3, 3…4, 4…5, >=5); series: Impact of # POS word can take on; p-values shown on the plot: p = .48, p = .73, p = .01, p = .02, p < .001]
28
Effect of Part of Speech*
*Interpretation: positive Beta means student is more likely to answer question correctly
Noun < Verb < Adjective < Adverb < Function Words
Part of Speech   Noun      Verb      Adjective  Adverb    Function Words
Beta             0.39      0.29      0.19       0.12      (comparison point)
Significance     p < .001  p < .001  p < .001   p < .001  ---
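One way to read these Betas, assuming they are log-odds coefficients from the logistic regression with function words as the reference category, is to exponentiate them into odds ratios. The dictionary name `POS_BETAS` is hypothetical; the values are the ones on the slide.

```python
import math

# Betas from the slide; function words are the comparison point (Beta = 0)
POS_BETAS = {"Noun": 0.39, "Verb": 0.29, "Adjective": 0.19, "Adverb": 0.12}

# exp(Beta) is the multiplicative change in the odds of a correct answer
# relative to a function-word question, other terms held fixed
odds_ratios = {pos: math.exp(beta) for pos, beta in POS_BETAS.items()}

for pos, ratio in odds_ratios.items():
    print(f"{pos}: odds ratio vs. function words = {ratio:.2f}")
```

Under that reading, the odds of answering a noun question correctly are roughly 1.48 times the odds for a function-word question, with verbs, adjectives, and adverbs falling in between.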