LEAD Graduate School
12/28/15, LEAD Colloquium
Word Frequency and Readability: Lexical Characterization of Text Complexity
Xiaobin Chen, Detmar Meurers
2 | WORD FREQUENCY AND READABILITY
Contents
1. Introduction
i. The Importance of Reading Ability and the Reality
ii. Readability Assessment: Why and How
2. Literature Review
i. The Reading Process
ii. Vocabulary and Reading
iii. Word Frequency and Readability
3. Research Design
i. Research Questions
ii. Methodology
4. The Three Studies and Their Results
5. Conclusions
The Importance of Reading
● It is considered the most basic subject of school education and the major source of knowledge development for students.
● A person's prose/document literacy is positively related to his/her educational attainment, income, and occupational prestige (Kutner et al., 2007, U.S. Department of Education).
Literacy Level and Educational Attainment
Literacy Level and Job Opportunities
Literacy Level and Employment
Literacy Level and Gross Earnings
Literacy Level and Occupational Prestige
The Reality
● A significant number of high school graduates still cannot meet the college or career readiness benchmarks (ACT, 2015).
How to Improve Reading Proficiency
● One way to enhance reading outcomes is to engage students “with texts of appropriate complexity throughout schooling” (Nelson, Perfetti, Liben, & Liben, 2012).
● Students usually gain a sense of success and are motivated to read more when they are given texts that enable them to practice being competent readers (Milone & Biemiller, 2014).
● Reading materials that meet the “i + 1” criterion (Krashen, 1985) are optimal for promoting language abilities.
Readability Assessment
● Definition of readability: the sum of all elements of a text that affect a reader's understanding, reading speed, and level of interest in the text (Dale & Chall, 1949).
Key aspects of text readability, adapted from Collins-Thompson (2014)
Readability Assessment
● Qualitative methods (see review by Pearson & Hiebert, 2014):
– Text leveling (TL)
– Rubrics plus exemplars (R + E)
– Text maps (TM)
● Quantitative methods (see reviews by Collins-Thompson, 2014; Benjamin, 2012; Zakaluk & Samuels, 1988):
– Traditionally: multiple regression on surface features
– Modern methods: natural language processing (NLP) and machine learning (ML)
– Features: morphological, lexical, semantic, syntactic, structural, psycholinguistic, genre, etc.
The Present Study
● Extends readability research from the lexical perspective, an important but not yet deeply explored area, by making use of the latest developments in NLP/ML and in linguistic theory and practice.
● Our interest was in the use of word frequency lists for readability assessment, an issue that has drawn attention since the very beginning of readability research but remains unsettled.
Why Frequency?
● Why is word frequency interesting?
● How is it related to readability?
The Reading Process
● Reading is a coordinated execution of a series of processes (Just & Carpenter, 1980), including:
– word encoding
– lexical access
– assigning semantic roles
– and relating the information contained in a sentence to earlier sentences in the same text and the reader's prior knowledge.
● Successful comprehension of texts depends heavily on the reader's:
– syntactical competence and semantic decoding abilities (Marks, Doctorow, & Wittrock, 1974) and
– vocabulary knowledge of the language (Laufer & Ravenhorst-Kalovski, 2010; Nation, 2006).
Vocabulary and Reading
● Lexical coverage and vocabulary knowledge are good predictors of reading comprehension, a view shared by many researchers (e.g., Bernhardt & Kamil, 1995; Laufer, 1992; Nation, 2001, 2006; Qian, 1999, 2002; Ulijn & Strother, 1990).
The Frequency Effects
● A reader's vocabulary knowledge is related to the amount of exposure the reader has had to words.
– Word frequency is predictive of word difficulty (Ryder & Slater, 1988).
– Word frequency is strongly associated with both actual difficulty (how well people can choose the correct definition of a word) and perceived difficulty (how difficult a word looks) (Leroy & Kauchak, 2013).
– High-frequency words are more easily perceived (Bricker & Chapanis, 1953) and more readily retrieved by the reader (Haseley, 1957).
– High-frequency words are perceived and produced more quickly and more efficiently than low-frequency ones (Balota & Chumbley, 1984; Howes & Solomon, 1951; Jescheniak & Levelt, 1994; Monsell, Doyle, & Haggard, 1989; Rayner & Duffy, 1986), resulting in more efficient comprehension of the text (Klare, 1968).
– Frequency of word occurrence affects not only the ease of reading a text but also its acceptability (Klare, 1968).
Frequency and Reading Comprehension
[Diagram relating the following elements: syntactical competence, semantic decoding abilities, vocabulary knowledge, frequency effects, relating to prior knowledge, and reading comprehension]
Related Literature
● Researchers have consistently used semantic and syntactic features of a text to predict its difficulty level (e.g., Dale & Chall, 1948; Flesch, 1948; Gray & Leary, 1935; Kincaid et al., 1975; Kintsch et al., 1993; Kintsch & Vipond, 1979; Lexile, 2007; Vajjala & Meurers, 2012).
● The semantic variable of word difficulty usually accounts for the greatest percentage of readability variance (Marks et al., 1974).
● Consequently, the difficulty of a reading passage can be assessed by examining the frequency of the words it uses.
Traditional Readability Research
● Lively and Pressey (1923)
– “zero-index words” and median of the index numbers of words from Thorndike's lists of 10,000 most frequent words in English (Thorndike, 1921).
– “The median index number was the best indicator of the vocabulary burden of reading materials.”
● Patty and Painter (1931)
– Average word weighted value: the average of products of index value from Thorndike's list and the frequency of words in the text sample.
– “an apparent improvement in technique for readability judgment”
● Ojemann (1934)
– words from the text that are among the first 1,000 and first 2,000 most frequent words of the Thorndike list.
– “highly correlated with difficulty”
Modern Readability Research
● Lexile (Lexile, 2007)
– word frequencies from the Carroll-Davies-Richman corpus (Carroll, Davies, & Richman, 1971)
● ATOS (Milone & Biemiller, 2014)
– Graded Vocabulary list
● Commercially successful and effective (Nelson, Perfetti, Liben, & Liben, 2012).
Problems with Previous Research (I)
● Frequency list
– Problem: did not take into consideration spoken language exposure.
– Why not optimal: not a faithful representation of the reader's actual language experience, hence unable to predict the ease of retrieval and perception accurately.
– Solution: a frequency list that represents actual language experience.
● Frequency measures
– Problem: only counted the actual occurrences of words.
– Why not optimal: did not consider the number of contexts in which a word may occur.
– Solution: include Contextual Diversity (CD) measures, which were found to be better predictors than raw frequency of performance on lexical decision tasks (Adelman, Brown, & Quesada, 2006).
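The difference between a raw frequency count and a contextual diversity count can be sketched as follows. This is a minimal illustration that assumes CD is taken as the number of documents in which a word occurs; the function name `frequency_and_cd` and the toy documents are hypothetical and are not taken from the SUBTLEX lists.

```python
from collections import Counter

def frequency_and_cd(documents):
    """Return (raw frequency, contextual diversity) counters.

    `documents` is a list of tokenized documents (lists of words).
    """
    freq = Counter()
    cd = Counter()
    for tokens in documents:
        freq.update(tokens)       # every occurrence counts
        cd.update(set(tokens))    # each document counts once per word
    return freq, cd

# Toy corpus (hypothetical): "dragon" is frequent but confined to one document.
docs = [
    ["the", "dragon", "saw", "the", "dragon", "and", "the", "other", "dragon"],
    ["the", "cat", "sat"],
    ["the", "cat", "ran"],
]
freq, cd = frequency_and_cd(docs)
print("dragon:", freq["dragon"], "occurrences in", cd["dragon"], "document(s)")
print("cat:", freq["cat"], "occurrences in", cd["cat"], "document(s)")
```

In this toy corpus "dragon" outranks "cat" on raw frequency but not on contextual diversity, which is the kind of divergence the CD measures are meant to capture.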
Problems with Previous Research (II)
● Methodology
– Methods used: simple average frequency counts; percentage of words from the top frequency bands of the list.
– Problem: unable to capture the full picture of text readability.
– Why not optimal: 1) averaging is easily affected by extreme values and loses detail; 2) the contribution of less-frequent words is neglected.
– Solution: develop an understanding of how a frequency list can be used as a “ruler” of the text's difficulty level.
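A small worked example of the first weakness, using made-up numbers: one very frequent function word dominates the mean of raw frequencies, so the average says little about the three rare content words. All words and values here are hypothetical.

```python
from statistics import mean, median

# Hypothetical per-million frequencies for the words of one short text.
frequencies = {"the": 50000, "mitochondria": 20, "osmosis": 15, "cytoplasm": 10}
values = list(frequencies.values())

mean_frequency = mean(values)      # pulled up by the single extreme value
median_frequency = median(values)  # closer to the typical word in the text

# Share of words that fall outside a hypothetical "top band" of frequent words.
top_band = {"the"}
rare_share = sum(word not in top_band for word in frequencies) / len(frequencies)

print(mean_frequency, median_frequency, rare_share)
```

The mean is several hundred times larger than the median here, even though three of the four words are rare.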
Research Questions
● What is the relationship between word frequency and text readability?
● Which frequency measures are better predictors of textual complexity?
● How can word frequency lists be better used to characterize text readability?
Methods: Frequency Lists
● SUBTLEXus (Brysbaert & New, 2009):
– 74,286 word forms
– calculated from a 51-million-word corpus of subtitles from 8,388 American films and television series between the years 1900 and 2007.
● SUBTLEXuk (van Heuven, Mandera, Keuleers, & Brysbaert, 2014)
– 160,022 word forms
– calculated from a 201.7-million-word corpus of subtitles from nine British TV channels broadcast from January 2010 to December 2012.
Frequency Measures
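Several of the measures used in the studies are on the Zipf scale, which van Heuven et al. (2014) define as the base-10 logarithm of a word's frequency per million words, plus 3. The sketch below shows that conversion only; the function name and the example counts are illustrative, and the published lists may apply smoothing that is omitted here.

```python
import math

def zipf_value(count, corpus_size_tokens):
    """Zipf scale: log10(frequency per million words) + 3."""
    per_million = count / (corpus_size_tokens / 1_000_000)
    return math.log10(per_million) + 3

# Illustrative counts in a 51-million-word corpus (not real list entries).
rare_word = zipf_value(51, 51_000_000)        # 1 occurrence per million words
common_word = zipf_value(51_000, 51_000_000)  # 1,000 occurrences per million words
print(rare_word, common_word)
```

Because the scale is logarithmic, a thousandfold difference in raw frequency becomes a difference of three Zipf units, which keeps very frequent words from dominating averages.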
Methods: Corpora
● Training corpus: WeeBit (Vajjala & Meurers, 2012)
– Sources: educational magazine Weekly Reader and BBC-Bitesize website
– 789,926 words, 616 texts in each level, 5 levels
● Testing corpus: Appendix B of the Common Core State Standards (CommonCore, 2010), 168 texts
Experimental Procedure
● Tokenize corpus texts
– CoreNLP Tokenizer (Manning et al., 2014)
● Calculate various frequency values as features of texts
● Train an ML classification model on the training corpus
– The “class” package of R
– Algorithm: K-nearest neighbors
● Apply the trained model to the test corpus
● Report results
– Within-corpus statistics: 10-fold CV accuracy, 10-fold CV Spearman's ρ
– Cross-corpus statistics: Spearman's ρ
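The evaluation loop above can be sketched in plain Python. This is not the R code used in the study: it is a stand-in that implements k-nearest neighbors, 10-fold cross-validation, and Spearman's ρ by hand so that it runs without third-party packages. The value k = 5, the random seed, and the synthetic two-feature data are assumptions made for illustration only.

```python
import math
import random
from collections import Counter

def knn_predict(train_x, train_y, x, k=5):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            result[idx] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman_rho(a, b):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(var_a * var_b)

def cross_validate(xs, ys, folds=10, k=5, seed=0):
    """Return (accuracy, Spearman's rho) over cross-validated predictions."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    predictions = [None] * len(xs)
    for f in range(folds):
        test_idx = indices[f::folds]
        held_out = set(test_idx)
        train_x = [xs[i] for i in indices if i not in held_out]
        train_y = [ys[i] for i in indices if i not in held_out]
        for i in test_idx:
            predictions[i] = knn_predict(train_x, train_y, xs[i], k)
    accuracy = sum(p == y for p, y in zip(predictions, ys)) / len(ys)
    return accuracy, spearman_rho(predictions, ys)

# Synthetic data (hypothetical): 5 levels, 20 texts each, two frequency features.
rng = random.Random(1)
features, levels = [], []
for level in range(1, 6):
    for _ in range(20):
        features.append((6.0 - 0.5 * level + rng.gauss(0, 0.05),
                         0.5 + 0.1 * level + rng.gauss(0, 0.02)))
        levels.append(level)

accuracy, rho = cross_validate(features, levels)
print(f"10-fold CV accuracy = {accuracy:.2f}, Spearman's rho = {rho:.2f}")
```

The synthetic levels are deliberately well separated, so the scores printed here say nothing about the real corpora. For the cross-corpus statistic, the same `knn_predict` and `spearman_rho` would be applied to a separate test set instead of held-out folds.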
Study 1: Frequency Means and SD as Features
● Purposes:
– Testing the use of frequency lists for predicting readability
– Testing if frequency lists from different corpora have different effects in predicting readability
– Testing if different frequency measures make a difference
● Features:
– Average frequency of word forms with/without standard deviation
– Average frequency of word types with/without standard deviation
– Tested all the frequency measures provided by SUBTLEXus and SUBTLEXuk
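The token and type variants of these features can be sketched as follows. The slides do not say how words missing from the list were handled or which SD formula was used, so skipping unlisted words and using the population SD are assumptions here; the tiny Zipf dictionary is made up.

```python
from statistics import mean, pstdev

def mean_sd_features(tokens, zipf, use_types=False):
    """Mean and SD of a frequency measure over a text's tokens or types.

    Assumption: words missing from the frequency list are skipped.
    """
    words = set(tokens) if use_types else tokens
    values = [zipf[w] for w in words if w in zipf]
    return mean(values), pstdev(values)

# Hypothetical Zipf values and a toy tokenized text.
zipf = {"the": 7.0, "cat": 5.0, "sat": 4.0}
tokens = ["the", "cat", "the", "sat", "the", "the"]

token_mean, token_sd = mean_sd_features(tokens, zipf)
type_mean, type_sd = mean_sd_features(tokens, zipf, use_types=True)
print(token_mean, token_sd)
print(type_mean, type_sd)
```

In the token version every repetition of "the" counts, which pulls the mean toward the high-frequency end; the type version counts each distinct word once.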
Study 1: Results
Study 1: Findings
● Models trained with both the mean and SD features performed consistently better than those with only mean frequencies, be it type or token averages.
● Type models had uniformly better accuracy and validation performance than token models (see illustration).
● The corpus from which the frequency list was constructed matters when the list is used to characterize text readability.
Token and Type Differences for Common Core
Study 2: Proportions of Words from Frequency Bands of Increasing Fine-grainedness
● Purposes:
– Testing the effectiveness of using frequency lists as a ruler of readability.
● Hypothesis:
– The more of a text's words come from the less frequent bands, the higher the perceptual demand of those words, and hence the higher the text's difficulty and the lower its readability.
● Features:
– Percentage of words from each frequency band
– Gradual increase of band fine-grainedness, or the number of bands the frequency list is cut into
– Band stratification with different frequency measures: LOGFREQCBEEBIES_ZIPF and CD_CBBC from SUBTLEXuk; ZIPF_VALUE and SUBTLCD from SUBTLEXus.
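A minimal sketch of the band features, under stated assumptions: the list is sorted by the chosen measure and cut into bands holding roughly equal numbers of entries (the slides do not say whether bands were equal in size or equal in width), and words missing from the list are ignored. The function names and the six-word list are hypothetical.

```python
def assign_bands(measure, n_bands):
    """Map each listed word to a band index (0 = most frequent band).

    Assumption: bands hold roughly equal numbers of list entries.
    """
    ranked = sorted(measure, key=measure.get, reverse=True)
    return {word: rank * n_bands // len(ranked) for rank, word in enumerate(ranked)}

def band_proportions(tokens, bands, n_bands):
    """Proportion of a text's listed words that fall into each band."""
    known = [t for t in tokens if t in bands]
    counts = [0] * n_bands
    for t in known:
        counts[bands[t]] += 1
    return [c / len(known) for c in counts] if known else [0.0] * n_bands

# Hypothetical Zipf values for a six-word list, cut into three bands.
measure = {"the": 7.0, "and": 6.8, "cat": 5.0, "dog": 4.9, "quark": 2.5, "zygote": 2.0}
bands = assign_bands(measure, 3)
proportions = band_proportions(["the", "cat", "and", "the", "quark"], bands, 3)
print(proportions)
```

Each text is then represented by one proportion per band, so raising the number of bands raises the number of features fed to the classifier.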
Study 2: Illustration
[Figure: the frequency list is cut into successively more bands: 2 bands (Band 1–2), 3 bands (Band 1–3), 4 bands (Band 1–4), 5 bands (Band 1–5), … (up to 100 bands)]
Study 2: Results with SUBTLEXuk Measures
Summary of Results
● Best-performing token model: 20 bands on LOGFREQCBEEBIES_ZIPF
– 10-fold CV Acc. = 48% (a 5% improvement over the corresponding model in Study 1)
– within-corpus ρ = .65, p < .001
– cross-corpus ρ = .54, p < .001
● The LOGFREQCBEEBIES_ZIPF type models were not generalizable.
● Neither the CD_CBBC type models nor the token models were generalizable.
Study 2: Results with SUBTLEXus Measures
Study 2: Summary of Results: SUBTLEXus features
● ZIPF_VALUE had better training performance, while the SUBTLCD measure had more stable testing performance.
● Finer-grained frequency bands did not improve testing results beyond 10 bands.
[Figure: Spearman's rho plotted against number of bands for the SUBTLCD and ZIPF_VALUE measures, type and token models, showing within-corpus rho and cross-corpus rho]
Study 2: Findings
● It is effective to use the frequency list as a “ruler” of language use to measure readability.
● Although the training performance improves with finer stratification schemes, the testing performance does not improve beyond 10 bands.
● The US list has better performance when the trained models are carried over to a test corpus. Models trained with the UK list do not generalize.
● The type models involving contextual diversity (i.e., SUBTLCD) have more stable performance than the other models.
Study 3: Word Frequency Cluster Means as Features
● Purposes:
– Approaching readability from an “internal” perspective, namely the frequency of words chosen for the text.
● Hypotheses:
– Difficult texts usually use more less-frequent words, while easier texts use fewer.
– Groupings of word frequencies and the group values reveal the text's readability.
● Features:
– Cluster means
– Cluster Zipf values from both SUBTLEXus and SUBTLEXuk
– Up to 100 clusters tested
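A minimal sketch of cluster-mean features. The slides do not name the clustering algorithm, so plain one-dimensional k-means with a deterministic start is an assumption made here for illustration, as are the function name and the toy Zipf values.

```python
def kmeans_1d(values, k, iterations=50):
    """Plain 1-D k-means; returns the sorted cluster means.

    Assumption: k-means stands in for the unspecified clustering method.
    Requires len(values) >= k.
    """
    ordered = sorted(values)
    # Deterministic start: spread the initial centres over the sorted values.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            groups[nearest].append(v)
        # Keep the old centre if a cluster ends up empty.
        updated = [sum(g) / len(g) if g else centres[i] for i, g in enumerate(groups)]
        if updated == centres:
            break
        centres = updated
    return sorted(centres)

# Hypothetical Zipf values for the words of one text.
word_zipf = [7.0, 6.8, 6.9, 5.0, 5.1, 4.9, 2.0, 2.2, 2.1]
cluster_means = kmeans_1d(word_zipf, k=3)
print(cluster_means)
```

The sorted cluster means form the text's feature vector, so a text with many rare words shows up as lower values at the low end of the vector.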
Study 3: A Simplified Illustration
[Figure: clustering groups the words of a text (Word 1 to Word 15) into Cluster 1, Cluster 2, and Cluster 3]
Study 3: Results
[Figure: within-corpus rho and cross-corpus rho for the LOGFREQCBEEBIES_ZIPF and ZIPF_VALUE measures, type and token models]
Study 3: Findings
● The type and token models had similar performance in terms of accuracy estimates, within- and cross-corpus ρs.
● No significant differences were found between the performance of models trained on measures from different lists.
● Performance improved as the number of clusters increased, with cross-corpus ρs peaking at around 70 clusters.
● The ZIPF_VALUE measure from the US list performed marginally better than its counterpart from the UK list.
● The trained classifiers were generalizable to the test corpus—a finding that suggests the existence of frequency effects on readability.
Conclusions
● The lexical measure of word frequency is effective in characterizing text difficulty.
● Frequency lists: should faithfully represent language usage and exposure.
● Frequency measures: should be normalized measures that accurately estimate the cognitive load involved in vocabulary perception and retrieval.
● The methods:
– Simple overall mean and SD: easy and effective, provided that the measure meets the previous two criteria.
– Stratification: improved performance, requires fine-tuning the number of bands, less generalizable
– Clustering: best performance, least sensitive to list and measure, most expensive
References
● ACT. (2015). The Condition of College & Career Readiness 2015 (National). ACT.
● Adelman, J. S., Brown, G. D. A., & Quesada, J. F. (2006). Contextual diversity, not word frequency, determines word naming and lexical decision times. Psychological Science, 17, 814–823.
● Balota, D. A., & Chumbley, J. I. (1984). Are lexical decisions a good measure of lexical access? The role of word frequency in the neglected decision stage. Journal of Experimental Psychology: Human Perception & Performance, 10, 340–357.
● Benjamin, R. G. (2012). Reconstructing readability: Recent developments and recommendations in the analysis of text difficulty. Educational Psychology Review, 24(1), 63–88.
● Bernhardt, E. B., & Kamil, M. L. (1995). Interpreting relationships between L1 and L2 reading: Consolidating the linguistic threshold and the linguistic interdependence hypotheses. Applied Linguistics, 16, 15–34.
● Bricker, P. D., & Chapanis, A. (1953). Do incorrectly perceived stimuli convey some information? Psychological Review, 60, 181–188.
● Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977–990.
● Collins-Thompson, K. (2014). Computational assessment of text readability: A survey of past, present, and future research. In T. François & D. Bernhard (Eds.), Recent Advances in Automatic Readability Assessment and Text Simplification. Special issue of International Journal of Applied Linguistics (pp. 97–135).
● CommonCore. (2010). Common Core State Standards for English Language Arts and Literacy in History/Social Studies, Science, and Technical Subjects. Common Core State Standards Initiative. Retrieved from http://www.corestandards.org
● Dale, E., & Chall, J. (1948). A formula for predicting readability. Educational Research Bulletin, 27(Jan. 21 and Feb. 17), 1–20, 37–54.
● Dale, E., & Chall, J. (1949). The concept of readability. Elementary English, 26(3).
● Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32, 221–233.
● Gray, W. S., & Leary, B. E. (1935). What makes a book readable. Chicago: University of Chicago Press.
● Haseley, L. (1957). The relationship between cue-value of words and their frequency of prior occurrence. Ohio University.
● van Heuven, W. J. B., Mandera, P., Keuleers, E., & Brysbaert, M. (2014). SUBTLEX-UK: A new and improved word frequency database for British English. The Quarterly Journal of Experimental Psychology, 67(6), 1176–1190.
● Howes, D. H., & Solomon, R. L. (1951). Visual duration threshold as a function of word-probability. Journal of Experimental Psychology, 41, 401–410.
● Jescheniak, J. D., & Levelt, W. J. M. (1994). Word frequency effects in speech production: Retrieval of syntactic information and of phonological form. Journal of Experimental Psychology: Learning, Memory, & Cognition, 20, 824–843.
● Just, M. A., & Carpenter, P. A. (1980). A theory of reading: From eye fixations to comprehension. Psychological Review, 87(4), 329–354.
● Kintsch, W., Britton, B. K., Fletcher, C. R., Kintsch, E., Mannes, S. M., & Nathan, M. J. (1993). A comprehension-based approach to learning and understanding. In D. L. Medin (Ed.), Psychology of Learning and Motivation (Vol. 30, pp. 165–214). New York: Academic Press.
● Kincaid, J. P., Rogers, R. L., Fishburne, R. P., & Chissom, B. S. (1975). Derivation of new readability formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy enlisted personnel. Millington, Tennessee.
● Kintsch, W., & Vipond, D. (1979). Reading comprehension and readability in educational practice and psychological theory. In L. G. Nilsson (Ed.), Perspectives on memory research (pp. 24–62). Hillsdale, NJ: Erlbaum.
● Klare, G. R. (1968). The role of word frequency in readability. Elementary English, 45(1), 12–22.
● Krashen, S. (1985). The Input Hypothesis: Issues and Implications. New York: Longman.
● Kutner, M., Greenberg, E., Jin, Y., Boyle, B., Hsu, Y., Dunleavy, E., & White, S. (2007). Literacy in everyday life: Results from the 2003 National Assessment of Adult Literacy (NCES 2007-480). U.S. Department of Education. Washington, DC: National Center for Education Statistics.
● Laufer, B. (1992). How much lexis is necessary for reading comprehension? In H. Bejoint & P. Arnaud (Eds.), Vocabulary and applied linguistics (pp. 126–132). Basingstoke & London: Macmillan.
● Laufer, B., & Ravenhorst-Kalovski, G. (2010). Lexical threshold revisited: Lexical text coverage, learners' vocabulary size and reading comprehension. Reading in a Foreign Language, 22(1), 15–30.
● Leroy, G., & Kauchak, D. (2013). The effect of word familiarity on actual and perceived text difficulty. Journal of the American Medical Informatics Association, (0), 1–4.
● Lexile. (2007). The Lexile Framework® for Reading: Theoretical Framework and Development. Durham, NC.
● Lively, B. A., & Pressey, S. L. (1923). A method for measuring the “vocabulary burden” of textbooks. Educational Administration and Supervision, 9, 389–398.
● Manning, C. D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S. J., & McClosky, D. (2014). The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations (pp. 55–60).
● Marks, C. B., Doctorow, M. J., & Wittrock, M. C. (1974). Word frequency and reading comprehension. The Journal of Educational Research, 67(6), 259–262.
● Milone, M., & Biemiller, A. (2014). The development of ATOS: The Renaissance readability formula. Wisconsin Rapids.
● Monsell, S., Doyle, M. C., & Haggard, P. N. (1989). Effects of frequency on visual word recognition tasks: Where are they? Journal of Experimental Psychology: General, 118, 43–71.
● Nation, I. S. P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.
● Nation, I. S. P. (2006). How large a vocabulary is needed for reading and listening? The Canadian Modern Language Review, 63(1), 59–82.
● Nelson, J., Perfetti, C., Liben, D., & Liben, M. (2012). Measures of text difficulty: Testing their predictive value for grade levels and student performance.
● Ojemann, R. J. (1934). The reading ability of parents and factors associated with reading difficulty of parent education materials. University of Iowa Studies in Child Welfare, 8, 11–32.
● Patty, W. W., & Painter, W. I. (1931). A technique for measuring the vocabulary burden of textbooks. Journal of Educational Research, 24, 127–134.
● Pearson, P. D., & Hiebert, E. H. (2014). The state of the field: Qualitative analyses of text complexity. The Elementary School Journal, 115(2), 161–183.
● Qian, D. D. (1999). Assessing the roles of depth and breadth of vocabulary knowledge in reading comprehension. The Canadian Modern Language Review, 56, 282–308.
● Qian, D. D. (2002). Investigating the relationship between vocabulary knowledge and academic reading performance: An assessment perspective. Language Learning, 52, 513–536.
● Rayner, K., & Duffy, S. A. (1986). Lexical complexity and fixation times in reading: Effects of word frequency, verb complexity, and lexical ambiguity. Memory & Cognition, 14, 191–201.
● Ryder, R. J., & Slater, W. H. (1988). The relationship between word frequency and word knowledge. The Journal of Educational Research, 81(5), 312–317.
● Ulijn, J. M., & Strother, J. B. (1990). The effect of syntactic simplification on reading EST texts as L1 and L2. Journal of Research in Reading, 13, 38–54.
● Vajjala, S., & Meurers, D. (2012). On improving the accuracy of readability classification using insights from second language acquisition. In Proceedings of the Seventh Workshop on Building Educational Applications Using NLP.
● Zakaluk, B. L., & Samuels, S. J. (Eds.). (1988). Readability: Its Past, Present, and Future. Newark, Del.: International Reading Association.
Thank you!
Contact:
Xiaobin [email protected]
Detmar [email protected]
LEAD Graduate School, Eberhard Karls Universität Tübingen
www.lead.uni-tuebingen.de