mel-tr-74-29 - eric · document.resume sd 097 642 cs 001 383 author williams, allan r., jr.; and...
TRANSCRIPT
DOCUMENT.RESUME
SD 097 642 CS 001 383
AUTHOR Williams, Allan R., Jr.; And OthersTITLE Readability of Textual Material--A Survey of the
Literature.INSTITUTION Air Force Human Resources Lab., Lowry AFB, Colo.
. Technical Training Div.REPORT NO MEL-TR-74-29PUB DATE Jul 74NOTE 67p.; See related document CS 001 381
EDRS PRICE MF-$0.75 HC-$3.15 PLUS POSTAGEDESCRIPTORS Armed Forces; *Literature Reviews; *Readability;
Reading; *Reading Comprehension; Reading Instruction;Reading Level; *Reading Materials; ReadingResearch
ABSTRACTThis literature review documents the state-of-the-art
of readability assessment and indicates directions for futureresearch in readability measurement. The period since 1953 isemphasized, although there is some consideration of earlier work.Primary emphasis within the review is based on readabilitymeasurement; however, a final section is included which reviewsrecent work bearing on the topic of increasing comprehensibilitythrough multimodal presentation methods. Various formulas forcalculating readability are presented and placed in historicalperspective. The contents include: "Introduction," which presents thescope and organization of the review; "Methods for MeasuringReliability,u which discusses rating methods, use tests, readabilityformulas, early formulas, detailed formulas, recent formulas, clozeprocedures, and multimodal presentations; and "Discussion," whichdiscusses the estimation of reading levels, formulas, and futureresearch. An appendix of various readability measures is alsoincluded. (NR)
'UR FORCE LI.;
H
UMAN
RES0URC
U S DEPARTMENT OF HEALTH.EDUCATION WELFARENATIONAL INSTITUTE OF
EDUCATIONT HI DO( oAAL NI HAS BEEN REPRODUCfh t pEcrivro FROMT1i1 PI RSON.:IR ORGANIZATION ORIGINAf:M,'T POINTS 01 VIEW OR OPINIONSSIA PO NOT NECESSARILY REPRE
.)I I IC 1AL NATIONAL INSTITUTE OFE.DU(nTIUN POSITION Ok POLICY
AFHRLTR-74-29
READABILITY OF TEXTUAL MATERIALA SURVEY OF THE LITERATURE
By
Allan R. Williams, Jr.Arthur I. Siegel
Applied Psychological Services, Inc.Wayne, Pennsylvania 19087
James R. Burkett
TECHNICAL TRAINING DIVISIONLowry Air Force Base, Colorado 80230
July 1974
Approved for public release; distribution unlimited.
LABORATORY
AIR FORCE SYSTEMS COMMANDBROOKS AIR FORCE BASE,TEXAS 78235
NOTICE
When US Government drawings, specifications, or other data are usedfor any purpose other than a definitely related Governmentprocurement operation, the Government thereby incurs noresponsibility nor any obligation whatsoever, and the fact that theGovernment may have formulated, furnished, or in any way suppliedthe said drawings, specifications, or other data is not to be regarded byimplication or otherwise, as in any manner licensing the holder or anyother person or corporation, or conveying any rights or permission tomanufacture, use, or sell any patented invention that may in any wayly: related thereto.
This report was submitted by Applied Psychological Services, Inc.,Wayne, Pennsylvania 19087, under contract F41609.72-C-0011, workunit 11210403, with the Technical Training Division, Air Force HumanResources Laboratory (AFSC), Lowry Air Force Base, Colorado 80230.Dr. James R. Burkett, Instructional Technology Branch, was thecontract monitor.
This report has been reviewed and cleared for open publication and/orpublic release by the appropriate Office of Information (Op inaccordance with AFR 190.17 and DoDD 5230.9. There is no objectionto unlimited distribution of this report to the public at large, or byDDC to the National Technical Information Service (NTIS).
This report has been reviewed and is approved.
MARTY R. ROCKWAY, Technical DirectorTechnical Training Division
Approved for publication.
HAROLD E. FISCHER, Colonel, USAFCommander
UnclassifiedSECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)
REPORT DOCUMENTATION PAGE BEFORE COMPLETING FORMI. REPORT NUMBER
AFHRITR-74-292. GOVT ACCESSION NO. 3. RECIPIENT'S CATALOG NUMBER
4. TITLE (and Subtitle)READABILITY OF TEXTUAL MATERIALS ASURVEY OF THE LITERATURE
15. TYPE OF REPORT & PERIOD COVERED
6. PERFUHMING ORG. REPORT NUMBER
7. AUTHOR(a)Allan R. Williams, Jr.Arthur I. SiegelJames R. Burkett
8. CONTRACT OR GRANT NUMBER(a)
F41609-72-C-0011
9. PERFORMING ORGANIZATION NAME AND ADDRESS
Applied Psychological Services, Inc.Wayne, Pennsylvania 19087
10. PROGRAM ELEMENT. PROJECT, TASK
62185t 6 WORK UNIT NUMBERS
11210403
11. CONTROLLING OFFICE NAME AND ADDRESSHq Air Force Human Resources Laboratory (AFSC)Brooks Air Force Base, Texas
12. REPORT DATEJuly 1974
13. NUMBER OF PAGES70
1i. MONITORING AGENCY NAME & ADDRESS(it different from Controlling Office)
Technical Training DivisionAir Force Human Resources LaboratoryLowry Air Force Base, Colorado 80230
IS. SECURITY CLASS. (of this report)
UnclassifiedI5a. DECLASSIFICATION:DOWNGRADING
SCHEDULE
16. DISTRIBUTION STATEMENT (of Mitt Report)
Approved for public release, distribution unlimited.
17. DISTRIBUTION STATEMENT (of the abstract entered In Block 20, if different from Report)
18. SUPPLEMENTARY NOTES
19. KEY WORDS (Continue on revere* side if necessary and identify by block number)
readabilityreading levelsreading comprehensiontraining materialstechnical writing
20. ABSTRACT (Continue on reverse side If necessary end Identify by block number)
The literature relating to methods of measuring the readability/comprehensibflity of textual materials is reviewedand analyzed. Various formulas for calculating readability are presented and placed in historical perspective. Thegeneral status of research into the development of readability indices is discussed.
Dn F RM147340 I JAN 73 EDITION OF 1 NOV SS IS OBSOLETE
Unclassified
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)
SUMMARY
Problem
It is expected that the mental aptitude and academic achieve-ment levels of enlistees into the United States Air Force may dropas the goal of an all-volunteer military force is approached. More-over, there is currently a gap between the reading achievement levelrequired to read certain Air Force technical literature and the read-ing ability levels of enlistees. Accordingly, the Human ResourcesLaboratory initiated a program to define methods to optimize thematching of technical training materials to the literacy skills of AirForce trainees. This review of the literature relating to methodsfor determining the readability of textual material represents thefirst result of this program. The two remaining aspects of the pres-ent study are: (1) experimental evaluation of modified training materi-als, when presented with and without auditory. supplementation, and(2) preparation of a training materials modification handbook, whichwill integrate information from the literature review, the materialmodification effort, and the experiment.
This literature review is intended to: (1) document the state-of-the-art of readability assessment, and (2) indicate directions forfuture research in readability measurement.
App roach
This report selectively reviews the literature relative to read-aoility-comprehensibility measurement. The period since 1953 is em-phasized, although there is some consideration of earlier work.
Sources searched for relevant literature included the Psycho-logical Abstracts, from 1950 through 1971, the Technical AbstractBulletin of the Defense Documentation Center, from 1962 through1971, and the U. S. Government Research and Development Reportsof the Department of Commerce, from 1968 through 1971. ThePASAR automated retrieval system of the American PsychologicalAssociation was employed to search more completely the literatureabstracted in the Psychological Abstracts between 1967 and 1970. Ofcourse, many additional references were found while reading the pa-pers indicated through the above sources.
1
NResults
Many readability formulas have been derived. The majorityare.based on linear regression equations relating various observedcharacteristics of text; i. e., sentence length or word length, to somecriterion of comprehension, such as a comprehension test score. Allof the formulas are highly intercorrelated and are all undoubtedly high-ly correlated with cloze score. Cloze score is based on a relativelynew procedure in which a judgment of readability is based on the per-centage of deleted words in textual material which subjects are ableto replace correctly. Cloze score has gained considerable recent ac-ceptance as a readability criterion. But practical considerations maymake application of one of the many other available formulas more ap-propriate in many instances.
Conclusions
The readability measurement field suffers from the lack of aunifying conceptual or theoretical structure. Clearly, readability ismultifactor in nature. The dimensions of readability must be deter-mined with consideration given to variables within both the text andthe reader. Upon isolation of these variables, studies to indicate howthe variables interact in dynamic reading situations will be required.While the call for a unifying theoretical structure may seem to evadepractical issues, it appears that little real progress can be made inreadability measurement without such a structure. Specific sugges-tions for required research are contained within the body of this re-port.
ii
TABLE OF CONTENTS
Pam.
CHAPTER I - INTRODUCTION 1
Scope and Organization of this Review 1
CHAPTER II- METHODS FOR MEASURING RELIABILITY 3
Rating Methods 3Use Tests 3The Readability Formulas 4Early Formulas 5Detailed Formulas 8Less Cumbersome Formulas 14Recent Formulas 21Cloze Procedures 30Application of Readability Formulas as
"Rules for Writing" 33Multimodal Presentation 34
CHAPTER III- DISCUSSION 39
Estimation of Reading Level 39Discussion of Formulas 41Future Research Avenues 44
REFERENCES 47
APPENDIX A - Glossary of Readability Measures 57
CHAPTER I
INTRODUCTION
It is expected that the mental aptitude and academic achieve-ment levels of enlistees into the U. S. Air Force will drop as thegoal of an all-volunteer military force is approached (Valentine &Vito la, 1970). Project 100, 000 has also had the effect of loweringthe overall academic achievement level of the Air Force. Accord-ingly, it is expected that efforts to increase the comprehensibility ofwritten materials, both those intended for training purposes and thoseused on the job, will yield significant advantages. An example of themismatch between the reading level of military personnel and the read-ability of the materials they use can be found in the work of Vineberg,Sticht, Taylor, and Caylor (1970), who reported that 75 per cent of asample of Army Military Occupational Specialties' reading materialswere written at ^ level six to eight school grades higher than the read-ing level of low- .ave1 (Category IV) personnel and four to six grade lev-els higher than the average reading levels of non-Category IV person-nel. The need tc match the reading level of Air Force technical litera-ture to the reading level of trainees and job incumbents is quite clear.
Scope and Organization of this Review
This report reviews the literature relevant to techniques formeasuring the readability/ comprehensibility of written materials.Use of such techniques could help to improve the intelligibility of AirForce reading materials and by implication reduce the gap betweenthe reading level of the reader and the material he reads. Primaryemphasis within the review is based on readability measurement. How-ever, a final section is included which reviews recent work bearing onthe topic of increasing comprehensibility through multimodal presenta-tion methods.
The term "readability" may be defined as:
the sum total (including the interactions) of all those elements within agiven piece of printed material that affects the success a group of readershave with it. The success is tne extent to which they understand it, readit at an optimum speed, and find it interesting (Dale & Chall, 1949).
OtSt
4011404
Since readability measurement within a training context is the topicof interest here, only the first part of this comprehensive definition- -readability as it affects ease of understanding or comprehension--willbe considered directly.
Within the measurement concept, "readability formulas, " inwhich attempts are made to predict the readability of written materialbased on quantitative analysis of the material, will be considered first.These methods have been traditionally based on items such as averageword length (in letters), average sentence length (in words), frequencyof occurrence of words not appearing on lists of common words, andfrequency of occurrence of prepositional phrases and independentclauses. Mare (1963) reports that various reviewers have countedbetween 29 and 56 such predictive formulas. However, work on thistype of formulation has greatly slowed within the past 15 years (Bormuth,1966; Tannenbaum, & Greenberg, 1968). Studies of readability sincethat time have tended to concentrate on the "cloze" procedure of W. L.Taylor (1953).
The cloze procedure is the second topic of discussion. In thistechnique of readability assessment, the percentage of deleted wordsin a passage correctly filled in by a reader is taken as an index ofreadability. This procedure overcomes the problem of measuring thereadability of technical literature having an unusual, specialized vocabu-lary which is familiar to individuals within a specific milieu--the pre-cise problem of interest in dealing with technical job-related literature.
2
CHAPTER II
METHODS FOR MEASURING READABILITY
Many approaches are possible to the problem of objectivelymeasuring the readability or comprehensibility bf written prose. Themost elementary methods; e. g. , rating methods and use tests, pos-sess serious drawbacks.
Rating Methods
In rating mm thods, judgments of the readability of written manu-als are made by samples which are representative of the intended userpopulation or by persons considered to be expert in the covered field.These judgments are necessarily quite subjective, requiring that alarge number of raters be employed to "average out" the variabilitybetween raters. Raters must also be presented with materials cover-ing a wide range of comprehensibility, so that they may choose theparticular materials exhibiting the optimum level of comprehensibilityfor the intended user population. This procedure may be very expen-sive to use due to the range of written materials which must be prepared,the necessary size of the rating group, and the effort required to inter-pret the ratings. Problems may also be encountered in selectingan ap-propriate rating group and in generalizing from the rating group to theusing population.
Use Tests
The use test method fol. evaluating readability involves admin-istering comprehension (use) tests to a sample of those for whom thematerial is intended, after the material has been read by the personsin the sample. High test scores dre assumed to be associated withhighly readable text, and low scores with less comprehensille text.In addition to the time required to collect and test the sample of users,large amounts of time are required to write tests based on the text.
3
Standardization of the tests is impossible, a priori, and spe-cific tests may well be much too easy or difficult to rate accuratelythe text versions of interest. Moreover, if low scores are attained,it is not known whether the low test scores can be attributed to thecharacteriatics of the text itself or, possibly, to the inability of thetested sample to comprehend in general. Finally, there is %theguidance available regarding whether the tests should test transferof factual information, transfer of main points, ability to apply in-formation, or what ?
The Readability Formulas
Quantitative analysis of written text has become the most popu-lar method of assessing readability. The reliability and economy ofsuch methods may be expected to far surpass that of the previouslymentioned methods. One reviewer (Mare, 1963) reports that be-tween the publications of the first such formula in 1923 and 1959,over 29 "readability formulas" were developed.
Quantitative analysis of written text has been conducted in hopeof finding what determines a readable style. Until very recently, notheoretical model of language behavior was available from which togenerate hypotheses. Hence, the method employed in analysis of texthas been one of correlation. Characteristically, the text is analyzedand possible factors affecting readability (such as sentence length andvocabulary measures) are conjectured.
A set of readings is then collected and ordered by readabil-ity according to a specific criterion (such as reading speed, testedcomprehension, judgment). The chosen variables are then meas-ured, their correlations with the readability criteria determined,and regression equations written. Until very recently, linear regres-sion techniques and an additive model have been used exclusively. Ingeneration of the equations, analysts have added additional factors un-til the increase in predictable variance accounted for by adding al atherfactor was negligible.
In recent years, activity in this area has sharply declined(Tannenberg& Greenbaum, 1968). We may, however, soon witnessa renewal of interest in measurement of readability as the science ofpsycholinguistics grows. Analysis of readability from a theoretical
4
431,Copp
point of view, e.g., Bormuth (1966), may contribute greatly to thescientific understanding of the written information transfer process.Use of electronic data processing equipment in analyzing readabilityis also a great aid in overcoming two of the traditional problems inreadability analysis: the time and effort required to perform the anal-ysis by hand, and reliability of measurement--both across time andacross individuals.
Early Formulas,
The first important attempt to measure objectively readabilityrepresented an attempt to aid teachers in choosing appropriate textsfor their classes.
In the years around 1920, science teachers at the junior highschool level complained that the number of technical and scientificterms present in textbooks intended for classroom use was becomingexcessively large. It was becoming necessary to devote large amountsof class time to teaching the meanings of the new vocabulary, lesseningthe time available to teach scientific concepts. Lively and Pressey(1923) attempted to develop an objective method for determining the vo-cabulary difficulty, or "burden, " of textbooks, so that those books withexceptionally dqficult vocabularies could be avoided.
Their measure was based on a simplification of Thorndike'sTeacher's Word Book of 10 000 Words, published in 1921. This booklists the frequency of occurrence of the most commonly occurring10,000 words in the English language. Lively and Pr..,ssey used a"weighted median index number'' as their measure of vocabulary dif-ficulty. In order to compute the index, a sample of one thousandwords evenly distributed through a text was taken. The individualwords were found on Thorndike's list, and an index was assigned toeach on the basis of its location in the list. For example, those wordsappearing in the most common thousand.(according to Thorndike) re-ceived an index number of ten. Those in the second thousand wereassigned index number nine, and so on through the ten, thousand-wordblocks in the list. Words not appearing on Thorndike's list were as-signed an index number of zero.
5
BESTcopy AVAILABLE
To compute the vocabulary burden, the median value of theindices of the words sampled from the text was determined, count-ing each value of zero twice. Substantial agreement was obtainedbetween the judged difficulty of a wide variety of reading selections,ranging from second grade to college level, and the ordering of theselections by the weighted median index number. These findings en-abled Lively and Pressey to conclude that the weighted median indexnumber provided an estimate of the vocabulary difficulty of texts.Lively and Pressey were aware of the weaknesses in their readabil-ity measure. They pointed out that their 'index relies on the appro-priateness of Thorndike's word count, and this may not be optimalfor their intended applications, since Thorndike's count appeared tothem to be based largely on materials employing a literary and evenpoetic vocabulary. They also admitted that larger samples than onethousand words from an entire book may be more appropriate, al-though they made no effort to evaluate the effects of varying samplesizes.
In addition to its purely historical significance, the readabil-ity assessment procedure of Lively and Pressey is significant becauseit led directly to the first of the "readability formulas" of the moderntype,-that of %V ashburne and Vogel (Chall, 1958)--in which various fac-tors correlating with a selected criterion of readability were combinedinto a single multiple regression equation.
Washburne and Vogel's effort began as one of generating normsfor use with the Lively and Pressey method, a need which was pointedout by Lively and Pressey when they presented their initial report.Working in the Winnetka, Illinois, school system, Washburne and Vogeldetermined the weighted median index numbers of 700 books reportedas having been read and liked during the preceding year by at least 25of the 37, 000 students in the school system. Categorization scorescomprising a weighted median index number were correlated with me-dian grade levels derived from the paragraph meaning section of theStanford Achievement Test for those students who reported readingand liking a given book. it was found that a correlation of . 80 existedbetween grade level and number of "zero-value" words present in thetested sample from the corresponding book (Washburne & Vogel, 1926).This information was summarized in the Winnetka Graded Book List,which received considerable use by parents, teachers, and librariansin selecting books to be made available to children in grades 3 through 9(Chall, 1958).
8E67COPY
AVAILABLE
A need was soon realized for a method of determining the ap-propriate grade level for books published after the Winnetka list wasprepared, without repeating the massive effort employed in developingthe original list.. Accordingly, Washburne and Vogel selected 150 ofthe books on the Winnetka list to isolate the internal factors related toreading difficulty; i, e. , factors which might be effective in distinguish-ing those books read by lower grade pupils from those read by studentsof higher grades. Ten factors were found, and their correlation withgrade level was determined. All factors correlated significantly withthe criterion of grade level, but'only four were subsequently used inderiving a regression equation for measuring readability, since manyof the other factors had very high intercorrelations (Vogel & Washburne,1928).
The readability formula developed by Vogel and Washburne isbased on a systematically chosen sample of 1000 words from the bookto be tested, which is analyzed as follows:
number of different words appearing (x2)
number of occurrences of prepositions (x3)
number of words not on Thomdikeis list of 10, 000 (x4)
numbs of simple sentences in 75 sample sentences (x5)
The difficulty of the book is expressed in terms of grade levels of theStanford Paragraph Meaning Test (X1). The regression equation de-rived was:
Xi = . 085x2 + . 101x3 . 604x4 - 411x5 + 17. 43.
This formula correlated . 845 with the reading test scores of studentswho read and liked the respective books during the previous year. Itis important to note that the criterion employed here is not strictly com-prehensibility of the text. It is confounded by subjective evaluation of thebooks by students and factors such as subject matter and number of pic-tures, which are not of direct interest to the question of readability.
7
RESTCOPY
4411ABLE
In addition to initiating the use of multiple regression equa-tions in the study of readability, Washburne and Vogel's study is sig-nificant because of their conceptual approach to the problem. Theywere the first to: (1) analyze the effects of structural factors in thetext, (2) employ an objective criterion of textual difficulty as opposedto qualitative judgment, and (3) describe readability in terms of schoolgrade levels, as opposed to purely relative measures (Chall, 1958).
The readability measures of Lively and Pressey and of Vogeland Washburne are typical of the early studies of readability in thatthey: (1) measure readability essentially as a function of vocabularyfactors, (2) depend heavily on Thorndike's word list, and (3) employrelatively crude criteria of difficulty of text. Other less significantearly attempts to measure readability include those of Johnson (1930),Patty and Painter (1931), and Thorndike (1934) himself.
Detailed Formulas
The readability studies of Ojemann, Dale and Tyler, and Gray.and Leary are the most significant of the efforts undertaken between 1934and 1938, a period characterized by' evaluation of larger numbers offactors potentially related td readability, reduced emphasis on wordfrequency lists, such as that of Thorndike, and concern over more ob-jective readability criteria.
The year 1934 is considered to be the beginning of the trendtoward detailed analyses of factors relating to readability, in which alarge number of factors, including qualitative ones, were evaluated.The first of these to appear was that of Ojemann (1934), who institutedthe use of comprehension test scores as the criterion of readability.
Ojemann collected 16 parent-education passages, each approxi-mately 500 words in length. The passages were read by adults, andcomprehension tests were given covering each passage. A reading achieve-ment test was also administered. The grade-level of difficulty of eachpassage was taken as the mean tested reading level of those people whocorrectly answered at least half of the comprehension test questionsfor that selection. After arranging the 16 selections in order of diffi-culty, Ojemann analyzed their contents for eight sentence factors, six
aer
vocabulary factors, and three qualitative factors. All of the 14 quan-titative factors (sequence factors and vocabulary) correlated signifi-cantly with the criterion.
Of the sentence factors, three exhibited correlations of . 60or above. These were: number of simple sentences, number ofprepositions, and number of prepositions plus infinitives. The fivesentence factors having correlations of less than .6 with the criteri-on of difficulty were: number of complete sentences, number of com-pound sentences, number of dependent clauses, mean length of de-pendent clauses, and ratio of total words in independent clauses tothe total words in a selection.
All six vocabulary factors correlated . 60 or higher with thecriterion. These were: percentage of words in Thorndike's first1000, percentage of words in Thorndike's first 2000, percentage ofwords known by 70 per cent of sixth-grade pupils, percentage ofwords known by 90 per cent of sixth-grade pupils, mean difficultyof different words, based on Thorndike's list, and mean difficultyfor each word.
No attempt was made to write a regression equation, sincethe qualitative factors, concreteness versus abstractness of rela-tions, obscurity in expression, and incoherence in expression, ap-peared to have considerable importance in determining the difficultyof the passages, although they could not be quantified. Instead of aformula, Ojemann presents his 16 tested selections, with their re-spective values for all the quantitative factorb and tests of grade-lev-el difficulty. This type of presentation was intended to allow evalua-tion of the qualitative factors in testing new passages, in addition tocomparing new passages according to the quantitative factors.
9
In addition to being the first application of comprehensiontest scores as a criterion of reading difficulty, Ojemann's studywas the first Lo employ adult subjects and adult reading materials.Furthermore, he was tly.: first to demonstrate the'importance ofqualitative, nonstatisticalfactors in the determination of readabil-ity.
Dale and Tyler (1934) evidenced vdry great concern over therange of applicability of then-available re,:dability measures. Theirinterest was in defining those factors determining the difficulty ofreading materials dealing with health education for adults of extreme-ly low reading ability. They stated that:
A critical analysis of the widely varying results of previous studies indi-cates the impossibility of determining the factors in the reading materials whichmake them understandable unless the investigations separate the influence offactors within the reading materials from those outside. Thereader's interestin the topic treated in the reading matter, his ability to read, the kind of com-prehension appropriate to the purposes of the reading matter, and the difficultyof the ideas developed in tht: reading matter are all factors which greatly affecthis comprehension of the rnaterial read but are distinct from the characteristicsinvolved in the materials themselves... the various factors not in the readingmaterials themselves must be controlled in order to determine the effects offactors within the materials (pp. 384-385).
In order to isolate the factors which affect reading difficulty,given a fixed topic, a fixed readership, and a fixed level of compre-hension, Dale ana Tylor asked adults of low reading ability to readhealth education articles collected from newspapers, magazines, andbooks. The readers then completed a multiple-choice comprehensiontest designed to measure their understanding and retention of the mainideas of the selections.
The type of comprehension test employed was one of the most dif-ficult that could have been used. Questions did not deal with particularitems of information which could be remembered with relative ease.by the subject. Rather, the subject was required to integrate the en-tire selection and determine its main point. Additionally, the testquestions were written so that of the five possible responses to each,one option was "best" and the others were characterized by varying de-grees of correctness; i. e., accurate statements of secondary points
10
OtstCopp
44494
of the article or slightly Inaccurate statements of the points in thearticle, requiring the subject to know what was covered in the selec-tion, and also what was not covered. Good performance on such atest must have been a formidable task for adults determined to bereading at the third to fifth grade level--the level of these subjects.Not surprisingly, adequate comprehension was not obtained on any ofthe collected reading materials for determining the correlation betweencomprehension and the 29 quantitative factors.
To obtain sufficient comprehension, Dale and Tyler found it nec-essary to write their own, much simplified, test passages. Three vin-
'ciples were found which produced articles with the required ease of read-ing: (1) use of very basic vocabulary, (2) use of informal style charac-terized by conversational manner and anecdotal examples, and (3) free-dom from digression from the topic of interest.
After rewriting., restoring, and readministering their materials,Dale and Tyler found that of the 29 factors studied, 10 correlated signi-ficantly with comprehension. Of these, three were included in a regres-sion equation for predicting comprehension; i. e. , technical words in thepassages (x2), the number of nontechnical words not known to 90 per-cent of sixth grade pupils (x3) [from an unpublished study by Dale], andthe number of indeterminate clauses (x4).
The equation predicted the percentage of adults of reading lev-els of grade 3 to grade 5 who could understand the main point of a pas-sage (X1) based on the above measures, and took the form
X1
-9. 4x2
- 0.4x3 + 2 2x4 + 114. 4.
Predicted comprehension correlated , 51 with the criterion.
A comprehensive study of factors influencing readability wasperformed by Gray and Leary (1935). They listed 82 factors relatingto readability as drawn from a study of: (1) adults' and children's read-ing materials, and (2) recommendations of adult students, teachers,and professors of English. Of the 82 factors, 18, such as "image bear-ing words, " "physical and psychic association, " etc., were discardedbecause they seemed to "defy objective measurement. " Twenty more
11
irsi
factors were discarded because of their infrequency of occurrencein the'samplea materials. Twenty of the remaining 44 factors cor-related significantly (r > . 27) with comprehension test scores of asample of 756 adult subjects who were given general adult readingmaterials, both fiction and nonfiction. The subjects were selectedin such a way as to be representative of the then current populationof adult readers.
When Gray and Leary separated out their upper quartile ("good"readers) and their lower quartile ("poor" readers), the factors corre-lated differently (by group) with the comprehension test scores. It wasfound that vocabulary measures were most highly correlated with com-prehension scores for poor readers (those achieving comprehensionscores in lowest quartile), while readers scoring in highest quartileshowed comprehension scores to be most closely related to sentencestructure and length.
Gray and Leary provided an entire family of regression equationsrelating quantifiable factors to predicted comprehension test scores.For low ability readers, as defined by the comprehension test admin-istered, their initial equation included eight factors, and correlated64 with their criterion. Further analysis showed that considerable
reduction in the number of factors included in the equation resulted inminimal change in the multiple correlation with the criterion measure,or in the probable error of the estimate. Nine formulas were presentedemploying various sets of four of the original eight factors. Multiplecorrelations computed for these formulas varied from .6350 to .6402,and the probable errors of the formulas varied from .2956 to .2975.These findings allowed Gray and Leary to suggest th't in employingtheir measure of readability, one of the four-factor formulae be em-ployed and that the sole criterion for choosing between them be theease of measuring the factors required in each formula.
However, the most popular application of Gray and Leary'sresults has been based ou their five-factor formula, which demonstratesa correlation of .64 with the criterion, and the lowest probable errorof all (by . 0004). This formula takes the form:
where:
X1
= 01029x2
+ 009012x5
- 02094x6
- 03313x7
-. 01485x8 + 3.774
12
X1
=2
x5
x6 =
x7 =
x8 =
predicted average comprehension score, alonga scale of +4 to -4
average number of hard words (words not appear-ing on Dale list of 769 easy words) appearing insamples*
average number of 1st, 2nd, and 3rd person pro-nouns appearing in sample
average sentence length in words
average percentage of different words in sample
average number of prepositional phrases appear-ing in sample
Such an equation obviously lacks broad utility since it is only applic-able to low ability readers--the sample on which it is based.
Three hundred fifty books were tested using the five factorformula. The distribution of predicted difficulty of the books was verynearly normal. Five categories of difficulty were defined and labeled"A" ("very easy") through "E" ("very difficult"). Predicted compre-hension scores in the "very easy" category ranged from 1.46 to 1.15,and the "very difficult" scores included values from 0.22 to -0.09.Correlation of indiv41uals' comprehension scores with their respectivetested reading grade levels indicated that category A materials werewritten at approximately the 2nd to 3rd grade levels, category B wasat roughly the 4th grade level, and category C at the 6th grade to JuniorHigh School level. The two highest difficulty categories did not corre-late reliably with reading grade level.
*a 100 word passage from each chapter analyzed.
13
8E41°PI 41/4/480,
Other narrower studies of this period are included in the sum-mary table in Appendix A and are briefly discussed by Mare (1963).
Less Cumbersome Formulas
The next trend in readability research, that toward more effici-ent and easily applied formulas, saw the development of the two mostpopular and widely applied readability measures--the Flesch and Dale-Chall formulas.
The appropriateness of searching for efficient formulas wasdemonstrated by Lorge (1939). He recomputed correlations and re-gression equations based on the structural variables contained in thefive factor formula of Gray and Leary. Lorge used as criterion ma-terials the reading passages of the Standard Test Lessons in Readingof McCall and Crabbs. This is a set of 376 reading passages normedin terms of number of comprehension test questions correctly answer-ed, and related to grade levels of the Thorndike-McCall Reading Scale.Since its initial application by Lorge, this has become the most fre-quently used and highly respected criterion in studies of reliability(Klare, 1963).
Using the variables of Gray and Leary, Lorge obtained highercorrelations with his criterion than had Gray and Leary with any oftheir formulas. He produced two regression equations, each based ononly two of the Gray-Leary variables, x2 and x6, and x2 and x8 (intheir notation). Multiple correlations of . 7406 and 7456, respective-ly, were reported. This is the first instance in which over one-half ofthe variance in a pure measure of comprehension has been accounted forby a readability formula. Lorge attributed his high multiple correlationscompletely to his use of more adequately standardized criterion materi-als and his use of a larger sample of criterion materials--376 passages,as opposed to 48 for Gray and Leary.
After study of additional variables, all of which were measuresof aspects of vocabulary, Lorge concluded that other variables add in-significantly to the predictive accuracy attainable from vocabulary alone.He suggested that this finding may be attributable to insufficient reliaLil-ity of available criteria, including that of McCall and Crabbs. But, hecontended that improved criterion measures would not negate his thesisthat vocabulary is the most important determinant of reading compre-hension.
14
ars,copr
40/48te
Lorge did not present his own readability formula until 1944(Lorge, 1944). His formulation included two of the Gray-Leary fac-tors, x6 (average sentence length in words) and x8 (number of prepo-sitional phrases per hundred words), as well as x9 (ratio of hard wordsto total words in the sample). Lorge categorized these as a sentencestructure factor, idea density factor, and vocabulary factor, respec-tively, and considers them the primary structural elements relative toreadability. A "hard word, " to Lorge, was one that does not appear onthe Dale list of 769 easy words. This list includes words common toThorndike's first thousand most frequent words and the first thousandmost frequent words known to children entering the first grade, fromthe International Kindergarten List (Gray & Leary, 1935).
Reanalysis of Lorge's data by Dale uncovered certain compu-tational errors, and Lorge's formula was subsequently recomputed in1948 (Lorge, 1948). The final formula is:
C50 = 10x
2+ 06x
6+ 10x
8+ 1.99
where C5a is the reading grade level of individuals answering one-halfof the Mcc;a11-Crabbs questions on that passage correctly. The ob-tained predictor-criterion correlation was . 6 ?, and addition of othervariables raised the correlation to .705.
Lorge is the first to emphasize that readability formulas ingeneral are not to be construed as prescriptions for writing, but areonly usable as an approximate of the reading difficulty of a passage.
Rudolf FlesCh was strongly dissatisfied with the then availablemeasures of readability as they could be applied to adult reading ma-terials (Flesch, 1943a). Readability formulas, when applied to adultliterature, such as magazines, tended to rankthe literature in ordersfar different from the ordering that wa.s produced by judgment or know-ledge of the educational levels of readers of each magazine. Fleschthought that this misranking.was due to excessive emphasis on rangeof vocabulary and insufficient stress on other factors.
15
litsi copy
Flesch hypothesized three types of factors that should highlyinfluence readability for adults. Pirst, he suggested that sentencelength should be an important correlate of adult reading difficulty,although it did not appear to be for children. He found support forthis idea in the work of Gray and Leary (1935), who found sentencelength to be an important variable associated with readability forgood readers only. Second, Flesch hypothesized that abstractness wascorrelated with readability for adults. Third, he thought that the read-er's interest in the topic of a reading passage should influence its read-ability.
Based on Lorge's data employing the McCall-Crabbs StandardTest Lessons in Reacam, Flesch developed his initial formula, inwhich adult reading material would be assigned to a wide range ofschool grade levels on the basis of: (1) average sentence length, (2)number of affixed morphemes (affixes and suffixes, with certain ex-ceptions), an index of abstractness, and (3) number of words of per-sonal reference (pronouns, names, words directly relating to people,such as aunt, baby, etc. )(Flesch, 1943b). His regression weightswere subsequently recomputed by Lorge when computational errorswere found by Dale (Lorge, 1948).
Almost simultaneously, due to difficulty of application, ap-parent lack of sensWOity to the human interest factor, and misuse ofthe formula as a rule for writing, Flesch completely revised his formu-la and published his "reading ease" and "human interest" formulas(Flesch, 1948). The two new formulas were again standardized againstthe McC..11-Crabbs lessons. The "reading ease" formula appeared as11. E. = 206. 835 - 846 WL 1. 015 SL, in which SL is average samplesentence length in words and WL is word length measured as syllablesper 100 words, Syllable count was substituted for the earlier count ofaffixes. The correlation between the two measures was .87, and thesyllable count was expected to be more easily and reliably taken.
The "human interest" index is presented as:
H. I. = 3. 635PW + . 314PS
where PW is average percentage of personal words, using a slightly
16
614,
narrower definition than previously, and PS is the percentage ofsentences spoken or addressed to the reader, including exclama-tions, or grammatically incomplete sentences whose meaning mustbe determined from the context. This factor tests the "conversa-tional quality" of the passage, and its inclusion represents an attemptto bring out the easy readability of direct conversational style. Thereading ease formula correlated 70 with the McCall-Crabbs; the hu-man interest index correlated .43 with the McCall-Crabbs,
The new formulas were held by Flesch to locate tested passageson scales which range from 0 to 100. A reading ease score of zero isconsidered to represent "practically unreadable" material, while 100represents text which is easily read by any literate person. A humaninterest score of zero indicates no human interest, and 100 indicatesthat the passage is "full of human interest."
The Flesch formulas have become the most widely applied inthe entire history of readability research. This wide application isdue in part to the ease of computation of his formulas and partly tothe wide exposure given to his formulas through a long series of popu-larized books (e.g., The Art of Plain Talk [Flesch, i946], The Art ofReadable Writing [Flesch, 1949], and How to Test Readability [Flesch,1951]). These books saw wide circulation in business, governmental,and journaliztic circles, where they were employed as rules for writ-ing (Chall, 1958).
A simplification of the Flesch reading ease formula was pro-posed by Farr, Jenkins, and Patterson (1951), who observed a corre-lation of -.91 between the average number of syllables per word andthe number of one syllable words in a passage. Accordingly, theysuggested that the number of one syllable words be used in Flesch'sformula so as to r.ermit more rapid testing with no loss of reliability.Farr, Jenkins, and Patterson generated such a formula, and found it tocorrelate .95 with scores produced by the Flesch formula. They there-fore concluded that their formula,
Reading Ease = 1.599 (number of one syllable words)
- 1.015 (mean sentence length in words) - 31,517
should be considered an acceptable substitute for the Flesch formula.
17
itir
Both Flesch (1952) and Klare (1952) immediately criticizedthis proposal. Flesch maintained that accuracy would be lost in eval-uating very easy or very difficult materials, and Klare suggested thatthe reliability of counting one syllable words should be lower than thatof counting syllables in the manner suggested by Flesch. England,Thomas, and Patterson (1953) experimentally tested these criticismsand were able to discount them completely. The Farr, Jenkins, andPatterson formula has since received a moderate amount of accept-ance--considering the rate at which it is reported as being used inapplied evaluations of materials. Chall (1958) pointed out the ironyof this return to word length as a criterion, of difficulty, one of thevery factors Flesch considered unsuitable.
Dale and Chall (1948) employed the Flesch formula to evaluateeducational materials published by the National Tuberculosis Associa-tion in terms of readability for the average adult. They eventuallysought to develop their own measure, because of two drawbacks en-countered in the use of the Flesch formula. First, they found low be-tween rater reliability for the count of affixes necessary for the Fleschformula. Although Dale and Chall expressed considerable respect forthe justification, presented by Flesch (1943b), for using a count of af-fixes as an index of abstractness, they wondered (since affixes corre-lated. 78 with abstractness) whether or not some other and more man-ageable correlate of abstractness could be employed. This question issupported by Lorge (1944), who stated that all measures of vocabularyload, including abstractness, are intercorrelated. Second, Dale andChall considered the use of personal references in the Flesch humaninterest formula to be oversimplified. References to senators, thoughpersonal, in the Flesch sense, are not generally associated with a low-ering of abstractness of text, as are references to "Dad" or "John, "which are generally associated with a very concrete type of statement.
Again using the McCall-Crabbs test data originally collectedby I,orge, Dale and Chall derived a new regression equation. Thisequation is based on average sentence length, and the "Dale score"--the relative number of words in the text samples not appearing ona list of 3000 words known to 80 per cent of a sample of fourth grad-ers. The form of the equation was:
1579 (Dale score) + . 0496 (sentence length) + 3. 5365XC50
in which the criterion, X is the reading grade level of an individualC50'
18
able to answer correctly half of the comprehension test questions.The score yielded by this equation correlated .70 with the McCall-Crabbs criterion.
Another formula based on the Flesch formula is that ofGunning (1952). In this formula, reading grade level necessary tounderstand tested material is equal to .4 of the sum of mean sen-tence length in words and percentage of words of three or more syl-lables. No numerical correlation was reported between this and oth-er estimates of reading difficulty, although there is little reason tobelieve that the correlation was low.
The McCall-Crabbs Standard Test Lessons in lieadint, whichwere the basis of the last five formulas discussed, were developed in1926. They were revised in 1950, and at least 60 of the passages inthis more recent edition are entirely new, dealing with modern topics,such as atomic energy and aviation (Powers, Sumner, & Kearl, 1958).Based on this revision of tilt_ McCall-Crabbs tests, Powers, Sumner,and Kearl undertook to recompute the Flesch, Da le-Cball, Farr-Jenkins-Patterson, and Gunning indices of readability. They hoped to produceformulas accounting for changes in reading abilities over the twenty-four year period between test editions, and to facilitate comparison ofthe formulas, since they would be based on identical measurement rules,mathematical procedures, etc.
The recomputed formulas and their respective correlations withthe constant criterion of grade score of pupils answering one-half of thetest questions correctly are:
Flesch:-2.2029 + (n0778)(mean sentence length) + (.0455)(number of syllables per 100 words)
Dale-Chall:3.2672 + (. 0596)( mean sentence length,' + (. 1155)(percentage of words not appearing or :)ale list of3000)
10
r = .6351
r = .7135
Farr - Jenkins - Patterson:8. 4355 + (. 0923)(mean sentence length)- (. 0648)(percentage of one syllable words)
Gunning:(.0984) (percentage of words of three or moresyllables)
test
r = . 5836
r = . 5865
In support of their recalculations, Powers, Sumner, and Kearlpointed out that the revised formulas give estimates more consistentwith one another than did the original Flesch and Dale-Chall formulas;the other formulas were not compared in their original forms. It wasconcluded that the Dale-Chall formula is the "best, " since it had thehighest predictive power (correlation) and the lowest standard error,. 77 grade levels, of the four formulas tested.
The final readability measure of this period to be discussed isthat of McElroy. This measure was unfortunately called a "fog count, "the same term applied by Gunning to his formula. This duplication ofnames has caused confusion on the part of some later researchers,such as Kincaid (1972).
McElroy's fog count is, again, related to the Flesch formulain that it is based upon a count of syllables (Klare, 1963). The pro-cedure to be followed in using this formula is assign a value of 1 toeach word of one or two syllables appearing in the sampled passage,and a value of 3 to each remaining word--which will have three ormore sylls.bles. The assigne"I values are added and the value thus de-termined is the fog count. '^o determine the reading grade level as-sociated with a particular fog count value, if the sum is over 20, it isdivided by 2. If the sum is under 20, 2 is subtracted and the resultis divided by 2.
No statistical information relating to the development, or ac-curacy of this formula is available (Kiare, 1963; Kincaid, 1972).
20
e ent Formulas
Klare (1963) considers the period 1953 to 1959, when his re-view was written, to be one exemplified by a trend toward speciali-zation in readability formulas. However, it seems that there is littleto be gained in naming a trend on the basis of only seven formulas ofrelatively minor importance which appeared in a six year period. Ex-tension of the period to include all recent studies of readability meas-urement techniques appearing from 1953 to the present does, however,appear appropriate and renders impossible the naming of a trend ofstudy characterizing this recent period.
During this period, in addition to formulas intended for a limitedrange of application, additional readability measures intended for gen-eral application were developed, and two new approaches to the problemof measuring readability were presented.
During the last 18 years, four formulas appeared which are in-tended for application to primary school texts. These are the formu-las of Spach.: (1953), Wheeler and Smith (1954), Bloomer (1959), andTribe (1956).
The formula of Spache (1953) predicts the primary grade level(grades 1-3) of textual material on the basis of average sentence lengthin words (x1) and percentage of words not appearing on the Dale list of769 common words (x2). The formula predicts grade level as equal to:.141x1 + . 086x2 + .839. The reported correlation between predictedscore of tested materials and usual grade level of application was .818.
Wheeler and Smith (1954) based their formula for prediction ofgrade level on the grade 'designation assigned by the publisher of thetextual materials in their sample, Their formula equated grade lev-els to 10 times the product of the mean length of units (sentences,with minor exceptions) in -words and the percentage of multisyllablewords. The value thus determined is located in a table and grade lev-el of 1 through 4 read ofi. This was the first formula to be based ona multiplicative model of factors. This type of equation allows inter-actions between the factors (as discussed by McLaughlin [1969]), sothat a particular change in factor A will affect the predicted value dif-ferently at varying levels of factor B. This may be a highly important
21
BEST COPY/4144kt
characteristic for a readability formula, in the light of data such asthat of Gray and Leary (1935), showing that the relative importanceof vocabulary and structural factors does not remain constant acrossall reading levels. Multiplicative formulas may be able to deal moreappropriately with findings. of this type than do additive formulas.But, it is doubtful that the simplified formula of Wheeler and Smithis a very large step in the proper direction.
The formula of Bloomer (1959) again used customary gradelevel application as the readability criterion. In Bloomer's develop-ment, reading grade level is predicted from abstraction level as indi-cated by the number of words per modifier (modifying phrase) and"sound complexity" (sic) of modifiers. Bloomer contended that thesevariables may be used as predictors of readability, although he pre-sented no formula, predictive method, or method for application.His approach is based on the observation that abstraction in text in-creases with grade level, and that the two variables (words per modi-fier and sound complexity) employed are closely associated with ab-straction. The multiple correlation between the two variables andassigned grade level was .78, whi'h Bloomer considers to comparefavorably with the correlations obtained through the procedures ofFlesch and of Lorge.
The McCall-Crabbs Standard Test Lessons in Reading, 1950revision, were again used as a criterion by Tribe (1956). Grade levelscore of children, in grades 2 through 8, who could correctly answerone-half of the reading test questions was found to be equal to:
.0719x1 + .1043x5 + 2.9347
where xi is the average sentence length and x5 is the percentage(times 100) of words not appearing on the Rinsland word list. A cor-rection factor is then applied to the pr edicted grade level. This fac-tor is based on a table presented by Dale and Chall (1948).
22
An unusual criterion was employed by Jacobson (1965) todevelop formulas to determine the readability levels of high schooland college chemistry and physics texts. The predicted criterionmeasure was the average number of words indicated as not beingunderstood (as indicated by reader underlining) by readers, basedon 200 word sample passages. This procedure was first employedin 1928, and is known as the Kyte test. Test-retest reliability ofthis test over a one week interim was reported by Jacobson to be.95 for the physics texts and .85 for the chemistry texts in hissample. Two formulas are presented - -one for physics texts andone for chemistry texts. The variables included are;
X mean underlining score (the criterion)
x1
mean number of words per independent clausein each 200 word sample passage
x2 mean number of mathematical terms in thesampled passages
x3 mean number of words in the sampled passagesthat are above the 6,000 word level in Thorndike's20,000 word list
x4 ; mean number of (technical) words per passage'which do not appear in the Powers list of 1828essential scientific terms
The formula derived for use with physics texts is:
X = -.0003 + 29.7059x2 + 21. 119x3 + 35.0029x 4
which exhibited a correlation with the criterion of .70.
23
The formula for chemistry texts is:
X = . 003 + . 1706x1
13.+ 13 7231x2
- 43.7262x3 - 2. 3577x4
Predictions from this formula correlated .67 with the criterion.
Flesch introduced three additional readability indexes (the term"formula" is probably inappropriate here) in 1954 and 1958 (Flesch, 1954,1958). All are relatively subjective measures in which counts are made,based on 100 word samples of text, and the counted values are convertedto arbitrary scales through reference to conversion tables. All werevalidated by inspection only.
The "r" score (Flesch, 1954) is an index of realism, based onthe number of references to specific human beings, their attributes orpossessions, locations, objects numbered or named, dates, times,and colors.
The "e" score (Flesch, 1954) is an index of energy, based onindications of voice communication, such as inflection.
The "formality-popularity" scale (Flesch, 1958) is based onthe total numbers of: capitalized, underlined or italicized words, num-bers (not spelled out), punctuation marks, symbols (#, $, etc.), be-ginnings of paragraphs, and endings of paragraphs.
Forbes developed a readability measure intended for applicationto the instructions and items contained in all types of standardized testsand opinion polls, except vocabulary tests (Forbes & Cottle, 1953). Todetermine a test's reading grade level, each word appearing difficultto the grader is looked up in the 1942 Thorndike Junior Century Dic-tionary, and its listed frequency of occurrence, from the first to the20, 000th words noted. The total of all indices above 4 is computedand divided by the total number of words sampled. This vocabularyindex is luoked up in a table, which indicates the corresponding gradelevel.
24
114rOpp
They reported that the readability of tests measured usingthis procedure correlates . 96 with the average of the readability ofthe same material calculated using five other procedures, includingthe Dale-Chall and Flesch methods. According to Forbes and Cottle,at the time of this work, no readability measure directly applicable totest forms was available. However, in view of the reported high cor-relation (. 96) between this technique and the others, it seems that thecontribution is minimal. Additionally, if the other measurement pro-cedures are not appropriate to this application, what is the value of theForbes and Cottle contribution? No other method of assessing the read-ability of the test materials is reported.
Six new readability measures of the traditional form have ap-peared in very recent years. The first of these was that of Coleman,developed in 1965 and discussed by Szalay (1965). Coleman developeda family of four formulas, using one through four measured variables,respectively, to predict readability level. The readability criterionin this case was the mean "cloze" score on the passage achieved bya sample of college students. (The "cloze" score is the percentageof deleted words of a passage that are correctly guessed and writtenin by a subject. This approach to readability is discussed in subse-quent paragraphs. )
Correlation's among the four formulas and criterion scoresvaried between . 85 and . 91 when tested independently by Szalay andColeman (Szalay, 1965). It is therefore recommended by Szalay thatonly the simplest formula be employed:
Predicted cloze score = 1. 29 (percentage of one syllable words)- 38. 45.
Factors present in the other three formulas but not making significantadditions to predictive ability are: (1) sentence length, (2) frequencyof occurrence of pronouns, and (3) frequency of occurrence of prepo-sitions.
25
BESTCOPT
AVAILABLE
A very similar formula was developed by HuMRRO personnelfor measuring the readability of Army technical literature (Caylor,Sticht, Fox, & Ford, 1972). Their measure is called the FORCASTformula after its developers, FORd, CAylor, and STicht. They con-sidered existing formulas inappropriate for their purpose becauseschool students and school or general texts had been most often em-ployed in developing readability formulas. This type of standardiza-tion was believed to make the applicability of prior formulas to tech-nical publications for adults suspect. Moreover, application of manyof the existing formulas required special grammatical or linguistic
,competence on the part of the person attempting to apply the formulas.Ford, Caylor, and Sticht elected to use doze score as the criterion ofreadability. They believed the doze test to be more objective thanmultiple choice tests or the other more traditional indices of corn-prehensioi They also pointed out that doze has "consistently yieldedvery high correlations with multiple choice tests and other more sub-jectively constructed measures of comprehension and difficulty"(Caylor, et al., 1972, p. 12).
Additionally, as part of their own work, they found a corre-lation of approximately .80 between doze score on 150-word passageschosen from the readings required in a wide range of Army jobs andachieved reading grade level, as measured by the United States ArmedForces Institute Reading Achievement Test III, Form A, AbbreviatedEdition.
Previous research had indicated that if a doze score of 35 percent is achieved by a given subject on a particular test passage, thenit may be reasonably expected that the subject will correctly answerapproximately 70 per cent of a set of multiple-choice questions basedon that passage. Hence, doze score appears to be a good indicator ofboth comprehension and reading achievement level.
A doze score of 35 per cent was arbitrarily, chosen as the cri-terion of potentially adequate comprehension of a text passage. Thereading grade level of a passage was defined as the lowest readinggrade level (as determined by USAFI Reading Achievement Test score)at which 50 per cent of the tested subjects achieved a doze score of35 per cent or higher on that passage.
26
Oar
A literature search provided Ford, Caylor, and Sticht witha list of 15 structural properties of text that had been applied inprevious readability formulas and required no special competence orequipment to measure. Correlations between cloze score and each ofthe structural properties were computed and several regression equa-tions were derived. Their preferred formula employed only a singlefactor, number of one-syllable words per passage. This factor isvery easily measured, and basing the equation on additional factorsallowed no practical increase in predictive power.
The FORCAST formula predicts reading grade level as equal to:
20 - (number of one-syllable words/10)
The correlation between predicted reading grade level of a passage withtested reading grade level associated with 35 per cent cloze score was.87. A subsequent application using new test passages and new sub-jects produced a correlation of .77.
The two last discussed formulas are extremely simple in form.However, the inclusion of additional factors in the formulas allowed in-consequential gains in predictive power, while greatly increasing theeffort required in applying the formulas. In contrast, the developersof the remaining four readability measures of recent years set outwith the stated purpose of attempting to provide a method of measur-ing readability with minimal effort.
Smith and Senter (1966) developed a readability equation whosedata may be collected from mechanical counters easily installed on anIBM Selectric typewriter. This technique allows measurement of read-ability at essentially rough draft typing speed. Mechanical countersare used to record the numbers of kay strokes, blank spaces, andsentences (an equal sign must be typed at the end of each sentence; thenumber of activations of this key indicating the number of sentencestyped). From these counts, the mean number of words per sentence[number of spaces divided by number of sentences (w/s)] and the meanlength of words (number of strokes aivided by number of spaces (s/w)]may be computed. Based on examination of graded school texts, theregression equation predicting grace level (GL) from the above ratiosis:
GL =0.50 (w/s) + 4.71 (s/w) 21.43.
27
BEST COPY AVAILABLE
This may be simplified to yield the arbitrarily scaled AutomatedReadability Index (ARI) equal to (w/s)+ 9 (s/w). The authors sup-port use of the ARI instead of predicted grade level because consider-able variability exists in characteristics of texts written for schoolstudents beyond the junion high school level. Accordingly, a precisestatement of grade equivalent appears inappropriate. It is also pointedout that readability, as measured by a formula such as this, increasesmore slowly with grade level at high levels than at low ones.
Dismissal of prediction of grade level removes the need tocomplicate the formula by attempting to deal with this nonlinearity.
As advantages of this procedure of estimating readability,Smith and Kincaid (1970) pointed out: (1) the speed of data collectionthat is possible, (2) the concrete nature of the acquired data, makingits collection extremely objective and reliable, and (3) the ease withwhich it could be incorporated into modern computerized typesettingmachinery.
Coke and Rothkopf (1970) have adapted the Flesch readabilityformula for automatic computation by computer. The determinationof words per sentence is straightforward in their algorithm, but wordlength in syllables is indexed by number of vowels per word. Usingthese two variables, their program produced Flesch reading easescores that exhibited a correlation of . 92 with scores calculated usingthe normal procedure. They discount the practical utility of theirprogram, but indicate that it is useful for testing the adequacy of textsampling procedures. They present a graph showing the probabilityof miscalculating reading ease score by five points or more as a func-tion of sample size.
F'ry (1968) presented his readability measure as a simple graphon which grade level may be looked up, given average sentence lengthin words and syllables. With this presentation, he hoped to reducegreatly the amount of time required to compute an index of readabil-ity, thereby increasing the popularity of such a measure. His gradelevel designations were based on inspection of textbooks used at vari-ous grade levels. The graphic presentation employed permits thereadab4ity measure to reflect accurately nonlinearities in the gradelevel function without resorting to higher order equations or arbitraryscaling.
28
McLaughlin, a psycholinguist, developed a readability formu-la which is based on the 1961 revision of the McCall-Crabbs Test
AMIONM11.
Lessons. His formula is based on a multiplicative rather than addi-tive model of factors (McLaughlin, 1969). McLaughlin feels that wordlength, a measure of semantic difficulty, and sentence length, a meas-ure of syntactic difficulty, interact with reading difficulty in a mannerthat cannot be accounted for by additive formulas. Interestingly enough,McLaughlin found that his readability formula, based on a multiplica-tive model, could be computed more easily than any previously existingmeasure. He first points out that the step of multiplication of wordand sentence lengths may be avoided, since a count of syllables in aset number of sentences is an equivalent procedure. Ile next pointsout that even counting all the syllables in N sentences is unnecessary.He found that the number of syllables in 100 words is equal to threetimes the number of words of over two syllables, plus 112. He thenderived his regression equation, and found that by adjusting testedsample size the formula for predicting grade level necessary to an-swer 100 per cent of the McCall-Crabbs questions correctly could begreatly simplified. A very close approximation to his formula is pre-sented:
SMOG Grade = " + square root of polysyllable count:in 30 sentences
Predictions from this formula correlated at approximately . 70 withthe criterion. The term SMOG grade, or SMOG count, is in tributeto the FOG count of Gunning, the first application of the count of num-ber of polysyllabic words to readability determination and to the charac-teristic atmospheric condition of his home city--London.
*number of words of three or more syllables.
29
Cloze Procedures
8E3rCOPP
41/4148lE
The doze procedure, which has been briefly mentioned pre-viously, was introduced by Taylor (1953) as a measure of readabiLtythat is free from many of the disadvantages of traditional readabilitymeasures. In the doze method; samples of text are presented withsome words deleted and replaced by blank spaces. The subject's taskis to fill in the blank spaces with .he correct words. The name "doze"was applied to this procedure by Taylor because of its resemblance tothe principle of closure of the Gestalt school of psychology: the "humantendency to complete a familiar but not-quite finished pattern--to 'see'a broken circle as a whole one... by mentally closing the gaps. "
In his initial report, Taylor presented data showing that thedoze procedure consistently ranked tested "standard" reading passagesin the same order as the readability formulas of Flesch and of Dale-Chall. He also indicated that the rank ordering of doze scores is main-tained regardless of system of word deletion employed, be it every nthword or random, with low (10 per cent) or high (20 per cent) rates ofword deletion. A random or, equivalently, every nth deletion systemis strongly defended. If enough words are deleted, all kinds of wordsare deleted in the proportion in which they actually occur in the text.
Analysis of the effects of scoring for only precise matcheswith the deleted word as compared with applying the more tedious pro-cedure of accepting synonyms as correct responses raises all scoresequally. Accordingly, there is no effect on discriminability and themore difficult procedure is not warranted.
Taylor also demonstrated that the doze test can handle unusualmaterials more effectively than readability formulas. Gertrude Stein,for example, writes in quite short sentences with a fairly simple vo-cabulary. However, her style is such that her material is very dif-ficult to read. This is accurately reflected by the doze test, but theFlesch and Dale-Chall formulas rate sample passages taken from herwritings as very easy reading.
30
SESTcOPY
411411481E
Taylor (1957) found correlations of . 70 to .80 between dozescores and comprehension scores of Air Force trai.ttees readingtypical Air Force technical material. Bormuth (1968) found cor-relations of . 90 to . 96 between doze scores and scores on testsof comprehension of passages from the Gray Oral Reading Tests.Bormuth's questions were of the transformational type, measuringretention of "facts" from the passages only. In constructing ques-tions of this type, one word or clause of a statement in the passageis deleted and replaced by a question marker. The answer to thequestion, then, is the deleted element. For example, "The boy rodethe horse, " becomes "Who rode the horse?" Bormuth indicated thatlimiting the test to questions of this type circumvented the problemof poor matching of the readability levels of the passages and the tests,since the questions are determined by the sentences of the passage.
Rankin and Culhane (1969) similarly correlated doze scores andcomprehension scores, but without limiting the types of questions in-cluded in the costprehension tests. Their test questions included itemsrelating to voc4ulary, fact, sequence, causal relationship, main idea,inference from facts, and author's purpose. They obtained a correla-tion of . 68 between doze scores based on excerpts from encyclopediaarticles and comprehension test score.
Bormuth (1967) determined the doze scores corresponding to:(1) 75 per cent comprehension test score, the comprehension levelgenerally considered necessary to allow effective classroom study ofa text with assistance of a teacher available, and (2) 90 per cent com-prehension test score, the level which is considered an indication thatthe text is sufficiently comprehensible to allow effective independentstudy. A 30 per cent doze score was found to be associated with 75per cent comprehension test score, and 50 per cent doze score wasassociated with 90 per cent comprehension score. Replication of thestudy in the succeeding year (Bormuth, 1968) showed a doze scoreof 44 per cent to be associated with the "classroom level" and 57 percent to be associated with the "independent level. " Similar proce-dures by Rankin and Culhane (1969) using their more difficult type ofcomprehension test,- as described previously, found doze scores of 41and 61 per cent at the comprehension score points of interest. The cor-respondence between these results and those of Bormuth is quite re-markable in view of the fact that Bormuth considered his 1967 results
31
oittcot411411480
relating doze score to 90 per cent comprehension score relativelyinvalid, because of ceiling effects present in the comprehensiontest used in that study.
Based on these results, Rankin and Culhane concluded thatit would be appropriate for teachers to consider books on which pu-pils cannot attain a doze score of approximately 40 per cent to betoo difficult for those students.
'ine doze procedure has numerous advantages and disadvan-tages. Among the advantages pointed out by Taylor (1953) and Kiare,Sinaiko, and Stolurow (1970) are: (1) scoring reliability is very high,(2) it works well with "non-standard" material, (3) it accounts for in-terest and prior knowledge of reader populations, (4) subjects of allabilities seem to enjoy it, and (5) test materials are easy to con-struct.
Among its disadvantages are: (1) doze is a measure, not a ,
predictor of readability, requiring testing. of sizable samples of people,(2) Klare et al. (1970) hypothesize that it may depend more on know-ledge of language than subject matter, (3) it may not accurately re-flect all types of comprehension, and (4) it may depend excessivelyon "short-range constraints"--the four or five words appearing oneach side of the deleted word.
32
litSt004441011E
A. lication of Readabilit Formulas as "Rules .for Writin
The urge to apply mechanically the traditional readabilityformulas for purposes of improving readability is very strong, butmay not be appropriate, Smith and Kincaid (1970) point out thatreadability scores are gross measures of difficulty at best and mustnot be taken as indices of good or bad writing. A deliberate attemptto shorten sentence and word length does not necessarily enhancereadability. In fact, readability may be degraded. A more advantage-ous approach is to make the writing more logical and precise, Whencombined with these principles, consideration of the structural fac-tors contained in readability formulas may, however, contribute toreadability.
The recommendations of Flesch (1951) are typical of thosemade in hope of improving readability, He suggests that: (1) a per-sonal type of discourse be adopted, (2) the importance of points pre-sented be discussed, (3) introductions and summary statement be in-cluded, (4) punctuation be used in such a manner as to nelp the read-er, (4) points be discussed in chronological order or in order of in-creasing importance, and (6) excessive wordiness be avoided. He doesrecommend, additionally, that short paragraphs, sentences, and wordsbe used.
The advantage to be gained by careful consideration of the over-all structure of a passage of text was demonstrated by Lee (1965). Sig-nificantly better learning was found from articles written in a highlystructured manner compared to "normal" articles or those in whichparagraph order has been randomized. The highly structured articlesdiffered from the normal in that they included: (1) an introductory para-graph outlining the points to follow, (2) a final summarizing paragraph,(3) a number of major and minor headings, and (4) transitional para-graphs elaborating on completed and subsequent topics and emphasizingthe organization of the paper.
33
gatCOPY
411414814
Multimodal Presentation
The field of information transfer has also turned to the in-vestigation of mult :modal information presentation to enhance andfacilitate the information transfer process. Research in this areahas been mainly devoted to answering the question: Can method Xtransfer information as well as some other method? It has beenargued (Phillips, 1966) that this is not a worthwhile issue becausethe question assumes that the method against which comparisons arebeing made has been the most effective in transmitting the informa-tion. What the research may actually demonstrate is that method Xmay transfer information just as poorly as method Y. Phillips (1966)suggests that a more relevant question is:
Which resource or combination of resources (people, places, media) is ap-propriate for teaching what type of subject matter to what type of learnerunder what conditions (time, place, size of group, and so on) to achievewhat purpose? (p. 374).
Research on improving instructional materials has often oc-curred mainly without any reference to any precise theoretical no-tions (Briggs, 1966; Lumsdaine, 1964). Theories of learning havenot been taken into account. Even though a great deal of researchhas been performed to improve the effectiveness with which materi-als are presented in certain media, when one asks which media willbe more effective in presenting a certain type of material to a specialclass of learner, one comes to a standstill.
An experiment by Bouriss,eau, Davis, and Yamamato (1965)demonstrated how assumptions in the audiovisual field often fail topossess merit. It is generally assumed that in tez*ms of direct sen-sory appeal, pictures are superior to printed or spoken words. "Apicture is worth a thousand words" is generally assumed. Contraryto this belief, Bourisseau, Davis, and Yamamato (1965) demonstratedthat pictorial stimuli are inferior to verbal (printed) stimuli in regardto both the number of subjects making sensory responses and the totalnumber of sensory responses evoked.
34
tercotesouszt
There have been a number of studies which suggest little, orno payoff from multimodal presentation.
Virag (1971) attempted to assess the effectiveness of threemodes in transmitting content to students with different aptitudes.The three instructional modes were: (1) low verbal (tape-slides andshort film episodes--materials presented at a fixed pace via the audio-visual communicative channels), (2) high verbal (written case studies--presented at a self pace through the channel of print), and (3) con-ventional (short lectures--presented at a fixed pace through the audio-communicative channel). The results indicated that no single mode ofinstruction was consistently more effective than the other two for anyparticular aptitude pattern.
In an attempt to prepare an audio-tutorial minicourse, Long(1970) made use of five methods. The procedures presented the ma-terials to be learned through: (1) printed text, (2) printed text withsupplementary programmed items, (3) printed text with supplementarylaboratory manipulations and observations, (4) taped audio programwith supplementary programmed items, and (5) taped tutorial programwith laboratory manipulations and observations. No significant differ-ences were found to exist between the five instructional methods em-ployed.
Travers (1965) reported a study performed by Van Mondfransin which verbal materials to be learned were presented auditorially,visually, and simultaneous audio-visual presentations. The resultsindicated that there was no difference between single sense channelpresentations and no difference between single and multiple sensechannel presentations. This research supports the earlier conclu-sion by Van Mondfrans and Travers (1964) that the use of two sensorymodalities has no advantage over one in the learning of material whichis redundant across modalities.
In a study by Goodrich (1971), four types of literary forms orsubjects were presented via four instructioal media: (1) fiction, (2)nonfiction (autobiography), (3) a lecture about literary symbolism,and (4) a lecture about composition. The four media were: (1) TV,(2) audio tape, (3) face-to-face lecture, and (4) text. There wereno significant differences between the different media. Goodrich con-cluded that the effects of the medium may not be noticeable undernormal learning situations.
35
BarCOPY
AVAILABLE
Sticht (1969) presented materials of differing levels of diffi-culty, to Ss of different aptitudes through the visual and auditory sen-sory channels. According to Sticht, the results indicated that listen-ing was as effective as reading in transmitting information of all threedifficulty levels for both average and low aptitude subjects. Siegel,Barciki and Macpherson (1965), at Applied Psychological Services,presented materials to four groups of college students via the auditoryand visual channels. Two groups, one auditory and one visual, alsoreceived adjunct programmed materials. The adjunct materials con-sisted of multiple choice questions with correct answers. The use ofthe adjunct materials provided feedback to the learner on what infor-mation was missed. The results indicated that both audio and visualpresentation with adjunct programmed materials were significantlybetter than without adjunct materials. there were no significant dif-ferences between the different sensory channel presentations.
On the other hand, a number of studies have suggested somegain to accrue from multichannel presentation.
Singer (1970) investigated comprehension as effected by vary-ing visual and auditory presentation. The information to be learnedwas presented: (1) textually, (2) verbally, and (3) verbally paired withreading. Comprehension was poorer when the materials were pre-sented only auditorially, but no difference existed between reading andreading with listening.
Severin (1967) presented materials via the auditory, the visual,and the audio-visual modes of presentation. The results indicated thatprint and print with audio were superior to audio for transferring in-formation. There was no difference between print and print with audio.These findings were substantiated in a later study (Singer, 1970).
The results of a recent study (Senour, 1971) concerning the ef-fects of student control of audio tape learning indicated that providingcontrol functions to the student; i, e. , the capability for starting, stop-ping, and replaying the tape, aided learner achievement as comparedto the situation which denied them. There was a significantly positivecorrelation between learner achievement and the number of times thelearner elected to use the controls. The subjects reported that theydid not have to take notes because they could replay the tape until theyknew it.
36
Nelson (1970) reported a study on the effects of visual-audi-tory modality preference on learning mode preference. The resultsindicated that auditory and visual perceptual subtests of readingreadiness tests are not sufficiently sensitive to disccrnrriodalitypreferences. Another report (Hueber, 1970) supported the resultsobtained by Nelson (1970).
If sensory modality preferences do affect information transfer,a means of determining these preferences must be developed. If audiotape presentations of information are going to be used, can subjectsbe taught to listen more attentively? A large body of research indi-cated that training to listen is possible (Brown, 1954; Erikson, 1954;Irwin, 1953; Nichols, 1949; Lewis, 1956). Other researchers (Erikson,1954; Irvin, 1954) suggest that low listening ability subjects benefit fromsuch training more than subjects of high listening ability.
CHAPTER 1II
DISCUSSION
The most important characteristics of the readability formu-las described here, as well as of those formulas appearing before1953, and considered of at least moderate importance are presentedin tabular fc m in Appendix A to this review.
In terms of application of these formulas for predicting read-ability, in the late fifties, there appeared to be rather general agree-ment (Chall, 1956; Klare, 1963; Powers et al. , 1958), that the mostprecise formula available, and therefore the one to be most generallyrecommended, was the Dale-Chall method. If reference to a wordlist was to be avoided, the Flesch formula was recommended, unlessother factors such as special reader populations, types of reading ma-terial, or particular advantages of one or another formula warrantedanother choice. It does not seem that this general set of recommenda-tions should be changed at this time.
Estimation of Reading Level
The majority of the readability formulas predict readabilityin terms of reading grade level. In order to apply them most effec-tively, knowledge must be obtained concerning the distribution of read-ing grade levels of the expected reader population. The variability ofreading grade level within school grades and within groups of adultshaving achieved particular levels of education found by Gray and Leary(1935) suggests that pure estimation of reading grade level is some-what unreliable. Reading grade level may be determined by admin-istering standard tests of reading ability, of which 37 are reported asbeing available in The Sixth Mental Measurements Yea rt--,ok (Buros,1a65), to a sample of the expected reading population.
39
OarCOPPANIGIOlt
However, in certain areas of application, most notably themilitary, a much more efficient, less time consuming, and lessexpensive procedure may be employed. It has been found in theAir Force (Madden & Tupes, 1966) and in the Army (Caylor et al.,1972) that reading grade level may be estimated from certain apti-tude test scores. Madden and Tupes noted that the general aptitudeindex (AI) of the Airman Qualifying Exam (AQE) correlated above .70with reading level. This is largely due to the inclusion of a readingvocabulary subtest score within the general AL Although readinggrade level as measured by the California Test of Reading Vocabularyand Reading Comprehension was estimable from the general AI alone,
more accurate prediction of an individual's reading grade level couldbe made by using a regression equation based on general AI and theindividual's selector AI score. The latter is one of three aptitude
area scores--administrative, mechanical, or electronicwhich arereferred to when assigning men to career fields in the Air Force. Forsome career fields, the selector variable is the administrative AI;for others, it is the mechanical AI, while for others, the electronic AIis employed. Selection to a few career fields is based solely on gen-eral Al. The regression equations appropriate for estimating readinggrade level of individuals in career fields which use the selector AIs
for personnel selection are:
administrative: RGL = . 0437 (GenAI) + . 050 l(MAI) + 5.0730
mechanical: RGL = . 099 l(GenAI) + 0085(MechAI) + 5.0459
electronic: RGL = .0743(GenAI) + . 0222(E1AI) + 4.6088
Caylor et al. (1972) similarly produced a regression equationto predict reading grade level as measured by the U. S. Armed ForcesInstitute (USAFI) Reading Achievement Test III, Form A, from know-ledge of an individual's Armed Forces Qualifying Test (AFQT) score.Their equation is:
RGL = . 75(AFQT score) + 5.52.
40
These regression formulas allow much more confidence tobe placed in statements concerning the appropriate readability lev-els of materials intended for use by Army or.Air Force enlistedpersonnel.
Discussion of Formulas
A number of general criticisms have been applied to readabil-ity formulas. The definition of the criterion of comprehensibility hasbeen a problem since the earliest studies (Bormuth, 1966). The usu-al practice has been to administer multiple choice criterion questionsjust after the passages being tested are read. Lorge (1939) criticizedthis procedure because test performance may be strongly influencedby the difficulty of the language of the test questions.
The difficulty of the questions may also be varied, so that thesubject may be able to score highly by simply remembering details ofa passage, or he may be required to determine the author's purposein writing the passage, and select the best of several highly similar al-ternatives.
In addition, Fry (1968) pointed out that reading grade levelsare not rigorously defined, so that different reading tests, especiallythose developed at different times, may provide different grade levelsfor identical subjects. New reading tests are more difficult than oldones, indicating that students at given grade levels read better thantheir predecessors. He summarizes the problem as one of "tryingto determine grade level when grade level won't stand still. " Itseems strange that the various predictions have not been correctedfor criterion unreliability. Also, there is no agreement on the levelof comprehension to be accepted. Fifty and 75 per cent were commonwhen McCall-Crabbs tests were employed, but the FORCAST formulauses 70 per cent (35 per cent cloze score) and the SMOG count wasbased on 100 per cent comprehension.
Most of the readabili-,, formulas do not account for the effectson readability of the reader's interests, experiences, tr aptitudes.Exceptions to this are the supplementary indices of human interest,abstraction, realism, energy, and formality-popularity of Flesch,
41
Cepr
along with the FORCAST formula, and Jacobson's measures of thereadability of physics and chemistry texts. However, some of thesemeasures will allow accurate reflection of the readability of materi-al appearing in professional or technical journals.
Finally, as Bormuth (1966) pointed out, until very recent yearsno theoretical base was available from which to generate testable hy-potheses relating to readability. Powerful theories of language behavi-or did not exist, so that only the most obvious statistical characteris-tics of the written text were studied.
Although he did not consider it appropriate to present his for-mulas in his 1966 paper, Bormuth reported that he has written regres-sion equations whose predictions correlate up to . 93 with doze testscore. He considers doze scores to be the only acceptable criterionof readability currently available. He reported that all c. the variablesin his formulas are new ones generated from modern linguistic theoriessuch as those of Chomsky and Yngve. None of the traditional readabil-ity variables were powerful enough to be included in his formulas.
Inter-user and test-retest reliabilities of formulas vary widely.Kincaid (1972) found that the test-retest reliability of the McElroy fogcount was only . 5. He also pointed out that he gets a headache aftertaking a fog count for 30 minutes. The highly objective ARI, however,has a measured test-retest reliability of over . 99 (Huff, 1970). England,Thomas, and Patterson (1948) found the interrater and test-retest reli-abilities of the Flesch reading ease formula were approximately . 90.
The reliabilities of other readability formulas had not been sat-isfactorily tested prior to 1958, according to Mare (1963), and workof this type has not appeared since that time.
42
04,49..
All of the readability measuring procedures discussed showroughly similar correlations with their respectiN e criteria of dif-ficulty. Additionally, all of the criteria seem to be highly intercor-related. Hence, from the validity standpoint, there is little basisfor selecting one readability measure over another. In this case,utility and practicality for the individual user seem to represent thecriteria to be employed when selecting a readability measure. Ashas been previously reported, the Dale-Chall and Flesch formulas,as revised by Powers, Sumner, and Kearl (1958), are the most highlyrespected of the traditional formulas, but considerations of particularcharacteristics of an individual situation may warrant choice of oneof the many other available formulas.
There ib, however, some indication that the mo13t powerfulmethod of measuring readability currently available is the cloze meth-od. However, cloze is less easy to apply than the otht..r techniques.Application of the cloze method requires preparation of numerous testforms, assembly of a group of subjects who are representative of theappropriate reader population, and considerable administration andscoring time. Also, cloze tests cannot be used to monitor the'read-ability of material as it is being written.
It has been knowri, since at least the time of the Dale-Tylerstudy in 1934, that using readability formulas as a basis for "rulesfor writing" is not an effective way to produce readable material.The recommendations made to writers interested in improving theinformation transfer capability of their output have always been ofa "qualitative" nature, as opposed to the "quantitative" nature of thereadability formulas. Recommendations to writers concern such con-siderations as stylistic variables, level of abstractness, coherency,obscurity of expression, difficulty of ideas expressed, ideationaldensity, and soon. These factors are generally ignored by readabil-ity formulas, since they are extremely difficult to measure or evendefine. This leaves us in the paradoxical situation of attempting tomeasure readability by measuring factors which do not determinereading ease. Readability formulas are based on relatively unim-portant characteristics of text which must be considered almost en-tirely artifactual.
43
Future Research Avenues
The search for more relevant variables and methods ofmeasuring them impresses us as the most pressing need for futureresearch in the area of readability measurement. It is likely thatmodern psycholinguistic theory could be a very stimulating areafrom which to draw important readability variables. Applicationsof information theory to the study of readability have not proven fruit-ful thus far. Such applications have been limited to attempts to meas-ure the redundancy of textual material as reflected by the variabilityof responses made in doze tests.
In order to me asure information transfer using the conceptsof entropy and bits, it is necessary to be able to specify accuratelythe stimulus alphabet. In this case, the stimulus variables are lettersor words and the probabilities and conditional probabilities of occur-rence of each symbol, from the point of view of the receiver, or read-er. Inability to specify adequately these probabilities makes the in-formation theoretic approach seem relatively inappropriate at thistime.
Research is also needed which focuses on development of amanageable criterion of readability. Cloze scores do not representa panacea, but the other criteria available have disadvantages. Com-prehension test scores are easily confounded with variable aspects ofthe test questions, and reading grade level is another step removedfrom comprehension, the topic of interest.
Second, the effects of the additional variables of general intel-ligence, interest, and prior experience on readability are not wellunderstood. In 1935, Gray and Leary observed that quantitative fac-tors related to readability were not the same for good and poor readers.But work on this variable has not continued. Few, if any, investigationshave sought to determine the effects of reader interest and experienceon readability, although a few formulas have attempted to account forthese factors, notably the Flesch reading ease formula and the FOR-CAST formula.
44
Third, the majority of readability formulas have been vali-dated on school students of various levels. Students comprise thepopulation of interest in only a small portion of the possible appli-cations of readability formulas. Hence, more cross-validationalstudies are needed which are based on samples of adult readers ofall reading ability levels.
Fourth, no measure of readability is available for evaluationof tests or programmed instructional material. These types of mate-rial are being used more and more in our society, and a method ofmeasuring their readability is greatly needed.
In all future research, it will be appropriate to evaluate eachreadability formula developed in mathematical forms other than theadditive linear model traditionally employed. Many authors havefound nonlinearities in the relationships between their chosen vari-ables and criteria. They have dealt with this problem by presentingtheir data graphically, or presenting a table of corrected values forthe predictions from their formulas. It is likely that nonlinear re-gression techniques account for available data in a significantly moreadequate manner than do formulas of the current, linear variety.
45 Lk.,.)k
REFERENCES
Bloomer, R.H. Level of abstraction as a function of modifier load.Journal of Educational Research, 1959, 52, 269-272.
Bormuth, J. R. Readability, a new approach. Reading ResearchQuarterly, 1966, 1, 79-132.
Bormuth, J. R. Comparable cloze and multiple-choice comprehen-sion test scores. Journal of Reading, 1967, 10, 291-299.
Bormuth, J. R. Cloze test readability; criterion referenced scores.Journal of Educational Measurement, 1968, 5, 189-196.
Bourisseau, W., Davis, 0. , & Yamamato, K. Sense-impressionresponses to differing pictorial and verbal stimuli. AV Com-munication Review, 1965, 13, 249-258.
Briggs, L. A procedure for the design of multimedia instruction. InProceedings of the 74th Annual Convention of the AmericanPsychological Association. Washington, D. C.: AmericanPsychological Association, 1966. P. 257-258.
Brown, J. How teachable is listening? Education Research Bulle-tin, 1954, 33, 85-93.
Buros, O.K. (Ed. ). The Sixth Mental Measurement Yearbook, High-land Park, N. J.: Gryphon Press, 1965.
Caylor, J. S., Sticht, T. G., Fox, L. C., & Ford, J. P. Methodolog-ies for determining reading requirements of military occupa-tional specialities. Presidio of Monterey, Calif.: HumanResources Research Organization, HumRRO Division No. 3,Draft Technical Report, January 1972.
Chall, J. S. Readability: An appraisal of research and application.Columbus, Ohio: Bureau of Educational Research, OhioState University, 1958.
Bar copy
Coke, E. U., & Rothkopf, E. Z. Note on a simple algorithm for acomputer-produced reading ease score. Journal of AppliedPsychology, 1970, 54, 208-210.
Dale, E., Chall, J. S. A formula for predicting readability (and)Instructions. Educational Research Bulletin, 1948, 27, 11-20,28, 37-54.
Dale, E., & Chall, J. S. The concept of readability. ElementaryEnglish, 1949, 26, 19-26.
Dale, E., & Tyler, R. W. A study of the factors influencing the diffi-culty of reading materials for adults of limited. reading ability.Library Quarterly, 1934, 4, 384-412.
De Long, V. R. Primary promotion by reading levels. ElementarySchool Journal, 1939, 38, 663-671.
Dolch, E. W. Fact burden and reading difficulty. Elementary EnglishReview, 1939, 16, 135-138.
Dolch, E. W. The use of vocabulary lists in predicting readability andin developing reading materials. Elementary English, 1949, 26,142-149.
England, G. W., Thomas, M. , Paterson, D. G. Reliability of the ori-ginal and the simplified Flesch reading ease formulas. Journalof Applied Psychology, 1953, 37, 111-113.
Erikson, G. Can listening efficiency be improved? Journal of Commu-nication, 1954, 4, 128-133.
Farr, J. N. , Jenkins, J. J. , Paterson, D. G. Simplification of Fleschreading ease formula. Journal of Applied Psychology, 1951, 35,333-337.
Flesch, R. Estimating the comprehension difficulty of magazine articles.Journal of General Psychology, 1943, 28, 63-80. (a)
Flesch, R. Marks of readable style. New York: Bureau of Publications,Teachers College, Columbia University, 1943. (b)
Flesch, R. The art of plain talk. New York: Harper & Bros., 1946.
48
147 Cat
Flesch, R. A new readability yardstick. Journal of Applied Psychol-o 1948, 32, 221-233.
Flesch, R. The art of readable writing. NewYork: Harper &. Bros., 1949.
Flesch. R. How to test readability. New York: Harper & Bros., 1951.
Flesch, R. F. Reply to "Simplification of Flesch reading ease formula.Journal of Applied Psychology., 1952, 36, 54-55.
Flesch, R. F. How to make sense. New York: Harper & Bros., 1954.
Forbes, F. W. , & Cottle, W. C. A new method for determining reada-bility of standardized tests. Journal of Applied Psychology, 1953,37, 185-190.
Fry, E. A readability formula that saves time. Journal of Reading, 1968,11, 513-516, 575-578.
Lillie, P. J. A simplified formula for measuring abstraction in writing.Journal of Applied Psychology, 1957, 41, 214-217.
Goodrich, H. An investigation of tile differential effects of four differentmedia on information acquisition and preception. DissertationAbstracts International, 1971, 31, 3775.
Gray, W. S. , & Leary, B. E. What makes a book readable. Chicago:University of Chicago Press, 1935.
II1
Gunning, R. The technique of clear writing. New York: McGraw-Hill,1952.
Huebner, R. Interaction between patterns of individual differences insensory modalities of students and methods of classroom instruc-tion. Dissertation Abstract International, 1970, 31, 679.
Huff, K. H. & Smith, E. A. Reliability, baseline data, and instructionsfor the automated readability index.. AFHRL-TR-70-14, AD-729209. Lowry AFB., Colo. : Technical Training Division, Air ForceHuman Resources Laboratory, October 1970.
Irvin, C. An analysis of certain aspects of listening training programamong college freshman at Michigan State College. Journal ofCommunication, 1954, 4, 2.
49
1406,fria._-w1"77/Irwin, C. Evaluating and training program in listening for college '4$0.
freshmen. School Review, 1953, 61 25-29. 4#04,
Jacobson, M. D. Reading difficulty of physics and chemistry textbooks.Educational and Psychological Measurement, 1965, 25 449-457.
Johnson, G. R. An objective measure of determining reading difficulty.Journal of Educational Research, 1930, 21, 283-287.
Kessler, E. The readability of selected contemporary bodks for leisurereading in high school biology. Science Education, 1941, 25,260-264.
Kincaid, J. P. Making technical writing readable. Human Factors So-ciety Bulletin, 1972, 15(5), 3-4.
Klare, G. R. A table for rapid determination of Dale-Chall readabilityscores. Educational Research Bulletin, 1952, 31, 43-47.
Klare, G. R. The measurement of readability. Ames, Ia. : Iowa StateUniversity Press, 1963.
Klare, G. R. , Sinaiko, H. W. , & Stolurow, L. M. The cloze procedure:A convenient readability, test for training materials and transla-tions. Arlington, Va. : Institute for Defense Analysis, .Scienceand Technology Division Paper P-660, 1971.
Lee, W. Supra-paragraph prose structure: Its specification, perception,and effects on learning. Psychological Reports, 1965, 17, 135-144.
Lewerenz, A. S. Measurement of the difficulty of reading materials.Educational Research Bulletin, Los Angeles City Schools, 1929,8, 11-16. Cited by G. R. Klare, The measurement of readability.Ames, Ia.: Iowa State University Press, 1963, P. 40.
Lewerenz, A. S. Vocabulary grade placement of typical newspaper con-tent. Educational Research Bulletin, Los Angeles City Schools,1930, 10, 4-6. Cited by G. R. Klare. The measurement of read-ability. Ames, Ia. : Iowa State University Press, 1963.
Lewerenz, A. S. A vocabulary grade placement formula. Journal of Ex-perimental Education, 1935, 3, 236.
Lewerenz, A. S. Selection of reading materials by pupil ability and pupilinterest. Elementary English Review, 1939, 16, 151-156.
50
iithrot
Lewis, N. Listen please! Clearing House, 1956, 30, 535-536.Lively, B.A., & Pressey, S. L. A method for measuring the "vo-
cabulary burden" of textbooks. Educational Administrationand Supervision, 1923, 9, 389-398.
Long, G. A study on the preparation of an audio-tutorial minicourse.Dissertation Abstracts International, 1970, 31, 1690-1.691.
Lorge, I. Predicting reading difficulty of selections for children.Elementary English Review, 1939, 16, 229-237.
Lorge, I. Predicting readability. Teachers College Record, 1944,45, 404-409.
Lorge, I. The Lorge and Flesch readability formulae: A correc-tion. School and Society, 1948, 67, 141-142.
Lumsdaine, A. Educational technology, programmed learning, andinstructional science. Theories of Learning and Instruction.63rd Year Book, National Society for the Study of Education,Chicago: University of Chicago Press, 1964, 371-401.
Madden, H. L. , & Tupes, E. C. Estimating reading ability levelfrom the AQE General Aptitude Index. PRL-TR-66-1, AD-632 182. Lackland AFB, Texas: Personnel Research Lab-oratory, Aerospace Medical Division, February, 1966.
McClusky, A. Y. A quantitative analysis of the difficulty of readingmaterials. Journal of Educational Research, 1934, 28, 276-282.
McLaughlin, G. H. SMOG grading: A new readability formula.Journal of Reading, 1969, 12, 639-646.
Morris, E. C. , & Halverson, D. Idea analysis technique. Unpub-lished. On file at Columbia University Library. 1938. Citedby G. R. Klare, The measurement of readability. Ames, Ia.:Iowa State University Press, 1936.
Nelson, J. The effects of visual-auditory modality preference onlearning mode preference in first grade children. DissertationAbstracts International, 1970, 31, 2219-2220.
Nichols, R. Teaching of listening. Chicago Schools Journal, 1949,30, 273-278.
51
ser 11°P1 111Ojemann, R. The reading ability of parents and factors associated 14matwith the reading difficulty of parent education materials. InG. D. Stoddard (Ed. ) Researchers in Parent Education II.Iowa City, Ia. : University of Iowa, 1934. Cited by J. S. Chall,Readability: An appraisal of research and application. Colum-bus, Ohio: Ohio State University, Bureau of Educational Re-search, 1958.
Patty, W. W. , & Painter, W.I. A technique for measuring the vocab-ulary burden of textbooks. Journal of Educational Research,1931, 24, 127-134.
Phillips, M. Learning materials and their implementation. Reviewof Educational Research, 1966, 36, 373-379.
Powers, R. D. , Sumner, W. A. , & Kearl, B. E. A recalculation offour adult readability formulas. Journal of Educational Psy-chology, 1958, 49, 99-105.
Rankin, E. F. , & Culhane, J. W. Comparable cloze and multiple -choice comprehension test scores. Journal of Reading, 1969,13, 193-198.
Senour, R. A study of the effects of student control of audio tapedlearning experiences (via the control functions incorporatedin the instructional device). Dissertation Abstracts Interna-tional, 1971, 31, 3351-3352.
Severin, W. The effectiveness of relevant pictures in multiple-chan-nel communications. AV Communications Review, 1967, 15,386-401.
Siegel, A. I. , Barcik, J. D. , & Macpherson, D. H. Mass trainingtechniques in civil defense. I. Introductory studies of tele-phonic adjunct training. Wayne, Pa.: Applied PsychologicalService 3, February 1965.
Siegel, A. I. , & Fischl, M. A. Mass training techniques in civil de-fense. II. A further study of telephonic adjunct training.Wayne, Pa.: Applied Psychological Services, July 1965.
Singer, C. Eye movements, reading comprehension, and vocabu-lary as effected by varying visual and auditory modalitiesin college students. Dissertation Abstracts International,1970, 31, 164.
Smith, E. A. , & Kincaid, J. P. Derivation and validation of theAutomated Readability Index for use with technical mate-rials. Human Factors, 1970, 12, 457-464.
52
Smith, E. A. & Senter, R. J. Automated Readability Index. Wright -Patterson Air Force Base, Ohio: Aerospace Medical Division,AMRL-TR-66-220, November 1970.
Spache, G. A new readability formula for primary grade reading ma-terials. Elementary School Journal, 1953, 53, 410-413.
Sticht, T. G. Learning by listening in relation to aptitude, reading, andrate-controlled speech. Presidio of Monterey, Calif. : HumanResources Research Organization, HumRRO Division No. 3,Technical Report 69-23, December, 1969.
Stone, C. R. Measures.of simplicity and beginning texts in reading.Journal of Educational Research, 1938, 31, 447-450.
Szalay, T.G. Validation of the Coleman readability formulas. Psy-chological Reports, 1965, 17, 965-966.
Tannenbaum, P. H. , & Greenberg, B. S. Mass communication. In P. R.Farnsworth (Ed. ), Annual Review of Psychology, 1968, 19, 351-387.
Taylor, W. Cloze procedure: a new tool for measuring readability.Journalism Quarterly, 1953, 30, 415-433.
Taylor, W. L. Recent developments in the use of "dime procedure. "Journalism Quarterly, 1956, 33, 42-46, 99.
Taylor, W. L. "Cloze" readability scores as indices of individual differ-ences in comprehension and aptitude. Journal of Applied Psychol-ogy, 1957, 41, 19-26.
Thorndike, E. L. Improving the ability to read. Teachers College Re-cord, 1934, 36, 1-19, 123-144, 229-241.
Travers, R. M. W. On the transmission of information to human receivers.Audiovisual Communication Review, 1964, 12, 373-385.
Tribe, E. B. A readability formula for the elementary school based uponthe Rinsland vocabulary. Unpublished Doctoral dissertation, Uni-versity of Oklahoma, 1956. Cited by G. R. Klare, The measure-ment of readability. Ames, Ia. : Iowa State University Press,1963, P. 69-70.
53
torzavAtakt
Valentine, L.D. Jr. , & Vito la, B. M. Comparison of self-motivatedAir Force enlistees with draft-motivated enlistees. AFHRL-TR-70-26, AD-713 638. Lack land AFB, Teitass PersonnelResearch Division, Air Force Human Resources Laboratory,July 1970.
Van Mondfrans, A., & Travers, R. Learning of redundant materialspresented through two sensory modalities. Perceptual & MotorSkills, 1964, 19, 743-751. .
Van Mondfrans, A. , & Travers, R. Paired-associate learning withinand across sense modalities and involving simultaneous and se-quential presentations. American Educational Research Journal,1965, 2, 89-99.
Vineberg, R. ,Sticht, T. G. , Taylor, E. N. , & Caylor, J. S. Effects ofaptitude (AFQT), job experience, and literacy on job performance:summar of HumRRO work units UTILITY and REALISTIC.Presidio of Monterey, Calif. : Human Resources Research Or-ganization, HumRRO Division No. 3, August 1970.
Virag, W. The effectiveness of three instructional modes employed totransmit content to students with different aptitude patterns.Dissertation Abstracts International, 1971, 31, 5520.
Vogel, M. & Washburne, C. W. An objective method of determining gradeplacement of children's reading material. Elementary SchoolJournal, 1928, 28, 373-381. Cited by C. R. Klare, The measure-ment of readability, Ames, Ia. : Iowa State University Press, 1963,P. 38-39.
Washburne, C. ,& Morphett, M. V. Grade placement of children's books.Elementary School Journal, 1938, 38, 355-364.
Washburne, C., & Vogel, M. What books fit what children? School andSociety, 1926, 23, 22-24. Cited by G. R. Klare, The measurementof readability, Ames, Ia. : Iowa State University Press, 1963.P. 38-39.
Wheeler, L. R. , & Smith, E.H. A practical readability formula for theclassroom teacher in the primary grades. Elementary English,1954, 31, 397-399..
54
OatisuPygripA
wietEWheeler, L. R. , & Wheeler, V.A. Selecting appropriate reading ma-terials. Elementary English, 1948, 25, 478-489.
Yoakam, G.A. A technique for determining the difficulty of readingmaterials. Unpublished. University of Pittsburgh, 1939.(Revised 1948.) In G. A. Yoakam, Basal Reading Instruction,New York: McGraw-Hill, 1955. Cited by J. S. Chall, Read-ability: An appraisal of research and application. Columbus,Ohio: Bureau of Educational Research, Ohio State University,1958. P. 28.
55/PI 64nk
APPENDIX A
Glossary of Readability Measures
Aut
hor(
s)D
ate
Cri
teri
onV
aria
bles
Pred
icte
dV
alue
Liv
ely
&19
23Ju
dged
dif
fi-
Wei
ghte
d m
edia
n W
ei(
Rel
ativ
ePr
esse
ycu
ltyno
. , b
ased
on
Tho
rndi
kew
ord
list
diff
icul
ty
Dol
ch19
28G
rade
leve
las
sign
ed b
ypu
blis
hers
Five
indi
ces
of v
ocab
u-la
ry "
load
Was
hbur
ne19
28M
ean
test
edX
2: N
umbe
r of
dif
fere
ntR
eadi
ng&
Vog
el(W
inne
tka)
read
ing
leve
lof
thos
e lik
ing
book
wor
ds in
100
0 w
ords
X3:
Num
ber
of p
repo
sitio
nsin
IC
C(
wor
ds
scor
e on
Stan
ford
Ach
ieve
-m
eat T
est
(X1)
X4:
Num
ber
of w
ords
not
on T
horn
dike
list
of
10 ,C
CG
wor
ds
X :
Num
ber
of s
impl
ese
nten
ces
in 7
5se
nten
ces
'Lew
eren
z19
29O
rder
of
pre-
sent
atio
n on
Perc
enta
ge o
f w
ords
be-
ginn
ing
with
W, H
, B (
easy
)G
rade
leve
l
Stan
ford
I, a
nd E
(ha
rd)
GP
Ach
ieve
men
tL
OT
est
Lew
eren
z19
301)
Rat
io o
f A
nglo
-Sax
on to
Gra
de le
vel
1935
Gre
ek a
nd R
oman
wor
ds19
39(d
iffi
culty
)
John
son
1930
Gra
de le
vel
assi
gned
by
publ
ishe
rs
Patty
&19
31N
one
(rel
ied
Pain
ter
on v
alid
ity o
fT
horn
dike
wor
dlis
t)
2) R
atio
of
wor
ds in
"C
lark
'sfi
rst 5
06"
toto
tal d
iffe
rent
wor
ds (
dive
rsity
)
3) E
stim
ate
of im
age
bear
ing
or s
enso
ry w
ords
(in
tere
st)
4) P
olys
ylla
bic
wor
d co
unt
5) V
ocab
ular
y m
ass
(num
ber
of d
iffe
rent
wee
ds in
sam
-pl
e:th
at a
re n
ot o
n C
lark
'slis
t of
500
com
mon
wor
ds
Perc
enta
ge o
f po
lysy
llabl
eG
rade
leve
lw
ords
1) T
horn
dlke
wor
d lis
t ind
ex R
elat
ive
num
bers
diff
icul
ty
2) N
umbe
r of
dif
fere
nt w
ords
in s
ampl
e
Form
ula
Typ
e of
Ran
ge o
fM
etho
d of
coa
t -M
ater
ial
Mat
eria
lpr
ison
with
Stud
ied
Stud
ied
crite
rion
Wei
ghte
d m
edia
n in
dex
num
-be
r
Bas
e ju
dgm
ent o
f di
ffic
ulty
on "
seve
ral o
f vo
cabu
lary
fact
ors
X1
=.0
85X
2 +
.101
X3
+ .6
04X
4
.411
X +
17.
43
Mea
n of
tabl
ed v
alue
s fo
r ea
chof
var
iabl
es
Mea
n of
tabl
ed v
alue
s fo
r ea
chof
var
iabl
es
Tab
led
valu
e of
per
cent
age
ofpo
lysy
llabl
es
Inde
x N
umbe
r=A
A.W
.W.V
.
Scho
ol,
Gra
de 2
-In
spec
tion
gene
ral,
colle
gesc
ient
ific
Rea
ding
Prim
er-
Insp
ectio
nte
xts
grad
e 4
Chi
ldre
n's
libra
rybo
oks
Gra
des
3-9
Cor
rela
tion
.845
(Thi
s va
lue
is c
on-
foun
ded
by e
ffec
tsof
pop
ular
ity o
fbo
oks)
Stan
ford
Gra
de 2
-G
raph
icA
chie
ve-
colle
gem
erit
Tes
tpa
ragr
aphs
Wid
ePr
imer
toIn
spec
tion
vari
ety
grad
e 8
Scho
olA
.W.W
. V. =
mea
n T
born
dike
text
sIn
dex
Num
ber
R =
num
ber
of d
iffe
rent
war
dsin
sam
ple
Gra
des
9-12
40)
Aut
hor(
s)D
ate
Cri
teri
onV
aria
bles
Pred
icte
dV
alue
Form
ula
Typ
e of
Ran
ge o
fM
ater
ial
Mat
eria
lSt
udie
dSt
udie
d
Met
hod
of c
om-
pari
son
with
crite
rion
Oie
man
n19
34X
C50
Dal
e &
1934
Tyl
erC
ompr
e-he
nsio
n sc
ore
of a
dults
of
low
rea
ding
leve
l
6 m
easu
res
of v
ocab
ular
ydi
ffic
ulty
, 8 m
easu
res
of s
truc
ture
, 4 q
ualit
ativ
efa
ctor
s
X -
Num
ber
of d
iffe
rent
tech
nica
l ter
ms
X3:
Num
ber
of d
iffe
rent
hard
non
-tec
hnic
alw
ords
X :
Num
ber
of in
dete
rmi-
4 na
te c
laus
es
McC
lusk
y19
34R
eadi
ngsp
eed,
com
-pr
ehen
sion
scor
e
Voc
abul
ary,
sen
tenc
est
ruct
ure
Tho
rndi
ke19
34V
ocab
ular
ydi
ffic
ulty
Num
ber
of w
ords
not
on
Tho
rndi
ke li
st in
10,
000
wor
d sa
mpl
e
Gra
y an
d19
35M
ean
com
-X
2: N
umbe
r of
dif
fere
ntL
eary
preh
ensi
onte
st s
core
of
adul
ts
wor
ds n
ot o
n D
ale
Lis
ta
769
Was
hbur
ne &
193
8M
orph
ett (
re-
visi
on o
f V
ogel
& W
ashb
urne
,19
28)
Mea
n te
sted
read
ing
leve
lof
thos
e lik
ing
book
. Als
ote
ache
r ju
dg-
men
t at g
rade
s1-
2.
X5:
Num
ber
of p
erso
nal
X6:
Mea
n se
nten
ce le
ngth
in w
ords
X7:
Per
cent
dif
fere
nt w
ords
X8:
Num
ber
of p
repo
sitir
mal
X2:
Num
ber
of d
iffe
rent
wor
ds p
er 1
000
wor
ds
X3:
Num
ber
of d
iffe
rent
wor
ds p
er 1
000
not i
nT
horn
dike
's m
ost
com
mon
150
0 w
ords
X4:
pron
ouns
phra
ses
Num
ber
of s
impl
ese
nten
ces
in 7
5 se
nten
-ce
s
Perc
ent o
fad
ults
of
low
RG
L(3
-5)
who
will
und
er-
stan
d pa
ssag
e(X
1)
Gra
de le
vel
Mea
n co
m-
preh
ensi
onte
st s
core
of
adul
ts (
X1)
Gra
de le
vel
(X1)
16 p
assa
ges
are
pres
ente
d in
ord
erof
dif
ficu
lty, t
o be
com
pare
dw
ith m
ater
ial t
o be
test
ed
Xl =
-9.
4X2
- . 4
.X3
+ 2
.2X
4
+ 1
14.4
± 9
.0
No
form
ula.
Des
crip
tion
ofvo
cabu
lary
and
sen
tenc
e st
ruc-
ture
cha
ract
eris
tics
'n e
asy
and
hard
mat
eria
ls.
Tab
led
valu
e of
num
ber
of w
ords
not o
n T
horn
dike
list
.
Xi =
.010
29X
2 +
.009
012X
5 -
.020
94X
6 -
033.
3X7
0148
5X8
+ 3
.774
X1
=. 0
0255
X2
+ .
Q45
8X3
0307
X4
+ 1
.294
Adu
ltG
rade
6-
educ
atio
nco
llege
Adu
lt he
alth
Bel
owed
ucat
ion
grad
e 8
mat
eria
l
Qua
ntita
tive
fact
ors:
corr
elat
ion.
Indi
vidu
alfa
ctor
s up
to .6
0Q
ualit
ativ
e fa
ctor
s:in
spec
tion.
Cor
rela
tion
.511
.
Wid
e va
riet
y A
bove
Insp
ectio
ngr
ade
8
Gen
eral
.fic
tion
and
non-
fict
ion
Gen
eral
adul
t
Gra
de 4
-9
Gra
de 2
-co
llege
Cor
rela
tion
.643
5
Chi
ldre
n's
Gra
des
1-9
Cor
rela
tion
.86
libra
ry b
ooks
(Thi
s va
lue
is c
on-
foun
ded
by e
ffec
tsof
pop
ular
ity o
fbo
oks)
Aut
hor(
s)D
ate
Cri
teri
onV
aria
bles
Ston
e19
38R
atio
of
new
wor
ds to
tota
lw
ords
, per
cent
sen
tenc
esco
mpl
ete
on o
ne li
ne
De
L o
ng19
38Pe
rcen
tage
of
wor
ds in
var
-io
us le
vels
of
Ston
e's
wor
dlis
t.
Lor
ge19
39M
cCal
l -X
2: P
erce
nt o
f w
ords
out
-19
44C
rabb
s te
stsi
de D
ale
list o
f 76
919
48no
rms
wor
ds
X6:
Mea
n se
nten
ce le
ngth
X: P
erce
nt p
repo
sitio
nal
8 ph
rase
s
Mor
ris
&H
alve
rson
1938
4 Q
ualit
ativ
e vo
cabu
lary
clas
ses:
I) w
ords
lear
ned
earl
y in
life
II)
loca
lism
s
III)
con
cret
e w
ords
IV)
abst
ract
wor
ds
Yoa
kam
1939
Voc
abul
ary
inde
x: b
ased
on
Tho
rndi
ke w
ord
list.
Wor
dsbe
twee
n 4t
h an
d 20
th th
ou-
sand
inde
x is
the
num
ber
of th
ousa
nd in
whi
ch w
ord
appe
ars.
Abo
ve 2
0th
thou
-sa
nd in
dex
= 2
0.
Kes
sler
1941
Judg
ed d
il-a
culty
Mea
n se
nten
ce le
ngth
inw
ords
Mea
n nu
mbe
r of
dif
fere
ntha
rd w
ords
per
10G
wor
ds
Fles
ch19
43Ju
dgm
ent a
ndX
s: M
ean
sent
ence
leng
thM
cCal
l-C
rabb
s te
s'no
rms
X: N
umbe
r of
aff
ixes
per
m10
0 w
ords
Xh:
Num
ber
of p
erso
nal
refe
renc
es p
er 1
00w
ords
Pred
icte
dV
alue
Form
ula
Typ
e of
Mat
eria
lSt
udie
d
Ran
ge o
fM
ater
ial
Stud
ied
Met
hod
of c
om-
pari
son
with
crite
rion
Rel
ativ
edi
ffic
ulty
Six
leve
lsof
rel
ativ
edi
ffic
ulty
Rea
ding
grad
e le
vel
need
ed to
answ
er 5
0%of
que
stio
nsco
rrec
tly(C
50)
Rel
ativ
edi
ffic
ulty
Gra
de le
vel
Rel
ativ
e di
ffic
ulty
bas
ed o
nva
riab
les
stat
ed.
Tab
led
valu
e of
per
cent
ages
of w
ords
at v
ario
us le
vels
of S
tone
's w
ord
list
C50
.10X
2C
6 N
1 O
' S
+ 1
.99
Tab
led
norm
s re
late
cou
nt o
fw
ords
in 4
cla
sses
to d
iffi
culty
Rea
ding
Gra
de 1
text
s
Rea
ding
text
s
McC
all-
Cra
bbs
test
pas
-sa
ges
Mea
n in
dex
num
ber
com
pari
son
.Sch
ool
with
tabu
late
d va
lues
text
s
Gra
de le
vel
Fact
ors
com
pare
d to
Gra
y an
dL
eary
sta
ndar
ds in
depe
nden
tly
Rea
ding
grad
e le
vel
need
ed to
answ
er 7
596
C75
= .1
338X
s +
.064
5Xm
-
.06S
9Xh
+ 4
.249
8
of q
uest
ions
Lat
er c
orre
cted
to;
corr
ectly
C75
= .0
7Xm
+ .0
7Xs
- .0
SXh
(C75
) (c
orre
c-tio
n fo
r hi
gh+
3.2
7gr
ade
leve
ls)
Pre
-pri
-m
er to
grad
e 2
Gra
des
Cor
rela
tion
3-12
.67
Mul
tiple
cor
rela
-tio
n of
.74
was
fou
ndbe
twee
n cl
asse
s I,
III,
IV
, and
McC
all-
Cra
bbs
test
nor
ms
( L
orge
, 193
9)
2nd
-14t
hIn
spec
tion
grad
e
35 te
xt-
Gra
de 2
-In
spec
tion
book
sco
llege
Adu
lt m
ag-
Gra
des
3-C
orre
latio
n .7
4w
ines
and
12M
cCal
l -C
rab'
- p
as.
sage
s
Aut
hor(
s)
Dat
eC
rite
rion
Fles
ch(r
evis
ion)
Dal
e &
Cha
ll
Var
iabl
esPr
edic
ted
Val
ueFo
rmul
a
Typ
e of
Mat
eria
lSt
udie
d
1948
Judg
men
t and
McC
all-
Cra
bbs
test
norm
s
1948
McC
all-
Cra
bbs
test
norm
s, c
om-
preh
ensi
onte
st s
core
s
Dol
ch19
48G
rade
leve
las
sign
ed b
ypu
blis
hers
CI
Whe
eler
&W
heel
er19
48N
one
(rel
ied
on v
alid
ity o
fT
horn
dike
wor
d lis
t)
Fles
ch19
50C
ompr
e-he
nsio
n te
stsc
ores
Farr
,Je
nkin
s, &
Patte
rson
1951
Fles
ch r
eadi
ngea
se s
core
Gun
ning
1952
Fles
ch s
core
s,M
cCal
l-C
rabb
sno
rms,
and
judg
men
t
wl =
Mea
n w
ord
leng
th (
syl-
Rel
ativ
ela
bles
per
10C
wor
ds)
diff
icul
ty,
rela
tive
sl =
Mea
n se
nten
ce le
ngth
inte
rest
pw =
Num
ber
of p
erso
nal
wor
ds p
er 1
0C w
ords
ps =
Num
ber
of p
erso
nal
sent
ence
s pe
r 10
Gse
nten
ces
X1:
Per
cent
of
wor
ds n
oton
Dal
e lis
t of
30G
Cco
mm
on w
ords
X2:
Mea
a. s
ente
nce
leng
thin
wor
ds
Rea
ding
Eas
e =
206
.835
-. 8
46w
1M
cCal
l--
1. 0
15si
Cra
bbs
pas-
sage
sH
uman
Int
eres
t = 3
.635
+ .3
14Pw
ps
Rea
ding
C50
= 1
579X
1 +
.049
6X2
+gr
ade
leve
lne
eded
to3.
6365
answ
er 5
0%of
que
stio
nsco
rrec
tly (
Cso
)
X1:
Med
ian
sent
ence
leng
th G
rade
leve
l
X2:
90t
h pe
rcen
tile
sent
ence
leng
th3:
Perc
ent h
ard
wor
ds (
not
on D
olch
list
of
100C
wor
ds).
Perc
ent o
f w
ords
in e
ach
thou
sand
of
Tho
rndi
ke20
,000
wor
d lis
t
wl:
Wor
d le
ngth
(sy
llabl
espe
r 10
0 w
ords
)dw
: per
cent
age
of "
defi
nite
wor
ds."
nose
s: N
umbe
r of
one
-syl
-la
ble
wor
ds
41: M
ean
sent
ence
leng
th
Mea
n se
nten
ce le
ngth
, per
-ce
nt o
f w
ords
of
3 or
mor
esy
llabl
es
Loo
k up
thre
e va
riab
les
in ta
bles
,av
erag
e th
e ta
bled
val
ues.
Inst
ruct
iona
l, T
abul
ate
wor
ds in
pas
sage
s by
Inde
pend
ent,
Tho
rndi
ke th
ousa
nds.
For
in-
and
aver
age
stru
ctio
nal l
evel
, 90%
of
wor
dsgr
ade
leve
lm
ust b
e be
low
stu
dent
's R
GL
.Fo
r in
depe
nden
t lev
el 9
5%.
R: L
evel
of
R =
168
.095
+ .5
32dw
- .8
11w
1ab
stra
ctio
n"on
arb
itrar
ysc
ale
Rel
ativ
edi
ffic
ulty
Fog
Inde
x(t
he r
eadi
nggr
ade
leve
lne
eded
tore
ad a
nd u
nder
-st
and
mat
eria
l)New
Rea
ding
Eas
e In
dex
=1.
599n
- 1.
015
51 -
31.
517
Fog
Inde
x =
.4 x
(m
ean
sen-
tenc
e le
ngth
+ p
erce
nt o
fw
ords
of
3 or
mor
e sy
Lla
bles
)
Ran
ee o
fM
ater
ial
Stud
ied
Gra
des
3-12
Met
hod
of c
om-
pari
son
with
crite
rion
McC
all-
Gra
des
3-C
iabb
s pa
s-12
, 3-1
6sa
ges,
sch
ool f
or h
ealth
mat
eria
l,pa
ssag
eshe
alth
art
i-cl
es
Rea
ding
text
s
Scho
ol te
xts
Adu
lt, p
ub-
lic r
elat
ions
Gen
eral
adul
t and
scho
ol
Cor
rela
tion
R.E
.: .7
0H
. I. :
. 43
Cor
rela
tion
.70
Gra
des
1-In
spec
tion
6 Gra
des
3-C
orre
latio
n .7
212 Fu
ll ra
nge
Cor
rela
tion
.95
of F
lesc
h"r
eadi
ngea
se"
valu
es
Gra
des
6-In
spec
tion
12
Aut
hor(
s)D
ate
Cri
teri
on
McE
lroy
1953
(AF
Mau
ual
11-3
)
Var
iabl
esPr
edic
ted
Val
ueFo
rmul
a
Typ
e of
Mat
eria
lSt
udie
d
Ran
ge o
fM
etho
d of
com
-M
ater
ial
pari
son
with
Stud
ied
crite
rion
Forb
es &
1953
Mea
n va
lue
Cot
tleof
fiv
e ex
ist-
ing
read
abili
-ty
for
mul
as
Spac
he19
53G
rade
leve
las
sign
ed b
ypu
blis
hers
Fles
ch19
54O
bser
ved
char
acte
rist
ics
of e
asy
and
hard
mat
eria
l
Whe
eler
&19
54G
rade
leve
lC
t)Sm
ithas
sign
ed b
yto
publ
ishe
rs
Tri
be19
56M
cCal
l-C
rabb
s te
stsc
ores
(re
-vi
sed
form
)
Gill
ie19
57Fl
esch
"le
vel
of ^
bst
ract
ion"
scor
e.
Eas
y w
ords
(pr
onou
nced
in1-
2 w
ords
), a
ssig
ned
valu
e 1
hard
wor
ds (
pron
ounc
ed in
3or
mor
e sy
llabl
es),
ass
igne
dva
lue
of 3
Mea
n T
horn
dike
voc
abu-
lary
dif
ficu
lty in
dex,
wor
dsin
4th
thou
sand
and
abo
ve.
X1: .
ylea
n se
nten
ce le
ngth
X: P
erce
ntag
e of
wor
ds n
ot2
on D
ale
list o
f 76
9 ea
syw
ords
Fog
Cou
nt(a
rbitr
ary
valu
e) a
ndre
adin
g gr
ade
leve
l
Rea
ding
grad
e le
vel
Gra
de le
vel
- N
umbe
r of
ref
eren
ces
toA
rbitr
ary
hum
an b
eing
s, e
tc. ,
per
100
sca
les
ofw
ords
real
ism
("r
")-
Num
ber
of in
dica
tions
of
and
ener
gyvo
ice
com
mun
icat
ion
per
("e"
)10
0 w
ords
.
Mea
n le
ngth
of
"uni
ts"
(sim
- R
eadi
ngila
r to
sen
tenc
es),
per
cent
age
grad
eof
mul
ti-sy
llabl
e w
ords
.le
vel
Fog
Cou
nt p
er s
ente
nce
= s
um o
fl's
and
3's
in s
ente
nce
RG
L =
sum
of
l's a
nd 3
's d
ivid
edby
num
ber
of s
ente
nces
.If
val
ueis
ove
r 20
, div
ide
by 2
.If
bel
ow20
, sub
trac
t 2, t
hen
divi
de b
y2.
Loo
k up
mea
n vo
cabu
lary
inde
xSt
anda
rd-
Gra
de 5
-C
orre
latio
n .9
6in
tabl
es to
obt
ain
RG
Liz
ed te
sts
colle
ge
Gra
de le
vel =
.141
X +
.086
X2
Gra
desc
hool
3+
.839
test
s
Gra
des
1-C
orre
latio
n .8
2
"r"
scor
e: ta
bled
val
ue c
orre
spon
d-in
g to
obs
erve
d nu
mbe
r of
ref
eren
c-es
to h
uman
bei
ngs,
etc
."e
" sc
ore:
tabl
ed v
alue
cor
resp
ond-
ing
to'rv
ed n
umbe
r of
indi
ca-
tions
of
dice
com
mun
icat
ion
Gen
eral
adul
tm
ater
ial
Inde
x =
10
(mea
n le
ngth
of
units
x S
choo
lpe
rcen
t mul
ti-sy
llabl
e w
ords
).te
xts
Loo
k up
inde
x in
tabl
e to
obt
ain
RG
L
Adu
ltIn
spec
tion
Gra
des
I-In
spec
tion
4
X :
Mea
n se
nten
ce le
ngJi
Rea
ding
C50
= .0
719X
1 +
.104
3X5
+M
cCal
l-G
rade
s 3-
1gr
ade
leve
lC
rabb
s pa
s-12
need
ed to
2.93
47sa
ges,
195
0X
s: P
erce
ntag
e of
wor
ds n
ot a
nsw
er 5
0%-C
orre
ctio
n is
then
app
lied
thro
ugh
revi
sion
on R
insl
and
wor
d lis
t,of
que
stio
nsre
fere
nce
to ta
ble
of D
ale
and
mul
tiplie
d by
100
.co
rrec
tly (
C50
) C
han
(194
8)
X :
Num
ber
of f
inite
ver
bsA
bstr
actio
nA
bstr
actio
n le
vel =
36
++
X1
1 pe
r 20
0 w
ords
leve
l (ar
-bi
trar
y sc
ale)
X2:
Num
ber
of d
efin
ite a
rti-
cles
with
nou
ns p
er 2
00w
ords
X.,:
Num
ber
of n
ouns
of
abst
ract
ion
per
200
wor
ds.
- (2
X 3
)
Scho
olG
rade
4-
Cor
rela
tion
.83
text
s,co
llege
gene
ral
adul
t ma-
teri
al
Aut
hor(
s)D
ate
Cri
teri
onV
aria
bles
Pred
icte
dV
alue
Form
ula
ype
ofrt
ange
or
Mat
eria
lM
ater
ial
Stud
ied
Stud
ied
Met
hod
of c
om-
pari
son
with
crite
rion
Ston
e19
57G
rade
leve
las
sign
edby
publ
ishe
rs
Pow
ers,
Sum
ner,
&K
earl
Fles
ch
1958
McC
all-
C r
ab b
s T
est
Les
sons
, 195
0re
visi
on
1958
Obs
erve
dch
arac
teri
stic
sof
wri
tten
mat
eria
l
Blo
omer
1959
Gra
de le
vel
assi
gned
by
publ
ishe
rs
Jaco
bson
1965
Mea
n nu
m-
ber
of w
ords
stat
ed a
s no
t-un
ders
tood
by
read
ers
X :
Num
ber
of f
inite
;ver
bsI
per
200
wor
ds
X :
Num
ber
of d
efin
ite2
artic
les
with
nou
ns p
er20
G w
ords
X3:
Num
ber
of n
ouns
of
ab-
stra
ctio
n pe
r 20
0 w
ords
Rec
alcu
latio
n of
fou
r ex
ist-
ing
form
ulas
: Fle
sch
Rea
d-in
g E
ase,
Dal
e-C
hall,
Far
r,Je
nkin
s, a
nd P
ater
son,
and
Gun
ning
. Var
iabl
es a
ndun
chan
ged.
Coe
ffic
ient
s al
tere
d as
ap-
prop
riat
e
Abs
trac
tion
Abs
trac
tion
leve
l = 3
6X
2X
1le
vel (
ar-
bitr
ary
scal
e)-
(2X
3)
XC
SO
XC
50
XC
SO
XC
SO
Cou
nt o
f ev
iden
ces
of in
for-
Arb
itrar
ym
ality
: cap
italiz
atio
n, u
n- s
cale
of
derl
inin
g, p
unct
uatio
n, s
ym-
"for
mal
ity-
bols
, beg
inni
ngs
or e
nds
ofpo
pula
rit
para
grap
hs, e
tc.
- N
umbe
r of
wor
ds p
er m
odi-
Lev
el o
ffi
erab
stra
ctio
n-
Soun
d co
mpl
exity
of
mod
-if
iers
X1:
Mea
n nu
mbe
r of
wor
dsY
: Mea
nnu
mbe
r of
per
inde
pend
ent c
laus
ew
ords
not
know
n pe
rX
2: C
once
ntra
tion
of m
athe
- sa
mpl
e
mat
ical
term
s
X3:
Con
cent
ratio
n of
wor
ds
outs
ide
of m
ost c
omm
on6,
000
on T
harn
dike
list
X4:
Con
cent
ratio
n of
wor
dsno
t on
Pow
er's
list
of
es-
sent
ial s
cien
tific
term
s.
Scho
olG
rade
4-
text
s, g
en-
colle
geer
al a
dult
mat
eria
I
Pesc
h:X
C50
= -
2.2
029
McC
all-
+ .0
778
(sen
tenc
e le
ngth
).0
445
Cra
bbs
pas-
(syl
labl
es p
er 1
0G w
ords
)sa
ges,
195
0re
visi
onD
ale-
C h
all:
Xc5
0=
3. 2
672
+ .
0596
McC
all-
Cra
bbs
pas-
(sen
tenc
e le
ngth
) -
.064
8 (p
erce
nt s
ages
, 195
0of
wor
ds n
ot o
n D
ale
list)
revi
sion
Cor
rela
tion
.83
Gra
des
3-C
orre
latio
n .6
412 G
rade
s 3-
12
Farr
-Jen
kins
-Pat
erso
n:M
cCal
l-G
rade
s 3-
Xcs
o=
8.4
355
' .09
23C
rabb
s pa
s-12
(sen
tenc
e le
ngth
) -
.064
8 (p
erce
nt s
ages
, 195
0of
one
syl
labl
e w
ords
)re
visi
on
Gun
ning
: Xcs
o=
3.0
680
+ .0
877
McC
all-
Gra
des
3-C
rabb
s pa
s-12
(sen
tenc
e le
ngth
)* .0
984
(per
cent
sag
es, 1
950
of o
ne s
ylla
ble
wor
ds)
revi
sion
Find
tota
l num
ber
of e
vide
nces
of
Popu
lar
toA
dult
info
rmal
ity p
er s
ampl
e, u
se ta
ble
scho
larl
yto
det
erm
ine
form
ality
-pop
ular
ity p
erio
dica
lsca
tego
ry.
Gra
de le
vel p
redi
cted
fro
m th
eR
eadi
ngva
riab
les
liste
d.Pr
oced
ure
not
text
sde
scri
bed.
Phys
ics
text
s: Y
= -
.001
3 +
29.7
059X
24
21.1
19X
3 4
35.0
029X
4
Che
mis
try
text
s: Y
= .0
C3
1706
X1
4 13
.723
1 X
2 -
43. 7
262X
3-
2. 3
577X
4
Phys
ics
&C
hem
istr
yte
xts
Cor
rela
tion
.71
Cor
rela
tion
.S8
Cor
rela
tion
.59
on
Gra
des
1-C
orre
latio
n .7
86 H
igh-
scho
ol &
colle
ge
Cor
rela
tion
Phys
ics
text
s:.7
0C
hem
istr
y te
xts:
.67