document resume md 095 900 kauffman, dan; and others · document resume. md 095 900 ir 001 082....
TRANSCRIPT
DOCUMENT RESUME
MD 095 900 IR 001 082
AUTHbr Kauffman, Dan; And OthersTITLE The Empirical Derivation of Equations for Predic'cing
Subjective Textual' Information. Final Report.INSTITUTION Air force Human' Resources Lab., Williams AFB, Ariz.
Flying Training Div.REPORT NO. AFHRL-TR-74-55PUB DATE Jul 74NOTE 40p.
EDRS PRICE MF-$0.75'HC-$1.85 PLUS POSTAGEDESCRIPTORS *Education; *Information.Theory; *Instructional
Technology; *Psychology; *Training
ABSTRACTA study was made to derive An equation for predicting
the "subjective" textual information contained in a.text of materialwritten in-the English language. Specifically, this investigationdescribes, by a mathematical equation, the relationship between the"subjective" information militant of written textual material and therelative number of errors committed by a learner when asked topredict, letter by letter, the content of given textual material. Therelationship shows that the subjective information of a given textfor a specific learner is directly proportional to the number ofwrongly-guessed signs made by that learner. This is expressedmathematically by: 1=3.1E; Where: I=Information in Bits; E=Number ofwrongly gueiSed signs. The application of Shannon's guessingprocedure (1951) in this study permits the measurement of the',subjective information of a given text for a specific learner. Thederived equation permits the measurement of information in tel.'s of avalue that is dependent not only on the inherent qualities of thesubject matter, but also on the internal state of the learner.(Author /. WCM)
)3E T COPY. AMAMIAFHR.I.-TR-74-55
FOCE
110.1.1111PARTIAN NT OP HEALTH,
'EDUCATION a WELFARENATIONAL INSTITUTE OP
EDUCATIONTHIS DOCUMENT HAS IIE,EN REPRODUCED EXACTLY AS RECEIVED FROMTHE PERSON OR ORGANIZATION ORIGINATISIG IT POINTS OF VIEW OR OPINIONSSTATED DO NOT NECESSARIL,V REFIRESENT OFFICIAL NATIONAL INSTITUTE OFEDUCATION PDSiTION.OR POLICY
H PTHE EMPIRICAL DERIVATION OF EQUATIONS FOt
SUBJECTIVE TEXTUAL INFORMATION
'AN
R
S
By
Dan KauffmanMike JohnsonGene Knight
Arizona State UniversityTempe, Arizona 85281
FLYING TRAINING DIVISIONWilliams Air Force Base, Arizona 85224
July 1974
Approved for public release; distribution unlimited.
LABORATORY
AIR FORCE ,SYSTEMS COMMANDBROOKS. AIR FORCE BASE,TEXAS 78235
NOTICE
.
When US Government drawings, specifications, or other dap. are usedfor. any purpose other than a definitely related Governmentprocurement operation, the Government thereby incurs noresponsibility nor any obligation whatsoever, . and the fact that theGovernment.. may have formulated, furnished, or in any way suppliedthe said drawings, specifications, or other data is not to be regarded byimplicatioin or otherwise, as in any manner licensing the holder or anyother person or corporation, or conveying any rights or permission tomanufaciure, ust., or sell any patented invention that may in any waybe related thereto.
This final report was submitted by Arizona State University, Tempe,Arizona 85281, under contract F41609-71. 00027, pr. ject '1123, withthe Flying Training Division, Air Force Human Resources Laboratory(AFSC), Williams Ai: Force Base, Arizona 85224. Captain Gary B 'Reid,Flying Training Division, was the contract monitor.
This report has been reviewed and cleared for open publication and/orpublic release by the appropriate Office of Information (01) inaccordance with AFR 190-17 and 1 5230'.9. There is no objectionto unlimited distribution of this ieoort to the public at large, or byDDC4o the National Technical Information Lervice (NTIS).
This technical report has been reviewed and is approved.
WILLIAM V. HAGIN, Technical Director. .
Flying Training Division
Approved for publication.
HAROLD E. FISCHE., Colonel, USAFCommander
UnclassifiedSECURITY CLASSIFICATION OF THIS PAGE (Whorl Data Entered)
REPORT DOCUMENTATION PAGEREAD
BEFORE COMPLETING FORM
I. REPORT NUMBER
AFHRLTR74.55
2. GOVT. ACCESSION NO. 3. RECIPIENT'S CATALOG NUMBER
4. TITLE (and Subtitle)
THE EMPIRICAL DERIVATION OF EQUATIONS FORPREDICTING SUBJECTIVE TEXTUAL INFORMATION
'
5.. TYPE OF REPORT & PERIOD COVERED
Final
6. PERFORMING ORG. REPORT NUMBER
7. AUTHOR(s)Dan KauffmanMike JohnsonGene Knight
S. CONTRACT OR GRANT NUMBER(s)
. .
F41609.71-C0027
9. PERFORMING ORGANIZATION. NAME AND ADDRESS
Arizona State UniversityTempe, Arizona 85281 .
10. PROGRAM ELEMENT. PROJECT, TASKAREA & WORK UNIT NUMBERS
r
11230205
II. CONTROLLING OFFICE NAME AND ADDRES:i
Hq Air Force Human Resources Laboratory (AFSC)lo
Brooks Air Force Base, Texas 78235.
.
12. REPORT DATE
July 197413. 'NUMBER OF PAGES
40
14. MONITORING AGENCY NAME & ADDRESS(!( different from Controlling Office)Flying Training Division ..
..
Air Force Human Resources Laboratory
.crWilliartis Air Force Base, Arizona 85224
.
'
15. SECURITY CLASS. (of this report)
Unclassified
15a. DECL ASSI FICATION/ DOWNGRADINGSCHEDULE
16 DISTRIBUTION ,STATEMENT (ol this Report)
Approved forfor public release; distribution unlimited. .
, .
7. DISTRIBUTION STATEMENT (of :he abstract entered In Block 20, If different from Report)
13. SUPPLEMENTARY NOTES .
This report was approved (74-58. 9 April 1974) for Publication in the Instructional Science, an internationaljournal.
19. KEY WORDS (Continue.on reverse side if necessary and Identify by block number)
.information theory #psychologyinstructional. technologyeducation .
training
20. ABSTRACT (Continue on reverse side If necessary and identify by block number)Parameters pertaining to information processing by human beings have, in the past, been determined by
learning and memory experiments with nonsense syllables, number sequences, etc. However, in the real world we areconcerned with the processing of "meaningful information" not senseless texts, Therefore, the determination ofsubjective information directly from meaningful material becomes extremely important in the instructionlearningprocess.
This ,study derives an equation for predicting the subjective textual information contained in a text of material:written in the English language. Specifically, this investigation describes, by a matheinatical equation, therelationship between the subjective information content of written textual material and the relative number of errors
DD 1 JAN 73 1473 EDITION OF 1 NOV 65 IS OBSOLETEUnclassified
SECURITY CLASSItrICATION OF THIS PAGE (When Data Entered)
Unclassified
SECURITY CLASSIFICATION OF THIS PAGE(Whon Data Entered)
, Block 2t.) (continued)
committed by a learner when .asked to predict. letter by letter, the content of given textual material. Thisrelationship shows that the subjective information of a given text for a specific learner is directly proportional lo thenumber of wronglyguessed signs made by that learner. This is expressed mathematically by
Whe
1 3.1 E
re: I = Information in Bits
E = Number of wrongly guessed signs
The application of Shannon's guessing proce'dure (1951) in this study permits the measurement of thesubjective information of a given text for a specific learner. Unlike senseless texts, the subjective information of ameaningful text varies from learner to learner. Therefore, the derived equation permits the measurement ofinformation in terms of a value that is dependent not only upon the inherent qualities of the subject matter, but alsoupon the internal !tate of the learner.
The dethe German I
rived equatIbn fOr the English languag.: is then compared to an equation derived by Weltner (1967) foranguage and found to be remarkably s nilar.
UnclassifiedSECURITY CLASSIFICATION OF THIS PAGE(When Data Entered)
PREFACE
This study was Conducted under Project 1123, USAF Flying Train-ing Development, Task 112302, Instructional Innovations in USAF FlyingTraining.
- The research was carried out under the provisions of contractF41605-71-00027.by the Educational Thchnology Department of ArtzonaState University, Tempe, Arizona 85281. Contract monitor was CaptainGary B. Reid.
4
BEST COPY &RABLE
TABLE OF CONTENTS
LIST OF TABLES
Pale
LIST OF FIGURES5
.INTRODUCTION 7/
Objective 7 .
Rationale 7
Instruction Learning Process -8.
Quantifying Information 8Examples of Guessing Procedlre '11
Procedures and Results 12
PHASE I: COMPUTER PROGRAM DEVELOPMENT . . . . ..... . . . . 13.
Proge6m CYBER4 13
Program CYBER2 13
Program EQUFIT 14
Program PTAFIT 15
PHASE II: SELECTION AND PRESENTATION OF TEXTUAL INFORMATION . . ,15
Experimental Subjects 15
PHASE III: CALCULATION OF SUBJECTIVE INFORMATION 17
PHASE IV: DETERMINAT1ONOF EQUATIONS 17
PHASE V: SELECTION OF "BEST-FITTING" EQUATION
The English Equation 14
The German Equation 34
CONCLUSION 36
REFERENCES 37
3
LIST OF TABLESova
Table'
. .
1. X and Y Coordinates for Hypothetical Sample
2. X and Y Coordinates for Group .3
3. Least Squares Curve Fit - EQUFIT Program
4. PTAFIT Least Squares Curve Fit r
5. Equation Type 6: YwX/(A+B*X
6. Equation Type 1:i YagA+(B*X)
.1)221
14
18
19
20
22
22.
7. 'Comparison of Indices of Determination 23
8. Equation Type 1: YA+(B*X) 24-
9.' Equation Type 6: YsiX/(A+B*X) 25
10. Outlier Test Using Chazal's Method 26
11. Outlier Test Using Chazal's Method 27
12. Least Squares Curve Fit with Outlier Removed . . . 28
13.. Equation Type 3: Ys.A*(X+8) - Least Squares"CurveFit with Outlier Removed 29
14. Equation Type 1: Y..A+(B*X) - Least Squares CurveFit with Outlier Removed 30
15. Comparison of Indices of Determination withOutlier Removed 31
16. 'Equation Type 1: Y=A+(B*X) - Summary Table withOutlier Removed 32
17. Equation Type 3: Y=A*(X4 - Summary Table withOutlier Removed 33
4
BESTcon AVAILABLE
LIST OF FIGURES
Figure, 11
1. SimpliCed View of Communication System 9.
2. Plot of English and German equations 21
0
BEST COPY AVAILmLL
THE EMPIRICAL DERIVATION OF EQUATIONS FOR PREDICTING
SUBJECTIVE TEXTUAL INFORMATION
ts.
Introdu.ztlon
This stydy was designed to empirically derive an equation f8rpredicting the subjective textual information contained in a text ofmaterial written in the English language. Specifically, this investi- .
gation describes, by a mathematical equation, the relationship betweenthe subjective InfOrMation content of written textual material and therelative number of errors committed by a learner when asked co predict,letter by.letter, the content.of the textual material.
Rationale
Parameters pertaining to information processing by human beingsPave; in the past, been determined.by learning and memory experimentswith nonsense syllables, number sequences, etc. Because of the rela-tive simplicity of these types of experiments, the flow of informationand the subsequent information processing of these senseless texts'could be measured and varied according to exact prescriptions. However,in the real world we are,concerned with the processing of "meaningfulinformation.," not senseless texts. When the parameters are determinedfrom senseless texts uncertainty is contained in the extrapolation tomeaningful materials. Therefore, if we could determine the subjectiveinformation directly from meaningful material, we would most certainlyreduce these uncertainties.. As a result, one would be able to answerquestions of the following type: 0
1. What is the amount of information contained in the lawsof thermodynamics for a specific learner?
2. How much information does a learner gain, in a certainperiod of instruction, under a given set of Instructionalconditions?
3. How great is the flow of information through a particularinstruction-learning.channel? Is it too much? Is it toolittle?
h. How much does.the learner already know?
5. How much does the learner need to know?
7
tot usi wooInstruction Leaftlina_Process
instruction and learning are two vital aspects of the educationalprocess that depend upon the manipulaticn of three activities: theinput of meaningful information; the processing of this information;and the output of meaningful information. These three activities forMthe basis for the science of information theory.
The science of information theory, since it deals with the fun-damental processes involved in the instruction-learning process, pro-vides the quantitative instruments needed to describe this process.
When we discuss the instruction-learning process we are essen-tially describidg a communication system. Typically, a communicationsystem can be described in terms of the following (see Figure 1):
Information source: The mechanism that selects a desiredmessage. out of a set of possible messages.
Transmitter: Encodes the message into a signal.
Channel: The medium through which the signal is transmitted.
Receiver: Accepts and decodes the transmitted signal intothe message.
Destination: Interprets the message.
Noise: Unwanted additions imposed on VA message. Theseadditions can be either from an external source, inter-nal source or combinations of the two. Noise changesthe intended message..
Quantifying Information
Since instruction is concerned with the transference of infor-mation and learning is the act of .(or the result of) the informationtransfer, it is obvious that we are not only describing a communicationsystem, but also an instruction-learning system. The information sourceand transmitter being the teacher, textbook, computer, etc., while thereceiver is the learner. Because our instruction-learning system is eninformation procesFing system, we are now faced with several questions:
4
1. What is information?2. How can we measure information?3. What are the characteristics of an efficient coding process?4. How could we measure the capacity of a channel?
8
0, BEST COPY AVAILABLE
1;- Information Source
2 - Transmitter3 - Channel
4 - Receiver5 - Destination6 - Noise
Figure .1 .--Simpl ified View *of Communication System.,
9
Aug ABLE
5. What are the general characteristics of noise?6. What are the effects of noise on a message?7. How do we increase or reduce the effects of noise?
All of the questions are intriguing and deserve attention. How-ever, this investigation addresses itself only to the first two ques-tions. The other questions provide a basis !or additional studies alongwith extending the results of this investigation.
Many definitions of information are given in the literature(Ash, 1965; Edwards, 1964; Frank, 1962; Fuchs, 1968; Guilbaud, 1959;Khinchin, 1957; Singh, 1966; Weltner, 1970; & others). But one appro-priate for this investigation is the definition given by Weaver (1969).He says, "Information is a measure of one's freedom of chnice when one'.selects a message." This conveniently ties the definition of informa-tion and its measurement directly together.
To take the simplest case of twv) alternative messages (yes orno), we will say that the information associated with this ensembleof two messages is unity. The concept of information, then, appliesnot to the individual message but to the ensemble as a' whole. '
To oe'more precise, we; an define the amount of informhtion bythe logarithm of the number of available choices (messages) containedin the ensemble. Thus, in our case of two messages (yes or no), theinformation is proportional tolthe logarithm of 2 to the base 2 suchthat: 1og22 u 1 which is unit. This unit of information is oajledia "bit" from the two words binary digit. To generalize then, l'f7Vill..dhave 8 alternative messages, among which we are equally free to choose,we have log28 = 3 giving us an/ensemble consisting of 3 bits of infor-mation.
This situation is completely adequate if we are only dealingwith messages than have an equally probable chance of being selected.The probability of this occurring in language structure is extremelysmall and is zero for the/English language. In this study we are con-cerned with the information content cF the English language which iscowposed of messages (letters, signs) arranged into stochastic ensem-bles. These ensembles are arranyed according to certain probabilitiesin which the probabilities depend on previous events.
Because of the probabilistic dependencies of the message (let-ters), it becomes exceedingly more difficult to measure the amount ofinformation in the English language. However, Shannon (1951) deriveda "guessing procedure" for analyzing the syntactical information contentof meaningful texts. His guessing procedure enables one to calculatethe subjective information of a text for a specific learner. Thelearner attempts to guess the ensemble, letter by letter. After each
BEST COPY AVAILABLE
guess he is told whether his guess v4s correct or not. If incorrect,
he guesses again. If correct, he guesses the next letter or message
of the ensemble until the entire ensemble has, been derived. This pro-
cedure is based on the assumption that at each guess the learner-will
name the letter with the highest subjective probability. Thus, we have
.a measure of the amount of information a particular ensemble has for_a_
particular learner. Shannon's equation forms the basis for this inves-
tigation and is presented later. An example of the guessing procedure
and subsequent analysis follows.
Example of Guessing Procedure
rhe guessing procedure enables one to empirically estimate the
subjective information of a text for a particular subject. The subjec-
tive information Lan be cakulated ac'ording to several techniques.
However, each .echnique assumes that the subject guesses the--.sign with
the highest subjective probability. Consider the following hypothetical
example:
THE COW JUMPED 0
612 1 421 1 811111 f 3
Under each letter of the text is the number of guesses the subject made
until the correct sign (letter) was guessed. From these numbers it is
relatively easy to calculate the information content of the -text by
adding the information value of the signs. The first letter (T) con-
tains 1o926 bits of information, the second letter (H) 1og21 bits of
information. Therefore, the first letter (T) contains 2.58 bits(1og26 a 2.58) whereas the second letter (H) contains 0 bits (log21 a 0).
While the first letter did contain information for the subject (2.5.8
bits), the second was perhaps expected with certainty and was, therefore,
devoid of any information (0 bits). The total information of the text
can now be measured in bits as:
H(text) m 1og26 + 1og21 + 1og22 + 1og21 + 1og24 + 1og22
+ logol + log21 + 1og28 + 1og21 + 1og21 + log21t
+ log21 + 1og21 + 1og21 + 1og23,
= 1og28 + 1og26 + 1og24 + 1og23 + 2 1og22 + 10 1og21
. 3.00 + 2.58 +.2.00 + 1.58 + 2.00 + 0
= 11.16 BITS
PRE CRY AVAILABLE
Dividing this by the length of the 4.xt N = 16 we arrive at .698 Bits/Sign forhe information content of our sample text.
Although this method is relatively simple, it does not providethe best estimate of the information content. Shannon.(1951) derivedequation (1) which according to Frank (1962) is not only a better mea-sure but also is closer to the actual figure than other calculations.would indicate.
H(text) = E (r ld(r) - (r-1) ld(r -1))
Where: N = amount of information measured in bits
Nr= absolute frequency with which the number r occurs
.
in the guessing sequence of .N signs
Id-= logarithm to the base 2 (log2r)
Applying Shannon's equation to our example, we obtain 18.248 bits asthe information of the text segment or 1.405 bits/sign which is abetter estimate than our previous figure of .698 bits/sign.
Procedures and Results
This study consists of five procedural phases of operation.They are::
'Phase I: The design and development of computer programsto present and process the experimental data.
Phase If: -The selection and presentation of textual infor-mation to selected subjects and the collectionof their responses.
Phase III: Calculation of subjective information contentcontained in the textual material for selectedsubjects.
(1)
Phaie IV: Derivation of equations to fit experimental data.
Phase V: Selection of "best-fitting" equation, and deletionof outlier point.
12
la
PHASE I
eEsr colt *Novo
Computer Program Development
Two computer programs were written specifically for this inves-
tigation. Two other programs were used in the synthesis of the data.he programs, written in BASIC, were designed to operate in a time-
shared environment.
Program CYBER4
Program CYBER4 was designed as a general purpose program to pre-sent seletted textual passages to the subjects. CYBER4 can present
textual information by letters, wordi, phrases, sentences, paragraphs,or any combination thereof. The content presentation can be predeter-mined by the investigator or the program can be adapted to functionunder learner control. In addition, it can also be adapted to displayinformation at any intellectual level.
The program initially outputs an extended phrase or set ofphrases. The length of the phrase is pre-programmed by the investiga-tor and, for the most part, depends on the complexity of the materialand the relative. sophistication of the subject with regard to the
.academic content.
After the initial phrase is printed by the computer terminal,the subject is required to guess, letter by letter, the subsequentmessage. If the subject is incorrect, his response is placed in a"response file;" a zero is recorded in a "score file," and the subjectis required to guess again. This process continues until his response
is correct.
If the subject's guess is correct, his response is placed inthe "response file;" a one is recorded in the "score file;" his cor-rect answer is. acknowledged by the computer; and the next letter inthe guessing sequence is presented. When the subject has respondedto.the entire text, he is thanked for his effort and the time of com-pletion is recorded by the computer.
Program CYBER2,
Program CYBER2 was developed to calculate the amount of infor-.
.matIon for each subject, using equation (1) derived by Shanhon (1951).
13
As can be seen. in the sample output listed in Table 1, CYBER2outputs the percentage of incorrect responses (X s) per subject and theamount of information per subject (Yss) contained in the text for thatparticular subject. The measurement of information is in units ofbits/sign, the sign being the message unit letter. Each pflir of numbersrepresent the X and Y coordinates used in fitting an equation to thedata. In addition to calculating the X and Y values, CYBER2 also ordert.the results from the lowest to highest on the percentage of incorrectresponses (Xss
TABLE I
X AND Y COORDINATES FOR HYPOTHETICAL SAMPLE
Percent Incorrect (Xss) Bits/Sign (Yss)
X( 1) = 22.2222 Y( 1) - .9148X( 2) = 25.0000 Y( 2) - .5629X( 3) = 26.0000 Y( 3) -1.3686x( 4) = 60.0000 yc 4) 1.8500
Program EQUFIT
Program EQUFIT is a commercially developed program available onthe MG-255 time-sharing compUter system at Arizona State University.EQUFIT applies curve fitting techniques to determine which of six generaltypes of curves best fits the supplied data. The six general types ofcurves range from a linear function to a hyperbolic function.
The X and Y coordinates generated by Program CYBER2 are used byEQUFIT to determine the equations that have the l'best fit" for the sup-plied data. EQUFIT outputs the following information for each equationspecified by the user:
1. Index of determination for each equation
2. Coefficients for each equation
3. X and Y values as supplied by user
4. Calculated values of Y for each specified equation
5. The percentage difference between the/actual Y value andthe calculated Y value for each tpecified equation.
14
Program PTAFIT.r,
PTAFIT is vmodified and improved version of EQUFIT developed byPlan-Test Associates of Phoenix, Arizona.* PTAFIT greatly extends the
capabilities oEQUFIT. In addition to the information supplied byEQUFIT, PTAFIT supplies the following:
BESTCOPYAMIIABLE
1. The level of significance at which the equation fits thesupplied data.
ti
2. The value of each coefficient, the standard deviationassociated with that coefficient and confidence limitsfor the coefficieht. The confidence limits are chosenby tile user and can be at the 90%, 95;, and/o the 99%level of significance.
3. Confidence bands about each (X, Y) coordinate. The user
maY'specify the 90%, 95%, and/or 99% prediction limits.
4. Evaluation of how significant is the apparent differencesbetween equations.
PilASE II
Selection and Presentation-of Textual Information
The textual information for this investigation was selectedfrom two areas. The first area was concerned with academic facts inthe areas of information science. All participants had either priorexperience in the academic field or were enrolled in an informationscience course. The second area was concerned with current events.The most topical of current events, at the time of the investigation,was the Watergate affair. It was being presented daily on televisionand in the news media. It was thought that all subjects would beknowledgeable in this area; however, this was not the case. A greaterpercentage of incorrect responses was made on the Watergate textualmaterial than on the information science textual material.
Ex erimental Subjects
A total of 239 subjects were available from the population ofgraduate students in the College of Education. An analysis of the
*Plan-Test Associates, Financial Center, Suite 609, 3443 NorthCentral Avenue, Phoenix, Arizona 85012.
15
GOPY WAKE
subjects' academic background indicated that all participants werefunctioning within the normal range of"intelligence. All subjectsused in this investigation were enrolled in College of Educationgraduate classes in an uncontrolled manner typical of college enroll-ment. It Is assumed that all subjects used in this investigation arerepresentative, in terms of language skills, of the population ofcollege graduates whose primary language, is English.
Out of the 239 subjects available to participate in this inves-tigation 118 were selected. These 118 subjects were divided intothree groups based upon their academic, background.
Croup 1: 25 subjectsA test group to validate the computer programs andselection of the textual information.' The dataderived from these subjects was not used in thefinal determination of the equation.
Group 2: 59 subjects
An experimental group of subjects who were exposedto the two types of textual materials. Group 2subjects were of varied academic background andtherefore considered to be less reliable in termsof their usefulness as a homogeneous group. Thetextual material presented to this group consistedof both academic materials (Information Science)and current material (Watergate affair). The data :
generated by Group 2 was suspect because of numer-.ous computer failures throughout the, experimentalperiod and was therefore rejected.
Group 3: 34 subjects
An experimental group of subjects who were equiva-lent in terms of academic background in the fieldof information science. All of the Group 3 sub-jects were concurrently enrolled in an informationscience course at the time of the study. Thetextual material presented to this group wasdirectly concerned with previously learned mater-ial and material they were about to study in theirclass. This group provided the most reliable datafor the final determination of the derived equation.
16
BEST COPY AVAILABLE
PHASE III
Calculation of Subjective Information
After all subjects had completed the guessing toquinces, ProgramCYBER2 processed the "score file," calculated the percentage of incor-rect responses and computed the subjective,'textual information foreach subject according to equation. (1):
H(text) = E Nr (r .1d (r) - (r-1) ld(r-1)) (1)
Table 2 presents the results derived, from test Group 3 as calcu-lated By Program CYBER2. This table shows the percentage of incorrectresponses and the corresponding information content for each of thesubjects. The table indicates that, in general, as the percentage ofincorrect responses increases the amount of information increases.This Aends to confirm the theoretical predictions.
PHASE IV
Determination of Equations
After all subjects had completed the guessing sequences, programCYBER2 processed the "Score File," calculated the percentage of incor-rect responses and computed the subjective textual information for eachsubject according to equation (1). This provided the necessary infor-mation for EQUFIT to process.
Program EQUFIT'provides a least squares curve fit for the datagenerated in Phase III by program CYBER2. The six curve types andresults are displayed in Table 3. The index of determinatior indicatesthat equatibns 6, 3 and 1, respectively, provide a fit that is "qUitegood."
Examination of Table 3 would tend to indicate that equation (6)is the best fit. Although this equation shows the highest Index ofDetermination, the other two equationl are so close that it is difficultto judge which is the best fitting equation using only the Index ofDetermination.
Ope would prefer the linear function because of its ease ofusage, but only if it proved to be "as good a fit" as the other.func-tions. The millatiOn derived by Weltner (1967) for the German,language
17
. Table 2
X AND Y COORDINATES FOR GROUP 3
Percent Incorrect Bits /Sign
x( 1) = 7.6923 Y( 1) .2281x( 2) 4. 10.0000 Y( 2) . .2664x( 3) = 10.0000 Y( 3) a ..2897x( 4) 12.5000 Y( 4Y as 3255x( 5) - 15.0000 Y( 5) / .
x( 6) - 15.0000..4942
6) .5328x( 7) = 15.3846 Y( 7) . .4238x( 8) = 15.3846 Y( 8) , 3077x( 9) = 17.5000 Y( 9) .4821x( 10) ' 17.5000 Y( 10) mi .4724x( 11) = 20.0000 Y( 11) a. .5407x( 12) = 20.0000 Y( 12) 1. .6020x( 13) = 20.0000 `I( 13) .5264x( 14)
x( 15)
= 20.0000= 20.0000
Y(
Y(14) a .5286,
15) so' :5627x( 16) = 20.0000 `I( 16) .5698x( 17) = 21.4286 Y( 17) .5364X( 18) = 22.5000 Y( 18) .6406x( 11) = 27.2727 X( 19y - .6918x( 20) = 27.5000 Y( 20) 1. .9640x( 21) = 27.5000 Y( 21) = 1.0123x( 22) = 27.5000 Y( 22) 9593x( 23) = 27.5000. Y( 23) .8112x( 24) = 32.5000 Y( 24) , .9116X( 25) = 35.0000 25) 1. 1.2194X( 26) m' 35.0000 Y( 26) = 1.2905x( 27) = 35.0000 Y( 27) a 1.2210.X( 28) = 37.5000 Y( 28) = .7500X( 29) = 37.5000 Y( 29) = 1.2191X( 30) = 38.4615 30) = 1.1554X( 31) - 40.0000 Y( 31) = 1:2366x( 32) 42.5000 32) = 1.3555x( 33) = 45.4545 Y( 33) = 1.2501x( 34) 56.2500 Y( 34) mi 1.5010
18
11W
BEST COPY AVAILABLE
1
TABLE3
LEAST SQUARES CURVE FIT
EQUFIT PROGRAM
In ea7cir
Curve Type Determination A
1. Y=A+(B*X) .892389' -1.49193E-02 3.02491E-02
2. Y=A*EXP(B*X) .851611 .229436 4.19008E -02
3. Y=A*(X411) .922853 2.41865E-02 1.05825
4. Y=A+(B/X). .693833 1.35166 -12.2042
5. Y=1/(A+B*X) .711047 3.48035 - 6.92674E-02
6. Y=X/(A+B*X) .930395 36.2543 - 5.09608E-02
19
.
BEST COPY AVAILABLE
is of a linear nature (see Figure 2). This would lead us to suspect. that the English language equation might Also exhibit similar linear'properties. Therefore, it was decided, at this point, to further ana-lyze the Group. 3 data. This is done in Phase V. .
PHASE V
Selection of "Best- Fitting" Equation.
Phase V, in part, utilized the progrim PTAFIT to further analyzethe equations to determine which would be the most appropriati todescribe the empirical data.
Table 4 shows the leastcurves fit generated by program PTAFIT.All equations.are shown to be significant at the 99% level of confi-dence.
TABLE 4
PTAFIT LEAST SQUARES CURVE FIT
Curve Type Index of
DeterminationA B Significance
90% 95% 99%
V 1*k+(B*X) .8924 -1.494E-2 3.025E-2 * * A
2 Y=A*EXP(B*X) .8517 .2294 .0419 * * . *
3 Y=A*(XtB) .9219 2.418E-2 1.058 * * *
4 Y=A+(B/x) .6918 1.353 -12.2 * * *
5 y=1/(A+Px) .7ri1 3.48 - 6.927E-2 * * *
6 Y=X/(A+B*A, .9304 36.26 - 5.097E-2 * * *
Number of Observations: 34
Equations 6, 3 and 1 again exhibit the highest Index of Deter-mination and therefore are prime candidates for further examination.Tables 5 and 6 show the standard deviation and confidence limits forthe coefficients of equations 6 and 1.
20
4.ZZ
3.EZ-
3.ZZ-
2.ZZ
PLOT OF: Y(E) = .03 IX - .031
(ENSLI51-1)
Y: G) = .1,4312X - .EBE
CEERMN)
ENSLIEH EURTiLN nY: KRIFFrEn
EE BEN
ERSTION EY: NELTNER
=r
rze
riin
LC
N OF LNG:PRP-CT RESPONSES
Figure 2.--Plot of English and German equations.
c3
/ Yu).-
.
1!
YCE)
TABLE 5
EQUATION 'TYPE 6
Y=X/(A + B*X)
or.
BEST COPY AMIABLE
Coefficient Value Standard 95%Deviation Confidence Limits
A 36.26 ''1.75 32.68 39.83
B - 5409E-2 9.52E-2 - .24 .14
TABLE 6
EQUATION TYPE 1.
Y=A+(B*X)
Coefficient Value Standard 95%Deviation. Confidence Limits
A -1.49E-2 5.20E-2 .12 9.11E-2
B 3.02E-2 1.85E-3 2.64E-2 3.40E-2
22
Bra copy
Tab:3 7 shows the PTAFIT output comparing the fit Of all pairsof equations. It shows that equation 6 does not.fit significantlybetter,than equation 1. Therefore, since equation 1 represents astraight line it will be given special consideration inviewof itsconvenience of interpretation.
, TABLE 7
COMPARISON OF INDICES OF'DETERMINATION
95". CONFIDENCE
2 :3 '5 E.
A_31..2
4
99% CONFIDENCE
CURVE' NO.1 2 3 4 5 6Ob.*
. 0.C WOO. 0
4
r ALTERISKS INDICATE PAIR:: OF CURVES RRE'THE(NOT S: "31 TO HAVE SIGNIFICAOTLY DIFFERENT INDICU!DETERMINATION.AT THE STATED CONFIDENCE LEVEL:).
=" 11 me.pl`I 8 and 9 show further output from the PTAFIT program. Theint with the coordinates of 37.5 and)0.75 appeared suspiciously dev-'
iAnt and an outlier test run on the residuals confirms this suspicion.,The outlier test was run using a Plan-Test Associates program based ona method reported by Chazal (1967). Thegraphical rand tabulaoutputf om this'program are shown in Tables 10 and 11, respectively.
In view of thii, PTAFIT was rerun omitting the outlier. Theresulting outputs are shown in Tables 11 through 17. We'now find thatequation 3 fits better than equation 6 (see Table 12), but still neither'is significantly better than equation'l (see. Table 15).
23
1 ...... t
it -C; -i. ..:. .J.) C. 1, '. f.....1 1...? (....., C.? i'' r': f ,) IN) ro roro ro r...' ro ro r....o rt) "..: :s,Cr, 0 r0 .= C.) '' 1 i -.-.1 Ln i_n LC ro -.) -.4 -.I -.4 -.4 ro -- .-..-. -4 -,i e_n ...n e_n i_n ro c-. c.
i . . c...-......-.....r.:_ '_n . 4 e _n e_n i_n e_r. '_fl .;_n e..1 r. til -: ,r fn io.,. f_n - .1 -,i_n Ln ..
1
-4 ,..; CO e_n ro .7.1i
,10 LnCT
. 11. ..-
ro
17, O-CCt 4. 9--j tfi 6.f.
'4) 1:' f: CT, er Cr 121 Crl 1_01 -: e_n ro roCr} Et t:r. -,1 rt: Cf:. .
)- .4) .- o) r.... ro rt. e_n .4) C7,r- 06 .. CC, CC, ro -: --
4b,
-'7 f0 '.:-, ' 1:1 *..:, '4:1 '.f.' 1.: '-'6 '''' (..11 1. ,...) 1.0 Cr, CC, CC ''' I
''''' 4- 4. 4-. .... Cr, C; c*, 17, -,..n io.-..--..-. .-..--..-. 46 -: C.:, ..f., CO C.: ' -,1 ---.1
4) -: -;.. 41. I-0 ..1:. ..1) ..1:1 -.L. - -2 (.0 ' 4. W 4: 011'...) rij Lr 0 -4 Itr-
I t 1 1 1 1 1 1 1 1 1111111 1 1
- CO Cr% (0 CO (0 r0 1'0 0.- (0 ro CO (0 4. 0) CN (0 ro -4- .7, co f..! CO -4 -A -U ck n) N
,c7.. . 0) . cr.. -4 .--. cr. N cr,
c.. ..1) 17, f_;t -4 CO it ,7 a J = n
Cr% - r..: CC4 CC, CT, CO o) .4) .4) ro 4 -A 4. C.) CO
' CO CO T., -.4 -4 -4 -4 o W 0 fj.) ej)
j '7.1,Y, 0) OD 0) .7% T. Cr. CIN _Tij I,,iv Ii W 0 '.1) ti, :0
cD n! (J! fJJ o LI o o (0 70 ri!e!) .4.) 0) 0) CO 0) .4 0 C. 0 .4 Cr.. Cr. (7% cr. Cr% CO tit 6-,
e_n
CNnj
.
4- nJ-1 o- ! -
(.! 6--
.1
RI ro LI
ro 4 -4- CO ':c CO cr.M M M1 1 i
ro rCi .4
.4) It CO 0) CO CO .)) CO =1 j J J Cr. Cr. Cr.. e_fl=. .76 =. - 4.. 4. 4. 4. Cr, Cr, =. 4 IX* :3
cr, 4. 4: 4: 4: -.J- - t_n io '_n CO 0) C0 .4) o o ro
.
S.
o 4: 4. 41 : 0: tO 0) 0: o) ro n) n) ro n) n) ro ro ro ro'n) n)cr. .21 ro c. co -A -A o o n) -..j -A 7A -J -..j N. .--. -A -J.._n ro c.
ro 4. 0 4. o Ln o n) Ln 4: o o o)o 0 5t
- - - -. . . . -. ...C. oi 0 0 cr, ,21 Ln 4. 4. 0: 4. e_fi
ro *- r0 r0 r0 r 0 - =. CT: s_n -U 510 10 r0 =. Cr: CC. =r r. 0 ro.J1 .21 - - 4. .- I) Ct. r.) .4) ro ro ro 0., ro 4. .21 '
Ln *L. q) - - rV r: 0 0) cr. 41, cr, 41 -A 0) -A 4. P- -A 0) CQ P: .J1 -1 4. -
. . . . -A -A -.1 CT: CT, o !_ri -U 4: 4. 4. 4: t_.1 r,.,
r.7 iv. 1:0 .X CO .=. .7. .7.. CU. .7. %T.. 57. 1) 1) 1,1 r,_, Is., - -
t 41 CT". 1.1 - - LI '4:1 ..1.1 -A -A --.1 4. ' ry 1) fl (1e_r1 r ro ._n CT, CT: Cr, 1) - 0) CO LIT Is
II.- cr. co '_n rV
cr. -A 0) q)re CT. -
- .1)
o) ro nJ ro ro ro n) ro nj 4. .- c7.. -A Cr. 41 41 ro ro ry -A 4. Clr,co ro o) co .4) cr. Clr. '
,.1:1 Cs) 4. co nj -A 0) Cr'. .21 Ln n) o -ACT. c .11 n) .-. ..Ln o o) -A CY. Li' 41 4. CT. 4. CO c. CT. c. .1) o o -J 4.4 Cr. 0 u, -A .10 .4) 4. -1 ro cr. o) ro 4. o .7% L.) ro T. -A
rn
rce
cnis -A -A -A -A CT. CT. 1, ii iT cr, 00000 4: 4: 4 4 4. 4. 4. 0 0 n, n, n,
cc, ro .172 ...I) t . ro _n 01 s_n c., Lri ,j1 ..L't 4. 4. ljnJ ry ro o) co 0) co OD n) -A -A -I -A -A *...J o) ro o ro nJo o .r, 4. ro -A -A -A -A -A -A -01.1 -A -A o nj
1,m1"1
.- 4. tt,t oj nj ro ro n) ro N.. r-T. "' '.1.:. CO sic, 0) 0) 0) 0) .-.7., Cr* o o o .21 .4. W f.o'ro ..
=1...1% .7, .=-.. -1 o o .- .- -- o) %.) oi ..o (.o V., Cf. 'is .:::. C. C. C. 2. C. f.:%, Cr. Cf. CT'. 4. 4. 0 0 0 4.CC, 4. ....C. Or. 0 ..o co o) 0)-0) Ln .t. 4.:. 4. 4.:. ro 4. 4. -A -.1 -J -A -4 -A CC, 0: .-. C. ,..., 0 ro .- .- .r.. ..0 .... .. .r. -4 oj r.o -A --,j'-A .i, C7. Cr. Cr'. Cr. Cr, ..- .7, ro ro ro ro ro ro .- .1-- o) 4.;., I.:1) 1:0 ,- P--' P-- P-t -4
, . r , i....). .. ., .
t. Or to woo
SIABLE 10
00.1ER TEST USING CHAZALIS METHOD
SYMBOL DEFINITION:
=
: =
=+ =
MEANGPANDLOUEPUPPER
OF .17rimpLt DATAMEAN OF DATAOUTLIEP LIMITOUTLIER LIMIT
-.36
k.EXCLUDING
-.1E 0.04 0.24 0.44. 0.64...+...,+
S 0.0+-A - +M - - *: +P - - :
-4:
L 4 : +E 5.0+ - : +
_: +
I - : +D - -
: +_
: +10.0+ - .
: +- : +- : +- _ +- _ +
15.0+ - -* +_
: +_ _ +_ _
: +- +
20 . o+ . _ - : +_ +_.
- ..- +.- _ +- _
: +25.0+ - +
- - +_
1).+
_ +
t- _ +
30.0+ : +_._ _ :* +
.
_v.
:
:
++
- - : +
INCFEMENTINCFEMENT
+ . + + + p + .4- a + I 4..36 .16. 0.04 ' 0 .24 .0.44
OF HOPIZONTAL :[ALE:OF VEPTICHL :CHLE:
2.00E-02.1.00
HO. OF POINT: PLOT1ED OH VEPTIf_AL :CALE: 34
26
.+ . . . . +0%64
=M en1,-.1 rin 73
-4 -4 1C,1C g- =- momm_
-- m0 m
0 0 1C1En ti P-4 2
.7. ..--. ..--, lic) alc) a a. c)ic) c) c)11 1 1 1 liallillic) c)lic) CD -,
r rn rn rn OOOOOOOOOOOOOOO OO OO rnc ._., _... .-- .-- a. .r... a. i o .-- ro ,-- b- CI b- b- 0- .7... 0- C. C..' ..- C- C. ..7.. Co C. Cot C. b- 07.7.. ..7... CP C.' ..- .7....' C. il
1 = 1 C . t 1 I . ! 0 b..' C O 4 . c . 1 7 ` . c , 4 . 1 4 c o 4 i ) 4 0- r C I '.J O r e ) 1 1 7 * C r . 0- ..V. ro CO 4 4 roi CA '4) CO ro r .-- Zm r r r ON cDILI1 ro eN ND a -J L0 -J 4) Pa L0,-J ON OD 0 -4 -4 b-- 4. no ND C5 ro ro 0 ON Ui 4: CO - 0 C5ti e= IC I
ICI U IGI
= 3 3 -3,
eri en cn....
-71IP 0 0 0 1
I =i t L_ 8--
0 1 1 1r r r---... 1141 IIO 1141 1111.1.111IIIIIIVIIIIIIIIIIIIIIIIIII.....0 rn M rrl . diem. 0. 1313 1r3 ro ro N no ro ro ro rea no ro r0 no r0 r0 no no no no no no r0 no ro no no r0 Pa r0 r0 no Pa ro no ro r-
0 0 0 Ca OCa Ca Ca ) Ca C) CO CO 0 Ca Ca CO Ca (a Ca 6) Ca CO 6) C Ca Ca CO CO Ca Ca CO Oa Ca C Ca C -4, _ . - e - J -4-J A -A A -J -4 - 4 -J -4 -4 -A -4 -4 -4 -A -A -A -.1 -4 -A -4 -A -4 -A -4 -A -4 -4 -A -J -A -J t-
o-.
0 II II rin
13
,
r-"-aZ
... .... .... ..... .... .... ....P .... q--. .. a...A 1 Co ...r ao .... ,.. a, M C. C. Cr C. Co M ao C. C. .=, I. 0ru ro ro rU ro ru no no ro Pa r0 ro no Pa no no no no ro no no no r0 PO no r0 PO ro no ria Pa no no PiCT: O.. es. (r. (r. O: ON LT: Os. 17. 0., sT. 87. ON cr: O: (r. ON O: 0. ON o. os. cr. IT. CIN ON O: a: o ON Cr: ON a:ru ro no ru ro ro no r0 r PO PO no Pa r0 r0 PO rea PO riO no Pa no Pa Pa no Pa PO PO Pa no ro ro ro Pa
BEST COPY AVAILABLE
TABLE 12
LEAST SQUARES CURVE FIT WITH OUTLIER REMOVED
CUPVE TYPE. OFDETEPHIMATIOH
A !TAGN1FICANCE90% 5% 9 %
1 YL:A+,1*.:, .9249 -3.081E-J: :.I LE 4
Y=A*::t1:.. .9418 2.,?44E-2 1.08t.
4 Y=.--A+,1-: 1.3755 yr.i..:A4TC-:, .71T1 3.5 -7.057T-26 36.71
NUMUR OF OF.:=EFNATIONC.: 33
...,... 4.
TABLE 13
nthC°11401/1Aiii.
EQUATION TYPE 3
Yg.A*(XtB)
LEAST SQUARES CURVE FIT WITH OUTLIER REMOVED
COEFFICIENT VALUE STANDARDDEVIATION
95 %CONFIDENCE LIMITS
I A: 2.244E-2 1.166 1.642E--2 3.067E-2
E: 1.086 4..847E-2 .9872 1.105
29
BEST COPY AVAILABLE
TABLE 14 /
EQUATION TYPE 1
Y=A+(3*X)
LEAST SQUARES CURVE FIT WITH OUTLIER REMOVED
COEFFICIENT VALUE :'.TANDAPD . 95CONFIDENCELIMIT
A: -3.01E-2 4.442E-2 -.1214 5.978E-2
E: 3.132E-2 1.603E-3 205E-2 3.459E-2
UTIMATED POPULATILN 'OTANDAPD DEVIATIONABOUT PEGFESSION LINE = .1041
c.
30
BEST COPY AVAILABLE
TABLE 15
COMPARISON OF INDICES OF DETERMINATION
WITH OUTLIER REMOVED
95 CONFIDENCE 9 CONFIDENCE
CURVE NO. . CURVE NO.1 2 .:. 4 5 l'-'. 1 2 3 4 5 6
3 4 ' 3
6 6
.1 1
,... 25 5 ....
,..,
4 '4
gTERISKS INDICATE PAIRS OF CUP.VES ARE THE 'SAME'.(NOT SHOWN TO HAVE SIGNIFICANTLY DIFFERENT INDICES OF'DETERMINATION AT THE STATED CONFIDENCE LEVELS).
31
c_n
i. 4-
4-
L.:.
C.:.
0fy
ft)
nj 1
".ry
-1
-.1
'4 -
1 -.
1-1
-1
rL.fl
4- L
t1Ln o LI o
0 4-
oo 0) o
ry ii
Cf:
.21
cr.
o Pi
rr
Pi
Ln
P-s.
..,-j
- T
.
-.7
. .7.
t_n
4- 4
-f_
nry
n.1
.-.1
7.=
g17
.I')
C.:,
cr.
4- -
I).c
..7
.I"
... .c
.-.
1 L.
)4-
.1.,
*7. C
.)CC.
C`.
4-
CT
. 4-
-.4
-1 4
--
CO
CC
,I"
,01
-1
4- -
-
.i. co 0) co ) o) cr. cr. o ...n o Lno n 01 ._n
4. 4
. 4. 4
. 0 n
j r.
nj-:
-A1'
0 -
.... C
..7
...7:
. CO
.....
......
'IQ...
: n:
-4 4- .i. 'ID 'ID
..L.) *1) .1) -- .--._ri o o o cr. co .)-.. -.
1
C.:.
1)I,.
.,-
J 4-
Cr.
.7. C
r. -
J.7
.....z
. c. c
f....
,i..
..., c
. ..4
,...t1
Lr.
Ls
Ln o
- 1
-.4
c.'4
7.,
*17.
,.c
. n.,
I",.,
c'7
..-
- ,
ry 4
. 4. v
t 0 0
7 .-
- ,..
n n
Ln o
... 1
) 4-
CT
. Cr.
.7.
Cr.
CY
. CT
. CO
'....
.*i
..-1
:U 4
- .-
-I r '7
1
II
II
I1
11
11
1111
III
I1
R.
-.j
r,.,
c_n
CC
, cr.
n:.1
) n:
0)Cr.
ocr.
..- .
..
pjLn
Ln
..
yj.
C.D
f_n cr. o)
co
-4 cn cr. o
h-1
r*:.,
CC
'C
O 4
- C
Y.
Ln17
' 4. P
.,f.T
. -1
P.,
-.1
C. -
4C
.1C
O R
.14 T
I4.
co
CO
C.:.
.7. C
C.
- 1
-..n
'7. -
1C
C,
-1 I.
.? 4
-11
P"
.1.)
-%1
CY
. Cr.
CT
. .7.
'7. 4
.f2
:1ro
4- -
1 -
1 -1
4. 4
.-
4-,1
)1)
4.)
Ln 01 01
n: 0
) co
0)
0) c
r.. -
-L.
: L.:
0ry
.1)
0) ,)
)
m P.1
6-4 0 r
CO
-.-
1--
1 C
Y. '
7'. C
Y.
.7..
1,1.
01 i_
n 4-
14
,1".
.'C
r'.-
11.r
.i.'
..,...
.)r.
..) n
: n: r
u -
.--
- -
.-.
.7.f:
t...
n.
0 i.0
.*.r
.er
. ...n
o 0
) ..z
. .r,
L.:
3er
. .--
n..
4- *
f.,C
r. C
O C
.:, C
C, .
c. 4
- 4-
4-
4- ..
.).1
7.C
r.4- l CO C0 -.1
-.1
.:.:.
....)
i....1
0)
:-..CO ,..0
%..:, 4- ,...,
..-..,
,.0 f..: 4-
.7. C
T. C
Y.
..7.
*i..
- 1
....
.r.,
*:,
'4.:,
'4.:.
.1..
*i.,
4- 4
- i..
n: n
: f.:
:I *-
-1 n
j-4 c...'
..
rn
1) CO CO CO CO
4. 4. 41 C.! C.: C.) CO 1.0 ry ry n2 n2 n2 n2 N N n2 n2 ro n2 n2 -4-1 -4 0 0 ro c.
R.ti s 121 Ul 1%1 t-.11 0 0 (.2 .11 .4) :7
0 ...A Cr. -A W co n2 C
-CO .4:.' Cr. CT. III i-r! fr: Cr. 1:11 ill !A 4. Crl C..' rr r-Q
1%. C.2 r,.. r.., r.) r,., g 4* Cr. rt ..:1 r.2 =...... ," rf: L- - - =. Cr. CO Cr. r.2 ry -A 4. CI CO- Cr. -.1 f-f! 1. - cr. n., ro ..7 4. Cr. CO -A 4. - 1 CO CO
:1 Cr. Cr. 01 Cl (21 f_fl 4. 4. 4. 4. r..,
-1 4. v., - ," =' C' r.) 11- - N CO CO 0 CO CO OD n., ron2 TJj7jt co f.2
cr. -4 f.) .1) f. - 1:. .1) ...C. 0 0 -1 .» cr. cr, cr, co r-
I !- N N - - P.:, - - - - Cr - 4*. f"... 1'0 - cq4. 4. Cr. 4* 4. ,C1 CT oz.
- r . . . . . . .1... .11 Cr. I:, ," Cr, CO Cr' IIn., 4. CO CO LT L-J C.., 0-. I-0 ,..f7 4. ry cc,
cn CT. 1
J
.4) .4) 02 co co co co -1 cr. Cr, 17 Cr. .7'. 0 4* 4* 4- 4- 4. 0 CO C.:, CO CONNN-O .! C. .1:0 'L. -.10.... ry n2 n2 ro ry -J 4. 4. 4. 4. 4. CO CO 6.) ro n2 CF..
-.1 CO 4. ro ro c. -4 -A - 4. CO '0 4. Cr. Cr. 4.n2 CO 0) Cr. CP. .7. 4+ CIN o) 0) o) 0) o) co .- co 0) co cr. cr, "r1
rn
t7o
N ro) co 1 --.1 1 -.1 -.1 -.1 CT, Cr ,0 CY! f_r! 0.1 4. O.:. ;_:.. I-0 1.4
-a-..
..1 !Y. ...1, N ,-r: - '1:. -A -.1 -.1 -A O'. 1... D .C. C. C. C. C. C. 0..! s 1..., ,". -A -.1 ,1:* ro rQ 4* r...,
kr, 01 CO 4. - .... 0 0 0 0 ...n -A n2 n2 ry ry n2 n2 n2 ro n2 . co oo o) .- cv ,...v co -.-.4
a
BESTSO?! AVM ABLE
The English Equation
. Our straight line equation, rounded to three decimal places,from Table 14 becomes
0 41*
H(text)E = .031X - .031 (2).
. Where: H(text)E= subjective textual information per text--English
. And: 'X = percentage of incorrect responses
It is interesting to note, however, that the -.031 intercept could wellbe zero for the population since its 95% confidenCe band spans thatvalue (see Table 14)4) This zero value is to be expected since thetheoretical information content should be zero for zero incorrectresponses. In other words, if a person does not make a response errorhe is predicting the teXt with certainty and, therefore, the text con-tains zero information for him. The English equation closely approxi-mates this. Also the slope could (at the 95Vlevel of confidence) spanfrom 0.028 to .0345.
The German Equation
The equation derived by Weltner for the German language ins:
H(text)G = .039X - .080 (3)
Where: H(text)G= subjective textual information per text--German
'And: X = percentage of incorrect responses
We have no measure of the confidence bands on the German data,but the agreement between the equations is quite remarkable. See thegraphs in Figure 2 (page 21..
One additional point in regard to the residuals is the extremeruns observed whiCh might suggest a somewhat better fit with higherdegree polynomials. This has not been pursued in this study.
34
It is ,nteresting to note that the equation can be put into a
more useful form. The derivation is as follows:
Since:
which can be represented by E/N 100
Where E = number of wrongly guessed signs
And N text length
Then
H(text)E = .031X - .031
X = percentage of incorrect responses
M H(text) = .031[E/N 100] - .031
H(text) = 3.1E/N 7 .031
NH(text) = 3.1E .031N
Setting NH(text) = I, where I = information in bits we have
1
I = 3.1E - .031N
(2)
(4)
With the equation in this form we can now directly evaluate the Infor-mation content of a text of length N for a specific subject. In view
of our earlier evaluation of the confidence on the .03,1 coefficientwhich may well be zero, it may be concluded that our equation is inde-pendent of the text length N such that equation (4) at the 95% confi-dence level becomes
35
(5)
Conclusion
The application of Shannon's guessing procedure permits us tomeasure the sub'ective information of a given text for a specitic
--learner. Un ike senseless texts, the subjective information of a.meaningful text can vary from learner to learner. The derivation ofequation (2), equation (4) and equation (5) in this study now allowsa direct evaluation of the "meaningfulness" of a specific English textfora specific learner.
Educational theory tells.us that the interactive events betweena.learner and his environment have a direct influence upon.his learning.SinCe these interactive events can now be specified in terms of infor-mation theory, the specification of information-theory-basedcriteriafor the selection and completion of education goals is poiSible.
These equations now permit us to measure information in terms of.a value that is dependent upon the internal state of the learner. Wehave always known that the internal state of the learner is directlyrelated to his ability to process information in a meaningful manner.(learning). We have also known that subject matter has an inherentdifficulty factor. The degree of this difficulty factor plus theinternal state of the learner are direct variables that influence the.effectiveness of an instructional system. The equations derived, inthis study, now provide a quantitative measure of these factors. This,in turn, permits us to specify attainable goals for both the instruc-tional system and the learner. In practice, this can provide us withthe means to specify precisely the instructions needed by the learners.
To the educator this means that he cannot only derive a quanti-tative measure of information contained in an instruction sequence foreach learner, but he can use this measure to evaluate the effectivenessof his instruction.
36
REFERENCES
Ash, R. Information theory. New York: Interscience, 1965.
Chazal, M.' How to discard invalid measurements. Chemical Engineering,February, 1967. Pp. 182-184.
Edwards, E.
Frank, H.
Information transmission. London: Chapman' s Hall, 1964.
Kybernetische grundlagen der padagogik. Baden-Baden: AgisVerlag, 1962.
Fuchs, W. Cybernetics for the modern mind. New York: Macmillan, 197J.
Guilbaud, G. What is cybernetics? London:. William Heineman, 1959.
Khinchin, A. Ir Mathematical foundations of information theory. NewYork: Dover, 1957.
'Minsky, M. 'Semantic information processing.1968.
Cambridge: MIT Press,
A.
Singh, J. Great ideas in information theory, language and cybernetics.New York: Dover, 19't.
Shannon, C. The mathematical theory of communications.Technical Journal, 1948, 27, 279-425/623=656.
Shannon, C. Prediction and entropy of printed English.Technical Journal, January, 1951.
Shannon, C., & Weaver, W. The mathematical theory of communication.Chicago: University of Illinois Press, 1969.
Weltner, K. Zum ratetest nach Shannon. In H. Schmidt (Ed.),Grundlagenstudien aus kybernetik und geisteswissenschaft.Band 6. Quickborn Bei Hamburg: Schnelle Verlag, 1965.
Weitner, K. Der Shannonsche rateteSt in der praxis der programmierteninstruktion. Lehrmaschinen in kybernetischer urid padagogischersicht 4. Stuttgart: Ernst Klett Verlag, 1966. Pp. 40-$4.
ti
Weitner, K. Zur bestimmung der subjektiven information durch rate-tests. Praxis und perspektiven des programmierten unterricts.Band II. Bevensee, Quickborn, Germany: Verlag Schnelle, 1967.Pp. 69-74.
Bell System
Bell System
37
Weltner, K. Informationstheorie and erziehungswissenschaft. Quickborn:Verlag Schnelle, 1970.
38
at