
R84209 ERIC REPORT RESUME

ED 010 202  1..RI.310...67  24 (REV)
THE EFFECTS ON ACHIEVEMENT TEST RESULTS OF VARYING CONDITIONS OF EXPERIMENTAL ATMOSPHERE, NOTICE OF TEST, TEST ADMINISTRATION, AND TEST SCORING.
GOODWIN, WILLIAM L., AND OTHERS
XY081702  UNIV. OF WIS., MADISON CAMPUS, RES. AND DEV. CTR.
CRP-2850, TR-2  1-65  DEC-5.40-454

EDRS PRICE MF-$0.09 HC-$2.24. 56P.

*ACHIEVEMENT, *ENVIRONMENTAL INFLUENCES, ELEMENTARY SCHOOL STUDENTS, *TESTING, TESTING PROGRAMS, *SCORING, ARITHMETIC, TEST RESULTS, *RESEARCH AND DEVELOPMENT CENTER, MADISON, WISCONSIN

NULL HYPOTHESES WERE TESTED TO DETERMINE THE DIFFERENTIAL EFFECTS OF (1) EXPERIMENTAL ATMOSPHERE AND ABSENCE OF SAME, (2) NOTICE OF TEST (10 SCHOOL DAYS) AND NO NOTICE (1 SCHOOL DAY), (3) TEACHER ADMINISTRATION AND OUTSIDE ADMINISTRATION OF TESTS, AND (4) TEACHER SCORING AND OUTSIDE SCORING OF TESTS. SIXTH-GRADE CLASSES (N=64), EACH FROM A DIFFERENT SCHOOL IN A LARGE MIDWESTERN CITY, WERE RANKED AND GROUPED INTO FOUR STRATA ON THE BASIS OF ARITHMETIC ACHIEVEMENT. WITHIN EACH STRATUM, CLASSES WERE RANDOMLY ASSIGNED TO 1 OF 16 EXPERIMENTAL TREATMENTS GENERATED BY A FACTORIAL DESIGN USING 4 INDEPENDENT VARIABLES. WRITTEN INSTRUCTIONS WERE USED. THE RESULTING CLASS MEANS FOR EACH OF 3 SUBTESTS IN THE EXAMINATION WERE SUBJECTED TO A 4 X 2⁴ ANALYSIS OF VARIANCE. THE ERROR TERM WAS COMPOSED BY POOLING SELECTED HIGHER-ORDER INTERACTIONS. THE FOLLOWING CONCLUSIONS WERE REACHED: (1) ADVANCE NOTICE OF TEST DATE HAS A SIGNIFICANT EFFECT UPON PUPIL TEST PERFORMANCE WHEN TESTS INCLUDE NOVEL CONCEPTS WHICH ARE EASILY TAUGHT, (2) TEACHER ADMINISTRATION OF STANDARDIZED TESTS HAS A SIGNIFICANT EFFECT ON PUPIL TEST PERFORMANCE AS COMPARED WITH TEST ADMINISTRATION BY OUTSIDE PERSONNEL, (3) EXPERIMENTAL ATMOSPHERE COMBINED WITH NOTICE OF TESTING RESULTS IN SIGNIFICANTLY HIGHER PUPIL TEST PERFORMANCE WHEN TESTS INCLUDE NOVEL CONCEPTS WHICH ARE EASILY TAUGHT, (4) NO NOTICE OF TEST COMBINED WITH OUTSIDE SCORING RESULTS IN SIGNIFICANTLY LOWER PUPIL TEST PERFORMANCE, AND (5) OUTSIDE SCORERS PRODUCED HIGHER GRADE PLACEMENTS THAN TEACHER SCORERS IN HIGH-ACHIEVING CLASSES RATHER THAN IN LOW-ACHIEVING CLASSES, AND VICE VERSA.


U. S. DEPARTMENT OF HEALTH, EDUCATION AND WELFARE
Office of Education

This document has been reproduced exactly as received from the person or organization originating it. Points of view or opinions stated do not necessarily represent official Office of Education position or policy.

Technical Report No. 2

THE EFFECTS ON ACHIEVEMENT TEST RESULTS

OF VARYING CONDITIONS OF

EXPERIMENTAL ATMOSPHERE, NOTICE OF TEST,

TEST ADMINISTRATION, AND TEST SCORING

By William L. Goodwin

under the direction of

Julian C. Stanley

Research and Development Center

for Learning and Re-Education

University of Wisconsin

Madison, Wisconsin

1965

The research and development reported herein was performed pursuant to a contract with the United States Office of Education, Department of Health, Education, and Welfare under the provisions of the Cooperative Research Program.


POLICY BOARD OF THE CENTER

Max R. Goodson, Professor of Educational Policy Studies
Co-Director, Administration

Herbert J. Klausmeier, Professor of Educational Psychology
Co-Director, Research

Lee S. Dreyfus, Professor of Speech and Radio-TV Education
Coordinator of Television Activities

John Guy Fowlkes, Professor of Educational Administration
Advisor on Local School Relationships

Chester W. Harris, Professor of Educational Psychology

Burton W. Kreitlow, Professor of Agricultural and Extension Education
Coordinator of Adult Re-Education Activities

Julian C. Stanley, Professor of Educational Psychology
(on leave September 1, 1965 to August 31, 1966)

Lindley Stiles, Dean of the School of Education
Advisor on Policy

Henry Van Engen, Professor of Mathematics and Curriculum & Instruction
Associate Director, Development

ACKNOWLEDGMENTS

The writer wishes to express his appreciation to Drs. Julian C. Stanley, Herbert J. Klausmeier, Philip Lambert, James M. Lipham, and Richard A. Rossmiller for their assistance and counsel on this undertaking. Drs. Stanley, Klausmeier, and Lambert were especially generous with their time in reading and criticizing earlier drafts of the manuscript.


PREFACE

Most of the research of the Center is conducted in the schools. Published and locally produced tests are used in collecting data. How should data collection be handled? Does it make any difference whether teachers (a) know they are participating in an experiment, (b) receive advance notice of the test date, or (c) score the test? Are the results the same whether teachers or "outsiders" administer the test? These are questions of high significance to educational researchers. Through an excellent design developed by William Goodwin and unstinting cooperation by a large midwest school system, answers to the questions were sought and obtained. The researcher will wish to examine the design and procedures as carefully as the results.

Herbert J. Klausmeier
Co-Director for Research


TABLE OF CONTENTS

List of Tables
List of Figures
Abstract

I.   Introduction to the Problem
II.  Review of the Literature . . . 3
       Experimental Atmosphere . . . 3
       Notice of Testing . . . 5
       Administration of the Test . . . 7
       Scoring of the Test . . . 8
       Relationship Between Literature Reviewed and This Experiment . . . 8
III. Method . . . 10
       Experimental Setting and Subjects . . . 10
       Sampling Procedures . . . 12
       Experimental Design . . . 12
         Independent Variables . . . 12
         Dependent Variables . . . 13
         Analysis of Data . . . 15
       Procedures . . . 15
         Time Schedule . . . 15
         Outside Test Administrators . . . 16
         Conduct of Testing and Scoring in the Experiment . . . 16
IV.  Results
       Determination of Final Error Term
       Analyses of Variance
         Arithmetic Computations
         Arithmetic Concepts
         Arithmetic Applications
         Average Arithmetic Score
V.   Discussion and Conclusions
       Experimental Atmosphere
       Notice of Testing
       Administration of Test
       Scoring of the Test
       Previous Arithmetic Achievement


       Significant Interactions and Interactions of Interest
       General Observations on the Experiment
       Conclusions . . . 34

Appendix A: Experimental Treatments: Instructions to Teachers . . . 36

Bibliography . . . 43


LIST OF TABLES

Table

 1  Errors Made in Scoring Standardized Tests . . . 8
 2  Average Total Arithmetic Grade Placements on Iowa Test of Basic Skills and Non-Verbal IQs on Lorge-Thorndike Intelligence Test by Experimental Unit, Treatment, and Stratum; Testing Conducted in October, 1964 . . . 14
 3  Average Total Arithmetic Grade Placements on Iowa Test of Basic Skills and Non-Verbal IQs on Lorge-Thorndike Intelligence Test by Independent Variable; Testing Conducted in October, 1964 . . . 14
 4  Degrees of Freedom and Expectations of Mean Squares for Analysis of Variance . . . 16
 5  Mean Squares and F-Ratios of Selected Three-Factor Interactions on Stanford Arithmetic Achievement Test . . . 18
 6  Average Computation Grade Placements on Stanford Arithmetic Achievement Test and Number of Pupils by Experimental Unit, Treatment, and Stratum; Testing Conducted in April, 1965 . . . 19
 7  Mean Squares and F-Ratios for Analysis of Variance of Computation Grade Placements on Stanford Arithmetic Achievement Test . . . 20
 8  Average Computation Grade Placements on Stanford Arithmetic Achievement Test by Experimental Atmosphere and Notice of Testing . . . 20
 9  Average Concepts Grade Placements on Stanford Arithmetic Achievement Test by Experimental Unit, Treatment, and Stratum; Testing Conducted in April, 1965 . . . 21
10  Mean Squares and F-Ratios for Analysis of Variance of Concepts Grade Placements on Stanford Arithmetic Achievement Test . . . 21
11  Average Applications Grade Placements on Stanford Arithmetic Achievement Test by Experimental Unit, Treatment, and Stratum; Testing Conducted in April, 1965 . . . 22


12  Mean Squares and F-Ratios for Analysis of Variance of Applications Grade Placements on Stanford Arithmetic Achievement Test . . . 23
13  Average Applications Grade Placements on Stanford Arithmetic Achievement Test by Experimental Atmosphere and Notice of Testing . . . 23
14  Average Applications Grade Placements on Stanford Arithmetic Achievement Test by Notice of Testing and Test Scorer . . . 23
15  Average Applications Grade Placements on Stanford Arithmetic Achievement Test by Test Scorer and Previous Arithmetic Achievement (Stratum) . . . 24
16  Average Grade Placements on Stanford Arithmetic Achievement Test by Experimental Unit, Treatment, and Stratum; Testing Conducted in April, 1965 . . . 24
17  Mean Squares and F-Ratios for Analysis of Variance of Average Grade Placements on Stanford Arithmetic Achievement Test . . . 25
18  Average Grade Placements on Stanford Arithmetic Achievement Test by Experimental Atmosphere and Notice of Testing . . . 25
19  Average Computation, Concepts, Applications, and Total Grade Placements on the Stanford Arithmetic Achievement Test by Independent Variable; Testing Conducted in April, 1965 . . . 26
20  Percent Error Rates of Teacher-Scorers by Independent Variable and Previous Arithmetic Achievement (Stratum) . . . 26
21  Average Grade Placements on Stanford Arithmetic Achievement Test by Experimental Atmosphere x Test Notice Treatment Combination and Sub-test . . . 30
22  Average Grade Placements on Stanford Arithmetic Achievement Test by Test Notice x Test Scorer Treatment Combination and Sub-test . . . 30
23  Average Grade Placements on Stanford Arithmetic Achievement Test by Test Scorer x Previous Arithmetic Achievement (Stratum) Combination and Sub-test . . . 31
24  Average Grade Placements on Stanford Arithmetic Achievement Test by Test Administrator x Test Scorer x Previous Arithmetic Achievement (Stratum) Combination and Sub-test . . . 32


LIST OF FIGURES

Figure

1  Average composite grade placements on sixth-grade Iowa Test of Basic Skills for 112 elementary schools in a large midwestern city; Testing conducted in October, 1964 . . . 10
2  Average total arithmetic grade placements on sixth-grade Iowa Test of Basic Skills for 112 elementary schools in a large midwestern city; Testing conducted in October, 1964 . . . 10
3  Average non-verbal IQ on Lorge-Thorndike Intelligence Test for 112 elementary schools in a large midwestern city; Testing conducted in October, 1964 (sixth grade)
4  Average total arithmetic grade placements on sixth-grade Iowa Test of Basic Skills for 87 classes in a large midwestern city; Testing conducted in October, 1964 . . . 11
5  Average non-verbal IQ on Lorge-Thorndike Intelligence Test for 87 sixth-grade classes in a large midwestern city; Testing conducted in October, 1964 . . . 12
6  Graph of test scorer by previous arithmetic achievement interaction on applications sub-test . . . 32
7  Graph of test administrator by test scorer by previous arithmetic achievement interaction on applications sub-test . . . 33



ABSTRACT

In classroom experimentation, the instruments used for data collection are often commercial or specially-designed tests. The researcher is faced with selecting that means of administering the tests to maximize the validity and generalizability of any conclusions reached. However, the complex interaction of many variables in the classroom often eludes the experimenter, and the resulting uncontrolled variation frequently causes a finding of "no significant difference" or of spurious significance.

Four null hypotheses were tested to determine the differential effects of:
1. Experimental atmosphere and absence of same;
2. Notice of test (10 school days) and no notice (one school day);
3. Teacher administration and outside administration of the test; and
4. Teacher scoring and outside scoring of the test.

The experimental unit was the classroom. Sixty-four sixth-grade classes, each from a different school in a large midwestern city, were ranked and grouped into four strata on the basis of previous arithmetic achievement. Within each stratum, classes were randomly assigned to one of the 16 experimental treatments generated by a 2⁴ factorial design using the four independent variables listed above, in connection with a recent arithmetic achievement test as a response measure.
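For readers who want to see the assignment scheme concretely, the following is a minimal sketch, not the Center's actual procedure or records, of how 64 classes could be assigned, one per cell, to the 16 cells of the 2⁴ design within each of four strata. The class identifiers, variable names, and random seed are illustrative assumptions only.

    # Illustrative sketch only: one class per 2x2x2x2 treatment cell within
    # each of four achievement strata (4 strata x 16 treatments = 64 classes).
    # All identifiers here are hypothetical, not taken from the report.
    import itertools
    import random

    FACTORS = ["experimental_atmosphere", "test_notice",
               "teacher_administered", "teacher_scored"]
    TREATMENTS = list(itertools.product([True, False], repeat=len(FACTORS)))  # 2^4 = 16

    def assign_within_strata(strata, seed=0):
        """strata maps a stratum label to the 16 class ids ranked into it."""
        rng = random.Random(seed)
        assignment = {}
        for label, classes in strata.items():
            shuffled = list(classes)
            rng.shuffle(shuffled)  # random order of classes within the stratum
            for class_id, levels in zip(shuffled, TREATMENTS):
                assignment[class_id] = dict(zip(FACTORS, levels), stratum=label)
        return assignment

    # Example with made-up class labels.
    strata = {s: [f"class_{s}_{i:02d}" for i in range(16)] for s in range(1, 5)}
    plan = assign_within_strata(strata)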

Experimental atmosphere was created using written instructions, and test notice was given by mail. The outside test administrators and scorers were graduate and undergraduate students. Resulting class means of the 64 classes for each of the three sub-tests in the exam were subjected to a 4 x 2⁴ analysis of variance. The error term was composed by pooling selected higher-order interactions.
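As a companion sketch, and only under the assumption that the 64 class means sit in a pandas DataFrame with the column names shown below (the names are mine, not the report's), the analysis of variance could be run as follows. For simplicity the residual here pools everything above the two-factor interactions, which approximates but is not identical to the report's pooling of selected higher-order interactions.

    # Sketch of a 4 x 2^4 analysis of variance on class means, with all main
    # effects and two-factor interactions fitted and the remaining higher-order
    # interactions left in the residual (error) term.  Column names are assumed.
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    def arithmetic_anova(class_means: pd.DataFrame) -> pd.DataFrame:
        formula = ("grade_placement ~ (C(stratum) + C(atmosphere) + C(notice)"
                   " + C(administrator) + C(scorer)) ** 2")
        fit = smf.ols(formula, data=class_means).fit()
        return sm.stats.anova_lm(fit, typ=2)  # sums of squares and F-ratios per effect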

Tests of the main effects revealed significantly higher class means on one of the three sub-tests for those classes receiving 10 school days' notice of the upcoming test, and significantly higher class means on all three sub-tests for those classes whose regular teacher administered the test. Several two-factor interactions were significant, most notably the combination of experimental atmosphere and notice of testing producing higher grade placements than the combination of no experimental atmosphere and no notice.

The conclusions reached were:
1. Advance notice of test date has a significant facilitating effect on pupil test performance if the test includes novel concepts easily taught.
2. Teacher administration of standardized tests has a significant facilitating effect on pupil test performance as compared with administration of the tests by outside personnel.
3. Experimental atmosphere combined with notice of testing results in significantly higher pupil test performance if the test includes novel concepts easily taught.
4. No notice of testing combined with outside scoring results in significantly lower pupil test performance.
5. Outside scorers produced higher grade placements than teacher-scorers in high achieving classes, while teacher-scorers produced higher grade placements than outside scorers in low achieving classes.



I

INTRODUCTION TO THE PROBLEM

The importance of controlled experimentation in school settings is gradually gaining acceptance among educators. In classroom experimentation, the instruments used for data collection are often commercial or specially-designed tests. The researcher is faced with selecting that means of administering the tests to maximize the validity and generalizability of any conclusions reached. However, the complex interaction of many variables in the classroom often eludes the experimenter, and the resulting uncontrolled variation frequently causes a finding of "no significant difference" or of spurious significance.

One of the most critical variables in classroom experimentation is the teacher. In addition to his primary task of administering the experimental treatment, the teacher is often asked to collect the response data which will be used to evaluate the experimental treatments. The practice of letting classroom teachers administer tests in a research project is fairly widespread because it is convenient and inexpensive. Bringing in "outsiders" to do the testing is more costly and might cause teacher resentment. Some persons insist that pupils will not perform up to capacity unless their regular teacher gives the test. In a research project, however, the objective is not to elicit the best possible performance from each individual pupil, aided by his classroom teacher. Instead the emphasis is upon the necessity of testing each class under identical conditions, insofar as possible.

Most researchers conducting studies within the classroom use the teacher to administer treatments or to assist in varying ways; in effect, the teacher is cast in the role of a sub-experimenter. Many conjectures have been made as to the effect of sub-experimenters on experimental results, but little systematic observation or measurement of such effects has occurred. Of special concern is the effect that "being in an experiment" has on these sub-experimenters.

Specifically, the problem to be researched has four facets: What is the effect of varying conditions of experimental atmosphere, notice of testing, test administrator (teacher or "outsider"), and test scorer (teacher or "outsider") upon student performance as measured by test results? Expressed in the null form, the four hypotheses to be tested are as follows (a compact symbolic restatement appears after the list):

1. There is no significant difference in test performance between pupils whose teachers believe an experiment is in progress and pupils whose teachers do not so believe.

2. There is no significant difference in test performance between pupils whose teachers receive notice of the test date and pupils whose teachers do not receive notice.

3. There is no significant difference in test performance between pupils whose regular teacher administers the test and pupils who are tested by an "outside" administrator.

4. There is no significant difference in test performance between pupils whose regular teachers score the test and pupils whose teachers do not.
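One compact way to restate these four hypotheses, with mu denoting the expected class-mean test performance under the condition named in the subscript (the symbols are an editorial gloss, not the report's notation):

    H_{0,1}:\ \mu_{\text{teacher believes experiment}} = \mu_{\text{teacher does not}}
    H_{0,2}:\ \mu_{\text{notice of test date}} = \mu_{\text{no notice}}
    H_{0,3}:\ \mu_{\text{teacher-administered}} = \mu_{\text{outside-administered}}
    H_{0,4}:\ \mu_{\text{teacher-scored}} = \mu_{\text{outside-scored}}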

Also of interest will be the testing of the two-factor interactions and the interpretation of those which are significant. Ten two-factor interactions are generated by the five factors in the study: four independent variables as implied in the hypotheses above and a single leveling variable, previous arithmetic achievement. It would be laborious and somewhat redundant to list the null hypotheses related to the 10 two-factor interactions.
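The count of ten follows directly from the number of ways to choose a pair from the five factors:

    \binom{5}{2} = \frac{5 \times 4}{2} = 10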

The importance of the problem to educational psychology is readily apparent. Generalizability of the problem would provide guidelines to be followed by educational psychologists in order to enhance the meaningfulness and effect of their classroom experimentation. The hypotheses to be tested are of such a nature that some persons might infer that the honesty of the classroom teacher is being questioned and investigated. This is not the case. No deliberate or conscious effort on the teacher's part to unethically aid his pupils is being suggested. Any teacher who is psychologically healthy knows and likes his students, and he sincerely and naturally desires that they perform and achieve well. This desire of the teacher can produce unconscious motivations and even acts that significantly assist the pupils in their classroom endeavors, such as taking tests.

No less salient than consideration of the teacher variable is another warning for the experimenter: in his zeal to avoid sources of bias, he must carefully go about the recruiting of experimental assistants. If the experimenter is going to use "outsiders" to administer tests at the conclusion of an experiment, what assurances has he that this is not a biased group? Has the experimenter seen or talked to them individually or as a group? Has he discussed the experiment with any of them? Have any of them read previous studies by the researcher, studies from which one could accurately infer the variables currently being investigated?

It is seldom possible to deal with all the significant problems germane to a specified area in a single study; certain problems are not investigated by this experiment. One significant question not considered is: what is the difference in test performance between pupils who have the test administered by research personnel (outsiders) who believe that an experiment is in progress and pupils whose outside test administrators do not so believe? The question was not brought under investigation in this study because an attempt to answer it concurrently might have jeopardized the test of the first hypothesis (in that it was deemed necessary to give the outside administrator information regarding experimental atmosphere that in no way contradicted the information given the teacher whose pupils the outsider was testing).

Another significant question is: what is the difference in scoring performance between scorers who believe that the data was gathered in an important experiment and scorers who do not so believe? This question is an important one and is obviously related to the area investigated by the four hypotheses above. By a process of "double scoring" the tests that were scored by outsiders, it was possible to answer this question as it pertains to amount of time spent scoring, errors committed, and average scores tabulated. This aspect of the investigation, because of its tangential nature, is not included in this technical report, but is available in another source (Goodwin, 1965).


II

REVIEW OF THE LITERATURE

The literature related to this problem is not definitive. Although many educators have spoken of various aspects of the problem, those seeing fit to publish their beliefs are few indeed. The literature will be considered in four parcels (corresponding to the four independent variables under investigation).

EXPERIMENTAL ATMOSPHERE

The literature available on this subject can be divided into the effect of participating in an experiment (1) on subjects and (2) on experimenters themselves.

The motivational influence of being subjects in an experiment has been dubbed the "Hawthorne Effect." At Western Electric's Hawthorne plant in Chicago, a series of research investigations was carried out in the late 1920's and early 1930's. The attention given to the workers as experimental subjects was evaluated as one variable causing high production on the part of the employees, regardless of the varying work conditions established (Mayo, 1945; Roethlisberger, 1941; and Roethlisberger and Dickson, 1941).

Interest in, and casual references to, the Hawthorne Effect have been in evidence for several decades. Recently the subject has been taken under investigation in a U.S. Office of Education Cooperative Research Project at Ohio State University. The director of the project has written on the relationships between the Hawthorne Effect and research in education (Cook, 1962). Note the definition formulated:

The Hawthorne effect is a phenomenon characterized by an awareness on the part of the subjects of special treatment created by artificial experimental conditions. This awareness becomes confounded with the independent variable under study, with a subsequent facilitating effect on the dependent variable, thus leading to ambiguous results (Cook, 1962, p. 118).


Cook writes of how the Hawthorne Effect has plagued and confounded educational research.

The bias of experimental subjects has been alluded to in an article by Orne (1962). In a pilot project, Orne attempted to design tasks which subjects would refuse to do, or would tire of quickly; the tasks developed were noxious, boring, meaningless, and/or ridiculous. The most usual result was a heroic perseverance on the part of the subject. Post-experiment questionnaires indicated that the subjects ascribed meaning to their performance, visualizing it as a test of endurance or the like.

Orne marveled at the willing, almost cheerful compliance of the experimental subject. The subject, in Orne's estimation, is concerned with two sets of variables in an experimental situation. First, there are the variables established by the instructions or experimental task itself. The second set of variables is labeled "demand characteristics" by Orne. This concept envisions the subject as taking it upon himself to "figure out" the hypotheses being tested in the experiment, and he more or less actively seeks cues to achieve this end. In Orne's words, ". . . the totality of cues which convey an experimental hypothesis to the subject become significant determinants of subjects' behavior" (Orne, 1962, p. 779). Possible cues are rumors about the research, the information conveyed when the subject is asked to participate, the person of the experimenter, the laboratory setting, and implicit and explicit communications during the experiment (Sarason and Minard, 1963). Obviously, the sophistication, intelligence, and experience of subjects vary and these, in turn, will partly determine the demand characteristics of a given experimental situation. For our purpose here, suffice it to say that the subject is alert and susceptible to bias from many possible sources; indeed, the phenomenon is a possible and plausible explanation of why many investigators attempting to replicate earlier experiments are unable to do so.


This entire area of the effect on subjects of participating in experiments has been treated systematically in relation to experimental designs in education. Under the heading of reactivity, Campbell and Stanley (1963) discuss the effects of certain experimental arrangements "which would preclude generalization about the effect of the experimental variable upon persons being exposed to it in nonexperimental settings." These authors feel that the proper or natural arrangement of experimental conditions will result in the subject's being unaware that an experiment is in progress.

The effect of the experiment upon the experimenter himself, rather than upon the subject, is considered extensively in a recent journal article (Kintz et al., 1965) and in the work of Rosenthal. In addition to a review of the major articles in the area (1963), Rosenthal has attacked the problem systematically on many fronts in his own research. In a number of interesting discussions and experiments, Rosenthal and his associates have shown that experimenter bias must be considered a variable of importance, even in experiments involving the performance of albino rats and planaria (Rosenthal and Fode, 1961; Rosenthal and Hales, 1962). In the experiment involving rats, it is pointed out that the experimenter need not be obvious in his attitude toward a subject's performance in order to influence, and therefore bias, the subject's actions. In other words, the experimenter can influence the outcome without slamming a rat into his home cage after a poor run or giving another a pat or two for a good performance. Rather, the experimenter's attitude may be mediated to the subject much more subtly via changes in the experimenter's temperature, skin moisture, etc., as he watches what he considers a good or poor performance by the animal. The comparative sophistication and intelligence of human subjects would suggest a proclivity on their part toward active analysis of any attitudes subtly implied by the experimenter through words, gestures, etc. In one crucial experiment, it was shown that research assistants easily can be affected by the bias of their employer (Rosenthal et al., 1963).

In his summary article (1963), Rosenthal concludes that "experimenter outcome-orientation bias" is both a fairly general and a fairly robust phenomenon. In an interesting passage, he states:

But perhaps the most compelling and the most general conclusion to be drawn is that human beings can engage in highly effective and influential unprogrammed and unintended communication with one another. The subtlety of this communication is such that casual observation of human dyads is unlikely to reveal the nature of this communication process. Sound motion pictures may provide the necessary opportunity for more leisurely, intensive, and repeated study of subtle, influential communication processes. We have obtained sound motion picture records of 28 experimenters each interacting with several subjects. . . . In these films, all Es read identical words to their Ss so that the burden of communication falls on the gestures, expressions, and intonations which accompany the highly programmed aspects of Es' inputs into the E-S interaction (Rosenthal, 1963, p. 279).

Recent reports on the analysis of subsequently obtained motion pictures have generated many interesting hypotheses in this regard (Rosenthal, 1965).

McGuigan (1963) has looked at the experimenter as an additional stimulus object. He divided multi-experimenter experiments into three classes. In Class I, different experimenters do not differentially affect the results. In Class II experiments, an experimenter varies from others but always in a consistent direction; for example, E1 obtains higher scores for all groups than E2. In the third class of experiments, the characteristics of a particular experimenter interact with treatment conditions. Whereas results of the first two classes are generalizable, the results in Class III experiments are not. McGuigan suggests that research reports include specification of varying results obtained by different experimenters. In this way he hopes to control the experimenter variable or at least increase the knowledge about the effects of experimenters on their subjects, and he discusses the essential ideas behind generalizing to a population of experimenters.

The teacher occupies a unique position in most educational research. Seldom is the teacher the experimenter, yet he often has the task of administering the experimental treatment. Thus, in educational research, added to the problem of experimenter bias is the potential bias displayed by the classroom teacher whose students make up the experimental population. Very often the teacher applies or administers the treatment; in a sense, he is a co- or sub-experimenter. At other times, the teacher is not aware that his class is in an experiment. The question can be asked: do the pupils of a teacher perform differentially


depending on whether the teacher does or does not know that his class is in an experiment? If the pupils do, the researcher must carefully weigh the advantages and disadvantages of informing the teachers that an experiment is in progress.

There is little if any recent research on the possible effects of using teachers as sub-experimenters. On the other hand, early experimenters and writers warned against the possible contaminating influence of such a practice. Consider the works of two of the first writers in the field as well as an early illustrative educational experiment.

McCall (1923) felt that each teacher consciously or unconsciously revealed to his pupils the experimental treatment that he preferred when his class was involved in an experiment. The students then reacted either favorably or unfavorably toward the teacher's preferred treatment, depending on their personal like or dislike of the teacher. McCall recommended that the best way to avoid bias was to keep those administering the treatments ignorant of the objectives of the experiment.

Another early educator wrote in much the same manner. Brooks, a superintendent of schools discussed later because of his nearly complete reliance on standardized tests for measuring teacher merit, considered the problem of whether or not to inform the teacher of the experiment.

It is an open question whether or not the teachers themselves should be informed of the main purpose in view, that is, the purpose of comparing the efficiency of the two methods. If we could be perfectly sure that both teachers would be thoroughly interested and honest about the experiment it would undoubtedly be wise to seek their intelligent cooperation, since by so doing we should be more likely to get the best possible results from both methods. But if, thinking their reputations are at stake, one or both are likely to be tempted to stretch the time limit for daily drill or to persuade the pupils to drill themselves for speed and accuracy outside of class, then it will probably be better to leave them in blissful ignorance of the main plot, merely seeing to it that each teacher devotes the same amount of time to class drill in the fundamentals each day. In this way one can infer what each of the methods would accomplish under everyday working conditions in the hands of equally competent teachers (Brooks, 1921a, p. 340).


An early experiment in education provides a good example of teacher bias ("Student," 1931). In the Lanarkshire milk experiment, 20,000 pupils served as subjects in an attempt to determine the value of adding milk to a child's diet. In each of 67 schools, 200 to 400 pupils were "randomly" divided into experimental and control groups, randomness purportedly achieved by balloting or by an alphabetical means. However, the design allowed the teachers to inspect the two resulting groups and to substitute subjects when it appeared that the "random" procedure had given an undue proportion of well-fed or ill-nourished children to one group or the other. The teachers were biased in that, given this choice, they tended to place the smaller and under-nourished children in the experimental group that was to receive milk. Figures available showed the milk group to be noticeably shorter and lighter at the beginning of the experiment.

Other examples of teacher bias of a similar nature are included in other sections of this review. In this entire review, however, a critical fact to note is that every instance of teacher bias is an early educational experiment. The professional literature of the past 30 years is essentially devoid of any mention of the subject. Teachers of today, as compared with those of the 1920-1930 era, are better educated and more professional in many respects. The results of early studies might not be capable of replication today because of basic changes in the characteristics of the American teacher.

NOTICE OF TESTING

Somewhat less extensive is the literature available on the effect on test performance of notifying teachers (and thereby their pupils) of the particular day on which a test will be given. Several articles appear on coaching for tests.

Most articles on the coachability of tests have been written regarding intelligence tests; these are briefly reviewed in an article by French and Dear (1959). In their own particular interest, French and Dear were concerned with the effect of coaching for the College Board's Scholastic Aptitude Test (SAT). Although statistically significant differences were found when coaching for the SAT occurred, the differences were small enough that they were of little practical significance. An associated investigation showed that even when items identical to the test items were scattered among the practice items, no substantial gain


resulted unless the items were identified as actual test items during the coaching. French and Dear concluded, for the SAT at least, that a candidate would be wise not to pay for specialized coaching but rather review and read on his own.

In early experiments on coaching, Gilmore (1927) found that both the experimental (or coached) group and the control group made substantial gains on the test when it was readministered after a 12-week interval. In the same year, another researcher (DeWeerdt, 1927) found that his coached group gained more on the Illinois examination than did the control group. However, the superiority was confined to the analysis, synonym-antonym, and sentence-vocabulary sections of the test, and was not apparent in the verbal ingenuity section or three arithmetic-related sections.

In an experiment more closely related to the present one because it did not involve coaching, one group of junior high school students received two days' notice of an upcoming unit test in science while the other group received no notice (Tyler and Chalmers, 1943). The difference in the average score favored the "warned" group but was less than two percentage points greater than that of the control group and fell short of statistical significance. Obviously, in this experiment, the forewarned pupils had fairly accurate perceptions as to the questions that might be asked on the test.

Turning from experiments in the area of coaching for tests, consider articles touching on other ideas pertinent to this study. The implications of the discussion to follow overlap with the next two variables, administration and scoring of tests. As a starting point, it is readily apparent that a possible means of evaluating a teacher's effectiveness is the gain in his pupils' proficiency. Indeed, rather elaborate discussions have focused upon pupil gain as an accessible and potent measure of teacher merit (Bolton, 1945; Ryans, 1949).

Other authors have written of the pitfalls and dangers of such a procedure. Douglas (1935) quite early considered evaluating teachers by testing their pupils to be subject to many pitfalls, chief among them being the inordinate emphasis on those course objectives easily amenable to testing. A more recent and detailed objection to such a practice has been voiced by Thorndike and Hagen (1955). They label the practice as questionable at best and quite possibly vicious, citing several considerations overlooked by such a method: that the achievement of a class group is a function of

more than the present year's effort; that an achievement test battery measures only a fraction of the objectives of a modern school; that teachers will become demoralized by such a mechanical evaluatory procedure; and that teachers will tend, with more or less directness, to concentrate on the skills so tested, thereby "teaching for the tests."

The last mentioned consideration could be a crucial one. A rather penetrating article is quoted at some length to highlight some of the ramifications of this issue.

Is the test ever taught to the pupils? It may seem undignified even to suggest that such an unprofessional practice might be carried on. But in certain school systems the practice is carried on by certain teachers, and those who are trying to interpret tests should be aware of this possibility. Of course, any time a group of learners is taught a test the use of norms accompanying the test becomes meaningless. Unfortunately some administrators and supervisors have unwittingly encouraged this practice, partially through a procedure suggested by the next question.

Are teachers given raises or promotions on the basis of pupil test results? Some superintendents, principals, and supervisors casting about for an objective basis for giving promotions in rank or salary increases have settled on the idea of giving these rewards to those who can produce the best test results. The goal is admirable but this particular method has resulted in many unprofessional practices and should be eliminated in any place it exists (Simpson, 1947, p. 63).

Thus, it can be seen that some defensive teachers might, given notice of an upcoming test, actively teach the test or otherwise go to great lengths to prepare their pupils for the test. Indeed, even a teacher who is not defensive might be expected to teach the test if his status and livelihood depend upon his pupils' performances. One writer supports the use of fall testing programs as a remedy to reduce the likelihood of teachers teaching for specific tests (Findley, 1945).

In the literature, one early example was found that accentuates an unparalleled and almost unbelievable emphasis on evaluating teachers by testing their pupils. In three separate but related writings, Brooks (1921a, 1921b, and 1922) detailed the techniques he used as superintendent of schools to evaluate

"MMIMtlMINill11121,

Page 17: OF lent - ERIC · r84209 eric report resume. ed 010 202. 1..ri.310...67. 24 (rev) the effects on achievement test results of varying conditions of experimental atmospherev.notice

his teachers. He rejected classroom observations as it was too easy for a teacher to prepare and do well on a single day only, or to pull out a beautiful lesson prepared some time ago to be used during an unannounced visitation. Instead he had the teachers administer standardized tests to their pupils. To add teeth to his system, he continued:

The teachers were further warned that, although I had no reason to distrust anybody, the matter was too important to permit taking any chances. Accordingly, I proposed to check the work of each teacher by giving one or two of the tests in her school after she had given all of them. By comparing the results of my tests with theirs of the same kind, I could readily detect any gross carelessness or intentional dishonesty on the part of the teachers. There is considerable temptation for some short-sighted teachers who know their own efficiency is being measured by these tests, to stretch the time limit or to give illegitimate aid to the pupils, or even to drill on the test itself, in the effort to make their classes show up well (Brooks, 1922, pp. 28-29).

Somehow Brooks got the teachers to approve this system, and he made pupil subject-matter progress the core element in a rating plan. Quite confidently he noted that the teachers were to be paid bonuses for any annual increase in their pupils' achievement as measured by standardized tests and that he was certain that most of the teachers were working hard for a bonus.

Certainly this type of inordinate emphasis on the results of standardized tests by administrators could have undesirable effects on the teachers when they were informed of an upcoming test.

ADMINISTRATION OF THE TEST

In considering the differential effects possible when a teacher administers a test to his pupils or when it is administered by an "outsider" (who might even be another teacher), it is soon apparent that the question has seldom been raised in the literature. Several reasons could be advanced as to why pupils would not score better on standardized tests with different test administrators.

Traxler (1951) suggested that some teachers, when testing their own pupils, might be so anxious for their students to do well that they offer


indirect suggestions that help them obtain higher scores. This situation is well illustrated in an early spelling experiment (Rice, 1897). The researcher was amazed at the tremendous scores in spelling achieved by 33,000 pupils, so he visited some 200 teachers:

Long before I had reached the end of my journey my fondest hopes had fled; for I had learned from many sources that the unusually favorable results in certain classrooms did not represent the natural conditions, but were due to the peculiar manner in which the examination had been conducted. . . . An unfortunate feature of the first test was the fact that in many of the words careful enunciation would give the clue to the spelling. . . . Under these circumstances, even the most conscientious teachers could not fail, unwittingly, to give their pupils some assistance, if their enunciation were habitually slow and distinct; while in those instances in which my test had been looked upon as an opportunity for an educational display, in which the imperfections of childhood were not to be shown, the teachers had been afforded the means of giving their pupils sufficient help through exaggerated enunciations alone, to raise the class average materially (Rice, 1897, p. 165).

Rice gave and supervised a second exam to those who had done so well on the first and, on the average, the scores were reduced by one-fourth.

A similar example is reported by Lowell (1919). A test required the pupil to find all the words that rhyme with "day, mill, and spring." One teacher felt that her children were not responding as they should, so she said, "Why, children, you know how we have been finding words to go in the 'ing' family, so I don't see why you can't find others like 'day' to go in the 'ay' family." The children immediately began to write their responses, but the teacher had obviously given them an unstandardized hint.

Hopkins and Lefever (1964) recently investigated the comparability of test scores when the test was administered by the teacher and by television. Not aware that they were in an experiment, fifth- and sixth-grade teachers in a random half of the district's 20 elementary schools gave the Metropolitan Science Test, using "conventional teacher administration." In the other 10 schools, fifth and sixth graders were given the test via television, with a single


administrator using the standardized directions. A statistically significant difference was found for the fifth grade, favoring the teacher-administered group, but no difference was found for grade six. However, the finding was considered of small practical significance, being less than a month in grade-equivalent units.

SCORING OF THE TEST

The final variable deals with the differences one might find, depending on whether a standardized test was scored by the classroom teacher or by an "outsider."

It is well established that persons scoring tests quite often make numerous errors. Pintner (1926) prepared an answer sheet for the National Intelligence Test that incorporated many common and also unusual errors made on the test. Then he had a number of graduate students score answer sheets marked identically; most of the graduate students had had experience as teachers, supervisors, or school psychologists. The range of raw scores given to the same answer sheet was wide; in mental age, the extremes ranged from seven to eleven years.

More recently, Phillips and Weathers (1958) examined errors made by 27 third-grade and 24 fifth-grade teachers in scoring 5,017 achievement tests. The tests were rescored several times by staff members, and 1,404 (28%) of the tests were found to have scoring errors, with most teachers making between 10 and 40 errors per 100 tests. Inaccurate counting was the primary cause of errors (44.8%), followed by inappropriate use of instructions (26.1%), inappropriate use of the scoring key (14.9%), errors in using the conversion tables (13.5%), and computational errors (.7%). More interesting was the essentially normal distribution of errors around the "accurate" or correct test score. Note the findings summarized in Table 1.

Hulten (1925) examined the grades given to pupils by three junior high school teachers, each teaching a different subject. He found that each teacher was giving higher marks to pupils from her own homeroom. Hulten concluded that these teachers unconsciously favored the pupils from their own homerooms in their particular subjects. It has even been demonstrated that knowledge of authorship (that is, knowing who wrote an exam) has an elevating effect on marks awarded by graders (Edmiston, 1939).

TABLE 1

Errors Made in Scoring Standardized Tests

Difference between Corrected and      Number of Errors Affecting Grade Equivalent
Uncorrected Grade Equivalent               Raising             Lowering

.1 to .5                                     508                 562
.6 to 1.0                                     66                  81
1.1 to 1.5                                    36                  28
1.6 to 2.0                                     9                   7
2.1 to 2.5                                     8                   6
2.6 to 3.0                                     6                   5
Over 3.0                                       6                   4

Total number of errors*                      639                 693
Smallest change                              .10                 .10
Median change                                .31                 .29
Largest change                              3.80                3.50

*72 errors did not change grade equivalents.
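As a quick arithmetic check on the table as reconstructed here (a sketch only; the figures are transcribed from Table 1, not recomputed from Phillips and Weathers' data), the per-row counts do sum to the stated totals of 639 raising and 693 lowering errors:

    # Check that the error counts in Table 1 add up to the printed totals.
    rows = {                 # (raising, lowering) errors by size of change
        ".1 to .5":   (508, 562),
        ".6 to 1.0":  (66, 81),
        "1.1 to 1.5": (36, 28),
        "1.6 to 2.0": (9, 7),
        "2.1 to 2.5": (8, 6),
        "2.6 to 3.0": (6, 5),
        "Over 3.0":   (6, 4),
    }
    raising = sum(r for r, _ in rows.values())
    lowering = sum(l for _, l in rows.values())
    assert (raising, lowering) == (639, 693)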

RELATIONSHIP BETWEEN LITERATURE REVIEWED AND THIS EXPERIMENT

Briefly, what is the relationship between each of the areas of literature reviewed and the study herein reported? Considering the presence or absence of experimental atmosphere, an educational researcher must decide whether or not to inform teachers (and probably, by so doing, their pupils) that an experiment is in progress. If he decides not to inform teachers, the work of Orne suggests that some teachers will perceive that they are in an experiment and proceed to speculate about the variables and hypotheses under investigation. On the other hand, if he informs the teachers that an experiment is being conducted, he may well quite subtly bias them in directions favorable to his particular hypotheses, as Rosenthal suggests. More critical, the teachers and their classes, being informed of the experiment, may perform unnaturally well because of the Hawthorne Effect, thereby jeopardizing the external validity or generalizability of the results. The dilemma is a real one. In this study, the differential effect of conditions of experimental atmosphere and absence of the same will be considered. The fact that the possible bias of outside administrators (due to experimental atmosphere) is not investigated in no way should belittle the importance of such a consideration. As noted in Chapter I, this question was not investigated in this study due to procedural and methodological limitations.


Early educational experiments suggest that test notice may be a crucial variable in many research investigations, but recent literature is notably silent in this regard. It is liable to be of more significance in systems that use pupil progress on standardized achievement tests as an indication of teacher merit. A defensive teacher could use notice of the test to concentrate his instruction in the area to be tested. However, an experimenter has almost complete control over this variable. Regardless of whether or not the teachers know an experiment is under way, the researcher, working with the school administrators, can schedule the testing so that any desired period of notice can be achieved. It may be, however, that notice of the test is not this potent a variable. The differential effect of 10 school days' and one school day's notice will be investigated.

The final variables, test administration and scoring, are investigated for practical considerations. Allowing the teacher to give and score tests at the end of an experiment is a feasible course of action. It is convenient and inexpensive, and allows the pupils to react "naturally" and optimally to the testing situation, because their regular teacher is the test administrator. The point often overlooked, however, is that the teachers may be biased in favor of their own students to varying degrees. Bias is a natural and desirable phenomenon in some cases, but a research project is not one of these. The existence of teacher bias results in increased uncontrolled variation in the experiment. No less important, although not investigated in this study, is the potential bias of outside test administrators. They might feel that their results should confirm the experimenter's hypotheses or that their skill as test administrators will be determined by how well the pupils that they test do.

From the Hopkins and Lefever study, one would expect no differences of any practical size between pupil performance on teacher-administered and outside-administered tests (recall that in that study, teachers did not know that an experiment was in progress). Likewise, also in a non-experimental setting, the work of Phillips and Weathers suggests that errors made by the teachers in scoring their pupils' tests would raise grades as often as lower them. No literature is directly pertinent to the question of the differential effects of scoring by teachers and by outsiders in the case at hand, namely because the scoring is a routine procedure involving a scoring key, multiple-choice responses, and no judgments by the scorer. The differential effects of teacher and outside administration of the test, and of teacher and outside scoring of the test, are investigated in this study.


III

METHOD

In this chapter, consideration is given to the experimental setting and subjects, the sampling procedure, the experimental design, and the procedures used in the study. Inserted at critical points will be the rationale for the particular decisions that had to be made during the experiment. Several figures and tables are presented to clarify and visually demonstrate key elements of the text.

EXPERIMENTAL SETTING AND SUBJECTS

The subjects used were second-semester sixth graders in a large midwestern city. The achievement level of the city's sixth graders is somewhat above but generally comparable to that of the nation as a whole. This can be seen in Figure 1, in which are graphed the average Composite Scores by school on the sixth-grade Iowa Test of Basic Skills battery given in October, 1964.

Figure 1. Average composite grade placements on sixth-grade Iowa Test of Basic Skills for 112 elementary schools in a large midwestern city; Testing conducted in October, 1964.

Figure 2 is a histogram for the arithmetic subtest contained in the Iowa Test of Basic Skills battery. Apparently the system is closer to the national average (or closer to 6.2, as the testing was conducted in October) in Total Arithmetic score than it is on the composite battery score.

Figure 2. Average total arithmetic grade placements on sixth-grade Iowa Test of Basic Skills for 112 elementary schools in a large midwestern city; Testing conducted in October, 1964.

Figure 3 contains the schools' mean Non-Verbal IQs, as measured by the Lorge-Thorndike Intelligence Test, and is evidence of a moderate similarity between this system and the national average. Non-verbal IQs are considered herein rather than verbal or composite IQs because arithmetic test scores are used as dependent variables.

As will be explained below, the sampling unit was not individuals but rather classrooms. The school system promoted pupils semi-annually.


Figure 3. Average non-verbal IQ on Lorge-Thorndike Intelligence Test for 112 elementary schools in a large midwestern city; Testing conducted in October, 1964 (sixth grade).

Therefore, the grouping procedures in the individual schools varied considerably, and it was not unusual to group first- and second-semester sixth graders in the same classroom. In a few cases, the second-semester sixth graders were found in classrooms with first-semester seventh graders.

Given this pupil classification scheme, it was decided to invite the participation of only those elementary schools having one or more classes containing at least 15 second-semester sixth graders. One hundred four schools met this criterion. The 104 school principals were contacted by mail and asked to participate. It should be noted that the school system involved had an established policy outlining the procedures to be followed in gaining access to the schools to conduct an experiment. The unusual nature of the experiment dictated that classroom teachers not be informed of the study until a later time; in other words, contrary to customary practice, the building principal could not discuss the project with the teacher to ascertain the latter's willingness to participate. The officials of the system altered the established procedures to allow the building principals to conditionally approve participation in the experiment. An elaborate plan was instituted whereby an alternate class could be substituted for any randomly selected class whose teacher declined to participate once informed of the project (i.e., when the experimental treatment commenced). As it turned out, it was not necessary to use any of the alternate classes for this purpose.

All building principals responded to the letter. Seventeen of the 104 declined to participate. The main reason given for non-participation was current involvement in other research studies. The 17 schools did not possess any common characteristics (e.g., small size, low achievement, etc.) that would suggest other factors motivating their unwillingness to participate.

In 40 of the 87 schools, there were two classes containing 15 or more second-semester sixth graders, while one school contained three such classes. In each of these 41 schools, one class was selected by using a table of random numbers. Thus a pool of 87 classes was established, each containing at least 15 sixth graders in their second semester and each residing in a different school. It was necessary to include only one class per school because of the reactive or contaminating nature of the experimental treatments.

In Figure 4, the distribution of the 87 classes in average Total Arithmetic scores on the Iowa Test of Basic Skills (given in October, 1964) can be seen. The shape of this distribution is quite similar to that in Figure 2, supporting the inference made above that the schools refusing to participate comprised no single category (such as high achieving or low achieving).

Figure 4. Average total arithmetic grade placements on sixth-grade Iowa Test of Basic Skills for 87 classes in a large midwestern city; Testing conducted in October, 1964.



Quite obviously one factor reducing the similarity between the two distributions (Figures 2 and 4) is the change in unit considered, from school to class, for many schools contained more than one sixth-grade class. The figure also serves to re-illustrate the close approximation to the national average; 34 classes lie below the expected mean category (6.6.3, with its 19 classes) while another 34 lie above it.

The average Non-Verbal IQs of the 87 classes forming the pool of experimental units are graphed in Figure 5, reinforcing the implications made above that the system is not appreciably unlike the national average in this respect and that the 87 schools agreeing to participate were representative of the entire system (see Figure 3). In addition to the nearness of this system to national norms on achievement and intelligence test scores, the system is also representative in that it contains schools located in a wide range of socioeconomic neighborhoods.

Figure 5. Average non-verbal IQ on Lorge-Thorndike Intelligence Test for 87 sixth-grade classes in a large midwestern city; Testing conducted in October, 1964.

SAMPLING PROCEDURES

To reduce random variability in the design, a stratified random sampling procedure was used. The 87 classes were placed in four strata on the basis of previous arithmetic achievement (as evidenced in the October, 1964, Iowa Test of Basic Skills Total Arithmetic scores). In arriving at a class average to determine strata, only the Total Arithmetic scores of the pupils currently in the second semester of sixth grade were used (students who had been in the first semester of sixth grade in October, 1964). Likewise, in reaching conclusions, only the scores of these pupils were analyzed, although all the pupils in each class were tested in order to achieve realism and credibility in all experimental treatments.

The number of levels was established at four, rather than five, three, or two. Only 87 classes were available; with the intention to run all 16 experimental treatments in each stratum, the upper limit for the number of strata was five. However, this would have left only seven classes as alternates, a perilously small number considering the administrative provision allowing teachers to refuse to participate once the treatments commenced. Using four, three, or two strata would obviously leave sufficient alternate classes. Although the use of only two strata would allow a within-cell error term, as two classes from the same stratum could be randomly assigned to the same experimental treatment, the precision of the experiment was obviously enhanced by using four strata rather than three or two.

Thus, the 87 classes were ranked on the basis of previous arithmetic achievement and subsequently grouped by fourths, from highest to lowest. Within each of the four resulting strata, classes were assigned to the 16 experimental treatments by use of a table of random numbers. Previous arithmetic achievement, therefore, was used as a leveling variable and was included in subsequent statistical analyses.
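The following sketch is not from the report; it merely illustrates, in Python, the ranking, quartile grouping, and within-stratum random assignment just described. The data structure and the function name are assumptions made for the illustration, and a pseudo-random shuffle stands in for the table of random numbers.

    import random

    def stratify_and_assign(classes, n_strata=4, n_treatments=16, seed=None):
        """classes: list of (class_id, prior_arithmetic_grade_placement) tuples."""
        rng = random.Random(seed)
        ranked = sorted(classes, key=lambda c: c[1], reverse=True)  # highest achievement first
        per_stratum = len(ranked) // n_strata                       # 87 classes -> about 21 per stratum
        assignment = {}
        for s in range(n_strata):
            stratum = list(ranked[s * per_stratum:(s + 1) * per_stratum])
            rng.shuffle(stratum)              # stands in for the table of random numbers
            # the first 16 shuffled classes fill treatments 1-16; the remainder serve as alternates
            for treatment, (class_id, _) in enumerate(stratum[:n_treatments], start=1):
                assignment[class_id] = (s + 1, treatment)
        return assignment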

EXPERIMENTAL DESIGN

Independent Variables

The 16 experimental treatments were the combinations generated by a 2⁴ factorial design using the following independent variables, in connection with a recent arithmetic achievement test as a response measure (the 16 sign combinations are enumerated in the sketch following the list):

1. Experimental atmosphere (+) and absence of the same (-);

2. Notice of test date (+) and no notice (-);

3. Testing by regular teacher (+) and testing by "outsider" (-); and

4. Scoring by regular teacher (+) and scoring by "outsider" (-).
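As a small illustration (not from the report), the 16 sign combinations can be generated directly; the ordering below happens to match the treatment numbering used in Tables 2 and 6 (treatment 1 is + + + +, treatment 16 is - - - -). The factor names are merely descriptive labels.

    from itertools import product

    FACTORS = ("experimental atmosphere", "notice of test date",
               "teacher administration", "teacher scoring")

    # Treatment 1 = (+, +, +, +), ..., Treatment 16 = (-, -, -, -)
    treatments = list(product("+-", repeat=len(FACTORS)))
    for number, signs in enumerate(treatments, start=1):
        print(number, dict(zip(FACTORS, signs)))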

Treatment under variable one was effected by letter from the office of the school system's director of research. Experimental units (i.e., classes) under the experimental-atmosphere condition were informed by mail 14 days before the testing date that they were in an experiment. Units not under the experimental-atmosphere condition were told, when notified of the test date, that they were randomly selected to collect normative data for a new standardized test.

"Notice of test dateuwas effected by mail 14days prior to test date (April 5, 1965). Underthe "no notice" condition, teachers were notinformed of the testing until the Friday pre-ceding the Monday test date.

The test-administrator variable was introduced by sending copies of the test to the teacher administrators whenever their class received notice of the testing. Outside administrators (graduate and undergraduate students) were given the test packets by their college instructors. All administrators (teacher and outside) also received detailed written instructions on how to prepare for administering the test; neither group was contacted personally by the experimenter or any assistant to discuss proper test procedures.

The test-scoring variable was accomplished by leaving the exams with half the teachers for scoring by them within four days. The other tests were collected and scored by four outsiders whose orientation in regard to experimental atmosphere was identical to that of the teachers whose exams were scored (two of the scorers believed that the tests resulted from an experiment while the other two believed that normative data were being collected). All tests were subsequently rescored to determine their accuracy, and accurate or correct scores were used in the analysis. Two considerations prompted the analysis of correct or accurate scores, rather than analyzing scores sometimes inaccurate due to scoring errors. First, the tendency to make scoring errors was not of primary concern in this study; it was assumed that errors would be random, as no scorer, teacher or outsider, would deliberately record an erroneous score (errors were later found to be random for both groups of scorers). Second, use of accurate scores reduced the error variance due to individual variations in carefulness of scoring. It is possible that a more appropriate title for this variable might have been the "contemplation of the scoring of the test."

The 16 experimental treatments were primarily implemented through the use of written instructions, as can be inferred from the discussion above. The written instructions sent out to teachers to introduce the experimental conditions are reproduced in Appendix A. The identical nature of corresponding paragraphs can be noted (for example, the + teacher administration paragraphs in treatments 1, 2, 5, 6, 9, 10, 13, and 14). The instruction sheets for treatments 9 and 13 (and each of the subsequent three pairs: 10 and 14; 11 and 15; and 12 and 16) are identical; however, teachers in treatments 9 through 12 received the instructions on March 22, 1965, while those in treatments 13 through 16 (no-notice treatments) received them on April 2, 1965.

The 64 classes randomly selected to participate were taught by 38 male and 26 female teachers. Table 2 gives a summary of the results of the random assignment of classes to experimental treatments. The close approximation of the 64 experimental classes to the national average should be noted, with an average grade placement of 6.164 on the Total Arithmetic score for the Iowa Test of Basic Skills and an average Non-Verbal IQ of 101.09 on the Lorge-Thorndike Intelligence Test. The average Total Arithmetic scores for the four strata are disparate, producing differences between strata of .334, .348, and .461 grade placement units. The average Total Arithmetic scores for the four classes in each of the 16 treatments ranged from 6.047 to 6.250, while the Non-Verbal IQs varied from 98.72 to 104.92. Table 3 depicts the results of the random sampling procedures insofar as the independent variables are concerned.

Dependent Variables

Four dependent variables were investigated: grade placements in arithmetic computations, concepts, applications, and a total or average score. The first three quantities are yielded by the Stanford Arithmetic Achievement Test, Intermediate II, while the fourth is an average of the first three. This test was copyrighted in 1964 and was wholly unfamiliar to the teachers in the selected school system.

An arithmetic test rather than a test in some other subject was initially preferred because of the objective scoring possible. That is, it was assumed that more consensus of opinion would exist on the "correctness" of answers on arithmetic problems among persons (teachers and outsiders) hand-scoring the tests. However, it soon became apparent that few of today's standardized achievement tests permit



TABLE 2

Average Total Arithmetic Grade Placements on Iowa Test of Basic Skills and Non-Verbal IQs on Lorge-Thorndike Intelligence Test by Experimental Unit, Treatment, and Stratum; Testing Conducted in October, 1964

Treat-  Exp.    Test    Teacher  Teacher   Stratum 1       Stratum 2       Stratum 3       Stratum 4       Average
ment    Atmos.  Notice  Adm.     Scored    ITBS   LTNV     ITBS   LTNV     ITBS   LTNV     ITBS   LTNV     ITBS   LTNV
 1      +       +       +        +         6.723  102.03   6.387  101.57   6.108  103.03   5.758   95.33   6.244  100.49
 2      +       +       +        -         6.971  116.23   6.347  103.25   5.927   97.46   5.753   93.47   6.250  102.60
 3      +       +       -        +         6.700  108.79   6.445   98.61   6.105  101.40   5.427   89.07   6.169   99.47
 4      +       +       -        -         6.619  108.50   6.394  109.22   6.166  104.77   5.604   90.21   6.196  103.18
 5      +       -       +        +         6.463  102.05   6.453  110.83   5.939  102.06   5.842  100.45   6.174  103.85
 6      +       -       +        -         6.740  107.26   6.282  104.55   5.862   99.54   5.305   89.14   6.047  100.12
 7      +       -       -        +         6.463  105.78   6.429  105.10   6.064   98.08   5.433   87.07   6.097   99.01
 8      +       -       -        -         6.815  109.12   6.411  105.93   5.889   90.61   5.522   91.74   6.159   99.35
 9      -       +       +        +         6.579  105.38   6.184  102.40   6.083  104.50   5.808   91.20   6.164  100.87
10      -       +       +        -         6.958  112.03   6.330  107.10   5.879   97.37   5.391   89.13   6.140  101.41
11      -       +       -        +         6.717  104.61   6.433  104.94   5.950   97.35   5.623   90.81   6.131   99.43
12      -       +       -        -         6.797  108.53   6.386  100.94   6.177  101.94   5.281   83.46   6.160   98.72
13      -       -       +        +         6.600  110.44   6.329  103.17   6.168  106.12   5.843   99.95   6.235  104.92
14      -       -       +        -         6.754  112.18   6.358  107.46   5.959   96.21   5.771   97.82   6.211  103.42
15      -       -       -        +         6.745  103.55   6.316  106.03   6.000   95.38   5.474   94.30   6.134   99.82
16      -       -       -        -         6.622  105.44   6.442  108.46   6.071  102.39   5.146   86.92   6.070  100.80
Average                                    6.704  107.62   6.370  104.97   6.022   99.89   5.561   91.88   6.164  101.09

TABLE 3

Average Total Arithmetic Grade Placements on Iowa Test of Basic Skills, and Non-Verbal IQs on Lorge-Thorndike Intelligence Test, by Independent Variable; Testing Conducted in October, 1964

Independent Variable           Average Total Arithmetic (ITBS)   Average Non-Verbal IQ (LT)
Experimental Atmosphere   +    6.167                             101.01
                          -    6.162                             101.17
Notice of Testing         +    6.188                             100.77
                          -    6.141                             101.41
Teacher Administration    +    6.183                             102.21
                          -    6.146                              99.97
Teacher Scoring           +    6.175                             100.98
                          -    6.154                             101.20
Average                        6.164                             101.09

a student to leave an answer in his own handwriting. Rather, the student computes his answer and then selects and marks a response from a list of alternatives. This being the case, to increase generalizability, a test of this type was selected. The tests were hand-scored using a scoring key, but little opportunity existed for the scorer to make a judgment about the appropriateness of an answer. Since arithmetic tests had been more thoroughly researched by this experimenter and since it was desired to minimally disrupt normal school routine (by giving a test in a single subject area rather than an entire test battery), it was decided to still use the single test in arithmetic achievement.

The final step involved determining the appropriate level of the test to use to best measure the ability range existing in the selected school system. The Intermediate I level of the Stanford Arithmetic series was too easy, having been designed for use from grades 4.0 to 5.4. The Intermediate II level was designed for grades 5.5 to 6.9. As the test would be given to "6.8" pupils, the level seemingly would be suitable. The question remained, however, whether the test might prove too easy, with the scores loading on the upper portion of the distribution. Therefore, the test was given to two sixth-grade classes in Wauwatosa, Wisconsin, in early March, 1965. These classes had also taken the Iowa Test of Basic Skills in October, 1964, and achieved at a level (approximately 7.0) matched only by the uppermost schools in stratum one in the selected school system. Therefore, it was assumed that the performance of these two classes on the Intermediate II test would be an excellent indication of any tendency for the scores to pile up on the high end of the distribution. The two schools averaged, in grade placements, 7.9 on arithmetic computations, 7.6 on arithmetic concepts, and 8.5 on arithmetic applications, for an average arithmetic score of 8.0. Although this was somewhat above the expected mean scores, given the October, 1964, scores on the Iowa Test of Basic Skills, no students got all the problems correct in any sub-test and the scores did not stack up on the high end of the scale. The test was judged appropriate for the experiment.

As a result of the trial testing, two instructions were added to reduce variability in test administration:

1. If students ask whether they should guess, say: "Do the best you can;"

2. If students ask about time limits, say: "Work at a rapid pace; you'll probably have time to try all the problems."

The complete administration instructions are given in the main report (Goodwin, 1965).

Analysis of Data

Resulting class means on each of the four dependent variables were subjected to a 4 × 2⁴ analysis of variance. The four main effects, and the two- and three-factor interactions generated by them, were tested using an appropriate error term. In addition, the effect of the blocking variable (previous arithmetic achievement) was tested, as well as the first-order interactions of it with the four independent variables.

The error term initially was composed of all four-factor interactions and the single five-factor interaction (df = 16); these higher-order interactions were assumed to be estimates of σ². It was decided to use this error term to test the remaining three-factor interactions (those involving the stratifying or blocking variable) before pooling further. A procedure discussed by Green and Tukey (1960) was selected to determine which three-factor interactions could legitimately be pooled with the initial error term. By this procedure, the sums of squares and degrees of freedom of statistically non-significant interactions could be included in the final error term. On the other hand, a significant interaction, or even one approaching significance, certainly could not be assumed to be zero or small. Therefore, it could not be considered an estimate of σ² and should not be pooled. Thus, in the experiment, the three-factor interactions that included the leveling variable were tested using the initial error term. Any interaction whose F-ratio exceeded twice the 50 per cent point of the F-distribution with the corresponding degrees of freedom was not pooled. The non-significant interactions were assumed to be estimates of σ² and were pooled to form the final error term. Table 4, showing the degrees of freedom and expectations of mean squares, clarifies the analysis.
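The pooling rule can be stated compactly. The sketch below is not from the report; it assumes SciPy is available and simply encodes the decision just described (an interaction is pooled only if its F-ratio falls below twice the median of the appropriate F-distribution).

    from scipy.stats import f

    def poolable(f_ratio, df_effect, df_error):
        """Green-Tukey style rule: pool the interaction into the error term
        only if its F-ratio is less than twice the 50 per cent point of F."""
        critical = 2.0 * f.ppf(0.5, df_effect, df_error)
        return f_ratio < critical

    # With 3 and 16 degrees of freedom the 50 per cent point is about .82, so the
    # critical value is about 1.65 (cf. the .824 and 1.648 quoted in Chapter IV).
    print(poolable(1.41, 3, 16))   # True  -> pooled
    print(poolable(1.85, 3, 16))   # False -> retained, as with A X S X P in Table 5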

PROCEDURES

Time Schedule

A time schedule is presented at this point to clarify chronological relationships between those topics previously discussed and those about to be presented. At the same time, it will serve to emphasize the important procedural steps followed:

January 18, 1965:                Meeting with administrative officials to determine feasibility of study.

January 19, 1965 -
February 22, 1965:               Discussions with administrative officials on procedural policies.

February 22, 1965:               Conditional approval received from the administrative officials to conduct the study.

March 11, 1965:                  Preliminary tryout of standardized test in two sixth-grade classes in Wauwatosa, Wisconsin.

March 20, 1965:                  Experimental materials and instructions mailed to principals of teachers in treatments 1 through 12 (+ experimental atmosphere and/or + notice of testing).

March 29, 1965:                  Testing materials sent to 32 graduate and advanced undergraduate students (outside testers).

April 2, 1965:                   Experimental materials and instructions delivered to principals of teachers in treatments 9 through 16 (- notice of testing).

April 5, 1965, A.M.:             Test given in all classes.

April 5, 1965, P.M.:             Tests collected from 32 schools (- teacher scoring).

April 9, 1965:                   Tests collected from 32 schools (+ teacher scoring).

April 10, 1965, on:              Analysis of data.



TABLE 4

Degrees of Freedom and Expectations of Mean Squares for Analysis of Variance

Source                        df   E(MS)               Source                      df   E(MS)
Experimental Atmosphere (E)    1   σ² + 4·2³σ²(E)      Previous Achievement (P)     3   σ² + 2⁴σ²(P)
Notice of Test (N)             1   σ² + 4·2³σ²(N)      EP                           3   σ² + 2³σ²(EP)
Test Administrator (A)         1   σ² + 4·2³σ²(A)      NP                           3   σ² + 2³σ²(NP)
Test Scorer (S)                1   σ² + 4·2³σ²(S)      AP                           3   σ² + 2³σ²(AP)
EN                             1   σ² + 4·2²σ²(EN)     SP                           3   σ² + 2³σ²(SP)
EA                             1   σ² + 4·2²σ²(EA)     ENP                          3   σ² + 2²σ²(ENP)
ES                             1   σ² + 4·2²σ²(ES)     EAP                          3   σ² + 2²σ²(EAP)
NA                             1   σ² + 4·2²σ²(NA)     ESP                          3   σ² + 2²σ²(ESP)
NS                             1   σ² + 4·2²σ²(NS)     NAP                          3   σ² + 2²σ²(NAP)
AS                             1   σ² + 4·2²σ²(AS)     NSP                          3   σ² + 2²σ²(NSP)
ENA                            1   σ² + 4·2σ²(ENA)     ASP                          3   σ² + 2²σ²(ASP)
ENS                            1   σ² + 4·2σ²(ENS)     Error                       16   σ²
EAS                            1   σ² + 4·2σ²(EAS)
NAS                            1   σ² + 4·2σ²(NAS)     Total df                    63

Outside Test Administrators

The outside test administrators, many of them aged 30 or 40, were college students enrolled in advanced measurement courses who volunteered to do the testing for a reasonable compensation. The testers were randomly assigned to classes. A week before the test date, they were given packets containing the test manual, enough tests for the class, and instruction sheets. The instructions were explicit and caused the tester to believe either that an experiment was or was not in progress, according to the experimental treatment the particular class was receiving. Thus, outside administrators testing classes under treatments 3, 4, 7, and 8 believed that an experiment was being conducted. On the other hand, those testing classes in treatments 11, 12, 15, and 16 believed only that norming data were routinely being collected. The main report (Goodwin, 1965) contains the written instructions given to the outside administrators (each outside test administrator received two sheets of instructions).

Conduct of Testing and Scoring in the Experiment

On the morning of the scheduled testing, all schools were telephoned. The principals of the 32 schools tested by outside administrators were phoned so that alternate testers could be dispatched to those classes which, for some reason, were without a test administrator. However, all of the designated outside testers arrived at their destinations on time. On the basis of reports received from building principals, it was apparent that the outside testers were well prepared to administer the test.

In classes where teachers administered the test, three irregularities occurred. All are reported in the main report (Goodwin, 1965), and none were considered critical in biasing resulting data.


Tests to be scored by outsiders (treatments 2, 4, 6, 8, 10, 12, 14, and 16) were collected on the afternoon of April 5, 1965. This scoring was done by four university students who were randomly selected from a pool of scorers. Two of the scorers believed that the tests resulted from an experiment, and these scorers each scored a random half of the data collected from + experimental atmosphere classes (treatments 2, 4, 6, and 8). The other two scorers believed that the tests were taken during routine test norming, and these scorers each scored a random half of the tests collected from classes under treatments 10, 12, 14, and 16.

Teachers under treatments 1, 3, 5, 7, 9, 11, 13, and 15 scored the tests of their own pupils. These tests were collected on Friday, April 9, and were later rescored to determine the accuracy of initial scoring. The tests of experimental subjects scored by outsiders were also rescored for accuracy of initial scoring. This rescoring was done by four different university students selected from the pool of available scorers. Each scorer in this latter group knew that an experiment was in progress, was repeatedly instructed to work accurately, and scored a random fourth of all the tests collected under all treatments. After this stage in the scoring, the experimenter randomly selected five percent of all tests and rescored them to ascertain that the final scores were, in fact, accurate.


IV

RESULTS

In this chapter, the class means for each of the experimental units on the four dependent variables will be given. Following the tables of mean squares and F-ratios, significant effects will be clarified by the presentation of the appropriate means.

DETERMINATION OF FINAL ERROR TERMS

As discussed in Chapter III, the initial error term was composed of the five four-factor interactions and the single five-factor interaction. This error term was used to test the six three-factor interactions that included the stratifying variable. The mean squares and F-ratios that resulted are summarized for all four test scores in Table 5.

The 50 per cent point of the F-distribution for three and 16 degrees of freedom is .824. As can be seen in Table 5, only one three-factor interaction fell in the no-pool category using 1.648 as the critical value. This interaction, A X S X P, was significant by this procedure for each of the four dependent variables. Accordingly, only the sums of squares and degrees of freedom for E X N X P, E X A X P, E X S X P, N X A X P, and N X S X P were pooled with those of the initial error term. In the case of each dependent variable, the resulting or final error term contained 31 df and was appreciably smaller, and presumably more stable, than the initial error term.
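As an aside, the final analysis can be reproduced with standard software. The sketch below is not the authors' code; it assumes a data frame with one row per class mean and hypothetical column names (score, E, N, A, S, P). Because the model carries every term that was actually tested, including A X S X P, its residual is exactly the pooled 31-df error term described above.

    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    def final_anova(df, score="score"):
        """df: 64 rows, one class mean per row; E, N, A, S coded '+'/'-', P coded 1-4."""
        formula = (score + " ~ (C(E) + C(N) + C(A) + C(S))**3"
                   " + C(P) + C(E):C(P) + C(N):C(P) + C(A):C(P) + C(S):C(P)"
                   " + C(A):C(S):C(P)")            # A x S x P retained, not pooled
        model = smf.ols(formula, data=df).fit()    # residual df = 63 - 32 = 31
        return anova_lm(model, typ=2)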

TABLE 5

Mean Squares and F-Ratios of Selected Three-Factor Interactions on Stanford Arithmetic Achievement Test

                                        Sub-Test
Source        df   Computations       Concepts          Applications      Average
                   M.S.      F        M.S.     F        M.S.     F        M.S.     F
E X N X P      3   .441     1.41      .105     -        .057     -        .159     -
E X A X P      3   .279     -         .112     -        .024     -        .071     -
E X S X P      3   .148     -         .015     -        .101     -        .008     -
N X A X P      3   .161     -         .070     -        .072     -        .088     -
N X S X P      3   .064     -         .168     -        .021     -        .040     -
A X S X P      3   .578     1.85      .473     2.17     .940     2.99     .626     2.70
Error         16   .312               .218              .314              .232

E = Experimental Atmosphere
N = Notice of Test
A = Test Administrator
S = Test Scorer
P = Previous Achievement

Note: In this and all succeeding F-value tables, a hyphen indicates an F < 1.


ANALYSES OF VARIANCE

Arithmetic Computations

As stated in Chapter III, three test scores were available as well as an average or composite score. The first of these scores, Computation, was based on pupil response to 39 standard or drill-type items primarily concerned with fundamental arithmetic processes. The resulting mean scores for each of the experimental units (classes) are given in Table 6, along with the number of second-semester sixth-grade pupils making up the subjects for each class. It can be noted that although class size was quite comparable between treatments (as one would expect), this was not true between levels, with relatively fewer second-semester sixth-grade students in the lower strata.

The 64 class means were subjected to a 4 × 2⁴ analysis of variance. The resulting mean squares and F-ratios are contained in Table 7. Previous arithmetic achievement tested highly significant, as expected, with means of 6.611, 6.195, 5.786, and 5.007 grade placement units for strata one through four, respectively (see Table 6).

Also significant was the first-order interaction between experimental atmosphere and notice of test date. This occurred because of higher means for the + + and - - treatment combinations as compared with the + - and - + treatment means. The relevant means are presented in Table 8. (In this and all subsequent discussions, algebraic signs will be used to indicate treatment interactions in accordance with the treatment definitions on pages 12-13 in Chapter III. The first sign will refer to the first term of the interaction as listed in the F-ratio tables, the second sign to the second term, etc. For example, in Table 7 the significant interaction is listed as E X N; thus + + would refer to + experimental atmosphere and + notice of testing, - + would refer to - experimental atmosphere and + notice of testing, etc.)
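For readers who wish to verify such cell means from the class means in Table 6, a brief sketch follows (illustrative only; the data frame and column names are assumptions): each entry of Table 8 is simply the mean of the 16 class means sharing that sign combination.

    import pandas as pd

    def en_cell_means(class_means, score="computation"):
        """class_means: one row per class, with '+'/'-' codes in columns E and N."""
        return class_means.groupby(["E", "N"])[score].mean().unstack("N")

    # Applied to the Computation class means this reproduces Table 8,
    # e.g. 6.090 for the + + cell and 6.023 for the - - cell.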

None of the three-factor interactions were significant.

TABLE 6

Average Computation Grade Placements on Stanford Arithmetic Achievement Test and Number of Pupils by Experimental Unit, Treatment, and Stratum; Testing Conducted in April, 1965

Treat-  Exp.    Test    Teacher  Teacher   Stratum 1    Stratum 2    Stratum 3    Stratum 4    Average/Total
ment    Atmos.  Notice  Adm.     Scored    G.P.    N    G.P.    N    G.P.    N    G.P.    N    G.P.      N
 1      +       +       +        +         6.315  33    6.690  29    6.020  31    5.230  23    6.064    116
 2      +       +       +        -         7.491  32    6.521  33    5.908  25    5.407  14    6.332    104
 3      +       +       -        +         8.081  32    6.252  29    5.456  16    4.500  13    6.072     90
 4      +       +       -        -         6.878  32    5.465  17    5.773  30    5.454  24    5.892    103
 5      +       -       +        +         6.389  18    6.724  17    5.685  29    5.770  30    6.142     94
 6      +       -       +        -         5.855  33    5.555  22    5.307  28    4.300  15    5.254     98
 7      +       -       -        +         6.353  32    6.514  29    5.908  24    4.831  13    5.901     98
 8      +       -       -        -         6.564  25    6.193  27    5.489  18    4.836  28    5.770     98
 9      -       +       +        +         6.109  23    5.688  24    6.246  35    4.833  21    5.719    103
10      -       +       +        -         7.668  28    6.959  29    4.853  17    4.359  29    5.960    103
11      -       +       -        +         6.127  30    6.506  18    4.843  23    4.927  30    5.601    101
12      -       +       -        -         6.255  31    6.026  34    5.400  30    4.705  22    5.596    117
13      -       -       +        +         6.418  22    6.106  34    7.300  21    5.381  37    6.301    114
14      -       -       +        -         6.496  27    5.878  23    6.643  28    5.872  25    6.222    103
15      -       -       -        +         6.084  31    6.514  29    5.521  14    5.063  24    5.795     98
16      -       -       -        -         6.687  31    5.534  32    6.220  30    4.650  24    5.773    117
Average Grade Placement, Total N           6.611 460    6.195 426    5.786 399    5.007 372    5.900   1657



TABLE 7

Mean Squares and F-Ratios for Analysis of Variance of Computation Grade Placements on Stanford Arithmetic Achievement Test

Source                        df   Mean Square   F-Ratio
Experimental Atmosphere (E)    1    .053          -
Notice of Testing (N)          1    .001          -
Test Administrator (A)         1    .633          2.37
Test Scorer (S)                1    .158          -
Previous Achievement (P)       3    7.477         28.00***
E X N                          1    1.572         5.89*
E X A                          1    .411          1.54
E X S                          1    .284          1.06
E X P                          3    .135          -
N X A                          1    .0ld          -
N X S                          1    .522          1.96
N X P                          3    .671          2.51
A X S                          1    .004          -
A X P                          3    .150          -
S X P                          3    .262          -
E X N X A                      1    .348          1.30
E X N X S                      1    .148          -
E X A X S                      1    .062          -
N X A X S                      1    .567          2.12
Error                         31    .267

*p < .05   ***p < .001

TABLE 8

Average Computation Grade Placements on Stanford Arithmetic Achievement Test by Experimental Atmosphere and Notice of Testing

                              Notice of Testing
Experimental Atmosphere         +        -
           +                  6.090    5.767
           -                  5.719    6.023

Arithmetic Concepts

The Concepts score on the Stanford Arithmetic Achievement Test, Intermediate II, is computed using 32 problems. The problems are more verbal than those in the Computations sub-test. This sub-test is concerned more with the concepts behind the fundamental arithmetic processes rather than directly with the processes themselves.

The average grade placement of the 64 experimental units was considerably higher on this sub-test than on the first. As shown in Table 9, the average Concepts grade placement was 6.266, over three months greater than the average Computation grade placement (5.900). The numbers of pupils in the classes are not given in Table 9 (as they were in Table 6) because they were identical for each of the dependent variables.

The class means on the Concepts sub-test were analyzed using a complete factorial 4 × 2⁴ analysis of variance. The mean squares and F-ratios that resulted are summarized in Table 10. Previous arithmetic achievement was highly significant, with means of 7.160, 6.707, 5.982, and 5.214 for the four strata or levels (see Table 9).


TABLE 9

Average Concepts Grade Placements on Stanford Arithmetic Achievement Test by Experimental Unit, Treatment, and Stratum; Testing Conducted in April, 1965


Treat-  Exp.    Test    Teacher  Teacher            Stratum
ment    Atmos.  Notice  Adm.     Scored      1       2       3       4      Average
 1      +       +       +        +         6.900   6.866   6.258   5.574    6.399
 2      +       +       +        -         8.553   6.861   5.932   5.457    6.701
 3      +       +       -        +         7.888   6.617   5.994   5.723    6.555
 4      +       +       -        -         7.372   6.800   6.453   5.133    6.439
 5      +       -       +        +         6.617   7.165   6.102   5.733    6.404
 6      +       -       +        -         7.158   6.086   5.643   4.753    5.905
 7      +       -       -        +         7.106   6.838   5.967   4.885    6.199
 8      +       -       -        -         7.268   6.452   5.194   5.146    6.015
 9      -       +       +        +         6.613   6.213   6.463   5.462    6.188
10      -       +       +        -         7.954   7.376   5.447   4.952    6.432
11      -       +       -        +         6.430   7.422   5.439   5.037    6.082
12      -       +       -        -         7.171   6.359   6.403   4.868    6.200
13      -       -       +        +         7.214   6.712   6.467   5.432    6.456
14      -       -       +        -         7.137   6.548   5.943   5.940    6.392
15      -       -       -        +         6.639   6.597   5.857   4.738    5.958
16      -       -       -        -         6.535   6.403   6.143   4.608    5.922
Average Grade Placement                    7.160   6.707   5.982   5.214    6.266

TABLE 10

Mean Squares and F-Ratios for Analysis of Variance of Concepts Grade Placements on Stanford Arithmetic Achievement Test

Source                        df   Mean Square   F-Ratio
Experimental Atmosphere (E)    1    .244          1.54
Notice of Testing (N)          1    .762          4.82*
Test Administrator (A)         1    .567          3.59
Test Scorer (S)                1    .014          -
Previous Achievement (P)       3    11.634        73.63***
E X N                          1    .489          3.09
E X A                          1    .306          1.94
E X S                          1    .145          -
E X P                          3    .174          1.10
N X A                          1    .096          -
N X S                          1    .443          2.80
N X P                          3    .066          -
A X S                          1    .010          -
A X P                          3    .096          -
S X P                          3    .440          2.78
E X N X A                      1    .103          -
E X N X S                      1    .041          -
E X A X S                      1    .000          -
N X A X S                      1    .197          1.25
Error                         31    .158

*p < .05   ***p < .001



The source of the significant main effect for notice of testing was a difference of over two months' achievement. The average grade placement for classes receiving notice of the test (+) was 6.375, while those classes receiving no notice (-) averaged 6.156 grade placement units. None of the two- or three-factor interactions tested significant at the .05 level.

Arithmetic Applications

The third and final sub-test in the Stanford Arithmetic Achievement Test contains 39 items and measures the pupil's ability to apply mathematical principles to attain problem solutions. This type of exercise is commonly referred to as a "word problem." In this particular test the pupil has to interpret graphs, compute areas, figure sales tax, etc.

The means for the experimental units are given in Table 11. The average Applications grade placement, 6.608, exceeded that of both the other sub-tests.

The same type of design used previously, a 4 × 2⁴ analysis of variance, was employed on the class means. The resulting mean squares and F-ratios are tabulated in Table 12. Previous arithmetic achievement was again highly significant, with means of 7.790, 7.113, 6.293, and 5.236 for the four strata (see Table 11). None of the other main effects were significant.

The interaction of experimental atmosphere with notice of testing was significant at the .01 level. The source of this significance was a high mean for the + + treatment combination, a moderately high mean for the - - treatment combination, and low means for the + - and - + treatments. The means are given in Table 13.

Another significant interaction occurred between notice of testing and test scorers. The means of the observations under the + +, + -, and - + treatment conditions were generally comparable, as can be seen in Table 14. However, the mean grade placement for the - - cell (that is, no notice and outside scored) is appreciably lower than the other three.

The final significant two-factor interaction involved test scorer and previous arithmetic achievement, the stratifying variable. As can be noted in Table 15, a marked crossover occurs. Tests of pupils in strata one and two that were scored by outsiders had higher means than the teacher-scored tests, while the opposite situation prevailed for strata three and four.

TABLE 11

Average Applications Grade Placements on Stanford Arithmetic Achievement Test by Experimental Unit, Treatment, and Stratum; Testing Conducted in April, 1965

Treat-  Exp.    Test    Teacher  Teacher            Stratum
ment    Atmos.  Notice  Adm.     Scored      1       2       3       4      Average
 1      +       +       +        +         7.694   7.448   6.952   5.774    6.967
 2      +       +       +        -         9.240   7.133   5.700   5.186    6.815
 3      +       +       -        +         8.491   7.114   6.594   5.431    6.907
 4      +       +       -        -         7.575   7.988   6.877   5.179    6.905
 5      +       -       +        +         7.372   7.053   6.612   6.210    6.812
 6      +       -       +        -         7.964   6.532   5.825   4.367    6.172
 7      +       -       -        +         7.572   6.886   6.313   5.431    6.550
 8      +       -       -        -         6.572   7.196   6.106   5.118    6.248
 9      -       +       +        +         7.013   6.433   6.946   5.271    6.416
10      -       +       +        -         8.486   7.672   5.629   4.507    6.573
11      -       +       -        +         6.703   7.456   5.417   5.063    6.160
12      -       +       -        -         7.939   7.032   6.937   4.836    6.686
13      -       -       +        +         7.732   6.959   6.805   5.930    6.856
14      -       -       +        -         8.522   6.952   6.268   5.884    6.906
15      -       -       -        +         7.587   7.272   6.214   5.183    6.564
16      -       -       -        -         7.171   6.688   6.500   4.408    6.192
Average Grade Placement                    7.790   7.113   6.293   5.236    6.608



TABLE 12

Mean Squares and F-Ratios for Analysis of Variance of Applications Grade Placements on Stanford Arithmetic Achievement Test

Source                        df   Mean Square   F-Ratio
Experimental Atmosphere (E)    1    .261          1.38
Notice of Testing (N)          1    .318          1.68
Test Administrator (A)         1    .426          2.25
Test Scorer (S)                1    .135          -
Previous Achievement (P)       3    19.373        102.50***
E X N                          1    1.557         8.24**
E X A                          1    .248          1.31
E X S                          1    .532          2.81
E X P                          3    .108          -
N X A                          1    .291          1.54
N X S                          1    .804          4.25*
N X P                          3    .183          -
A X S                          1    .047          -
A X P                          3    .285          1.51
S X P                          3    1.018         5.39**
E X N X A                      1    .105          -
E X N X S                      1    .012          -
E X A X S                      1    .073          -
N X A X S                      1    .091          -
Error                         31    .189

*p < .05   **p < .01   ***p < .001

TABLE 13

Average Applications Grade Placements on Stanford Arithmetic Achievement Test by Experimental Atmosphere and Notice of Testing

                              Notice of Testing
Experimental Atmosphere         +        -
           +                  6.898    6.446
           -                  6.459    6.630

TABLE 14

Average Applications Grade Placements on Stanford Arithmetic Achievement Test by Notice of Testing and Test Scorer

                              Test Scorer
Notice of Testing               +        -
           +                  6.612    6.745
           -                  6.696    6.380



It should be recalled further that the A X S X P interaction was not pooled in the error term because this investigator could not assume that it was an estimate of σ².

None of the second-order interactions were significant.

TABLE 15

Average Applications Grade Placements on Stanford Arithmetic Achievement Test by Test Scorer and Previous Arithmetic Achievement (Stratum)

                            Stratum
Test Scorer        1        2        3        4
     +           7.520    7.078    6.482    5.537
     -           8.059    7.149    6.105    4.936

Average Arithmetic Score

The fourth dependent variable, average arithmetic score, was formed by equally weighting the pupil's grade placements on the three sub-tests in the Stanford Arithmetic Achievement Test. The class means resulting from this procedure are found in Table 16.
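A trivial sketch (not from the report) of the equal weighting just described; the function name is illustrative.

    def average_arithmetic(computation, concepts, applications):
        """Equal weighting of the three sub-test grade placements."""
        return (computation + concepts + applications) / 3.0

    # e.g., the Wauwatosa tryout classes: (7.9 + 7.6 + 8.5) / 3 = 8.0
    print(round(average_arithmetic(7.9, 7.6, 8.5), 1))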

The results of the 4 × 2⁴ analysis of variance are summarized in Table 17. The only significant main effect was previous arithmetic achievement, with means for strata one through four of 7.187, 6.672, 6.020, and 5.152, respectively.

None of the three-factor interactions reached significance. The only significant two-factor interaction occurred between experimental atmosphere and notice of the testing. The source of this significance was a high mean grade placement for the + + treatment combination with low mean grade placements for the other three combinations, although the - - cell mean was somewhat larger than the means of the + - and - + cells. The precise means involved in the interaction are presented in Table 18.

TABLE 16

Average Grade Placements on Stanford Arithmetic Achievement Test by Experimental Unit, Treatment, and Stratum; Testing Conducted in April, 1965

Treat-  Exp.    Test    Teacher  Teacher            Stratum
ment    Atmos.  Notice  Adm.     Scored      1       2       3       4      Average
 1      +       +       +        +         6.970   7.001   6.410   5.526    6.477
 2      +       +       +        -         8.428   6.838   5.847   5.350    6.616
 3      +       +       -        +         8.153   6.661   6.015   5.218    6.512
 4      +       +       -        -         7.275   6.751   6.368   5.255    6.412
 5      +       -       +        +         6.793   6.980   6.133   5.904    6.452
 6      +       -       +        -         6.992   6.058   5.592   4.467    5.777
 7      +       -       -        +         7.010   6.746   6.063   5.049    6.217
 8      +       -       -        -         7.135   6.614   5.263   5.033    6.011
 9      -       +       +        +         6.578   6.111   6.552   5.189    6.107
10      -       +       +        -         8.036   7.336   5.310   4.606    6.322
11      -       +       -        +         6.420   7.128   5.233   5.009    5.947
12      -       +       -        -         7.122   6.472   6.247   4.803    6.161
13      -       -       +        +         7.121   6.592   6.857   5.581    6.538
14      -       -       +        -         7.385   6.459   6.284   5.899    6.507
15      -       -       -        +         6.770   6.794   5.864   4.995    6.106
16      -       -       -        -         6.798   6.208   6.288   4.555    5.962
Average Grade Placement                    7.187   6.672   6.020   5.152    6.258



TABLE 17

Mean Squares and F-Ratios for Analysis of Variance of Average Grade Placements on Stanford Arithmetic Achievement Test

Source                        df   Mean Square   F-Ratio
Experimental Atmosphere (E)    1    .170          1.10
Notice of Testing (N)          1    .242          1.56
Test Administrator (A)         1    .538          3.47
Test Scorer (S)                1    .086          -
Previous Achievement (P)       3    12.332        79.56***
E X N                          1    1.137         7.34*
E X A                          1    .318          2.05
E X S                          1    .300          1.94
E X P                          3    .129          -
N X A                          1    .060          -
N X S                          1    .530          3.74
N X P                          3    .104          1.19
A X S                          1    .003          -
A X P                          3    .073          -
S X P                          3    .448          2.89
E X N X A                      1    .169          1.09
E X N X S                      1    .025          -
E X A X S                      1    .030          -
N X A X S                      1    .089          -
Error                         31    .155

*p < .05   ***p < .001

TABLE 18

Average Grade Placements on Stanford Arithmetic Achievement Test by Experimental Atmosphere and Notice of Testing

                              Notice of Testing
Experimental Atmosphere         +        -
           +                  6.504    6.115
           -                  6.134    6.278

Two final tables are presented in this chapter. In Table 19, the average grade placements on the Stanford Arithmetic Achievement Test for each of the four independent variables are reported; the table will be referred to in Chapter V.

Finally, certain facts can be noted about the accuracy of the teacher-scorers (in the odd-numbered treatments). In the first place, teacher errors were as likely to raise grade placements as lower them; this was also true for the outside scorers. Second, teachers' percent error rates were comparable to those of the outside scorers. The percent error rates for the teachers are listed in Table 20 by independent variable and stratum. The similarity of the means for + and - notice, and also for + and - experimental atmosphere, is not found for + and - test administrator or for strata. (If a scorer had recorded an incorrect grade placement for a sub-test, he was given an error. On each test scored, therefore, a maximum of three errors could be made. The percent error rate was determined for each scorer by dividing his total number of errors by three times the number of tests that he had scored.)
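Restated as a one-line computation (illustrative only; the example figures are hypothetical, not taken from the study):

    def percent_error_rate(n_errors, n_tests_scored):
        """Each test carries three sub-test grade placements, so the denominator
        is three times the number of tests the scorer handled."""
        return 100.0 * n_errors / (3 * n_tests_scored)

    print(round(percent_error_rate(14, 100), 2))   # 4.67, in the range reported in Table 20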

In the next chapter, the implications of the results reported in this chapter are discussed.



TABLE 19

Average Computation, Concepts, Applications, and Total Grade Placements on the Stanford Arithmetic Achievement Test by Independent Variable; Testing Conducted in April, 1965


                                          Arithmetic Sub-Test
Independent Variable           Computation   Concepts   Applications   Average
Experimental Atmosphere   +      5.929         6.327       6.672        6.309
                          -      5.871         6.204       6.544        6.206
Notice of Testing         +      5.905         6.375       6.679        6.319
                          -      5.895         6.156       6.538        6.196
Teacher Administration    +      5.999         6.360       6.690        6.350
                          -      5.800         6.171       6.527        6.166
Teacher Scoring           +      5.949         6.280       6.654        6.295
                          -      5.850         6.251       6.562        6.221

TABLE 20

Percent Error Rates of Teacher-Scorers by Independent Variable and Previous Arithmetic Achievement (Stratum)

Independent Variable              Percent Error Rate
Experimental Atmosphere    +           4.62
                           -           4.70
Notice of Testing          +           4.42
                           -           4.76
Teacher Administration     +           4.16
                           -           4.91
Stratum                    1           5.09
                           2           5.16
                           3           4.12
                           4           3.54


V

DISCUSSION AND CONCLUSIONS

In this chapter, consideration is given to the results found in the experiment and their implications. Where appropriate, the discussion will include relationships between this study and the articles and investigations described in Chapter II.

To lend structure to the chapter, the following organizational scheme will be employed. First, each of the main effects associated with the four independent variables will be considered. Guidelines for educational researchers will be presented as they relate to each of the independent variables. Then the function of the leveling variable, previous arithmetic achievement, will be examined briefly. Once the five single variables have been considered, attention will be focused on the significant interactions and other interactions of interest. Any discussion that follows the stating of near-significant differences and contains conjectures as to the possible causes of the difference should in no way be construed as having made the difference a true one or more significant than initially reported.

The discussion will next center on some general observations made by this investigator during the course of the experiment. Last, conclusions will be stated in terms of the hypotheses in Chapter I.

EXPERIMENTAL ATMOSPHERE

The entries in Table 19 consistently favor the + experimental atmosphere treatment. Experimental units under the + treatment scored .058, .123, and .128 grade placement units above the - classes, an average superiority of .103 grade placement units, or about one month's achievement. These differences, although large enough to be considered of practical importance by many school administrators, were associated with F-ratios having an average significance of only .25. Obviously, one would incur a high risk of committing a Type I error if he were to conclude that the differences found were due to other than chance factors.

Yet the literature cited in Chapter II all seems to indicate that experimental atmosphere is a potent variable. The studies giving rise to the term "Hawthorne Effect" (Mayo, 1945; Roethlisberger, 1941; and Roethlisberger and Dickson, 1941) and the work of Orne (1962) and Rosenthal (1963, 1965) all suggest a pronounced effect due to merely being involved in an experiment. However, a crucial difference between the present study and those mentioned above is that in this study the full burden of conveying or administering the experimental-atmosphere condition fell upon a very short, and in some respects innocuous, paragraph of instructions. This one-shot treatment is noticeably different from the daily and frequent interaction between experimenter and subject such as in the Hawthorne Western Electric plant. Several steps could have been taken to increase the teachers' feeling of experimental involvement, but it must be remembered that only 16 of the 32 teachers under this condition had notice of the test date. Stimulation of the teachers under the + experimental atmosphere, - notice treatment condition might have led to the contamination of the notice variable or to this researcher taking such license with the truth that even the most liberal school administrator would not permit it.

It must be noted, however, that recent literature on the effect of experimental atmosphere on educational research does not exist. The obvious possibility should not be overlooked that experimental atmosphere may, in actuality, have no effect on teachers in many situations. Such a possibility is in no way refuted by the low statistical significance of the differences found in this study favoring + experimental atmosphere. Until additional evidence is available, the classroom researcher would do well to adopt one position or the other, i.e., either tell all the experimental subjects that they are in an experiment or tell none of them.



The latter course of action might still permit considerable variability due to Orne's demand characteristics (1962); thus in many situations it would be undesirable.

NOTICE OF TESTING

The differential effect of 10 school days' notice of the upcoming test as compared with notice of a single school day was investigated. The resulting grade placements favored the + notice condition, with differences amounting to .010, .219, and .141 grade placement units for the three arithmetic sub-tests (see Table 19). Corresponding to the gap of over two months on the Concepts sub-test was an F-ratio significant beyond the .05 level, while the F-ratio for the Applications sub-test (F = 1.68) was significant at the .20 level.

The obvious discrepancy in need of resolution is the difference between the apparent lack of effect of test notice upon the Computation sub-test (with a mean square of .001) and the significant effect due to test notice on the Concepts sub-test. Examination of the test items involved suggests a plausible explanation. The computation items are routine, drill-type problems. Students have attained their current abilities on this type of problem over several years of daily practice. An inordinate emphasis on practicing similar items would be necessary to bring about any appreciable gain in the students' performance on the task.

However, the problems in the Concepts sub-test are more of the "aha" variety, that is, extremely puzzling when first encountered but remarkably routine after even a short discussion of the concepts involved, such as "place value." Other problems in this sub-test could become quite simple and routine with a minimum of instruction. Teachers who received notice of the testing quite probably read over the test items. With no conscious motivation to aid their pupils on the tests, they may have been attracted by some of the concept (and application) problems and subsequently may have discussed that type of problem with their classes. Pupils in no-notice classes would not have had a similar opportunity to learn of the concepts involved in the test items.

Regardless of the particular cause of the significant effect, the educational researcher would do well to insure that the experimental subjects and/or their teachers all receive the same notice of any upcoming test (especially one that has some degree of novelty associated with it), or that all concerned receive no notice at all. The latter procedure, although more difficult to implement, might reduce other sources of variability, as the discussions of

interactions later in this chapter will imply.

ADMINISTRATION OF THE TEST

In 32 classes, teachers administered the test to their pupils (+), while in the other 32, outside test administrators gave the test with the teacher present in the room (-). For all three sub-tests, the + treatment classes tested higher than their - treatment counterparts. The differences in means on the three sub-tests were .199, .189, and .163 grade placement units, with an average difference of .184 units, or two months' achievement (Table 19). Associated with these differences are F-ratios of 2.37, 3.59, 2.25, and 3.47. The F-ratios for the Concepts sub-test and the average grade placement are significant at the .07 and .08 levels of significance, and overall the average F-ratio for this main effect is approximately .10. Although this is considerably below the .05 level of significance used heretofore, the consistency of the effect due to the test administrator variable across all the sub-tests lends support to the claim that it reflects a true, rather than a chance, difference.

In Chapter II, three references were cited that gave essentially the same explanation as to why pupils score better on standardized tests administered by their own teachers. Rice (1897), Lowell (1919), and Trailer (1951) suggested that indirect hints given by the teacher during the test might aid pupils to obtain higher scores. Although this researcher has no way of knowing, it would seem that the unspoken rapport between teacher and pupil is an equally important consideration. Most pupils, especially in the lower grades, are somewhat anxious about taking a test, and the anxiety of many of them is undoubtedly increased when a stranger administers the test. In some cases this anxiety reaches a level that impairs the pupil's performance.

The researcher who investigates performance in the schools must take this variable into account. The least desirable situation would involve mixing the mode of test administration, i.e., having outsiders test some classes and letting some teachers test their own pupils. The best solution would be to use well-trained outsiders to test all classes, thereby testing all classes under nearly identical conditions. Extensions of this, TV administration of tests (Hopkins and Lefever, 1964) or administration by phonograph record, offer great promise and reduce the uncontrolled variance inherent in


using several outside test administrators. A compromise alternative, between the two procedures outlined, would be to let each teacher administer the test to his own pupils. If this final practice is adhered to, however, the investigator must expect considerable uncontrolled variation due to teachers' varying degrees of rapport with their pupils and other factors. Extensive training of the teachers in the proper procedures to follow when giving the test would reduce in intensity, but not eliminate, the undesirable variability inherent in testing programs utilizing teacher administrators.

SCORING OF THE TEST

From Table 19, it can be seen that differences between + and - scoring treatments are relatively small: .099, .031, and .092 grade placement units, an average difference of approximately three-quarters of a month, favoring the treatments in which the teacher scored the tests. In no instance did any associated F-ratio exceed 1.

The question as to possible differential effects due to the actual scoring of tests by teachers and outsiders is evidently not a critical one. It would seem that substantially more important are the directions for scoring and the concreteness and definiteness of the task given the scorer. The scoring key used with this test served to "standardize" the scoring procedures used.

A brief analysis of the percent error rates of the teacher-scorers demonstrated that their average error rate was comparable to that of the outside scorers. Although no statistical analysis was performed, the teacher error rates were presented by independent variable in Table 20 for the reader's information. The errors made by the teachers were as likely to raise as to lower grade placements, supporting an earlier finding of Phillips and Weathers (1958).

What implications can the educational researcher draw from these findings? The comparability of results when using teacher and outside scorers would suggest that either or both could be used to process test data. The matter is not this simple, however. Individuals, in this case both teachers and outsiders, vary widely in their percent error rates (the outside scorers varied from 2.06 to 7.88%; the teachers varied from 0 to 18.10%). Few researchers feel secure reporting results that are based on "error-ridden" data, even if the errors are random. The varying competencies of scorers increase the uncontrolled variance in the design. Multiple rescoring is also unsatisfactory: it is time consuming, and there are dangers inherent in any situation where many people handle the data.
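
The percent error rates quoted above are not defined in detail at this point; a minimal sketch of the comparison, assuming an error rate is simply the number of scoring errors expressed as a percentage of the scores recorded (the scorers and counts below are hypothetical):

def percent_error(n_errors, n_scores):
    # Assumed definition: scoring errors as a percentage of scores recorded.
    return 100.0 * n_errors / n_scores

teacher_rates = [percent_error(e, 120) for e in (0, 4, 22)]   # hypothetical teacher-scorers
outside_rates = [percent_error(e, 120) for e in (3, 9)]       # hypothetical outside scorers
print(min(teacher_rates), max(teacher_rates))
print(min(outside_rates), max(outside_rates))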

Probably the best procedure to follow in regard to scoring a test given to evaluate a research project is to use a machine-scoring answer sheet and to have each of a limited number of persons of known competence (i.e., low percent error rates) prepare a random selection of the tests for machine-grading. If the test has no machine-scoring answer sheet or if the test is subjective in nature, then more elaborate preparations must be made, such as exact specification of scoring procedures, training of scorers, blind scoring, etc. However, even in this latter case it is still undoubtedly wise to use only a few highly competent scorers, thereby reducing error variance resulting from inaccurate scoring.
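
A minimal sketch of the allocation step recommended above, with hypothetical booklet numbers and scorer names; it simply deals a shuffled set of booklets out evenly to a small pool of scorers of known competence:

import random

def assign_booklets_to_scorers(booklet_ids, scorer_names, seed=0):
    # Shuffle the booklets, then deal them out in turn to the scorers.
    rng = random.Random(seed)
    shuffled = list(booklet_ids)
    rng.shuffle(shuffled)
    assignment = {name: [] for name in scorer_names}
    for i, booklet in enumerate(shuffled):
        assignment[scorer_names[i % len(scorer_names)]].append(booklet)
    return assignment

# Example: 64 class booklet sets divided among three competent scorers.
print(assign_booklets_to_scorers(range(1, 65), ["scorer_a", "scorer_b", "scorer_c"]))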

PREVIOUS ARITHMETIC ACHIEVEMENT

The leveling variable, previous arithmetic achievement, was highly significant for all dependent variables. This variable alone accounted for a large proportion of the variance in the experimental observations. Although some overlapping occurred (that is, some stratum two schools out-achieved stratum one schools, etc.), this was minimal and not unexpected, and the means for each of the four strata were widely disparate.
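
A minimal sketch of the stratify-then-randomize procedure behind this leveling variable, using hypothetical class labels and prior-achievement means (the study itself ranked the 64 classes, cut them into four strata of 16, and randomly assigned the classes within each stratum to the 16 treatments):

import random

def stratify_and_assign(class_scores, n_strata=4, n_treatments=16, seed=1):
    # Rank classes on the prior-achievement score, cut them into equal strata,
    # and randomly assign the classes within each stratum to the treatments.
    rng = random.Random(seed)
    ranked = sorted(class_scores, key=class_scores.get, reverse=True)
    per_stratum = len(ranked) // n_strata
    assignment = {}
    for s in range(n_strata):
        stratum = ranked[s * per_stratum:(s + 1) * per_stratum]
        treatments = list(range(1, n_treatments + 1))
        rng.shuffle(treatments)
        for cls, treatment in zip(stratum, treatments):
            assignment[cls] = {"stratum": s + 1, "treatment": treatment}
    return assignment

# Hypothetical prior-achievement means for 64 classes.
scores = {"class_%02d" % i: random.Random(i).uniform(4.5, 7.5) for i in range(1, 65)}
print(stratify_and_assign(scores))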

SIGNIFICANT INTERACTIONS AND INTERACTIONS OF INTEREST

In this section, the three first-order interactions that were significant will be discussed, as well as these same three two-factor interactions for all of the dependent variables. Although some of the means associated with these significant interactions were reported in earlier tables, they will be repeated in table form here for the reader's convenience and to permit side-by-side comparison of the means for the interaction on all four dependent variables. In addition, a brief discussion of the A X S X P interaction will be included.

The means associated with the interaction E X N are reported in Table 21. Of all the interactions, this one was apparently most consistently significant across the four dependent variables. The interaction was primarily significant because of the relative effectiveness of the + + treatment combination in comparison with the + -, - +, and - - cells. Therefore, the primary importance of this significant interaction is the highlighting of the effectiveness of + experimental atmosphere in combination with + notice of testing. In addition, note that the - - treatment combination produced a grade placement almost as high as the + + cell on the Computation sub-test but not on the other sub-tests. This fact lends support to the contention (made in the discussion above of "notice of testing") that the pupils' computational ability is the product of many years' training and is relatively unaffected by any short-duration treatment.

TABLE 21

Average Grade Placements on Stanford Arithmetic Achievement Test
by Experimental Atmosphere - Test Notice Treatment Combination and Sub-Test

E X N Treatment                        Sub-Test
Combination            Computation   Concepts   Applications   Average
+ +                       6.090        6.524        6.898       6.504
+ -                       5.767        6.131        6.446       6.115
- +                       5.719        6.226        6.459       6.134
- -                       6.023        6.182        6.630       6.278
F-Ratio                    5.89         3.09         8.24        7.34
Significance (p <)          .03          .09          .01         .02

The cell means generated by the N X S interaction are presented in Table 22. Although the F-values reached the .05 level of significance only on the Applications sub-test, they are of considerable magnitude and deserve some attention. The source of the significance is the relatively low grade placements of the - - treatment combination: classes that received no notice of the test and whose tests were scored by outsiders. The similarity of the + + and + - grade placements across sub-tests is striking, indicating that no differential scoring patterns were manifest between teacher and outside scorers when the teacher had received notice of the test. However, the - + treatment combination produced grade placements consistently two to three months greater than the - - combination, and this difference undoubtedly was the source of the significant interaction. Possibly the teachers in the - + cell adopted different scoring standards than the outside scorers because of the lack of notice afforded their pupils. Regardless, the consistency of this rather large difference between the - + and - - cells is difficult to explain and warrants further investigation. It is well to remember, however, that this interaction is significant at the .05 level for only one of the sub-tests, as mentioned above, and may be of spurious significance, although this appears unlikely.

TABLE 22

Average Grade Placements on Stanford Arithmetic Achievement Test
by Test Notice - Test Scorer Treatment Combination and Sub-Test

N X S Treatment                        Sub-Test
Combination            Computation   Concepts   Applications   Average
+ +                       5.864        6.306        6.612       6.261
+ -                       5.945        6.443        6.745       6.37
- +                       6.035        6.254        6.696       6.328
- -                       5.755        6.059        6.380       6.064
F-Ratio                    1.96         2.80         4.25        3.74
Significance (p <)          .19          .11          .05         .07
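
As a present-day illustration only, and not the analysis used in this study (which employed the full 4 X 2^4 factorial with a pooled error term), the sketch below shows how a notice-by-scorer interaction on class-mean grade placements could be tested; the cell means are loosely patterned on the averages in Table 22, while the cell sizes and class-to-class noise are hypothetical.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
cells = [("+", "+", 6.26), ("+", "-", 6.38), ("-", "+", 6.33), ("-", "-", 6.06)]
rows = []
for notice, scorer, cell_mean in cells:
    for _ in range(16):                       # hypothetical: 16 class means per cell
        rows.append({"notice": notice, "scorer": scorer,
                     "grade_placement": cell_mean + rng.normal(0, 0.4)})
df = pd.DataFrame(rows)

model = smf.ols("grade_placement ~ C(notice) * C(scorer)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))        # the C(notice):C(scorer) row tests the interaction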

The average grade placements for the final significant two-factor interaction, S X P, are recorded in Table 23. The source of this significance is the higher grade placements for outside scorers in stratum 1 and the reversal of this situation for strata 2, 3, and 4 (i.e., higher grade placements for teacher-scorers). It suggests that the teachers in the upper stratum schools put more emphasis on motivating and/or preparing their pupils for the test when they (the teachers) knew that it would be scored by outsiders than when they were to score it themselves. On the other hand, teachers in the lower strata did not increase their preparation and/or motivation efforts when they knew the test would be scored by outsiders. Indeed, the average differences on the three sub-tests favored the teacher-scorers by .160, .241, and .313 grade placement units for strata 2, 3, and 4 respectively. Only the fact that the opposite situation was true in stratum 1 (where the mean of the tests scored by outsiders exceeded that of the tests scored by teachers by .419 grade placement units) kept the main effect of test scorer from reaching significance.

The two-factor interaction continued to be significant when paired with the effect due to test administrator. As discussed in Chapter IV, the sum of squares for the A X S X P interaction was not pooled in the error term because it was quite large. Indeed, inspecting the means for the third and fourth strata for the four possible combinations of the test-administrator and test-scorer variables (see Table 24), one finds all differences ranging from three months to a full year in grade placement favoring the + + treatment combination over the + - cell. The reverse situation is true for stratum 1, with the + - cell means surpassing the + + means by over one-half year's grade placement on all three sub-tests. Differences between the - + and - - treatment combinations are appreciably smaller for all strata. The - + and - - grade placements are generally lower than the corresponding + + and + - means; this is to be expected considering the significant main effect favoring teacher administration of the test. The sources of both interactions can be seen and the above discussion clarified by studying Figures 6 and 7, in which the relevant interactions on the Applications sub-test are graphed (the interactions are also in evidence on the other two sub-tests, but are not as prominent as those on the Applications sub-test when graphed).
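
For readers unfamiliar with the pooling referred to above, a minimal sketch with purely hypothetical sums of squares: selected higher-order interactions are combined into a single error term, and each F-ratio is the effect mean square divided by that pooled mean square.

def pooled_f(ss_effect, df_effect, pooled_terms):
    # pooled_terms: (sum of squares, degrees of freedom) pairs for the
    # higher-order interactions folded into the error term.
    ss_error = sum(ss for ss, _ in pooled_terms)
    df_error = sum(df for _, df in pooled_terms)
    return (ss_effect / df_effect) / (ss_error / df_error)

# Hypothetical values, for illustration only.
print(round(pooled_f(1.20, 1, [(0.40, 3), (0.55, 3), (0.35, 3), (0.90, 9)]), 2))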

TABLE 23

Average Grade Placements on Stanford Arithmetic Achievement Test
by Test Scorer - Previous Arithmetic Achievement (Stratum) Combination and Sub-Test

               Computation      Concepts       Applications       Average
Stratum         TS     OS       TS     OS       TS     OS         TS     OS
1              6.484  6.737    6.926  7.394    7.520  8.059      6.977  7.396
2              6.374  6.016    6.804  6.611    7.078  7.149      6.752  6.592
3              5.872  5.699    6.068  5.895    6.482  6.105      6.141  5.900
4              5.067  4.948    5.323  5.105    5.537  4.936      5.309  4.996
F-Ratio            .98             2.78            5.39              2.89
Sign. (p <)        .43              .07             .01               .06

TS = Teacher Scored
OS = Outside Scored


TABLE 24

Average Grade Placements on Stanford Arithmetic Achievement Test by Test Administrator -
Test Scorer - Previous Arithmetic Achievement (Stratum) Combination and Sub-Test

                         A X S Treatment Combination
Sub-Test      Stratum    + +      + -      - +      - -     Sign.*
Computation      1      6.308    6.877    6.661    6.596
                 2      6.302    6.228    6.446    5.804    F = 2.16
                 3      6.313    5.678    5.432    5.720    p < .12
                 4      5.303    4.984    4.830    4.911

Concepts         1      6.836    7.700    7.016    7.086
                 2      6.739    6.718    6.868    6.503    F = 2.99
                 3      6.322    5.741    5.814    6.048    p < .05
                 4      5.550    5.270    5.096    4.939

Applications     1      7.453    8.553    7.588    7.564
                 2      6.973    7.072    7.182    7.226    F = 4.97
                 3      6.829    5.855    6.134    6.355    p < .01
                 4      5.796    4.986    5.277    4.885

Average          1      6.865    7.710    7.088    7.082
                 2      6.671    6.673    6.832    6.511    F = 4.04
                 3      6.488    5.758    5.794    6.041    p < .02
                 4      5.550    5.080    5.068    4.911

*Pooled error term used to compute F-ratios.

Figure 6. Graph of test scorer by previous arithmetic achievement interaction on Applications sub-test. (Lines plotted: Teacher Scored and Outside Scored; vertical axis: average grade placement.)


Figure 7. Graph of test administrator by test scorer by previous arithmetic achievement interaction on Applications sub-test. (Vertical axis: average grade placement, approximately 5.0 to 8.5.)

This second-order interaction illustrates that the significant S X P interaction was almost entirely generated in classes in which the teacher administered the test. In addition, the A X S X P interaction indicates that stratum one teachers who administer the test preparatory to its being scored by outsiders produce pupil achievement scores notably above teachers in the same stratum who both administer and score the test themselves. At the other extreme were the observations in strata three and four. It seems almost as if the teachers in stratum one who administered the test took procedural and motivational steps before and/or during their administration of the test to insure continued high achievement by their pupils even when the tests were scored by outsiders. On the other hand, teachers in the lower achieving strata, three and four, evidently did not engage in similar behaviors, or, if they did, these teacher behaviors had a minimal or even negative effect on the achievement scores of the pupils concerned.

GENERAL OBSERVATIONS ON THE EXPERIMENT

The initial observation when looking at the results of the experiment is that the pupils did not achieve as well on the Stanford Arithmetic Achievement Test as might have been expected. The average grade placement on the Iowa Test of Basic Skills, administered in October, 1964, was 6.164 grade placement units. The April, 1965, testing with the Stanford might reasonably have been expected to produce a mean grade placement of 6.7 or 6.8. Instead, the average placement was 6.258. The discrepancy could be due to any of a multitude of reasons or to a combination of them: the subject-matter content and curricular objectives of the school system might be more consonant with the Iowa than the Stanford, or the Stanford may simply be harder, or the recency of the norming of the Stanford as compared with the Iowa may be a factor, etc.

Turning to observations of the experimental procedure itself, this investigator has every intention of being a detached, impartial critic, although he realizes that this is quite impossible. The observations concern three procedural steps that would be taken to increase the exactness and meaningfulness of the study were it to be conducted again.

First, an attempt would be made to make the + experimental atmosphere treatment more realistic or, in modern parlance, to "beef it up." This might be accomplished by additional letters to the teachers involved. The school system officials quite naturally did not want to deliberately mislead the teachers, and this restriction obviously placed an upper bound on the ingenuity that one might display in concocting highly-charged situations to stimulate the teachers. Possibly visitations to the teachers would have assisted in increasing the potency of the + experimental atmosphere treatment, yet the inherent dangers in such an approach are implied in the findings of Rosenthal (1963) and Sarason and Minard (1963) and must be considered if such an undertaking is contemplated.

A second methodological variation that would definitely increase the precision of the experiment would be to commence the treatments as soon as possible after the stratifying or leveling variable is available. Stratifying by previous arithmetic achievement allowed identification of a large proportion of the variance in the observations. Had the experiment been run in November, 1964, even more of the variance would have been controlled. As it was, the differential learning progress made by the 64 classes between October and April served to increase the uncontrolled variance in the design.

Finally, every effort would be made to increase the statistical power of the design, possibly by including additional classes for use as experimental units. The statistical power of the analysis would probably be enhanced by having two classes per cell, thereby permitting a presumably smaller within-cell error term. In the initial planning for this study, the school system administrators were understandably reluctant to involve 64 classes in a research investigation. At that time, consideration was given to the possibility of using a fractional, rather than a complete, factorial design, thereby confounding some of the main effects with the higher order interactions. Had the school system officials not graciously permitted the inclusion of 64 classes, a fractional factorial may have been run. However, the resulting higher order interactions, especially A X S X P and even some of the four-factor interactions, were quite large (relative to the "usual" or "average" third-order interaction) and much information would have been obscured or completely unavailable because of the confounding inherent in a fractional factorial design. Were the study to be run again with 64 experimental units, omission of the test scorer variable would be feasible, allowing a 4 X 2^3 design with two classrooms (observations) per cell.
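
A minimal sketch of the cell structure contemplated in that last sentence, assuming the four-level factor is the previous-achievement stratum and the three remaining two-level factors are atmosphere, notice, and administrator:

from itertools import product

strata = [1, 2, 3, 4]              # previous arithmetic achievement strata
levels = ["+", "-"]                # atmosphere, notice, administrator
cells = list(product(strata, levels, levels, levels))
print(len(cells), "cells;", 64 // len(cells), "classes per cell")   # 32 cells; 2 per cell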

CONCLUSIONS

In this final section, the hypotheses formulated in Chapter I will be restated, and then the conclusions reached on the basis of the results of this experiment will be succinctly stated. The statistically significant interactions will also be stated in the form of hypotheses.

1. There is no significant difference in test performance between pupils whose teachers believe an experiment is in progress and pupils whose teachers do not so believe.

The findings of this study failed to reject this hypothesis. The discussion of this variable earlier in this chapter considered the implications of the information that was collected.

2. There is no significant difference in test performance between pupils whose teachers receive notice of the test date and pupils whose teachers do not receive notice.

This hypothesis was rejected at the .05 level in the case of a sub-test involving questions with a novel quality. At the same time, it must be noted that this investigator failed to reject the hypothesis for two sub-tests containing relatively "common-place" items.

3. There is no significant difference in test performance between pupils whose regular teachers administer the test and pupils who are tested by an "outside" administrator.

This hypothesis was rejected at approximately the .10 level of significance. Although incurring a substantially increased risk of a Type I error, the consistently substantial F-ratios for this effect across all dependent variables led to this conclusion.

4. There is no significant difference in test performance between pupils whose regular teachers score the test and pupils whose teachers do not.

The experiment failed to reject this hypothesis.

As a result of testing the interactions generated by the variables under investigation, the effects of the following variable combinations were considered significant:

1. The combination of experimental atmosphere with notice of testing produced significantly higher grade placements (average p < .05) on all sub-tests except the one containing fundamental, drill-type items.

2. The combination of no notice of testing with outside scoring produced significantly lower grade placements (p < .05) on one sub-test, and grade placements low enough to be associated with considerably large, although non-significant, F-ratios on the other sub-tests.

3. The combination of previous arithmetic achievement with teacher scoring resulted in a significant crossover (p < .05) on two of the three sub-tests: outside scorers produced higher grade placements than teacher-scorers in high achieving classes, and teacher-scorers produced higher grade placements than outside scorers in low achieving classes. The significant A X S X P interaction demonstrated that the grade placements producing the S X P interaction were located almost entirely in classes in which the teacher administered the test.


APPENDIX A

EXPERIMENTAL TREATMENTS: INSTRUCTIONS TO TEACHERS

TREATMENT 1:

Instructions received by teachers on March 22, 1965.

To: 6th grade teacher at School

The Public Schools are conducting a study to obtain an accurate estimate of the achievement of each sixth grade pupil before he enters junior high school. This study might be considered an educational experiment. The past subject-matter achievement of pupils in each of a number of previous classes has been determined and will be compared with present achievement status. It is our plan to test the pupils in selected classes, including yours, at a later date. The persons involved in this experiment will appreciate your help and diligence in collecting this important information.

Will you please schedule time for your pupils to take the new Stanford Arithmetic Achievement Test on Monday morning, April 5, 1965? The test should be given in two sittings, with at least a 15-minute break between sittings. Possibly the students' recess period can be utilized for this break, but would you please arrange to have the test completed during the morning?

Would you please administer the test to your pupils? The manual and the necessary tests are enclosed. Look over the instructions for administration closely. You should read over the instructions twice, noting especially those parts underlined in red. This should take about one hour, and you will be paid $3.75 for this preparation time.

Would you also please score the tests for your pupils using the enclosed scoring key? The key will allow you to work rapidly and accurately. On the front of each pupil's test booklet, mark his three grade scores obtained from the bottom of test pages 3, 5, and 8. Do not fill in the percentile ranks. Please assure yourself that the marks are accurate.

This should take you about two hours, and you will be paid $7.50 for your time spent scoring the tests. Would you please have the tests scored by April 8? Replace the tests in the envelope, seal it with scotch tape, and leave it in the principal's office by 4 P.M. on April 8 so that the tests may be collected. Thank you.

TREATMENT 2:

Instructions received by teachers on March 22, 1965.

To: 6th grade teacher at School

The Public Schools are conducting a study to obtain an accurate estimate of the achievement of each sixth grade pupil before he enters junior high school. This study might be considered an educational experiment. The past subject-matter achievement of pupils in each of a number of previous classes has been determined and will be compared with present achievement status. It is our plan to test the pupils in selected classes, including yours, at a later date. The persons involved in this experiment will appreciate your help and diligence in collecting this important information.

Will you please schedule time for your pupils to take the new Stanford Arithmetic Achievement Test on Monday morning, April 5, 1965? The test should be given in two sittings, with at least a 15-minute break between sittings. Possibly the students' recess period can be utilized for this break, but would you please arrange to have the test completed during the morning?

Would you please administer the test to your pupils? The manual and the necessary tests are enclosed. Look over the instructions for administration closely. You should read over the instructions twice, noting especially those parts underlined in red. This should take about one hour, and you will be paid $3.75 for this preparation time.

Once your pupils have completed the test, replace the tests in the envelope, seal it with scotch tape, and leave it in the principal's office by noon on April 5 so that the tests may be collected. You have no responsibility to score the tests. Thank you.

TREATMENT 3:

Instructions received by teachers on March 22, 1965.

To: 6th grade teacher at School

The Public Schools are conducting a study to obtain an accurate estimate of the achievement of each sixth grade pupil before he enters junior high school. This study might be considered an educational experiment. The past subject-matter achievement of pupils in each of a number of previous classes has been determined and will be compared with present achievement status. It is our plan to test the pupils in selected classes, including yours, at a later date. The persons involved in this experiment will appreciate your help and diligence in collecting this important information.

Will you please schedule time for your pupils to take the new Stanford Arithmetic Achievement Test on Monday morning, April 5, 1965? The test should be given in two sittings, with at least a 15-minute break between sittings. Possibly the students' recess period can be utilized for this break, but would you please arrange to have the test completed during the morning?

A graduate, or advanced undergraduate, student will be prepared to administer the test to your pupils. He will bring the tests with him. Would you please remain in the classroom during the testing? This student will arrive at your room about 9 A.M. on Monday, April 5, after first checking in with your building principal.

Would you please score the tests for your pupils using the enclosed scoring key? The key will allow you to work rapidly and accurately. On the front of each pupil's test booklet, mark his three grade scores obtained from the bottom of test pages 3, 5, and 8. Do not fill in the percentile ranks. Please assure yourself that the marks are accurate.

This should take you about two hours, and you will be paid $7.50 for your time spent scoring the tests. Would you please have the tests scored by April 8? Replace the tests in the envelope, seal it with scotch tape, and leave it in the principal's office by 4 P.M. on April 8 so that the tests may be collected. Thank you.

TREATMENT 4:

Instructions received by teachers on March 22, 1965.

To: 6th grade teacher at School

The Public Schools are conducting a study to obtain an accurate estimate of the achievement of each sixth grade pupil before he enters junior high school. This study might be considered an educational experiment. The past subject-matter achievement of pupils in each of a number of previous classes has been determined and will be compared with present achievement status. It is our plan to test the pupils in selected classes, including yours, at a later date. The persons involved in this experiment will appreciate your help and diligence in collecting this important information.

Will you please schedule time for your pupils to take the new Stanford Arithmetic Achievement Test on Monday morning, April 5, 1965? The test should be given in two sittings, with at least a 15-minute break between sittings. Possibly the students' recess period can be utilized for this break, but would you please arrange to have the test completed during the morning?

A graduate, or advanced undergraduate, student will be prepared to administer the test to your pupils. He will bring the tests with him. Would you please remain in the classroom during the testing? This student will arrive at your room about 9 A.M. on Monday, April 5, after first checking in with your building principal.

Once your pupils have completed the test, replace the tests in the envelope, seal it with scotch tape, and leave it in the principal's office by noon on April 5 so that the tests may be collected. You have no responsibility to score the tests. Thank you.


TREATMENT 5:

Instructions received by teachers on March 22, 1965.

To: 6th grade teacher at School

The Public Schools are conducting a study to obtain an accurate estimate of the achievement of each sixth grade pupil before he enters junior high school. This study might be considered an educational experiment. The past subject-matter achievement of pupils in each of a number of previous classes has been determined and will be compared with present achievement status. It is our plan to test the pupils in selected classes, including yours, at a later date. The persons involved in this experiment will appreciate your help and diligence in collecting this important information. Additional materials and instructions will be forthcoming.

Instructions received by teachers on April 2, 1965.

To: 6th grade teacher at School

As you were informed 10 days ago, the Public Schools are conducting a study to obtain an accurate estimate of the achievement of each sixth grade pupil before he enters junior high school. This study might be considered an educational experiment. The past subject-matter achievement of pupils in each of a number of previous classes has been determined and will be compared with present achievement status. It is our plan to test the pupils in selected classes, including yours. The persons involved in this experiment will appreciate your help and diligence in collecting this important information. It is now possible to relate the details of this experiment.

Will you please schedule time for your pupils to take the new Stanford Arithmetic Achievement Test on Monday morning, April 5, 1965? The test should be given in two sittings, with at least a 15-minute break between sittings. Possibly the students' recess period can be utilized for this break, but would you please arrange to have the test completed during the morning?

Would you please administer the test to your pupils? The manual and the necessary tests are enclosed. Look over the instructions for administration closely. You should read over the instructions twice, noting especially those parts underlined in red. This should take about one hour, and you will be paid $3.75 for this preparation time.

Would you also please score the tests for your pupils using the enclosed scoring key? The key will allow you to work rapidly and accurately. On the front of each pupil's test booklet, mark his three grade scores obtained from the bottom of test pages 3, 5, and 8. Do not fill in the percentile ranks. Please assure yourself that the marks are accurate.

This should take you about two hours, and you will be paid $7.50 for your time spent scoring the tests. Would you please have the tests scored by April 8? Replace the tests in the envelope, seal it with scotch tape, and leave it in the principal's office by 4 P.M. on April 8 so that the tests may be collected. Thank you.

TREATMENT 6:

Instructions received by teachers on March 22, 1965.

To: 6th grade teacher at School

The Public Schools are conducting a study to obtain an accurate estimate of the achievement of each sixth grade pupil before he enters junior high school. This study might be considered an educational experiment. The past subject-matter achievement of pupils in each of a number of previous classes has been determined and will be compared with present achievement status. It is our plan to test the pupils in selected classes, including yours, at a later date. The persons involved in this experiment will appreciate your help and diligence in collecting this important information. Additional materials and instructions will be forthcoming.

Instructions received by teachers on April 2, 1965.

To: 6th grade teacher at School

As you were informed 10 days ago, the Public Schools are conducting a study to obtain an accurate estimate of the achievement of each sixth grade pupil before he enters junior high school. This study might be considered an educational experiment. The past subject-matter achievement of pupils in each of a number of previous classes has been determined and will be compared with present achievement status. It is our plan to test the pupils in selected classes, including yours. The persons involved in this experiment will appreciate your help and diligence in collecting this important information. It is now possible to relate the details of this experiment.

Will you please schedule time for your pupils to take the new Stanford Arithmetic Achievement Test on Monday morning, April 5, 1965? The test should be given in two sittings, with at least a 15-minute break between sittings. Possibly the students' recess period can be utilized for this break, but would you please arrange to have the test completed during the morning?

Would you please administer the test to your pupils? The manual and the necessary tests are enclosed. Look over the instructions for administration closely. You should read over the instructions twice, noting especially those parts underlined in red. This should take about one hour, and you will be paid $3.75 for this preparation time.

Once your pupils have completed the test, replace the tests in the envelope, seal it with scotch tape, and leave it in the principal's office by noon on April 5 so that the tests may be collected. You have no responsibility to score the tests. Thank you.

TREATMENT 7:

Instructions received by teachers on March 22, 1965.

To: 6th grade teacher at School

The Public Schools are conducting a study to obtain an accurate estimate of the achievement of each sixth grade pupil before he enters junior high school. This study might be considered an educational experiment. The past subject-matter achievement of pupils in each of a number of previous classes has been determined and will be compared with present achievement status. It is our plan to test the pupils in selected classes, including yours, at a later date. The persons involved in this experiment will appreciate your help and diligence in collecting this important information. Additional materials and instructions will be forthcoming.

Instructions received by teachers on April 2, 1965.

To: 6th grade teacher at School

As you were informed 10 days ago, the Public Schools are conducting a study to obtain an accurate estimate of the achievement of each sixth grade pupil before he enters junior high school. This study might be considered an educational experiment. The past subject-matter achievement of pupils in each of a number of previous classes has been determined and will be compared with present achievement status. It is our plan to test the pupils in selected classes, including yours. The persons involved in this experiment will appreciate your help and diligence in collecting this important information. It is now possible to relate the details of this experiment.

Will you please schedule time for your pupils to take the new Stanford Arithmetic Achievement Test on Monday morning, April 5, 1965? The test should be given in two sittings, with at least a 15-minute break between sittings. Possibly the students' recess period can be utilized for this break, but would you please arrange to have the test completed during the morning?

A graduate, or advanced undergraduate, student will be prepared to administer the test to your pupils. He will bring the tests with him. Would you please remain in the classroom during the testing? This student will arrive at your room about 9 A.M. on Monday, April 5, after first checking in with your building principal.

Would you please score the tests for your pupils using the enclosed scoring key? The key will allow you to work rapidly and accurately. On the front of each pupil's test booklet, mark his three grade scores obtained from the bottom of test pages 3, 5, and 8. Do not fill in the percentile ranks. Please assure yourself that the marks are accurate.

This should take you about two hours, and you will be paid $7.50 for your time spent scoring the tests. Would you please have the tests scored by April 8? Replace the tests in the envelope, seal it with scotch tape, and leave it in the principal's office by 4 P.M. on April 8 so that the tests may be collected. Thank you.

TREATMENT 8:

Instructions received by teachers on March 22, 1965.

To: 6th grade teacher at School

The Public Schools are conducting a study to obtain an accurate estimate of the achievement of each sixth grade pupil before he enters junior high school. This study might be considered an educational experiment. The past subject-matter achievement of pupils in each of a number of previous classes has been determined and will be compared with present achievement status. It is our plan to test the pupils in selected classes, including yours, at a later date. The persons involved in this experiment will appreciate your help and diligence in collecting this important information. Additional materials and instructions will be forthcoming.

Instructions received by teachers on April 2, 1965.

To: 6th grade teacher at School

As you were informed 10 days ago, the Public Schools are conducting a study to obtain an accurate estimate of the achievement of each sixth grade pupil before he enters junior high school. This study might be considered an educational experiment. The past subject-matter achievement of pupils in each of a number of previous classes has been determined and will be compared with present achievement status. It is our plan to test the pupils in selected classes, including yours. The persons involved in this experiment will appreciate your help and diligence in collecting this important information. It is now possible to relate the details of this experiment.

Will you please schedule time for your pupils to take the new Stanford Arithmetic Achievement Test on Monday morning, April 5, 1965? The test should be given in two sittings, with at least a 15-minute break between sittings. Possibly the students' recess period can be utilized for this break, but would you please arrange to have the test completed during the morning?

A graduate, or advanced undergraduate, student will be prepared to administer the test to your pupils. He will bring the tests with him. Would you please remain in the classroom during the testing? This student will arrive at your room about 9 A.M. on Monday, April 5, after first checking in with your building principal.

Once your pupils have completed the test, replace the tests in the envelope, seal it with scotch tape, and leave it in the principal's office by noon on April 5 so that the tests may be collected. You have no responsibility to score the tests. Thank you.

TREATMENTS 9 AND 13:

Treatment 9: Instructions received by teachers on March 22, 1965.

Treatment 13: Instructions received by teachers on April 2, 1965.

To: 6th grade teacher at School

The Public Schools have been asked to collect, in a routine manner, some normative information on a new standardized test. Your class has been randomly selected to take the test at a later date. Data from all schools will be pooled, and separate classes will not be identified in the process. The central office will not be involved in the scoring or recording of the tests.

Will you please schedule time for your pupils to take the new Stanford Arithmetic Achievement Test on Monday morning, April 5, 1965? The test should be given in two sittings, with at least a 15-minute break between sittings. Possibly the students' recess period can be utilized for this break, but would you please arrange to have the test completed during the morning?

Would you please administer the test to your pupils? The manual and the necessary tests are enclosed. Look over the instructions for administration closely. You should read over the instructions twice, noting especially those parts underlined in red. This should take about one hour, and you will be paid $3.75 for this preparation time.

Would you also please score the tests for your pupils using the enclosed scoring key? The key will allow you to work rapidly and accurately. On the front of each pupil's test booklet, mark his three grade scores obtained from the bottom of test pages 3, 5, and 8. Do not fill in the percentile ranks. Please assure yourself that the marks are accurate.

This should take you about two hours, and you will be paid $7.50 for your time spent scoring the tests. Would you please have the tests scored by April 8? Replace the tests in the envelope, seal it with scotch tape, and leave it in the principal's office by 4 P.M. on April 8 so that the tests may be collected. Thank you.

TREATMENTS 10 AND 14:

Treatment 10: Instructions received by teachers on March 22, 1965.

Treatment 14: Instructions received by teachers on April 2, 1965.

To: 6th grade teacher at School

The Public Schools have been asked to collect, in a routine manner, some normative information on a new standardized test. Your class has been randomly selected to take the test at a later date. Data from all schools will be pooled, and separate classes will not be identified in the process. The central office will not be involved in the scoring or recording of the tests.

Will you please schedule time for your pupils to take the new Stanford Arithmetic Achievement Test on Monday morning, April 5, 1965? The test should be given in two sittings, with at least a 15-minute break between sittings. Possibly the students' recess period can be utilized for this break, but would you please arrange to have the test completed during the morning?

Would you please administer the test to your pupils? The manual and the necessary tests are enclosed. Look over the instructions for administration closely. You should read over the instructions twice, noting especially those parts underlined in red. This should take about one hour, and you will be paid $3.75 for this preparation time.

Once your pupils have completed the test, replace the tests in the envelope, seal it with scotch tape, and leave it in the principal's office by noon on April 5 so that the tests may be collected. You have no responsibility to score the tests. Thank you.

TREATMENTS 11 AND 15:

Treatment 11: Instructions received by teachers on March 22, 1965.

Treatment 15: Instructions received by teachers on April 2, 1965.

To: 6th grade teacher at School

The Public Schools have been asked to collect, in a routine manner, some normative information on a new standardized test. Your class has been randomly selected to take the test at a later date. Data from all schools will be pooled, and separate classes will not be identified in the process. The central office will not be involved in the scoring or recording of the tests.

Will you please schedule time for your pupils to take the new Stanford Arithmetic Achievement Test on Monday morning, April 5, 1965? The test should be given in two sittings, with at least a 15-minute break between sittings. Possibly the students' recess period can be utilized for this break, but would you please arrange to have the test completed during the morning?

A graduate, or advanced undergraduate, student will be prepared to administer the test to your pupils. He will bring the tests with him. Would you please remain in the classroom during the testing? This student will arrive at your room about 9 A.M. on Monday, April 5, after first checking in with your building principal.

Would you please score the tests for your pupils using the enclosed scoring key? The key will allow you to work rapidly and accurately. On the front of each pupil's test booklet, mark his three grade scores obtained from the bottom of test pages 3, 5, and 8. Do not fill in the percentile ranks. Please assure yourself that the marks are accurate.

This should take you about two hours, and you will be paid $7.50 for your time spent scoring the tests. Would you please have the tests scored by April 8? Replace the tests in the envelope, seal it with scotch tape, and leave it in the principal's office by 4 P.M. on April 8 so that the tests may be collected. Thank you.


TREATMENTS 12 AND 16:

Treatment 12: Instructions received by the teachers on March 22, 1965.

Treatment 16: Instructions received by the teachers on April 2, 1965.

To: 6th grade teacher at School

The Public Schools have been asked to collect, in a routine manner, some normative information on a new standardized test. Your class has been randomly selected to take the test at a later date. Data from all schools will be pooled, and separate classes will not be identified in the process. The central office will not be involved in the scoring or recording of the tests.

Will you please schedule time for your pupils to take the new Stanford Arithmetic Achievement Test on Monday morning, April 5, 1965? The test should be given in two sittings, with at least a 15-minute break between sittings. Possibly the students' recess period can be utilized for this break, but would you please arrange to have the test completed during the morning?

A graduate, or advanced undergraduate, student will be prepared to administer the test to your pupils. He will bring the tests with him. Would you please remain in the classroom during the testing? This student will arrive at your room about 9 A.M. on Monday, April 5, after first checking in with your building principal.

Once your pupils have completed the test, replace the tests in the envelope, seal it with scotch tape, and leave it in the principal's office by noon on April 5 so that the tests may be collected. You have no responsibility to score the tests. Thank you.


BIBLIOGRAPHY

Bolton, F. B. Evaluating teaching effectiveness through the use of scores on achievement tests. J. educ. Res., 1945, 38, 691-696.

Brooks, S. S. Comparing the efficiency of special teaching methods by means of standardized tests. J. educ. Res., 1921, 4, 337-346. (a)

Brooks, S. S. Measuring the efficiency of teachers by standardized tests. J. educ. Res., 1921, 4, 255-264. (b)

Brooks, S. S. Improving schools by standardized tests. Boston: Houghton Mifflin, 1922.

Campbell, D. T., & Stanley, J. C. Experimental and quasi-experimental designs for research on teaching. In N. L. Gage (Ed.), Handbook of research on teaching. Chicago: Rand McNally, 1963. Pp. 171-246.

Cook, D. L. The Hawthorne Effect in educational research. Phi Delta Kappan, 1962, 44, 116-122.

De Weerdt, Esther N. The transfer effect of practice in related functions upon a group intelligence test. Sch. & Soc., 1927, 25, 438-440.

Douglass, H. R. The effect of measurement upon instruction. J. educ. Res., 1935, 508-511.

Edmiston, R. W. Examine the examination. J. educ. Psychol., 1939, 30, 126-138.

Findley, W. G. A group testing program for the modern school. Educ. psychol. Measmt, 1945, 5, 173-179.

French, J. W., & Dear, R. E. Effect of coaching on an aptitude test. Educ. psychol. Measmt, 1959, 19, 319-330.

Gilmore, M. E. Coaching for intelligence tests. J. educ. Psychol., 1927, 18, 119-121.

Goodwin, W. L. The effects on achievement test results of varying conditions of experimental atmosphere, notice of testing, test administration, and test scoring. Unpublished doctoral dissertation, Univer. of Wisconsin, 1965.

Green, B. F., & Tukey, J. Complex analysis of variance: General problems. Psychometrika, 1960, 25, 127-152.

Hopkins, K. D., & Lefever, D. W. TV vs. teacher administration of standardized tests: Comparability of scores. Paper read at Nat. Conf. Measmt Educ., Chicago, February, 1965.

Hulten, C. E. The personal element in teachers' marks. J. educ. Res., 1925, 12, 49-55.

Kelley, T. L., Madden, R., Gardner, E. F., & Rudman, H. C. Stanford Achievement Test, Intermediate II, Form W. New York: Harcourt, Brace, & World, 1964.

Kintz, B. L., Delprato, D. J., Mettee, D. R., Persons, C. E., & Schappe, R. H. The experimenter effect. Psychol. Bull., 1965, 63, 223-232.

Lindquist, E. F., & Hieronymus, A. N. Iowa Test of Basic Skills. Boston: Houghton Mifflin, 1956.

Lorge, I., Thorndike, R. L., & Hagen, Elizabeth P. Lorge-Thorndike Intelligence Test. Boston: Houghton Mifflin, 1964.

Lowell, Frances. A preliminary report of some group tests of general intelligence. J. educ. Psychol., 1919, 10, 323-344.

McCall, W. A. How to experiment in education. New York: Macmillan, 1923.

McGuigan, F. J. The experimenter: A neglected stimulus object. Psychol. Bull., 1963, 60, 421-428.

Manual for administrators, supervisors, and counselors, Iowa Test of Basic Skills. Boston: Houghton Mifflin, 1956.

Mayo, E. The political problems of an industrial civilization. Boston: Harvard Business School, 1945.

Mayo, E. The social problems of an industrial civilization. Boston: Harvard Business School, 1945.

Orne, M. T. On the social psychology of the psychological experiment: With particular reference to demand characteristics and their implications. Amer. Psychologist, 1962, 17, 776-783.


Phillips, B. N., & Weathers, G. Analysis of errors made in scoring standardized tests. Educ. psychol. Measmt, 1958, 18, 563-567.

Pintner, R. Accuracy in scoring group intelligence tests. J. educ. Psychol., 1926, 470-475.

Rice, J. M. The futility of the spelling grind. Forum, 1897, 23, 163-172.

Roethlisberger, F. J. Management and morale. Cambridge, Mass.: Harvard University Press, 1941.

Roethlisberger, F. J., & Dickson, W. J. Management and the worker. Cambridge, Mass.: Harvard University Press, 1941.

Rosenthal, R. On the social psychology of the psychological experiment: The experimenter's hypothesis as unintended determinant of experimental results. Amer. Scientist, 1963, 51, 268-283.

Rosenthal, R. Covert communication and tacit understanding in the E-S dyad. Paper read at Amer. Psychol. Ass., Chicago, September, 1965.

Rosenthal, R., & Fode, K. L. The effect of experimenter bias on the performance of the albino rat. Behav. Sci., 1963, 8, 183-189.

Rosenthal, R., & Halas, E. S. Experimenter effect on the study of invertebrate behavior. Psychol. Rep., 1962, 11, 251-256.

Rosenthal, R., Persinger, G. W., Kline, L. V., & Mulry, R. C. The role of the research assistant in the mediation of experimenter bias. J. Pers., 1963, 31, 313-335.

Ryans, D. G. The criteria of teaching effectiveness. J. educ. Res., 1949, 42, 690-699.

Sarason, I. G., & Minard, J. The interrelationship among subjects, experimenters, and situational variables. J. abnorm. soc. Psychol., 1963, 67, 87-91.

Simpson, R. H. The critical interpretation of test results in a school system. Educ. psychol. Measmt, 1947, 7, 61-66.

"Student." The Lanarkshire milk experiment. Biometrika, 1931, 23, 398.

Thorndike, R. L., & Hagen, Elizabeth P. Measurement and evaluation in psychology and education. New York: John Wiley and Sons, 1955.

Traxler, A. E. Administering and scoring the objective test. In E. F. Lindquist (Ed.), Educational measurement. Washington, D.C.: American Council on Education, 1951. Pp. 329-416.

Tyler, F. T., & Chalmers, T. M. The effect on scores of warning junior high school pupils of coming tests. J. educ. Res., 1943, 37, 290-296.

Winer, B. J. Statistical principles in experimental design. New York: McGraw-Hill, 1962.