
Foundational Models for 21st Century

Program Evaluation

by

Daniel L. Stufflebeam

The Evaluation Center

Western Michigan University

The Evaluation Center

Occasional Papers Series

December 1, 1999


This paper was prepared for The Evaluation Center’s Occasional Papers Series. It is based on a presentation in the State of the Evaluation Art and Future Directions in Educational Program Evaluation Invited Symposium at the annual meeting of the American Educational Research Association; Montreal, Quebec, Canada; April 20, 1999.

Appreciation is extended to colleagues who critiqued prior drafts of this paper, especially Sharon Barbour, Jerry Horn, Tom Kellaghan, Gary Miron, Craig Russon, James Sanders, Sally Veeder, Bill Wiersma, and Lori Wingate. While their valuable assistance is acknowledged, the author is responsible for the paper’s contents and especially any flaws.


Foundational Models for 21st Century

Program Evaluation

In moving to a new millennium, it is an opportune time for evaluators to critically appraise their program evaluation approaches and decide which ones are most worthy of continued application and further development. It is equally important to decide which approaches are best abandoned. In this spirit, this paper identifies and assesses 22 approaches often employed to evaluate programs. These approaches, in varying degrees, are unique and comprise most program evaluation efforts. Two of the approaches, reflecting the political realities of evaluation, are often used illegitimately to falsely characterize a program’s value and are labeled pseudoevaluations. The remaining 20 approaches are typically used legitimately to judge programs and are divided into questions/methods-oriented approaches, improvement/accountability approaches, and social agenda/advocacy approaches. The best program evaluation approaches appear to be Outcomes Monitoring/Value-Added Assessment, Case Study, Decision/Accountability, Consumer-Oriented, Client-Centered, Constructivist, and Utilization-Based, with the new Democratic Deliberative approach showing promise. The worst bets seem to be Politically Controlled, Public Relations, Accountability (especially payment by results), Clarification Hearings, and Program Theory-Based. The rest fall somewhere in the middle. All legitimate approaches are enhanced when keyed to and assessed against professional standards for evaluations.


Table of Contents

I    INTRODUCTION
     Overview of the Paper
          Evaluation Models and Approaches
          The Nature of Program Evaluation
          Need to Study Alternative Approaches
     Classifications of Alternative Evaluation Approaches
          Program Evaluation Defined
          Pseudoevaluations
          Questions/Methods-Oriented Approaches
          Improvement/Accountability-Oriented Evaluations
          Social Agenda-Directed (Advocacy) Models
          Caveats

II   PSEUDOEVALUATIONS
     Approach 1: Public Relations-Inspired Studies
     Approach 2: Politically Controlled Studies

III  QUESTIONS/METHODS-ORIENTED EVALUATION APPROACHES
     Approach 3: Objectives-Based Studies
     Approach 4: Accountability, Particularly Payment By Results Studies
     Approach 5: Objective Testing Programs
     Approach 6: Outcomes Monitoring/Value-Added Assessment
     Approach 7: Performance Testing
     Approach 8: Experimental Studies
     Approach 9: Management Information Systems
     Approach 10: Benefit-Cost Analysis Approach
     Approach 11: Clarification Hearing
     Approach 12: Case Study Evaluations
     Approach 13: Criticism and Connoisseurship
     Approach 14: Program Theory-Based Evaluation
     Approach 15: Mixed Methods Studies

IV   IMPROVEMENT/ACCOUNTABILITY-ORIENTED EVALUATION APPROACHES
     Approach 16: Decision/Accountability-Oriented Studies
     Approach 17: Consumer-Oriented Studies
     Approach 18: Accreditation/Certification Approach

V    SOCIAL AGENDA-DIRECTED (ADVOCACY) APPROACHES
     Approach 19: Client-Centered Studies (or Responsive Evaluation)
     Approach 20: Constructivist Evaluation
     Approach 21: Deliberative Democratic Evaluation
     Approach 22: Utilization-Focused Evaluation

VI   Best Approaches for 21st Century Evaluations
     Table 19: Ratings of the Strongest Program Evaluation Approaches
     Conclusions
     Recommendations

Notes

Bibliography

Appendix
     Checklist for Rating Evaluation Approaches in Relationship to
          The Joint Committee Program Evaluation Standards


Editor’s Note:

The Occasional Paper Series is published by The Evaluation Center on the campus of Western Michigan University. Its purpose is to advance the theory and practice of evaluation by reporting on new developments in the profession. Authors who contribute to the series retain copyright to their work. This allows them to publish early drafts of a paper, obtain feedback from readers, make necessary modifications, and go on to publish in other venues.

In this volume of the Series, published on the eve of a new millennium, Daniel Stufflebeam reviews the evaluation models that have emerged and identifies those that offer the greatest prospects for future success. Few in the profession are better able to do this than Stufflebeam. During his career, which spans nearly four decades, he has developed nearly 100 standardized tests, authored the CIPP evaluation model, served as the first Chair of the Joint Committee on Standards for Educational Evaluation, and pioneered the concept of metaevaluation.

The reader is invited to join the ranks of authors who have published in the Occasional Paper Series including Donald Campbell, Gene Glass, Arnold Love, James Sanders, Michael Scriven, Lori Shephard, Robert Stake, and Daniel Stufflebeam. Manuscripts should be 50-100 pages in length and significant to the field of evaluation. All submissions are reviewed for acceptability by the editorial team made up of the staff of The Evaluation Center.

Craig Russon, Ph.D.
Editor, The Occasional Paper Series


I. INTRODUCTION

Overview of the Paper

Evaluators today have at their disposal many more evaluation approaches than in 1960. As evaluators prepare to surmount the Y2K challenges and cross into the next century, it is an opportune time to consider what 20th century evaluation developments are best to take along and which ones would best be left behind. I have, in this paper, attempted to sort 22 alternative evaluation approaches into what fishermen sometimes call the “keepers” and the “throwbacks.” More importantly, I have attempted to characterize each approach; identify its strengths and weaknesses; and consider whether, when, and how each approach is best applied. The reviewed approaches emerged mainly in the U.S. between 1960 and 1999.

Following a period of relative inactivity in the 1950s, a succession of international and national forces stimulated the development of evaluation theory and practice. Main influences were the efforts to vastly strengthen the U.S. defense system spawned by the Soviet Union’s 1957 launching of Sputnik I; the new U.S. laws in the 1960s to equitably serve persons with disabilities and minorities; the federal evaluation requirements of the Great Society programs initiated in 1965; the U.S. movement begun in the 1970s to hold educational and social organizations accountable for both prudent use of resources and achievement of objectives; the stress on excellence in the 1980s as a means of increasing U.S. international competitiveness; and the trend in the 1990s for various organizations, both inside and outside the U.S., to employ evaluation to assure quality, competitiveness, and equity in delivering services. Education has consistently been at the heart of societal reforms in the U.S., and the U.S. society has repeatedly pressed educators to show through evaluation whether or not improvement efforts were succeeding.

The development of program evaluation as a field of professional practice was also spurred by a number of seminal writings. These included, in chronological order, publications by Tyler (1942, 1950), Campbell and Stanley (1963), Cronbach (1963), Stufflebeam (1966), Tyler (1966), Scriven (1967), Stake (1967), Stufflebeam (1967), Suchman (1967), Alkin (1969), Guba (1969), Provus (1969), Stufflebeam et al. (1971), Parlett and Hamilton (1972), Eisner (1975), Glass (1975), Cronbach and Associates (1980), House (1980), and Patton (1980). These and other authors/scholars began to project alternative approaches to program evaluation. In the ensuing years a rich literature on a wide variety of alternative program evaluation approaches developed [see, for example, Cronbach (1982); Guba and Lincoln (1981, 1989); Nave, Misch, and Mosteller (1999); Nevo (1993); Patton (1982, 1990, 1994, 1997); Rossi and Freeman (1993); Schwandt (1984); Scriven (1991, 1993, 1994a, 1994b, 1994c); Shadish, Cook, and Leviton (1991); Smith, M. F. (1989); Smith, N. L. (1987); Stake (1975b, 1988, 1995); Stufflebeam (1997); Stufflebeam and Shinkfield (1985); Wholey, Hatry, and Newcomer (1995); Worthen and Sanders (1987, 1997)].

Evaluation Models and Approaches

The chapter uses the term evaluation approach rather than evaluation model because, for one reason, the former is broad enough to cover illicit as well as laudatory practices. Also, beyond covering both creditable and noncreditable approaches, some authors of evaluation approaches say that the term model is too demanding to cover their published ideas about how to conduct program evaluations. But for these two considerations, the term model would have been used to encompass most of the evaluation proposals discussed in this chapter. This is so because most of the presented approaches are idealized or “model” views for conducting program evaluations according to the beliefs and experiences of their authors.

The Nature of Program Evaluation

The chapter employs a broad view of program evaluation. It encompasses evaluations of any coordinated set of activities directed at achieving goals. Examples are assessments of ongoing, cyclical curricular programs; time-bounded projects; and regional or state systems of services. Such program evaluations both overlap with and yet are distinguishable from other forms of evaluation, especially student evaluation, teacher evaluation, materials evaluation, and school evaluation. The program evaluation approaches that are considered cut across a wide array of programs and services, e.g., curriculum innovations, school health services, counseling, adult education, preschool, state systems of education, school-to-work projects, adult literacy, and parent involvement in schools. Clearly, program evaluation applies importantly to a broad array of activities.

Need to Study Alternative Approaches

The study of alternative evaluation approaches is vital for the professionalization of program evaluation and for its scientific advancement and operation. Professionally, careful study of the approaches being employed in the name of program evaluation can help evaluators legitimize approaches that comport with sound principles of evaluation and discredit those that don’t. Scientifically, such a review can help evaluation researchers identify, examine, and address conceptual and technical issues pertaining to the development of the evaluation discipline. Operationally, a critical view of alternatives can help evaluators consider and assess optional frameworks for planning and conducting particular studies. On this point, the author has found that different approaches may work differentially well, depending on the evaluation’s context. Often it is advantageous to borrow strengths of different approaches to create a “best fit” approach for specific evaluation projects. Thus, it behooves evaluators to develop a repertoire of different legitimate approaches they can use, plus the ability to discern which approaches work best under what circumstances. However, a main value in studying alternative program evaluation approaches is not to enshrine any of them. On the contrary, the purposes are to discover their strengths and weaknesses, decide which ones merit substantial use, determine when and how they are best applied, and obtain direction for improving these approaches and devising better alternatives.

Classifications of Alternative Evaluation Approaches

In analyzing the 22 alternative evaluation approaches, prior assessments regarding program evaluation’s state of the art were consulted. Stake’s analysis of 9 program evaluation approaches provided a useful application of advance organizers (the types of variables used to determine information requirements) for ascertaining different types of program evaluations.¹ Hastings’ review of the growth of evaluation theory and practice helped to place the evaluation field in a historical perspective.² Guba’s presentation and assessment of six major philosophies in evaluation was provocative.³ House’s (1983) analysis of different approaches illuminated important philosophical and theoretical distinctions. Finally, Scriven’s (1991, 1994a) writings on the transdiscipline of evaluation helped to sort out different evaluation approaches; they were also invaluable in seeing program evaluation approaches in the broader context of evaluations focused on various objects other than programs. Although the paper does not always agree with the conclusions put forward in these publications, all of the prior assessments helped sharpen the issues addressed.

Program Evaluation Defined

In characterizing and assessing different evaluation approaches, careful consideration was given to the various kinds of activities conducted in the name of program evaluation. These activities were classified based on their degree of conformity to a particular definition of evaluation. This chapter defines evaluation as a study designed and conducted to assist some audience to assess an object’s merit and worth. This definition should be widely acceptable because it agrees with common dictionary definitions of evaluation; also, it is consistent with the definition of evaluation that underlies published sets of professional standards for evaluations (Joint Committee 1981, 1994). However, it will become apparent that many studies done in the name of program evaluation either do not conform to this definition or directly oppose it.

The above definition of an evaluation study was used to classify program evaluation approaches into four categories. The first category includes approaches that promote invalid or incomplete findings (referred to as pseudoevaluations), while the other three include approaches that agree, more or less, with the employed definition of evaluation (i.e., Questions/Methods-Oriented, Improvement/Accountability, and Social Agenda/Advocacy).

Pseudoevaluations

This paper’s first group of program evaluation approaches includes what I have termed pseudoevaluations. These promote a positive or negative view of a program, irrespective of its actual merit and worth. Such studies often are motivated by political objectives, e.g., persons holding or seeking authority may present unwarranted claims about their achievements and/or the faults of their opponents or hide potentially damaging information. These objectionable approaches are presented because they deceive through evaluation and can be used by those in power to mislead constituents or to gain and maintain an unfair advantage over others, especially those persons with little power. If evaluators acquiesce to and support pseudoevaluations, they help promote and support injustice, mislead decision making, lower confidence in evaluation services, and discredit the evaluation profession. Thus, the paper discusses pseudoevaluations in order to sensitize professional evaluators and their clients to the prevalence of and harm caused by such inappropriate studies and to convince them to oppose such invalid evaluation practices.

Questions/Methods-Oriented Approaches

The second category of approaches includes studies that are oriented to (1) address specified questions whose answers may or may not be sufficient to assess a program’s merit and worth and/or (2) use some preferred method(s). These Questions/Methods-Oriented Approaches include studies that employ as their starting points operational objectives, standardized measurement devices, cost analysis procedures, expert judgment, a theory or model of a program, case study procedures, management information systems, designs for controlled experiments, and/or a commitment to employ a mixture of qualitative and quantitative methods. Most of them emphasize technical quality and posit that it is usually better to answer a few pointed questions well than to attempt a broad assessment of something’s merit and worth. Since these approaches tend to concentrate on methodological adequacy in answering given questions rather than determining a program’s value, the set of these approaches may be referred to as quasi-evaluation approaches. While they are typically labeled as evaluations, they may or may not meet the requirements of a sound evaluation.

Improvement/Accountability-Oriented Evaluations

The third set of approaches involves studies designed primarily to assess and/or improve a program’s merit and worth. These are labeled Improvement/Accountability-Oriented Evaluations. They are expansive and seek comprehensiveness in considering the full range of questions and criteria needed to assess a program’s value. Often they employ the assessed needs of a program’s stakeholders as the foundational criteria for assessing the program’s merit and worth. They seek to examine the full range of appropriate technical and economic criteria for judging program plans and operations. They also look for all relevant outcomes, not just those keyed to program objectives. Such studies sometimes are overly ambitious in trying to provide broad-based assessments leading to definitive, documented, and unimpeachable judgments of merit and worth. Typically, they must use multiple qualitative and quantitative assessment methods to provide cross-checks on findings. In general, these approaches conform closely to this paper’s definition of evaluation.

Social Agenda-Directed (Advocacy) Models

The fourth category of approaches is labeled Social Agenda-Directed (Advocacy) Models. The approaches in this group are quite heavily oriented to employing the perspectives of stakeholders as well as experts in characterizing, investigating, and judging programs. Mainly, they eschew the possibility of finding right or best answers and reflect the philosophy of postmodernism, with its attendant stress on cultural pluralism, moral relativity, and multiple realities. Typically, these evaluation approaches favor a constructivist orientation and the use of qualitative methods. These evaluation approaches emphasize the importance of democratically engaging stakeholders in obtaining and interpreting findings. They also stress serving the interests of underprivileged groups. Worries about these approaches are that they might concentrate so heavily on serving a social mission that they fail to meet the standards of a sound evaluation. For example, if an evaluator is so intent on serving the underprivileged, empowering the disenfranchised, and/or righting educational and/or social injustices, he or she might compromise the independent, impartial perspective needed to produce valid findings. In the extreme, an advocacy evaluation could compromise the integrity of the evaluation process in order to achieve social objectives and thus devolve into a pseudoevaluation.

The particular social agenda/advocacy approaches presented in this paper seem to have sufficient safeguards needed to walk the fine line between sound evaluation services and politically corrupted evaluations. Worries about bias control in these approaches increase the importance of subjecting advocacy evaluations to metaevaluations grounded in standards for sound evaluations.

Of the 22 program evaluation approaches discussed, 2 are classified as pseudoevaluations, 13 as questions/methods-oriented approaches, 3 as improvement/accountability-oriented approaches, and 4 as social agenda/advocacy-directed approaches. The analysis of the 20 legitimate approaches is preceded with a discussion of the 2 approaches that often are used to distort findings and conclusions. The latter group is considered because evaluators and clients should be alert to and reject approaches that often are masqueraded as sound evaluations, but in reality lack truthfulness and integrity.

Each approach is analyzed in terms of ten descriptors: (1) advance organizers, that is, the main cues that evaluators use to set up a study; (2) main purpose(s) served; (3) sources of questions addressed; (4) questions that are characteristic of each study type; (5) methods typically employed; (6) persons who pioneered in conceptualizing each study type; (7) other persons who have extended development and use of each study type; (8) key considerations in determining when to use each approach; (9) strengths of the approach; and (10) weaknesses of the approach. Using these descriptors, comments on each of the 22 program evaluation approaches are presented. These assessments are then used to reach conclusions about which approaches should be avoided, which are most meritorious, and under what circumstances the worthy approaches are best applied.

Caveats

I acknowledge, without apology, that the assessments of approaches and the entries in the charts throughout the paper are mainly my best judgments. I have taken no poll, and no definitive research exists to represent a consensus on the characteristics and strengths and weaknesses of the different approaches. My analyses reflect 35 years of experience in applying and studying different evaluation approaches. Hopefully, as parochial as these might be, they will be useful to evaluators and evaluation students at least in the form of working hypotheses to be tested.

Also, I have mainly looked at the approaches as relatively discrete ways to conduct evaluations. In reality, there are many occasions when it is functional to mix and match different approaches. A careful analysis of such combinatorial applications no doubt would produce several hybrid approaches for analysis. Unfortunately, that step is beyond the scope of what I have attempted here.


II. PSEUDOEVALUATIONS

Because this paper is focused on describing and assessing the state of the art in evaluation, it is necessary to discuss bad and questionable practices, as well as the best efforts. Evaluations can be viewed as threatening or approached in opportunistic ways. In such cases, evaluators and their clients are sometimes tempted to shade, selectively release, or even falsify findings. While such efforts may look like sound evaluations, they are judged in this analysis to be pseudoevaluations if they do not forthrightly attempt to produce and report to all right-to-know audiences valid assessments of merit and worth. The first type of pseudoevaluation considered—the Public Relations approach—may meet the standard for addressing all right-to-know audiences but fails as a legitimate evaluation approach, because typically it presents a program’s strengths (or an exaggerated view of them) but not its weaknesses. The second pseudoevaluation approach—Politically Controlled evaluation—may be quite strong in obtaining valid information but fail as a sound evaluation by either withholding information from right-to-know audiences or releasing only those parts that are advantageous to the client.

Approach 1: Public Relations-Inspired Studies

The public relations approach begins with an intention to use data to convince constituents that a program is sound and effective. Other names for the approach are “ideological marketing” (see Ferguson, June 1999), advertising, and infomercial.

The advance organizer is the propagandist’s information needs. The study’s purpose is to help the program director/public relations official project a convincing, positive public image for a program, project, process, organization, leadership, etc. The guiding questions are derived from the public relations specialists’ and administrators’ conceptions of which questions would be most popular with their constituents. In general, the public relations study seeks information that would most help an organization confirm its claims of excellence and secure public support. From the start, this type of study seeks not a valid assessment of merit and worth but information needed to help the program “put its best foot forward.” Such studies avoid gathering or releasing negative findings.

Typical methods used in public relations studies are biased surveys, inappropriate use of norms tables, biased selection of testimonials and anecdotes, “massaging” of obtained information, selective release of only the positive findings, cover-up of embarrassing incidents, and the use of “expert,” advocate consultants. In contrast to the “critical friends” employed in Australian evaluations, public relations studies use “friendly critics.” A pervasive characteristic of the public relations evaluator’s use of dubious methods is a biased attempt to nurture a good picture for the program being evaluated. The fatal flaw of built-in bias to report only good things offsets any virtues of this approach. If an organization substitutes biased reporting of only positive findings for balanced evaluations of strengths and weaknesses, it soon will demoralize evaluators who are trying to conduct and report valid evaluations and may discredit its overall practice of evaluation.

By disseminating only positive information on a program’s performance while withholding information on shortcomings and problems, evaluators and clients may mislead the taxpayers, constituents, and other stakeholders concerning the program’s true value. The possibility of such positive bias in advocacy evaluations underlies the longstanding policy of Consumers Union not to include advertising by the owners of the products and services being evaluated in its Consumer Reports magazine. In order to maintain credibility with consumers, Consumers Union has steadfastly maintained an independent perspective and a commitment to identify and report both strengths and weaknesses in the items evaluated and not to supplement this information with biased ads.

A contact with an urban school district illustrates the public relations type of study. A superintendent requested a community survey for his district. The superintendent said, straightforwardly, that he wanted a survey that would yield a positive report on the district’s performance and his leadership. He said such a positive report was desperately needed at the time so that the community would restore confidence in the school district and him. The superintendent did not get the survey and positive report, and it soon became clear why he thought one was needed. Several weeks after making the request, he was summarily fired. Another example occurred when a large urban school district used one set of national norms to interpret pretest results and another norms table for the posttest. The result was a spurious portrayal and attendant wrong conclusion that the students’ test performance had vastly improved between the first and second test administrations. Still another example was seen when an evaluator gave her superintendent a sound program evaluation report, showing both strengths and weaknesses of the targeted program. The evaluator was surprised and dismayed one week later, when the superintendent released to the public a revised version showing only the program’s strengths.

Evaluators need to be cautious in how they relate to the public relations activities of their sponsors, clients, and supervisors. Certainly, public relations documents will reference information from sound evaluations. Evaluators should persuade their audiences to make honest use of the evaluation findings. Evaluators should not be party to misuses, especially in cases where erroneous reports are issued that predictably will mislead readers to believe that a seriously flawed program is good. As one safeguard, evaluators can promote and help their clients arrange to have independent metaevaluators examine the organization’s production and use of evaluation findings against professional standards for evaluations.

Approach 2: Politically Controlled Studies

The politically controlled study is an approach that can be either defensible or indefensible. A politically controlled study is illicit if the evaluator and/or client (a) withhold the full set of evaluation findings from audiences who have express, legitimate, and legal rights to see the findings; (b) abrogate their prior agreement to fully disclose the evaluation findings; or (c) bias the evaluation message by releasing only part of the findings. It is not legitimate for a client first to agree to make the findings of a commissioned evaluation publicly available and then, having previewed the results, to release none or only part of the findings. If and when a client or evaluator violates the formal written agreement on disseminating findings or applicable law, then the other party has a right to take appropriate actions and/or seek an administrative or legal remedy.

However, clients sometimes can legitimately commission covert studies and keep the findings private, while meeting applicable laws and adhering to an appropriate advance agreement with the evaluator. This is especially the case in the U.S. for private organizations not governed by public disclosure laws. Also, an evaluator, under legal contractual agreements, can plan, conduct, and report an evaluation for private purposes, while not disclosing the findings to any outside party. The key to keeping client-controlled studies in legitimate territory is to reach appropriate, legally defensible, advance, written agreements and to adhere to the contractual provisions concerning release of the study’s findings. Such studies also have to conform to applicable laws on release of information.

The advance organizers for a politically controlled study include implicit or explicit threats faced by the client for a program evaluation and/or objectives for winning political contests. The client’s purpose in commissioning such a study is to secure assistance in acquiring, maintaining, or increasing influence, power, and/or money. The questions addressed are those of interest to the client and special groups that share the client’s interests and aims. The main questions of interest to the client are, What is the truth, as best can be determined, surrounding the particular dispute or political situation? What information would be advantageous in a potential conflict situation? What data might be used advantageously in a confrontation? Typical methods of conducting the politically controlled study include covert investigations, simulation studies, private polls, private information files, and selective release of findings. Generally, the client wants obtained information to be as technically sound as possible. However, he or she may also want to withhold findings that do not support his or her position. The approach’s strength is that it stresses the need for accurate information. However, because the client might release information selectively to create or sustain an erroneous picture of a program’s merit and worth, might distort or misrepresent the findings, might violate a prior agreement to fully release the findings, or might violate a “public’s right to know” law, this type of study can degenerate into a pseudoevaluation.

For obvious reasons, persons have not been nominated to receive credit as pioneers or developers of the illicit, politically controlled study. To avoid the inference that this type of study is imaginary, consider the following examples.

A superintendent of one of the nation’s largest public school districts once confided that he possessed an extensive notebook of detailed information about each school building in his district. The information included student achievement, teacher qualifications, racial mix of teachers and students, average per-pupil expenditure, socioeconomic characteristics of the student body, teachers’ average length of tenure in the system, and so forth. The aforementioned data revealed a highly segregated district with uneven distribution of resources and markedly different achievement levels across schools. When asked why all the notebook’s entries were in pencil, the superintendent replied it was absolutely essential that he be kept informed about the current situation in each school; but he said it was also imperative that the community-at-large, the board, and special interest groups in the community, in particular, not have access to the information, for any of these groups might point to the district’s inequities as a basis for protest and even removing the superintendent. Hence, one special assistant kept the document up-to-date; only one copy existed, and the superintendent kept that locked in his desk. The point of this example is not to negatively judge the superintendent’s behavior. Instead, the point is that the superintendent’s ongoing covert investigation and selective release of information was decidedly not a case of true evaluation, for what he disclosed to the right-to-know audiences did not fully and honestly inform them about the observed situation in the district. This example may appropriately be termed a pseudoevaluation because it both underinformed and misinformed the school district’s stakeholders.

Cases like this undoubtedly led to the federal and state sunshine laws in the United States. Under current U.S. and state freedom of information provisions, most information obtained through the use of public funds must be made available to interested and potentially affected citizens. Thus, there exist legal deterrents to and remedies for illicit, politically controlled evaluations that use public funds.

While it would be unrealistic to recommend that administrators and other evaluation users not obtain and selectively employ information for political gain, they should not misrepresent their politically controlled information-gathering and reporting activities as sound evaluation. Evaluators should not lend their names and endorsements to evaluations presented by their clients that misrepresent the full set of relevant findings, that present falsified reports aimed at winning political contests, or that violate applicable laws and/or prior formal agreements on release of findings.

Before addressing the next group of study types, a few additional comments are in order concerning pseudoevaluation studies. These approaches have been considered because they are a prominent part of the evaluation scene. Sometimes “evaluators” and their clients are co-conspirators in performing a purposely misleading study. On other occasions, evaluators, believing they are doing an assessment that is impartial, technically sound, and contracted to inform the public, discover that their client had other intentions or decides to abrogate prior evaluation agreements. When the time is right, the client is able to subvert the study in favor of producing the desired biased picture or none at all. It is imperative that evaluators be more alert than they often are to these kinds of potential conflicts. Otherwise, they will be unwitting accomplices in efforts to mislead through evaluation.

Such instances of misleading constituents through purposely biased reports or cover-up of findings, to which the public has a right, underscore the importance of having professional standards for evaluation work, faithfully applying them, and periodically engaging outside evaluators to assess one’s evaluation work. It is also prudent to develop advance contracts and memoranda of agreement to ensure that the sponsor and evaluator agree on procedures and safeguards to assure that the evaluation will comply with canons of sound evaluation and pertinent legal requirements. Despite these warnings, it can be legitimate for evaluators to give private evaluative feedback to clients, provided that applicable laws, statutes, and policies are met and sound contractual agreements on release of findings are reached and honored.


III. QUESTIONS/METHODS-ORIENTED EVALUATION APPROACHES

Questions/methods-oriented program evaluation approaches are so labeled because they start with particular questions and then move to the methodology appropriate for answering the questions. Only subsequently do they consider whether the questions and methodology are appropriate for developing and supporting value claims. These studies can be called quasi-evaluation studies, because sometimes they happen to provide evidence that fully assesses a program’s merit and worth, while in other cases their focus is too narrow or is only tangential to questions of merit and worth. Quasi-evaluation studies have legitimate uses apart from their relationship to program evaluation, since they can focus on important questions, even though they are narrow in scope. The main caution is that these types of studies not be uncritically equated with evaluation.

Approach 3: Objectives-Based Studies

The objectives-based study is the classic example of a questions/methods-oriented evaluation approach (Madaus & Stufflebeam, 1988). In this approach, some statement of objectives provides the advance organizer. The objectives may be mandated by the client, formulated by the evaluator, or specified by the service providers. The usual purpose of an objectives-based study is to determine whether the program’s objectives have been achieved. Program developers, sponsors, and managers are typical audiences for such a study. These audiences want to know the extent to which each stated objective was achieved.

The methods used in objectives-based studies essentially involve specifying operational objectives and collecting and analyzing pertinent information to determine how well each objective was achieved. A wide range of objective and performance assessments may be employed. Criterion-referenced tests are especially relevant to this evaluation approach.

Ralph Tyler is generally acknowledged to be the pioneer in the objectives-based type of study, although Percy Bridgman and E. L. Thorndike probably should be credited along with Tyler.⁴ Many people have furthered the work of Tyler by developing variations of his objectives-based evaluation model. A few of them are Bloom et al. (1956), Hammond (1972), Metfessel and Michael (1967), Popham (1969), Provus (1971), and Steinmetz (1983).

The objectives-based approach is especially applicable in assessing tightly focused projects that have clear, supportable objectives. Even then, such studies can be strengthened by judging project objectives against the intended beneficiaries’ assessed needs, searching for side effects, and studying the process as well as the outcomes.

Undoubtedly, the objectives-based study has been the most prevalent approach used in the name of program evaluation. It is one that has good common sense appeal; program administrators have had a great amount of experience with it; and it makes use of technologies of behavioral objectives and both norm-referenced and criterion-referenced testing. Common criticisms are that such studies lead to terminal information that is of little use in improving a program or other enterprise; that this information often is far too narrow in scope to constitute a sufficient basis for judging the object’s merit and worth; relatedly, that they do not uncover positive and negative side effects; and that they may credit unworthy objectives.

Approach 4: Accountability, Particularly Payment By Results Studies

The accountability study became prominent in the early 1970s. Its emergence seems to have been connected to widespread disenchantment with the persistent stream of evaluation reports indicating that almost none of the massive state and federal investments in educational and social programs were making any positive, statistically discernable difference. One proposed solution posited that accountability systems could be initiated to ensure both that service providers would carry out their responsibilities to improve services and that evaluators would do a thorough job of identifying the effects of improvement programs and determining which persons and groups were succeeding and which were not.

The advance organizers for the accountability study are the persons and groups responsible for producing results, the service providers’ work responsibilities, and the expected outcomes. The study’s purposes are to provide constituents with an accurate accounting of results, to ensure that the results are primarily positive, and to pinpoint responsibility for good and bad outcomes. Sometimes accountability programs administer both sanctions and rewards to the responsible service providers, depending on the extent and quality of their services and achievement.

The questions addressed in accountability studies come from the program’s constituents and controllers, such as taxpayers; parent groups; school boards; and local, state, and national funding organizations. The main question that the groups want answered concerns whether each involved service provider and organization charged with responsibility for delivering and improving services is carrying out its assignments and achieving all it should, given the investments of resources to support the work.

A wide variety of methods have been used to ensure and assess accountability. These include performance contracting; Program Planning and Budgeting System (PPBS); Management By Objectives (MBO); Zero Based Budgeting; mandated “program drivers” and indicators; program input, process, and output databases; independent goal achievement auditors; procedural compliance audits; peer review; merit pay for individuals and/or organizations; collective bargaining agreements; mandated testing programs; institutional report cards; self-studies; site visits by expert panels; and procedures for auditing the design, process, and results of self-studies. Also included are mandated goals and standards, decentralization and careful definition of responsibility and authority, payment by results, awards and recognition, sanctions, takeover/intervention authority by oversight bodies, and competitive bidding.

Lessinger (1970) is generally acknowledged as a pioneer in the area of accountability. Some of the people who have extended Lessinger’s work are Stenner and Webster, in their development of a handbook for conducting auditing activities,⁵ and Kearney, in providing leadership to the Michigan Department of Education in developing the first statewide educational accountability system. A recent major attempt at accountability, involving sanctions and rewards, was the ill-fated, heavily funded Kentucky Instructional Results Information System (Koretz & Barron, 1998). The failure of this program was clearly associated with fast-paced implementation in advance of validation, reporting and later retraction of flawed results, results that were not comparable to those in other states, payment by results that fostered teaching the tests and other cheating in the schools, and heavy expense associated with performance assessments that could not be sustained over time. Kirst (1990) analyzed the history and diversity of attempts at accountability in education within the following six broad types of accountability: performance reporting, monitoring and compliance with standards or regulations, incentive systems, reliance on the market, changing locus of authority or control of schools, and changing professional roles.

Accountability approaches are applicable to organizations and professionals funded and charged to carry out public mandates, deliver public services, implement specially funded programs, etc. It behooves these program leaders to maintain a dynamic baseline of information needed to demonstrate fulfillment of responsibilities and achievement of positive results. They should focus accountability mechanisms especially on those program elements that can be changed with the prospect of improving outcomes. They should also focus accountability to enhance staff cooperation toward achievement of collective goals rather than to stimulate counterproductive competition. Moreover, accountability studies that compare different programs should fairly consider the programs’ different contexts, including especially beneficiaries’ characteristics and needs, local support, available resources, and external forces.

The main advantages of accountability studies are that they are popular among constituent groups and politicians and are aimed at improving public services. Also, they can provide program personnel with clear expectations against which to plan, execute, and report on their services and contributions. They can also be designed to give service providers both freedom to innovate on procedures and clear expectations and requirements for producing and reporting on sound outcomes. In addition, setting up healthy, fair competition between comparable programs can result in better services and products for consumers.

A main disadvantage is that accountability studies often issue invidious comparisons and thereby produce unhealthy competition and much political unrest and acrimony among educators and between them and their constituents. Also, accountability studies often focus too narrowly on outcome indicators and can undesirably narrow the range of services provided. Another disadvantage is that politicians tend to force the implementation of accountability efforts before the needed instruments, scoring rubrics, assessor training, etc. can be planned, developed, field-tested, and validated. Furthermore, prospects for rewards and threats of sanctions have often led service providers to cheat in order to assure positive evaluation reports. For example, in schools, cheating to obtain rewards and avoid sanctions has frequently generated bad teaching, bad press, and turnover in leadership.

Approach 5: Objective Testing Programs

Since the 1930s, American education has been inundated with standardized, multiple choice, norm-referenced testing programs. Probably every school district in the United States has some type of standardized testing program of this type. Such tests are administered annually by local school districts and/or state education departments to inform students, parents, educators, and the public at large about the achievements of children and youth. Their main purposes are to assess the achievements of individual students and groups of students compared to norms and/or standards. Typically, these tests are administered to all students in applicable grade levels. Because these test results focus on student outcomes and are conveniently available, many educators have tried to use the results to evaluate the quality of special projects and specific school programs by inferring that high scores reflect successful efforts and that low scores reflect poor efforts. Such inferences can be erroneous if the tests were not targeted on particular project or program objectives or the needs of particular target groups of students and if the students’ background characteristics were not taken into account.

Advance organizers for standardized educational tests include areas of the school curriculum and specified norm groups. The main purposes of testing programs are to compare the test performance of individual students and groups of students to those of selected norm groups and/or to diagnose shortfalls related to particular objectives. Additionally, standardized test results are often used to compare the performance of different programs, schools, etc., and to examine achievement trends across years. Metrics used to make the comparisons typically are standardized individual and mean scores for the total test and subtests.

The sources of questions addressed by testing programs are usually test publishers and test development/selection committees. The typical question addressed by these tests concerns whether the test performance of individual students is at or above the average performance of local, state, and national norm groups. Other questions may concern the percentages of students who surpassed one or more cut-score standards, where the group of students ranks in comparison with other groups, or whether the current year’s achievement is better than in prior years. The main process involved in using testing programs is to select, administer, score, interpret, and report the tests.
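To make these norm-referenced comparisons concrete, here is a minimal sketch (not drawn from the paper; the norm-group statistics, cut score, and student scores are hypothetical) of how a raw score can be converted to a standard score and percentile rank against a norm group, and how a group's standing relative to a cut score might be summarized.

```python
from statistics import NormalDist, mean

# Hypothetical norm-group parameters for one subtest; in practice these
# would come from the publisher's norms tables.
NORM_MEAN = 50.0   # norm-group mean raw score
NORM_SD = 10.0     # norm-group standard deviation
CUT_SCORE = 55.0   # illustrative proficiency cut score

def standard_score(raw):
    """Express a raw score in norm-group standard-deviation units (z-score)."""
    return (raw - NORM_MEAN) / NORM_SD

def percentile_rank(raw):
    """Percentile rank relative to the norm group, assuming roughly normal norms."""
    return 100 * NormalDist(NORM_MEAN, NORM_SD).cdf(raw)

def group_summary(raw_scores):
    """Summarize a class or school: mean raw score, mean z-score, percent at or above the cut."""
    return {
        "mean_raw": mean(raw_scores),
        "mean_z": standard_score(mean(raw_scores)),
        "pct_at_or_above_cut": 100 * sum(s >= CUT_SCORE for s in raw_scores) / len(raw_scores),
    }

print(round(percentile_rank(62), 1))            # one student's percentile rank
print(group_summary([41, 48, 55, 59, 63, 70]))  # a small, made-up class
```

As the surrounding discussion cautions, such metrics describe standing relative to a norm group; they say nothing by themselves about whether the tested content matches a particular program's objectives.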

Lindquist (1951), a major pioneer in this area, was instrumental in developing the Iowa testing programs, the American College Testing Program, the National Merit Scholarship Testing Program, and the General Educational Development Testing Program, as well as the Measurement Research Center at the University of Iowa. Many people have contributed substantially to the development of educational testing in America, including Ebel (1965), Flanagan (1939), Lord and Novick (1968), and Thorndike (1971). In the 1990s a number of persons innovated in such areas of testing as item response theory (Hambleton & Swaminathan, 1985) and value-added measurement (Sanders & Horn, 1994; Webster, 1995).

Virtually all public schools in the U.S. engage in one or more forms of standardized, objective achievement testing. If the school’s personnel carefully select such tests and use them appropriately to assess and improve student learning and report to the public, the involved expense and effort is highly justified. However, they should be careful not to rely on these results for evaluating specially targeted projects and programs. Student outcome measures for judging specific projects and programs must be validated in terms of the particular objectives and the characteristics and needs of the students being served by the program.

The main advantages of standardized-testing programs are that they are efficient in producing valid and reliable information on student performance in many areas of the school curriculum and that they are a familiar strategy at every level of the school program in virtually all school districts in the United States. The main limitations are that they provide data only about student outcomes; they reinforce students’ multiple-choice test-taking behavior rather than their writing and speaking behaviors; they tend to address only lower-order learning objectives; and, in many cases, they are perhaps a better indicator of the socioeconomic levels of the students in a given program, school, or school district than of the quality of the implicated teaching and learning. Stake (1971) and others have argued effectively that standardized tests often are poor approximations of what teachers actually teach. Moreover, as has been patently clear in evaluations of programs for both disadvantaged students and gifted students, norm-referenced tests often do not measure achievements well for the low and high scoring students. Unfortunately, program evaluators often have made uncritical uses of standardized test results to judge a program’s outcomes, just because the results are conveniently available and have face validity to the public. Many times the contents of such tests do not match the program’s objectives. Also, they may measure well the differences between students in the middle of the achievement distribution but poorly for the slow learners often targeted by special education programs and for high achievers.

Approach 6: Outcomes Monitoring/Value-Added Assessments

Recurrent outcomes/value-added assessment is a special case of the use of standardized testing to evaluate the effects of programs and policies. The emphasis here is on annual testing in order to assess trends and partial out effects of the different levels and components of an educational system. Characteristic of this approach is the cyclical collection of outcome measures based on standardized indicators, analysis of results in relation to policy questions, and reporting of overall results plus specific policy-relevant analyses. The main interest is in aggregate, not individual, performance. A state education department may regularly collect achievement data from all students (at selected grade levels), as is the case in the Tennessee Value-Added Assessment System. The evaluator may analyze the data to look at contrasting results related to particular objectives for schools using and not using particular programs. These results may be further broken out to make comparisons between classes, curricular areas, grade levels, teachers, schools, different size and resource classifications of schools, districts, and different areas of a state. This approach differs from the typical standardized achievement testing program in its emphasis on uncovering and analyzing policy issues rather than only reporting on students’ progress. Otherwise, the two approaches have much in common.

The advance organizers in monitoring outcomes and employing value-added analysis are the indicators of expected and possible outcomes and the scheme for classifying results to examine policy issues and/or program effects. The purposes of Outcomes Monitoring/Value-Added Assessment systems are direction for policymaking, accountability to constituents, and feedback for improving programs and services. This approach also ensures standardization of data for assessment and improvement throughout a system. The questions to be addressed by such monitoring systems originate from funding organizations, policymakers, the system’s professionals, and constituents.

Illustrative questions addressed by Outcomes Monitoring/Value-Added Assessment systems are: To what extent are particular programs adding value to students' achievement? What are the cross-year trends in outcomes? In what sectors of the system is the program working best and poorest? What are key, pervasive shortfalls in particular program objectives that require further study and attention? To what extent are program successes and failures associated with the system's different organizational levels?

Developers of the Outcomes Monitoring/Value-Added Assessment approach include especially William Sanders and Sandra Horn (1994); William Webster (1995); Webster, Mendro, and Almaguer (1994); and Peter Tymms (1995). These developers have used census data on student achievement trends to diagnose areas for improvement and look for effects of programs and policies. What distinguishes the Outcomes Monitoring/Value-Added Assessment approach from the traditional standardized testing program is sophisticated analysis of data to partial out effects of programs and policies and to identify areas where new policies and programs are needed. In contrast to these applications, the typical standardized testing program is focused more on providing feedback on the performance of individual students and groups of students, without the attendant policy-oriented analysis. Probably the Outcomes Monitoring/Value-Added Assessment approach is mainly feasible for well-endowed state education departments and large school districts where there is strong support from policy groups, administrators, and service providers to make the approach work. It requires systemwide buy-in; politically effective leaders to continually explain and sell the program; a smoothly operating, dynamic, computerized baseline of relevant input and output information; highly skilled technicians to make it run efficiently and accurately; complicated statistical analysis; and high-level commitment to use the results for purposes of policy development, accountability, program evaluation, and improvement at all levels of the system.

The central advantage of Outcomes Monitoring/Value-Added Assessment is in the systematization and institutionalization of a database of outcomes that can be used over time and in a standardized way to study and find means to improve outcomes. Also, Outcomes Monitoring/Value-Added Assessment is conducive to using a standard of continuous progress across years for every student as opposed to employing static cut scores. The latter, while prevalent in accountability programs, basically fail to take into account meaningful gains by low or high achieving students, since these gains usually are far removed from the static cut-score standards. Also, Sanders and Horn (1994) have shown that use of static cut scores may produce a "shed pattern," in which students who began below the cut score make the greatest gains while those who started above the cut-score standard make little progress. Like the sloping roof of a tool shed, the gains are greatest for previously low-scoring students and progressively lower for the higher achievers. This suggests that teachers are concentrating mainly on getting students to the cut-score standard but not beyond it and thus "holding back the high achievers." This approach makes efficient use of standardized tests; is amenable to analysis of trends at state, district, school, and classroom levels; uses students as their own controls; and emphasizes service to every student.
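To make the contrast concrete, here is a minimal sketch, using wholly hypothetical scores, of how a continuous-progress standard and a static cut-score standard summarize the same results differently:

```python
# Minimal sketch with hypothetical scores: contrast a continuous-progress
# standard (mean gain per student) with a static cut-score standard
# (percent of students at or above the cut score).

students = [
    # (student, last year's score, this year's score)
    ("A", 120, 145),   # low scorer, large gain
    ("B", 150, 170),   # below cut, solid gain
    ("C", 178, 180),   # just under cut, small gain
    ("D", 205, 206),   # high scorer, little progress
]
CUT_SCORE = 180  # hypothetical proficiency cut score

gains = [(name, post - pre) for name, pre, post in students]
mean_gain = sum(g for _, g in gains) / len(gains)
pct_at_cut = 100 * sum(post >= CUT_SCORE for _, _, post in students) / len(students)

print("Gain per student:", gains)
print(f"Mean gain (continuous-progress view): {mean_gain:.1f}")
print(f"Percent at/above cut score (static view): {pct_at_cut:.0f}%")
# The static view counts D as a success despite a 1-point gain and gives A
# no credit for a 25-point gain unless A crosses the cut -- the incentive
# pattern behind the "shed" effect described above.
```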

A major disadvantage of this approach is that it is politically volatile, since it is used to identify responsibility for successes and failures down to the levels of schools and teachers. Also, it is constrained mainly to the use of quantitative information, such as that coming from standardized, multiple-choice achievement tests. Consequently, the complex and powerful analyses are based on a limited scope of outcome variables. Nevertheless, Sanders (1989) has argued that a strong body of evidence supports the use of well-constructed, standardized, multiple-choice achievement tests. Beyond the issue of outcome measures, the approach does not provide in-depth documentation of program inputs and processes and makes little if any use of qualitative methods. Despite the advancements in objective measurement and the employment of hierarchical mixed models to defensibly partial out the effects of a system's organizational components and individual staff members, critics of the approach argue that causal factors are so complex that no measurement and analysis system can fairly fix responsibility at the level of individual teachers for the academic progress of individual students or groups of students.
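For readers who want a concrete picture of the statistical machinery, the following is a minimal sketch of a hierarchical (mixed) model of the general kind alluded to above, using the statsmodels library; the column names and data are hypothetical, and the sketch is not the Sanders or Webster estimation procedure itself:

```python
# Minimal sketch of a hierarchical (mixed) model for estimating a program
# effect on test-score gains while treating schools as a random effect.
# Column names (gain, program, school) and the data are hypothetical; the
# published value-added systems use far more elaborate models.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "gain":    [12, 15, 8, 20, 5, 9, 14, 18, 7, 11, 16, 10],
    "program": [1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0],   # 1 = in the program
    "school":  ["S1", "S1", "S1", "S2", "S2", "S2",
                "S3", "S3", "S3", "S4", "S4", "S4"],
})

# Fixed effect: program participation; random intercept: school.
model = smf.mixedlm("gain ~ program", df, groups=df["school"])
result = model.fit()
print(result.summary())  # estimated program effect net of school-level variation
```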


Approach 7: Performance Testing

In the 1990s, there were major efforts to offset the limitations of the typical multiple-choice tests by employing performance or authentic measures. These are devices that require students to demonstrate the performance being assessed by producing authentic responses, such as written or spoken answers, musical or psychomotor presentations, portfolios of work products, or group solutions to defined problems. Arguments given for such performance tests are that they have high face validity and model and reinforce the skills that students should be acquiring through their studies. For example, students are not being taught so that they will do well in choosing best answers from a list, but so that they will master the underlying understandings and skills and effectively apply them to real-life problems.

The advance organizers in performance assessments are life-skill objectives and content-related performance tasks, plus ways that their achievement can be demonstrated in practice. The main purpose of performance tests is to compare the test performance of individual students and groups of students to model performance on the assessment tasks. Grades assigned to each respondent's performance, using set rubrics, enable assessment of the quality of the achievements represented and comparisons across groups.

The sources of questions that performance tests address are analyses of selected life-skill tasks and content specifications in curricular materials. The typical questions addressed by performance tests concern whether individual students can effectively write, speak, figure, analyze, lead, work cooperatively, and solve given problems up to the level of acceptable standards. The main process involved in using performance tests is to define the areas of skills to be assessed; select the type of assessment device; construct the assessment tasks; determine scoring rubrics; define standards for assessing performance; train and calibrate scorers; validate the measures; and administer, score, interpret, and report the test results.
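As one concrete illustration of the "train and calibrate scorers" step, the following minimal sketch, with hypothetical rubric scores on a 1-4 scale, checks how closely two raters agree before live scoring begins:

```python
# Minimal sketch, using hypothetical rubric scores on a 1-4 scale, of two
# common calibration checks for performance-test scorers: exact agreement
# and Cohen's kappa (agreement corrected for chance).
from collections import Counter

rater_a = [4, 3, 3, 2, 4, 1, 3, 2, 2, 4]
rater_b = [4, 3, 2, 2, 4, 1, 3, 3, 2, 4]
n = len(rater_a)

exact = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement: probability both raters assign the same category if each
# scored independently at their own observed rates.
freq_a, freq_b = Counter(rater_a), Counter(rater_b)
chance = sum(freq_a[c] * freq_b[c] for c in set(rater_a) | set(rater_b)) / n**2
kappa = (exact - chance) / (1 - chance)

print(f"Exact agreement: {exact:.2f}")   # 0.80 for these hypothetical scores
print(f"Cohen's kappa:   {kappa:.2f}")   # agreement beyond chance
```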

In speaking of licensing tests, Flexner (1910) called for tests that ascertain students' practical ability to successfully confront and solve problems in concrete cases. Some of the pioneers in applying performance assessment to state education systems were the state education departments in Vermont and Kentucky (Kentucky Department of Education, 1993; Koretz, 1986, 1996; Koretz & Barron, 1998). Other sources of information about the general approach and issues in performance testing include Baker, O'Neil, and Linn (1993); Herman, Gearhart, and Baker (1993); Linn, Baker, and Dunbar (1991); Mehrens (1972); Messick (1994); Stillman, Haley, Regan, Philbin, Smith, O'Donnell, and Pohl (1991); Swanson, Norman, and Linn (1995); Torrance (1993); and Wiggins (1989).

Often it is difficult to obtain the conditions necessary to employ the performance testing approach. It requires a huge outlay of time and resources for development and application. Typically, state education departments and school districts probably should use this approach very selectively and only when they can make the investment needed to produce valid results that justify the cost. On the other hand, students' writing ability is best assessed and nurtured through obtaining, assessing, and providing critical feedback on students' writing samples.

The main advantages of performance testing programs are that they require students to construct responses to assessment tasks that are akin to what they will have to do in real life. They eliminate guessing from the testing task. They also reinforce life skills, such as being able to write or otherwise construct responses rather than pass multiple-choice tests.


Major disadvantages of the approach are heavy time requirements for administration; high costs of scoring; difficulty in achieving reliable scores; the narrow scope of skills that can feasibly be assessed; and lack of norms for comparisons, especially at the national level. In general, performance tests are inefficient, costly, and often of dubious reliability. Moreover, compared with multiple-choice tests, performance tests, in the same amount of testing time, can cover only a much narrower range of questions.

Approach 8: Experimental Studies

In using controlled experiments, program evaluators randomly assign subjects or groups of subjects to experimental and control groups and then contrast the outcomes when the experimental group receives a particular intervention and the control group receives no special treatment or some different treatment. This type of study was quite prominent in program evaluation during the late 1960s and early 1970s, when there was a federal requirement to assess the effectiveness of federally funded innovations. However, experimental program evaluations subsequently fell into disfavor and disuse. (In the 1990s, controlled experiments in education have been rare [Nave, Misch, & Mosteller, 1999].) Apparent reasons for this decline are that evaluators rarely can meet the required experimental conditions and assumptions and that the prevalent finding has been "no statistically significant result."

This approach is labeled as a questions-oriented or quasi-evaluation strategy because it starts with questions and methodology that may address only a narrow set of the questions needed to assess a program's merit and worth. In the 1960s, Campbell and Stanley (1963) and others hailed the true experiment as the only sound means of evaluating interventions. This piece of evaluation history reminds one of Kaplan's (1964) famous warning against the so-called "law of the instrument," whereby a given method is equated to a field of inquiry. In such a case, the field of inquiry is restricted to the questions that are answerable by the given method. Fisher (1951) specifically warned against equating his experimental methods with science. Similarly, experimental design is a method that can contribute importantly to program evaluation, as Nave, Misch, and Mosteller (1999) have demonstrated, but by itself it is often insufficient to address a client's full range of evaluation questions.

The advance organizers in experimental studies are problem statements, competing treatments, hypotheses, investigatory questions, and randomized treatment and comparison groups. The usual purpose of the controlled experiment is to determine causal relationships between specified independent and dependent variables, such as a given instructional method and student standardized-test performance. It is particularly noteworthy that the sources of questions investigated in the experimental study are researchers, program developers, and policy figures, and not usually a program's constituents and practitioners.
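The core computation can be sketched minimally as follows, with hypothetical outcome scores for randomly assigned treatment and control groups; the sketch shows only the simplest two-group contrast, not the full range of designs discussed below:

```python
# Minimal sketch of the basic experimental contrast: randomly assign units,
# then compare mean outcomes of treatment and control groups. Scores are
# hypothetical; real evaluations involve far more elaborate designs.
import random
from statistics import mean
from scipy import stats

random.seed(42)
students = list(range(40))
random.shuffle(students)
treatment, control = students[:20], students[20:]

# Hypothetical post-test scores keyed by student id.
scores = {s: random.gauss(70, 10) + (5 if s in treatment else 0) for s in students}

t_scores = [scores[s] for s in treatment]
c_scores = [scores[s] for s in control]

t_stat, p_value = stats.ttest_ind(t_scores, c_scores)
print(f"Treatment mean: {mean(t_scores):.1f}  Control mean: {mean(c_scores):.1f}")
print(f"Estimated effect: {mean(t_scores) - mean(c_scores):.1f}  (p = {p_value:.3f})")
```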

The frequent question in the experimental study is, What are the effects of a given intervention on specified outcome variables? Typical methods used are experimental and quasi-experimental designs. Pioneers in using experimentation to evaluate programs are Campbell and Stanley (1963), Cronbach and Snow (1969), and Lindquist (1953). Other persons who have developed the methodology of experimentation substantially for program evaluation are Boruch (1994); Glass and Maguire (1968); Nave, Misch, and Mosteller (1999); Suchman (1967); and Wiley and Bock (1967).

Evaluators should consider conducting a controlled experiment only when its required conditions and assumptions can be met. Often this requires substantial political influence, substantial funding, and widespread agreement (e.g., among the targeted educators, parents, and teachers) to submit to the requirements of the experiment. Such requirements typically include, among others, a stabilized program that will not have to be studied and modified during the evaluation; the ability to establish and sustain comparable program and control groups; the ability to keep the program and control conditions separate and uncontaminated; and the ability to obtain the needed criterion measures from all or at least a representative group of the members of the program and comparison groups. Evaluability assessment was developed as a particular methodology for determining the feasibility of moving ahead with an experiment (Smith, 1989; Wholey, 1995).

Controlled experiments have a number of advantages. They focus on results and not just intentions or judgments. They provide strong methods for establishing relatively unequivocal causal relationships between treatment and outcome variables; this ability can be especially significant when program effects are small but important. Moreover, because of the prevalent use and success of experiments in such fields as medicine and agriculture, the approach has widespread credibility.

The above advantages are offset by serious objections to experimenting on school students and other subjects. It is often considered unethical or even illegal to deprive the control group of the benefits of special funds for improving services. Likewise, many parents don't want schools to experiment on their children by applying unproven interventions. Typically, schools find it impractical and unreasonable to randomly assign students to treatments and to hold treatments constant throughout the study period. Also, experimental studies provide a much narrower range of information than schools or other organizations often need to assess and strengthen their programs. On this point, experimental studies tend to provide terminal information that is not useful for guiding the development and improvement of programs and in fact must thwart ongoing modification of the treatments.

Approach 9: Management Information Systems

The management information system is like the politically controlled approaches, except that it supplies managers with the information they need to conduct and report on their programs, as opposed to supplying them with the information they need to win a political advantage. The management information approach is also like the decision/accountability-oriented approach, which will be discussed later, except that the decision/accountability-oriented approach provides the information needed to both develop and defend a program's merit and worth, which goes beyond providing the information that managers need to implement and report on their management responsibilities.

The advance organizers in most management information systems include program objectives, specified activities, and projected program milestones or events. A management information system's purpose, as already implied, is to continuously supply managers with the information they need to plan, direct, control, and report on their programs or spheres of responsibility.

The sources of questions addressed are the management personnel and their superiors. The main questions they typically want answered are, Are program activities being implemented according to schedule, according to budget, and with the expected results? To provide ready access to information for addressing such questions, these systems regularly store and make accessible up-to-date information on the program's goals, planned operations, actual operations, staff, program organization, expenditures, threats, problems, publicity, achievements, etc.

Methods employed in management information systems include systems analysis, the Program Evaluation and Review Technique (PERT), the Critical Path Method, the Program Planning and Budgeting System (PPBS), Management by Objectives, computer-based information systems, periodic staff progress reports, and regular budgetary reporting.
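The following minimal sketch, with hypothetical milestones and budget figures, illustrates the kind of schedule and budget monitoring such systems support:

```python
# Minimal sketch of milestone tracking in a management information system.
# The milestones, dates, and budget figures are hypothetical.
from datetime import date

milestones = [
    # (milestone, planned finish, actual finish or None, budgeted, spent)
    ("Staff hired",         date(1999, 1, 15), date(1999, 1, 20), 40_000, 43_500),
    ("Materials developed", date(1999, 3, 1),  date(1999, 3, 1),  25_000, 22_000),
    ("Pilot delivered",     date(1999, 5, 30), None,              60_000, 18_000),
]
today = date(1999, 4, 15)

for name, planned, actual, budgeted, spent in milestones:
    if actual is not None:
        schedule = f"finished {(actual - planned).days:+d} days vs. plan"
    elif today > planned:
        schedule = "OVERDUE"
    else:
        schedule = f"due in {(planned - today).days} days"
    budget = f"${spent - budgeted:+,} vs. budget"
    print(f"{name:<20} {schedule:<30} {budget}")
```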

Cook (1966) introduced the use of PERT in education, and Kaufman (1969) wrote about the use of management information systems in education. Business schools and programs in computer information systems regularly provide courses in management information systems. Mainly, these focus on how to set up and employ computerized information banks for use in organizational decision making.

W. Edwards Deming (1986) argued that managers should pay close attention to process rather than being preoccupied with outcomes. He advanced a systematic approach for monitoring and continuously improving an enterprise's process, arguing that close attention to the process will result in increasingly better outcomes. It is commonly said that, in paying attention to this and related advice from Deming, Japanese car makers and later the Americans greatly increased the quality of automobiles (Aguaro, 1990). Bayless and Massaro (1992) applied Deming's approach to program evaluations in education. Based on this writer's observations, the approach was not well suited to assessing the complexities of educational processes, possibly because, unlike the manufacture of automobiles, educators have no definitive, standardized models for linking exact educational processes to specified outcomes.

Nevertheless, given modern database technology, program managers often can and should employ management information systems in multiyear projects and programs. Program databases can provide information not only for keeping programs on track, but also for assisting in the broader study and improvement of program processes and outcomes.

A major advantage of the use of management information systems is in giving managers information they can use to plan, monitor, control, and report on complex operations. A major difficulty with the application of this industry-oriented type of system to education and social services is that the products of many such programs are not amenable to a narrow, precise definition, as is the case with a corporation's profit and loss statement. Moreover, processes in educational and social programs often are complex and evolving rather than straightforward and standardized like those of manufacturing and business. The information gathered in management information systems typically lacks the scope of context, input, process, and outcome information required to assess a program's merit and worth.

Approach 10: Benefit-Cost Analysis Approach

Benefit-cost analysis as applied to program evaluation is a set of largely quantitative procedures used to understand the full costs of a program and to determine and judge what those investments returned in objectives achieved and broader social benefits. The aim is to determine the costs associated with program inputs, determine the monetary value of the program outcomes, compute benefit-cost ratios, compare the computed ratios to those of similar programs, and ultimately judge the program's productivity in economic terms.

The benefit-cost analysis approach to program evaluation may be broken down into three levels of procedures: (1) cost analysis of program inputs, (2) cost-effectiveness analysis, and (3) benefit-cost analysis. These may be looked at as a hierarchy. The first type, cost analysis of program inputs, may be done by itself. Such analyses entail an ongoing accumulation of a program's financial history. These analyses are of use in controlling program delivery and expenditures. The program's financial history can be used to compare the program's actual costs to the projected costs in the original budget and to the costs of similar programs. Also, cost analyses can be extremely valuable to outsiders who might be interested in replicating the program.

Cost-effectiveness analysis necessarily includes cost analysis of program inputs to determine the cost associated with the progress toward achieving each objective. Such analyses might compare two or more programs' costs and successes in achieving the same objectives. A program could be judged superior on cost-effectiveness grounds if it had the same costs as similar programs but superior outcomes. Or the program could still be judged superior on cost-effectiveness grounds if it achieved the same objectives as more expensive programs. Cost-effectiveness analyses do not require conversion of outcomes to monetary terms but must be keyed to clear, measurable program objectives.

Benefit-cost analyses typically build on a cost analysis of program inputs and a cost-effectiveness analysis. But the benefit-cost analysis goes further. It seeks to identify a broader range of outcomes than just those associated with program objectives. It examines the relationship between the investment in a program and the extent of positive and negative impacts on the program's environment. In doing so, it ascertains and places a monetary value on program inputs and each identified outcome. It identifies a program's benefit-cost ratios and compares these to similar ratios for competing programs. Ultimately, benefit-cost studies seek conclusions about the comparative benefits and costs of the examined programs.
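A minimal worked sketch of the three levels just described, using entirely hypothetical cost and benefit figures for two competing programs, follows:

```python
# Minimal sketch of the three levels of analysis with hypothetical figures:
# (1) input cost totals, (2) cost-effectiveness per unit of objective gained,
# (3) benefit-cost ratio once outcomes are given a monetary value.

programs = {
    "Program A": {"inputs": {"personnel": 180_000, "materials": 30_000, "overhead": 40_000},
                  "achievement_gain": 12.0,        # e.g., mean gain on the targeted objective
                  "monetized_benefits": 410_000},  # value placed on all identified outcomes
    "Program B": {"inputs": {"personnel": 140_000, "materials": 20_000, "overhead": 30_000},
                  "achievement_gain": 8.0,
                  "monetized_benefits": 260_000},
}

for name, p in programs.items():
    total_cost = sum(p["inputs"].values())                    # level 1: cost analysis
    cost_effectiveness = total_cost / p["achievement_gain"]   # level 2: cost per unit gained
    bc_ratio = p["monetized_benefits"] / total_cost           # level 3: benefit-cost ratio
    print(f"{name}: cost ${total_cost:,}, "
          f"${cost_effectiveness:,.0f} per unit of gain, "
          f"benefit-cost ratio {bc_ratio:.2f}")
```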

Advance organizers for the overall benefit-cost approach are associated with cost breakdowns for both program inputs and program outputs. Program input costs may be delineated by line items (e.g., personnel, travel, materials, equipment, communications, facilities, contracted services, overhead, etc.), by program components, by year, etc. In cost-effectiveness analysis, a program's costs are examined in relation to each program objective, and these objectives must be clearly defined and assessed. The more ambitious benefit-cost analyses look at costs associated with main effects and side effects, tangible and intangible outcomes, positive and negative outcomes, and short-term and long-term outcomes, both inside and outside the program. Frequently, they also may break down costs by individuals and groups of beneficiaries. One may also estimate the costs of foregone opportunities and, sometimes, political costs. Even then, the real value of benefits associated with human creativity or self-actualization is nearly impossible to estimate. Consequently, the benefit-cost equation rests on dubious assumptions and uncertain realities.

The purposes of these three levels of benefit-cost analysis are to gain clear knowledge of what resources were invested, how they were invested, and with what effect. In the popular vernacular, cost-effectiveness and benefit-cost analyses seek to determine the program's "bang for the buck." There is great interest in answering this type of question. Policy boards, program planners, and taxpayers are especially interested to know whether program investments are paying off in positive results that exceed or are at least as good as those produced by similar programs.


Authoritative information on the benefit-cost approach may be obtained by studying the writings of Kee (1995), Levin (1983), and Tsang (1997).

Benefit-cost analysis is potentially important in most program evaluations. Evaluators are advised to discuss this matter thoroughly with their clients, to reach appropriate advance agreements on what should and can be done to obtain the needed cost information, and to do as much cost-effectiveness and benefit-cost analysis as can be done well and within reasonable costs.

Benefit-cost analysis is an important but problematic consideration in program evaluations. Most program evaluations are amenable to analyzing the costs of program inputs and maintaining a financial history of expenditures. The main impediment to this is that program authorities often do not want anyone other than the appropriate accountants and auditors looking into the financial books. If cost analysis, even at only the input level, is to be done, this must be clearly provided for in the initial contractual agreements covering the evaluation work. Performing cost-effectiveness analysis can be feasible if cost analysis of inputs is agreed to; if there are clear, measurable program objectives; and if comparable cost information can be obtained from competing programs. Unfortunately, it is usually hard to meet all the conditions needed for a successful cost-effectiveness analysis. Even more unfortunate is the fact that it is usually impractical to conduct a thorough benefit-cost analysis. Not only must it meet all the conditions of the analysis of program inputs and cost-effectiveness analysis, but it must also place monetary values on identified outcomes, both those anticipated and those not expected.

Approach 11: Clarification Hearing

The clarification hearing is one label for the judicial approach to program evaluation. This approach essentially puts a program on trial. Role-playing evaluators competitively implement both a damning prosecution of the program, arguing that it failed, and a defense of the program, arguing that it succeeded. A judge hears these arguments within the framework of a jury trial and controls the proceedings according to advance agreements on rules of evidence and trial procedures. The actual proceedings are preceded by the collection of and sharing of evidence by both sides. The prosecuting and defending evaluators may call witnesses and place documents and other exhibits into evidence. A jury hears the proceedings and ultimately makes and issues a ruling on the program's success or failure. Ideally, the jury is composed of persons representative of the program's stakeholders. By videotaping the proceedings, the administering evaluator can, after the trial, compile a condensed videotape as well as printed reports to disseminate what was learned through the process.

The advance organizers for a clarification hearing are criteria of program effectiveness that both the prosecuting and defending sides agree to apply. The judicial approach's main purpose is to ensure that the evaluation's audience will receive balanced evidence on the program's strengths and weaknesses. The key questions essentially are, Should the program be judged a success or a failure? Is it as good as or better than alternative programs that address the same objectives?

Robert Wolf (1975) pioneered the judicial approach to program evaluation. Others who applied, tested, and further developed the approach include Levine (1974), Owens (1973), and Popham and Carlson (1983).


Based on the past uses of this approach, it can be judged as only marginally relevant to program evaluation. By its adversarial nature, the approach prods the evaluators to present biased arguments in order to win their cases. The approach subordinates truth seeking to winning. Accuracy suffers in this process. The most effective debaters are likely to convince the jury of their position even when it is poorly founded. Also, the approach is politically problematic, since it generates considerable acrimony. Despite the attractiveness of using the law as a metaphor for program evaluation, with the law's attendant rules of evidence, the promise of this application has not been fulfilled. There are few occasions in which it makes practical sense for evaluators to apply this approach.

Approach 12: Case Study Evaluations

A case-study-based program evaluation is a focused, in-depth description, analysis, and synthesis of a particular program or other object. The investigators do not control the program in any way. Instead, they look at it as it is occurring or as it occurred in the past. The study looks at the program in its geographic, cultural, organizational, and historical contexts. It closely examines the program's internal operations and how it uses inputs and processes to produce outcomes. It examines a wide range of intended and unexpected outcomes. It looks at the program's multiple levels and also holistically at the overall program. It characterizes both central, dominant themes and variations and aberrations. It defines and describes the program's intended and actual beneficiaries. It examines beneficiaries' needs and the extent to which the program effectively addressed those needs. It employs multiple methods to obtain and integrate multiple sources of information. While it breaks apart and analyzes a program along various dimensions, it also provides an overall characterization of the program.

The main thrust of the case study approach is to delineate and illuminate a program, not necessarily to guide its development or to assess and judge its merit and worth. Hence, this paper characterizes the case study approach as a questions/methods-oriented approach rather than an improvement/accountability approach.

The advance organizers in case studies include the definition of the program, characterization of its geographic and organizational environment, the historical period in which it is to be examined, the program's beneficiaries and their assessed needs, the program's underlying logic of operation and productivity, and the key roles involved in the program. A case study program evaluation's main purpose is to provide stakeholders and their audiences with an authoritative, in-depth, well-documented explication of the program.

The case study should be keyed to the questions of most interest to the evaluation's main audiences. The evaluator must therefore identify and interact with the program's stakeholders. Along the way, stakeholders will be engaged in helping to plan the study and interpret findings. Ideally, the audiences include the program's oversight body, administrators, staff, financial sponsors, beneficiaries, and potential adopters of the program.

Typical questions posed by some or all of the above audiences are, What is the program in concept and practice? How has it evolved over time? How does it actually operate to produce outcomes? What has it produced? What are the shortfalls and negative side effects? What are the positive side effects? In what ways and to what degrees do various stakeholders value the program? To what extent did the program effectively meet beneficiaries' needs? What were the most important reasons for the program's successes and failures? What are the program's most important unresolved issues? How much has it cost? What are the costs per beneficiary, per year, etc.? What parts of the program have been successfully transported to other sites? How does this program compare with what might be called critical competitors? The above questions only illustrate the range of questions that a case study might address, since each case study will be tempered by the interests of the client and other audiences for the study and the evaluator's interests.

To conduct effective case studies, evaluators need to employ a wide range of qualitative and quantitative methods. These may include analysis of archives; collection of artifacts, such as work samples; content analysis of program documents; both independent and participant observations; interviews; logical analysis of operations; focus groups; tests; questionnaires; rating scales; hearings; forums; and maintenance of a program database. Reports may incorporate in-depth descriptions and accounts of key historical trends; focus on critical incidents, photographs, maps, testimony, relevant news clippings, logic models, and cross-break tables; and summarize main conclusions. The case study report may include papers on key dimensions of the case, as determined with the audience, as well as an overall holistic presentation and assessment. Case study reports may involve audio and visual media as well as printed documents.

Case study methods have existed for many years and have been applied in such areas as clinical psychology, law, the medical profession, and social work. Pioneers in applying the method to program evaluation include Campbell (1975), Lincoln and Guba (1985), Platt (1992), Stake (1995), and Yin (1992).

The case study approach is highly conducive to program evaluation. It requires no controls of treatments and subjects and looks at programs as they naturally occur and evolve. It addresses accuracy issues by employing and triangulating multiple perspectives, methods, and information sources. It employs all relevant methods and information sources. It looks at programs within relevant contexts and describes contextual influences on the program. It looks at programs holistically and in depth. It examines the program's internal workings and how it produces outcomes. It includes clear procedures for analyzing qualitative information. It can be tailored to focus on the audience's most important questions. It can be done retrospectively or in real time. It can be reported to meet given deadlines and subsequently updated based on further developments.

The main limitation of the approach is that some evaluators may mistake its openness and lack of controls as an excuse for approaching it haphazardly and bypassing steps to assure that findings and interpretations possess rigor as well as relevance. Also, because of a preoccupation with descriptive information, the case study evaluator may not collect sufficient judgmental information to permit a broad-based assessment of a program's merit and worth. Users of this approach might slight quantitative analysis in favor of qualitative analysis. By trying to produce a comprehensive description of a program, the case study evaluator may not produce the timely feedback needed to help in program development. To overcome these potential pitfalls, evaluators using the case study approach should fully address the principles of sound evaluation as related to accuracy, utility, feasibility, and propriety.

Approach 13: Criticism and Connoisseurship

The connoisseur-based approach was developed pursuant to the methods of art criticism and literary criticism. This approach assumes that certain experts in a given substantive area are capable of in-depth analysis and evaluation that could not be done in other ways. While a national survey of wine drinkers could produce information concerning their overall preferences for types of wines and particular vineyards, it would not provide the detailed, creditable judgments of the qualities of particular wines that might be derived from a single connoisseur who has devoted a professional lifetime to the study and grading of wines and whose judgments are highly and widely respected.

The advance organizer for the connoisseur-based study is the evaluator's special expertise and sensitivities. The study's purpose is to describe, critically appraise, and illuminate a particular program's merits. The evaluation questions addressed by the connoisseur-based evaluation are determined by the expert evaluators, the critics and authorities who have undertaken the evaluation. Among the major questions they can be expected to ask are, What are the program's essence and salient characteristics? What merits and demerits distinguish the particular program from others of the same general kind?

The methodology of connoisseurship includes the critics' systematic use of their perceptual sensitivities, past experiences, refined insights, and abilities to communicate their assessments. The evaluator's judgments are conveyed in vivid terms to help the audience appreciate and understand all of the program's nuances.

Eisner (1975, 1983) has pioneered this strategy in education. A dozen or more of Eisner's students have conducted research and development on the connoisseurship approach, e.g., Vallance (1973) and Flinders and Eisner (1994).

This approach obviously depends on the qualifications of the particular expert chosen to do the program evaluation. The approach also requires an audience that has confidence in and is willing to accept and use the connoisseur's report. The author of this paper would willingly accept and use any evaluation that Dr. Elliott Eisner agreed to present, but there are not many Eisners out there.

The main advantage of the connoisseur-based study is that it exploits the particular expertise and finely developed insights of persons who have devoted much time and effort to the study of a precise area. They can provide an array of detailed information that the audience can then use to form a more insightful analysis than otherwise might be possible. The approach's disadvantage is that it is dependent on the expertise and qualifications of the particular expert doing the program evaluation, leaving room for much subjectivity.

Approach 14: Program Theory-Based Evaluation

Program evaluations based on program theory begin with either (1) a well-developed and validated theory of how programs of a certain type within similar settings operate to produce outcomes or (2) an initial stage to approximate such a theory within the context of a particular program evaluation. The former of these conditions is much more reflective of the implicit promises in a theory-based program evaluation, since the existence of a sound theory means that a substantial body of theoretical development has produced and tested a coherent set of conceptual, hypothetical, and pragmatic principles, plus associated instruments to guide inquiry in the particular area. Then, the theory can aid a program evaluator to decide what questions, indicators, and assumed linkages between and among program elements should be used to evaluate a program covered by the theory.

Some well-developed theories for use in evaluations exist, which gives this approach some measure of viability. For example, health education/behavior change programs are sometimes founded on validated theoretical frameworks, such as the Health Belief Model (Becker, 1974; Mullen, Hersey, & Iverson, 1987; Janz & Becker, 1984). Other examples are the PRECEDE-PROCEED Model for health promotion planning and evaluation (Green & Kreuter, 1991), Bandura's (1977) Social Cognitive Theory, the Stages of Change Theory by Prochaska and DiClemente (1992), and Peters and Waterman's (1982) theory of successful organizations. When such frameworks exist, their use probably can enhance a program's effectiveness and provide a structure for validly evaluating the program's functioning. Unfortunately, however, few program areas are buttressed by well-articulated and tested theories.

Thus, most theory-based evaluations begin by setting out to develop a theory that appropriately could be used to guide the particular program evaluation. As will be discussed later in this characterization, such ad hoc theory development efforts and their linkage to program evaluations are problematic. In any case, let us look at what the theory-based evaluator attempts to achieve.

The point of the theory development or selection effort is to identify advance organizers to guide the evaluation. Essentially, these are the mechanisms by which program activities are understood to produce or contribute to program outcomes, along with the appropriate description of context, specification of independent and dependent variables, and portrayal of key linkages. The main purposes of the theory-based program evaluation are to determine the extent to which the program of interest is theoretically sound, to understand why it is succeeding or failing, and to provide direction for program improvement.

Questions for the program evaluation are derived from the guiding theory. Example questions include, Is the program grounded in an appropriate, well-articulated, and validated theory? Is the employed theory up to date and reflective of recent research? Are the program's targeted beneficiaries, design, operation, and intended outcomes consistent with the guiding theory? How well does the program address and serve the full range of pertinent needs of the targeted beneficiaries? If the program is consistent with the guiding theory, are the expected results being achieved? Are program inputs and operations producing outcomes in the ways the theory predicts? What changes in the program's design or implementation might produce better outcomes? What elements of the program are essential for successful replication? Overall, was the program theoretically sound, did it operate in accordance with an appropriate theory, did it produce the expected outcomes, were the hypothesized causal linkages confirmed, is the program worthy of continuation and/or dissemination, and what program features are essential for successful replication?

The nature of these questions suggests that the success of the theory-based approach is dependent on a foundation of sound theory development and validation. This, of course, entails sound conceptualization of at least a context-dependent theory, formulation and rigorous testing of hypotheses derived from the theory, development of guidelines for practical implementation of the theory based on extensive field trials, and independent assessment of the theory. Unfortunately, not many program areas in education and the social sciences are grounded in sound theories. Moreover, evaluators wanting to employ a theory-based evaluation often find it infeasible to conduct the full range of theory development and validation steps and still get the evaluation done on time. Thus, in claiming to conduct a theory-based evaluation, evaluators often seem to promise much more than they can deliver.

The main procedure typically used in these "theory-based program evaluations" is a model of the program's logic. This may be a detailed flowchart of how inputs are thought to be processed to produce intended outcomes. It may also be a grounded theory like those advocated by Glaser and Strauss (1967). The network analysis of the former approach is typically an armchair theorizing process involving the evaluators and persons who are supposed to know how the program is expected to operate and produce results. They discuss, scheme, discuss some more, network, discuss further, and finally produce networks in varying levels of detail of what is involved in making the program work and how the various elements are linked to produce the desired outcomes. The more demanding grounded theory approach requires a systematic, empirical process of observing events or analyzing materials drawn from operating programs, followed by an extensive modeling process.
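For concreteness, the following minimal sketch shows the kind of program logic model such a networking exercise might yield, expressed as a simple data structure; the program, its elements, and the assumed linkages are hypothetical:

```python
# Minimal sketch of a program logic model captured as a data structure.
# The program, its elements, and the assumed linkages are hypothetical;
# a real modeling effort would be worked out with program staff.
logic_model = {
    "inputs":     ["funding", "trained tutors", "reading materials"],
    "activities": ["recruit students", "weekly tutoring sessions", "parent contacts"],
    "outputs":    ["sessions delivered", "students served"],
    "outcomes":   ["improved reading fluency", "improved reading attitudes"],
    # Assumed linkages: each element is thought to contribute to the next stage.
    "linkages": [
        ("trained tutors", "weekly tutoring sessions"),
        ("weekly tutoring sessions", "sessions delivered"),
        ("sessions delivered", "improved reading fluency"),
    ],
}

# Such a model mainly serves to surface measurement variables and assumed
# causal links that the evaluation can then examine.
for cause, effect in logic_model["linkages"]:
    print(f"assumed link: {cause} -> {effect}")
```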

Pioneers in applying theory development procedures to program evaluation include Glaser and Strauss (1967) and Weiss (1972, 1995). Other developers of the approach are Bickman (1990), Chen (1990), and Rogers (in press).

In any program evaluation assignment, it is reasonable for the evaluator to examine the extent to which program plans and operations are grounded in an appropriate theory or model. Also, it can be useful to engage in a modicum of effort to network the program and thereby seek out key variables and linkages. As noted previously, in the enviable but rare situation where a relevant, validated theory exists, the evaluator can beneficially apply it in structuring the evaluation and analyzing findings.

However, if a relevant, defensible theory of the program's logic does not exist, evaluators need not develop one. In fact, if they attempt to do so, they will incur many threats to their evaluation's success. Rather than evaluating the program and its underlying logic, the evaluators might usurp the program staff's responsibility for program design. They might do a poor job of theory development, given limitations on time and resources to develop and test an appropriate theory. They might incur the conflict of interest associated with having to evaluate the theory they developed. They might pass off an unvalidated model of the program as a theory, when it meets almost none of the requirements of a sound theory. They might bog down the evaluation in too much effort to develop a theory for the program. They might also focus attention on a theory developed early in a program and later discover that the program has evolved to be a quite different enterprise than what was theorized at the outset. In this case the initial theory could become a "Procrustean bed" for the program evaluation.

Overall, there really isn't much to recommend theory-based program evaluation, since doing it right is usually not feasible and since failed or misrepresented attempts can be highly counterproductive. Nevertheless, modest attempts to model programs, labeled as such, can be useful for identifying measurement variables, so long as the evaluator doesn't spend too much time on this and so long as the model is not considered as fixed or as a validated theory. Also, in the rare case where an appropriate theory already exists, the evaluator can make beneficial use of the theory to help structure and guide the evaluation and interpret the findings.


Approach 15: Mixed Methods Studies

In an attempt to resolve the longstanding debate about whether program evaluations should employ quantitative or qualitative methods, some authors have proposed that evaluators should regularly combine these methods in given program evaluations (for example, see the National Science Foundation's 1997 User-Friendly Handbook for Mixed Method Evaluations). Such recommendations, along with practical guidelines and illustrations, are no doubt useful to many program staff members and to evaluators. But in the main, the recommendation for a mixed methods approach only highlights a large body of longstanding practice of mixed-methods program evaluation rather than proposing a new approach. All seven approaches discussed in the remainder of this paper employ both qualitative and quantitative methods. What sets them apart from the mixed methods approach is that their first considerations are not the methods to be employed but either the assessment of value or the social mission to be served. The mixed methods approach is included in this section on questions/methods approaches because it is preoccupied with using multiple methods rather than using whatever methods are needed to comprehensively assess a program's merit and worth. As with the other approaches in this section, the mixed methods approach may or may not fully assess a program's value; thus, it is classified as a quasi-evaluation approach.

The advance organizers of the mixed methods approach are formative and summative evaluations, qualitative and quantitative methods, and intra-case or cross-case analysis. Formative evaluations are employed to examine a program's development and assist in improving its structure and implementation. Summative evaluations basically look at whether objectives were achieved, but may look for a broader array of outcomes. Qualitative and quantitative methods are employed in combination to assure depth, scope, and dependability of findings. This approach also applies to carefully selected single programs or to comparisons of alternative programs.

The basic purposes of the mixed methods approach are to provide direction for improving programs as they are evolving and to assess their effectiveness after they have had time to produce results. Use of both quantitative and qualitative methods is intended to assure dependable feedback on a wide range of questions; depth of understanding of particular programs; a holistic perspective; and enhancement of the validity, reliability, and usefulness of the full set of findings. Investigators look to quantitative methods for standardized, replicable findings on large data sets. They look to qualitative methods for elucidation of the program's cultural context, dynamics, meaningful patterns and themes, deviant cases, diverse impacts on individuals as well as groups, etc. Qualitative reporting methods are applied to bring the findings to life, making them clear, persuasive, and interesting. By using both quantitative and qualitative methods, the evaluator secures cross-checks on different subsets of findings and thereby instills greater stakeholder confidence in the overall findings.

The sources of evaluation questions are the program's goals, plans, and stakeholders. The stakeholders often include skeptical as well as supportive audiences. Among the important stakeholders are program administrators and staff, policy boards, financial sponsors, beneficiaries, taxpayers, and program area experts.


The approach may pursue a wide range of questions. Examples of formative evaluation questions are

• To what extent do program activities follow the program plan, time line, and budget?
• To what extent is the program achieving its goals?
• What problems in design or implementation need to be addressed?

Examples of summative evaluation questions are

• To what extent did the program achieve its goals?
• Was the program appropriately effective for all beneficiaries?
• What interesting stories emerged?
• What are program stakeholders' judgments of program operations, processes, and outcomes?
• What were the important side effects?
• Is the program sustainable and transportable?

The approach employs a wide range of methods. Among the quantitative methods employed are surveys using representative samples, both cohort and cross-sectional samples, norm-referenced tests, rating scales, quasi-experiments, significance tests for main effects, and a posteriori statistical tests. The qualitative methods may include ethnography, document analysis, narrative analysis, purposive samples, single cases, participant observers, independent observers, key informants, advisory committees, structured and unstructured interviews, focus groups, case studies, study of outliers, diaries, logic models, grounded theory development, flow charts, decision trees, matrices, and performance assessments. Reports may include abstracts, executive summaries, full reports, oral briefings, conference presentations, and workshops. They should include a balance of narrative and numerical information.

Considering his book on service studies in higher education, Ralph Tyler (Tyler et al., 1932) was certainly a pioneer in the mixed methods approach to program evaluation. Other authors who have written cogently on the mixed methods approach are Guba and Lincoln (1981), Kidder and Fine (1987), Lincoln and Guba (1985), Miron (1998), Patton (1990), and Schatzman and Strauss (1973).

Basically, it is almost always appropriate to consider using a mixed methods approach. Certainly, the evaluator should take advantage of opportunities to obtain any and all potentially available information that is relevant to assessing a program's merit and worth. Sometimes a study can be mainly or only qualitative or quantitative, but usually such studies would be strengthened by including both types of information. The key point is to choose methods because they can effectively address the study's questions, not because they are either qualitative or quantitative.

Key advantages of using both qualitative and quantitative methods are that they complement each other in ways that are important to the evaluation's audiences. Information from quantitative methods tends to be standardized, efficient, amenable to standard tests of reliability, easily summarized and analyzed, and accepted as "hard" data. Information from qualitative approaches adds depth; can be delivered in interesting, story-like presentations; and provides a means to explore and understand the more superficial quantitative findings. Using both types of methods affords important cross-checks on findings.


The main pitfall in pursuing the mixed methods approach is using multiple methods because this is the popular thing to do rather than because the selected methods best respond to the evaluation questions. Moreover, sometimes evaluators let the combination of methods compensate for a lack of rigor in applying them. Also, using a mixed methods approach can produce a schizophrenic evaluation if the investigator uncritically mixes positivistic and postmodern paradigms. Along this line, quantitative and qualitative methods are derived from different theoretical approaches to inquiry and reflect different conceptions of knowledge, and many evaluators do not possess the requisite foundational knowledge in both the sciences and humanities to effectively combine quantitative and qualitative methods. The approaches in the remainder of this paper place proper emphasis on mixed methods, making choice of the methods subservient to the approach's dominant philosophy and to the particular evaluation questions to be addressed.

The mixed methods approach to evaluation concludes this paper's discussion of the questions/methods approaches to evaluation. These 13 approaches tend to concentrate on selected questions and methods and thus may or may not fully address an evaluation's fundamental requirement to assess a program's merit and worth. The array of these approaches suggests that the field has advanced considerably since the 1950s, when program evaluations were rare and mainly used approaches grounded in behavioral objectives, standardized tests, and/or accreditation visits.

Tables 1 through 6 summarize the similarities and differences between the models in relationship to advance organizers, purposes, characteristic questions, methods, strengths, and weaknesses.


Table 1: Comparison of the 13 Quasi-Evaluation Approaches on Most Common ADVANCE ORGANIZERS

Advance organizers compared across Approaches 3-15:* program content/definition; program rationale; context; treatments; time period; beneficiaries; comparison groups; norm groups; assessed needs; problem statements; objectives; independent/dependent variables; indicators/criteria; life skills; performance tasks; questions/hypotheses/causal factors; policy issues; tests in use; formative and summative evaluation; qualitative and quantitative methods; program activities/milestones; employee roles and responsibilities; costs; evaluator expertise and sensitivities; intra-case/cross-case analysis.

* 3. Objectives-based, 4. Accountability, 5. Objective testing, 6. Outcomes monitoring, 7. Performance testing, 8. Experiments, 9. Management information systems, 10. Benefit-cost analysis, 11. Clarification hearing, 12. Case study, 13. Criticism & connoisseurship, 14. Program theory-based, 15. Mixed methods.


Table 2: Comparison of the 13 Quasi-Evaluation Approaches on Primary EVALUATION PURPOSES

Evaluation purposes compared across Approaches 3-15:* determine whether program objectives were achieved; provide constituents with an accurate accounting of results; assure that results are positive; assess learning gains; pinpoint responsibility for good and bad outcomes; compare students' test scores to norms; compare students' test performance to standards; diagnose program shortcomings; compare performance of competing programs; examine achievement trends; inform policymaking; provide direction for program improvement; ensure standardization of outcome measures; determine cause-and-effect relationships in programs; inform management decisions and actions; assess investments and payoffs; provide balanced information on strengths and weaknesses; explicate and illuminate a program; describe and critically appraise a program; assess a program's theoretical soundness.

* 3. Objectives-based, 4. Accountability, 5. Objective testing, 6. Outcomes monitoring, 7. Performance testing, 8. Experiments, 9. Management information systems, 10. Benefit-cost analysis, 11. Clarification hearing, 12. Case study, 13. Criticism & connoisseurship, 14. Program theory-based, 15. Mixed methods.


Table 3: Comparison of the 13 Quasi-Evaluation Approaches on Characteristic EVALUATION QUESTIONS

Characteristic evaluation questions compared across Approaches 3-15:* To what extent was each program objective achieved? Did the program effectively discharge its responsibilities? Did tested performance meet or exceed pertinent norms? Did tested performance meet or exceed standards? Where does a group's tested performance rank compared with other groups? Is a group's present performance better than past performance? What sectors of a system are performing best and poorest? Where are the shortfalls in specific curricular areas? At what grade levels are the strengths and shortfalls? What value is being added by particular programs? To what extent can students effectively speak, write, figure, analyze, lead, work cooperatively, and solve problems? What are a program's effects on outcomes? Are program activities being implemented according to schedule, budget, and expected results? What is the program's return on investment? Is the program sustainable and transportable? Is the program worthy of continuation and/or dissemination? Is the program as good as or better than others that address the same objectives? What is the program in concept and practice? How has the program evolved over time? How does the program produce outcomes? What has the program produced? What are the program's shortfalls and negative side effects? What are the program's positive side effects? How do various stakeholders value the program? Did the program meet all the beneficiaries' needs? What were the most important reasons for the program's success or failure? What are the program's most important unresolved issues? How much did the program cost? What were the costs per beneficiary, per year, etc.? What parts of the program were successfully transported to other sites? What are the program's essence and salient characteristics? What merits and demerits distinguish the program from similar programs? Is the program grounded in a validated theory? Are program operations consistent with the guiding theory? Were hypothesized causal linkages confirmed? What changes in the program's design or implementation might produce better outcomes? What program features are essential for successful replication? What interesting stories emerged?

* 3. Objectives-based, 4. Accountability, 5. Objective testing, 6. Outcomes monitoring, 7. Performance testing, 8. Experiments, 9. Management information systems, 10. Benefit-cost analysis, 11. Clarification hearing, 12. Case study, 13. Criticism & connoisseurship, 14. Program theory-based, 15. Mixed methods.


Table 4: Comparison of the 13 Quasi-Evaluation Approaches on Characteristic EVALUATION METHODS

Evaluation Methods / Evaluation Approaches (by identification number)*: 3 4 5 6 7 8 9 10 11 12 13 14 15

Operational objectives: U U
Criterion-referenced tests: U U U U
Performance contracting: U
Program Planning & Budgeting System: U U
Program Evaluation & Review Technique: U
Management by objectives: U U U
Staff progress reports: U
Financial reports & audits: U
Zero Based Budgeting: U
Cost analysis, cost-effectiveness analysis, & benefit-cost analysis: U
Mandated "program drivers" & indicators: U
Input, process, output databases: U U
Independent goal achievement auditors: U U
Procedural compliance audits: U
Peer review: U
Merit pay for individuals and/or organizations: U
Collective bargaining agreements: U
Trial proceedings: U
Mandated testing: U U
Institutional report cards: U
Self-studies: U
Site visits by experts: U
Program audits: U
Standardized testing: U U U U
Performance measures: U U U
Computerized or other database: U U U
Hierarchical mixed model analysis: U
Policy analysis: U
Experimental & quasi-experimental designs: U U
Study of outliers: U U U
System analysis: U
Analysis of archives: U U
Collection of artifacts: U U
Log diaries: U
Content analysis: U U
Independent & participant observers: U U
Key informants: U U
Advisory committees: U
Interviews: U U
Operational analysis: U
Focus groups: U U
Questionnaires: U U
Rating scales: U U
Hearings & forums: U U
In-depth descriptions: U
Photographs: U
Critical incidents: U
Testimony: U U U
Flow charts: U
Decision trees: U
Logic models: U U U
Grounded theory: U U
News clippings analysis: U U
Cross-break tables: U U U U
Expert critics: U U U U

* 3. Objectives-based, 4. Accountability, 5. Objective testing, 6. Outcomes monitoring, 7. Performance testing, 8. Experiments, 9. Management information systems, 10. Benefit-cost analysis, 11. Clarification hearing, 12. Case study, 13. Criticism & connoisseurship, 14. Program theory-based, 15. Mixed methods.


Table 5: Comparison of the 13 Quasi-Evaluation Approaches on Prevalent STRENGTHS

Strengths / Evaluation Approaches (by identification number)*: 3 4 5 6 7 8 9 10 11 12 13 14 15

Common sense appeal: U U U U U U U
Widely known & applied: U U U U
Employs operational objectives: U
Employs the technology of testing: U U U U U U
Efficient use of standardized tests: U U
Popular among constituents & politicians: U U U U
Focus on improving public services: U
Can focus on audience's most important questions: U U U U
Defines obligations of service providers: U
Requires production of and reporting on positive outcomes: U
Seeks to improve services through competition: U U
Efficient means of data collection: U U U
Stresses validity & reliability: U U U U
Triangulates findings from multiple sources: U U U
Uses institutionalized database: U
Monitors progress on each student: U U
Emphasizes service to every student: U
Hierarchical analysis of achievement: U
Conducive to policy analysis: U U
Employs trend analysis: U
Strong provision for analyzing qualitative information: U U U
Rejects use of artificial cut scores: U U
Considers student background by using students as their own controls: U
Considers contextual influences: U U U
Uses authentic measures: U U
Eliminates guessing: U
Reinforces life skills: U
Focuses on outcomes: U U U U U U U
Focuses on a program's strengths & weaknesses: U U U
Determines causes & effects: U
Examines a program's internal workings & how it produces outcomes: U U
Guides program management: U
Helps keep programs on track: U
Guides broad study & improvement of program processes & outcomes: U U
Can be done retrospectively or in real time: U U U U
Documents costs of program inputs: U
Maintains a financial history for the program: U
Contrasts program alternatives on both costs & outcomes: U
Employs rules of evidence: U
Requires no controls of treatments & participants: U U
Examines programs as they naturally occur: U U
Examines programs holistically & in depth: U U
Engages experts to render refined descriptions & judgments: U U
Yields in-depth, refined, effectively communicated analysis: U U
Employs all relevant information sources & methods: U U
Stresses complementarity of qualitative & quantitative methods: U U

* 3. Objectives-based, 4. Accountability, 5. Objective testing, 6. Outcomes monitoring, 7. Performance testing, 8. Experiments, 9. Management information systems, 10. Benefit-cost analysis, 11. Clarification hearing, 12. Case study, 13. Criticism & connoisseurship, 14. Program theory-based, 15. Mixed methods.


Table 6: Comparison of the 13 Quasi-Evaluation Approaches on Prevalent WEAKNESSES/LIMITATIONS

Weaknesses/Limitations / Evaluation Approaches (by identification number)*: 3 4 5 6 7 8 9 10 11 12 13 14 15

May credit unworthy objectives: U
May define a program's success in terms that are too narrow and mechanical and not attuned to beneficiaries' various needs: U
May employ only lower-order learning objectives: U U U
Relies almost exclusively on multiple choice test data: U U
May indicate mainly socioeconomic status, not quality of teaching & learning: U
May reinforce & overemphasize multiple choice test taking ability to the exclusion of writing, speaking, etc.: U U
May poorly test what teachers teach: U U
Yields mainly terminal information that lacks utility for program improvement: U U
Provides data only on student outcomes: U U U U
Narrow scope of skills that can feasibly be assessed: U
May provide too narrow an information basis for judging a program's merit & worth: U U U U U U U U
May employ many methods because it is the thing to do rather than because they are needed: U
May inappropriately & counterproductively mix positivistic & postmodern paradigms: U
May oversimplify the complexities involved in assigning responsibility for student learning gains to individual teachers: U
May miss important side effects: U U U U U
May rely too heavily on the expertise & judgment of a single evaluator: U
May issue invidious comparisons: U U U U
May produce unhealthy competition: U U U U U
May provoke political unrest: U U U U U
Accuracy suffers in the face of competing evaluations: U
May undesirably narrow the range of program services: U U
Politicians tend to press for premature implementation: U U U
Granting rewards & sanctions may produce cheating: U U U
Inordinate time requirements for administration & scoring: U
High costs of scoring: U
Difficulty in achieving reliability: U
High cost: U
Low feasibility: U U U
May inappropriately deprive control group subjects of entitlements: U
Carries a connotation of experimenting on children or other subjects using unproven methods: U
Requirement of random assignment is often not feasible: U
Tends to stifle continual improvement of the program: U
Vital data may be inaccessible to evaluators: U
Investigators may mistake the approach's openness & lack of controls as license to ignore rigor: U
Evaluators might usurp the program staff's responsibilities for program design: U
Might ground an evaluation in a hastily developed, inadequate program theory: U
Might develop a conflict of interest to defend the evaluation-generated program theory: U
Might bog down the evaluation in a seemingly endless process of program theory development: U
Might create a theory early in a program and impede the program from redefinition and refinement: U

* 3. Objectives-based, 4. Accountability, 5. Objective testing, 6. Outcomes monitoring, 7. Performance testing, 8. Experiments, 9. Management information systems, 10. Benefit-cost analysis, 11. Clarification hearing, 12. Case study, 13. Criticism & connoisseurship, 14. Program theory-based, 15. Mixed methods.


IV. IMPROVEMENT/ACCOUNTABILITY-ORIENTED EVALUATION APPROACHES

This paper turns next to a set of approaches that stress the need to fully assess a program's merit and worth, whatever the required questions and methods. These are the improvement/accountability-oriented evaluation approaches, labeled Decision/Accountability, Consumer-Oriented, and Accreditation. Respectively, these three approaches emphasize improvement through serving program decisions, providing consumers with assessments of optional programs and services, and helping consumers gain assurance that given programs are professionally sound.

Approach 16: Decision/Accountability-Oriented Studies

The decision/accountability-oriented approach emphasizes that program evaluation should be used proactively to help improve a program as well as retroactively to judge its merit and worth. As mentioned previously, the decision/accountability-oriented approach should be distinguished from management information systems and from politically controlled studies because of the emphasis in decision/accountability-oriented studies on questions of merit and worth. The approach's philosophical underpinnings include an objectivist orientation to finding best answers to context-limited questions and subscription to the principles of a well-functioning democratic society, especially human rights, equity, excellence, conservation, and accountability. Practically, the approach is oriented to engaging stakeholders in focusing the evaluation; addressing their most important questions; providing timely, relevant information to assist decision making; and producing an accountability record.

Decision makers, decision situations, and program accountability requirements provide useful advance organizers for decision/accountability-oriented studies. The approach emphasizes that decision makers include not just top managers but stakeholders at all organizational levels of a program. From the bottom up, such stakeholders may include beneficiaries, parents and guardians, service providers, administrators, support personnel, policy boards, funding authorities, taxpayers, etc. The generic decision situations to be served may include formulation of goals and priorities, identification and assessment of competing approaches, planning and budgeting program operations, staffing programs, carrying out planned activities, judging outcomes, determining how best to use programs, recycling program operations, etc. Key classes of needed evaluative information are assessments of needs, problems, and opportunities; identification and assessment of competing program approaches; assessment of program plans; assessment of staff qualifications and performance; assessment of program facilities and materials; monitoring and assessment of program implementation; assessment of intended and unintended and short-range and long-range outcomes; and assessment of cost-effectiveness.

Basically, the purpose of decision/accountability studies is to provide a knowledge and value base for making and being accountable for decisions that result in developing, delivering, and making informed use of cost-effective services. Serving this purpose requires that evaluators interact with representative members of their audiences and supply them with relevant, timely, efficient, and accurate evaluative feedback. A theme of this approach is that the most important purpose of evaluation is not to prove but to improve.

The sources of questions addressed by the decision/accountability-oriented approach are the concerned and involved stakeholders. These may include all persons and groups who must make choices related to initiating, planning, implementing, and using a program's services. Main questions addressed are, What beneficiary needs should be addressed? What are the available alternatives for addressing these needs, and what are their comparative merits? What plan of services should be operationalized and delivered? What facilities, materials, and equipment are needed? Who should conduct the program? What roles should the different participants carry out? Is the program working and should it be revised in any way? Is the program effectively reaching all the targeted beneficiaries and meeting their needs? Were the program staff members responsible and effective in carrying out their responsibilities to implement the program and meet the beneficiaries' needs? Is the program better than competing alternatives? Is it sustainable? Is it transportable? Is the program worth the required initial investment? Answers to these and related questions are to be based on the underlying standard of good programs, i.e., they must effectively reach and serve the beneficiaries' targeted needs at a reasonable cost and do so as well or better than reasonably available alternatives.

Many methods may be used in decision/accountability-oriented program evaluations. Among others, these include surveys, needs assessments, case studies, advocate teams, observations, interviews, resident evaluators, and quasi-experimental and experimental designs. The point needs to be underscored that this approach involves the evaluator and a representative body of stakeholders in regular exchanges about the evaluation. Typically, the evaluator should establish and regularly interact with an evaluation advisory or review panel in order to help define evaluation questions, shape evaluation plans, review draft reports, and help disseminate findings. This panel should include representatives of all stakeholder groups. The evaluator's exchanges with this group involve conveying evaluation feedback that may be of use in program improvement and use, as well as planning what future evaluation activities and reports would be most helpful to program personnel and other stakeholders. Interim reports may also assist beneficiaries, program staff, and others to obtain feedback on the program's merits and worth. By maintaining a dynamic baseline of evaluation information and ways that the information was applied, the evaluator can use this information to develop a comprehensive summative evaluation report, to present periodic feedback to the broad group of stakeholders, and to supply program personnel with information they need to make their own accountability reports.

Involvement of stakeholders, as a key feature of this approach, is consistent with a key principle of the change process. An enterprise (read evaluation here) can best help bring about change in a target group's behavior if that group was involved in planning, monitoring, and assessing outcomes of the enterprise. By involving stakeholders throughout the evaluation process, decision-oriented evaluators lay the groundwork for bringing stakeholders to understand and value the evaluation process and apply the findings.

Cronbach (1963) first introduced educators to the idea that evaluation should be reoriented from its objectives-based history to a concern for helping program personnel make better decisions about how to deliver effective services. While he did not use the terms formative and summative evaluation, he essentially defined the underlying concepts. In discussing the distinctions between the constructive, proactive orientation on the one hand and the retrospective, judgmental orientation on the other, he argued for placing more emphasis on the former, in contrast to the evaluation tradition of stressing retrospective outcomes evaluation. Later, I (Stufflebeam, 1966, 1967) introduced a conceptualization of evaluation that was based on the idea that evaluation should help program personnel make and defend decisions that are in the best interest of meeting beneficiaries' needs. While I argued for an improvement orientation to evaluation, I also emphasized that evaluators must both inform decisions and provide an informational basis for accountability. I also emphasized that the approach should interact with and serve the full range of stakeholders who need to make judgments and choices about a program. Other persons who have contributed to the development of a decision/accountability orientation to evaluation are Alkin (1969) and Webster.

The decision/accountability-oriented approach is applicable in cases where program staffs and other stakeholders want and need both formative and summative evaluation. It can provide the evaluation framework for both internal evaluation and external evaluation. When used for internal evaluation, it is usually important to commission an independent metaevaluation of the inside evaluator's work. In addition to application to program evaluations, this approach has proved useful in evaluating personnel, students, projects, facilities, and products.

A main advantage of the decision/accountability-oriented approach is that it encourages program personnel to use evaluation continuously and systematically in their efforts to plan and implement programs that meet beneficiaries' targeted needs. It aids decision making at all levels of a system and stresses program improvement. It also presents a rationale and framework of information for helping program personnel to be accountable for their decisions and actions in implementing a program. It is heavily geared to involving the full range of stakeholders in the evaluation process to assure that their evaluation needs are well addressed and to encourage and support them to make effective use of evaluation findings. It is comprehensive in attending to context, inputs, process, and outcomes. It balances the use of quantitative and qualitative methods. It is keyed to professional standards for evaluations. Finally, the approach emphasizes that evaluations must be grounded in the democratic principles of a free society.

A main limitation is that the collaboration required between an evaluator and stakeholders introduces opportunities for impeding the evaluation and/or biasing its results, especially when the evaluative situation is politically charged. Also, when evaluators are actively influencing the course of a program, they may identify so closely with it that they lose some of the independent, detached perspective needed to provide objective, forthright reports. Moreover, the approach may overemphasize formative evaluation and give too little attention to summative evaluation. External metaevaluation has been employed to counteract opportunities for bias and to assure the proper balance of formative and summative evaluation. Though the charge is erroneous, this approach carries the connotation that only top decision makers are served.

Approach 17: Consumer-Oriented Studies

In the consumer-oriented approach, the evaluator is the "enlightened surrogate consumer." He or she must draw direct evaluative conclusions about the program being evaluated. Evaluation is viewed as the process of determining something's merit and worth, with evaluations being the products of that process. The approach regards a consumer's welfare as a program's primary justification and accords that welfare the same primacy in program evaluation. Grounded in a deeply reasoned view of ethics and the common good plus skills in obtaining and synthesizing pertinent, valid, and reliable information, the evaluator should help developers produce and deliver products and services that are of excellent quality and of great use to consumers (e.g., students, their parents, teachers, and taxpayers). More importantly, the evaluator should help consumers identify and assess the merit and worth of competing programs, services, and products.

Advance organizers include societal values, consumers' needs, costs, and criteria of goodness in the particular evaluation domain. The purpose of a consumer-oriented program evaluation is to judge the relative merits and worths of the products and services of alternative programs and, thereby, to help taxpayers, practitioners, and potential beneficiaries make wise choices. This approach is objectivist in assuming an underlying reality and positing that it is possible, although often extremely difficult, to find best answers. This approach looks at a program comprehensively in terms of its quality and costs, functionally regarding the assessed needs of the intended beneficiaries, and comparatively considering reasonably available alternative programs. Evaluators are expected to subject their program evaluations to evaluations, what Scriven termed metaevaluation.

The approach employs a wide range of assessment topics. These include program description, background and context, client, consumers, resources, function, delivery system, values, standards, process, outcomes, costs, critical competitors, generalizability, statistical significance, assessed needs, bottom-line assessment, practical significance, recommendations, reports, and metaevaluation. The evaluation process begins with consideration of a broad range of such topics, continuously compiles information on all of them, and ultimately culminates in a super-compressed judgment of the program's merit and worth.

Questions for the consumer-oriented study are derived from society, from program constituents, and especially from the evaluator's frame of reference. The general question addressed is, Which of several alternative programs is the best choice, given their differential costs, the needs of the consumer group, the values of society at large, and evidence of both positive and negative outcomes?

Methods include checklists, needs assessments, goal-free evaluation, experimental and quasi-experimental designs, modus operandi analysis, applying codes of ethical conduct, and cost analysis (Scriven, 1974). A preferred method is for an external, independent consumer advocate to conduct and report findings of studies of publicly supported programs. The approach is keyed to employing a sound checklist of the program's key aspects. Scriven (1991) developed a generic "Key Evaluation Checklist" for this purpose. The main evaluative acts in this approach are grading, scoring, ranking, apportioning, and producing the final synthesis (Scriven, 1994a).
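The final synthesis is ultimately a matter of professional judgment rather than a formula, but a small computational sketch can make the grading, scoring, and ranking logic concrete. The fragment below is only an illustration under assumed inputs: the checkpoint names, weights, and grades are hypothetical placeholders, not items from the actual Key Evaluation Checklist, and a numerical weight-and-sum is only one of several ways an evaluator might combine checkpoint grades before rendering a judgment.

from __future__ import annotations

# Illustrative sketch only: a weighted-checklist synthesis in the spirit of
# grading, scoring, ranking, and producing a final synthesis. The checkpoint
# names, weights, and grades below are hypothetical placeholders.

CHECKPOINT_WEIGHTS = {                      # assumed importance weights (sum to 1.0)
    "responsiveness to assessed needs": 0.35,
    "quality of outcomes": 0.30,
    "cost": 0.20,
    "advantage over critical competitors": 0.15,
}

def synthesize(grades: dict[str, float]) -> float:
    """Combine 0-10 checkpoint grades into one weighted score."""
    return sum(CHECKPOINT_WEIGHTS[c] * grades[c] for c in CHECKPOINT_WEIGHTS)

def rank_programs(candidates: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank competing programs by synthesized score, best first."""
    return sorted(((name, synthesize(g)) for name, g in candidates.items()),
                  key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    candidates = {                          # hypothetical evaluator-assigned grades
        "Program A": {"responsiveness to assessed needs": 8, "quality of outcomes": 7,
                      "cost": 5, "advantage over critical competitors": 6},
        "Program B": {"responsiveness to assessed needs": 6, "quality of outcomes": 8,
                      "cost": 8, "advantage over critical competitors": 7},
    }
    for name, score in rank_programs(candidates):
        print(f"{name}: {score:.2f}")

In practice, a consumer-oriented evaluator would temper such a mechanical ranking with qualitative evidence on needs, side effects, and costs before issuing a bottom-line judgment.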

Scriven (1967) was a pioneer in applying the consumer-oriented approach to program evaluation, and his work parallels the concurrent work of Ralph Nader and the Consumers Union in the general field of consumerism. Glass has supported and developed Scriven's approach. Scriven coined the terms formative and summative evaluation. He allowed that evaluations can be divergent in early quests for critical competitors and explorations related to clarifying goals and making programs function well. However, he also maintained that ultimately evaluations must converge on summative judgments about a program's merit and worth. While accepting the importance of formative evaluation, he also argued against Cronbach's (1963) position that formative evaluation should be given the major emphasis. According to Scriven, the bottom-line aim of a sound evaluation is to judge the program's merit, comparative value, and overall worth. Scriven (1991, 1994a) sees evaluation as a transdiscipline encompassing all evaluations of various entities across all applied areas and disciplines and comprised of a common logic, methodology, and theory that transcends specific evaluation domains, which also have their unique characteristics.

The consumer-oriented study requires a highly credible and competent expert plus either sufficient resources to allow the expert to conduct a thorough study or other means to obtain the needed information. Often, a consumer-oriented evaluator is engaged to evaluate a program after its formative stages are over. In these situations, the external consumer-oriented evaluator is often dependent on being able to access a substantial base of information that the program staff had accumulated. If no such base of information exists, the consumer-oriented evaluator will have great difficulty in obtaining enough information to produce a thorough, defensible summative program evaluation.

One of this approach's main advantages is that it is a hard-hitting, independent assessment intended to protect consumers from shoddy programs, services, and products and instead to guide them to support and use those contributions that best and most cost-effectively address their needs. Also, the approach's stress on independent, objective assessment yields high credibility with consumer groups. The approach directly attempts to achieve a comprehensive assessment of merit and worth. This is aided by Michael Scriven's (1991) Key Evaluation Checklist and his Evaluation Thesaurus (in which he presents and explains the checklist). The approach provides for a summative evaluation to yield a bottom-line judgment of merit and worth, preceded by a formative evaluation to assist developers in helping assure that their programs will succeed.

One disadvantage is that the approach can be so independent from practitioners that it may not assist them to do a better job of serving consumers. If summative evaluation is applied too early, it can intimidate developers and stifle their creativity. However, if summative evaluation is applied only near a program's end, the evaluator may have great difficulty in obtaining sufficient evidence to confidently and credibly judge the program's basic value. This often iconoclastic approach is also heavily dependent on a highly competent, independent, and "bulletproof" evaluator.

Approach 18: Accreditation/Certification Approach

Most school districts and universities and many professional organizations have periodically been the subject of an accreditation study, and many professionals, at one time or another, have had to meet certification requirements for a given position. Such studies of institutions and personnel are in the realm of accountability-oriented evaluations, and they have an improvement element as well. Institutions, institutional programs, and personnel are studied to determine whether they are fit to serve designated functions in society; typically, the feedback reports include areas for improvement.

The advance organizers used in the accreditation/certification study usually are guidelines and criteria that some accrediting or certifying body has adopted. As previously suggested, the evaluation's purpose is to determine whether institutions, institutional programs, and/or personnel should be approved to perform specified functions.

The source of questions for accreditation or certification studies is the accrediting or certifying body. Basically, they address this question: Are institutions and their programs and personnel meeting minimum standards, and how can their performance be improved?


Typical methods used in the accreditation/certification approach are self-study and self-reporting by the individual or institution. In the case of institutions, panels of experts are assigned to visit the institution, verify a self-report, and gather additional information. The basis for the self-studies and the visits by expert panels is usually the guidelines and criteria that have been specified by the accrediting agency.

Accreditation of education was pioneered by the College Entrance Examination Board around 1901. Since then, the accreditation function has been implemented and expanded, especially by the Cooperative Study of Secondary School Standards, dating from around 1933. Subsequently, the accreditation approach has been developed, further expanded, and administered by the North Central Association of Secondary Schools and Colleges, along with their associated regional accrediting agencies across the United States, and by many other accrediting and certifying bodies. Similar accreditation practices are found in medicine, law, architecture, and many other professions.

Any area of professional service that potentially could put the public at risk if services are not delivered by highly trained specialists in accordance with standards of good practice and safety should consider subjecting its programs to accreditation reviews and its personnel to certification processes. Such use of evaluation services is very much in the public interest and also is a means of getting feedback of use in strengthening capabilities and practices.

The main advantage of the accreditation or certification study is that it aids lay persons in making informed judgments about the quality of organizations and programs and the qualifications of individual personnel. The main difficulties are that the guidelines of accrediting and certifying bodies often emphasize inputs and processes and not outcome criteria. Also, the self-study and visitation processes used in accreditation offer many opportunities for corruption and inept performance. As has been said for a number of the evaluation approaches described above, it is prudent to subject accreditation and certification processes themselves to independent metaevaluations.

The three improvement/accountability-oriented approaches emphasize the assessment of merit and worth, which is the thrust of the definition of evaluation used to classify the 22 approaches considered in this paper. Tables 7 through 12 summarize the similarities and differences among the three approaches in relation to advance organizers, purposes, characteristic questions, methods, strengths, and weaknesses. The paper turns next to the fourth and final set of program evaluation approaches: those concerned with using evaluation to further some social agenda.


Table 7: Comparison of the Three Improvement/Accountability Approaches on Most Common ADVANCE ORGANIZERS

Advance Organizers / Evaluation Approaches: 16. Decision/Accountability, 17. Consumer Orientation, 18. Accreditation

Decision makers/stakeholders: U
Decision situations: U
Program accountability requirements: U U
Needs, problems, opportunities: U
Competing program approaches: U U
Program operations: U U
Program outcomes: U U U
Cost-effectiveness: U U
Assessed needs: U U
Societal values: U
Intrinsic criteria of merit: U U
Accreditation guidelines & criteria: U


Table 8: Comparison of the Primary PURPOSES of the Three Improvement/Accountability Evaluation Approaches

16. Decision/Accountability: Provide a knowledge and value base for decisions
17. Consumer Orientation: Judge alternatives
18. Accreditation: Approve/recommend professional services


Table 9: Comparison of the Improvement/Accountability Evaluation Approaches on Characteristic EVALUATION QUESTIONS

Characteristic Evaluation Questions / Evaluation Approaches: 16. Decision/Accountability, 17. Consumer Orientation, 18. Accreditation

What consumer needs should be addressed? U U
What alternatives are available to address the needs & what are their comparative merits? U U
What plan should guide the program? U
What facilities, materials, and equipment are needed? U
Who should conduct the program & what roles should the different participants carry out? U
Is the program working & should it be revised? U U U
How can the program be improved? U U
Is the program reaching all the rightful beneficiaries? U
What are the outcomes? U U U
Did staff responsibly & effectively discharge their program responsibilities? U
Is the program superior to critical competitors? U U
Is the program worth the required investment? U U
Is the program meeting minimum accreditation requirements? U


Table 10: Comparison of Main METHODS of the Three Improvement/Accountability Evaluation Approaches

Evaluation Methods / Evaluation Approaches: 16. Decision/Accountability, 17. Consumer Orientation, 18. Accreditation

Surveys: U
Needs assessments: U U
Case studies: U
Advocate teams: U
Observations: U U
Interviews: U U U
Resident evaluators: U
Quasi-experiments: U U
Experiments: U U
Checklists: U U
Goal-free evaluations: U
Modus operandi analysis: U
Applying codes of ethical conduct: U
Cost analysis: U
Self-study: U
Site visits by expert panels: U U


Table 11: Comparison of the Prevalent STRENGTHS of the Three Improvement/Accountability Evaluation Approaches

Strengths / Evaluation Approaches: 16. Decision/Accountability, 17. Consumer Orientation, 18. Accreditation

Keyed to professional standards: U U
Examines context, inputs, process, & outcomes: U U
Balances use of quantitative and qualitative methods: U U U
Integrates evaluation into management operations: U
Targets constituents' needs: U U
Stresses program improvement: U
Provides basis for accountability: U U U
Involves and addresses the needs of all stakeholders: U U
Serves decision making at all system levels: U
Promotes & assists uses of evaluation findings: U U
Emphasizes democratic principles: U U
Stresses an independent perspective: U U
Stresses consumer protection: U U
Produces a comprehensive assessment of merit & worth: U U U
Emphasizes cost-effectiveness: U
Provides formative & summative evaluation: U U
Grades the quality of programs & institutions: U U
Aided by Scriven's Key Evaluation Checklist & Evaluation Thesaurus: U


Table 12: Comparison of the Prevalent WEAKNESSES of the Three Improvement/Accountability Evaluation Approaches

Weaknesses / Evaluation Approaches: 16. Decision/Accountability, 17. Consumer Orientation, 18. Accreditation

Involved collaboration with client/stakeholders may engender interference & bias: U U
Influence on program operations may compromise the evaluation's independence: U
May be too independent to help strengthen operations: U
Carries connotation that top decision makers are most important: U
May overemphasize formative evaluation and underemploy summative evaluation: U
Stress on independence may minimize formative assistance: U
Summative evaluation applied too early may stifle staffs' creativity: U
Summative evaluation applied too late in a program's process may be void of much needed information: U
Heavily dependent on a highly competent, independent evaluator: U
May overstress intrinsic criteria: U
May underemphasize outcome information: U
Includes many opportunities for evaluatees to coopt & bias the evaluators: U


V. SOCIAL AGENDA/ADVOCACY APPROACHES

The Social Agenda/Advocacy approaches are heavily directed to making a difference in society through program evaluation. These approaches especially are employed to ensure that all segments of society have equal access to educational and social opportunities and services. The approaches even have an affirmative action bent toward giving preferential treatment through program evaluation to the disadvantaged. If, as many persons have stated, information is power, then this set of approaches could be said to be oriented toward employing program evaluation, sometimes in a biased way, to empower the disenfranchised. By giving stakeholders the authority for key evaluation decisions, related especially to interpretation and release of findings, evaluators empower these persons to use evaluation to their best advantage; but they also may make the evaluation vulnerable to bias and other misuse. Nevertheless, there is much to recommend these approaches, since they are strongly oriented to democratic principles of equity and fairness and employ practical procedures for involving the full range of stakeholders.

Approach 19: Client-Centered Studies (or Responsive Evaluation)

The classic approach in this set is the client-centered study, or what Robert Stake (1983) has termed the responsive evaluation. The label client-centered evaluation is used here because one pervasive theme is that the evaluator must work with and for the support of a diverse client group including, for example, teachers, administrators, developers, taxpayers, legislators, and financial sponsors. They are the clients in the sense that they support, develop, administer, or directly operate the programs under study and seek or need evaluators' counsel and advice in understanding, judging, and improving the programs. The approach charges evaluators to interact continuously with and respond to the evaluative needs of the various clients.

This approach contrasts sharply with Scriven's consumer-oriented approach. Stake's evaluators are not the independent, objective assessors seen in Scriven's approach. The client-centered study embraces local autonomy and helps people who are involved in a program to evaluate it and use the evaluation for program improvement. The evaluator in a sense is the client's handmaiden as they strive to make the evaluation serve their needs. Moreover, the client-centered approach rejects objectivist evaluation and instead subscribes to the postmodernist view, wherein there are no best answers or clearly preferable values and wherein subjective information is preferred. In this approach, the program evaluation may culminate in conflicting findings and conclusions, leaving interpretation to the eyes of the beholders. Client-centered evaluation is perhaps the leading entry in the "relativistic school of evaluation," which calls for a pluralistic, flexible, interactive, holistic, subjective, constructivist, and service-oriented approach. The approach is relativistic because it seeks no final authoritative conclusion, but instead interprets findings against stakeholders' different and often conflicting values. The approach seeks to examine a program's full countenance and prizes the collection and reporting of multiple, often conflicting perspectives on the value of a program's format, operations, and achievements. Side effects and incidental gains as well as intended outcomes are to be identified and examined.

The advance organizers in client-centered evaluations are stakeholders' concerns and issues in the program itself, as well as the program's rationale, background, transactions, outcomes, standards, and judgments. The client-centered program evaluation may serve a wide range of purposes. Some of these are helping people in a local setting gain a perspective on the program's full countenance; understanding the ways that various groups see the program's problems, strengths, and weaknesses; and learning the ways affected people value the program plus the ways program experts judge it. The evaluator's process goal is to carry on a continuous search for key questions and to provide the clients with useful information as it becomes available.

The client-centered/responsive approach has a strong philosophical base: evaluators should promote equity and fairness, help those with little power, thwart the misuse of power, expose the huckster, unnerve the assured, reassure the insecure, and always help people see things from alternative viewpoints. The approach subscribes to moral relativity and posits that for any given set of findings there are potentially multiple, conflicting interpretations that are equally plausible.

Community, practitioner, and beneficiary groups in the local environment plus external program area experts provide the questions addressed by the client-centered study. In general, the groups usually want to know what the program has achieved, how it operated, and the ways in which it is judged by involved persons and experts in the program area. The more specific evaluation questions emerge as the study unfolds, based on the evaluator's continuing interactions with the stakeholders and their collaborative assessment of the developing evaluative information.

This approach reflects a formalization of the longstanding practice of informal, intuitive evaluation. It requires a relaxed and continuous exchange between the evaluator and clients. The approach is more divergent than convergent. Basically, the approach calls for continuing communication between evaluator and audience for the purposes of discovering, investigating, and addressing a program's issues. Designs for client-centered program evaluations are relatively open-ended and emergent, building to narrative description rather than aggregating measurements across cases. The evaluator attempts to issue timely responses to clients' concerns and questions by collecting and reporting useful information, even if the needed information hadn't been anticipated at the study's beginning. Concomitant with the ongoing conversation with the clients, the evaluator attempts to obtain and present a rich set of information on the program. This includes its philosophical foundation and purposes, history, transactions, and outcomes. Special attention is given to side effects, the standards that various persons hold for the program, and their judgments of the program.

Depending on the evaluation's purpose, the evaluator may legitimately employ a range of different methods. Some of the preferred methods are the case study, expressive objectives, purposive sampling, observation, adversary reports, storytelling to convey complexity, sociodrama, and narrative reports. Client-centered evaluators are charged to check for the existence of stable and consistent findings by employing redundancy in their data-collecting activities and replicating their case studies. Evaluators are not expected to act as a program's sole or final judges, but should collect, process, and report the opinions and judgments of the full range of the program's stakeholders plus pertinent experts. In the end, the evaluator makes a comprehensive statement of what the program is observed to be and references the satisfaction and dissatisfaction that appropriately selected people feel toward the program. Overall, the client-centered/responsive evaluator uses whatever information sources and techniques seem relevant to portraying the program's complexities and multiple realities and communicates the complexity even if the result instills doubt and makes decision making more difficult.

Stake (1967) is the pioneer of the client-centered/responsive type of study, and his approach has been supported and developed by Denny (1978), MacDonald (1975), Parlett and Hamilton (1972), Rippey (1973), and Smith and Pohland (1974). Guba's (1978) early development of constructivist evaluation also was heavily influenced by Stake's writings on responsive evaluation. Stake has expressed skepticism about scientific inquiry as a dependable guide to developing generalizations about human services and pessimism about the potential benefits of formal program evaluations.

The main condition for applying the client-centered approach is a receptive client group and a confident, competent, responsive evaluator. The client must be willing to endorse a quite open, flexible evaluation plan as opposed to a well-developed, detailed, preordinate plan and must be receptive to equitable participation by a representative group of stakeholders. Clients must find qualitative methods acceptable and usually be willing to forego anything like a tightly controlled experimental study, although in exceptional cases a controlled field experiment might be employed. The client and other involved stakeholders need tolerance, even appreciation, for ambiguity and should hold out only modest hopes for obtaining definitive answers to evaluation questions. The clients must be receptive to ambiguous findings, multiple interpretations, employment of competing value perspectives, and heavy involvement of stakeholders in interpreting and using findings. Finally, the clients must be sufficiently patient to allow the program evaluation to unfold and find its direction based on the ongoing interactions between the evaluator and the stakeholders.

A main strength of the responsive/client-centered approach is that it involves action research, in which people funding, implementing, and using programs are helped to conduct their own evaluations and use the findings to improve their understanding, decisions, and actions. The evaluations look deeply into the stakeholders' main interests and search broadly for relevant information. They also examine the program's rationale, background, process, and outcomes. They make effective use of qualitative methods and triangulate findings from different sources. The approach stresses the importance of searching widely for unintended as well as intended outcomes. It also gives credence to the meaningful participation in the evaluation by the full range of interested stakeholders. Judgments and other inputs from all such persons are respected and incorporated in the evaluations. The approach also provides for effective communication of findings.

A main weakness is the approach's vulnerability regarding external credibility, since people in the local setting, in effect, have considerable control over the evaluation of their work. Similarly, evaluators working so closely with stakeholders may lose their independent perspectives. Also, the approach is not very amenable to reporting clear findings in time to meet decision or accountability deadlines. Moreover, rather than bringing closure, the approach's adversary aspects and divergent qualities may generate confusion and contentious relations among the stakeholders. Sometimes, this cascading, evolving approach may bog down in an unproductive quest for multiple inputs and interpretations.

Approach 20: Constructivist Evaluation

The constructivist approach to program evaluation is heavily philosophical, service oriented, and paradigm driven. The constructivist paradigm rejects the existence of any ultimate reality and employs a subjectivist epistemology. It sees knowledge gained as one or more human constructions, uncertifiable and constantly problematic and changing. It places the evaluators and program stakeholders at the center of the inquiry process, employing all of them as the evaluation's "human instruments." The approach insists that evaluators be totally ethical in respecting and advocating for all the participants, especially the disenfranchised. Evaluators are authorized, even expected, to maneuver the evaluation to emancipate and empower involved or affected disenfranchised people. Evaluators do this by raising stakeholders' consciousness so that they are energized, informed, and assisted to transform their world. The evaluator must respect the participants' free will in all aspects of the inquiry and should empower them to help shape and control the evaluation activities in their preferred ways. The inquiry process must be consistent with effective ways of changing and improving society. Thus, stakeholders must play a key role in determining the evaluation questions and variables. Throughout the study, the evaluator regularly and continuously informs and consults the stakeholders in all aspects of the study. The approach rescinds any special privilege of scientific evaluators to work in secret and control or manipulate human subjects. In guiding the program evaluation, the evaluator balances verification with a quest for discovery, balances rigor with relevance, and balances the use of quantitative and qualitative methods. The evaluator also provides rich and deep description in preference to precise measurements and statistics. He or she employs a relativist perspective to obtain and analyze findings, stressing locality and specificity over generalizability. The evaluator posits that there can be no ultimately correct conclusions. He or she exalts openness and the continuing search for more informed and illuminating constructions.

This approach is as much recognizable for what it rejects as for what it proposes. In general, it strongly opposes positivism as a basis for evaluation, with its realist ontology, objectivist epistemology, and experimental method. It rejects any absolutist search for correct answers. It directly opposes the notion of value-free evaluation and attendant efforts to expunge human bias. It rejects positivism's deterministic and reductionist structure and its belief in the possibility of fully explaining studied programs.

The constructivist approach’s advanceorganizers are basically the philosophicalconstraints placed on the study, as seen above,including the requirement of collaborative,unfolding inquiry. A constructivist approach’smain purpose is to determine and make sense ofthe variety of constructions that exist amongstakeholders. The approach keeps the inquiryopen to ongoing communication and to thegathering, analysis, and synthesis of furtherconstructions. One construction is notconsidered more true than others, but some maybe judged as more informed and sophisticatedthan others. All evaluation conclusions areviewed as indeterminate with the continuingpossibility of finding better answers. Allconstructions are also context dependent. In thisrespect, the evaluator does define boundaries onwhat is being investigated.

The questions addressed in constructivist studies cannot be determined apart from the participants' interactions. Together, the evaluator and stakeholders identify the questions to be addressed. These questions emerge in the process of formulating and discussing the study's rationale, planning the schedule of discussions, and obtaining various initial persons' views of the program to be evaluated. These questions develop further over the course of the approach's hermeneutic and dialectic processes. The questions may or may not cover the full range of issues involved in assessing something's merit and worth. Also, the set of questions to be studied is never considered fixed.

The constructivist methodology is first divergent, then convergent. Through the use of hermeneutics, the evaluator collects and describes alternative individual constructions on an evaluation question or issue, assuring that each depiction meets with the respondent's approval. Communication channels are kept open throughout the inquiry, and all respondents are encouraged and facilitated to make their inputs and are kept apprised of all aspects of the study. The evaluator then moves to a dialectical process aimed at bringing the different constructions into as much consensus as possible. Respondents are provided opportunities to review the full range of constructions along with other relevant information. The evaluator engages the respondents in a process of studying and contrasting existing constructions, considering relevant contextual and other information, reasoning out the differences among the constructions, and moving as far as they can toward a consensus. The constructivist evaluation is, in a sense, never ending. There is always more to learn, and finding ultimately correct answers is considered impossible.

Guba and Lincoln (1985, 1989) are pioneers in applying the constructivist approach to program evaluation. Also, Bhola (1998), a disciple of Guba, has extensive experience in applying the constructivist approach to evaluating programs in Africa. Thomas Schwandt (1984), another disciple of Guba, has written extensively about the philosophical underpinnings of constructivist evaluation. Fetterman's (1994) empowerment evaluation approach is closely aligned with constructivist evaluation, since it seeks to engage and serve all stakeholders, especially those with little influence. However, there is a key difference between the constructivist and empowerment evaluation approaches. While the constructivist evaluator retains control of the evaluation and works with stakeholders to develop a consensus, the empowerment evaluator "gives away" authority for the evaluation to the stakeholders, with the evaluator serving in a technical assistance role.

The constructivist approach can be applied usefully when the evaluator, client, and stakeholders in the program fully agree that the approach is appropriate and that they will cooperate. They should reach such agreements based on an understanding of what the approach can and cannot deliver. They need to accept that questions and issues to be studied will unfold throughout the process. They also should be willing to receive ambiguous, possibly contradictory findings, reflecting the stakeholders' diverse perspectives. They should know also that the shelf life of the findings is likely to be short (not unlike any other evaluation approach, but clearly acknowledged in the constructivist approach). They also need to value qualitative information that largely reflects stakeholders' various perspectives and judgments. On the other hand, they should not expect to receive definitive pre-post measures of outcomes and statistical conclusions about causes and effects. While these persons can hope for achieving a consensus in the findings, they should agree that such a consensus might not emerge and that in any case such a consensus would not generalize to other settings or time periods.

This approach has a number of advantages. It is exemplary in fully disclosing the whole evaluation process and set of findings. It is consistent with the principle of effective change processes that people are more likely to value and use something (read evaluation here) if they are consulted and involved in its development. It also seeks to directly involve the full range of stakeholders who might be harmed or helped by the evaluation as important, empowered partners in the evaluation enterprise. It is said to be educative for all the participants, whether or not a consensus is reached. It also lowers expectations for what clients can learn about causes and effects. While it doesn't promise final answers, it does move from a divergent stage, in which it searches widely for insights and judgments, to a convergent stage, in which some unified answers are sought. In addition, it uses participants as instruments in the evaluation, thus taking advantage of their relevant experiences, knowledge, and value perspectives; this greatly reduces the burden of developing, field-testing, and validating information collection instruments before using them. The approach makes effective use of qualitative methods and triangulates findings from different sources.

However, the approach is limited in its applicability and has some disadvantages. Because of the need for full involvement and ongoing interaction through both the divergent and convergent stages, it is often difficult to produce the timely reports that funding agencies and decision makers demand. Also, to work well the approach requires the attention and responsible participation of a wide range of stakeholders, and it seems unrealistically utopian in this regard. Widespread, grass-roots interest and participation are often hard to obtain and sustain throughout a program evaluation, and this difficulty can be exacerbated by a continuing turnover of stakeholders. While the process emphasizes and promises openness and full disclosure, some participants don’t want to tell their private thoughts and judgments to the world.

Moreover, stakeholders sometimes are poorly informed about the issues being addressed in an evaluation and thus are poor data sources. It can be unrealistic to expect that the evaluation can and will take the needed time to inform and then meaningfully involve those who begin as basically ignorant of the program being assessed. Also, constructivist evaluations can be greatly burdened by itinerant evaluation stakeholders who come and go and who expect to reopen questions previously addressed and any consensus previously reached. In addition, some evaluation clients don’t take kindly to evaluators who are prone to report competing, perspectivist answers without taking a stand regarding the program’s merit and worth. Also, many clients aren’t necessarily attuned to the constructivist philosophy. Instead, they may value reports that mainly include hard data on outcomes and assessments of statistical significance. Often, they also expect that reports should be based on relatively independent perspectives that are free of program participants’ conflicts of interest. In addition, the constructivist approach is a countermeasure to assigning responsibility for successes and failures in a program to certain individuals or groups; many policy boards, administrators, and financial sponsors might see this rejection of individual and group accountability as unworkable and unacceptable. It is easy to say that all persons in a program should share the glory or the disgrace; but try telling this to an exceptionally hardworking and effective teacher in a school program where virtually no one else tries or succeeds.

Approach 21: Deliberative Democratic Evaluation

Perhaps the newest entry in the program evaluation models enterprise is the deliberative democratic approach advanced by House and Howe (1998). The approach functions within an explicit democratic framework and charges evaluators to uphold democratic principles in reaching defensible evaluative conclusions. The approach envisions program evaluation as a principled, influential societal institution, contributing to democratization through the issuing of reliable and valid claims.

The approach’s advance organizers are seen in its three main dimensions: democratic participation, dialogue to examine and authenticate stakeholders’ inputs, and deliberation to arrive at a defensible assessment of the program’s merit and worth. All three dimensions are considered essential in all aspects of a sound program evaluation.

In the democratic dimension, the approach proactively identifies and arranges for the equitable participation of all interested stakeholders throughout the course of the program evaluation. The approach stresses equity and does not tolerate power imbalances in which the message of powerful parties would dominate the evaluation message. In the dialogic dimension the evaluator engages stakeholders and other audiences to assist in compiling preliminary evaluation findings. Subsequently, the collaborators seriously discuss and debate the draft evaluation findings to ensure that no participant’s views are misrepresented. In the culminating deliberative stage, the evaluator(s) honestly considers and discusses with others all inputs obtained but then renders what he or she considers a fully defensible assessment of the program’s merit and worth. All interested stakeholders are given voice in the evaluation, and the evaluator acknowledges their views in the final report but may express disagreement with some of them. The deliberative dimension sees the evaluator(s) reaching a reasoned conclusion by reviewing all inputs; debating them with stakeholders and others; reflecting deeply on all these inputs; then reaching a defensible, well-justified conclusion.

This approach’s purpose is to employ democratic participation in the process of arriving at a defensible assessment of a program. The evaluator(s) determines the evaluation questions to be addressed but does so through dialogue and deliberation with engaged stakeholders. Presumably, the bottom-line questions concern judgments about the program’s merit and its worth to the stakeholders.

Methods employed may include discussions with stakeholders, surveys, and debates. Inclusion, dialogue, and deliberation are considered relevant in all stages of an evaluation—inception, design, implementation, analysis, synthesis, write-up, presentation, and discussion. House and Howe (1998) presented the following 10 questions for assessing the adequacy of a democratic deliberative evaluation:

Whose interests are represented?
Are major stakeholders represented?
Are any excluded?
Are there serious power imbalances?
Are there procedures to control imbalances?
How do people participate in the evaluation?
How authentic is their participation?
How involved is their interaction?
Is there reflective deliberation?
How considered and extended is the deliberation?
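Purely as an illustration, and not as part of House and Howe’s approach, the short Python sketch below shows one way a reviewer might keep notes against these ten questions when auditing a deliberative democratic evaluation. The question texts are taken from the list above; the function name, note format, and example notes are hypothetical.

    # Illustrative only: a simple note-keeping aid for House and Howe's ten
    # adequacy questions. The questions are quoted from the list above; the
    # reporting format is an assumption, not part of the approach itself.

    DELIBERATIVE_DEMOCRATIC_QUESTIONS = [
        "Whose interests are represented?",
        "Are major stakeholders represented?",
        "Are any excluded?",
        "Are there serious power imbalances?",
        "Are there procedures to control imbalances?",
        "How do people participate in the evaluation?",
        "How authentic is their participation?",
        "How involved is their interaction?",
        "Is there reflective deliberation?",
        "How considered and extended is the deliberation?",
    ]

    def review_report(notes: dict[str, str]) -> str:
        """List each question with the reviewer's note, flagging questions not yet addressed."""
        lines = []
        for question in DELIBERATIVE_DEMOCRATIC_QUESTIONS:
            lines.append(f"- {question} {notes.get(question, '[not yet addressed]')}")
        return "\n".join(lines)

    # Example use with one hypothetical reviewer note:
    print(review_report({"Are major stakeholders represented?": "Yes; parent and teacher panels were convened."}))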

Ernest House originated this approach. He and Kenneth Howe say that many evaluators already implement their proposed principles. In particular, they pointed to an article by Karlsson (1998) to illustrate their approach. Also, they refer to a number of authors who have proposed practices that at least in part are compatible with the democratic dialogic approach.

This approach is applicable when a client agrees to fund an evaluation that requires democratic participation of at least a representative group of stakeholders. Thus, the funding agent must be willing to give up sufficient power to allow inputs from a wide range of stakeholders, early disclosure of preliminary findings to all interested parties, and opportunities for the stakeholders to play an influential role in reaching the final conclusions. Also, a representative group of stakeholders must be willing to engage in open and meaningful dialogue and deliberation in all stages of the study.

This approach has many advantages associated with any democratic process. It is a direct attempt to make evaluations just. It assures democratic participation of stakeholders in all stages of the evaluation. It strives to incorporate the views of all interested parties, including insiders and outsiders, disenfranchised persons and groups, those who control the purse strings, etc. Meaningful democratic involvement should direct the evaluation to the issues that people care about and incline them to respect and use the evaluation findings. It employs dialogue to examine and authenticate stakeholders’ inputs. A key advantage over some other advocacy approaches is that the democratic deliberative evaluator reserves the right to rule out inputs that are considered incorrect or unethical. The evaluator is open to all stakeholders’ views, carefully considers them, but then renders as defensible a judgment of the program as possible. He or she does not leave the responsibility for reaching a defensible final assessment to a majority vote of stakeholders—some of whom are sure to have conflicts of interest and be uninformed. In rendering a final judgment, the evaluator ensures closure.

As House and Howe have acknowledged, the democratic dialogic approach is, at this time, unrealistic and often cannot be fully applied. This approach—in offering and expecting full democratic participation in order to make an evaluation work—reminds me of a colleague who used to despair of ever changing or improving higher education. He would say that changing any aspect of our university would require getting every professor to withhold her or his veto. In view of the very ambitious demands of the democratic dialogic approach, House and Howe have proposed it as an ideal to be kept in mind even though evaluators will seldom, if ever, be able to achieve this ideal.

Approach 22. Utilization-Focused Evaluation

The utilization-focused approach is explicitly geared to assure that program evaluations make impacts. It is a process for making choices about an evaluation study in collaboration with a targeted group of priority users, selected from a broader set of stakeholders, in order to focus effectively on their intended uses of an evaluation. All aspects of a utilization-focused program evaluation are chosen and applied to help the targeted users obtain and apply evaluation findings to their intended uses and to maximize the likelihood that they will do so. Such studies are judged more for the difference they make in improving programs and influencing decisions and actions than for their elegance and technical excellence. No matter how good an evaluation report is, if it only sits on the shelf gathering dust, it contributes little if anything to the evaluation’s success.

The advance organizers of utilization-focused program evaluations are, in the abstract, the possible users and uses to be served. Working from this initial conception, the evaluator moves as directly as possible to identify in concrete terms the actual users to be served. Through careful and thorough analysis of stakeholders, the evaluator identifies the multiple and varied perspectives and interests that should be represented in the study. He or she then selects a group that is willing to pay the price of substantial involvement and that appropriately represents the program’s stakeholders. The evaluator then engages this client group to clarify why they need the evaluation, how they intend to apply its findings, and how they think it should be conducted. The evaluator facilitates the users’ choices by supplying a menu of possible uses, information, and reports for the evaluation. But this is done not to supply the choices but to help the client group thoughtfully focus and shape the study. The main possible uses of evaluation findings contemplated in this approach are assessment of merit and worth, improvement, and generation of knowledge. The approach also values the evaluation process itself, seeing it as helpful in enhancing shared understandings among stakeholders, bringing support to a program, promoting participation in the program, and developing and strengthening organizational capacity.

In deliberating with the intended users, the evaluator emphasizes that the program evaluation’s purpose must be to give them the information they need to fulfill their objectives. Such objectives include socially valuable aims such as combating problems of illiteracy, crime, hunger, homelessness, unemployment, child abuse, spouse abuse, substance abuse, illness, alienation, discrimination, malnourishment, pollution, bureaucratic waste, etc. However, it is the targeted users who determine the program to be evaluated, what information is required, how and when it must be reported, and how it will be used.

In this approach, the evaluator is no iconoclast, but instead is the intended users’ servant and facilitator. The evaluation should meet the full range of professional standards for program evaluations, not just utility. The evaluator must therefore be an effective negotiator, standing on principles of sound evaluation but working hard to gear a defensible program evaluation to the targeted users’ evolving needs. The utilization-focused evaluation is considered situational and dynamic. Depending on the circumstances, the evaluator may play any of a variety of roles—trainer, measurement expert, internal colleague, external expert, analyst, spokesperson, mediator, etc.

The evaluator works with the targeted users to determine the evaluation questions. Such questions are to be determined locally, may address any of a wide range of concerns, and probably will change over time. Example foci are processes, outcomes, impacts, costs, cost benefits, etc. The chosen questions are kept front and center and provide the basis for information collection and reporting plans and activities, so long as the users continue to value and pay attention to the questions. Often, however, the evaluator and client group will adapt, change, or refine the questions as the evaluation unfolds.

All evaluation methods are fair game in the utilization-focused program evaluation. The evaluator will creatively employ whatever methods are relevant, e.g., quantitative and qualitative, formative and summative, naturalistic and experimental. As much as possible, the utilization-focused evaluator puts the client group in “the driver’s seat” in determining evaluation methods, so that they will make sure the evaluator addresses their most important questions; collects the right information; applies the relevant values; addresses the key action-oriented questions; uses techniques they respect; interprets the findings against a pertinent theory; reports the information in a form and at a time when it can best be used; convinces stakeholders of the evaluation’s integrity and accuracy; and facilitates the users’ study, application, and—as appropriate—dissemination of the findings. The bases for interpreting evaluation findings are the users’ values, with the evaluator engaging in much values clarification to ensure that evaluative information and interpretations serve the users’ purposes. The users are actively involved in interpreting findings. Throughout the evaluation process, the evaluator balances the concern for utility with provisions for validity and cost-effectiveness.

In general, the method of utilization-focused program evaluation is labeled “active-reactive-adaptive and situationally responsive,” emphasizing that the methodology evolves in response to ongoing deliberations between the evaluator and client group and in consideration of contextual dynamics. Patton (1997) says that “Evaluators are active in presenting to intended users their own best judgments about appropriate evaluation focus and methods; they are reactive in listening attentively and respectfully to others’ concerns; and they are adaptive in finding ways to design evaluations that incorporate diverse interests . . . while meeting high standards of professional practice.”

Patton (1980, 1982, 1994, 1997) is the leading proponent of utilization-focused evaluation. Others who have advocated for utilization-focused evaluations are Alkin (1995), Cronbach and Associates (1980), Davis and Salasin (1975), and the Joint Committee on Standards for Educational Evaluation (1981, 1994).

As defined by Patton, this approach has virtually universal applicability. It is situational and can be tailored to meet any program evaluation assignment. It carries with it the integrity of sound evaluation principles. Within these general constraints, the evaluator negotiates all aspects of the evaluation to serve specific individuals who need to have a program evaluation performed and who intend to make concrete use of the findings. The evaluator selects from the entire range of evaluation techniques those that best suit the particular program evaluation. And the evaluator plays any of a wide range of evaluation and improvement-related roles that fit the local needs. The approach requires a substantial outlay of time and resources by all participants for both conducting the program evaluation and the needed follow-through.

This approach is geared to maximizing evaluation impacts. It comports with a key principle of change: persons who are involved in an enterprise, such as an evaluation, are more likely to understand, value, and use it if they were meaningfully involved in its development. As Patton says, “. . . by actively involving primary intended users, the evaluator is training users in use, preparing the groundwork for use, and reinforcing the intended utility of the evaluation . . .” The approach engages stakeholders to determine the evaluation’s purposes and procedures and uses their involvement to promote use of findings. It takes a more realistic approach to stakeholder involvement than some other advocacy approaches. Instead of trying to reach and work with all stakeholders, Patton’s approach works concretely with a representative group of users. The approach places strong emphasis on values clarification and attends closely to contextual dynamics. The program evaluation may selectively use any and all relevant evaluation procedures and triangulates findings from different sources. Finally, this approach stresses the need to meet all relevant standards for evaluations.

The approach’s main limitation, as Patton sees it, is turnover of involved users. Replacement users may require that the program evaluation be renegotiated. This may be necessary to sustain or renew the prospects for evaluation impacts, but it can also derail or greatly delay the process. Also, the approach seems to be vulnerable to corruption by the user groups, since they are given so much control over what will be looked at, what questions will be addressed, and what information will be employed. Stakeholders with conflicts of interest may inappropriately influence the evaluation. Empowered stakeholders may inappropriately limit the evaluation to only a subset of the important questions. Also, it may be nigh unto impossible to get a representative users group to agree on a sufficient commitment of time and safeguards to assure an ethical, valid process of data collection, reporting, and use. Moreover, effective implementation of this approach requires a highly competent, confident evaluator who can approach any situation flexibly without compromising basic professional standards. Strong negotiation skills are essential, and the evaluator(s) must possess expertise in the full range of quantitative and qualitative evaluation methods, strong communication and political skills, and a working knowledge of all applicable standards for evaluations. Unfortunately, not many evaluators are sufficiently trained and experienced to meet these requirements. Nevertheless, the utilization-focused approach is tied for second in the ranking of the 22 approaches considered in this paper.

The utilization-focused approach to evaluation concludes this paper’s discussion of the social agenda/advocacy approaches to evaluation. These four approaches concentrate on making evaluation an instrument of social justice and on presenting, with modesty and candor, findings that often are ambiguous and contradictory. Tables 13 through 18 summarize the similarities and differences among these approaches with respect to advance organizers, purposes, characteristic questions, methods, strengths, and weaknesses.


Table 13: Comparison of the Four Social Agenda/Advocacy Evaluation Approaches on Most Common ADVANCE ORGANIZERS

Advance Organizers
Evaluation Approaches: 19. Client-Centered/Responsive; 20. Constructivist; 21. Deliberative Democratic; 22. Utilization-Focused

Evaluation users U
Evaluation uses U
Stakeholders’ concerns & issues in the program itself U U U
Rationale for the program U
Background of the program U
Transactions/operations in the program U
Outcomes U
Standards U
Judgments U
Collaborative, unfolding nature of the inquiry U U U
Constructivist perspective U
Rejection of positivism U
Democratic participation U U U U
Dialogue with stakeholders to validate their inputs U


Table 14: Comparison of the Four Social Agenda/Advocacy Evaluation Approaches on Primary EVALUATION PURPOSES

Evaluation Purposes
Evaluation Approaches: 19. Client-Centered/Responsive; 20. Constructivist; 21. Deliberative Democratic; 22. Utilization-Based

Inform stakeholders about a program’s full countenance U
Conduct a continuous search for key questions & provide stakeholders with useful information as it becomes available U U U
Learn how various groups see a program’s problems, strengths, and weaknesses U U
Learn how stakeholders judge a program U U
Learn how experts judge a program U
Determine & make sense of a variety of constructions about a program that exist among stakeholders U
Employ democratic participation in arriving at a defensible assessment of a program U
Provide users the information they need to fulfill their objectives U U U U


Table 15: Comparison of the Four Social Agenda/Advocacy Evaluation Approaches on Characteristic EVALUATION QUESTIONS

Characteristic Evaluation Questions
Evaluation Approaches: 19. Client-Centered/Responsive; 20. Constructivist; 21. Deliberative Democratic; 22. Utilization-Focused

Were questions negotiated with stakeholders? U U U
What was achieved? U U
What were the impacts? U
How did the program operate? U U
How do various stakeholders judge the program? U U U
How do experts judge the program? U
What is the program’s rationale? U U
What were the costs? U
What were the cost-benefits? U


Table 16: Comparison of the Four Social Agenda/Advocacy Evaluation Approaches on Main EVALUATION METHODS

Characteristic Methods
Evaluation Approaches: 19. Client-Centered/Responsive; 20. Constructivist; 21. Deliberative Democratic; 22. Utilization-Focused

Case study U U
Expressive objectives U
Purposive sampling U U
Observation U U
Adversary reports U
Storytelling to convey complexity U
Sociodrama to focus on issues U
Redundant data collection procedures U
Collection & analysis of stakeholders’ judgments U
Hermeneutics to identify alternative constructions U
Dialectical exchange U
Consensus development U
Discussions with stakeholders U U
Surveys U U
Debates U
All relevant quantitative & qualitative, formative & summative, & naturalistic & experimental methods U


Table 17: Comparison of the Four Social Agenda/Advocacy Evaluation Approaches on Prevalent STRENGTHS

Strengths
Evaluation Approaches: 19. Client-Centered/Responsive; 20. Constructivist; 21. Deliberative Democratic; 22. Utilization-Focused

Helps stakeholders to conduct their own evaluations U
Engages stakeholders to determine the evaluation’s purposes & procedures U U
Stresses values clarification U
Looks deeply into stakeholders’ own interests U
Searches broadly for relevant information U
Examines rationale, background, process, & outcomes U
Attends closely to contextual dynamics U U U
Identifies both side effects & main effects U U
Balances descriptive & judgmental information U
Meaningfully engages the full range of stakeholders U U U
Engages a representative group of stakeholders who are likely to apply the findings U
Empowers all stakeholders to influence & use the evaluation for their purposes U
Collects & processes judgments from all interested stakeholders U U U
Fully discloses the evaluation process & findings U
Educates all participants U
Both divergent & convergent in searching for conclusions U U
Selectively employs all relevant evaluation methods U U
Effectively uses qualitative methods U U U
Employs participants as evaluation instruments U
Triangulates findings from different sources U U U U
Focuses on the questions of interest to the stakeholders U U U U
Directly works to make evaluations just U U U
Grounded in principles of democracy U
Assures democratic participation of stakeholders in all stages of the evaluation U
Uses dialogue to examine & authenticate stakeholders’ inputs U
Rules out incorrect or unethical inputs from stakeholders U
Evaluator renders a final judgment, assuring closure U
Geared to maximize evaluation impacts U
Promotes use of findings through stakeholder involvement U U U U
Stresses effective communication of findings U U
Stresses need to meet all relevant standards for evaluations U


Table 18: Comparison of the Four Social Agenda/Advocacy Evaluation Approaches on Prevalent WEAKNESSES

Weaknesses
Evaluation Approaches: 19. Client-Centered/Responsive; 20. Constructivist; 21. Deliberative Democratic; 22. Utilization-Focused

May empower stakeholders to bias the evaluation U
Evaluators may lose independence through advocacy U U U
Divergent qualities may generate confusion & controversy U
May bog down in an unproductive quest for multiple inputs & interpretations U U
Time consuming to work through divergent & convergent stages U U
Low feasibility of involving & sustaining meaningful participation of all stakeholders U U U U
May place too much credence in abilities of stakeholders to be credible informants U U
Thwarts individual accountability U
May be unacceptable to clients who are looking for firm conclusions U U
Turnover of involved users may destroy the evaluation’s effectiveness U
Empowered stakeholders may inappropriately limit the evaluation to only some of the important questions U U
Utopian, not yet developed for effective, efficient application U
Open to possible bad influences on the evaluation via stakeholders’ conflicts of interest U U U


VI. BEST APPROACHES FOR 21ST CENTURY EVALUATIONS

As shown in the preceding parts, a variety of evaluation approaches emerged during the 20th century. Nine of these approaches appear to be the strongest and most promising for continued use and development beyond the year 2000. As shown in the preceding analyses, the other 13 approaches also have varying degrees of merit, but I chose in this section to converge attention on the most promising approaches. The ratings of these 9 approaches appear in Table 19. They are listed in order of merit within the categories of Improvement/Accountability, Social Mission/Advocacy, and Questions/Methods evaluation approaches. The ratings are in relationship to the Joint Committee Program Evaluation Standards and were derived by the author using a special checklist keyed to the Standards.³

All nine of the rated approaches earned overall ratings of Very Good, except Accreditation, which was judged Good overall. The Utilization-Focused and Client-Centered approaches received Excellent ratings in the standards areas of Utility and Feasibility, while the Decision/Accountability approach was judged Excellent in provisions for Accuracy. The rating of Good in the Accuracy area for the Outcomes Monitoring/Value-Added approach was due not to low merit in this approach’s techniques, but to the narrowness of the questions addressed and information used; in its narrow sphere of application the Outcomes Monitoring/Value-Added approach provides technically sound information. The comparatively lower ratings given to the Accreditation approach result from its being a labor-intensive, expensive approach; its susceptibility to conflicts of interest; its overreliance on self-reports and brief site visits; and its insular resistance to independent metaevaluations. Nevertheless, the distinctly American and pervasive accreditation approach is entrenched. All who will use it are advised to strengthen it in the areas of weakness identified in this paper. The Consumer-Oriented approach also deserves its special place, with its emphasis on independent assessment of developed products and services. While this consumer protection approach is not especially applicable to internal evaluations for improvement, it complements such approaches with the outsider, expert view that becomes important when products and services are put up for dissemination.

The Case Study approach scored surprisingly well, considering that it is focused on the use of a particular technique. An added bonus of this approach is that it can be employed as a component of any of the other approaches, or it can be used by itself. As mentioned previously in this paper, the Democratic Deliberative approach is new and appears to be promising for testing and further development. Finally, the Constructivist approach is a well-founded, mainly qualitative approach to evaluation that systematically engages interested parties to help conduct both the divergent and convergent stages of evaluation. All in all, the nine approaches summarized in Table 19 bode well for the future application and further development of alternative program evaluation approaches.

³The checklist used to evaluate each approach against the Joint Committee Program Evaluation Standards appears in this paper’s appendix.


Table 19: RATINGS of the Strongest Program Evaluation Approaches
(Within types, listed in order of compliance with The Program Evaluation Standards. Rating abbreviations: P = Poor, F = Fair, G = Good, VG = Very Good, E = Excellent.)

IMPROVEMENT/ACCOUNTABILITY
Decision/Accountability: Overall 92 (VG); Utility 90 (VG); Feasibility 92 (VG); Propriety 88 (VG); Accuracy 98 (E)
Consumer Orientation: Overall 81 (VG); Utility 81 (VG); Feasibility 75 (VG); Propriety 91 (VG); Accuracy 81 (VG)
Accreditation: Overall 60 (G); Utility 71 (VG); Feasibility 58 (G); Propriety 59 (G); Accuracy 50 (G)

SOCIAL MISSION/ADVOCACY
Utilization-Based: Overall 87 (VG); Utility 96 (E); Feasibility 92 (E); Propriety 81 (VG); Accuracy 79 (VG)
Client-Centered: Overall 87 (VG); Utility 93 (E); Feasibility 92 (E); Propriety 75 (VG); Accuracy 88 (VG)
Democratic Deliberative: Overall 83 (VG); Utility 96 (E); Feasibility 92 (VG); Propriety 75 (VG); Accuracy 69 (VG)
Constructivist: Overall 80 (VG); Utility 82 (VG); Feasibility 67 (VG); Propriety 88 (VG); Accuracy 83 (VG)

QUESTIONS/METHODS
Case Study: Overall 80 (VG); Utility 68 (VG); Feasibility 83 (VG); Propriety 78 (VG); Accuracy 92 (VG)
Outcomes Monitoring/Value-Added: Overall 72 (VG); Utility 71 (VG); Feasibility 92 (VG); Propriety 69 (VG); Accuracy 56 (G)

The tests behind the ratings: The author rated each evaluation approach on each of the 30 Joint Committee program evaluation standards by judging whether the approach endorses each of 10 key features of the standard. He judged the approach’s adequacy on each standard as follows: 9-10 Excellent, 7-8 Very Good, 5-6 Good, 3-4 Fair, 0-2 Poor. The score for the approach on each of the 4 categories of standards (Utility, Feasibility, Propriety, Accuracy) was then determined by summing the following products: 4 x number of Excellent ratings, 3 x number of Very Good ratings, 2 x number of Good ratings, 1 x number of Fair ratings. Each category score was then converted to a percentage of the maximum possible score for the category, and the approach’s strength in satisfying that category of standards was judged as follows: 93%-100% Excellent, 68%-92% Very Good, 50%-67% Good, 25%-49% Fair, 0%-24% Poor. The 4 equalized category scores were then summed, divided by 4, and compared to the total maximum value, 100. The approach’s overall merit was judged as follows: 93-100 Excellent, 68-92 Very Good, 50-67 Good, 25-49 Fair, 0-24 Poor. Regardless of the approach’s total score and overall rating, a notation of unacceptable would have been attached to any approach receiving a Poor rating on the vital standards of P1 Service Orientation, A5 Valid Information, A10 Justified Conclusions, or A11 Impartial Reporting.

The author’s ratings were based on his knowledge of the Joint Committee Program Evaluation Standards, his many years of studying the various evaluation models and approaches, and his experience in seeing and assessing how some of these models and approaches worked in practice. He chaired the Joint Committee on Standards for Educational Evaluation during its first 13 years and led the development of the first editions of both the program and personnel evaluation standards. Nevertheless, his ratings should be viewed as only his personal set of judgments of these models and approaches. Also, his conflict of interest is acknowledged, since he was one of the developers of the Decision/Accountability approach.
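Because the note above spells out the scoring arithmetic exactly, a minimal Python sketch of that arithmetic may help readers replicate the computations. The code is not the author’s checklist tool; the function and variable names are illustrative only.

    # Illustrative sketch of the rating arithmetic described in the note above.
    # It assumes the rater has already counted how many of a standard's 10 key
    # features an approach endorses; all names here are hypothetical.

    QUALITY_POINTS = {"Excellent": 4, "Very Good": 3, "Good": 2, "Fair": 1, "Poor": 0}

    def adjective_for_features(n_features: int) -> str:
        """Convert the count of endorsed key features (0-10) for one standard to an adjective."""
        if n_features >= 9:
            return "Excellent"
        if n_features >= 7:
            return "Very Good"
        if n_features >= 5:
            return "Good"
        if n_features >= 3:
            return "Fair"
        return "Poor"

    def category_percent(standard_adjectives: list[str]) -> float:
        """Percent of possible quality points earned across one category's standards."""
        earned = sum(QUALITY_POINTS[a] for a in standard_adjectives)
        possible = 4 * len(standard_adjectives)
        return 100.0 * earned / possible

    def adjective_for_percent(score: float) -> str:
        """Map a category or overall score (0-100) to its adjective rating."""
        if score >= 93:
            return "Excellent"
        if score >= 68:
            return "Very Good"
        if score >= 50:
            return "Good"
        if score >= 25:
            return "Fair"
        return "Poor"

    def overall_merit(category_percents: list[float]) -> tuple[float, str]:
        """Average the four equalized category scores and rate the result."""
        total = sum(category_percents) / len(category_percents)
        return total, adjective_for_percent(total)

For example, a category in which every standard is rated Very Good earns 3 of the 4 possible quality points per standard, or 75 percent, which falls in the Very Good range under the thresholds above; and four category scores of 90, 92, 88, and 98 average to 92, an overall rating of Very Good, matching the Decision/Accountability row in Table 19.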


Conclusions

This completes the paper’s review of the 22 approaches used to evaluate programs. As stated at the paper’s beginning, a critical analysis of these approaches has important implications for the practitioner of evaluation, the theoretician who is concerned with devising better concepts and methods, and those engaged in professionalizing program evaluation.

A main point for the practitioner is that evaluators may encounter considerable difficulties if their perceptions of the study being undertaken differ from those of their clients and audiences. Often, clients want a politically advantageous study performed, while the evaluators want to conduct questions/methods-oriented studies that allow them to exploit the methodologies in which they were trained. Moreover, audiences usually want values-oriented studies that will help them determine the relative merits and worths of competing programs, or advocacy evaluations that will give them voice in the issues that affect them. If evaluators are ignorant of the likely conflicts in purposes, the program evaluation is probably doomed to failure from the start. The moral is that, at the outset of the study, evaluators must be keenly sensitive to their own agendas for an evaluation study as well as those that are held by the client and the other right-to-know audiences. Further, the evaluator should advise the involved parties of possible conflicts in the evaluation’s purposes and should, at the beginning, negotiate a common understanding of the evaluation’s purpose and the appropriate approach.

The presented alternatives legitimately could be a questions/methods (quasi-evaluation) study directed at assessing particular questions, an improvement/accountability-oriented study, or a social agenda/advocacy study. It is not believed, however, that politically inspired and controlled studies serve appropriate purposes in evaluating programs. Granted, they may be necessary in administration and public relations, but they should not be confused with, or substituted for, sound evaluation. Moreover, it is imperative to remember that no one type of study is consistently the best for evaluating programs. In the write-ups of the approaches, different ones are seen to work differentially well depending on circumstances.

For the theoretician, a main point to be gleaned from the review of the 22 types of studies is that they have inherent strengths and weaknesses. In general, the weaknesses of the politically oriented studies are that they are prone to manipulation by unscrupulous persons and may help such people mislead an audience into developing an unfounded, perhaps erroneous judgment of a program’s merit and worth. The main problem with the questions/methods-oriented studies is that they often address questions that are narrower in scope than the questions that need to be addressed in a true assessment of merit and worth. However, it is also noteworthy that these types of studies compete favorably with improvement/accountability-oriented evaluation studies and social agenda/advocacy studies in the efficiency of methodology and the technical adequacy of the information employed.

Also, the improvement/accountability-oriented studies, with their concentration on merit and worth, undertake a very ambitious task, for it is virtually impossible to fully and unequivocally assess any program’s ultimate worth. Such an achievement would require omniscience, infallibility, an unchanging environment, and an unquestioned, singular value base. Nevertheless, the continuing attempt to consider questions of merit and worth certainly is essential for the advancement of societal programs. Finally, the social mission/advocacy studies are to be applauded for their quest for equity as well as excellence in the programs being studied. They model their mission by attempting to make evaluation a participatory, democratic enterprise. Unfortunately, many pitfalls attend such utopian approaches to evaluation. Especially, these include susceptibility to bias and political subversion of the study and practical constraints on involving, informing, and empowering all the stakeholders.

For the evaluation profession itself, the review of program evaluation models underscores the importance of evaluation standards and metaevaluations. Professional standards are needed to obtain a consistently high level of integrity in uses of the various program evaluation approaches. All legitimate approaches are enhanced when keyed to and assessed against professional standards for evaluations. In addition, the benefits from evaluations are enhanced when they are subjected to independent review through metaevaluations.

As evidenced in this paper, the last half of the 20th century saw considerable development of program evaluation approaches. Many of the approaches introduced in the 1960s and 1970s have been extensively refined and applied. The category of social agenda/advocacy models has emerged as a new and important part of the program evaluation cornucopia. There is among the approaches an increasingly balanced quest for rigor, relevance, and justice. Clearly, the approaches are showing a strong orientation to stakeholder involvement and use of multiple methods.

Recommendations

In spite of the progress described above, there is clearly a need for continuing efforts to develop and implement better approaches to program evaluation. This is illustrated by some of the authors’ hesitancy to accord the status of a model to their contributions or their inclination to label them as utopian. As also seen in the paper, there are some approaches that in the main seem to be a waste of time or even counterproductive.

Theoreticians should diagnose strengths and weaknesses of existing approaches, and they should do so in more depth than demonstrated here. They should use these diagnoses to evolve better, more defensible approaches and to help expunge the use of hopelessly flawed approaches; they should work with practitioners to operationalize and test the new approaches; and, of course, both groups should collaborate in developing still better approaches. Such an ongoing process of critical review and development is essential if the field of program evaluation is not to stagnate but instead is to provide vital support for advancing programs and services.

Therefore, it is necessary, indeed essential, that evaluators develop a repertoire of different program evaluation approaches so they can selectively apply them, individually or in combination, to best advantage. Going out on the proverbial limb, but also based on the preceding analysis, the best approaches seem to be decision/accountability, utilization-based, client-centered, consumer-oriented, case study, democratic deliberative, constructivist, accreditation, and outcomes monitoring. The worst bets, in my judgment, are the politically controlled, public relations, accountability (especially payment by results), clarification hearings, and program theory-based approaches. The rest fall somewhere in the middle. While House and Howe’s (1998) democratic deliberative approach is new and in their view utopian, it has many elements of a sound, effective evaluation approach and merits study, further development, and trial.


Evaluation training programs should effectively address the ferment over, and development of, new program evaluation approaches. Evaluation trainers should directly teach their students about the expanding and increasingly sophisticated program evaluation approaches. These approaches will serve well when evaluators can discern which approaches are worth using and which are not, when they clearly understand the worthy approaches, and when they know when and how to apply them. The most likely scenario is that present approaches will be extended and refined rather than completely new approaches being developed. Therefore, a knowledge of these approaches is very important.

In addition, evaluators should regularly train the participants in their evaluations in the selected approach’s logic, rationale, process, and pitfalls. This will enhance the stakeholders’ cooperation and constructive use of findings.

Finally, evaluators are advised to adopt and regularly apply professional standards for sound program evaluations. They should use the standards to guide the development of better evaluation approaches. They should apply them in choosing and tailoring approaches. They should engage external evaluators to apply the standards in assessing evaluations through the process called metaevaluation. They should also contribute to improvements in the professional standards. In accordance with The Program Evaluation Standards (Joint Committee, 1994), program evaluators should develop and selectively apply evaluation approaches that in the particular contexts will meet the conditions of utility, feasibility, propriety, and accuracy.

Notes

1. Stake, R. E. Nine approaches to evaluation. Unpublished chart. Urbana, Illinois: Center for Instructional Research and Curriculum Evaluation, 1974.

2. Hastings, T. A portrayal of the changing evaluation scene. Keynote speech at the annual meeting of the Evaluation Network, St. Louis, Missouri, 1976.

3. Guba, E. G. Alternative perspectives on evaluation. Keynote speech at the annual meeting of the Evaluation Network, St. Louis, Missouri, 1976.

4. Presentation by Robert W. Travers in a seminar at the Western Michigan University Evaluation Center, Kalamazoo, Michigan, October 24, 1977.

5. Stenner, A. J., and Webster, W. J. (Eds.). Technical auditing procedures. Educational product audit handbook, 38-103. Arlington, Virginia: Institute for the Development of Educational Auditing, 1971.

6. Eisner, E. W. The perceptive eye: Toward the reformation of evaluation. Paper presented at the annual meeting of the American Educational Research Association, Washington, DC, March 1975.

7. Webster, W. J. The organization and functions of research and evaluation in large urban school districts. Paper presented at the annual meeting of the American Educational Research Association, Washington, DC, March 1975.

8. Glass, G. V. Design of evaluation studies. Paper presented at the Council for Exceptional Children Special Conference on Early Childhood Education, New Orleans, Louisiana, 1969.


Bibliography

Aguaro, R. (1990). Dr. Deming: The American who taught the Japanese about quality. New York: Fireside.

Alkin, M. C. (1969). Evaluation theory development. Evaluation Comment, 2, 2-7.

Alkin, M. C. (1995, November). Lessons learned about evaluation use. Panel presentation at the International Evaluation Conference, American Evaluation Association, Vancouver, British Columbia.

Baker, E. L., O’Neil, H. R., & Linn, R. L. (1993). Policy and validity prospects for performance-based assessment. American Psychologist, 48, 1210-1218.

Bandura, A. (1977). Social learning theory. Englewood Cliffs, NJ: Prentice-Hall.

Bayless, D., & Massaro, G. (1992). Quality improvement in education today and the future: Adapting W. Edwards Deming’s quality improvement principles and methods to education. Kalamazoo, MI: Center for Research on Educational Accountability and Teacher Evaluation.

Becker, M. H. (Ed.). (1974). The health belief model and personal health behavior [Entire issue]. Health Education Monographs, 2, 324-473.

Bhola, H. S. (1998). Program evaluation for program renewal: A study of the national literacy program in Namibia (NLPN). Studies in Educational Evaluation, 24(4), 303-330.

Bickman, L. (1990). Using program theory to describe and measure program quality. In L. Bickman (Ed.), Advances in program theory. New Directions in Program Evaluation. San Francisco: Jossey-Bass.

Bloom, B. S., Englehart, M. D., Furst, E. J., Hill, W. H., & Krathwohl, D. R. (1956). Taxonomy of educational objectives: Handbook I: Cognitive domain. New York: David McKay.

Boruch, R. F. (1994). The future of controlled randomized experiments: A briefing. Evaluation Practice, 15(3), 265-274.

Bryk, A. S. (Ed.). (1983). Stakeholder-based evaluation. San Francisco: Jossey-Bass.

Campbell, D. T. (1975). Degrees of freedom and the case study. Comparative Political Studies, 8, 178-193.

Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research on teaching. In N. L. Gage (Ed.), Handbook of research on teaching. Chicago: Rand McNally.

Campbell, D. T., & Stanley, J. C. (1966). Experimental and quasi-experimental designs for research. Boston, MA: Houghton Mifflin.

Chen, H. (1990). Theory-driven evaluations. Newbury Park, CA: Sage.

Coffey, A., & Atkinson, P. (1996). Making sense of qualitative data: Complementary research strategies. Thousand Oaks, CA: Sage.

Cook, D. L. (1966). Program evaluation and review techniques, applications in education. U.S. Office of Education Cooperative Monograph, 17 (OE-12024).

Cronbach, L. J. (1963). Course improvement through evaluation. Teachers College Record, 64, 672-83.


Cronbach, L. J. (1982). Designing evaluations of educational and social programs. San Francisco: Jossey-Bass.

Cronbach, L. J., & Associates. (1980). Toward reform of program evaluation. San Francisco: Jossey-Bass.

Cronbach, L. J., & Snow, R. E. (1969). Individual differences in learning ability as a function of instructional variables. Stanford, CA: Stanford University Press.

Davis, H. R., & Salasin, S. E. (1975). The utilization of evaluation. In E. L. Struening & M. Guttentag (Eds.), Handbook of evaluation research, Vol. 1. Beverly Hills, CA: Sage.

Debus, M. (1995). Methodological review: A handbook for excellence in focus group research. Washington, DC: Academy for Educational Development.

Deming, W. E. (1986). Out of the crisis. Cambridge, MA: Center for Advanced Engineering Study, Massachusetts Institute of Technology.

Denny, T. (1978, November). Story telling and educational understanding. Occasional Paper No. 12. Kalamazoo, MI: Evaluation Center, Western Michigan University.

Denzin, N. K., & Lincoln, Y. S. (Eds.). (1994). Handbook of qualitative research. Thousand Oaks, CA: Sage.

Ebel, R. L. (1965). Measuring educational achievement. Englewood Cliffs, NJ: Prentice-Hall.

Eisner, E. W. (1975, March). The perceptive eye: Toward a reformation of educational evaluation. Invited address, Division B, Curriculum and Objectives, American Educational Research Association, Washington, DC.

Eisner, E. W. (1983). Educational connoisseurship and criticism: Their form and functions in educational evaluation. In G. F. Madaus, M. Scriven, & D. L. Stufflebeam (Eds.), Evaluation models. Boston: Kluwer-Nijhoff.

Ferguson, R. (1999, June). Ideological marketing. The Education Industry Report.

Fetterman, D. (1989). Ethnography: Step by step. Applied Social Research Methods Series, 17. Newbury Park, CA: Sage.

Fetterman, D. (1994, February). Empowerment evaluation. Evaluation Practice, 15(1).

Fetterman, D., Shakeh, J. K., & Wandersman (Eds.). (1996). Empowerment evaluation: Knowledge and tools for self-assessment & accountability. Thousand Oaks, CA: Sage.

Fisher, R. A. (1951). The design of experiments (6th ed.). New York: Hafner.

Flanagan, J. C. (1939). General considerations in the selection of test items and a short method of estimating the product-moment coefficient from data at the tails of the distribution. Journal of Educational Psychology, 30, 674-80.

Flexner, A. (1910). Medical education in the United States and Canada. Bethesda, MD: Science and Health Publications.

Flinders, D. J., & Eisner, E. W. (1994, December). Educational criticism as a form of qualitative inquiry. Research in the Teaching of English, 28(4), 341-356.


Glaser, B. G., & Strauss, A. L. (1967). The discovery of grounded theory. Chicago: Aldine.

Glass, G. V. (1975). A paradox about excellence of schools and the people in them. Educational Researcher, 4, 9-13.

Glass, G. V., & Maguire, T. O. (1968). Analysis of time-series quasi-experiments. (U.S. Office of Education Report No. 6-8329.) Boulder: Laboratory of Educational Research, University of Colorado.

Green, L. W., & Kreuter, M. W. (1991). Health promotion planning: An educational and environmental approach (2nd ed., pp. 22-30). Mountain View, CA: Mayfield Publishing.

Greenbaum, T. L. (1993). The handbook of focus group research. New York: Lexington Books.

Guba, E. G. (1969). The failure of educational evaluation. Educational Technology, 9, 29-38.

Guba, E. G. (1978). Toward a methodology of naturalistic inquiry in evaluation. CSE Monograph Series in Evaluation. Los Angeles: Center for the Study of Evaluation.

Guba, E. G., & Lincoln, Y. S. (1981). Effective evaluation. San Francisco: Jossey-Bass.

Guba, E. G., & Lincoln, Y. S. (1989). Fourth generation evaluation. Newbury Park, CA: Sage.

Hart, D. (1994). Authentic assessment: A handbook for educators. Menlo Park, CA: Addison-Wesley.

Hambleton, R. K., & Swaminathan, H. (1985). Item response theory. Boston: Kluwer-Nijhoff.

Hammond, R. L. (1972). Evaluation at the local level (mimeograph). Tucson, AZ: EPIC Evaluation Center.

Herman, J. L., Gearhart, M. G., & Baker, E. L. (1993). Assessing writing portfolios: Issues in the validity and meaning of scores. Educational Assessment, 1, 201-224.

House, E. R. (1980). Evaluating with validity. Beverly Hills, CA: Sage.

House, E. R. (1983). Assumptions underlying evaluation models. In G. F. Madaus, M. Scriven, & D. L. Stufflebeam (Eds.), Evaluation models. Boston: Kluwer-Nijhoff.

House, E. R. (1993). Professional evaluation: Social impact and political consequences. Newbury Park, CA: Sage.

House, E. R., & Howe, K. R. (1998). Deliberative democratic evaluation in practice. Boulder: University of Colorado.

Janz, N. K., & Becker, M. H. (1984). The health belief model: A decade later. Health Education Quarterly, 11, 1-47.

Joint Committee on Standards for Educational Evaluation. (1981). Standards for evaluations of educational programs, projects, and materials. New York: McGraw-Hill.

Joint Committee on Standards for Educational Evaluation. (1994). The program evaluation standards: How to assess evaluations of educational programs. Thousand Oaks, CA: Sage.


Kaplan, A. (1964). The conduct of inquiry. San Francisco: Chandler.

Karlsson, O. (1998). Socratic dialogue in the Swedish political context. In T. A. Schwandt (Ed.), Scandinavian perspectives on the evaluator’s role in informing social policy. New Directions for Evaluation, 77, 21-38.

Kaufman, R. A. (1969, May). Toward educational system planning: Alice in educationland. Audiovisual Instructor, 14, 47-48.

Kee, J. E. (1995). Benefit-cost analysis in program evaluation. In J. S. Wholey, H. P. Hatry, & K. E. Newcomer (Eds.), Handbook of practical program evaluation (pp. 456-488). San Francisco: Jossey-Bass.

Kentucky Department of Education. (1993). Kentucky results information system, 1991-92 technical report. Frankfort, KY: Author.

Kidder, L., & Fine, M. (1987). Qualitative and quantitative methods: When stories converge. In Multiple methods in program evaluation. New Directions for Program Evaluation, 35. San Francisco: Jossey-Bass.

Kirst, M. W. (1990, July). Accountability: Implications for state and local policymakers. In Policy Perspectives Series. Washington, DC: Information Services, Office of Educational Research and Improvement, U.S. Department of Education.

Koretz, D. (1986). The validity of gains in scores on the Kentucky Instructional Results Information System (KIRIS). Santa Monica, CA: Rand Education.

Koretz, D. (1996). Using student assessments for educational accountability. In R. Hanushek (Ed.), Improving the performance of America’s schools (pp. 171-196). Washington, DC: National Academy Press.

Koretz, D. M., & Barron, S. I. (1998). The validity of gains in scores on the Kentucky Instructional Results Information System (KIRIS). Santa Monica, CA: Rand Education.

Kvale, S. (1995). The social construction of validity. Qualitative Inquiry, 1, 19-40.

Lessinger, L. M. (1970). Every kid a winner: Accountability in education. New York: Simon and Schuster.

Levin, H. M. (1983). Cost-effectiveness: A primer. New Perspectives in Evaluation, 4. Newbury Park, CA: Sage.

Levine, M. (1974, September). Scientific method and the adversary model. American Psychologist, 666-677.

Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry. Beverly Hills, CA: Sage.

Lindquist, E. F. (Ed.). (1951). Educational measurement. Washington, DC: American Council on Education.

Lindquist, E. F. (1953). Design and analysis of experiments in psychology and education. Boston: Houghton-Mifflin.

Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher, 20(8), 15-21.


Lofland, J., & Lofland, L. H. (1995). Analyzing social settings: A guide to qualitative observation and analysis (3rd ed.). Belmont, CA: Wadsworth.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

MacDonald, B. (1975). Evaluation and the control of education. In D. Tawney (Ed.), Evaluation: The state of the art. London: Schools Council.

McLean, R. A., Sanders, W. L., & Stroup, W. W. (1991). A unified approach to mixed linear models. The American Statistician, 45, 54-64.

Madaus, G. F., & Stufflebeam, D. L. (1988). Educational evaluation: The classical writings of Ralph W. Tyler. Boston: Kluwer.

Mehrens, W. A. (1972). Using performance assessment for accountability purposes. Educational Measurement: Issues and Practice, 11(1), 3-10.

Merton, R. K., Fiske, M., & Kendall, P. L. (1990). The focused interview: A manual of problems and procedures (2nd ed.). New York: The Free Press.

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(3), 13-23.

Metfessel, N. S., & Michael, W. B. (1967). A paradigm involving multiple criterion measures for the evaluation of the effectiveness of school programs. Educational and Psychological Measurement, 27, 931-43.

Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook. Thousand Oaks, CA: Sage.

Miron, G. (1998). Chapter in Lene Buchert (Ed.), Education reform in the south in the 1990s. Paris: UNESCO.

Mullen, P. D., Hersey, J., & Iverson, D. C. (1987). Health behavior models compared. Social Science and Medicine, 24, 973-981.

National Science Foundation. (1993). User-friendly handbook for project evaluation: Science, mathematics, engineering and technology education. NSF 93-152. Arlington, VA: Author.

National Science Foundation. (1997). User-friendly handbook for mixed method evaluations. NSF 97-153. Arlington, VA: Author.

Nave, B., Misch, E. J., & Mosteller (In press). A rare design: The role of field trials in evaluating school practices. In G. Madaus, D. L. Stufflebeam, & T. Kellaghan (Eds.), Evaluation models. Boston: Kluwer Academic Publishers.

Nevo, D. (1993). The evaluation minded school: An application of perceptions from program evaluation. Evaluation Practice, 14(1), 39-47.

Owens, T. (1973). Educational evaluation by adversary proceeding. In E. House (Ed.), School evaluation: The politics and process. Berkeley, CA: McCutchan.

Parlett, M., & Hamilton, D. (1972). Evaluation as illumination: A new approach to the study of innovatory programs. Edinburgh: Centre for Research in the Educational Sciences, University of Edinburgh, Occasional Paper No. 9.

Patton, M. Q. (1980). Qualitative evaluation methods. Beverly Hills, CA: Sage.

Patton, M. Q. (1982). Practical evaluation. Beverly Hills, CA: Sage.


Patton, M. Q. (1990). Qualitative evaluation and research methods (2nd ed.). Newbury Park, CA: Sage.

Patton, M. Q. (1994). Developmental evaluation. Evaluation Practice, 15(3), 311-319.

Patton, M. Q. (1997). Utilization-focused evaluation: The new century text (3rd ed.). Newbury Park, CA: Sage.

Peters, T. J., & Waterman, R. H. (1982). In search of excellence. New York: Warner Books.

Platt, J. (1992). Case study in American methodological thought. Current Sociology, 40(1), 17-48.

Popham, W. J. (1969). Objectives and instruction. In R. Stake (Ed.), Instructional objectives. AERA Monograph Series on Curriculum Evaluation (Vol. 3). Chicago: Rand McNally.

Popham, W. J., & Carlson, D. (1983). Deep dark deficits of the adversary evaluation model. In G. F. Madaus, M. Scriven, & D. L. Stufflebeam (Eds.), Evaluation models. Boston: Kluwer-Nijhoff.

Prochaska, J. O., & DiClemente, C. C. (1992). Stages of change in the modification of problem behaviors. In M. Hersen, R. M. Eisler, & P. M. Miller (Eds.), Progress in behavior modification, 28. Sycamore, IL: Sycamore Publishing Company.

Provus, M. N. (1969). Discrepancy evaluation model. Pittsburgh: Pittsburgh Public Schools.

Provus, M. N. (1971). Discrepancy evaluation. Berkeley, CA: McCutcheon.

Rippey, R. M. (Ed.). (1973). Studies in transactional evaluation. Berkeley, CA: McCutcheon.

Rogers, P. R. (In press). Program theory: Not whether programs work but how they work. In G. Madaus, D. L. Stufflebeam, & T. Kellaghan (Eds.), Evaluation models. Boston: Kluwer Academic Publishers.

Rossi, P. H., & Freeman, H. E. (1993). Evaluation: A systematic approach (5th ed.). Newbury Park, CA: Sage.

Sanders, W. L. (1989). Using customized standardized tests. (Contract No. R-88-062003). Washington, DC: Office of Educational Research and Improvement, U.S. Department of Education. (ERIC Digest No. ED 314429)

Sanders, W. L., & Horn, S. P. (1994). The Tennessee value-added assessment system (TVAAS): Mixed model methodology in educational assessment. Journal of Personnel Evaluation in Education, 8(3), 299-311.

Schatzman, L., & Strauss, A. L. (1973). Field research. Englewood Cliffs, NJ: Prentice-Hall.

Schwandt, T. A. (1984). An examination of alternative models for socio-behavioral inquiry. Unpublished Ph.D. dissertation, Indiana University.

Scriven, M. S. (1967). The methodology of evaluation. In R. E. Stake (Ed.), Curriculum evaluation. AERA Monograph Series on Curriculum Evaluation (Vol. 1). Chicago: Rand McNally.

Scriven, M. (1974). Evaluation perspectives and procedures. In W. J. Popham (Ed.), Evaluation in education: Current applications. Berkeley, CA: McCutcheon.

Page 87: 7 Evaluationl Models

82 Stufflebeam

Scriven, M. (1991). Evaluation thesaurus.Newbury Park, CA: Sage.

Scriven, M. (1993, Summer). Hard-wonlessons in program evaluation. NewDirections. San Francisco: Jossey-Bass.

Scriven, M. (1994a). Evaluation as adiscipline. Studies in Educational Evaluation,20(1), 147-166.

Scriven, M. (1994b). The final synthesis.Evaluation Practice, 15(3), 367-382.

Scriven, M. (1994c). Product evaluation:The state of the art. Evaluation Practice,15(1), 45-62.

Seidman, I. E. (1991). Interviewing asqualitative research: A guide for researchersin education and social sciences. New York:Teachers College Press.

Shadish, W. R., Cook, T. D., & Leviton,L. C. (1991). Foundations of programevaluation. Newbury Park, CA: Sage.

Smith, M. F. (1989). Evaluabilityassessment: a practical approach. Boston:Kluwer Academic Publishers.

Smith, N. L. (1987). Toward thejustification of claims in evaluation research.Evaluation and program planning, 10(4),309-314.

Smith, L. M., & Pohland, P. A. (1974).Educational technology and the ruralhighlands. In L. M. Smith (Ed.), Fourexamples: Economic, anthropological,narrative, and portrayal (AERA Monographon Curriculum Evaluation). Chicago: RandMcNally.

Stake, R. E. (1967). The countenance ofeducational evaluation. Teachers CollegeRecord, 68, 523-540.

Stake, R. E. (1970). Objectives, priorities,and other judgment data. Review ofEducational Research, 40, 181-212.

Stake, R. E. (1971). Measuring whatlearners learn. (mimeograph). Urbana, IL:Center for Instructional Research andCurriculum Evaluation.

Stake, R. E. (1975a). Evaluating the artsin education: A responsive approach.Columbus, OH: Merrill.

Stake, R. E. (1975b, November).Program evaluation: Particularly responsiveevaluation. Kalamazoo: Western MichiganUniversity Evaluation Center, OccasionalPaper No. 5.

Stake, R. E. (1976). A theoreticalstatement of responsive evaluation. Studies inEducational Evaluation, 2, 19-22.

Stake, R. E. (1978). The case-studymethod in social inquiry. EducationResearcher, 7, 5-8.

Stake, R. E. (1979). Should educationalevaluation be more objective or moresubjective? Educational Evaluation andPolicy Analysis.

Stake, R. E. (1983). Program evaluation,particularly responsive evaluation. In G. F.Madaus, M. Scriven, & D. L. Stufflebeam(Eds.), Evaluation models, pp. 287-310.Boston: Kluwer-Nijhoff.

Stake, R. E. (1988). Seeking sweet water.In R. M. Jaeger (Ed.), Complementarymethods for research in education, pp. 253-300. Washington, DC: American EducationalResearch Association.

Page 88: 7 Evaluationl Models

Best Approaches for 21 Century Evaluation 83st

Stake, R. E. (1994). Case studies. In N. K.Denzin & Y. S. Lincoln (Eds.), Handbook ofqualitative research, pp. 236-247. ThousandOaks, CA: Sage.

Stake, R. E. (1995). The art of case studyresearch. Thousand Oaks, CA: Sage.

Stake, R. E., & Easley, J. A., Jr. (Eds.) (1978).Case studies in science education,1(2). NSF Project 5E-78-74. Urbana, IL:CIRCE, University of Illinois College ofEducation.

Stake, R. E., & Gjerde, C. (1971). Anevaluation of TCITY: The Twin City Institutefor Talented Youth. Kalamazoo, MI: WesternMichigan University Evaluation Center,Occasional Paper Series No. 1.

Steinmetz, A. (1983). The discrepancyevaluation model. In G. F. Madaus, M.Scriven, & D. L. Stufflebeam (Eds.),Evaluation models, pp. 79-100. Boston:Kluwer-Nijhoff.

Stillman, P. L., Haley, H. A., Regan, M.B., Philbin, M. M., Smith, S. R., O’Donnell,J., & Pohl, H. (1991). Positive effects of aclinical performance assessment program.Academic Medicine, 66, 481-483.

Stufflebeam, D. L. (1966, June). A depthstudy of the evaluation requirement. TheoryInto Practice, 5, 121-34.

Stufflebeam, D. L. (1967, June). The useof and abuse of evaluation in Title III. TheoryInto Practice, 6, 126-33.

Stufflebeam, D. L. (1997). A standards-based perspective on evaluation. In R. L.Stake, Advances in program evaluation, 3,pp. 61-88.

Stufflebeam, D. L., Foley, W. J., Gephart,W. J., Guba, E. G., Hammond, R. L.,Merriman, H. O., & Provus, M. M. (1971).Educational evaluation and decision making.Itasca, IL: Peacock.

Stufflebeam, D. L., & Shinkfield, A. J.(1985). Systematic evaluation. Boston:Kluwer-Nijhoff.

Suchman, E. A. (1967). Evaluativeresearch . New York: Russell SageFoundation.

Swanson, D. B., Norman, R. N., & Linn,R. L. (1995 June/July). Performance-basedassessment: Lessons from the healthprofessions. Educational Researcher, 24(5),5-11.

Tennessee Board of Education. (1992).The master plan for Tennessee schools 1993.Nashville: Author.

Thorndike, R. L. (1971). Educationalmeasurement (2 ed.). Washington, DC:nd

American Council on Education.

Torrance, H. (1993). Combiningmeasurement -driven ins truc t ion wi thau thent ic assessment : Some in i t i a lobservations of national assessment inEngland and Wales. Educational Evaluationand Policy Analysis, 15, 81-90.

Tsang, M. C. (1997, Winter). Costa n a l ys i s f o r i mp r oved educa t i o n a l

Page 89: 7 Evaluationl Models

84 Stufflebeam

policymaking and evaluation. EducationalEvaluation and Policy Analysis, 19(4), 318-324.

Tyler, R. W., et al. (1932). Service studiesin higher education. Columbus, OH: TheBureau of Educational Research, The OhioState University.

Tyler, R. W. (1942). General statement onevaluation. Journal of Educational Research,35, 492-501.

Tyler, R. W. (1950). Basic principles ofcurriculum and instruction. Chicago:University of Chicago Press.

Tyler, R. W. (1966). The objectives andplans for a national assessment of educationalp r o gr e s s . J o u r n a l o f E d u c a t i o n a lMeasurement, 3, 1-10.

Tymms, P. (1995). Setting up a national“value-added” system for primary educationin England: Problems and possibilities. Paperpresented at the National Evaluation Institute,Kalamazoo, MI.

Vallance, E. (1973). Aesthetic criticism andc u r r i c u l u m d e s c r i p t i o n . P h . D .dissertation, Stanford University.

Webster, W. J. (1995). The connectionbetween personnel evaluation and schoolevaluation. Studies in EducationalEvaluation, 21, 227-254.

Webster, W. J., Mendro, R. L., &Almaguer, T. O. (1994). Effectivenessindices: a “value-added” approach tomeasuring school effect. Studies inEducational Evaluation, 20, 113-145.

Weiss, C. H. (1972). Evaluation.Englewood Cliffs, NJ: Prentice Hall.

Weiss, C. H. (1995). Nothing as practicalas good theory: Exploring theory-basedevaluation for comprehensive communityinitiatives for children and families. In J.Connell, A. Kubisch, L. B. Schorr, & C. H.Weiss (Eds.), New approaches to evaluatingcommunity initiatives. New York: AspenInstitute.

Weitzman, E. A., & Miles, M. B. (1995).A software sourcebook: Computer programsfor qualitative data analysis. Thousand Oaks,CA: Sage.

Wholey, J. S. (1995). Assessing thefeasibility and likely usefulness of evaluation.In J. S. Wholey, H. P. Hatry, & K. E.Newcomer. (1995). Handbook of practicalprogram evaluation, pp. 15-39. SanFrancisco: Jossey-Bass.

Wiggins, G. (1989). A true test: Towardmore authentic and equitable assessment. PhiDelta Kappan, 70, 703-713.

Wiley, D. E., & Bock, R. D. (1967,Winter). Quasi-experimentation in educationalsettings: Comment. The School Review, 353-66.

Wolcott, H. F. (1994). Transformingqualitative data: Description, analysis andinterpretation. Thousand Oaks, CA: Sage.

Wolf, R. L. (1975, November). Trial byjury: A new evaluation method. Phi DeltaKappan, 3(57), 185-87.

Worthen, B. R., & Sanders, J. R. (1987).Educat ional eva lua t ion: Al terna t iveapproaches and practical guidelines. WhitePlains, NY: Longman.

Worthen, B. R., Sanders, J. R., &Fitzpatrick, J. L. (1997). Program evaluation,2 ed. New York: Longman.nd

Page 90: 7 Evaluationl Models

Best Approaches for 21 Century Evaluation 85st

Yin, R. K. (1989). Case study research:Design and method. Newbury Park, CA:Sage.

Yin, R. K. (1992). The case study as atool for doing evaluation. Current Sociology,40(1), 121-137.

Page 91: 7 Evaluationl Models
Page 92: 7 Evaluationl Models

APPENDIX

Checklist for Rating Evaluation Approaches in Relationship to The Joint Committee Program Evaluation Standards


METAEVALUATION CHECKLIST: for Evaluating Evaluation Models against The Program Evaluation Standards

To meet the requirements for UTILITY, evaluations using the evaluation model should:

U1 Stakeholder Identification

Clearly identify the evaluation client

Engage leadership figures to identify other stakeholders

Consult potential stakeholders to identify their information needs

Use stakeholders to identify other stakeholders

With the client, rank stakeholders for relative importance

Arrange to involve stakeholders throughout the evaluation

Keep the evaluation open to serve newly identified stakeholders

Address stakeholders’ evaluation needs

Serve an appropriate range of individual stakeholders

Serve an appropriate range of stakeholder organizations

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor
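Each standard is rated by counting how many of its ten checkpoints the evaluation model provides for and mapping that count onto the five-level scale above; the same scale follows every standard in this checklist. A minimal sketch of that mapping in Python (the function name and the example count are illustrative, not part of the checklist):

```python
def rate_standard(checkpoints_met: int) -> str:
    """Map the number of checkpoints met (0-10) onto the checklist's rating scale."""
    if not 0 <= checkpoints_met <= 10:
        raise ValueError("Each standard lists exactly 10 checkpoints.")
    if checkpoints_met >= 9:
        return "Excellent"
    if checkpoints_met >= 7:
        return "Very Good"
    if checkpoints_met >= 5:
        return "Good"
    if checkpoints_met >= 3:
        return "Fair"
    return "Poor"

# Hypothetical example: a model judged to satisfy 8 of U1's checkpoints rates "Very Good".
print(rate_standard(8))
```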

U2 Evaluator Credibility

Engage competent evaluators

Engage evaluators whom the stakeholders trust

Engage evaluators who can address stakeholders’ concerns

Engage evaluators who are appropriately responsive to issues of gender, socioeconomic status, race, & language & cultural differences

Assure that the evaluation plan responds to key stakeholders’ concerns

Help stakeholders understand the evaluation plan

Give stakeholders information on the evaluation plan’s technical quality and practicality

Attend appropriately to stakeholders’ criticisms & suggestions

Stay abreast of social & political forces

Keep interested parties informed about the evaluation’s progress

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor

U3 Information Scope and Selection

Understand the client’s most important evaluation requirements

Interview stakeholders to determine their different perspectives

Assure that evaluator & client negotiate pertinent audiences, questions, & required information

Assign priority to the most important stakeholders

Assign priority to the most important questions

Allow flexibility for adding questions during the evaluation

Obtain sufficient information to address the stakeholders’ most important evaluation questions

Obtain sufficient information to assess the program’s merit

Obtain sufficient information to assess the program’s worth

Allocate the evaluation effort in accordance with the priorities assigned to the needed information

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor

U4 Values Identification

Consider alternative sources of values for interpreting evaluation findings

Provide a clear, defensible basis for value judgments

Determine the appropriate party(s) to make the valuational interpretations

Identify pertinent societal needs

Identify pertinent customer needs

Reference pertinent laws

Reference, as appropriate, the relevant institutional mission

Reference the program’s goals

Take into account the stakeholders’ values

As appropriate, present alternative interpretations based on conflicting but credible value bases

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor


U5 Report Clarity

Clearly report the essential information

Issue brief, simple, & direct reports

Focus reports on contracted questions

Describe the program & its context

Describe the evaluation’s purposes, procedures, & findings

Support conclusions & recommendations

Avoid reporting technical jargon

Report in the language(s) of stakeholders

Provide an executive summary

Provide a technical report

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor

U6 Report Timeliness and Dissemination

Make timely interim reports to intended users

Deliver the final report when it is needed

Have timely exchanges with the program’s policy board

Have timely exchanges with the program’s staff

Have timely exchanges with the program’s customers

Have timely exchanges with the public media

Have timely exchanges with the full range of right-to-know audiences

Employ effective media for reaching & informing the different audiences

Keep the presentations appropriately brief

Use examples to help audiences relate the findings topractical situations

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor

U7 Evaluation Impact

Maintain contact with audiences

Involve stakeholders throughout the evaluation

Encourage and support stakeholders’ use of the findings

Show stakeholders how they might use the findings in their work

Forecast and address potential uses of findings

Provide interim reports

Make sure that reports are open, frank, & concrete

Supplement written reports with ongoing oral communication

Conduct feedback workshops to go over & apply findings

Make arrangements to provide follow-up assistance in interpreting & applying the findings

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor

Scoring the Evaluation Model for UTILITY

Add the following:

No. of Excellent ratings (0-7) x 4 =

No. of Very Good (0-7) x 3 =

No. of Good (0-7) x 2 =

No. of Fair (0-7) x 1 =

Total score: =

Strength of the Model’s Provisions for UTILITY

26 (93%) to 28: Excellent

19 (68%) to 25: Very Good

14 (50%) to 18: Good

7 (25%) to 13: Fair

0 (0%) to 6: Poor

(Total score) ÷ 28 = _____ x 100 = _____
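The scoring arithmetic is the same for all four categories: weight each standard’s rating (Excellent = 4, Very Good = 3, Good = 2, Fair = 1, Poor adds nothing), sum the weights, and divide by four times the number of standards in the category (7 for Utility, 3 for Feasibility, 8 for Propriety, 12 for Accuracy) to obtain the percentage behind the strength bands. A brief sketch in Python with hypothetical ratings (the function and variable names are illustrative, not part of the checklist):

```python
from collections import Counter

# Points awarded per standard-level rating when scoring a category.
WEIGHTS = {"Excellent": 4, "Very Good": 3, "Good": 2, "Fair": 1, "Poor": 0}

def category_score(ratings):
    """Return (total score, percent of maximum) for one category's standard ratings."""
    counts = Counter(ratings)
    total = sum(WEIGHTS[label] * n for label, n in counts.items())
    maximum = 4 * len(ratings)          # 28 for Utility's seven standards
    return total, 100.0 * total / maximum

# Hypothetical ratings for U1-U7:
utility_ratings = ["Excellent", "Very Good", "Very Good", "Good", "Good", "Fair", "Poor"]
total, pct = category_score(utility_ratings)
print(total, round(pct))  # 15 out of 28, about 54 percent: the "Good" band (14 to 18)
```

The divisor of 28 in the worksheet line above becomes 12, 32, and 48 for the Feasibility, Propriety, and Accuracy worksheets that follow.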


To meet the requirements for FEASIBILITY, evaluations using the evaluation model should:

F1 Practical Procedures

Tailor methods & instruments to information requirements

Minimize disruption

Minimize the data burden

Appoint competent staff

Train staff

Choose procedures that the staff are qualified to carry out

Choose procedures in light of known constraints

Make a realistic schedule

Engage locals to help conduct the evaluation

As appropriate, make evaluation procedures a part of routine events

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor

F2 Political Viability

Anticipate different positions of different interest groups

Avert or counteract attempts to bias or misapply the findings

Foster cooperation

Involve stakeholders throughout the evaluation

Agree on editorial & dissemination authority

Issue interim reports

Report divergent views

Report to right-to-know audiences

Employ a firm public contract

Terminate any corrupted evaluation

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor

F3 Cost Effectiveness

Be efficient

Make use of in-kind services

Produce information worth the investment

Inform decisions

Foster program improvement

Provide accountability information

Generate new insights

Help spread effective practices

Minimize disruptions

Minimize time demands on program personnel

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor

Scoring the Evaluation Model for FEASIBILITY

Add the following:

No. of Excellent ratings (0-3) x 4 =

No. of Very Good (0-3) x 3 =

No. of Good (0-3) x 2 =

No. of Fair (0-3) x 1 =

Total score: =

Strength of the Model’s Provisions for FEASIBILITY

11 (93%) to 12: Excellent

8 (68%) to 10: Very Good

6 (50%) to 7: Good

3 (25%) to 5: Fair

0 (0%) to 2: Poor

(Total score) ÷ 12 = _____ x 100 = _____


To meet the requirements for PROPRIETY, evaluations using the evaluation model should:

P1 Service Orientation

Assess needs of the program’s customers

Assess program outcomes against targeted customers’ assessed needs

Help assure that the full range of rightful program beneficiaries are served

Promote excellent service

Make clear to stakeholders the evaluation’s service orientation

Identify program strengths to build on

Identify program weaknesses to correct

Give interim feedback for program improvement

Expose harmful practices

Inform all right-to-know audiences of the program’s positive & negative outcomes

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor

P2 Formal Agreements – Reach advance written agreements on:

Evaluation purpose & questions

Audiences

Evaluation reports

Editing

Release of reports

Evaluation procedures & schedule

Confidentiality/anonymity of data

Evaluation staff

Metaevaluation

Evaluation resources

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor

P3 Rights of Human Subjects

Make clear to stakeholders that the evaluation will respect & protect the rights of human subjects

Clarify intended uses of the evaluation

Keep stakeholders informed

Follow due process

Uphold civil rights

Understand participant values

Respect diversity

Follow protocol

Honor confidentiality/anonymity agreements

Do no harm

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor

P4 Human Interactions

Consistently relate to all stakeholders in a professional manner

Maintain effective communication with stakeholders

Follow the institution’s protocol

Minimize disruption

Honor participants’ privacy rights

Honor time commitments

Be alert to & address participants’ concerns about the evaluation

Be sensitive to participants’ diversity of values & cultural differences

Be even-handed in addressing different stakeholders

Do not ignore or help cover up any participant’s incompetence, unethical behavior, fraud, waste, or abuse

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor

P5 Complete and Fair Assessment

Assess & report the program’s strengths

Assess & report the program’s weaknesses

Report on intended outcomes

Report on unintended outcomes

Give a thorough account of the evaluation’s process

As appropriate, show how the program’s strengths could be used to overcome its weaknesses

Have the draft report reviewed

Appropriately address criticisms of the draft report

Acknowledge the final report’s limitations

Estimate & report the effects of the evaluation’s limitations on the overall judgment of the program

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor


P6 Disclosure of Findings

Define the right-to-know audiences

Establish a contractual basis for complying with right-to-know requirements

Inform the audiences of the evaluation’s purposes & projected reports

Report all findings in writing

Report relevant points of view of both supporters & critics of the program

Report balanced, informed conclusions & recommendations

Show the basis for the conclusions & recommendations

Disclose the evaluation’s limitations

In reporting, adhere strictly to a code of directness, openness, & completeness

Assure the reports reach their audiences

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor

P7 Conflict of Interest

Identify potential conflicts of interest early in the evaluation

Provide written, contractual safeguards against identified conflicts of interest

Engage multiple evaluators

Maintain evaluation records for independent review

As appropriate, engage independent parties to assess the evaluation for its susceptibility to or corruption by conflicts of interest

When appropriate, release evaluation procedures, data, & reports for public review

Contract with the funding authority rather than the funded program

Have internal evaluators report directly to the chief executive officer

Report equitably to all right-to-know audiences

Engage uniquely qualified persons to participate in the evaluation, even if they have a potential conflict of interest; but take steps to counteract the conflict

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor

P8 Fiscal Responsibility

Specify & budget for expense items in advance

Keep the budget sufficiently flexible to permit appropriate reallocations to strengthen the evaluation

Obtain appropriate approval for needed budgetary modifications

Assign responsibility for managing the evaluation finances

Maintain accurate records of sources of funding & expenditures

Maintain adequate personnel records concerning job allocations & time spent on the job

Employ comparison shopping for evaluation materials

Employ comparison contract bidding

Be frugal in expending evaluation resources

As appropriate, include an expenditure summary as part of the public evaluation report

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor

Scoring the Evaluation Model for PROPRIETY

Add the following:

No. of Excellent ratings (0-8) x 4 =

No. of Very Good (0-8) x 3 =

No. of Good (0-8) x 2 =

No. of Fair (0-8) x 1 =

Total score: =

Strength of the Model’s Provisions for PROPRIETY

30 (93%) to 32: Excellent

22 (68%) to 29: Very Good

16 (50%) to 21: Good

8 (25%) to 15: Fair

0 (0%) to 7: Poor

(Total score) ÷ 32 = _____ x 100 = _____


To meet the requirements for ACCURACY, evaluations using the evaluation model should:

A1 Program Documentation

Collect descriptions of the intended program from various written sources

Collect descriptions of the intended program from the client & various stakeholders

Describe how the program was intended to function

Maintain records from various sources of how the program operated

As feasible, engage independent observers to describe the program’s actual operations

Describe how the program actually functioned

Analyze discrepancies between the various descriptions of how the program was intended to function

Analyze discrepancies between how the program was intended to operate & how it actually operated

Ask the client & various stakeholders to assess the accuracy of recorded descriptions of both the intended and the actual program

Produce a technical report that documents the program’s operations

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor

A2 Context Analysis

Use multiple sources of information to describe the program’s context

Describe the context’s technical, social, political, organizational, & economic features

Maintain a log of unusual circumstances

Record instances in which individuals or groups intentionally or otherwise interfered with the program

Record instances in which individuals or groups intentionally or otherwise gave special assistance to the program

Analyze how the program’s context is similar to or different from contexts where the program might be adopted

Report those contextual influences that appeared to significantly influence the program & that might be of interest to potential adopters

Estimate effects of context on program outcomes

Identify & describe any critical competitors to this program that functioned at the same time & in the program’s environment

Describe how people in the program’s general area perceived the program’s existence, importance, and quality

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor

A3 Described Purposes and Procedures

At the evaluation’s outset, record the client’s purposes for the evaluation

Monitor & describe stakeholders’ intended uses of evaluation findings

Monitor & describe how the evaluation’s purposes stay the same or change over time

Identify & assess points of agreement & disagreement among stakeholders regarding the evaluation’s purposes

As appropriate, update evaluation procedures to accommodate changes in the evaluation’s purposes

Record the actual evaluation procedures, as implemented

When interpreting findings, take into account the different stakeholders’ intended uses of the evaluation

When interpreting findings, take into account the extent to which the intended procedures were effectively executed

Describe the evaluation’s purposes and procedures in the summary & full-length evaluation reports

As feasible, engage independent evaluators to monitor & evaluate the evaluation’s purposes & procedures

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor

A4 Defensible Information Sources

Obtain information from a variety of sources

Use pertinent, previously collected information once validated

As appropriate, employ a variety of data collection methods

Document & report information sources

Document, justify, & report the criteria & methods used to select information sources

For each source, define the population

For each population, as appropriate, define any employed sample

Document, justify, & report the means used to obtain information from each source

Include data collection instruments in a technical appendix to the evaluation report

Document & report any biasing features in the obtained information

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor


A5 Valid Information

Focus the evaluation on key questions

As appropriate, employ multiple measures to address each question

Provide a detailed description of the constructs & behaviors about which information will be acquired

Assess & report what type of information each employed procedure acquires

Train & calibrate the data collectors

Document & report the data collection conditions & process

Document how information from each procedure was scored, analyzed, & interpreted

Report & justify inferences singly & in combination

Assess & report the comprehensiveness of the information provided by the procedures as a set in relation to the information needed to answer the set of evaluation questions

Establish meaningful categories of information by identifying regular & recurrent themes in information collected using qualitative assessment procedures

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor

A6 Reliable Information

Identify and justify the type(s) & extent of reliability claimed

For each employed data collection device, specify the unit of analysis

As feasible, choose measuring devices that in the past have shown acceptable levels of reliability for their intended uses

In reporting reliability of an instrument, assess & report the factors that influenced the reliability, including the characteristics of the examinees, the data collection conditions, & the evaluator’s biases

Check & report the consistency of scoring, categorization, & coding

Train & calibrate scorers & analysts to produce consistent results

Pilot test new instruments in order to identify and control sources of error

As appropriate, engage & check the consistency between multiple observers

Acknowledge reliability problems in the final report

Estimate & report the effects of unreliability in the data on the overall judgment of the program

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor

A7 Systematic Information

Establish protocols for quality control of the evaluation information

Train the evaluation staff to adhere to the data protocols

Systematically check the accuracy of scoring & coding

When feasible, use multiple evaluators & check the consistency of their work

Verify data entry

Proofread & verify data tables generated from computer output or other means

Systematize & control storage of the evaluation information

Define who will have access to the evaluation information

Strictly control access to the evaluation information according to established protocols

Have data providers verify the data they submitted

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor

A8 Analysis of Quantitative Information

Begin by conducting preliminary exploratory analyses to assure the data’s correctness & to gain a greater understanding of the data

Choose procedures appropriate for the evaluation questions and nature of the data

For each procedure specify how its key assumptions are being met

Report limitations of each analytic procedure, including failure to meet assumptions

Employ multiple analytic procedures to check on consistency & replicability of findings

Examine variability as well as central tendencies

Identify & examine outliers & verify their correctness

Identify & analyze statistical interactions

Assess statistical significance & practical significance

Use visual displays to clarify the presentation & interpretation of statistical results

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor


A9 Analysis of Qualitative Information

Focus on key questions

Define the boundaries of information to be used

Obtain information keyed to the important evaluation questions

Verify the accuracy of findings by obtaining confirmatory evidence from multiple sources, including stakeholders

Choose analytic procedures & methods of summarization that are appropriate to the evaluation questions & employed qualitative information

Derive a set of categories that is sufficient to document, illuminate, & respond to the evaluation questions

Test the derived categories for reliability & validity

Classify the obtained information into the validated analysis categories

Derive conclusions & recommendations & demonstrate their meaningfulness

Report limitations of the referenced information, analyses, & inferences

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor

A10 Justified Conclusions

Focus conclusions directly on the evaluation questions

Accurately reflect the evaluation procedures & findings

Limit conclusions to the applicable time periods, contexts, purposes, & activities

Cite the information that supports each conclusion

Identify & report the program’s side effects

Report plausible alternative explanations of the findings

Explain why rival explanations were rejected

Warn against making common misinterpretations

Obtain & address the results of a prerelease review of the draft evaluation report

Report the evaluation’s limitations

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor

A11 Impartial Reporting

Engage the client to determine steps to ensure fair, impartial reports

Establish appropriate editorial authority

Determine right-to-know audiences

Establish & follow appropriate plans for releasing findings to all right-to-know audiences

Safeguard reports from deliberate or inadvertent distortions

Report perspectives of all stakeholder groups

Report alternative plausible conclusions

Obtain outside audits of reports

Describe steps taken to control bias

Participate in public presentations of the findings to help guard against & correct distortions by other interested parties

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor

A12 Metaevaluation

Designate or define the standards to be used in judging the evaluation

Assign someone responsibility for documenting & assessing the evaluation process & products

Employ both formative & summative metaevaluation

Budget appropriately & sufficiently for conducting the metaevaluation

Record the full range of information needed to judge the evaluation against the stipulated standards

As feasible, contract for an independent metaevaluation

Determine & record which audiences will receive the metaevaluation report

Evaluate the instrumentation, data collection, data handling, coding, & analysis against the relevant standards

Evaluate the evaluation’s involvement of and communication of findings to stakeholders against the relevant standards

Maintain a record of all metaevaluation steps, information, & analyses

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2 Poor

Scoring the Evaluation Model for ACCURACY

Add the following:

No. of Excellent ratings (0-12) x 4 =

No. of Very Good (0-12) x 3 =

No. of Good (0-12) x 2 =

No. of Fair (0-12) x 1 =

Total score: =

Strength of the Model’s Provisions for ACCURACY

45 (93%) to 48: Excellent

33 (68%) to 44: Very Good

24 (50%) to 32: Good

12 (25%) to 23: Fair

0 (0%) to 11: Poor

(Total score) ÷ 48 = _____ x 100 = _____