7/28/2019 Williams_Operational Definitions and Assessment of Higher-Order Cognitive Constructs
http://slidepdf.com/reader/full/williamsoperational-definitions-and-assessment-of-higher-order-cognitive-constructs 1/17
Educational Psychology Review, Vol. 11, No. 4, 1999
Operational Definitions and Assessment of Higher-Order Cognitive Constructs
Robert L. Williams1
The educational psychology literature is replete with references to higher-order cognitive constructs, such as critical thinking and creativity. Presumably, these constructs represent the primary processes and outcomes that educators should promote in students. For these constructs to be maximally useful, they must be transformed into specific operational definitions that lead to reliable and valid assessment strategies. Minimizing overlap in the definitions and assessment of different concepts would contribute to an orderly accumulation of knowledge about the constructs in question. The ideal would be for each construct to have a definition that is distinct from the definitions of other cognitive constructs. Although higher-order cognitive constructs have much surface appeal, their utility is tied to the clarity and fidelity of their definitions and assessment procedures.
KEY WORDS: operational definitions; cognitive constructs; assessment; critical thinking; creativity.
INTRODUCTION
In the introduction to Historical Foundations of Educational Psychology, Glover and Ronning (1987) asserted that learning and cognition represent the primary areas remaining within pure educational psychology. Cognitive issues in educational psychology come in many forms, ranging from basic subject-matter skills to metacognition. Knowledge, understanding, reasoning, memory, information processing, and higher-order thinking are among the popular targets in the educational psychology literature.

1Psychoeducational Studies, The University of Tennessee, Knoxville, Tennessee 37996-3400. Fax: 423-974-0135. e-mail: [email protected]
1040-726X/99/1200-0411$16.00/0 © 1999 Plenum Publishing Corporation
The advancement of good thinking has long been viewed as a major priority of schooling. In the early part of this century, John Dewey (1916, p. 179) affirmed that "all which the schools can or need do for pupils . . . is to develop their ability to think." The contemporary literature is characterized by books and articles recommending the teaching of higher-order thinking at all educational levels (Facione, 1986). In fact, one of the leading authorities on critical thinking, Halpern (1998, p. 455), contends that what today's college students most need is "the ability to think clearly and the disposition to engage in the effortful process of thinking." What Halpern envisioned for college students is now being promoted even in the primary grades (Hamers et al., 1998).

Despite all the rhetoric celebrating higher-order thinking constructs, their status in educational psychology remains unclear because of problems related to definition and assessment. Such higher-order constructs as creativity and critical thinking have so much surface appeal that we embrace them even before clearly defining them. However, unless we can clearly and specifically articulate what these constructs mean, how can we systematically assess and promote them?
CONTRIBUTIONS FROM BEHAVIORAL RESEARCH
The area of psychology that has done the best in providing focused operational definitions for educational constructs is behavioral psychology. Research articles in this area describe how target variables are defined and measured, with assessment usually conducted in the context of regular classroom events. The articles also typically indicate the reliability of the assessment procedures. The definitional and assessment practices of applied behavior analysts allow other researchers to replicate and expand on their work with a high degree of precision.

Because of their emphasis on operational definitions and reliable assessment procedures, the mainstream behavioral journals in education were first examined in the preparation of this paper. The three major behavioral journals that publish research having classroom applications, Journal of Applied Behavior Analysis, Behavior Modification, and Education and Treatment of Children, were searched issue by issue, from their inception, for articles dealing with critical thinking, creativity, higher-order thinking, cognitive problem solving, decision making, and kindred variables. It was assumed that studies on higher-order cognitive constructs in predominantly behavioral journals might point the way to operational definitions and reliable assessment of such constructs.
The higher-order cognitive construct that has attracted the most attention in these journals is creativity. Seven articles dealing explicitly with in-class assessment and promotion of creative responses were identified (Baker and Winston, 1985; Campbell and Willis, 1978, 1979; Glover, 1979; Glover and Gary, 1976; Goetz, 1982; Maloney and Hopkins, 1973). Goetz's article is particularly illuminating in that it provides an overview of definitions for creative products. The primary theme embedded within the definitions reviewed by Goetz is novelty.

Typically, definitions of creative responses reflect some combination of the creativity factors first proposed by Guilford (1950) and later adapted by Torrance (1966): fluency, flexibility, originality, and elaboration. In research on creative responding in the classroom, Guilford's and Torrance's factors have been applied to samples of students' schoolwork (e.g., stories, ideas, drawings). For example, Baker and Winston (1985) assessed diversity in children's drawings and stories by computing the number of different actions, people, and types of objects subsumed in a particular drawing or story.

Despite the significant contributions of these behavioral studies, the publication dates of the articles reveal no creativity studies in the last 10 years in the identified journals. Although never a major emphasis in these journals, creative behavior seems to have largely disappeared as a research interest there. A possible explanation is that behavioral researchers have become disillusioned with the predictive utility of the dimensions stemming from Guilford's and Torrance's models of creativity. Baer (1991, 1993a, b) contends that creativity in adult work is often task-specific and not highly related to the Guilford and Torrance dimensions of divergent thinking.
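Definitions like these lend themselves to simple counting procedures. As an illustration, the diversity index described above (the number of different actions, people, and types of objects in a story) can be sketched as follows; the tagged-element input format is hypothetical, not Baker and Winston's actual coding materials:

```python
def diversity_score(elements):
    """elements: list of (category, item) pairs coded from a story or drawing.
    Returns the count of distinct items per category and their total,
    in the spirit of Baker and Winston's (1985) diversity measure."""
    distinct = {}
    for category, item in elements:
        # Duplicate mentions of the same item do not add to diversity.
        distinct.setdefault(category, set()).add(item.lower())
    per_category = {c: len(items) for c, items in distinct.items()}
    return per_category, sum(per_category.values())

# A hypothetical coded story: repeated elements count only once.
story = [
    ("action", "running"), ("action", "jumping"), ("action", "running"),
    ("person", "pirate"), ("person", "teacher"),
    ("object", "ship"), ("object", "treasure"), ("object", "ship"),
]
per_category, total = diversity_score(story)
# per_category counts 2 actions, 2 people, 2 object types; total is 6.
```

The point of such a scheme is that two coders given the same category list should arrive at nearly identical totals, which is what made the Baker and Winston measure so reliable.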
The three behavioral journals surveyed have apparently published no research on higher-order thinking constructs other than creativity. The kindred construct of equivalence relationships—the perception of equality inherent in Piaget's conservation tasks—has been examined behaviorally in a few other journals (primarily Psychological Record and Journal of the Experimental Analysis of Behavior) but not in the specific journals identified earlier. Because equivalence relationships are typically studied under quasi-laboratory conditions, this research (e.g., Goggin et al., 1978) makes minimal contribution to our understanding of higher-order thinking in the classroom. Thus, it appears that the primary behavioral journals have provided limited help in articulating operational definitions and reliable assessment procedures for most higher-order cognitive constructs.
DEFINITIONS OF COGNITIVE CONSTRUCTS
By their very nature, cognitive constructs refer to processes that occur at a covert, mentalistic level. The covert nature of cognition presents a major impediment to the scientific study of this domain. Unless constructs can be defined in terms of observable phenomena, they cannot be reliably assessed and scientifically studied (Sager, 1976). What would be ideal in the scientific study of higher-order cognition is first to develop solid operational definitions and then to work toward standardization of these definitions.
Operational Definitions
Thinking is ultimately a private matter; no one else may be aware of our thoughts. That is acceptable so long as we are concerned only about our own thinking, but dealing with student cognition compels us to know about thinking to which we have no introspective window. Consequently, we must have observable evidence of student engagement in particular kinds of thinking. That need brings us to the notion of an operational definition, which requires that student cognition be translated into overt actions or products—often in the form of language-based behavior. As Halpern (1998, p. 454) has asserted with respect to metacognitive monitoring, all higher-order cognitive skills "need to be made explicit and public so that they can be examined and feedback can be given about how well they are functioning."
An operational definition is not intended to be the ultimate statement of what a construct represents. Instead, its purpose is to increase the precision of assessment and communication related to that construct. If a researcher wants to test the replicability of another's findings, using that person's precise operational definition of the construct would be important. However, in many cases, researchers want to improve on one another's operational definitions of a construct. Operational definitions can be refined in at least two ways: (a) some aspect of the definition can be made more explicit, and (b) elements can be added to or deleted from the definition.
Making the Definition More Explicit. To understand better the first type of refinement, consider the way that creativity is often defined. Most operational definitions include some reference to unusual, novel, unique, or original responses (e.g., Campbell and Willis, 1978; Ford and Harris, 1992; Glover, 1979; Goetz, 1982). Thus, a typical researcher in this area might define creative thinking as an unusual response to a given stimulus situation (Maloney and Hopkins, 1973). Fearing that this definition is not sufficiently precise, another researcher might define creative thinking as a response that is different from a given student's previous responses to a designated task. This is one of the most common ways that creative responding has been operationalized in the behavioral literature (Goetz). But even this adaptation is not without problems. It leaves open the possibility that a response judged to be creative for one student might be judged as commonplace for another.

One way to avoid individualistic definitions of creative responding is to identify a reference group whose collective responses will provide the framework for judging the creativity of each individual's responses. For example, Campbell and Willis (1979) used the responses of other students (n = 26) in one fifth-grade class as their point of comparison for assessing each student's originality. The student task was to write a short essay on a "just-suppose topic" each day. These essays were scored on the basis of Torrance's (1966) four dimensions of creative responding. The emphasis here is the researchers' description of originality as "the improbability of a response's occurring within the responses of fifth graders" (p. 8). The researchers and the teacher made the judgment as to what they thought was original within that group. A limitation of this arrangement is that the criterion for originality is narrowly based on the responses of the specific group studied.
Because the Campbell and Willis approach makes a heavy demand on teacher judgment and leaves open the possibility that judgments of originality would vary from group to group, other researchers may try to establish a more precise, yet broader, frame of reference for defining creative responding. These researchers may specify that the response be identified as statistically unlikely, based on prior tabulation of responses given to the stimulus condition. This approach is often reflected in comparisons of a student's responses to scoring norms in a particular testing manual, such as The Torrance Tests of Creative Thinking (Torrance, 1974a). For example, responses receiving the highest originality rating on the Torrance test are those given by less than 2% of the respondents in the standardization sample. Using the scoring norms as the point of comparison removes some of the subjectivity in judging what is creative. The findings may also be applicable to a much larger reference group because of the size of the sample used in norming the instrument. This kind of operational definition has occasionally been used in behavioral studies of creativity (e.g., Campbell and Willis, 1978; Glover et al., 1980).
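A statistical-unlikelihood definition of originality is straightforward to operationalize. The sketch below is purely illustrative: it assumes responses to a stimulus have already been tabulated for a reference sample, and it flags any response given by less than 2% of that sample, paralleling the cutoff the Torrance norms use for the highest originality rating. The "unusual uses" responses are invented for the example:

```python
from collections import Counter

def originality_flags(responses, reference_sample, threshold=0.02):
    """Flag each response as original if its relative frequency in the
    reference sample falls below the threshold. A response absent from
    the sample is treated as maximally original (frequency zero)."""
    counts = Counter(reference_sample)  # missing keys count as 0
    n = len(reference_sample)
    return {r: (counts[r] / n) < threshold for r in responses}

# Hypothetical tabulated responses to an "unusual uses for a brick" item.
reference = (["build a wall"] * 60 + ["doorstop"] * 35
             + ["paperweight"] * 4 + ["bed warmer"] * 1)
flags = originality_flags(["doorstop", "bed warmer", "grind into pigment"],
                          reference)
# "doorstop" is common (35%); "bed warmer" (1%) and the unseen
# "grind into pigment" (0%) both fall under the 2% cutoff.
```

As the text notes, the appeal of this kind of definition is that the judgment of originality rests on the tabulated sample rather than on any individual rater.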
Adding or Deleting Elements in a Definition. Another way that an operational definition can be refined is by adding or deleting dimensions of the definition. Ordinarily, one would want an operational definition to reflect all the elements subsumed in the underlying conceptual definition of a particular construct. Consider, for example, a definition of critical thinking generated by 46 reputed experts in philosophy and education. Using the Delphi Method, this interactive panel of experts defined critical thinking as "purposeful, self-regulatory judgment which results in interpretation, analysis, evaluation, and inference, as well as explanation of the evidential, conceptual, methodological, criteriological, or contextual considerations upon which that judgment is based" (Facione, 1990, p. 3).

Although this definition may capture several important elements of critical thinking, it would be a daunting task to develop an assessment scheme that clearly matches this definition. For this definition to be useful, "purposeful, self-regulatory judgment" must first be operationalized. How do students behave when engaging in this kind of judgment? Once this description has been provided, the next task is to operationally link each of the ancillary themes (such as interpretation, analysis, and evaluation) to the central theme.
Each element in an operational definition may present its own problems of interpretation. For example, appropriateness of response is often combined with originality of response in definitions of creative responding (Amabile, 1987; Ochse, 1990; Sternberg and Lubart, 1996). In using this definition, the researcher or practitioner may have more difficulty judging what is appropriate than judging what is original. Would "appropriate" require that the response be reflected in a problem solution? Or could "appropriate" simply require that a response lead to further inquiry?

For example, a group of middle schoolers was brainstorming on how to make school more pleasant. They offered some rather wild suggestions, at least as judged by conventional adult standards, but those ideas all seemed to build momentum toward more discussion. Piping in acid rock music, writing graffiti on the classroom walls, displaying nude paintings in the hallways, starting the school day midmorning, letting students choose what to study, permitting students to assign their own grades, replacing desks with mats for lounging on the floor, painting their faces differently each day, and giving awards for the most unusual student dress were among the early suggestions that appeared to arouse much enthusiasm. Eventually, a compromise was reached between students and faculty that permitted students to use headsets for listening to mutually agreeable music while doing individual assignments and to bring banners and paintings for possible display in their classrooms. None of the earlier suggestions was judged acceptable by the teachers, but those propositions did generate discussion that eventually led to some workable plans. So could those earlier suggestions be considered appropriate contributions to the idea-generation process?
In evaluating the elements of an operational definition, there is sometimes tension between an element's manageability (the ease with which it can be assessed) and its conceptual fidelity (the extent to which it reflects the essence of the concept represented). Many operational definitions of creativity, for example, include the notion of fluency (i.e., the number of responses to a common stimulus). Because one can easily count the number of ideas generated, fluency would appear to be the most manageable dimension of creativity.

Despite the manageability of fluency as a criterion of creative responding, other dimensions may come closer to capturing the essence of this construct. A large number of ideas for dealing with a problem might be of the same general type. Fluency makes no demand on the diversity or novelty of ideas except that they not be duplicates. Although a fluency definition of creative responding would make for manageable assessment, would the data reflect creativity as adequately as an assessment based on an originality-appropriateness definition?
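The contrast between fluency and a diversity-sensitive dimension such as flexibility can be made concrete. In this hypothetical sketch (the idea list and the category coding scheme are invented for illustration), a student produces several non-duplicate ideas, yet nearly all of them fall in one general category:

```python
def fluency(responses):
    """Fluency: the count of non-duplicate responses to a common stimulus."""
    return len(set(responses))

def flexibility(responses, category_of):
    """Flexibility: the count of distinct categories the responses span.
    category_of is a coding scheme mapping each response to a category."""
    return len({category_of[r] for r in set(responses)})

# Hypothetical "uses for a brick" ideas; one exact duplicate.
ideas = ["doorstop", "paperweight", "bookend", "doorstop",
         "grind into pigment"]
coding = {"doorstop": "weight", "paperweight": "weight",
          "bookend": "weight", "grind into pigment": "material"}
# Fluency credits four distinct ideas, but flexibility shows the
# responses span only two general types.
```

The gap between the two scores is precisely the paper's point: a fluency count is easy to obtain, but by itself it says little about the diversity of the ideas.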
In attempting to blend manageability and fidelity, one might make a case for including only those elements that reflect the heart of a construct. In fact, the scope of an operational definition might be limited to the primary theme embedded in a construct. For example, Facione (1986, p. 222) has proposed that critical thinking be defined as "the ability to properly construct and evaluate arguments." To add clarity to his definition, Facione describes an argument as "a set of statements, one of which (the conclusion) is presented by a person as being implied or justified by the others (the premises)" (p. 222). A person would be judged proficient in critical thinking if he could provide solid support for his arguments and could accurately judge whether others' arguments are well supported. Presumably, Facione's definition of critical thinking would lead to more manageable assessment than would the expert panel's multidimensional definition referred to earlier. Nevertheless, the question remains as to whether Facione's focused definition adequately reflects the complexity of critical thinking.
Although desirable, developing operational definitions for higher-order cognitive constructs is not a problem-free venture. Operational definitions seldom provide an exhaustive representation of a construct. The insistence on operationism could even reduce higher-order cognitive constructs to superficial representation (Wallach, 1971). Similarly, operational definitions may convey the illusion of precision when they are actually open to multiple interpretations. Just because a concept is defined operationally does not mean that all researchers will necessarily agree as to the meaning of that definition or how best to assess the construct. The generalizability of operational definitions of higher-order cognitive constructs has also been called into question (Baer, 1991). Such definitions may become so task-specific that they indicate little about the overall cognitive skills of students.
Standardized Definitions
Despite the provisional nature of most operational definitions, the ideal might be to work toward standardized operational definitions. Stan-
THE ROLE OF ASSESSMENT
The primary purpose of operational definitions is to promote precise assessment of targeted constructs. In turn, the primary purpose of assessment is to determine the impact of educational interventions on those constructs. Without solid assessment procedures, educators cannot determine whether interventions are accomplishing their intended purpose. Apparently many educators do not feel strongly about this point. They invest precious resources in untested instructional methods, which often seem grounded more in political posturing and assertive marketing than in sound assessment. My contention is that intervention should not stray far from assessment. No intervention should be attempted without a solid assessment plan, and no school system should buy into an instructional system not accompanied by extensive assessment data.

In addition to using assessment data to evaluate programs (i.e., determine whether programs improve student performance in a cost-effective fashion), teachers must regularly assess the academic development of each student. For example, teachers interested in promoting critical-thinking skills must first have a manageable operational definition of this construct and then in-class ways of assessing behaviors consistent with that definition. Without such individual assessment data, neither the student, the teacher, nor the child's parents can determine if the student is actually learning to think critically.
For assessment data to fulfill their promise in research and practice, the assessment procedures and outcomes must be both reproducible and predictive of other important constructs. The first issue relates mainly to the reliability of the assessment measures and the second to the validity of those measures. The reliability and validity measures both reflect on the adequacy of the underlying operational definitions. Reliability is more a test of the precision of an operational definition, whereas validity is more a test of the importance of that definition (i.e., the extent to which assessment data represent a construct linked to other important variables). To maximize their utility, assessment measures must be (a) reliable and valid and (b) applicable in the daily assessment of higher-order thinking.
Replicability of Assessment Outcomes
It does not matter whether assessment data are quantitative or qualitative; what matters is whether others can use the same assessment procedures to reach a similar conclusion about the status of the variable in question. Without that replicability, we can derive only limited benefit from others' research and practice.

Replicable assessment outcomes are based on reliable assessment data. Reliability has two meanings that are relevant in this context: reliability across time and reliability across raters. If a teacher rates the same student product on two separate occasions, the teacher's ratings should be similar. The clearer the teacher's assessment criteria, the more consistent the assessment conclusions are likely to be across time. Unfortunately, temporal stability in ratings of higher-order thinking is seldom assessed. Test-retest reliability is reported for some of the more formal measures of higher-order thinking but is not frequently addressed in classroom assessment of thinking.
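Test-retest reliability of this kind is ordinarily estimated with a simple correlation between the two rating occasions. The sketch below uses a plain Pearson correlation on invented ratings, purely for illustration; none of these numbers come from the studies discussed here:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: the usual estimate of test-retest reliability
    when the same set of products is rated on two occasions."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical creativity ratings of the same ten stories, scored twice.
time1 = [3, 5, 2, 4, 4, 1, 5, 3, 2, 4]
time2 = [4, 5, 2, 3, 4, 2, 5, 3, 1, 4]
r = pearson_r(time1, time2)  # a high r indicates stable ratings over time
```

A coefficient near the .60 to .70 range discussed below for the Torrance tests would indicate only borderline stability; values in the .80s or higher are generally wanted.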
Although creativity has received more attention in the behavioral literature than have other higher-order constructs, stability in classroom assessment of creativity is rarely reported. An exception is Baer's (1993b) report of ratings of overall creativity reflected in children's poems and stories, with stability in the ratings being in the neighborhood of .5 across a period of several months. That degree of stability would be considered marginal for long-term test-retest reliability.
When educators use formal tests—such as the Torrance Tests of Creative Thinking (Torrance, 1966) and the Watson-Glaser Critical Thinking Appraisal (Watson and Glaser, 1980)—to assess higher-order constructs, the test-retest reliability of these measures becomes crucial. Treffinger's (1985) analysis of the test-retest reliability of the Torrance tests points to wide variation in the reliability coefficients, ranging from .50 to .93. Most test-retest measures for the Torrance tests, however, are in the .60 to .70 range—a borderline level of stability. Cooper's (1991) analysis of other popular measures of creativity points to an equally marginal test-retest reliability for most of them.

If a particular assessment approach is to yield data useful to a variety of educators, the assessment conclusions must also be consistent across different raters who assess the targeted processes or products. It is especially important to establish interrater consistency in the research domain. Agreement of .8 and above is necessary to have confidence in an assessment conclusion (Kazdin, 1994). Without this level of interrater consistency, a product judged as reflective of higher-order thinking by one investigator may be viewed as pedestrian by another.
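The simplest index of interrater consistency is the proportion of products on which two raters agree. The following sketch applies the .8 benchmark noted above to a set of hypothetical creative/not-creative judgments; chance-corrected indices such as Cohen's kappa are stricter, and this is only the plainest version:

```python
def interrater_agreement(rater_a, rater_b):
    """Proportion of products on which two raters' judgments coincide."""
    if len(rater_a) != len(rater_b):
        raise ValueError("raters must judge the same set of products")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

# Hypothetical judgments of ten student products by two raters.
rater_a = ["creative", "not", "creative", "not", "not",
           "creative", "not", "creative", "not", "not"]
rater_b = ["creative", "not", "creative", "not", "creative",
           "creative", "not", "creative", "not", "not"]
agreement = interrater_agreement(rater_a, rater_b)
# The raters disagree on one product of ten, so agreement is .90,
# above the .8 benchmark.
```

Explicit rating criteria of the sort used by Baker and Winston tend to push such indices toward the high values reported in the next paragraphs.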
The interjudge agreement reported for several creativity measures has been quite good. Agreement between teachers and trained observers who used the Torrance (1974a) scoring guide for judging responses to the Torrance tests has typically been in the mid to high .90s (Torrance, 1974b). Interrater agreement on type and amount of creativeness is usually high when consistent and explicit rating criteria are used in the rating scheme. For example, agreement between raters using Torrance's dimensions to judge written essays was generally in the range of the mid .80s to the mid .90s (Campbell and Willis, 1978, 1979). The explicit rating system used by Baker and Winston (1985) to assess diversity in children's drawings and stories yielded extremely high interrater agreement: .99 for drawings and .98 for stories. Ratings of overall originality, without the use of explicit criteria, tend to be lower, but still acceptable. For example, Baker and Winston (1985) found that teacher ratings of overall creativeness of children's stories produced an interrater agreement of .79.

Judges' agreement on subjective rankings of creativeness has not been consistently high. Maloney and Hopkins (1973) obtained an agreement of only .46 for the rankings of creativeness in different essays from the same child. On the other hand, Baer (1993b) found that agreement ranged from .79 to .88 among judges who ranked the creativeness of poems and stories by using their own informal definitions of creativity.
Interrater agreements in ratings of critical-thinking constructs approximate the agreements for creativity. Brabeck (1983) had judges rate the level of reflective judgment indicated in semistructured interviews dealing with four conflictual dilemmas. At the lowest of three levels of reflective judgment, the individual's thinking was judged as essentially illogical (i.e., presenting evidence that contradicts one's own stated views); at the middle levels, the individual's thinking might be considered nonlogical (i.e., ignoring logic in reaching conclusions); and at the highest levels, the individual's thinking would be viewed as highly logical (i.e., using logic as a primary means of reaching a conclusion). Brabeck found interjudge agreement regarding the overall level of reflective judgment manifested in the interviews to be .77, with agreement ranging from .53 to .62 across the four dilemmas used in the study.

One of the more promising essay tests of critical thinking, The Ennis-Weir Critical Thinking Essay Test (Ennis and Weir, 1985), has yielded excellent interrater reliability estimates (.82 to .86). This level of agreement may be attributable partly to the highly structured nature of the test and the scoring instructions. The test involves appraising the logic of an argumentative letter to the editor, with most paragraphs in the letter reflecting errors in thinking. Students are instructed to evaluate the thinking in each paragraph as well as in the passage as a whole. A scoring guide indicates the number of points that raters are to assign to the different levels of logic reflected in the student's appraisal. For educators inclined to use an essay format to assess critical thinking, the Ennis-Weir represents an option that has excellent potential for high interrater agreement.
Validity of Assessment Measures
Assessment procedures should also be judged in terms of their validity. Researchers dealing with higher-order thinking should be especially concerned about two kinds of validity: construct and predictive. Construct validity relates, in part, to how well an assessment procedure represents its foundational construct. Predictive validity is a determination of the linkage between measures of higher-order thinking and performance on other important variables.

Construct validity assessment begins with an examination of the fidelity of the operational definition on which the assessment strategy is based. Does the operational definition truly represent the underlying concept to which it is linked? One way to evaluate this match is to ask individuals assumed to be knowledgeable of the construct to rate the fit between the operational definition and the foundational concept. If high ratings are achieved at this stage, the judges can then be asked to rate how well the assessment strategy matches the operational definition. The ideal would be to achieve high fit ratings (a) between the operational definition and its conceptual base and (b) between the operational definition and the selected assessment procedure.
Predictive validity is a domain where the assessment strategies for higher-order cognitive constructs have probably been most deficient. What do these assessments predict that has value in other realms? Do assessments of higher-order thinking predict other types of accomplishments? Do students judged to have a high level of critical thinking, for example, perform better on real-world problem-solving tasks than those judged to be less skilled in critical thinking? Likewise, how well do the formal creativity measures predict unusual and appropriate accomplishments in the world outside the classroom?
Although the predictive validity of creativity measures has been examined more extensively than that of most higher-order thinking constructs, several researchers (e.g., Anastasi, 1976; Mansfield et al., 1978) have questioned how well creativity measures predict real-world accomplishments. For example, Mansfield and Busse (1981) report that scores on creativity tests have been minimally correlated with creative accomplishments in science. In contrast, Treffinger (1985) concluded that Torrance scores significantly predicted creative achievements in several areas, especially writing, science, medicine, and leadership. A limitation of these predictive validity measures is that assessment of adult accomplishments is often based on self-report.
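At its simplest, the kind of predictive-validity check discussed here correlates test scores with a later real-world criterion. The sketch below uses invented scores and a hand-rolled Pearson correlation; it illustrates the computation only, not any actual finding about creativity tests.

```python
# Minimal sketch of a predictive-validity check: correlate test scores with a
# later criterion measure. All data are invented for illustration.
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

creativity_scores = [95, 110, 102, 120, 88, 105]  # e.g., Torrance-type scores
accomplishments = [2, 5, 3, 6, 1, 4]              # e.g., documented creative products

print(f"predictive validity r = {pearson_r(creativity_scores, accomplishments):.2f}")
```

A high r for these toy data says nothing about real measures, of course; the point is that predictive validity is an empirical correlation, so it can only be evaluated once an external criterion has been measured.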
Formal measures of critical thinking have presented a mixed picture of predictive validity. Scores on The California Critical Thinking Skills Test
subject areas. The higher-order thinking objectives could be grouped together or interspersed among more basic subject-matter objectives. Subject-matter mastery might provide foundational information for higher-order thinking. Whatever the case, students would be pre- and posttested on the specific higher-order thinking skills until mastery of each skill is indicated.

Authentic assessment typically involves an examination of real-world
activities and products created by the student. Usually, this form of assessment is based on self-selected and self-directed student projects that have relevance beyond the classroom. These projects should be linked to higher-order thinking in a variety of ways. Most basically, higher-order thinking should be evidenced in the student's work. The quality of the student's work should be enhanced by the use of higher-order thinking in planning and implementing the project. Plus, the experience of doing the project should sharpen one's higher-order thinking skills. Thus, higher-order thinking could be construed as both a cause and an effect of high-quality authentic work.
If the linkage between authentic work and higher-order thinking is to be assessed in any of the above ways, the teacher and the student must agree on an operational definition of higher-order thinking to serve as their frame of reference. For the assessment to be credible, it must also meet the standards of rater reliability referred to earlier (i.e., ratings of the student's higher-order thinking by different raters should be consistent) and be predictive of other important student accomplishments. The standards of reliability and validity are no less important in authentic assessment than in more conventional assessment approaches (Worthen, 1993). The extent to which authentic assessment currently meets those standards is questionable (Halpern, 1993).
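As a rough illustration of the rater-reliability standard invoked here, one might have two raters independently score the same set of student projects and compute their exact agreement. All names and data below are hypothetical.

```python
# Hypothetical rater-reliability check: two raters independently score the
# higher-order thinking shown in the same eight student projects (1-5 scale).

def exact_agreement(r1, r2):
    """Proportion of projects the two raters scored identically."""
    matches = sum(1 for a, b in zip(r1, r2) if a == b)
    return matches / len(r1)

rater_1 = [4, 3, 5, 2, 4, 3, 5, 4]
rater_2 = [4, 3, 4, 2, 4, 3, 5, 5]

print(f"exact agreement = {exact_agreement(rater_1, rater_2):.2f}")  # 6/8 = 0.75
```

Exact agreement is the crudest index of rater consistency; a fuller analysis would also consider chance-corrected indices, but the basic requirement is the same: independent ratings of the same work must converge before the assessment can be taken seriously.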
CONCLUDING OBSERVATIONS
For higher-order cognitive constructs to be useful, they must be operationally defined and assessed in reliable and valid ways. Unless these requirements are met, educators cannot determine whether their selected interventions actually promote higher-order thinking. Although behavioral researchers have done well in articulating operational definitions and systematic assessment procedures, they have infrequently applied these approaches to higher-order cognitive constructs.
To be most useful, operational definitions must both capture the essence of the concepts they represent and be logistically manageable. Highly focused definitions tend to be more manageable than multifaceted definitions.
Campbell, J. A., and Willis, J. (1979). A behavioral program to teach creative writing in theregular classroom. Educ. Treat. Children 2: 5-15.
Cooper, E. (1991). A critique of six measures for assessing creativity. J. Creat. Behav. 25: 194-204.
Dewey, J. (1916). Democracy and Education, Macmillan, New York.
Ennis, R. H. (1993). Critical thinking assessment. Theory Pract. 32: 179-186.
Ennis, R. H., and Weir, E. (1985). The Ennis-Weir Critical Thinking Essay Test, Midwest Publications, Pacific Grove, CA.
Facione, P. A. (1986). Critical thinking assessment. Theory Pract. 32: 179-186.
Facione, P. A. (1990). Critical Thinking: A Statement of Expert Consensus for Purposes of Educational Assessment and Instruction. Research Findings and Recommendations, American Philosophical Association (ERIC Document Reproduction Service No. ED 315 423), Newark, DE.
Facione, P. A., and Facione, N. C. (1994). The California Critical Thinking Skills Test, California Academic Press, Millbrae, CA.
Ford, D. V., and Harris, J. J. (1992). The elusive definition of creativity. J. Creat. Behav. 26: 186-198.
Glover, J. A. (1979). The effectiveness of reinforcement and practice for enhancing the creative writing of elementary school children. J. Appl. Behav. Anal. 12: 487.
Glover, J. A., and Gary, A. L. (1976). Procedures to increase some aspects of creativity. J. Appl. Behav. Anal. 9: 79-84.
Glover, J., and Ronning, R. (1987). Introduction. In Glover, J., and Ronning, R. (eds.), Historical Foundations of Educational Psychology, Plenum Press, New York, pp. 3-15.
Glover, J. A., Zimmer, J. W., and Bruning, R. H. (1980). Information processing approaches among creative students. J. Psychol. 105: 93-97.
Goetz, E. A. (1982). A review of functional analyses of preschool children's creative behaviors. Educ. Treat. Children 5: 157-177.
Goggin, H. B., Landers, W. F., and Bittner, A. C. (1978). Concepts of equivalence relations and conservation of liquid in preschool children. Psychol. Rep. 43: 991-1001.
Guilford, J. P. (1950). Creativity. Am. Psychol. 5: 444-454.
Halpern, D. F. (1993). Assessing the effectiveness of critical thinking instruction. J. Gen. Educ. 42: 238-254.
Halpern, D. F. (1998). Teaching critical thinking for transfer across domains. Am. Psychol. 53: 449-455.
Hamers, J. H. M., de Koning, E., and Sijtsma, K. (1998). Inductive reasoning in third grade: Intervention promises and constraints. Contemp. Educ. Psychol. 23: 132-148.
Kazdin, A. E. (1994). Behavior Modification in Applied Settings, Brooks/Cole, PacificGrove, CA.
Kiah, C. J. (1993). A model for assessing critical thinking skills. Paper presented at the meeting of the Annual Student Assessment Conference of the Virginia Assessment Group and
the State Council of Higher Education for Virginia, Richmond, Nov.
Maloney, K. B., and Hopkins, B. L. (1973). The modification of sentence structure and its relationship to subjective judgements of creativity in writing. J. Appl. Behav. Anal. 6: 425-433.
Mansfield, R. S., and Busse, T. V. (1981). The Psychology of Creativity and Discovery, Nelson-Hall, Chicago.
Mansfield, R. S., Busse, T. V., and Krepelka, E. J. (1978). The effectiveness of creativity training. Rev. Educ. Res. 48: 517-536.
Ochse, R. (1990). Before the Gates of Excellence: The Determinants of Creative Genius, Cambridge University Press, Cambridge.
Sager, E. (1976). Operational definition. J. Bus. Commun. 14(1): 23-26.
Sternberg, R. J., and Lubart, T. I. (1996). Investing in creativity. Am. Psychol. 51: 677-688.
Torrance, E. P. (1966). Torrance Tests of Creative Thinking: Norms—Technical Manual, Personnel Press, Lexington, MA.
Torrance, E. P. (1974a). Torrance Tests of Creative Thinking: Directions Manual and Scoring Guide, Scholastic Testing Service, Bensenville, IL.
Torrance, E. P. (1974b). Torrance Tests of Creative Thinking: Norms—Technical Manual, Scholastic Testing Service, Bensenville, IL.
Treffinger, D. J. (1985). Review of the Torrance tests of creative th inking. In Mitchell, J. V.Jr. (ed.), The Ninth Mental Measurements Yearbook, University of Nebraska, Lincoln,
pp. 1632-1634.
Walberg, H., and Haertel, G. (1992). Educational psychology's first century. J. Educ. Psychol. 84: 6-19.
Wallach, L. (1971). Implications of recent work in philosophy of science for the role of operational definition in psychology. Psychol. Rep. 28: 583-608.
Watson, G., and Glaser, E. (1980). Watson-Glaser Critical Thinking Appraisal, Psychological Corporation, San Antonio, TX.
Worthen, B. R. (1993). Critical issues that will determine the future of alternative assessment. Phi Delta Kappan 74: 444-454.