
THEORY OF MEASUREMENT: Everything You Wanted To Know About Classroom Assessment But Were Afraid To Ask

Alexander Beaujean
William Shiu
Baylor Psychometric Laboratory
http://homepages.baylor.edu/psychometric_lab

2008 Teaching Colloquy, Department of Religion

TABLE OF CONTENTS
Definitions
Test Design
Test Score Properties: Reliability and Validity
Cognitive Processes
Some Item Types
Developing the Test
Take Home Message


Definitions


DEFINITIONS
Test (noun)

Etymology: Middle English, vessel in which metals were assayed [analyzed], potsherd, from Anglo-French test, tees pot, Latin testum earthen vessel; akin to Latin testa earthen pot, shell

Definition: (1): a procedure, reaction, or reagent used to identify or characterize a substance or constituent (2): something (as a series of questions or exercises) for measuring the skill, knowledge, intelligence, capacities, or aptitudes of an individual or group

(Test. (2008). In Merriam-Webster Online Dictionary. Retrieved September 26, 2008, from http://www.merriam-webster.com/dictionary/test)


DEFINITIONS
(Achievement) Test:

A collection of items or tasks used to measure an underlying construct of interest, the results (i.e., test scores) of which allow for decisions based on the construct's level


DEFINITIONS
Item:

Genesis is the first book of the Bible. T/F

Item Stem: Genesis is the first book of the Bible.
Item Response: T/F


DEFINITIONS
Construct:

A measure of some trait/attribute/quality that is not “operationally defined.”
A latent entity whose level and relationship with other objects (either latent or manifest) can only be inferred

Latent: Extant, but not perceivable by bodily senses

Cronbach & Meehl (1955)

Test Design


TEST DESIGN
Test Philosophy:

What will/will not your test measure?
About what construct are you hoping to make inferences?

What is required for your test to measure that construct?


TEST DESIGN
[Diagram: Person Ability/Trait (Construct); Cognitive Process(es); Item Response; Context]


TEST DESIGN
Test Purpose:

What information do you want to obtain from this test?

…and…
What decision(s) do you need to make from this information?


TEST DESIGN
Examinee Population:

For whom is this test intended?


TEST DESIGN
Constraints

Time to take test
Platform
  Paper vs. Computer
Location
  Security/standardization
Administration
  Entire Group vs. Subgroups vs. Individual

Test Score Properties: Reliability and Validity


RELIABILITY
Reliability

Do the test scores measure their construct consistently?

Contributors to inconsistency
  Random (vary from examinee to examinee)
  Systematic (consistent for all examinees)

Effects can be innocuous or severe, depending on the purpose of the test


RELIABILITY
Estimation

0 < reliability < 1
  Published: .80-.95
  Classroom: .50

Methods
  Correlation between 2 administrations (of same test)
  Correlation among test items
    Internal Consistency (α)
See Frisbie (1988)
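
Both estimation methods above reduce to short calculations. The sketch below is illustrative only: the function names and the toy data are hypothetical and not from the talk. It computes Cronbach's α as (k/(k-1)) × (1 - sum of item variances / variance of total scores), and a test-retest estimate as the Pearson correlation between two administrations.

```python
from math import sqrt
from statistics import mean, pvariance


def cronbach_alpha(scores):
    """Internal consistency. scores: one row per examinee, one column per item."""
    k = len(scores[0])  # number of items
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Correlation between two administrations of the same test."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: 5 examinees x 4 items, scored 0/1.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)

# Hypothetical total scores for the same 5 examinees on two occasions.
first_administration = [10, 12, 15, 18, 20]
second_administration = [11, 12, 14, 19, 19]
retest_r = pearson_r(first_administration, second_administration)
```

For this toy data α works out to 0.80. With real classroom data the value depends heavily on test length and on how varied the examinees are, so a single estimate should be read with caution.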


RELIABILITY
Influences on Reliability Estimates

Length
Dimensionality
  How many constructs is the test measuring?
Item Difficulty
Item Discrimination
  How likely is a response in examinees “high” on the construct vs. examinees “low” on the construct?
Heterogeneity of the examinees
Student Factors (motivation, “testwiseness”)
Time Allotment
Security
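
The effect of length, the first influence listed above, can be approximated with the Spearman-Brown prophecy formula. The talk does not name this formula; it is brought in here as a standard illustration, and it assumes the added items are parallel to the existing ones (same construct, similar quality). The function names are hypothetical, and the starting value of .50 is the classroom figure from the estimation slide.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when the test is made length_factor times as long."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


def length_factor_needed(reliability, target):
    """How many times longer the test must be to reach the target reliability."""
    return (target * (1 - reliability)) / (reliability * (1 - target))


current = 0.50                                  # typical classroom test
doubled = spearman_brown(current, 2)            # test twice as long
factor = length_factor_needed(current, 0.80)    # to reach the low end of published tests
```

Under these assumptions, doubling a test with reliability .50 gives about .67, and reaching .80 would take a test four times as long. Adding weaker or off-construct items will not deliver that gain.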


VALIDITY
Validity

Are the test scores measuring the intended construct?
An argument, for which you need multiple strands of evidence, e.g.:

Do they appear to measure their intended construct?
Do experts think they are measuring their intended construct?
Do they have relationships with other measures that…
  Measure the same things
  Measure different things
Do they predict outcomes of interest?
Do the test’s items have a basis in the curriculum?

See AERA/APA/NCME (1999)


Cognitive Processes


ASSESSMENT PROCESS
Good Classroom Assessments Flow From the Class’s Instructional Objectives/Learning Outcomes…
…And Allow Inferences About the Construct of Interest


ASSESSMENT PROCESS
[Diagram: “Learning Objectives”; Cognitive Processes; Item Responses; Test Scores; Construct Inference]


BLOOM’S TAXONOMY
[Diagram: Knowledge; Comprehension; Application; Analysis; Synthesis; Evaluation. Axis label: Development/Difficulty]

Bloom (1956)


BLOOM’S TAXONOMY
Level 1-Knowledge

Recall information
Some item stems: recall, recite, list, label, define, identify, quote, who, what, when, where, tell, describe, relate, locate, write, find, state, name

Examples:
  Define consubstantiation.
  Who was Constantine?
  When were the first Crusades?
  List the five points of Calvinism.


BLOOM’S TAXONOMY
Level 2-Comprehension

Understand information
Some item stems: demonstrate, explain, describe, interpret, summarize, cause-effect, outline, discuss, distinguish, restate, translate

Example: Why did Paul write to the church at Philippi?
  (a) To address the issue of rivals, and uphold his apostleship
  (b) To preserve the view of justification by faith
  (c) To emphasize that under salvation by Christ, Jews and Gentiles are brought together


BLOOM’S TAXONOMY
Level 3-Application

Use information
Some item stems: demonstrate, apply, calculate, illustrate, show, construct, interview, solve, use, complete, examine, classify

Example: Translate the following into English:
  Αειδε Θεά ούλομένην μήνιν Αχιλήος


BLOOM’S TAXONOMY
Level 4-Analysis

Examine/break apart information
Some item stems: explain, connect, classify, categorize, compare, analyze, distinguish, examine, contrast, investigate, separate

Examples:
  Compare Plato’s Republic with Lenin’s April Theses.
  Which of the following names of God is most different from the other three?
    (a) JEHOVAH (b) ELOHIM (c) KURIOS (d) DESPOTES


BLOOM’S TAXONOMY
Level 5-Synthesis

Create with information
Some item stems: combine, integrate, modify, hypothesize, abstract, create, design, invent, compose, predict, plan, imagine, propose, devise, formulate, conjecture

Example:
  Conjecture about Stephen’s response to Paul, né Saul, were they to have met after Paul’s Roman imprisonment.


BLOOM’S TAXONOMY
Level 6-Evaluation

Combine previous information skills to make a judgment
Some item stems: judge, select, choose, decide, justify, debate, verify, argue, recommend, assess, discuss, rate, prioritize, determine

Examples:
  Appraise Calvin’s Institutes in light of Oberman’s The Dawn of the Reformation.
  Who deserves precedence as the earliest Baptist church in North America: Roger Williams’ Providence church or John Clarke’s Newport church? Support your answer with scholarly sources.


Some Item Types


ITEM TYPE #1: TRUE-FALSE

Example: Augustine wrote The Confessions. T/F?

Pros:
  Convenient to write
  Easy to score
  Allows flexibility in content coverage

Cons:
  Limited in cognitive processes covered
  Guessing
  Student response sets


ITEM TYPE #1: TRUE-FALSE

Best Practice:
  Make the statements as short and specific as possible
  One idea per statement
  Avoid trivial information
  Use positive statements instead of negative, and always avoid double negative statements
  Do not use opinion statements unless they are attributed to someone
  Length should not differ between true/false statements
  Approximately equal number of true/false statements


ITEM TYPE #2: MULTIPLE CHOICE RESPONSE

Example: Who is famous for his 95 Theses?
  (a) Pope Leo X; (b) Martin Luther; (c) Johann Eck

Pros:
  “Best Answer” is more flexible than unequivocal true/false
  Allows different cognitive processes in item response
  Guessing less of a factor than T/F
  Easy to score

Cons:
  Large amount of time to write good distracters (wrong response alternatives)
  Guessing is possible
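
The guessing entries above can be put in numbers. The sketch below is a rough illustration, not something from the talk: it assumes every item is answered by blind guessing with equally attractive options, and the test length (20 items) and pass mark (14 correct) are hypothetical. It compares true-false items (2 options) with three-option multiple choice items like the example.

```python
from math import comb


def chance_score(n_items, n_options):
    """Expected number correct from blind guessing."""
    return n_items / n_options


def prob_pass_by_guessing(n_items, n_options, pass_mark):
    """Binomial probability of at least pass_mark correct by guessing alone."""
    p = 1 / n_options
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(pass_mark, n_items + 1)
    )


N_ITEMS, PASS_MARK = 20, 14  # hypothetical test length and pass mark

tf_expected = chance_score(N_ITEMS, 2)   # true-false
mc_expected = chance_score(N_ITEMS, 3)   # three-option multiple choice
p_tf = prob_pass_by_guessing(N_ITEMS, 2, PASS_MARK)
p_mc = prob_pass_by_guessing(N_ITEMS, 3, PASS_MARK)
```

On these assumptions a pure guesser expects 10 of 20 correct on true-false items but fewer than 7 of 20 on three-option items, and the chance of reaching the pass mark is correspondingly smaller for multiple choice. Real examinees rarely guess blindly, so these figures are upper bounds on ignorance, not descriptions of behavior.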


ITEM TYPE #2: MULTIPLE CHOICE RESPONSE

Best Practice:
  Item stems should: (a) have autonomous meaning, (b) present as much of the item as possible, and (c) have no irrelevant material
  Avoid negative item stems
  All item responses should be grammatically compatible with their stem and of approximately equal length
  There should be only one correct/best answer
  Distracters should be plausible
  Avoid “clues” in item stem
  Avoid “none/all of the above” response options


ITEM TYPE #3: MATCHING

Example: Match the philosopher with their work:

A. Plato
B. Aristotle
C. Socrates
D. Euclid
E. Zeno

_A_ 1. The Socratic Dialogues
_C_ 2. None
_B_ 3. Organon
_D_ 4. The Elements
_E_ 5. Reminiscences of Crates

Pros:
  Can cover much material in content domain
  Easy to administer

Cons:
  Limited in cognitive processes covered
  Difficult to find homogeneous material
  Difficult to develop a good, plausible set of responses


ITEM TYPE #3: MATCHING

Best Practice:
  Use homogeneous material
  Have unequal numbers of stems and responses
  Place responses in numerical or alphabetical order
  Explicitly state the basis for finding a match
  Place all items/responses on the same page


ITEM TYPE #4: FILL IN THE BLANK

Example: Martin Buber edited the _______, a Zionist periodical. (Die Welt)

Pros:
  Very, very minimal guessing
  Easy to construct item stems

Cons:
  Must score by hand, and possibility of multiple correct responses
  Assesses only factual knowledge


ITEM TYPE #4: FILL IN THE BLANK

Best Practice:
  Make the item require a short, specific response
  Do not take item stems directly from textbooks
  Questions are better than incomplete statements
  Right or left justify the item response blanks, and make them the same size for all items
  Only one blank per item


ITEM TYPE #5A: SHORT RESPONSE

Example: List the Beatitudes.

Pros:
  Can measure complex learning objectives and cognitive processes
  Minimizes cheating

Cons:
  Scoring can be subjective
  Limited sampling of content


ITEM TYPE #5B: ESSAY

Example: Explain how Nietzsche's notion of the will to power is a response to Schopenhauer's will to live.
(Your answer should be no longer than 2 pages, and should cite scholarly sources. It will be evaluated on your analysis of cited scholarship and the skill with which the essay is organized.)

Pros:
  Can help students connect related ideas
  Responding can (possibly) be a learning exercise itself
  Can measure complex objectives & processes

Cons:
  Relies on both writing skills and content familiarity
  Scoring is subjective, so score reliability is lower
  Limited sampling of content


ITEM TYPE #5: SHORT ANSWER/ESSAY

Best Practice:
  Only use for learning outcomes that require non-objective assessment
  Map the questions directly onto learning objectives
  Inform respondents of the grading criteria (e.g., content knowledge, thought organization)
  Make the examinee’s writing task explicit
  Estimate the time needed for an appropriate answer
  Give all examinees the same (or equivalent) questions. Avoid optional questions.
  Outline the expected answer in advance, and…
  …Develop a rubric that allocates points in the desired manner before administering the exam
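
A rubric fixed before the exam can be as simple as a table of criteria and maximum points. The sketch below is one possible shape, not the presenters' rubric: the first two criteria come from the grading-criteria example above, the third from the essay prompt, and all point values and names are hypothetical.

```python
# Hypothetical rubric: criterion -> maximum points, decided before the exam.
RUBRIC = {
    "content knowledge": 10,
    "thought organization": 6,
    "analysis of cited scholarship": 4,
}


def score_essay(ratings, rubric=RUBRIC):
    """Sum the points awarded, refusing ratings that do not fit the rubric."""
    if set(ratings) != set(rubric):
        raise ValueError("ratings must cover exactly the rubric criteria")
    for criterion, points in ratings.items():
        if not 0 <= points <= rubric[criterion]:
            raise ValueError(f"points out of range for {criterion!r}")
    return sum(ratings.values())


max_points = sum(RUBRIC.values())
example_score = score_essay(
    {
        "content knowledge": 8,
        "thought organization": 5,
        "analysis of cited scholarship": 3,
    }
)
```

Rejecting out-of-range or missing ratings keeps every essay scored against the same criteria, which is the point of writing the rubric in advance.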

Developing the Test


TEST SPECIFICATIONS
Content Domain
  How do topics within the content area relate to each other and how does knowledge in the area build?
Cognitive Skills/Process to Answer Item
Distribution of Content Areas and Cognitive Skills Demand throughout Test


TEST SPECIFICATIONS
For Classroom Evaluations, You Want Your Tests to Map onto Your Instructional Objectives/Learning Outcomes

Instructional Objective/Learning Outcomes:
  I. Demonstrates Skill in Critical Thinking
    A. Comprehends Relevant Antecedents to Historical Events
Test Item:
  Name Three Precipitating Events to the First Crusade


TABLE OF SPECIFICATIONS

                                             Instructional Objectives
Major Content Area                           Knowledge  Comprehension  Application  Analysis  Synthesis  Total Items (Content Weight)
Early Christian Writers in the West          2          3              1            2         2          10
Luther and the Beginning of the Reformation  3          2              3            4         3          15
Liberal Protestantism in Modernity           3          2              2            1         2          10
Total Items (Objective Weight)               8          7              6            7         7          35
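
The content weights in a table of specifications can be reused when the test length changes. The sketch below takes the content-area totals of 10, 15, and 10 items from the table above; the function names, the largest-remainder rounding rule, and the alternative test lengths are assumptions added for illustration.

```python
from fractions import Fraction
from math import floor

# Content-area item totals from the table of specifications (35 items in all).
ITEMS_PER_AREA = {
    "Early Christian Writers in the West": 10,
    "Luther and the Beginning of the Reformation": 15,
    "Liberal Protestantism in Modernity": 10,
}


def content_weights(items_per_area):
    """Share of the test devoted to each content area, as exact fractions."""
    total = sum(items_per_area.values())
    return {area: Fraction(n, total) for area, n in items_per_area.items()}


def allocate(weights, test_length):
    """Split test_length items by weight, rounding by largest remainder."""
    exact = {area: w * test_length for area, w in weights.items()}
    counts = {area: floor(x) for area, x in exact.items()}
    leftover = test_length - sum(counts.values())
    by_remainder = sorted(exact, key=lambda a: exact[a] - counts[a], reverse=True)
    for area in by_remainder[:leftover]:
        counts[area] += 1
    return counts


weights = content_weights(ITEMS_PER_AREA)
quiz_21 = allocate(weights, 21)  # hypothetical shorter quiz
quiz_20 = allocate(weights, 20)  # a length that does not divide evenly
```

The same approach would work along the other margin of the table, using the objective totals to keep the mix of cognitive levels stable across test forms.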


TEST LENGTH
No “correct” length
Depends on:
  Administration time
  Examinees
  Scores needed
  Content coverage
  Item types used
  Desired reliability


TEST ORGANIZATION
Directions
  Be explicit
  Give time allowed to take test
  Give directions for responding
  Give point allocation (weighting) if different across items

Item Grouping
  If there are different item types on the test:
    Only if needed, group items by content area
    Put the same item types together
    Within a type, place in order of simpler to more complex


TEST/ITEM SCORING
Points to Consider
  Allow for partial credit?
  Should content areas be weighted equally?
  Should learning objectives be weighted equally?
  If a test is made of multiple “subtests”, is each autonomous or graded as a whole?
    e.g., if Jane misses all 10 of the “Liberal Protestantism in Modernity” questions, but gets the other 25 items correct, can she still “pass” the test?
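
Jane's case shows how the scoring rule changes the decision. The sketch below contrasts grading the test as a whole with treating each subtest as autonomous. The subtest sizes follow the example (10 Liberal Protestantism items out of 35), with the 15/10 split of the other 25 items, both cut scores, and the function names all assumed for illustration.

```python
SUBTEST_SIZES = {
    "Early Christian Writers in the West": 10,
    "Luther and the Beginning of the Reformation": 15,
    "Liberal Protestantism in Modernity": 10,
}
OVERALL_CUT = 0.70   # hypothetical pass mark for the whole test
SUBTEST_CUT = 0.50   # hypothetical minimum on every subtest


def passes_as_whole(correct, sizes=SUBTEST_SIZES, cut=OVERALL_CUT):
    """Graded as a whole: strong areas can compensate for weak ones."""
    return sum(correct.values()) / sum(sizes.values()) >= cut


def passes_each_subtest(correct, sizes=SUBTEST_SIZES, cut=SUBTEST_CUT):
    """Autonomous subtests: every area must clear its own cut score."""
    return all(correct[name] / sizes[name] >= cut for name in sizes)


jane = {
    "Early Christian Writers in the West": 10,
    "Luther and the Beginning of the Reformation": 15,
    "Liberal Protestantism in Modernity": 0,
}
whole_test_decision = passes_as_whole(jane)      # 25/35 is about 71%
per_subtest_decision = passes_each_subtest(jane)
```

With these cut scores Jane passes when the test is graded as a whole and fails when each subtest must be passed separately, so the choice needs to be made, and announced, before the test is given.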


TEST/ITEM ANALYSIS
A Multiple Item Test Provides Much Information

Item difficulties (e.g., percent who “passed” the item)
  Do they differ by content area?
  Do they differ by instructional objective?

Distracters
  Are “high scorers” endorsing a distracter more than the correct answer?

Discrimination
  How well does an item discriminate “high scorers” from “low scorers”?

Are there omitted items or items not reached?
  Is there a pattern in those items?

Reliability Calculations & Validity Evidence
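
Item difficulty and discrimination can be computed directly from a scored response matrix. The sketch below is a minimal illustration with hypothetical data and function names. Difficulty is the proportion answering correctly, and discrimination is taken here as the upper-lower index (proportion correct among high scorers minus proportion correct among low scorers), which is one common choice among several. Groups are formed from total scores that include the item itself.

```python
def item_difficulty(responses):
    """Proportion of examinees who answered each item correctly."""
    n = len(responses)
    k = len(responses[0])
    return [sum(row[i] for row in responses) / n for i in range(k)]


def upper_lower_discrimination(responses, fraction=0.5):
    """Proportion correct in the top group minus that in the bottom group."""
    ranked = sorted(responses, key=sum, reverse=True)  # high scorers first
    g = max(1, int(len(ranked) * fraction))            # size of each group
    upper, lower = ranked[:g], ranked[-g:]
    k = len(responses[0])
    return [
        sum(row[i] for row in upper) / g - sum(row[i] for row in lower) / g
        for i in range(k)
    ]


# Hypothetical data: 6 examinees x 3 items, scored 0/1.
responses = [
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
]
difficulty = item_difficulty(responses)
discrimination = upper_lower_discrimination(responses)
```

In this toy data the third item has a discrimination of zero: high and low scorers get it right equally often, which would flag it for review. With a real class, groups of the top and bottom quarter or third are often used instead of halves.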


TEST/ITEM ANALYSIS
For More Information:
  EDP 5340. Measurement/Evaluation
  Chapter 13 of: Hollis-Sawyer, Thornton, Hurd, & Condon (2008)
  Chapter 14 of: Linn & Miller (2005)
  Chapter 6 of: Urbina (2004)
  LERTAP program [http://www.assess.com/]


Take Home Message


TAKE HOME MESSAGE
Be Mindful In Test Construction
Be Purposeful in Item Selection and Development


Questions?


REFERENCES

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education [AERA/APA/NCME]. (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.

Bloom, B. S. (1956). Taxonomy of educational objectives, Handbook I: The cognitive domain. New York: David McKay Co Inc.

Brennan, R. L. (Ed.). (2006). Educational measurement (4th ed.). Westport, CT: Praeger.

Cronbach, L. J. & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302.

Frisbie, D. A. (1988). Reliability of scores from teacher-made tests. Educational Measurement: Issues and Practice, 7, 25-35. [free: http://www.ncme.org/pubs/items/ITEMS_Mod_3.pdf]


Hollis-Sawyer, L., Thornton, G. C., Hurd, B. & Condon, M. E. (2008). Exercises in psychological testing (2nd ed.). Boston: Allyn & Bacon.

Linn, R. L. & Miller, M. D. (2005). Measurement and assessment in teaching (9th ed.). Upper Saddle River, NJ: Pearson.

Urbina, S. (2004). Essentials of psychological testing. Hoboken, NJ: John Wiley & Sons.