Lesson Six Reliability

Upload: paulina-austin

Post on 28-Dec-2015


Page 1: Lesson Six Reliability. Case Imagine that a hundred students take a 100-item test at three o’clock one Thursday afternoon. The test is not impossible

Lesson Six

Reliability


Case

Imagine that a hundred students take a 100-item test at three o’clock one Thursday afternoon. The test is neither impossibly difficult nor ridiculously easy for these students, so they do not all get zero or a perfect score of 100. Now what if they had not taken the test on the Thursday but had taken it at three o’clock the previous afternoon? Would we expect each student to have got exactly the same score on the Wednesday as they actually did on the Thursday?


Contents

Definition of reliability
Factors contributing to unreliability
Types of reliability
Indication of reliability: Reliability coefficient
Ways of obtaining reliability coefficient:
– Alternate/Parallel forms
– Test-retest
– Split-half & KR-21/KR-20
Two ways of testing reliability
How to make tests more reliable
Online video

http://www.le.ac.uk/education/testing/ilta/faqs/main.html


Definition of Reliability (1)

“The consistency of measures across different times, test forms, raters, and other characteristics of the measurement context” (Bachman, 1990, p. 24).

If you give the same test to the same testees on two different occasions, the test should yield similar results.


Definition of Reliability (2)

A reliable test is consistent and dependable.

Scores are consistent and reproducible.

The accuracy or precision with which a test measures something; that is, consistency, dependability, or stability of test results.


Factors Contributing to Unreliability

X = T + E (observed score = true score + error score)

Concerned with freedom from nonsystematic fluctuation.

Fluctuations in
– the student
– scoring
– test administration
– the test itself
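The relation X = T + E can be illustrated with a small simulation. This is only a sketch under stated assumptions: the score distributions, the standard deviations, and the function name `simulate_observed_scores` are hypothetical and do not come from the slides. In classical test theory, reliability can be read as the share of observed-score variance that is true-score variance, var(T) / var(X).

```python
import random
import statistics


def simulate_observed_scores(n_students=10_000, true_sd=15.0, error_sd=5.0, seed=0):
    """Simulate X = T + E for n_students (all parameters are illustrative)."""
    rng = random.Random(seed)
    # True scores T: what each student "really" knows (hypothetical mean of 50).
    true = [rng.gauss(50, true_sd) for _ in range(n_students)]
    # Error scores E: nonsystematic fluctuation with mean 0.
    error = [rng.gauss(0, error_sd) for _ in range(n_students)]
    # Observed scores X = T + E.
    observed = [t + e for t, e in zip(true, error)]
    return true, observed


true, observed = simulate_observed_scores()

# Reliability as the ratio of true-score variance to observed-score variance.
reliability = statistics.pvariance(true) / statistics.pvariance(observed)
print(round(reliability, 2))  # expected to be close to 15**2 / (15**2 + 5**2) = 0.9
```

Increasing `error_sd` in this sketch lowers the ratio, which mirrors the idea that larger nonsystematic fluctuations make a test less reliable.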


Types of Reliability

Student- (or Person-) related reliability
Rater- (or Scorer-) related reliability
– Intra-rater reliability
– Inter-rater reliability
Test administration reliability
Test (or instrument-related) reliability


Student-Related Reliability (1)

The source of the error score comes from the test takers.
– Temporary illness
– Fatigue
– Anxiety
– Other physical or psychological factors
– Test-wiseness (i.e., strategies for efficient test taking)


Student-Related Reliability (2)

Principles:
– Assess on several occasions
– Assess when person is prepared and best able to perform well
– Ensure that person understands what is expected (e.g., instructions are clear)


Rater (or Scorer) Reliability (1)

Fluctuations: including human error, subjectivity, and bias

Principles:
– Use experienced trained raters.
– Use more than one rater.
– Raters should carry out their assessments independently.


Rater Reliability (2)

Two kinds of rater reliability:
– Intra-rater reliability
– Inter-rater reliability


Intra-Rater Reliability

Fluctuations including:
– Unclear scoring criteria
– Fatigue
– Bias toward particular good and bad students
– Simple carelessness


Inter-Rater Reliability (1)

Fluctuations including:
– Lack of attention to scoring criteria
– Inexperience
– Inattention
– Preconceived biases


Inter-Rater Reliability (2)

Used with subjective tests when two or more independent raters are involved in scoring

Train the raters before scoring (e.g., TWE, dept. oral and composition tests for recommended students).


Inter-Rater Reliability (3)

Compare the scores given to the same testees by different raters.

If the correlation r between the raters' scores is high, there is inter-rater reliability.
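One way to compare two raters' scores is the Pearson correlation. The sketch below assumes that reading of r; the slides do not name a specific coefficient, and the rater scores and the helper name `pearson` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical composition scores for the same five testees from two raters.
rater_a = [4, 3, 5, 2, 4]
rater_b = [5, 3, 4, 2, 4]

r = pearson(rater_a, rater_b)
print(round(r, 2))  # 0.81
```

The same function could be applied to one rater's scores on two occasions to look at intra-rater consistency.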


Test Administration Reliability

Street noise
– Listening comprehension test
Photocopying variations
Lighting
Variations in temperature
Condition of desks and chairs
Monitors


Test Reliability

Measurement errors come from the test itself:
– Test is too long
– Test with a time limit
– Test format allows for guessing
– Ambiguous test items
– Test with more than one correct answer


Ways of Enhancing Reliability

General strategies: Consider possible sources of unreliability
– Reduce or average out nonsystematic fluctuations in raters, persons, test administration, and instruments


How to Make Tests More Reliable? (1)

Take enough samples of behavior
Try to avoid ambiguous items
Provide clear and explicit instructions
Ensure tests are well laid out and perfectly legible
Provide uniform and non-distracting conditions of administration
Try to use objective tests


How to Make Tests More Reliable? (2)

Try to use direct tests
Have independent, trained raters
Provide a detailed scoring key
Try to identify the test takers by number, not by name
Try to use multiple, independent scoring in subjective tests (Hughes, 1989, pp. 36-42).


Reliability Coefficient (r)

Quantifying the reliability of a test allows us to compare the reliability of different tests.

0 ≤ r ≤ 1 (ideal r = 1, which means the test gives precisely the same results for particular testees regardless of when it happened to be administered).

If r = 1: 100% reliable
A good achievement test: r ≥ .90
If r < .70: the test should not be used


How to Get Reliability Coefficient

Type of Reliability: How to Measure

Stability or Test-Retest: Give the same assessment twice, separated by days, weeks, or months. Reliability is stated as the correlation between scores at Time 1 and Time 2.

Alternate Form: Create two forms of the same test (vary the items slightly). Reliability is stated as the correlation between scores of Test 1 and Test 2.

Internal Consistency (Alpha, α): Compare one half of the test to the other half. Or, use methods such as Kuder-Richardson Formula 20 (KR-20) or Cronbach's Alpha.


How to Get Reliability Coefficient

Two forms, two administrations: alternate/parallel forms
One form, two administrations: test-retest
One form, one administration (internal consistency):
– split-half (Spearman-Brown procedure)
– KR-21
– KR-20


Alternate/Parallel Forms

Two forms, two administrations:
– Equivalent forms (i.e., different items testing the same topic) taken by the same test taker on different days
– If r is high, this test is said to have good reliability.
– The most stringent form

[Diagram: Test plan, Form A, Form B]


Test-Retest

One form, two administrations

The same test is administered to the same testees with a short time lag, and r is then calculated.

Appropriate for highly speeded tests

[Diagram: Test A, Trial 1, Trial 2]


Split-half (Spearman-Brown Procedure)

One test, one administration
Split the test into halves (e.g., odd questions vs. even questions) to form two sets of scores.
Also called internal consistency

[Diagram: questions Q1 to Q6 divided into a First Half and a Second Half]
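The odd/even split can be sketched in a few lines. The response matrix below is hypothetical (5 testees, 4 items scored 1 = right, 0 = wrong), as are the helper names; the last step applies the Spearman-Brown correction for a test twice as long as each half.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical data: one row per testee, one column per item (Q1..Q4).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

# Two sets of scores: odd-numbered items (Q1, Q3) vs. even-numbered items (Q2, Q4).
odd_scores = [sum(row[0::2]) for row in responses]
even_scores = [sum(row[1::2]) for row in responses]

# Correlation between the two halves (reliability of a half-length test).
r_half = pearson(odd_scores, even_scores)

# Spearman-Brown correction for the full-length test (twice one half).
r_full = 2 * r_half / (1 + r_half)
print(round(r_half, 2), round(r_full, 2))  # 0.79 0.88
```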


Split-half (2)

Note that r (the correlation between the two halves) is not the reliability of the whole test.
A mathematical relationship holds between test length and reliability: the longer the test, the more reliable it is.
Spearman-Brown Prophecy Formula: $\text{rel}_{\text{total}} = \dfrac{nr}{1 + (n-1)r}$
E.g., correlation between the 2 parts of a test: r = .6; reliability of the full test (n = 2) = .75
If the test is lengthened to 3 times the length of one part (n = 3): reliability = .82
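The two figures on this slide can be checked directly against the Spearman-Brown formula. A minimal sketch; the function name `spearman_brown` is just an illustrative choice.

```python
def spearman_brown(r, n):
    """Predicted reliability when a test with reliability r is made n times as long."""
    return (n * r) / (1 + (n - 1) * r)


# Correlation between the two halves, as in the slide's example.
r_half = 0.6

print(round(spearman_brown(r_half, 2), 2))  # 0.75 (full test, twice one half)
print(round(spearman_brown(r_half, 3), 2))  # 0.82 (three times one half)
```

With n = 1 the formula returns r unchanged, and the predicted reliability rises toward 1 as n grows, which is the "longer test, more reliable" relationship stated above.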


Kuder-Richardson Formula 21

$\text{KR-21} = \dfrac{k}{k-1}\left[1 - \dfrac{\bar{x}\,(1 - \bar{x}/k)}{s^2}\right]$

k = number of items; $\bar{x}$ = mean; s = standard deviation (for the formula, see Bailey 100)

– s describes how spread out a set of scores is (i.e., how far the scores deviate from the mean)

– 0 ≤ s; the larger s, the more spread out the scores

– E.g., 2 sets of scores: (5, 4, 3) and (7, 4, 1); which group in general behaves more similarly?
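A short sketch can answer the question about the two score sets and show KR-21 in use. The test statistics passed to `kr21` (10 items, mean 6, standard deviation 3) are hypothetical, and the population standard deviation is used for the comparison as an assumption; the slides do not say which version of s is intended.

```python
import statistics


def kr21(k, mean, sd):
    """KR-21 from the number of items k, the mean score, and the score SD."""
    return (k / (k - 1)) * (1 - (mean * (1 - mean / k)) / sd ** 2)


# The two score sets from the slide: same mean (4), different spread.
group_a = [5, 4, 3]
group_b = [7, 4, 1]
sd_a = statistics.pstdev(group_a)
sd_b = statistics.pstdev(group_b)
print(round(sd_a, 2), round(sd_b, 2))  # 0.82 2.45 -> group_a behaves more similarly

# Hypothetical 10-item test with mean 6 and standard deviation 3.
print(round(kr21(k=10, mean=6, sd=3), 2))  # 0.81
```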


Kuder-Richardson Formula 20

$\text{KR-20} = \dfrac{k}{k-1}\left[1 - \dfrac{\sum pq}{s^2}\right]$

p = item difficulty (proportion of people who got an item right)
q = 1 - p (i.e., proportion of people who got an item wrong)
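KR-20 can be computed from a matrix of right/wrong item scores. The sketch below uses a hypothetical 5-testee, 4-item matrix and the population variance of total scores for s²; that choice is an assumption, since the slide does not specify it.

```python
import statistics


def kr20(responses):
    """KR-20 for a list of rows, each row holding 0/1 item scores for one testee."""
    n_people = len(responses)
    k = len(responses[0])  # number of items
    # p = proportion of testees answering each item correctly; q = 1 - p.
    sum_pq = 0.0
    for item in range(k):
        p = sum(row[item] for row in responses) / n_people
        sum_pq += p * (1 - p)
    # s^2 = variance of the testees' total scores (population variance assumed).
    totals = [sum(row) for row in responses]
    variance = statistics.pvariance(totals)
    return (k / (k - 1)) * (1 - sum_pq / variance)


# Hypothetical data: 1 = right, 0 = wrong.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(round(kr20(responses), 2))  # 0.8
```

Unlike KR-21, which needs only k, the mean, and s, this calculation needs the item-level data, because p and q are computed for every item.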


Ways of Testing Reliability

Examine the amount of variation
– Standard Error of Measurement (SEM)
– The smaller the better

Calculate “reliability coefficient”
– “r”
– The bigger the better
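The slides name the Standard Error of Measurement without giving a formula. A commonly used classical-test-theory expression is SEM = s × √(1 − r), where s is the standard deviation of the scores and r is the reliability coefficient. The sketch below assumes that expression; the numbers are hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = sd * sqrt(1 - r), assuming the classical test theory formula."""
    return sd * math.sqrt(1 - reliability)


# Hypothetical test with a score standard deviation of 10.
sem_high_r = standard_error_of_measurement(sd=10, reliability=0.91)
sem_low_r = standard_error_of_measurement(sd=10, reliability=0.64)
print(round(sem_high_r, 1), round(sem_low_r, 1))  # 3.0 6.0
```

The two views agree: for a fixed s, a bigger r gives a smaller SEM.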