ABCs of IRT
November 18, 2010
Diane M. Talley, MA
Stephen B. Johnson, PhD
James A. Penny, PhD
Psychometrics as Science and Art
2010 ICE Educational Conference
• IRT and Classical
• Concepts of IRT
  • A logit
  • The abc's
• Benefits
  • Pre-equating
  • Immediate scoring
• Population invariance
• Assumptions
• Implications
The right tools for the job
• Data
• Program
• Tool
Classical versus IRT model
Classical versus IRT
Classical Model | IRT Model
Traditional | Modern
Requires less strict adherence to assumptions | Requires stricter adherence to assumptions
Sample dependent | Population invariant
Statistics (p – diff, p-biserial – disc) | Probability-based statistics (b-diff, a-disc, c-guessing)
Simple scoring model (raw score) | Scoring is more complex
What’s a logit?
[Figure: diagram relating ability, the performance standard, and probability]
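A logit is the natural log of the odds of a correct response: logit(p) = ln(p / (1 - p)). A probability of .50 corresponds to 0 logits, roughly .73 to +1 logit, and roughly .27 to -1 logit, which is what lets candidate ability, item difficulty, and the performance standard all sit on the same interval scale.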
b (difficulty)
[Figure "Paint by Numbers Leonardo": item characteristic curves, P(u = 1 | THETA) against THETA, for several items that differ in difficulty b]
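Under the one-parameter (Rasch) model that such a plot illustrates, the probability of a correct response is P(u = 1 | theta) = exp(theta - b) / (1 + exp(theta - b)). A candidate whose theta equals the item's difficulty b has a 50% chance of answering correctly, and the whole curve shifts to the right as b increases.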
a (discrimination) and b
[Figure "Paint by Numbers Leonardo": item characteristic curves, P(u = 1 | THETA) against THETA, for three items that differ in discrimination a as well as difficulty b]
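The two-parameter model adds the discrimination a, which controls the slope of the curve: P(u = 1 | theta) = 1 / (1 + exp(-a(theta - b))). Items with larger a separate candidates sharply near their difficulty b; items with small a produce flat curves that distinguish little.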
a, b, and c (guessing)
[Figure "Paint by Numbers Leonardo": item characteristic curves, P(u = 1 | THETA) against THETA, for three items that differ in a, b, and the lower asymptote c]
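Putting the three parameters together gives the three-parameter logistic (3PL) model, P(u = 1 | theta) = c + (1 - c) / (1 + exp(-a(theta - b))). The Python sketch below is only an illustration with made-up parameter values, not code from the presentation; setting c = 0 recovers the 2PL, and also fixing a = 1 recovers the Rasch form.

import math

def p_correct(theta, a=1.0, b=0.0, c=0.0):
    """3PL probability of a correct response, P(u = 1 | theta).
    a = discrimination, b = difficulty, c = pseudo-guessing (lower asymptote).
    c = 0 gives the 2PL; c = 0 and a = 1 gives the Rasch/1PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Illustrative item: moderate discrimination, average difficulty, four-option multiple choice
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(p_correct(theta, a=1.2, b=0.0, c=0.25), 2))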
Fit statistics
[Figures: "Comparison of Infit and Outfit" by item order; "Outfit Mean Square Plot" and "Infit Mean Square Plot", mean square (MSQ) against item order for about 30 items]
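Both fit statistics compare observed responses with the probabilities the model predicts. Outfit is the unweighted mean of squared standardized residuals across candidates, so it reacts to surprising responses far from an item's difficulty (lucky guesses, careless slips); infit weights each squared residual by its information, p(1 - p), and so emphasizes misfit close to the item's difficulty. Values near 1.0 indicate adequate fit. The sketch below is a generic Rasch-model illustration, not the analysis code behind the plots.

import numpy as np

def rasch_p(thetas, b):
    """Rasch probability of a correct response for each candidate."""
    return 1.0 / (1.0 + np.exp(-(np.asarray(thetas) - b)))

def infit_outfit(responses, thetas, b):
    """Infit and outfit mean squares for one item with difficulty b."""
    p = rasch_p(thetas, b)
    w = p * (1.0 - p)                             # information carried by each response
    z2 = (np.asarray(responses) - p) ** 2 / w     # squared standardized residuals
    return np.sum(w * z2) / np.sum(w), z2.mean()  # (infit, outfit)

# Made-up example: five candidates, item of difficulty 0.5
print(infit_outfit([1, 0, 1, 1, 0], [-1.0, -0.5, 0.0, 1.0, 2.0], 0.5))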
Population Invariance
Classical Difficulty Values
         High Performing | Low Performing
Item 3   .92             | .70
Item 2   .80             | .60
Item 1   .50             | .15

IRT Difficulty Values
         High Performing | Low Performing
Item 3   -.75            | -.75
Item 2   0.00            | 0.00
Item 1   1.50            | 1.50
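A tiny simulation with made-up numbers, offered only to illustrate the pattern in the tables above: the classical p-value moves with the ability of the group that happens to take the item, while the IRT difficulty used to generate the responses stays fixed on the theta scale.

import numpy as np

rng = np.random.default_rng(0)
b = 1.5                                          # illustrative item difficulty on the theta scale

def rasch_p(thetas):
    return 1.0 / (1.0 + np.exp(-(thetas - b)))

groups = {"high performing": rng.normal(1.0, 1.0, 5000),
          "low performing": rng.normal(-1.0, 1.0, 5000)}
for name, thetas in groups.items():
    responses = rng.random(thetas.size) < rasch_p(thetas)
    print(name, "classical p-value:", round(responses.mean(), 2), "| IRT b:", b)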
IRT Pre-Equating
• What does it mean?
• Why would you want to do it?
• What does it mean for building item banks and forms?
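One concrete reading, sketched below under the assumption of a Rasch-calibrated item bank (an illustration, not the presenters' procedure): because every bank item already carries a difficulty on a common scale, the raw-score-to-theta conversion table for a newly assembled form can be produced before anyone sits for the form, which is what makes immediate, on-demand scoring possible.

import numpy as np

def raw_to_theta_table(item_bs):
    """Pre-equated raw-score-to-theta table for a form built from calibrated Rasch items."""
    grid = np.linspace(-4, 4, 801)                                  # theta grid
    p = 1.0 / (1.0 + np.exp(-(grid[:, None] - np.asarray(item_bs)[None, :])))
    tcc = p.sum(axis=1)                                             # expected raw score at each theta
    return {raw: round(float(np.interp(raw, tcc, grid)), 2)
            for raw in range(1, len(item_bs))}                      # zero and perfect scores excluded

# Illustrative five-item form drawn from a calibrated bank
print(raw_to_theta_table([-1.0, -0.5, 0.0, 0.5, 1.0]))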
Test Information Function (TIF)
[Figure: "Comparison of Test Information Functions", information against theta for Form A and Form B]
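The height of each curve is the sum of the item information functions for that form. Under the Rasch model an item contributes p(1 - p) at a given theta, so information peaks where a form's difficulties are concentrated, and 1 / sqrt(information) is the conditional standard error of measurement. The sketch below uses invented item difficulties, not the forms actually plotted.

import numpy as np

def test_information(theta, item_bs):
    """Test information at theta for a Rasch-calibrated form."""
    p = 1.0 / (1.0 + np.exp(-(theta - np.asarray(item_bs))))
    return float(np.sum(p * (1.0 - p)))          # item information p(1 - p), summed over items

form_a = [-0.25, 0.0, 0.0, 0.25, 0.5]            # items clustered near the standard
form_b = [-2.0, -1.0, 0.0, 1.0, 2.0]             # items spread across the scale
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(test_information(theta, form_a), 2), round(test_information(theta, form_b), 2))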
Assumptions
• Unidimensionality
• Local Independence
Implications
• Item writing
  • Leave those scored items alone!
  • Focused item writing targeting the performance standard
• Assembly
  • Items selected for a form should be around the standard
• Testing and Reporting
  • Field test items for pre-equating/on-demand scoring
  • Form assignment
  • Scoring
  • Recalibration
• Harder to explain to stakeholders
Does IRT make sense for you?
• What is the size and maturity of your program and item bank?
  • Do you like to tinker with items?
• Do your program requirements change frequently?
• How experienced/capable are your item writers?
• How do you score candidates?
  • IRT or number correct?
  • Do you hold scores or do immediate scoring?
• Can you afford a psychometrician?
Questions?
Diane M. Talley [email protected]
James A. Penny [email protected]
Stephen B. Johnson [email protected]
919.572.6880
www.castleworldwide.com