Item Response Theory in Constructing Measures


DESCRIPTION

This presentation is the latest version of all the IRT presentations I have made. It also contains Rasch analysis for polytomous scales.

TRANSCRIPT

Page 1: Item Response Theory in Constructing Measures

Item Response Theory in Constructing Measures

Carlo Magno, PhDDe La Salle University, Manila

PEMEA BOT, Psychometrics and Statistics Division


Page 2: Item Response Theory in Constructing Measures

Advance Organizer

• Approaches in Analyzing Test Data
• Classical Test Theory (CTT)
• Focus of Analysis in CTT
• Limitations of CTT
• Item Response Theory (IRT)
• Approaches in IRT
• Advantages of IRT
• Example of an IRT model: the Rasch Model
• What to interpret?
• IRT for scales
• Applications of IRT on Tests
• Workshop

Page 3: Item Response Theory in Constructing Measures

Approaches in Analyzing Test Items

• Classical Test Theory
• Item Response Theory

Page 4: Item Response Theory in Constructing Measures

Classical Test Theory (CTT)

• Regarded as the “True Score Theory.”
• Responses of examinees are due only to variation in the ability of interest.
• Other sources of variation (external conditions or internal conditions of examinees) are assumed to be held constant through rigorous standardization, or to have an effect that is nonsystematic or random by nature.

Page 5: Item Response Theory in Constructing Measures

Classical Test Theory (CTT)

TO = T + E

The implication of classical test theory for test takers is that tests are fallible, imprecise tools.

Error = standard error of measurement:

Sm = S√(1 − r)

True score = M ± Sm: about 68% of the time (one SD of the normal curve), the true score lies within one Sm of the obtained score.
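As a minimal sketch of this computation (the test standard deviation, reliability, and obtained score below are hypothetical values, not from the presentation), the standard error of measurement and the 68% band around an obtained score can be worked out in Python:

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Sm = S * sqrt(1 - r): the standard error of measurement."""
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical test: SD of 10 points, reliability of .91.
sm = standard_error_of_measurement(sd=10.0, reliability=0.91)  # 3.0

observed = 75.0  # a hypothetical obtained score M
# The true score falls within M +/- Sm about 68% of the time (one SD of the normal curve).
print(f"Sm = {sm:.2f}")
print(f"68% band for the true score: {observed - sm:.2f} to {observed + sm:.2f}")
```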

Page 6: Item Response Theory in Constructing Measures

Normal curve

[Figure: normal curve showing the range of true scores, one standard error on either side of the mean.]

Page 7: Item Response Theory in Constructing Measures

Focus of Analysis in CTT

• Frequency of correct responses (to indicate question difficulty);

• Frequency of responses (to examine distracters);

• Reliability of the test and item-total correlation (to evaluate discrimination at the item level)


Page 8: Item Response Theory in Constructing Measures

Issues in CTT

A score is dependent on the performance of the group tested (Norm referenced)

The group on which the test has been scaled has outlived its usefulness across time:
• Changes in the defined population
• Changes in educational emphasis

There is a need to rapidly make new norms to adapt to the changing times.

If the characteristics of a person change and do not fit the specified norm, then a norm for that person needs to be created.

Each collection of norms has an ability scale of its own = a “rubber yardstick”

Page 9: Item Response Theory in Constructing Measures

Item Response Theory

Synonymous with latent trait theory, strong true score theory or modern mental test theory

Initially designed for tests with right and wrong (dichotomous) responses.

Examinees with more ability have higher probabilities for giving correct answers to items than lower ability students (Hambleton, 1989).

Each item on a test has its own item characteristic curve that describes the probability of getting that particular item right or wrong given the ability of the test takers (Kaplan & Saccuzzo, 1997).

Page 10: Item Response Theory in Constructing Measures

Item Characteristic Curve

A function of ability (θ), the latent trait

Forms the boundary between the probability areas of answering an item incorrectly and answering the item correctly


Page 11: Item Response Theory in Constructing Measures

Judging Responsive Items


Page 12: Item Response Theory in Constructing Measures

Approaches of IRT

One-parameter model (Rasch Model) = uses only the difficulty parameter

Two-parameter model = difficulty and discrimination parameters

Three-parameter model (Logistic Model) = item difficulty, item discrimination, and pseudoguessing

Page 13: Item Response Theory in Constructing Measures

Three-parameter IRT Model

A mathematical model linking the observable, dichotomously scored data (item performance) to the unobservable data (ability).

Pi(θ) gives the probability of a correct response to item i as a function of ability (θ):

Pi(θ) = c + (1 − c) / (1 + e^(−a(θ − b)))

b = item difficulty (the ability level at which the probability of a correct answer is (1 + c)/2)
a = item discrimination
c = pseudoguessing parameter
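To make the link between the parameters and the probability concrete, here is a generic sketch of the three-parameter logistic function (an illustration with hypothetical parameter values, not the presenter's code); the two- and one-parameter models of the next slide fall out as special cases:

```python
import math

def p_correct(theta: float, a: float, b: float, c: float = 0.0) -> float:
    """Three-parameter logistic model:
    P_i(theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: discrimination a = 1.2, difficulty b = 0.5, pseudoguessing c = 0.2.
# At theta == b the probability is (1 + c) / 2 = 0.6, as noted above.
print(p_correct(theta=0.5, a=1.2, b=0.5, c=0.2))

# Two-parameter model: fix c = 0.
print(p_correct(theta=0.0, a=1.2, b=0.5))
# One-parameter (Rasch-type) model: fix c = 0 and a = 1.
print(p_correct(theta=0.0, a=1.0, b=0.5))
```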

Page 14: Item Response Theory in Constructing Measures

Two-Parameter IRT Model

Two-parameter model: c = 0

One-parameter model: c = 0, a = 1

Page 15: Item Response Theory in Constructing Measures

One-Parameter Model

Three items showing different item difficulties (b)

Page 16: Item Response Theory in Constructing Measures

Advantages of IRT

The calibration of test item difficulty is independent of the persons used for the calibration.

In the method of test calibration, it does not matter whose responses to the items are used for comparison; it gives the same results regardless of who takes the test.

The scores persons obtain on the test can be used to remove the influence of their abilities from the estimation of item difficulty. The result is a sample-free item calibration.

Page 17: Item Response Theory in Constructing Measures

Rasch Model

Rasch’s (1960) main motivation for his model was to eliminate references to populations of examinees in analyses of tests.

According to him, test analysis would only be worthwhile if it were individual-centered, with separate parameters for the items and the examinees (van der Linden & Hambleton, 2004).

Page 18: Item Response Theory in Constructing Measures

Rasch Model

The Rasch model is a probabilistic unidimensional model which asserts that:

(1) the easier the question the more likely the student will respond correctly to it, and

(2) the more able the student, the more likely he/she will pass the question compared to a less able student.

Page 19: Item Response Theory in Constructing Measures

Rasch Model

The model was enhanced to assume that the probability that a student will correctly answer a question is a logistic function of the difference between the student's ability [θ] and the difficulty of the question [β] (i.e., the ability required to answer the question correctly), and only a function of that difference. This gives way to the Rasch model.

Thus, when data fit the model, the relative difficulties of the questions are independent of the relative abilities of the students, and vice versa (Rasch, 1977).
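In symbols, this logistic function of the difference is P(correct) = e^(θ − β) / (1 + e^(θ − β)), equivalently ln[P/(1 − P)] = θ − β. A minimal sketch (the ability and difficulty values are hypothetical, chosen only to illustrate the two assertions of the previous slide):

```python
import math

def rasch_p(theta: float, beta: float) -> float:
    """Rasch model: the probability of a correct response depends only on theta - beta."""
    return math.exp(theta - beta) / (1.0 + math.exp(theta - beta))

# (1) The easier the question (lower beta), the more likely a correct response:
print(rasch_p(theta=0.0, beta=-1.0))  # ~0.73
print(rasch_p(theta=0.0, beta=1.0))   # ~0.27
# (2) The more able the student (higher theta), the more likely a correct response:
print(rasch_p(theta=1.0, beta=0.0))   # ~0.73
print(rasch_p(theta=-1.0, beta=0.0))  # ~0.27
```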


Page 20: Item Response Theory in Constructing Measures

Assumptions of the Rasch Model According to Fisher (1974)

(1) Unidimensionality. All items are functionally dependent upon only one underlying continuum.

(2) Monotonicity. All item characteristic functions are strictly monotonic in the latent trait. The item characteristic function describes the probability of a predefined response as a function of the latent trait.


Page 21: Item Response Theory in Constructing Measures

Assumptions of the Rasch Model According to Fisher (1974)

(3) Dichotomy of the items. For each item there are only two different responses, for example positive and negative. The Rasch model requires that an additive structure underlies the observed data. This additive structure applies to the logit of Pij, where Pij is the probability that subject i will give a predefined response to item j, being the sum of a subject scale value ui and an item scale value vj, i.e.,

ln(Pij / (1 − Pij)) = ui + vj

Page 22: Item Response Theory in Constructing Measures

Difference of CTT and IRT

Source: Magno, C. (2009). Demonstrating the difference between classical test theory and item response theory using derived data. The International Journal of Educational and Psychological Assessment, 1, 1-11.

Page 23: Item Response Theory in Constructing Measures

Difference of CTT and IRT

Source: Magno, C. (2009). Demonstrating the difference between classical test theory and item response theory using derived data. The International Journal of Educational and Psychological Assessment, 1, 1-11.

Page 24: Item Response Theory in Constructing Measures

Difference of CTT and IRT

Source: Magno, C. (2009). Demonstrating the difference between classical test theory and item response theory using derived data. The International Journal of Educational and Psychological Assessment, 1, 1-11.


Page 25: Item Response Theory in Constructing Measures

What to interpret in IRT?

• Item Characteristic Curve (ICC) – Test Characteristic Curve (TCC)
• Logit measures for each item
• Item Information Function (IIF) – Test Information Function (TIF)
• Infit measures

Page 26: Item Response Theory in Constructing Measures

Test/Item Characteristic Curve (TCC)

TCC: the sum of the ICCs that make up a test or assessment; it can be used to predict the scores of examinees at given ability levels.

TCC(θ) = ∑Pi(θ)

Links the true score to the underlying ability measured by the test.

A TCC shifted to the right of the ability scale indicates difficult items.
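A sketch of TCC(θ) = ∑Pi(θ) under a simple Rasch-type ICC (the five item difficulties are hypothetical, chosen only to illustrate predicting true scores at given ability levels):

```python
import math

def icc(theta: float, b: float) -> float:
    """Rasch-type item characteristic curve for an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def tcc(theta: float, difficulties: list[float]) -> float:
    """Test characteristic curve: the sum of the ICCs, i.e. the
    predicted true score at ability level theta."""
    return sum(icc(theta, b) for b in difficulties)

# Hypothetical five-item test.
bs = [-2.0, -1.0, 0.0, 1.0, 2.0]
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(tcc(theta, bs), 2))
```

Shifting every b upward (harder items) shifts the whole curve to the right of the ability scale.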

Page 27: Item Response Theory in Constructing Measures

Test Characteristic Curve

Steeper slopes indicate greater discriminating ability.

Flat slopes indicate weak discriminating ability.

Page 28: Item Response Theory in Constructing Measures

Test Characteristic Curve

Figure 4. Test Characteristic Curve of the PRPF for the Primary Rater
Figure 5. Test Characteristic Curve of the Secondary Rater

Page 29: Item Response Theory in Constructing Measures

Item/Test Information Function

I(θ): the contribution of particular items to the assessment of ability.

Items with higher discriminating power contribute more to measurement precision than items with lower discriminating power.

Items tend to make their best contribution to measurement precision around their b value.
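As an illustration, for the two-parameter model (c = 0) the item information function takes the well-known form I(θ) = a²·P(θ)·[1 − P(θ)]; the sketch below uses hypothetical parameters to show both points made above:

```python
import math

def p2pl(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic ICC."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float) -> float:
    """2PL item information: I(theta) = a^2 * P * (1 - P).
    It peaks at theta == b, and higher a means more information."""
    p = p2pl(theta, a, b)
    return a**2 * p * (1.0 - p)

# Hypothetical item with b = 0: information is greatest near theta = 0,
# and the more discriminating item (a = 1.5) informs more than a = 0.8.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(item_information(theta, a=1.5, b=0.0), 3),
          round(item_information(theta, a=0.8, b=0.0), 3))
```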

Page 30: Item Response Theory in Constructing Measures

Test Information Function

Tests with a highly constrained TIF are imprecise measures for much of the continuum of the domain.

Tests with a TIF that encompasses a large range provide precise scores along the continuum of the domain measured.

−2.00 SD units to +2.00 SD units includes 95% of the possible values of the distribution.

Page 31: Item Response Theory in Constructing Measures

Test Information Function

Figure 2. Test Information Function of the PRPF for the Primary Rater (−4.00 SD to +4.00 SD units)
Figure 3. Test Information Function of the PRPF for the Secondary Rater (−4.00 SD to +4.00 SD units)

Page 32: Item Response Theory in Constructing Measures

Item Information Function

Page 33: Item Response Theory in Constructing Measures

[Figure: two panels. Left: four item characteristic curves, probability of a correct response (0 to 1) plotted against ability (θ) from −3 to +3. Right: item information for the same four test items, information (0 to 2) plotted against ability (θ).]

Figure 6: Item characteristic curves and corresponding item information functions

Page 34: Item Response Theory in Constructing Measures

Test Information Function

The sum of the item information functions in a test.

Higher values of the a parameter increase the amount of information an item provides.

The lower the c parameter, the more information an item provides.

The more information provided by an assessment at a particular ability level, the smaller the errors associated with ability estimation.
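Because the TIF is a sum, the standard error of ability estimation at θ is commonly taken as SE(θ) = 1/√TIF(θ), which makes the last point concrete. A sketch continuing the 2PL simplification (the four (a, b) pairs are hypothetical):

```python
import math

def item_information(theta: float, a: float, b: float) -> float:
    """2PL item information: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

def test_information(theta: float, items: list[tuple[float, float]]) -> float:
    """Test information function: the sum of the item information functions."""
    return sum(item_information(theta, a, b) for a, b in items)

# Hypothetical four-item test, as (a, b) pairs.
items = [(1.8, -1.0), (1.2, 0.0), (2.0, 0.5), (0.9, 1.5)]
for theta in (-2.0, 0.0, 2.0):
    info = test_information(theta, items)
    se = 1.0 / math.sqrt(info)  # more information -> smaller estimation error
    print(theta, round(info, 2), round(se, 2))
```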

Page 35: Item Response Theory in Constructing Measures

[Figure: test information function plotted against ability (θ).]

Figure 7: Test information function for a four-item test

Page 36: Item Response Theory in Constructing Measures

Item Difficulty

Item Analysis
• Determining item difficulty (in logit measures, + means an item is difficult and − means it is easy).
• Utilizing goodness-of-fit criteria to detect items that do not fit the specified response model (Z statistic, INFIT mean square).

Item Selection
• Assess the contribution of each item's test information function, independent of other items.

Page 37: Item Response Theory in Constructing Measures

Interpreting Winsteps Output

Item Difficulty
• MEASURE = logit measure of proportion correct
• Negative values (−): the item is easy
• Positive values (+): the item is difficult

Goodness of Fit
• INFIT MNSQ values within 0.8 to 1.2 and Z standard scores of 2.0 and below are acceptable (see the sketch below).
• High item MNSQ values indicate a “lack of construct homogeneity” with other items in a scale, whereas low values indicate “redundancy” with other items (Linacre & Wright, 1998).

Item Discrimination
• Point-biserial estimate close to 1.0
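The cutoffs on this slide can be applied mechanically to an item table; the sketch below is a generic filter using exactly those criteria (the item statistics are hypothetical, not actual Winsteps output):

```python
# Hypothetical item statistics: (label, logit MEASURE, INFIT MNSQ, Z standard score).
items = [
    ("Item 1", -0.85, 1.05, 0.4),   # easy, acceptable fit
    ("Item 2", 1.30, 1.45, 2.8),    # high MNSQ: lack of construct homogeneity
    ("Item 3", 0.10, 0.65, -2.3),   # low MNSQ: redundancy with other items
]

for label, measure, mnsq, zstd in items:
    difficulty = "difficult" if measure > 0 else "easy"
    # Acceptable fit: INFIT MNSQ within 0.8-1.2 and Z of 2.0 and below.
    fits = 0.8 <= mnsq <= 1.2 and abs(zstd) <= 2.0
    print(f"{label}: {difficulty}, {'fits' if fits else 'misfits'} "
          f"(MNSQ={mnsq}, Z={zstd})")
```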

Page 38: Item Response Theory in Constructing Measures

Polytomous IRT Models

Having more than 2 points in the responses (e.g., a 4-point scale):

• Rating Scale Model / Polytomous Rasch Model (Andrich, 1978)
• Partial Credit Model
• Graded Response Model
• Nominal Model

Page 39: Item Response Theory in Constructing Measures

Graded Response model for a 5-point scale

Page 40: Item Response Theory in Constructing Measures

Additional Outputs for Polytomous Models

Item Response Thresholds
• Logistic curves for each scale category.
• The extent to which item response levels differ along the continuum of the latent construct (e.g., the difference between a response of “strongly agree” and “agree”).
• Ideally monotonic: the higher the scale category, the higher the expected threshold values.
• Easier items have smaller response thresholds than difficult items.
• Threshold values that are very close mean the categories are indistinguishable from each other (see the sketch below).
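A quick way to check the monotonicity expectation is to verify that each item's threshold values strictly increase across scale categories, and to flag adjacent thresholds that sit very close together. A minimal sketch with illustrative threshold values:

```python
def check_thresholds(thresholds: list[float], min_gap: float = 0.5) -> None:
    """Report whether thresholds increase monotonically and whether
    any adjacent pair is too close to distinguish the categories."""
    monotonic = all(t1 < t2 for t1, t2 in zip(thresholds, thresholds[1:]))
    close_pairs = [(t1, t2) for t1, t2 in zip(thresholds, thresholds[1:])
                   if abs(t2 - t1) < min_gap]
    print("monotonic:", monotonic, "| near-indistinguishable pairs:", close_pairs)

check_thresholds([-3.79, -1.95, 0.96, 4.35])   # ordered, well separated
check_thresholds([-2.10, -1.95, -1.90, 1.40])  # two categories nearly overlap
```

The 0.5-logit gap used here is only an illustrative screening value, not a published cutoff.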

Page 41: Item Response Theory in Constructing Measures

Threshold Categories

Example:
Primary rater: −3.79, −1.95, .96, and 4.35
Secondary rater: −3.90, −2.25, .32, and 3.60

Page 42: Item Response Theory in Constructing Measures

Asian Values Scale

Magno, C. (2010). Looking at Filipino preservice teachers' value for education through epistemological beliefs. TAPER, 19(1), 61-78.

Page 43: Item Response Theory in Constructing Measures

Development of the A-SRL-S: Polytomous Rasch Model

Self-regulation is defined by Zimmerman (2002) as self-generated thoughts, feelings, and actions that are oriented to attaining goals.

Self-regulated learners are characterized as “proactive in their efforts to learn because they are aware of their strengths and limitations and because they are guided by personally set goals and task-related strategies” (p. 66).

Page 44: Item Response Theory in Constructing Measures

Development of the A-SRL-S: Polytomous Rasch Model

Subprocesses of self-regulation (Zimmerman, 1986):
• Metacognition (planning, organizing, self-instructing, monitoring, self-evaluating)
• Motivation (competence, self-efficacy, autonomy)
• Behavioral (selecting, structuring, and optimizing learning environments) aspects of learning

Self-regulation structured interview – 14 questions (Zimmerman & Martinez-Pons, 1986)

Page 45: Item Response Theory in Constructing Measures

Development of the A-SRL-S: Polytomous Rasch Model

SRLIS
• Reliability – percentage of agreement between 2 coders
• Discriminant validity – high and low achievers were compared across the 14 categories.
• Construct validity – self-regulated learning scores were used to predict students' scores on the Metropolitan Achievement Tests (MAT), together with gender and socio-economic status of parents.

Page 46: Item Response Theory in Constructing Measures

Development of the A-SRL-S: Polytomous Rasch Model

To continue the development process of arriving at good measures of self-regulation: a polytomous item response theory analysis.

This analysis allows reduction of item variances because the influence of person ability is controlled by having a separate calibration (Wright & Masters, 1982; Wright & Stone, 1979).

Page 47: Item Response Theory in Constructing Measures

Development of the A-SRL-S: Polytomous Rasch Model

Method
• 222 college students
• The SRLIS was administered
• 1,454 responses were converted into items depicting the 14 categories
• Item review

Page 48: Item Response Theory in Constructing Measures

Development of the A-SRL-S: Polytomous Rasch Model

Principal components analysis: 7 factors were extracted that explain 42.54% of the total variance (55 items loaded highly, > .40).

The seven factors were confirmed (N = 305). All 7 factors were significantly correlated.

The 7-factor structure was supported:
• χ² = 332.07, df = 1409
• RMS = .07
• RMSEA = .06
• GFI = .91
• NFI = .89

Page 49: Item Response Theory in Constructing Measures

Development of the A-SRL-S: Polytomous Rasch Model


Page 50: Item Response Theory in Constructing Measures

Development of the A-SRL-S: Polytomous Rasch Model


Page 51: Item Response Theory in Constructing Measures

Development of the A-SRL-S: Polytomous Rasch Model

The average step calibrations:
• Memory strategy: −1.57, .25, 1.71, and 3.41
• Goal setting: −3.19, −.92, 1.37, and 3.61
• Self-evaluation: −2.71, −.59, 1.25, and 3.15
• Seeking assistance: −2.70, −1.06, .41, and 2.30
• Environmental structuring: 2.32, −.42, 1.40, and 3.47
• Responsibility: −3.43, −1.20, .98, and 3.98
• Organizing: −2.88, −.95, .79, and 2.76

Page 52: Item Response Theory in Constructing Measures

Development of the A-SRL-S: Polytomous Rasch Model


Page 53: Item Response Theory in Constructing Measures

Development of the A-SRL-S: Polytomous Rasch Model


Page 54: Item Response Theory in Constructing Measures

Development of the A-SRL-S: Polytomous Rasch Model


Page 55: Item Response Theory in Constructing Measures

Development of the A-SRL-S: Polytomous Rasch Model

3. I take my own notes in class.

Page 56: Item Response Theory in Constructing Measures

Development of the A-SRL-S: Polytomous Rasch Model

2. I isolate myself from unnecessary noisy places.

Page 57: Item Response Theory in Constructing Measures

Development of the A-SRL-S: Polytomous Rasch Model


Page 58: Item Response Theory in Constructing Measures

Development of the A-SRL-S: Polytomous Rasch Model

3. I put my notebooks, handouts, and the like in a certain container.
4. I study at my own pace.

Page 59: Item Response Theory in Constructing Measures

Application of IRT on Test Development

Item Analysis
• Determining sample-invariant item parameters.
• Utilizing goodness-of-fit criteria to detect items that do not fit the specified response model (χ², analysis of residuals).

Item Selection
• Assess the contribution of each item to the test information function, independent of other items.

Page 60: Item Response Theory in Constructing Measures

Application of IRT on Test Development

Item Banking
• Test developers can build an assessment to fit any desired test information function with items having sufficient properties.
• Comparisons of items can be made across dissimilar samples.

Page 61: Item Response Theory in Constructing Measures

Final Slide

Workshop
