
Page 1:

Applying Computer Based Assessment Using Cognitive Diagnostic Modeling to Benchmark Tests

Terry Ackerman, UNCG

Robert Henson, UNCG

Ric Luecht, UNCG

Jonathan Templin, U. of Georgia

John Willse, UNCG

Tenth Annual Assessment Conference
Maryland Assessment Research Center for Education Success
University of Maryland, College Park, Maryland
October 19, 2010

Page 2:

Overview of talk

• Purpose of the study
• The Cumulative Effect Mathematics Project
• Phase I: paper-and-pencil benchmark test
  – Q-matrix development
  – Item writing
  – Standard setting
  – Results: fitting the CDM
  – Teacher feedback
• Phase II: multistage CDM CAT (development and administration)
• Future directions

Page 3:

Purpose

We are currently part of the evaluation effort for a locally and state-funded project called the Cumulative Effect Mathematics Project. As part of that effort, we are applying cognitive diagnostic modeling (CDM) to a benchmark test used in an Algebra II course in Guilford County, North Carolina. Our goal is to eventually make this a computerized CDM assessment.

Page 4:

Cumulative Effect Mathematics Project

The CEMP involves the ten high schools in Guilford County that had the lowest performance on the End-of-Course (EOC) tests in mathematics. The EOC test is part of the federally mandated accountability testing under the No Child Left Behind legislation. The ultimate goal of the CEMP is to increase mathematics scores at these ten high schools.

Page 5:

Benchmark Testing

Currently in North Carolina, teachers follow strict instructional guidelines called a "standard course of study". These guidelines dictate what objectives and content must be taught during each week; the instruction must "keep moving". Given this pacing, teachers often struggle with how to effectively assess students' learning to make sure they are prepared to take the End-of-Course Test. This is a very high-stakes test because it can have implications for both the student (passing the course) and the teacher (evaluation of his or her effectiveness as a teacher).

[Timeline: the standard course of study runs from September to May, ending with the End-of-Course Test.]

Page 6:

Benchmark Testing

One common method of formative assessment is the "benchmark test" (BT). These tests provide intermediate feedback on what the student has learned so that remediation, if necessary, can be implemented prior to the End-of-Course Test.

[Timeline: the standard course of study with benchmark tests interspersed, each followed by remediation, leading to the End-of-Course Test.]

Page 7:

Potential Benefits of using Cognitive Diagnostic Modeling (CDM) on benchmark tests

By constructing the benchmark test to measure attributes with CDMs, several benefits can be realized.

• Student information comes in the form of a profile of the skills that the student has and has not mastered.
• The skills needed to perform well on the EOC are measured directly.
• The CDM profile format can diagnostically/prescriptively inform classroom instruction.
• The profile can help students better understand their strengths and weaknesses.
• When presented in a computerized format, immediate feedback is provided to the teacher and students.

Page 8:

Models used for Cognitive Diagnosis

Many cognitive diagnosis models (CDMs) are built upon the work of Tatsuoka (1985) and require one to specify a Q-matrix. For a given test, this matrix identifies which attributes each item measures. Thus, for a test containing J items and K attributes, the J x K Q-matrix contains elements q_jk such that

$$q_{jk} = \begin{cases} 1 & \text{if item } j \text{ requires attribute } k, \\ 0 & \text{otherwise.} \end{cases}$$

Also, instead of characterizing examinees with a continuous latent variable, CDMs characterize examinee i with a 0/1 vector/profile, α_i, whose elements denote which of the K attributes examinee i has mastered.

Page 9:

Example Q-matrix

            K attributes/skills
Item     A   B   C   D   E   F
  1      0   1   1   0   0   0
  2      1   0   0   0   0   1
  3      1   0   0   1   0   0
  4      0   1   1   0   0   0
  5      0   0   0   1   1   1
  6      0   0   0   1   1   0

1 = item requires the attribute. For example, item 6 requires attributes D and E, and attribute F is assessed by items 2 and 5.

Page 10:

Choosing the Attributes

We chose to use the attributes as defined by the course objectives and goals in the Department of Public Instruction's standard course of study:
– On the EOC, students would ultimately be evaluated in relation to these course objectives and goals.
– Teachers were already familiar with those definitions and the implied skills.

Page 11:

Objectives Retained for our Q-Matrix

• 1.03 Operate with algebraic expressions (polynomial, rational, complex fractions) to solve problems.
• 2.01 Use the composition and inverse of functions to model and solve problems; justify results.
• 2.02 Use quadratic functions and inequalities to model and solve problems; justify results.
  – a. Solve using tables, graphs, and algebraic properties.
  – b. Interpret the constants and coefficients in the context of the problem.
• 2.04 Create and use best-fit mathematical models of linear, exponential, and quadratic functions to solve problems involving sets of data.
  – a. Interpret the constants, coefficients, and bases in the context of the data.
  – b. Check the model for goodness-of-fit and use the model, where appropriate, to draw conclusions or make predictions.
• 2.08 Use equations and inequalities with absolute value to model and solve problems; justify results.
  – a. Solve using tables, graphs, and algebraic properties.
  – b. Interpret the constants and coefficients in the context of the problem.

Page 12:

The Assessment

• After we discussed the concept of a Q-matrix with a group of three master teachers, we had them write items measuring one or more of the attributes. From this pool of "benchmark" items, a paper-and-pencil assessment was created.
• These items were then pilot tested, and the assessment was refined using traditional CTT techniques.
• A final form was created, and the Q-matrix was further verified by another set of five master teachers.

Page 13:

The Simple Math Example used to verify the Q-matrix

Example test measuring basic math:
2 + 3 - 1 = ?
2 / 3 = ?
2 * 4 = ?

Notice that in this example not every item requires all four skills (add, subtract, multiply, and divide), so we need to describe which skills are needed to answer each item. We summarize this information using a table like the one below. We ask that you simply provide a check (or an "X") under those skills that are needed to correctly answer each of the items.

            Add   Subtract   Multiply   Divide
2+3-1=?
2/3=?
2*4=?

An example of the completed table:

            Add   Subtract   Multiply   Divide
2+3-1=?      X       X
2/3=?                                     X
2*4=?                           X

Page 14:

Generalizability study

We also conducted a generalizability study to examine the dependability of the process of assigning the attributes to the items. The sources of variability included:

• Test items (the object of measurement)
• Raters: teachers indicating which attributes were required in order to answer the items
• Attributes influencing the items (attributes were treated as fixed)

In G-theory there is a coefficient for relative decisions (i.e., ranking), g, and one for absolute decisions (i.e., criterion-based), the dependability coefficient Φ.

Page 15:

Dependability of the Q-Matrix

• Under our current design (the shaded row in the original slide), the highest dependability coefficients were obtained for objectives 2.01 and 2.08.

Table 1. Dependability of attribute assignments, by number of raters and objective

Raters   1.03   2.01   2.02   2.04   2.08
   1     0.38   0.73   0.34   0.48   0.66
   2     0.55   0.84   0.50   0.65   0.79
   3     0.64   0.89   0.60   0.74   0.85
   4     0.71   0.91   0.67   0.79   0.88
   5     0.75   0.93   0.72   0.82   0.91
   6     0.78   0.94   0.75   0.85   0.92
   7     0.81   0.95   0.78   0.87   0.93
   8     0.83   0.96   0.80   0.88   0.94
   9     0.84   0.96   0.82   0.89   0.95
  10     0.86   0.96   0.84   0.90   0.95
  11     0.87   0.97   0.85   0.91   0.95
  12     0.88   0.97   0.86   0.92   0.96
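To show how Table 1 behaves as raters are added, here is a minimal sketch of the dependability coefficient for absolute decisions in an items-by-raters design. The variance components are illustrative placeholders we made up (not the study's estimates), chosen so that the output roughly tracks the 1.03 column above.

```python
def phi(var_items, var_raters, var_resid, n_raters):
    """Dependability for absolute decisions: item variance over itself plus absolute error."""
    absolute_error = (var_raters + var_resid) / n_raters
    return var_items / (var_items + absolute_error)

for n in (1, 2, 5, 12):
    print(n, round(phi(var_items=0.06, var_raters=0.01, var_resid=0.09, n_raters=n), 2))
# -> 0.38, 0.55, 0.75, 0.88: averaging over more raters shrinks the error term
```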

Page 16:

The Final Q-Matrix

Item   1.03   2.01   2.02   2.04   2.08
  1      0      1      0      0      0
  2      1      0      1      0      0
  3      1      0      1      0      0
  4      1      0      1      0      0
  5      1      0      1      0      0
  6      0      1      0      1      0
  7      0      1      0      0      0
  8      0      0      0      0      1
  9      1      0      0      0      0
 10      0      0      0      0      1
 11      0      0      0      0      1
 12      1      0      0      0      0
 13      0      0      0      0      1
 14      0      0      0      0      1
 15      1      0      0      0      0
 16      0      0      0      1      0
 17      0      0      0      1      0
 18      0      0      0      1      0
 19      0      0      0      1      0
 20      1      0      1      0      0
 21      1      0      0      1      0
 22      0      0      0      0      1
 23      0      0      0      1      0
 24      1      0      1      0      0
 25      1      0      0      1      0

• The average Q-matrix complexity is 1.36:
  – 9 items require 2 attributes
  – 16 items require 1 attribute
• Stem for item 2: If one factor of f(x) = 12x² - 14x - 6 is (2x - 3), what is the other factor of f(x) if the polynomial is factored completely?

Page 17:

The LCDM

• In this particular case we used the Log-linear Cognitive Diagnosis Model (LCDM; Henson, Templin, & Willse, 2009).
• The LCDM is a special case of a log-linear model with latent classes (Hagenaars, 1993) and thus is also a special case of the General Diagnostic Model (von Davier, 2005).
• The LCDM defines the logit of the probability of a correct response as a linear function of the attributes that have been mastered.

Page 18:

The LCDM

Given the simple item 2+3-1=?, we can model the logit of the probability of a correct response as a function of mastery or non-mastery of the two attributes (addition and subtraction). Specifically,

$$\ln\!\left(\frac{P(X_{ij}=1)}{1 - P(X_{ij}=1)}\right) = \lambda_0 + \lambda_{\text{add}}\,\alpha_{\text{add}} + \lambda_{\text{sub}}\,\alpha_{\text{sub}} + \lambda_{\text{add}*\text{sub}}\,\alpha_{\text{add}}\,\alpha_{\text{sub}}.$$

Note that the two-attribute LCDM is very similar to a two-factor ANOVA with two main effects and an interaction term.
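As a concrete illustration, a minimal Python sketch of this two-attribute item response function follows; the λ values are made up for illustration, not estimated from the benchmark data.

```python
import math

def lcdm_p_correct(alpha_add, alpha_sub,
                   lam0=-1.5, lam_add=1.2, lam_sub=1.0, lam_int=0.8):
    """P(X=1) for the two-attribute LCDM; alphas are 0/1 mastery indicators."""
    logit = (lam0 + lam_add * alpha_add + lam_sub * alpha_sub
             + lam_int * alpha_add * alpha_sub)
    return 1.0 / (1.0 + math.exp(-logit))

for a in (0, 1):
    for s in (0, 1):
        print(f"(add={a}, sub={s})  P(X=1) = {lcdm_p_correct(a, s):.3f}")
```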

Page 19:

Standard Setting

• Although the LCDM item parameters can be estimated, it was important to define the parameters so that mastery classifications would be consistent with the standards set by the EOC.
• In eliciting these probabilities, a standard is set for all possible combinations of mastery.
• Thus, we define how a student will be classified with respect to the mastery of each attribute.

Page 20:

Estimating LCDM item parameters using Standard Setting

• The teachers we used to verify the Q-matrix also helped us perform a standard setting using a modified Angoff approach.
• For each item, teachers were asked to estimate what proportion out of 100 students who had mastered the required attributes, and what proportion out of 100 students who had not mastered them, would get the item correct.
• These proportions were then averaged across raters and used to determine the parameters for each item in the LCDM.

Page 21:

Example Standard Setting Responses: Item 1 (01000)

[Figure: the four raters' standard-setting judgments and their mean for item 1, plotted on a 0.0-1.0 scale as P(X=1 | Non-master) and P(X=1 | Master).]

1. If f(x) = x² + 2 and g(x) = x - 3, find ____.
   a. x² - 6x + 11
   b. x² + 11
   c. x² + x - 1
   d. x³ - 3x² + 2x - 6

Page 22:

Example Standard Setting Responses: Item 6 (01010)

[Figure: the four raters' standard-setting judgments and their mean for item 6, plotted on a 0.0-1.0 scale as P(X=1) for each mastery pattern (00, 10, 01, 11).]

6. Determine which of the following graphs does not represent Y as a linear function of X.

Page 23:

Analyses

• Based on the teachers' standard setting responses, the average probability of a correct response was calculated.
• These averages are used to compute item parameters.
  – Specifically, if we know the probabilities associated with each response pattern (based on the teachers' responses), then we can compute the logit, and therefore we can directly compute the item parameters. For a simplified version having only two attributes, the model would look like:

$$\ln\!\left(\frac{P(X_{ij}=1)}{1 - P(X_{ij}=1)}\right) = \lambda_0 + \lambda_{\text{add}}\,\alpha_{\text{add}} + \lambda_{\text{sub}}\,\alpha_{\text{sub}} + \lambda_{\text{add}*\text{sub}}\,\alpha_{\text{add}}\,\alpha_{\text{sub}}.$$

Page 24:

• We administered the test and then, using these fixed parameters as truth, we obtained estimates of the posterior probability that each skill had been mastered.
• A mastery profile, α̂, was created; i.e., the probabilities were then categorized as mastery or non-mastery using the rule: greater than 0.50 equals a master; less than 0.50 equals a non-master.

Student ID   1.03   2.01   2.02   2.04   2.08
    24       0.25   0.87   0.99   0.44   0.05
 Profile      0      1      1      0      0
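A minimal sketch of the 0.50 rule applied to student 24's row above:

```python
posterior = {"1.03": 0.25, "2.01": 0.87, "2.02": 0.99, "2.04": 0.44, "2.08": 0.05}

# Mastery profile: 1 if the posterior probability exceeds 0.50, else 0.
profile = {obj: int(p > 0.50) for obj, p in posterior.items()}
print(profile)  # {'1.03': 0, '2.01': 1, '2.02': 1, '2.04': 0, '2.08': 0}
```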


Page 25:

Example Feedback

[Figure: bar chart for Student 10 (score of 17) showing the probability of mastery for goals 1.03, 2.01, 2.02a, 2.04, and 2.08 on a 0.00-1.00 scale; estimated profile 01110.]

Page 26:

Example Feedback

[Figure: bar chart for Student 11 (score of 17) showing the probability of mastery for goals 1.03, 2.01, 2.02a, 2.04, and 2.08 on a 0.00-1.00 scale; estimated profile 11010.]

Page 27:

Mrs. Jones' students' results

Examinee posterior probabilities of mastery:

Student   1.01     2.01     2.02     2.04     2.08
   1      0.8651   0.7415   0.5303   0.3925   0.2449
   2      0.9820   0.2816   0.9204   0.3647   0.1692
   3      0.9792   0.9531   0.9236   0.8814   0.9663
   4      0.2045   0.1180   0.4381   0.1200   0.0601
   5      0.8447   0.5948   0.3821   0.7483   0.8820
   6      0.6573   0.8807   0.5966   0.8628   0.7690

Classification rule: Non-master < .45; .45 < Unsure < .55; Master > .55.

        1.01   2.01   2.02   2.04   2.08
John     M      M      U      NM     NM
Mary     M      NM     M      NM     NM
Wim      M      M      M      M      M
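A minimal sketch of this three-way rule, applied to John's row (student 1):

```python
def classify(p, lo=0.45, hi=0.55):
    """Master / Unsure / Non-master classification from a posterior probability."""
    return "NM" if p < lo else ("M" if p > hi else "U")

john = [0.8651, 0.7415, 0.5303, 0.3925, 0.2449]
print([classify(p) for p in john])  # ['M', 'M', 'U', 'NM', 'NM']
```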

Page 28:

Mrs. Jones’ Algebra II class results

2.082.04

2.022.011.01

14

3

4

66.7%Master

14.3%Unsure

19%Non-master

15

1

5

71.4%Master

4.8%Unsure

23.8%Non-master

15

1

5

71.4%Master

4.8%Unsure

23.8%Non-master

13

5

3

61.9%Master

23.8%Unsure

14.3%Non-master

15

1

5

71.4%Master

4.8%Unsure

23.8%Non-master

Non-Master

Unsure

Master

pred

Non-Master

Unsure

Master

pred

Non-Master

Unsure

Master

pred

Non-Master

Unsure

Master

pred

Non-Master

Unsure

Master

pred

28

Page 29:

Roadmaps to Proficiency


Page 30:

Benchmark results were linked to students’ EOC performance. Then for each profile, a mean EOC score was computed.


Mastery Profile   Average EOC score
(0,0,0)           11
(1,0,0)           12
(0,1,0)           14
(0,0,1)           15
(1,1,0)           20
(1,0,1)           18
(0,1,1)           22
(1,1,1)           25

Using this chart we can then indicate for a teacher which skills will result in the largest gain on the EOC. That is, assume an individual has not mastered any of the three attributes and so has the profile (0,0,0). If he or she mastered attribute 1, the expected EOC gain would be 1 point; if attribute 2, the gain would be 3 points; and if attribute 3, the gain would be 4 points. Thus, if time is limited, it would be best for this individual to learn attribute 3, as the sketch below illustrates.
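A minimal sketch of that reasoning, using the profile-to-mean-EOC lookup from this page (attribute indices are 0-based in the code):

```python
eoc = {
    (0, 0, 0): 11, (1, 0, 0): 12, (0, 1, 0): 14, (0, 0, 1): 15,
    (1, 1, 0): 20, (1, 0, 1): 18, (0, 1, 1): 22, (1, 1, 1): 25,
}

def best_next_attribute(profile):
    """Return (attribute index, expected EOC gain) for the best single attribute to learn."""
    gains = {}
    for k, mastered in enumerate(profile):
        if not mastered:
            new = tuple(1 if i == k else a for i, a in enumerate(profile))
            gains[k] = eoc[new] - eoc[profile]
    return max(gains.items(), key=lambda kv: kv[1])

print(best_next_attribute((0, 0, 0)))  # -> (2, 4): attribute 3, a 4-point gain
```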

Page 31:

Roadmaps to Proficiency

Using the expected increase in EOC score associated with each additional attribute mastered, Templin treated these distances as "strengths of relationship" and used the social network analysis software Pajek to create the following "Roadmap to Mastery".

Page 32:

Road Map to Mastery

[Figure: network diagram of mastery profiles, running from mastery of no skills at one end to mastery of all skills at the other.]

Page 33:

[Figure: "Pathways to EOC attribute mastery" plotted along the EOC test scale (0-24), with profiles such as 10010 and 11010 placed at their expected EOC scores.]

Page 34:

Conversion to a Multistage CAT test

We are in the process of converting the benchmark test to a multistage computer adaptive test. To do this, we are going to approximate the procedure that would be used in a traditional CAT. That is, typically in a CAT, items are selected to provide the greatest amount of information at the current estimated ability level. To create an analogous approach with diagnostic models, we will use an index that is a measure of attribute information.

Page 35:

Multi-stage testing for DCM

Currently we are conducting simulation studies comparing the proportion of correctly classified attribute patterns across several different testing scenarios. Initially we are experimenting with three attributes and will then expand the configuration to five attributes. This work combines the work of Henson et al. (2008), Luecht (1997), and Luecht, Brumfield, and Breithaupt (2004).

Using a pool of 200 generated items and 1,000 examinees, we are in the process of verifying the success of a multistage CAT format for the CDM. For this comparison we plan to compare three testing scenarios.

Page 36:

Verifying the accuracy of a Multistage CDM CAT

Scenario one: create a 30-item test using Henson et al.'s d_B attribute discrimination index. That is, assuming a uniform distribution of ability, the 30 items having the highest d_B values would be selected, and their administration to the 1,000 examinees would be simulated.

Scenario two would be to simulate a multistage adaptive CAT.

Scenario three would be to use Chang's CAT approach.

Page 37:

Attribute-specific item discrimination indices using the Kullback-Leibler Information (KLI)

In diagnostic modeling, instead of using the Fisher information function, the Kullback-Leibler information (KLI) is used. The KLI represents the difference between two probability distributions. Henson, Roussos, Douglas, and He (2008) developed an index, d_B, that describes an item's discrimination for a specified distribution of attribute patterns.

This index can be aggregated over multiple items (e.g., a test module). That is, given a posterior distribution of probabilities for a complete set of mastery profiles (e.g., (1,1,1), (1,0,1), etc.), this index indicates which item, or which module of items, would be most discriminating. This is analogous to selecting the most discriminating or most informative item at a given theta.

Page 38:

Attribute-specific item discrimination indices using the Kullback-Leibler Information (KLI)

Let P_α(X_j) be the probability of response X_j to item j given attribute profile α. The KLI between the response distributions implied by two different profiles for item j can then be expressed as

$$K_j(\alpha, \alpha^{*}) = \sum_{X_j=0}^{1} P_{\alpha}(X_j)\,\log\!\left(\frac{P_{\alpha}(X_j)}{P_{\alpha^{*}}(X_j)}\right),$$

where P_α(X_j) and P_α*(X_j) are the probability distributions of X_j conditioned on the 0-1 mastery/non-mastery profiles α and α*, respectively.
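For a dichotomous item the sum has only two terms, so the KLI is easy to compute directly. A minimal sketch (the .85/.30 probabilities are made up):

```python
import math

def kli(p_alpha, p_alpha_star):
    """K_j(alpha, alpha*) for a 0/1 item, given P(X=1) under each profile."""
    return (p_alpha * math.log(p_alpha / p_alpha_star)
            + (1 - p_alpha) * math.log((1 - p_alpha) / (1 - p_alpha_star)))

# Masters answer correctly with probability .85, non-masters with .30:
print(round(kli(0.85, 0.30), 3))  # -> 0.654
```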

Page 39:

Diagnostic model item indices

In 2008, Henson, Roussos, Douglas, and He designed an attribute discrimination index, d_(B)jk. For attribute k, let α and α* be profiles that are identical on all other attributes, with α_k = 1 and α*_k = 0. When α is estimated, the two directional components d_(B)jk1 and d_(B)jk0 can be computed by weighting the KLI over the 2^(K-1) patterns of the remaining attributes:

$$d_{(B)jk1} = \sum_{\kappa=1}^{2^{K-1}} w_{\kappa}\, K_j(\alpha_{\kappa}, \alpha_{\kappa}^{*}), \quad \text{where } w_{\kappa} = P(\alpha_{\kappa} \mid \alpha_k = 1),$$

and

$$d_{(B)jk0} = \sum_{\kappa=1}^{2^{K-1}} w_{\kappa}\, K_j(\alpha_{\kappa}^{*}, \alpha_{\kappa}), \quad \text{where } w_{\kappa} = P(\alpha_{\kappa}^{*} \mid \alpha_k = 0).$$

The attribute discrimination d_(B)jk is then the average of the two components,

$$d_{(B)jk} = \frac{d_{(B)jk0} + d_{(B)jk1}}{2},$$

and a test-level index is obtained by summing over items:

$$d_{(B)} = \sum_{j=1}^{J} d_{(B)j}.$$
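A minimal sketch of the uniform-weight case (w_κ = 1/2^(K-1), matching the uniform-ability assumption used later for test assembly); here item_p is any function mapping a complete 0/1 profile to P(X=1), a stand-in for a calibrated item rather than the paper's exact implementation.

```python
import math
from itertools import product

def kli(p, q):
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def d_b_jk(item_p, K, k):
    """Average the item's KLI over the 2^(K-1) settings of the other attributes, both directions."""
    d1 = d0 = 0.0
    patterns = list(product((0, 1), repeat=K - 1))
    for rest in patterns:
        a1 = rest[:k] + (1,) + rest[k:]  # attribute k mastered
        a0 = rest[:k] + (0,) + rest[k:]  # attribute k not mastered
        d1 += kli(item_p(a1), item_p(a0))
        d0 += kli(item_p(a0), item_p(a1))
    return (d1 + d0) / (2.0 * len(patterns))

# Example: a 3-attribute item whose correctness depends only on attribute 0.
print(round(d_b_jk(lambda a: 0.30 + 0.55 * a[0], K=3, k=0), 3))  # -> 0.71
```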

Page 40:

Format of our Multistage CDM CAT

[Diagram: Stage 1 is a 9-item routing test; Stage 2 and Stage 3 each contain three 10-item modules, with examinees routed from stage to stage.]

Page 41:

Construction of the multistage CDM CAT

[Diagram: the multistage design shown on Page 40.]

The routing test would be constructed to have a simple-structure format, with each item measuring only one attribute (three items per attribute). The nine items for this test would again be selected using the attribute discrimination statistic, assuming that ability was uniformly distributed.

Page 42:

Construction of the multistage CDM CAT

[Diagram: the multistage design shown on Page 40.]

The last two stages would each have three modules of ten items. Optimal items would be selected from the item pool using the d_B index. The "top" panel would be composed of more difficult items, targeted for examinees whose estimated proficiency profile includes mastery of at least 2 attributes.

The "middle" panel would be composed of moderately difficult items, targeted for examinees whose estimated proficiency profile includes mastery of 1 to 2 attributes.

The "bottom" panel would be composed of easy items, targeted for examinees whose estimated proficiency profile includes mastery of 0 to 1 attributes.

Page 43:

[Diagram: the Stage 2 and Stage 3 modules (three 10-item modules per stage).]

Modules in Stage 3 would be constructed in the same manner as Stage 2, again based upon the optimal values of the attribute discrimination index.


Page 44:

Administration of the multistage CDM CAT

[Diagram: the multistage design shown on Page 40.]

Given an examinee's posterior probability distribution and the known item parameters for each module in Stage 2, a d_B index would be computed for each module. The examinee would be routed to the most discriminating module (i.e., the one producing the largest d_B value), as sketched below.

Page 45:

Administration of the multistage CDM CAT

[Diagram: the multistage design shown on Page 40.]

The same procedure would be used to determine the best discriminating module in Stage 3. However, the determination of this path would involve the mastery profile estimated from the 9 items in the routing test and the 10 items in the selected Stage 2 module.

Page 46:

Administration of the multistage CDM CAT

[Diagram: the multistage design shown on Page 40.]

After the last module is taken in Stage 3, estimates of the mastery profile can be calculated. These estimates would incorporate information from the routing test, the administered Stage 2 module, and the administered Stage 3 module, 29 items in all.

Page 47:

Administration of the multistage CDM CAT

[Diagram: the multistage design shown on Page 40.]

Two estimates of the mastery profile can be calculated. One, modal a posteriori (MAP) estimation, works from the vector of posterior probabilities for each complete mastery profile and selects the most probable profile. The second, expected a posteriori (EAP) estimation, yields a vector of posterior probabilities of mastering each attribute, which is then converted to a profile. Both tend to yield similar results.

Page 48:

[Diagram: the Stage 1 routing test and the Stage 2 modules.]

MAP approach (posterior probability of each complete profile):
1,1,1 → .087
1,1,0 → .207
0,1,1 → .199
1,0,1 → .214
1,0,0 → .132
0,1,0 → .098
0,0,1 → .046
0,0,0 → .017
Most probable profile: 1,0,1 → .214

EAP approach (posterior probability of mastering each attribute):
Attribute 1 → .687
Attribute 2 → .307
Attribute 3 → .793
Converted profile: (1,0,1)
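A minimal sketch of both estimators applied to the profile posterior listed above. Note that the EAP panel on this page (.687/.307/.793) appears to come from a separate illustration; marginals computed from this particular posterior come out differently.

```python
posterior = {
    (1, 1, 1): .087, (1, 1, 0): .207, (0, 1, 1): .199, (1, 0, 1): .214,
    (1, 0, 0): .132, (0, 1, 0): .098, (0, 0, 1): .046, (0, 0, 0): .017,
}

# MAP: the single most probable complete profile.
map_profile = max(posterior, key=posterior.get)  # -> (1, 0, 1)

# EAP: marginal mastery probability per attribute, thresholded at 0.5.
marginals = [sum(p for prof, p in posterior.items() if prof[k]) for k in range(3)]
eap_profile = tuple(int(m > 0.5) for m in marginals)

print(map_profile, [round(m, 3) for m in marginals], eap_profile)
```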

Page 49:

Future Directions

• How well do our profiles match the student mastery profiles provided by the teachers?
• We want to look at the difference between estimating the item parameters versus using the teacher estimates obtained from the standard setting process. The question is how large the difference in students' mastery profiles is between the two approaches.
• A different model we have discussed is de la Torre's MCDINO model, in which misconceptions can be estimated. It might be interesting to provide teachers with a misconception profile, both to inform their pedagogy and classroom instruction and to provide diagnostic information for the students.

Page 50:

Future Directions

• All of this work depends on teacher “buy in”. That is, we need to work closely with teachers every step of the way to determine which type of information has the greatest utility and can be obtained most efficiently.


Page 51:

One closing thought, a quote attributed to Albert Einstein, provides a fresh perspective on our work:

"If we knew what we were doing, it wouldn't be called research."

Page 52:

Thank You !!!!

[email protected]


Page 53:


References

Hagenaars, J. (1993). Loglinear models with latent variables. Thousand Oaks, CA: Sage.

Henson, R., Roussos, L., Douglas, J., & He, S. (2008). Cognitive diagnostic attribute-level discrimination indices. Applied Psychological Measurement, 32, 275-288.

Henson, R., Templin, J., & Willse, J. (2009). Defining a family of cognitive diagnosis models using log-linear models with latent variables. Psychometrika, 74, 191-210.

Luecht, R. (1997). An adaptive sequential paradigm for managing multidimensional content. Paper presented at the annual meeting of the American Educational Research Association, Chicago.

Page 54:


References

Luecht, R., Brumfield, T., & Breithaupt, K. (2004). A testlet assembly design for adaptive multistage tests. Applied Psychological Measurement, 19, 189-202.

Rupp, A., Templin, J., & Henson, R. (2010). Diagnostic measurement: Theory, methods, and applications. New York: Guilford Press.

von Davier, M. (2005). A general diagnostic model applied to language testing data (RR-05-16). Princeton, NJ: Educational Testing Service.