selected topics in psychometrics - cas · 2020-01-07 · selected topics in psychometrics l10:...

Selected topics in psychometrics

L10: Computerized Adaptive Testing

Patrıcia MartinkovaNMST570, January 7, 2020

Department of Statistical Modelling

Institute of Computer Science, Czech Academy of Sciences

Institute for Research and Development of Education

Faculty of Education, Charles University, Prague

Outline

1. Introduction

2. Computerized adaptive tests

3. Simulations and examples

4. Conclusion

Introduction Computerized adaptive tests Simulations and examples Conclusion

Motivation Types of tests

Motivation

• Student abilities may differ remarkably

• Some students may become bored by too easy items

• Some students may become stressed by too hard items

• Selection of proper items may save time

Patrıcia Martinkova NMST570, L10: Computerized Adaptive Testing 1/19


Motivation Types of tests

Different types of tests

• Linear tests

• most popular

• test takers take all items

• Linear Computer-based tests (CBT)

• linear test administered on computer

• allow for taking track of response time for each item

• Computerized adaptive tests (CAT)

• gets estimate after each item

• guides selection of subsequent items

• Multistage test (MST)

• compromise between Linear and CAT,

• similar to CAT, but by groups of items



CAT flowchart Item Bank Starting Continuing Stopping

CAT flow chart




Computerized adaptive tests: components

1. Initialization

• administer initial item(s)

• estimate student ability

2. Testing cycle

• Select item and gather response

• Re-estimate student ability

• Check termination criteria

3. Output the final ability estimate / final classification

Further possibilities:

• Exposure rate control

• Content balancingPatrıcia Martinkova NMST570, L10: Computerized Adaptive Testing 4/19



Item Bank

• Based on test specifications

• Sufficient number of items in each content category

• Item quality reviewed by content experts

• Newly written items pretested

• Reviewed items calibrated with CTT or IRT (parameters a, b, c , d)

• Qualified items selected to item bank

• Item bank re-evaluated for size, specifications, content balance




Item properties described by IRT model

• Selected IRT model

• Rasch, 1PL, 2PL, 3PL, 4PL, (G)PCM, GRM, NRM,...

• Functional form (here for 2PL IRT model)

P(Yij = 1|θi , aj , bj) =eaj (θi−bj )

1 + eaj (θi−bj )

aj discrimination, bj difficulty

• Item parameter estimates

• JML, CML, MML, Bayesian methods

• Item Characteristic Curve (ICC)

• Item Information Curve (IIC)




Initialization

• Initial estimate of student ability

• Mean ability (θ0 = 0)

• Based on a single item

• Based on a pretest

• Item selection for initial estimate of ability / pretest

• Item(s) with mean difficulty (b close to 0)

• Item(s) with difficulty closest to ability estimated by pretest

• Random item(s)




Ability Estimation

• Maximum likelihood (MLE)

• Bayes modal estimator, maximal a posteriori estimator (MAP)

• Expected a posteriori estimator (EAP)




Item Selection

• Sequential (non-adaptive)

• Urry’s criterion: difficulty closest to current ability estimate

• Fisher Information

• maximum Fisher information (MFI)

• maximum likelihood weighted information criterion (MLWI)

• maximum posterior weighted criterion (MPWI)

• Kullback-Leibler information criteria

• Maximum expected information criterion (MEI)




Other Issues

• Exposure rate control

• randomesque

• 5-4-3-2-1 method

• etc.

• Content balancing




Stopping rule

• Test length reached

• Test time reached

• Precision criterion on ability estimate

• Classification criterion: threshold met specification for ability confidence

interval



Post-hoc analysis of real data CAT software

Post-hoc analysis of real data

• Having real data from linear test

• Checking how would the results be in CAT design

• Comparison of different settings




Simulation study

• Simulating ability

• Setting item parameters (from real test or based on a model)

• Comparison of different settings

• Correlation with true ability, bias, length,...




CAT Examples and software

• catIRT

• catR + Concerto

• mirtCAT



CAT conclusions Course conclusions

Conclusion

• CAT is an up-do-date testing method!

• Good precision using much less items

• Increased security

• Faster score reporting

• Less stress and less boredom for students




Selected topics in psychometrics

• Introduction

• Reliability and measurement error

• Validity

• Traditional item analysis

• Regression models for item description

• Item response theory models for binary data

• Item response theory models for ordinal and nominal items

• Differential item functioning (2 lessons)

• Computerized adaptive testing and other topics in psychometrics.

Final project assignment




Course goals

1. Name the main topics studied in psychometrics

2. Describe and apply strategies to find proofs about validity

3. Describe and apply strategies to find proofs about reliability

4. Describe and apply traditional methods to describe item functioning

(difficulty, discrimination, distractor analysis)

5. Explain how logistic regression may be used to describe item properties,

apply regression models on real data and interpret results

6. Formulate IRT models for binary, polytomous and nominal items (Rasch,

1-4PL IRT, GRM, GPCM, RSM, NRM), apply them on real data and

interpret results

7. Explain concept of differential item functioning, describe and apply some

DIF detection methods (Mantel-Haenszel test, logistic regression,

IRT-based Lord’s test, Raju’s test etc.)

8. Explain process of computerized adaptive testing, prepare adaptive test

9. Describe process of assessment development and validationPatrıcia Martinkova NMST570, L10: Computerized Adaptive Testing 17/19



Final project

• Introduction (2 pts)

• Methods (2 pts)

• Results (12 pts)

• A consise descriptive analysis (2 pts)

• Proofs of test reliability (2 pts)

• Proofs of test validity (2 pts)

• Item analysis using traditional methods or regression models (2 pts)

• Item response theory model (2 pts)

• Differential item functioning (2 pts)

• Discussion (2 pts)

• References (2 pts)

• Supplemental Materials (4 pts)

• Bonus (own dataset (5 pts), CAT (5 pts), simulation (5pts)




Seminar in Psychometrics NMST571

Seminar covers selected topics in psychometrics in more depth.

Tentative schedule:

• Tuesdays, 3:40-5:10pm

Credit requirements:

• Actively present (2 absences tolerated)

• Presentation during one of the sessions (approx. 30 min presentation plus

15 min discussion).


Thank you for your attention!

www.cs.cas.cz/martinkova

www.cs.cas.cz/martinkova

References

• Magis D, Yan D, von Davier AA. (2017). Computerized Adaptive and

Multistage Testing with R. Springer.

• Magis D, & Raichle G. (2012). Random Generation of Response Patterns under

Computerized Adaptive Testing with the R Package catR. Journal of Statistical

Software, 48, pp. 1-31. https://www.jstatsoft.org/article/view/v048i08

• Chalmers P. (2016). Generating Adaptive and Non-Adaptive Test Interfaces for

Multidimensional Item Response Theory Applications. Journal of Statistical

Software, 71, pp. 1-38. https://www.jstatsoft.org/article/view/v071i05

• Chalmers P. (2017-08-11). Wiki mirtCAT

https://github.com/philchalmers/mirtCAT/wiki

https://www.jstatsoft.org/article/view/v048i08

https://www.jstatsoft.org/article/view/v071i05

https://github.com/philchalmers/mirtCAT/wiki

selected topics in psychometrics - cas · 2020-01-07 · selected topics in psychometrics l10:...

Documents