Introduction to Computerized Adaptive Testing (CAT)
TRANSCRIPT
![Page 1: Introduction to Computerized Adaptive Testing (CAT)](https://reader034.vdocument.in/reader034/viewer/2022042611/587224d61a28ab3b7a8b4f49/html5/thumbnails/1.jpg)
An Introduction to CAT
Outline

- What and why?
- Intro to IRT
- The 5 components
- Modifying CAT for special situations
- Building a CAT
Background

How much do you know about CAT?
- Heard of it
- I am familiar with IRT
- Have taken a course/workshop
- Have built a CAT (why are you here?!?!?)
The ultimate goal: CAT in a Box
Part 1: Benefits of CAT
What is CAT?

A computerized adaptive test (CAT) is a test administered by computer that dynamically adjusts itself to the trait level of each examinee as the test is being administered.
CAT is an algorithm
Why? Benefits of CAT

- Efficiency: reduce test length by 50% or more; this can be a gigantic financial/time benefit (the main driver of CAT)
- Control of measurement precision: measure or classify all examinees with the same degree of precision
- Increased fairness: what is more fair, everyone seeing the same items but some students having unreliable scores, or reliable scores but different items?
Benefits of CAT

- Added security: if everyone receives the same 100 items, the items will become well known
- More frequent retesting: if you study for a month and then retest, you will probably get a different test
Benefits of CAT

- Better examinee motivation: top students do not waste time on easy questions, and low-ability students are not discouraged by tough questions
- More precise scores, especially at the extremes
Benefits of Computerized Testing (in general)

- Immediate score reporting: paper-and-pencil testing requires the question papers to come back and be scored
- More item formats: audio, video, hotspots, drag and drop, etc.
- Easier results collection, management, and reporting
Disadvantages of CAT

- Public relations: you need to explain to examinees/parents why certain things can happen, like failing after only 10 questions, or passing with a 50% correct score
- Heavy requirements to do it right: software, expertise, effort
Disadvantages of CAT

- Item exposure (but still better than fixed forms!)
- Not feasible/applicable for every situation: small samples, subjective items, items not scorable in real time
Part 2: Intro to IRT
Intro to IRT

While it is possible to design CATs with classical test theory (Frick, 1992), IRT is more appropriate because it puts items and examinees on the same scale. We assume use of IRT here.
Classical item statistics

CTT: option proportions are often translated to a quantile plot.
The item response function: the basic building block of IRT
The item response function

- The IRF
- Parameters: a, b, c
- The item information function
- Used for scoring examinees
The IRF parameters

- a, the discrimination parameter: represents how well the item differentiates examinees; the slope of the curve at its center
- b, the difficulty parameter: represents how easy or hard the item is with respect to examinees; the location of the curve (left to right)
- c, the pseudoguessing parameter: represents the 'base probability' of answering the question; the lower asymptote
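As a concrete sketch (not on the slide itself), the three-parameter logistic IRF can be written in a few lines of Python; the scaling constant D = 1.7 is the common convention and matches the worked P(X) values in the Monte Carlo example later in this deck:

```python
import math

def irf_3pl(theta, a, b, c, D=1.7):
    """3PL item response function: P(correct | theta).

    a = discrimination (slope at the curve's center),
    b = difficulty (location of the curve),
    c = pseudoguessing (lower asymptote),
    D = 1.7, the conventional logistic scaling constant.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
```

With a = 1, b = 1, c = 0.2 and theta = 1.0 this gives exactly 0.6, the value used in the simulation example later in the deck.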
Item information functions

Example: 5 items
TIF and SEM

- IIFs can be summed into a Test Information Function (TIF)
- The TIF can be inverted into an SEM function (more info = less error)
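A minimal sketch of that relationship (the 3PL information formula used here is an assumption; the slide only names the concepts):

```python
import math

def iif_3pl(theta, a, b, c, D=1.7):
    """Fisher information of one 3PL item at theta."""
    p = c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2

def tif(theta, bank):
    """Test information function: the sum of the item informations."""
    return sum(iif_3pl(theta, a, b, c) for (a, b, c) in bank)

def sem(theta, bank):
    """Conditional SEM: more information means less error."""
    return 1.0 / math.sqrt(tif(theta, bank))
```

Adding an item can only increase the TIF, so the SEM can only shrink as the test gets longer.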
IRT Scoring

- Scoring happens by multiplying IRFs (for correct responses) and their complements (for incorrect responses) to get a likelihood function
- The theta at the maximum of the likelihood is the examinee's score
- It is on the b/θ scale, so we can then use it to adapt
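A brute-force sketch of this scoring step (grid-search maximum likelihood under an assumed 3PL model; a real engine would use Newton-Raphson):

```python
import math

def irf(theta, a, b, c, D=1.7):
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))

def likelihood(theta, responses):
    """Multiply IRFs for correct answers and their complements for
    incorrect answers; responses is a list of ((a, b, c), x) pairs."""
    L = 1.0
    for (a, b, c), x in responses:
        p = irf(theta, a, b, c)
        L *= p if x == 1 else (1.0 - p)
    return L

def mle_theta(responses):
    """Theta at the maximum of the likelihood, found on a coarse grid."""
    grid = [g / 100.0 for g in range(-400, 401)]  # -4.00 .. +4.00
    return max(grid, key=lambda t: likelihood(t, responses))
```

Note that an all-correct response vector runs off to the edge of the grid, which is exactly the nonmixed-vector problem discussed under the scoring rule below.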
Part 3: Basic principles of CAT (The Five Components)
CAT Components

1. Calibrated item bank
2. Starting rule
3. Item selection rule
4. Scoring rule
5. Stopping rule

Given 1 and 2, we loop through 3, 4, and 5 until 5 is satisfied. All CATs follow this basic format; we just modify the details for whatever testing situation we have.
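The loop can be sketched as a generic driver, with each component passed in as a function (the function names here are illustrative, not from the slides):

```python
def run_cat(bank, theta_start, select_item, respond, score, stop):
    """Generic CAT driver over the five components: a calibrated bank (1)
    and a starting rule (2) are given; item selection (3) and scoring (4)
    repeat until the stopping rule (5) is satisfied."""
    theta, responses = theta_start, []
    while not stop(theta, responses):
        item = select_item(theta, bank, responses)  # component 3
        x = respond(item)                           # administer the item
        responses.append((item, x))
        theta = score(theta, responses)             # component 4
    return theta, responses
```

Swapping in a different select_item, score, or stop function is exactly how the same skeleton is adapted to the special situations covered later.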
These five components are the algorithms inside your testing engine.
1. Calibrated item bank

Goal: develop an item bank that meets the needs of your CAT.

Q: What are the needs? A: Find out with Monte Carlo simulation.

More broadly: think about the TIF and CSEM you need, and work backwards.
1. Calibrated item bank

SEM: how accurate do you want to be?
- How many items would it take?
- Given your items, what is the best you can hope for?
1. Calibrated item bank

Do you need a lot of information near a cutscore?
2. Starting rule

1. Start everyone with the same theta estimate (e.g., theta = 0.0)
   - Everyone gets the same first item
   - Could be an exposure problem in a high-stakes test
2. Assign a random theta estimate within an interval
   - E.g., between theta = -0.5 and +0.5
   - Improves exposure levels and has little effect on a properly implemented CAT
2. Starting rule

3. Use prior information available for a given examinee
   - Subjective evaluations, e.g., below average, above average
   - Theta estimates from previously administered tests (same or different)
   - Projected theta based on biodata (age, GPA, etc.; IACAT 2010)
3. Item selection rule

Items are selected to maximize information. There are different ways to quantify this; Fisher (single-point) information at the current score is most common.
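A sketch of maximum-information selection (the 3PL information formula is an assumption; the slide only names the criterion):

```python
import math

def fisher_info(theta, a, b, c, D=1.7):
    """Fisher information of a 3PL item at a single theta point."""
    p = c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2

def select_max_info(theta, bank, administered):
    """Return the not-yet-administered item with the most information
    at the current theta estimate."""
    available = [item for item in bank if item not in administered]
    return max(available, key=lambda item: fisher_info(theta, *item))
```

Because information peaks near b, this rule keeps handing the examinee items whose difficulty sits near the current theta estimate.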
3. Item selection

There are also usually practical constraints on item selection:
- Item exposure
- Content area (domain)
- Cognitive level
- Etc.
3. Item selection

Example: 5 items
4. Scoring rule

Typically, MLE is used to score examinees after each item (or sometimes a Bayesian method). However, MLE assumes a mixed response vector (not all correct or all incorrect). This is rarely the case in the first few items, and never the case after the first item.
4. Scoring rule

So we need to adapt the scoring rule when the response vector is nonmixed. We can always start with Bayesian scoring, or use fixed step sizes: add or subtract 1 from the theta estimate after each item until the vector is mixed.
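One way to sketch that fallback (the function name is hypothetical; the step size and the mixed-vector check follow the slide):

```python
def score_with_fallback(theta, responses, mle, step=1.0):
    """Scoring rule with a fixed-step fallback: while the response vector
    is all-correct or all-incorrect the MLE runs off to +/- infinity,
    so just step theta up or down; once mixed, hand off to MLE.
    `mle` is any function mapping the responses to a theta estimate."""
    xs = [x for _, x in responses]
    if all(x == 1 for x in xs):
        return theta + step      # all correct so far: step up
    if all(x == 0 for x in xs):
        return theta - step      # all incorrect so far: step down
    return mle(responses)        # mixed vector: MLE is now finite
```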
5. Stopping rule

Depends primarily on the purpose of the test: point estimation or classification?
- Point estimation: we want an accurate score for each student
- Classification: we do NOT need an accurate score, just a classification into pass/fail etc.
- Change: to see whether a score has gone up or down a certain amount
5. Stopping rule

- Point estimation methods involve actual scores, and stop when we have zeroed in enough
- Classification methods check after every item to see whether we can make a classification within a certain degree of accuracy
- Specifics come later
5. Stopping rule

Either type of CAT can be designed with a fixed number of items. This is a bad idea from a psychometric perspective, but it can greatly enhance perceived fairness. Variable-length testing is much more efficient.
The big picture

(Diagram of the five components, 1-5.)
Point estimation CAT

Let's look at the most common scenario: we need an accurate score for each examinee, and we adapt the quantity and difficulty of items.

1. Bank: given
2. Starting point: 0.0
3. Item selection: info at current theta
4. Scoring: MLE
5. Termination: target SEM
Example CAT

Item-by-item report of a maximum-information CAT. This test will terminate when the SEM is equal to or less than 0.200. Minimum number of items = 5; maximum number of items = 40. The standard error band plotted as ---- is plus or minus 2.00 standard errors. X = initial theta value, C = correct answer, I = incorrect answer.

```
Item  Theta  SE     -3.......-2........-1.........0........+1........+2........+3
  0   -0.24* 1.00*  --------------------X--------------------
  1    4.00* 1.00*  . -------------------->
  2    4.00* 1.00*  . -------------------->
  3    2.52  0.84   . -----------------I-----
  4    2.77  0.68   . -------------C---
  5    2.38  0.61   . ------------I-------
  6    2.09  0.61   . ------------I----------
  7    1.49  0.89   -----------------I----------------
  8    0.36  1.00   --------------------I--------------------
  9    0.88  0.63   ------------C-------------
 10    1.13  0.56   -----------C-----------
 11    1.34  0.49   . ----------C----------
 12    1.44  0.46   . ---------C---------
 13    1.55  0.43   . ---------C---------
 14    1.67  0.41   . --------C--------
 15    1.54  0.38   . --------I--------
 16    1.60  0.36   . --------C-------
```
Example CAT (continued)

```
 17    1.70  0.35   . -------C-------
 18    1.76  0.34   . -------C-------
 19    1.65  0.32   . ------I------
 20    1.52  0.31   . -------I------
 21    1.40  0.30   . -------I------
 22    1.27  0.30   . ------I------
 23    1.30  0.28   . ------C-----
 24    1.32  0.28   . ------C-----
 25    1.36  0.27   . -----C------
 26    1.40  0.27   . -----C------
 27    1.31  0.26   . ------I-----
 28    1.34  0.25   . -----C-----
 29    1.37  0.25   . -----C-----
 30    1.40  0.24   . ----C-----
 31    1.43  0.24   . -----C-----
 32    1.46  0.24   . -----C-----
 33    1.50  0.24   . ----C-----
 34    1.53  0.23   . -----C----
 35    1.55  0.23   . -----C-----
 36    1.59  0.23   . ----C-----
 37    1.62  0.23   . -----C----
 38    1.58  0.22   . ----I-----
 39    1.53  0.22   . -----I----
 40    1.55  0.22   . ----C----
```

*Arbitrarily assigned value. This test was terminated when the maximum number of items was reached.
Part 4: Modifying CAT for special situations
Practical constraints

Test length:
- Minimum: reduce complaints from failures
- Maximum: stop never-ending tests
- Fixed: appears to be fair, but is not

Content:
- Educational standards, etc.

Item exposure:
- Reduce overuse of your best items!
- Randomesque selection, Sympson-Hetter
Classification testing

What about pass/fail or multicategory decisions? The conventional approach is to administer a long test to obtain an accurate score, then compare it to a cutscore. But we don't need an accurate score: students far above or below the cutscore can be stopped early, sometimes EXTREMELY early.
Classification testing

This is referred to as computerized classification testing (CCT; Lin & Spray, 2000). It is very similar to CAT, but with a few notable differences.
CCT

Methods are primarily delineated by the stopping rule:
- Ability confidence intervals (ACI; originally called "adaptive mastery testing", see Kingsbury & Weiss, 1983)
- Likelihood ratio (SPRT or GLR; Wald, 1947; Ferguson, 1967; Reckase, 1983; Thompson et al., 2008)
ACI

Specify the confidence interval: e.g., a 95% interval is plus or minus 1.96 SEM. Keep administering items until the interval falls entirely above or below the cutscore (or you hit your maximum test length).
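The ACI check itself is tiny; a sketch (function name illustrative):

```python
def aci_decision(theta, sem, cutscore, z=1.96):
    """Ability-confidence-interval stopping rule: classify once the
    z * SEM interval around theta falls entirely above or below the
    cutscore; otherwise keep testing (the maximum-test-length cap is
    enforced elsewhere)."""
    lower, upper = theta - z * sem, theta + z * sem
    if lower > cutscore:
        return "pass"
    if upper < cutscore:
        return "fail"
    return "continue"
```

As the SEM shrinks with each item, the interval tightens until one of the two classification conditions triggers.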
ACI: example graph
Likelihood ratio

- The LR approach completely abandons the use of the theta estimate
- It tests only whether a student is above or below a cutscore, using an indifference region
- More efficient than ACI (Spray & Reckase, 1996; Eggen, 1999; Thompson et al., 2007)
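A sketch of the SPRT version under assumed 3PL items: compare the likelihood at theta = cutscore + delta (the "pass" hypothesis) with theta = cutscore - delta (the "fail" hypothesis), where 2 * delta is the indifference region, using Wald's decision bounds. No theta estimate is needed.

```python
import math

def p3pl(theta, a, b, c, D=1.7):
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))

def sprt_decision(responses, cut, delta=0.5, alpha=0.05, beta=0.05):
    """Wald SPRT at two fixed points around the cutscore.

    responses: list of ((a, b, c), x) pairs; alpha/beta are the
    tolerated false-positive / false-negative classification rates."""
    log_lr = 0.0
    for (a, b, c), x in responses:
        p_hi = p3pl(cut + delta, a, b, c)
        p_lo = p3pl(cut - delta, a, b, c)
        log_lr += math.log(p_hi / p_lo) if x == 1 else math.log((1 - p_hi) / (1 - p_lo))
    if log_lr >= math.log((1 - beta) / alpha):
        return "pass"
    if log_lr <= math.log(beta / (1 - alpha)):
        return "fail"
    return "continue"
```

A run of correct answers on items near the cutscore pushes the log likelihood ratio past the upper bound quickly, which is why examinees far from the cutscore can be classified extremely early.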
GLR (advanced SPRT): example
Part 5: Building a CAT
Background

Here is a framework:
- Identify issues and the best way to investigate them
- Leads to better quality in the end
- Also the foundation for validity arguments: why did you choose certain things?
Where to start

Go back to the 5 components:
1. Calibrated item bank
2. Starting rule
3. Item selection rule
4. Scoring rule
5. Stopping rule
Where to start

How do we ensure that we build a good bank and choose good algorithms? Simulation drives much of this.
The model

In practice, you have these steps:

| Seq. | Stage | Primary work |
|---|---|---|
| 1 | Feasibility, applicability, and planning studies | Monte Carlo simulation; business case evaluation |
| 2 | Develop item bank content or utilize existing bank | Item writing and review |
| 3 | Pretest and calibrate item bank | Pretesting; item analysis |
| 4 | Determine specifications for final CAT | Post-hoc or hybrid simulations |
| 5 | Publish live CAT | Publishing and distribution; software development |
1. Feasibility, applicability, planning

Three types of simulations:
- Monte Carlo: generate fake item responses (but based on reality); draw a random U(0,1) number and compare it to P(X) from IRT
- Post hoc (real data)
- Hybrid
1. Feasibility, applicability, planning

Simulations act by "administering" a CAT: select the first item, generate a response, estimate θ, select the next item, and so on. Then analyze the results: average test length; accuracy of the CAT θ vs. the true θ (or the full-bank θ).
1. Feasibility, applicability, planning

At this point real data are not likely available, so use Monte Carlo. Generate plausible situations:
- Item bank size: 100, 200, 300, ...
- Item quality: a = 0.7, 0.8, ...; spread of b
- Desired precision: SEM = 0.2, 0.3, 0.4, ...

Compare results to each other and to fixed forms. Base the values on reality (e.g., your mean a).
1. Feasibility, applicability, planning

How do we "generate" responses?
- Calculate the probability P(X) of a correct response to the item, given θ (which is known, because the examinees are imaginary); this is done using the IRT equation
- Generate a random number between 0.0 and 1.0
- Compare: if random > P(X), the response is incorrect; otherwise it is correct
1. Feasibility, applicability, planning

Example with θ = 1.0:
- Item 1 has a = 1, b = 0, c = 0.2: P(X) = 0.877; random number = 0.426; random < P(X), so CORRECT
- Item 2 has a = 1, b = 1, c = 0.2: P(X) = 0.6; random number = 0.813; random > P(X), so INCORRECT
- And so on...
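The worked example above translates directly into code (a sketch; the fixed "random" draws reproduce the slide's numbers):

```python
import math
import random

def p3pl(theta, a, b, c, D=1.7):
    """3PL probability of a correct response, with the D = 1.7 scaling
    that reproduces the P(X) values in the example above."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))

def simulate_response(theta, a, b, c, draw=None):
    """Monte Carlo response: correct (1) if the U(0,1) draw falls
    below P(X), incorrect (0) otherwise."""
    if draw is None:
        draw = random.random()
    return 1 if draw < p3pl(theta, a, b, c) else 0
```

simulate_response(1.0, 1, 0, 0.2, draw=0.426) reproduces the slide's first item (correct), and draw=0.813 on the second item reproduces the incorrect response.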
1. Feasibility, applicability, planning

Software will do this for you, allowing you to simulate CATs for thousands of examinees in seconds. You can then easily set up an experiment with a wide range of conditions and run a simulation for each.
1. Feasibility, applicability, planning

Example result:
- A CAT with a bank of 300 items and SEM = 0.25 averages 53 items
- The current fixed test has 100 items, with SEM = 0.23 in the middle and 0.35+ beyond θ = ±1.5
- The CAT will make the test more accurate for extreme examinees, with about the same accuracy in the middle, but with a 50% reduction in length
1. Feasibility, applicability, planning

More on this topic in my session.
Epilogue: Maintaining CAT

Like fixed-form testing, maintenance is usually necessary. Check that the CAT is performing as expected:
- Is the termination criterion being satisfied?
- Are examinees hitting the maximum test length or other constraints?
- Is the average test length what you expected?
Epilogue: Maintaining CAT

Exposure: are certain items overused?
- CAT is greedy, always selecting the items with the best discrimination
- Refresh the pool by replacing exposed items

Test security: are any items out on the internet?
- Related to exposure, but not the same

Parameter drift? P&P comparability study? Equity?
Wrap-up

Questions? More resources?
Is CAT a good fit for my organization?

- Many organizations hear these benefits of CAT
- However, CAT is not relevant or possible in most situations
- How can I evaluate whether it is applicable for my organization?
- Consider the following requirements...
Requirement #1: Items scoreable in real time

- CAT scores items immediately so that the next item can be determined instantaneously
- Paper testing is obviously irrelevant
- Essays and other constructed-response items are not possible unless you package in an automated scoring system
Requirement #2: Large item banks

- A general rule of thumb is that you need 3 times as many items in the bank as intended for the test
- A 100-item test? You need 300 items in the bank
- You then need resources to write, review, and pilot all of these items
- Piloting is an issue unto itself
Requirement #3: Large pilot samples

- CAT uses item response theory (IRT) as the underlying psychometric model
- This requires that all items have at least 100 examinee responses just for a cursory analysis, and preferably at least 1,000
- So if you have 300 items and each examinee sees 100 during a pilot, you need 3,000 examinees just for the pilot study!
Requirement #4: PhD psychometricians

- Psychometricians are necessary to perform the complex IRT analyses
- They are also needed to perform CAT simulation and validity studies
- The test is otherwise not defensible
Requirement #5: Sophisticated software

- Software designed specifically for IRT analysis is necessary to calibrate the pilot data
- Separate software is necessary for the CAT simulation studies
- General statistical software is not usually acceptable
Requirement #6: IRT item banker

- Items must be stored together with their IRT parameters
- Your item banking system must utilize the IRT parameters appropriately for assembling banks/forms
Requirement #7: CAT delivery system

- You must have a computerized test delivery system capable of performing all the complex CAT calculations
- Crude, homegrown approximations are not defensible
- The system must be reliable, scalable, and secure
Requirement #8: Money and resources

- Developing a defensible CAT is extremely expensive: a small research project might cost $20,000, and a large operational CAT hundreds of thousands of dollars
- However, the benefits stated earlier can outweigh these costs, especially for large-volume tests
- CAT can then be a positive investment