Standard Setting Methods with High Stakes Assessments
Barbara S. Plake
Buros Center for Testing
University of Nebraska
Setting Passing Scores
Essential for making high stakes decisions
Must ensure that qualified candidates pass
Must ensure that unqualified candidates fail
70% correct is NOT the right answer!
“Standard Setting” -- setting the “standard” or “passing score”
Approaches
Empirically based
– Regression
– Contrasting groups/Borderline groups
– Norm-based
Test Based
– Judgmental
– Test and candidate based
Empirically based methods
Need to know status of candidate (worthy of passing or not)
More likely in classroom settings
Not likely the case in licensure settings
Norm-based
– Not tied to the KSAs needed to function effectively/safely in the profession
– Capricious and arbitrary
Test Based
KSAs form basis for test content
Focus on target candidate
– MCC
– JQC
Assessment Tasks
Multiple choice questions
– Good content coverage
– Efficient scoring
– Can measure higher order reasoning if well constructed
Constructed Response
More directly related to target skill?
Some differences by candidate
Time consuming to administer and score
Increases costs
Judgmental task
How will the minimally qualified candidate (MCC) perform on the tasks in the test?
Need qualified, well trained judges
– Often experts (SMEs)
– Need to modify SMEs' perception to focus on entry level performance
– Feedback
Decision Rules
Compensatory
– Performance on total is what matters
– Weaknesses in one area can be compensated by strengths in another
– Higher reliability
Decision Rules
Conjunctive
– Passing scores set on parts of the test
– Candidates must pass all parts in order to pass the test
– Sometimes candidates are allowed to “bank” passed parts
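The difference between the compensatory and conjunctive rules can be sketched in a few lines of Python. The section names, scores, and cut scores are hypothetical values chosen for illustration; they are not from the presentation.

```python
def passes_compensatory(section_scores, total_cut):
    """Pass if the total score across all sections meets the overall cut."""
    return sum(section_scores.values()) >= total_cut


def passes_conjunctive(section_scores, section_cuts):
    """Pass only if every section meets its own cut score."""
    return all(section_scores[name] >= cut for name, cut in section_cuts.items())


# Hypothetical candidate: strong in section A, weak in section B.
candidate = {"A": 45, "B": 20}

# Total is 65, so strength in A compensates for weakness in B.
compensatory_result = passes_compensatory(candidate, total_cut=60)

# Section B falls below its own cut of 30, so the candidate fails.
conjunctive_result = passes_conjunctive(candidate, {"A": 30, "B": 30})
```

The same candidate passes under one rule and fails under the other, which is why the choice of decision rule belongs in the standard setting discussion.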
Test Based Methods
Multiple choice questions
– Angoff Method
– Yes/No Extension
– Bookmark
Test Based Methods
Constructed Response
– Analytical Judgment
– Paper selection
Angoff “Method”
SMEs estimate the probability that a hypothetical, randomly selected MCC will be able to answer each question correctly.
Sum of an SME's estimates = SME's passing score
Average across SMEs = recommended passing score
Range of probable values (SEE)
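The Angoff arithmetic above can be sketched as follows. The ratings are made-up values. The slide does not give the SEE formula, so the spread measure here is an assumption: the standard error of the mean of the SMEs' passing scores, which is one common choice.

```python
from math import sqrt
from statistics import mean, stdev


def angoff_cutscore(ratings):
    """ratings: one list per SME of per-item probabilities for the MCC."""
    # Each SME's passing score is the sum of that SME's item estimates.
    sme_scores = [sum(item_probs) for item_probs in ratings]
    # Recommended passing score is the average across SMEs.
    cut = mean(sme_scores)
    # Assumption: standard error of the mean across SMEs as the spread measure.
    se = stdev(sme_scores) / sqrt(len(sme_scores))
    return sme_scores, cut, se


# Hypothetical ratings: 3 SMEs, 4 items.
ratings = [
    [0.50, 0.75, 0.25, 1.00],
    [0.75, 0.75, 0.50, 1.00],
    [0.50, 0.50, 0.50, 0.50],
]
sme_scores, cut, se = angoff_cutscore(ratings)

# Range of probable values, here taken as the cut plus or minus one standard error.
low, high = cut - se, cut + se
```

Widening the band to two standard errors only requires replacing `se` with `2 * se` in the last line.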
Angoff variations
Multiple rounds of ratings
Feedback in between
– SME results
– Candidate performance
• P-values
• % passing
Criticisms of Angoff Methods
Cognitively challenging
“Impossible task”
“Fatally flawed” NRC report
Research has shown that ratings are consistent across years and raters
Need strong training/discussion of KSAs of MCCs
Yes/No Variation
SMEs estimate whether or not the MCC will be able to answer the item correctly (Y/N)
– Response probability
– More likely than not (.50)
– Fairly certain (.67)
Add the Ys to get SME's passing score
Average across SMEs = recommended passing score
Cutpoint +/- SEE (1 or 2)
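A minimal sketch of the Yes/No tally, assuming each SME's judgments are recorded as a string of Y and N characters, one per item. That encoding and the example data are illustrative only.

```python
from statistics import mean


def yes_no_cutscore(judgments):
    """judgments: one string per SME with 'Y' or 'N' for each item."""
    # Each SME's passing score is the number of items marked Y.
    sme_scores = [row.count("Y") for row in judgments]
    # Recommended passing score is the average across SMEs.
    return sme_scores, mean(sme_scores)


# Hypothetical judgments: 3 SMEs, 5 items.
judgments = ["YYNYN", "YYYYN", "YNNYN"]
sme_scores, cut = yes_no_cutscore(judgments)
```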
Yes/No Variation
More popular with SMEs
Feedback not necessarily needed
Quicker to implement
Bookmark Method
Often used with IRT calibrated items but not necessary
Test questions ordered from easy to hard
Response probability
Insert bookmark between pages when the MCC probability of a correct response dips below response probability
Bookmark Method
Number of items preceding bookmark is SME's passing score
Often little discussion on KSAs of MCC
Multiple small groups
Discussion between rounds
Multiple rounds; data usually isn't shared until 2nd or 3rd rounds.
Bookmark Method
Results often shown graphically across rounds
Frequently convergence occurs after 1st round
Average across SMEs = recommended cutpoint
SEE formula; cutpoint +/- SEE (1 or 2)
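The bookmark placement rule can be illustrated with a short sketch. It is a simplification: in a workshop the SME places the bookmark by judgment, whereas here that judgment is represented as explicit MCC success probabilities for each item in the ordered booklet. The function name and the probability values are hypothetical.

```python
def bookmark_score(mcc_probs, response_probability=0.67):
    """Return the number of items preceding the bookmark.

    mcc_probs: an SME's estimated probability that the MCC answers each
    item correctly, with items already ordered from easy to hard.
    """
    for position, prob in enumerate(mcc_probs):
        # The bookmark goes in front of the first item that dips below
        # the response probability criterion.
        if prob < response_probability:
            return position
    # No item dips below the criterion: the bookmark follows the last item.
    return len(mcc_probs)


# Hypothetical ordered booklet of 6 items.
probs = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40]
score_rp67 = bookmark_score(probs, response_probability=0.67)
score_rp50 = bookmark_score(probs, response_probability=0.50)
```

With these values the stricter criterion (.67) places the bookmark one item earlier than the .50 criterion, so the choice of response probability directly moves the SME's passing score.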
Extended Angoff
SMEs estimate how many of the total points available for the task will be earned by the MCC.
Cutpoint is determined in a similar fashion to Angoff; sum points for SME, average across SMEs.
Range of probable values
Analytical Judgment
SMEs see prescored candidate responses (but scores aren't revealed)
Task is to sort candidate responses into performance categories
– Clearly passing
– Passing
– Not Passing
Analytical Judgment
Clearly passing set aside
Candidate responses in the Passing and Not Passing categories are ordered from lowest performance to highest.
Top responses in the Not Passing category are identified (usually 3)
Lowest responses in the Passing category are identified (usually 3)
Analytical Judgment
Average across these 6 papers is SME's passing score
Feedback provided on SME passing scores
Round 2
Cutpoint is average across SMEs' passing scores
Range of probable values
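One SME's Analytical Judgment passing score can be sketched as below. The scores are the hidden prescored values of the responses that SME sorted into each category; all numbers are hypothetical, and the default of 3 papers per side follows the "usually 3" on the slides.

```python
from statistics import mean


def analytical_judgment_score(not_passing, passing, k=3):
    """Average the k highest Not Passing and k lowest Passing scores."""
    top_not_passing = sorted(not_passing)[-k:]
    lowest_passing = sorted(passing)[:k]
    return mean(top_not_passing + lowest_passing)


# Hypothetical prescored responses as sorted by one SME
# (Clearly Passing responses are already set aside).
not_passing = [2, 3, 4, 5, 5]
passing = [6, 6, 7, 8, 9]
sme_passing_score = analytical_judgment_score(not_passing, passing)
```

The panel cutpoint would then be the average of these per-SME scores, as with the other methods.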
Paper Selection
Exemplar candidate work is selected for each score point (typically 2/score point)
SMEs' task is to pick the two papers that best represent the work of the MCC
Scores are not revealed to SMEs
Average of an SME's selected papers = SME's passing score
Average across SMEs = cutpoint
Range of probable values
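The Paper Selection computation can be sketched the same way. Each tuple holds the hidden scores of the two papers one SME picked; the values are invented for illustration.

```python
from statistics import mean


def paper_selection_cutpoint(selected_scores):
    """selected_scores: per SME, the scores of the two papers selected."""
    # Each SME's passing score is the average of the selected papers.
    sme_scores = [mean(pair) for pair in selected_scores]
    # The cutpoint is the average across SMEs.
    return sme_scores, mean(sme_scores)


# Hypothetical selections by 3 SMEs.
selections = [(3, 4), (4, 4), (3, 3)]
sme_scores, cutpoint = paper_selection_cutpoint(selections)
```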
Who Makes the Final Decision?
Each approach yielded a cutpoint and a “range of probable values”
This information should be communicated to the policy makers for their final decision.
Standard setting methods only yield a range of consistent, defensible cutpoints
Final decision is a policy matter!
Providing Validity Evidence
What evidence is useful in supporting the results of the standard setting process?
This evidence should be gathered to have available in case of a legal challenge.
Responsibility of test developer to provide at least procedural validity evidence.
Collateral evidence could be part of a long-term validity research program
Procedural Evidence
SMEs
– Representative of profession
– Qualifications
– Confidentiality
– Conflict of interest statements
– Cannot teach preparation classes or sit for examination
Training
Did SMEs understand method?
Was sufficient time allotted to training?
Did the SMEs have a clear conceptualization of the MCC?
Did they understand the purpose of the standard setting procedure?
Do they understand that the final decision will be based on their work, but not dictated by it?
Practice
Was enough time devoted to practice?
Were the practice materials sufficiently similar to the operational materials?
Did the SMEs feel they had a reasonable opportunity to ask questions and receive clarifications?
Did they understand the feedback information?
Operational
Was enough time devoted to their work (across rounds)?
How confident did the SMEs feel about their ratings (across rounds)?
How useful/influential was the feedback?
Did the facilities support their work?
Overall
Confidence that the method used will result in appropriate minimum passing score?
Was the workshop handled in a professional manner?
Was the workshop well organized?
Opportunity for comments
Main Point
Many methods, all aimed at providing a structured and reasoned approach to identifying
– Cutpoint
– Range of probable values
– Procedural validity evidence
Match of Method to Assessment
Method selected should be appropriate for the assessment (MCQ, constructed response).
Logistically feasible
Published in peer-reviewed journals?
Should be replicable
Multiple methods? Multiple panels?
Purpose of Presentation
Provide an orientation to current standard setting methods
Provide background on the needed processes and procedures to conduct a professional (and legally defensible) standard setting workshop.
Thank you
I am honored to be asked to share my expertise in this area
I hope the presentation has been useful and meaningful
Best outcome for me is if it raised your awareness of methods and issues in standard setting.