Lessons from High-Stakes Licensure Examinations
for Medical School Examinations
Queen's University
4 December 2008
Dale Dauphinee, MD, FRCPC, FCAHS
Background: FAME Course
Validating Test Scores and Decisions
Pulling All of the Pieces Together!
Dale Dauphinee
Seeing the woods for the trees … and defining the way ahead … !
Why? Ensure that you keep out of trouble and get the effect/impact that you want!
Goal today is to offer insights for those of you working at the undergraduate level – looking back on my two careers in assessment: Undergraduate Assoc. Dean and CEO of the MCC!
FAME Course Framework: Assessment Frames
Themes:
• Knowledge and Reasoning
• Clinical Skills
• Workplace Performance
• Program Evaluation
• Scoring, Analysis & Reporting
• Test Material Development
• Standard Setting
• Test Design: Constructed Response
• Test Design: Content and Validity
Elements of Talk
• Process: be clear on why you are doing this!
  – Describe: assessment steps written down
• Item design: key issues
• Structure: be clear where decisions are made
• Outcome: pass-fail or honours-pass-fail
• Evaluation cycle: it is about improvement!
• Getting into trouble
  – Problems in process: questions to be asked
  – Never ask them after the fact: ANTICIPATE
• Prevention
Preparing a ‘Course’ Flow Chart
• For whom and what?
• What is the practice/curriculum model?
• What method?
• What is the blueprint and sampling frame?
• To what resolution level will they answer?
• Scoring and analysis
• Decision making
• Reporting
• Due process
HINT: Think project management! What are the intended steps?
Classic Assessment Cycle
• Desired objectives or attributes
• Educational program
• Assessment of performance
• Performance gaps
• Program revisions

Change in the Hallmarks of Competence – Increase Validity
[Figure: assessment methods from 1960 to 2000, rising in professional or clinical authenticity – knowledge assessment → problem-solving assessment → clinical skills assessment → practice assessment (adapted from van der Vleuten 2000)]
Climbing the Pyramid
• Knows – factual tests: MCQ, essay type, oral…
• Knows how – (clinical) context-based tests: MCQ, essay type, oral…
• Shows how – performance assessment in vitro: OSCE, SP-based test…
• Does – performance assessment in vivo: undercover SPs, video, logs…
Traditional View
[Diagram: Curriculum, Teacher, Assessment and Student – after van der Vleuten, 1999]
An Alternative View
[Diagram: Curriculum, Teacher, Assessment and Student – after van der Vleuten, 1999]
Traditional Assessment: What, Where & How
Student-Trainee Assessment
• Content: maps onto the domain and curriculum → to which the results generalize – the basis of assessment
• Where and who: within 'set' programs where candidates are in the same cohort
• Measurement:
  – Test or tool testing time is long enough to yield reliable results (a sketch follows this slide)
  – Tests are comparable from administration to administration
  – Controlled environment – not complex
  – Can differences be attributed to the candidate? … and rule out 'exam-based' or error attribution
  – Adequate numbers per cohort
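To make the test-length point concrete, here is a minimal sketch (mine, not from the talk) that computes Cronbach's alpha for a toy item-score matrix and then projects reliability for a lengthened test with the Spearman-Brown prophecy formula; all data are invented.

```python
# Sketch: how test length drives reliability (toy data, for illustration only).
from statistics import pvariance

# rows = candidates, columns = items (1 = correct, 0 = incorrect)
scores = [
    [1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
]

k = len(scores[0])                                    # number of items
item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
total_var = pvariance([sum(row) for row in scores])   # variance of total scores
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)  # Cronbach's alpha

def spearman_brown(rel, factor):
    """Projected reliability if the test is lengthened by `factor`."""
    return factor * rel / (1 + (factor - 1) * rel)

print(f"alpha for {k} items: {alpha:.2f}")
print(f"projected alpha if test length is doubled: {spearman_brown(alpha, 2):.2f}")
```

With these toy numbers, alpha is about 0.65 and doubling the test projects to roughly 0.79 – the slide's lesson in miniature.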
Traditional Tests/Tools at School
• Does content map to the domain?
• Test length = reliable?
• Attributable to the candidate?
• Are tests comparable?
• The ideal test has all of these qualities!
Principle
It is all about the context and purpose of your course, then the intended use of the test score – or the program!
'There is no test for all seasons or for all reasons.'
Written Tests: Designing Items
Key Concepts
Principle
∴ The case 'prompts' or item stems must create low-level simulations in the candidate's mind about … the performance situations that are about to be assessed …
Classifying Constructed Formats
• Cronbach (1984): defined constructed-response formats as a broad class of item formats where the response is generated by the examinee rather than selected from a list of options.
• Haladyna (1997): constructed-response formats
  – High-inference format
    • Requires expert judgment about a trait being observed
  – Low-inference format
    • Observing the behaviour of interest: short answer; checklists
Types of CR Formats*
• Low inference
  – Work sampling (done in real time)
  – In-training evaluations (rating provided later)
  – Mini-CEX
  – Short answer
  – Clinical orals: structured
  – Essays (with score key)
  – Key features (no menus)
  – OSCEs at early UG level
• High inference
  – Work – 360s
  – OSCEs at grad level
  – Orals (not 'old' vivas)
  – Complex simulations (teams, interventions)
  – Case-based discussions
  – Portfolios
  – Demonstration of procedures

*Principle – all CR formats need lots of development planning: you can't show up and wing it!
What Do CRs Offer & What Must One Consider for Good CRs?
The CR format can provide:
– Opportunity for candidates to generate/create a response
– Opportunity to move beyond MCQs
– A response that is evaluated by comparison to pre-developed criteria
– Evaluation criteria with a range of values that are acceptable to the faculty of the course or testing body

CRs: other considerations
– Writers/authors need training
– Need a CR development process
– Need a topic selection plan or blueprint
– Need guidelines
– Need a scoring rubric and analysis → reporting (see the sketch below)
– Need a content review process
– Need a test assembly process
– May encounter technical issues…
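As an illustration of the 'pre-developed criteria' and 'scoring rubric' points above, a hypothetical sketch of a short-answer scoring key follows; the elements, acceptable answers, and weights are invented, not drawn from any real exam.

```python
# Hypothetical short-answer scoring key: each response element is compared
# to a faculty-approved set of acceptable answers with an agreed point value.

KEY = {
    "diagnosis":  {"acceptable": {"community-acquired pneumonia", "cap"}, "points": 2},
    "first_test": {"acceptable": {"chest x-ray", "cxr"}, "points": 1},
}

def score_response(response):
    """Score a candidate's answers against the pre-developed key."""
    total = 0
    for element, rule in KEY.items():
        answer = response.get(element, "").strip().lower()
        if answer in rule["acceptable"]:
            total += rule["points"]
    return total

candidate = {"diagnosis": "CAP", "first_test": "chest X-ray"}
print(score_response(candidate))  # 3 of a possible 3
```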
Moving to Clinical Assessment
Think of it as work assessment!
Point: the validity of scoring is key because the scores are being used to judge clinical competence in certain domains!
Clinical Assessment Issues
• Context:
  – Clinical skills
  – Work assessment
• Overview:
  – Validating test scores
  – Validating decisions
• Examples:
  – Exit (final) OSCE
  – Mini-CEX
• Conclusion
Presentation Grid

                        Clinical Skills   Mini-CEX
Validating Scoring            ✔              ✔
Validating Decisions          ✔              ✔
Key Pre-condition #1
• What is the educational goal?
  – And the level of resolution expected?
• Have you defined the purpose or goal of the evaluation and the manner in which the result will be used?
• Learning point:
  – Need to avoid Downing's threats to validity
    • Too few cases/items (construct under-representation)
    • Flawed cases/items (construct-irrelevant variance)
• If not – you are not ready to proceed!
Key Pre-condition #2
• Be clear about due process!
• Ultimately, if this instrument is an 'exit' exam or an assessment to be used for promotion, clarity about 'due process' is crucial
• Samples: the student must know that he/she has the right to the last word; the 'board' must have followed acceptable standards of decision-making; etc.
Practically, in 2008, validity implies …
… that in the interpretation of a test score, a series of assertions, assumptions and arguments are considered that support that interpretation!
– ∴ Validation is a pre-decision assessment – specifying how you will consider and interpret the results as 'evidence' to be used in final decision-making!
– In simple terms: for student promotion, a series of conditional steps ('cautions') is needed to document a 'legitimate' assessment 'process'
– ∴ Critical steps for a 'valid' process leading to the ultimate decision
  • i.e. make a pass/fail decision or provide a standing
General Framework for Evaluating Assessment Methods – after Swanson
Evaluation: determining the quality of the performance observed on the test
Generalization: generalizing from performance on the test to other tests covering similar, but not identical, content
Extrapolation: inferring performance in actual practice from performance on the test
Evaluation, Generalization, and Extrapolation are like links in a chain: the chain is only as strong as the weakest link
Kane's 'Links in a Chain' Defense – after Swanson
[Diagram: Evaluation → Generalization → Extrapolation; includes scoring and decision-making]
Scoring: Deriving the Evidence
• Content validity:
  – Performance- and work-based tests
    • Enough items/cases?
    • Match to exam blueprint and ultimate uses
  – Exam versus work-related assessment point
    • Direct measures of observed attributes
    • Key: is it being scored by items or cases?
    • Observed score compared to target score
  – Item (case) matches the patient problem!
  – And the candidate's ability!
Preparing the Evidence
• From results to evidence: three inferences
  – Evaluate performance – get a score
  – Generalize that to a target score
  – Translate the target score into a verbal 'description'
• All three inferences must be valid
• Process:
  – Staff role versus decision-makers' responsibilities/role
    • Flawed items/cases
    • Flag unusual or critical events for decision-makers
    • Prepare analyses
  – Comparison data
Validating the Scoring – Evidence
• Validation is carried out in two stages
  – Developmental stage: the process is nurtured, refined
  – Appraisal stage: the real thing – trial by fire!
• Interpretive argument
• Content validity: how do scores function in various required conditions?
  – Enough items/cases?
  – Eliminate flawed items/cases (see the sketch below)
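One concrete way to act on 'eliminate flawed items/cases' is routine item analysis. The sketch below is an assumption on my part, not the MCC's documented procedure: it flags items with extreme difficulty or poor item-rest discrimination in a toy 0/1 score matrix.

```python
# Sketch: flag candidate items for review by difficulty (proportion correct)
# and item-rest discrimination (toy data; thresholds are illustrative).
from statistics import mean, stdev

def pearson(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return num / ((len(x) - 1) * stdev(x) * stdev(y))

# rows = candidates, columns = items (1 = correct)
scores = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
]

for i in range(len(scores[0])):
    item = [row[i] for row in scores]
    rest = [sum(row) - row[i] for row in scores]  # total score minus this item
    p = mean(item)                                # difficulty
    r = pearson(item, rest)                       # discrimination
    flag = "  <- review" if p < 0.2 or p > 0.9 or r < 0.2 else ""
    print(f"item {i + 1}: p = {p:.2f}, r = {r:+.2f}{flag}")
```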
Observation of Performance with Real Patients
[Chain: Evaluation → Generalization → Extrapolation – if the candidate sees a variety of patients]
Objective Structured Clinical Examination (OSCE)
[Chain: Evaluation → Generalization → Extrapolation – Dave Swanson]
Stop and Re-consider …
What were the educational goals?
AND
How will the decision be used?
The Decision-making Process
• Standard setting
  – many methods
• But the keys are:
  – ultimate success
  – fidelity
  – the care with which the decision is executed is crucial
  – it must be documented
• Helpful hint: standard setting can also be used to define faculty expectations for content and use – in advance of the test! (A sketch of one method follows.)
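The talk lists 'many methods' without endorsing one, so as an illustration only, here is a minimal sketch of a modified Angoff procedure, one common choice: each judge estimates, per item, the probability that a borderline candidate answers correctly, and the cut score is the mean of the judges' summed estimates. All numbers are made up.

```python
# Sketch of a modified Angoff cut-score calculation (made-up judgments).
from statistics import mean

# rows = judges, columns = items; each entry is the judged probability that
# a borderline candidate answers that item correctly
judgments = [
    [0.60, 0.45, 0.80, 0.55, 0.70],
    [0.55, 0.50, 0.75, 0.60, 0.65],
    [0.65, 0.40, 0.85, 0.50, 0.70],
]

cut_score = mean(sum(judge) for judge in judgments)  # expected borderline total
print(f"pass mark: {cut_score:.1f} of {len(judgments[0])} items")
```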
The Decision-making Process
• Generic steps:
  – the exam was conducted properly;
  – the results are psychometrically accurate and valid;
  – establish the pass-fail point;
  – and consider each candidate's results
• Red steps require an evaluating process that is:
  – deliberate and reflective
  – open discussion
• Black steps: decision
  – All members of the decision-making board must be 'in' – or else an escalation procedure needs to be established – in advance!
Examples
• OSCE
  – MCC meeting steps
    • Overview: how the exam went
    • Review each station
      – Discussion
      – Decision: use all cases
    • Review results 'in toto'
      – Decide on the pass-fail point
      – Consider each person:
        • Decide pass-fail for specific challenging instances
        • Award standing or tentative decision
  – Comments
• Work-based: mini-CEX
  – Six-month rotation in PGY-1
  – Construction steps
    • Sampling grid?
      – Numbers needed (see the sketch after this slide)
      – Score per case
    • Rating issues:
      – Global (preferred) vs. checklist
      – Scale issues
    • Examiner strategy
      – Not the same one
      – Number needed
      – Preparation
    • Awarding standing: pass-fail or one of several parameters?
  – Comments
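For the 'numbers needed' question in the mini-CEX sampling grid, one common back-of-envelope approach (not stated in the talk) is to solve the Spearman-Brown formula for the number of encounters required to reach a target reliability; the single-encounter reliability below is an assumed figure, and real values should come from your own data.

```python
# Sketch: encounters needed for a dependable mini-CEX rotation score,
# via the Spearman-Brown formula solved for the lengthening factor.
import math

single_encounter_rel = 0.25  # assumed reliability of one mini-CEX encounter
target_rel = 0.80            # desired reliability for the rotation score

factor = (target_rel * (1 - single_encounter_rel)) / (
    single_encounter_rel * (1 - target_rel)
)
print(f"encounters needed: {math.ceil(factor)}")  # 12 with these assumptions
```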
Appeals vs. Remarking!
• Again – a pre-defined process
• Tending to make a negative decision
  – Candidate's right to the last word before the final decision
    • Where does that take place? Must plan this!
  – Differentiate decision-making from rescoring
    • Requires an independent 'ombudsperson'
• Other common issues
Delivering the News
• Depends on the purpose and desired use
• Context driven
• In a high-stakes situation at a specific faculty – may want a two-step process
  – Tending to a negative decision:
    • Notion of the right of the candidate to the last word before a decision is made: he/she has the right to provide evidence that addresses the board's concerns
  – Final decision
• Comments/queries?
Key Lessons: Re-cap
• Purpose and use of the result
• Overview of due process – in promotion
• Overview of validity – prefer Kane's approach
• Scoring component of validity
• Generalization and extrapolation
  – True score variance ↑ and error variance ↓ (see the identity below)
• Interpretation/decision-making components of validity
• Know 'due process'
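The recap's note that generalization improves as true-score variance rises and error variance falls is the classical test-theory definition of reliability, shown here for completeness:

```latex
\text{reliability} = \frac{\sigma^{2}_{\text{true}}}{\sigma^{2}_{\text{true}} + \sigma^{2}_{\text{error}}}
```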
Are you ready?
• Are the faculty clear on the ultimate use and purpose of the test or exam?
• How will you track the issues to be resolved?
• Have you defined the major feasibility challenges at your institution – and a plan to meet them?
• Do you have a process to assure valid scoring and interpretation of the result?
• Do you have support and back-up?
Summary and Questions
Thank You!
References
Clauser BE, Margolis MJ, Swanson DB. (2008). Issues of Validity and Reliability for Assessments in Medical Education. In: Hawkins R, Holmboe ES, eds. Practical Guide to the Evaluation of Clinical Competence. Mosby.
Pangaro L, Holmboe ES. (2008). Evaluation Forms and Global Rating Forms. In: Hawkins R, Holmboe ES, eds. Practical Guide to the Evaluation of Clinical Competence. Mosby.
Newble D, Dawson-Saunders B, Dauphinee WD, et al. (1994). Guidelines for Assessing Clinical Competence. Teaching and Learning in Medicine 6(3): 213-220.
Kane MT. (1992). An Argument-Based Approach to Validity. Psychological Bulletin 112(3): 527-535.
Downing S. (2003). Validity: on the meaningful interpretation of assessment data. Medical Education 37: 830-837.
Norcini J. (2003). Work based assessment. BMJ 326: 753-755.
Smee S. (2003). Skill based assessment. BMJ 326: 703-706.