using raschmodelling to investigate inter- board ... · english english lit maths maths meth...
TRANSCRIPT
Using Rasch Modelling to Investigate Inter-board Comparability of Examination Standards in GCSE
Qingping HeMarch 23 2018
Inter-board comparability (IBC)
■ Exams testing the same subject areas are provided by different exam boards (EBs)
■ There is no pre-testing or equating between the exams due to their high-stakes nature
■ Inter-board comparability (IBC) is defined as the extent to which examinees awarded the same grade by different exam boards in a subject have similar level of attainment. It can be viewed as one aspect of validity
■ Inter-board comparability has been a concern for a wide range of stakeholders
Monitoring and maintaining inter-board comparability ■ Methods used to study inter-board comparability:
Judgemental methods (absolute and comparative) and Statistical methods
■ Post-award inter-board statistical screening (IBSS) using mean GCSE scoreCandidates from all boards (a specific subject): Candidates from a particular board (board x):
Expectation matrix: Expected:vs Observed
)(, ,,
10
1
,, ji
ji
i i
jiji N
N
N
Naaa ===
å =
Mean GCSE band
No of Can. in band
Expected Number of Can. in grade
A* A B C …
Band1 Nx1 Nx1,1 Nx1,2 N x1,3 N x1,4 …
… … … … … … …
Band5 Nx5 Nx5,1 Nx5,2 Nx5,3 Nx5,4 …
… … … … … … …
Band10 Nx10 Nx10,1 Nx10,2 Nx10,3 Nx10,4 …
Expected total Npx1 Npx2 Npx3 Npx4 …
Observed Nox1 Nox2 Nox3 Nox4 …
å ===
10
1 ,,, ,i jxipxjjixijxi NNNN a
Mean GCSE band
No of C. in band
Number of Can. in grade
A* A B C …
Band1 N1 N1,1 N1,2 N1,3 N1,4 …
… … … … … … …
Band5 N5 N5,1 N5,2 N5,3 N5,4 …
… … … … … … …
Band10 N10 N10,1 N10,2 N10,3 N10,4 …
Potential issues with existing IBSS using mean GCSE score
■ The appropriateness of the use of mean GCSE score for screening ■ Not all candidates took the same subjects that are used to calculate the mean
GCSE score. Different candidates may have taken a different set of subjects■ Variability in difficulty between subjects■ Variability in entry pattern between the exam boards
Aims of study
■ To gain further understanding of the issues with inter-board comparability■ To explore the potential of using differential category functioning (DCF)
analysis with Rasch modelling to investigate the comparability of examination standards in GCSEs between exam boards as partial validation of the current post-award inter-board statistical screening (IBSS) approach
■ To explore the potential of using Rasch modelling to enhance the existing inter-board statistical screening approach
Differential category functioning analysis using partial credit Rasch model (PCM)
■ The Partial Credit Model for polytomous items:
■ Differential category functioning (DCF): Test takers with the same ability from different subgroups perform differently at specific score (category) levels of an item
■ Methods used to investigate DCF:■ Calibrate items for different subgroups separately ■ Calibrate items using persons from all subgroups and compare average
abilities from different subgroups■ Re-estimate item parameters for individual subgroups using their ability
distributions estimated with the population
kk
k
pp
dq -=-1
ln
Method■ The GCSE letter grades were converted into ordered numerical grades (U to 0, G to 1, …, A* to 8)
■ Each subject was treated as a polytomous item and the grade a candidate received on the subject
as a score (or performance category) on the item
■ The subjects included in the analysis were assumed to define a shared construct which is closely
related to the constructs measured by the individual examinations
■ Candidates taking the exams that test the same subject areas but provided by different exam
boards were treated as different subgroups
■ WINSTEPS was used for Rasch analysis and R code was developed to re-estimate the item
category parameters for individual exam boards and standard error of estimation
■ The existence of significant DCF effect at specific grades in a subject between the exam boards
(subgroups) was assumed to indicate inconsistency in standards at these grades between the
exam boards
■ DCF measures were converted into changes in grade boundary scores required for aligning
standards between the exam boards to investigate potential impact on grade outcomes
■ Results from Rasch analyses were compared with those from the existing inter-board statistical
screening using mean GCSE score
The data (from the 2015 exam series)
Subject nameSample size
Board A Board B Board C Board D
Additional Science 180638 75720 51340 7987
Applications of Mathematics 5156 2284 3812 1179
Biology 74074 39027 11551 5080
Chemistry 73196 38995 11516 5016
English 236791 25687 34854 130959
English Literature 231965 26675 36889 109420
French 80568 8868 44369 14484
Further Additional Science 14462 3072 5383
Geography 107895 33780 42921 29231
German 4987 3252 28025 15233
History 58424 85077 69070 22133
Mathematics 74752 52592 430695 30177
Methods in Mathematics 4397 2126 4033 872
Physics 74030 39397 11483 4919
Science 141775 52715 37670 7305
Model assumptions and model fit
■ Grade U did not fit the PCM well and was removed from analysis. Candidates taking fewer than two subjects were excluded. The final dataset fitted the model reasonably well
■ Unidimensionality assumption of the PCM held reasonably well
Category probability curves (CPCs) for History and Additional Science
0.00
0.10
0.20
0.30
0.40
0.50
0.60
0.70
0.80
0.90
1.00
-9.00 -7.00 -5.00 -3.00 -1.00 1.00 3.00 5.00 7.00 9.00
Cat
egor
y pr
obab
ility
(A
dd. S
ci.)
Ability (logits)
G FE DC BA A*
0.00
0.10
0.20
0.30
0.40
0.50
0.60
0.70
0.80
0.90
1.00
-9.00 -7.00 -5.00 -3.00 -1.00 1.00 3.00 5.00 7.00 9.00
Cat
egor
y pr
obab
ility
(H
isto
ry)
Ability (logits)
G FE DC BA A*
Subject relative Rasch grade difficulty (all boards) and item characteristic curves (ICCs)
■ V
Grade gap in unit of logits:
-8.00
-6.00
-4.00
-2.00
0.00
2.00
4.00
6.00
8.00E
ngli
sh
Eng
lish
Lit
Mat
hs
Mat
hs M
eth
Bio
logy
Mat
hs A
pp
Phy
sics
Che
mis
try
Sci
ence
Add
Sci
ence
Geo
grap
hy
His
tory
Spa
nish
Fre
nch
Ger
man
Fur
Add
Sci
ence
Dif
ficu
lty
(lo
git
s)
Overall Grade F Grade E Grade DGrade C Grade B Grade A Grade A*
0
1
2
3
4
5
6
7
-9.00 -7.00 -5.00 -3.00 -1.00 1.00 3.00 5.00 7.00 9.00
Exp
ecte
d a
nd
ob
serv
ed s
core
Ability (logits)
English Eng lit French GeographyGerman History Maths Maths AppMaths Meth Add Sci Biology ChemistryFur Add Sci Physics Science Spanish
å=
-=DSN
iEiAi
SG
ddNN 1
,, )(1
Average ability of candidates taking different subjects and grade Raschdifficulty (all and individual boards)
-8.00
-6.00
-4.00
-2.00
0.00
2.00
4.00
6.00
8.00
Engl
ish
Eng
lish
Lit
Mat
hs
Mat
hs
Met
h
Bio
log
y
Mat
hs
App
Phy
sics
Ch
emis
try
Sci
ence
Add
Sci
ence
Geo
grap
hy
His
tory
Spa
nish
Fre
nch
Ger
man
Fur
Ad
dS
cien
ce
Gra
de
dif
ficu
lty
(lo
git
s)
All boards Board A Board B Board C Board D-2.00
-1.50
-1.00
-0.50
0.00
0.50
1.00
1.50
2.00
2.50
3.00
Eng
lish
Eng
lish
Lit
Mat
hs
Mat
hs
Met
h
Bio
log
y
Mat
hs
App
Phy
sics
Ch
emis
try
Sci
ence
Add
Sci
ence
Geo
grap
hy
His
tory
Spa
nish
Fre
nch
Ger
man
Fur
Ad
dS
cien
ce
Av
era
ge
ab
ilit
y (
log
its)
All boards Board A Board B Board C Board D
Relative grade difficulty (difference in standards – DCF effect, in unit of logits)
Relative grade difficulty in logits and unit of grade: ALLkkRk ddd ,, -=
-1.80
-1.50
-1.20
-0.90
-0.60
-0.30
0.00
0.30
0.60
0.90
1.20
Eng
lish
Eng
lish
Lit
Mat
hs
Mat
hs M
eth
Bio
logy
Mat
hs A
pp
Phys
ics
Che
mis
try
Scie
nce
Add
Sci
ence
Geo
grap
hy
His
tory
Span
ish
Fren
ch
Ger
man
Fur A
ddSc
ienc
e
Rel
ativ
e gr
ade
diff
icul
ty (
logi
ts)
Board A, A* Board B, A* Board C, A* Board D, A*Board A, A Board B, A Board C, A Board D, ABoard A, B Board B, B Board C, B Board D, BBoard A, C Board B, C Board C, C Board D, CBoard A, D Board B, D Board C, D Board D, DBoard A, E Board B, E Board C, E Board D, EBoard A, F Board B, F Board C, F Board D, F
D= Rk
RGk
dd ,
,
Relative grade difficulty (difference in standards – DCF effect, in unit of grade)Relative grade difficulty in unit of grade:
Changes in boundaries required for aligning standards:RGkkk wdbb ,-=¢
-1.20
-1.00
-0.80
-0.60
-0.40
-0.20
0.00
0.20
0.40
0.60
0.80
Engl
ish
Engl
ish L
it
Mat
hs
Mat
hs M
eth
Biol
ogy
Mat
hs A
pp
Phys
ics
Chem
istry
Scie
nce
Add
Sci
ence
Geo
grap
hy
His
tory
Span
ish
Fren
ch
Ger
man
Fur A
ddSc
ienc
e
Rel
ativ
e gr
ade
diffi
culty
(gra
de u
nit)
Board A, A* Board B, A* Board C, A* Board D, A*Board A, A Board B, A Board C, A Board D, ABoard A, B Board B, B Board C, B Board D, BBoard A, C Board B, C Board C, C Board D, CBoard A, D Board B, D Board C, D Board D, DBoard A, E Board B, E Board C, E Board D, EBoard A, F Board B, F Board C, F Board E, F
Potential impact of aligning standards between exam boards: Changes in grade outcomes based on Rasch analysis (changing grade boundaries)
-12.00-11.00-10.00-9.00-8.00-7.00-6.00-5.00-4.00-3.00-2.00-1.000.001.002.003.004.005.006.007.00
Engl
ish
Engl
ish
Lit
Mat
hs
Mat
hs M
eth
Bio
logy
Mat
hs A
pp
Phys
ics
Che
mis
try
Scie
nce
Add
Sci
ence
Geo
grap
hy
His
tory
Span
ish
Fren
ch
Ger
man
Fur A
ddSc
ienc
e
Per
cent
age
chan
ge b
ased
on
Ras
ch
diff
icul
ty (%
)
Board A, A* Board B, A* Board C, A* Board D, A*Board A, A Board B, A Board C, A Board D, ABoard A, B Board B, B Board C, B Board D, BBoard A, C Board B, C Board C, C Board D, CBoard A, D Board B, D Board C, D Board D, DBoard A, E Board B, E Board C, E Board D, EBoard A, F Board B, F Board C, F Board D, F
Changes in grade outcomes after aligning standards between exam boards based on existing IBSS with mean GCSE score (no change in boundaries)
-12.00-11.00-10.00-9.00-8.00-7.00-6.00-5.00-4.00-3.00-2.00-1.000.001.002.003.004.005.006.00
Engl
ish
Engl
ish
Lit
Mat
hs
Mat
hs M
eth
Bio
logy
Mat
hs A
pp
Phys
ics
Che
mis
try
Scie
nce
Add
Sci
ence
Geo
grap
hy
His
tory
Span
ish
Fren
ch
Ger
man
Fur A
ddSc
ienc
e
Per
cent
age
chan
ge b
ased
on
inte
r-bo
ard
scre
enin
g w
ith
mea
n G
CSE
(%
)
Board A, A* Board B, A* Board C, A* Board D, A*Board A, A Board B, A Board C, A Board D, ABoard A, B Board B, B Board C, B Board D, BBoard A, C Board B, C Board C, C Board D, CBoard A, D Board B, D Board C, D Board D, DBoard A, E Board B, E Board C, E Board D, EBoard A, F Board B, F Board C, F Board D, FBoard A, G Board B, G Board C, G Board D, G
Comparison of changes in grade outcomes based on Rasch grade difficulty and those based on IBSS with mean GCSE score
-8.00-7.00-6.00-5.00-4.00-3.00-2.00-1.000.001.002.003.004.005.006.007.00
-7.00 -6.00 -5.00 -4.00 -3.00 -2.00 -1.00 0.00 1.00 2.00 3.00 4.00 5.00
Cha
nge
base
d on
bou
ndar
y ch
ange
s us
ing
Ras
ch d
iific
ulty
(%)
Change based on inter-board screening with mean GCSE (%)
Board A Board B Board C Board D
Changes in grade outcomes after aligning standards between exam boards based on matrix approach with Rasch ability instead of mean GCSE
-12.00-11.00-10.00-9.00-8.00-7.00-6.00-5.00-4.00-3.00-2.00-1.000.001.002.003.004.005.006.00
Engl
ish
Engl
ish L
it
Mat
hs
Mat
hs M
eth
Biol
ogy
Mat
hs A
pp
Phys
ics
Chem
istry
Scie
nce
Add
Sci
ence
Geo
grap
hy
His
tory
Span
ish
Fren
ch
Ger
man
Fur A
ddSc
ienc
e
Perc
enta
ge c
hang
e ba
sed
on m
atri
x ap
proa
ch w
ith R
asch
abi
lity
(%)
Board A, A* Board B, A* Board C, A*Board D, A* Board A, A Board B, ABoard C, A Board D, A Board A, BBoard B, B Board C, B Board D, BBoard A, C Board B, C Board C, CBoard D, C Board A, D Board B, DBoard C, D Board D, D Board A, EBoard B, E Board C, E Board D, EBoard A, F Board B, F Board C, FBoard D, F
Comparison of changes in grade outcomes based on IBSS with mean GCSE score and the matrix approach with Rasch ability
-6.00
-5.00
-4.00
-3.00
-2.00
-1.00
0.00
1.00
2.00
3.00
4.00
5.00
-6.00 -5.00 -4.00 -3.00 -2.00 -1.00 0.00 1.00 2.00 3.00 4.00 5.00
Cha
nge
base
d on
inte
r-bo
ard
scre
enin
g us
ing
mea
n G
CSE
(%)
Change based on the matrix approach with Rasch ability (%)
Board A Board B Board C Board D
GCSE Science: aligning standards based on IBSS with mean GCSE score and Rasch ability; effect of variability in subject difficulty and entry pattern (1)
-5.00
-4.00
-3.00
-2.00
-1.00
0.00
1.00
2.00
3.00
4.00
5.00
6.00
F f E e D d C c B b A a A* a*
Perc
enta
ge c
hang
e (%
), Sc
ienc
e
Board A Board B Board C Board D
F,E,D,C,B,A,A*: Rasch ability basedf,e,d,c,b,a,a*: mean GCSE based
GCSE Science: aligning standards based on IBSS with mean GCSE score and Rasch ability; effect of variability in subject difficulty and entry pattern (2)
“Value added” measure: AbilityiGCSEii xxv ,, -=
0
1
2
3
4
5
6
7
8
-9.00 -7.00 -5.00 -3.00 -1.00 1.00 3.00 5.00 7.00 9.00
Mea
n G
CSE
sco
re, S
cien
ce
Rasch ability (logits)
BoardA BoardB BoardC BoardD
GCSE Science: aligning standards based on IBSS with mean GCSE score and Rasch ability; effect of variability in subject difficulty and entry pattern (3)
-0.60
-0.50
-0.40
-0.30
-0.20
-0.10
0.00
0.10
0.20
0.30
F E D C B A A*
Diff
eren
ce b
etw
een
stan
dard
ised
mea
n G
CSE
sco
re a
nd s
tand
ardi
sed
abili
ty
All boardsBoard ABoard BBoard CBoard D
GCSE Science: aligning standards based on IBSS with mean GCSE score and Rasch ability; effect of variability in subject difficulty and entry pattern (4)
0.00
0.50
1.00
1.50
2.00
2.50
3.00
3.50
4.00
4.50
5.00
All boards Board A Board B Board C Board D
Ave
rgae
num
ber
of su
bjec
ts ta
ken
GCSE Science: aligning standards based on IBSS with mean GCSE score and Rasch ability; effect of variability in subject difficulty and entry pattern (5)■ When the total number of subjects taken by students from a board was small,
the subject itself would make a substantial contribution to the overall mean
GCSE score. And science was a relatively easy subject. These factors may
partly explain the higher value added estimated for students from Board D and
the differences in estimated changes in grade outcomes between IBSS with
mean GCSE score and IBSS with Rasch ability
■ The use of mean GCSE score as a performance measure with the existing
inter-board statistical screening procedure would not be able to take into
account the different difficulties of the subjects and the variability in entries
between the exam boards
Conclusions
■ For most of the grades across the 16 subjects studied, the effect of DCF was small or moderate. Comparability of standards in the exams between the exam boards for the higher grades was found to be higher than that for the lower grades
■ Changes in grade outcomes after aligning standards between the exam boards based on Rasch grade difficulty and those estimated using existing inter-board statistical screening procedure with mean GCSE score were broadly consistent
■ The use of Rasch ability as a performance measure with the existing IBSS procedure produced results which were closely similar to those produced using the mean GCSE score. However, since the ability measure takes into account the difference in difficulty between subjects, its use with the existing inter-board screening procedure is likely to produce fairer results than the use of mean GCSE score