items analysis

72
Items analysis 1

Upload: levi-medina

Post on 31-Dec-2015

29 views

Category:

Documents


0 download

DESCRIPTION

Items analysis. 1. Introduction. Items can adopt different formats and assess cognitive variables (skills, performance, etc.) where there are right and wrong answers, or non-cognitive variables (attitudes, interests, values, etc.) where there are no right answers. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Items analysis

Items analysis

1

Page 2: Items analysis

1. Introduction

• Items can adopt different formats and assess cognitive variables (skills, performance, etc.) where there are right and wrong answers, or non-cognitive variables (attitudes, interests, values, etc.) where there are no right answers.

• The statistics that we present are used primarily with skills or performance items.

2

Page 3: Items analysis

• To carry out the analysis of the items should be available :– A data matrix with the subjects' responses to each items.

• To analyze test scores and the responses to the correct alternative, the matrix will take the form of ones (hits) and zeros (mistakes).

• To analyze incorrect alternatives, in the matrix should appear specific options selected by each subject.

• The correct alternative analysis (which is offered more information about the quality of the test) allow us to obtain the index of difficulty, discrimination, and reliability and validity of the item.

1. Introduction

3

Page 4: Items analysis

• Empirical difficulty of an item: proportion of subjects who answer it correctly.

• Discriminative power: the ability of the item to distinguish subjects with different level in the trait measured.

• Both statistics are directly related to the mean and variance of total test scores.

• The reliability and validity of the items are related to the standard deviation of the test and indicate the possible contribution of each item to the reliability and validity of total scores of the test.

1. Introduction

4

Page 5: Items analysis

• Proportion of subjects who have responded correctly to the item:– One of the most popular indices to quantify the difficulty of the items

dichotomous or dichotomized.

• The difficulty thus considered is relative because it depends on:– Number of people who attempt to answer the item.– Their characteristics.

A: number of subjects who hits the item.N: number of subjects that attempt to respond the item.

• It ranges between 0 and 1.

– 0: No subject has hit the item. It is difficult.– 1: All subjects have answered correctly to the item. It

is easy.

AID

N

2. Items difficulty

5

Page 6: Items analysis

2. Items difficulty

• Example: A performance item in mathematics is applied to 10 subjects, with the following result :

• The obtained value does not indicate whether the item is good or bad. It represents how hard it has been to the sample of subjects who have attempted to answer it.

Subject a b c d e f g h i j

Answers 1 1 1 1 0 1 0 1 1 0

70.70

10ID

6

Page 7: Items analysis

2. Items difficulty

• The ID is directly related to the mean and variance of the test. In dichotomous items:

– The sum of all scores obtained by subjects in this item is equal to the number of hits. Therefore, the item difficulty index is equal to its mean.

• If we generalize to the total test, the average of the test scores is equal to the sum of the difficulty indices of items.

1

1 or 0 according to success or failure in the item

n

jj

j

X

IDN

X

1

(hits)n

jj

X A

7

Page 8: Items analysis

2. Items difficulty

• The relationship between difficulty and variance of the test is also direct. In dichotomous items:

• In item analysis, a relevant question is to find the value of pj that maximizes the variance of items.– Maximum variance is achieved by an item when its pj is 0.5

– An item is appropriate when it is answered by different subjects and causes in them different answers.

2

proportion of subjects that answer correctly to the item; the ID.

1-

j j j

j

j j

S p q

p

q p

8

Page 9: Items analysis

2. Items difficulty2.1. Correction of hits by chance

• The fact of hitting an item depends not only on the subjects know the answer, but also of subjets’ luck who without know it they choose the correct answer.

• The higher the number of distractors less likely that subjects hit the item randomly.

• It is advisable to correct the ID:

11

corrected ID

hits

E= mistakes

p= proportion of hits

q= proportion of mistakes

k= number of item alternatives

N= número de sujetos que intentan responder el ítem

c

c

EA qKID pN N K

ID

A

9

(negative values can be found)

Page 10: Items analysis

2. Items difficulty2.1. Correction of hits by chance

Subjects Item 1 Item 2 Item 3 Item 4 Item 5

A 1 1 1 1 1

B 1 0 1 0 1

C 1 1 0 1 0

D 1 0 0 1 0

E 0 1 0 1 1

F 1 0 0 1 0

G 0 1 1 1 0

H 1 0 0 1 0

I 1 1 0 0 0

J 0 0 0 1 1

ID

IDc

Example. Test composed by items with 3 alternativesCalculate ID and IDc to each item

10

Page 11: Items analysis

2. Items difficulty2.1. Correction of hits by chance

Subjects Item 1 Item 2 Item 3 Item 4 Item 5

A 1 1 1 1 1

B 1 0 1 0 1

C 1 1 0 1 0

D 1 0 0 1 0

E 0 1 0 1 1

F 1 0 0 1 0

G 0 1 1 1 0

H 1 0 0 1 0

I 1 1 0 0 0

J 0 0 0 1 1

ID 0.7 0.5 0.3 0.8 0.4

IDc 0.55 0.25 -0.05 0.7 0.1

Items which have suffered a major correction are those that have proved more difficult.

11

Page 12: Items analysis

2. Items difficulty2.1. Correction of hits by chance

Recommendations• That items with extreme values to the population that

they are targeted in the index of difficulty are eliminated from the final test.

• In aptitude test we will get better psychometric results if the majority of items are of medium difficulty.– Easy items should be included, preferably at the beginning (to

measure less competent subjects).– And difficult items too (to measure less competent subjects).

12

Page 13: Items analysis

3. Discrimination

• Logic: Given an item, subjects with good test scores have been successful in a higher proportion than those with low scores.

• If an item is not useful to differentiate between subjects based on their skill level (does not discriminate between subjects) it should be deleted.

13

Page 14: Items analysis

3. Discrimination3.1. Index of discrimination based on extreme

groups (D)• It is based on the proportions of hits among skill

extreme groups (25 or 27% above and below from the total sample).– The top 25 or 27% would be formed by subjects who scored

above the 75 or 73 percentile in the total test.

• Once formed the groups we have to calculate the proportion of correct answers to a given item in both groups and then apply the following equation:

proportion of hits in the higher group

proportion of hits in the lower group

s i

s

i

D p p

p

p

14

Page 15: Items analysis

3. Discrimination3.1. Index of discrimination based on extreme

groups (D)

• D index ranges between -1 and 1.– 1= when all people in the upper group hit the item

and those people from the lower group fail it.– 0= item is equally hit in both groups.– Negative values= less competent subjects hit the item

more than the most competent subjects (the item confuses to more skilled subjects).

15

Page 16: Items analysis

Example. Answers given by 370 subjects at 3 alternatives (A, B, C) of an item where B is the correct option. In rows are the frequency of subjects who selected each alternative and have received scores above and below the 27% of their sample in the total test, and the group formed by the central 46%.

Calculate the corrected index of difficulty and the discrimination index.

A B* C

Upper 27% 19 53 28

Intermediete 46% 52 70 48

Lower 27% 65 19 16

16

3. Discrimination3.1. Index of discrimination based on extreme

groups (D)

Page 17: Items analysis

• Proportion of correct answers:(53+70+19)/370=0.38

• Proportion of mistakes:228/370=0.62

0.620.38 0.07

1 3 1c

qID p

k

53 19 53 19

0.3419 53 28 65 19 16 100s iD p p

17

3. Discrimination3.1. Index of discrimination based on extreme

groups (D)

Page 18: Items analysis

Interpretation of D values (Ebel, 1965)

Values Interpretation

D ≥ 0.40 The item discriminates very well

0.30 ≤ D ≤ 0.39 The item discriminates well

0.20 ≤ D ≤ 0.29 The item discriminates slightly

0.10 ≤ D ≤ 0.19 The item needs revision

D < 0.10 The item is useless

18

3. Discrimination3.1. Index of discrimination based on extreme

groups (D)

Page 19: Items analysis

3. Discrimination3.2. Indices of discrimination based

on the correlation• If an item discriminate adequately, the correlation

between the scores obtained by subjects in that item and the ones obtained in the total test will be positive.– Subjects who score high on the test are more likely to hit the

item.

• Def.: correlation between subjects scores in the item and their scores in the test (Muñiz, 2003).

• The total score of subjects in the test will be calculated discounting the item score.

19

Page 20: Items analysis

3. Discrimination3.2. Indices of discrimination based on the correlation

3.2.1. Correlation coefficient Φ

• When test score and item scores are strictly dichotomous.

• It allow us to estimate the discrimination of an item with some criterion of interest (eg. fit and unfit, sex, etc.).

• First, we have to sort data in a 2x2 contingency table.– 1= item is hit/criterion is exceeded.– 0= item is fail/criterion is not exceeded.

xy x y

x x y y

p p p

p q p q

20

Page 21: Items analysis

Example. The following table shows the sorted results from 50 subjects who took the last psychometrics exam.

Item 5 (X)

1 0

Criterion (Y) Fit pxy

30/50=0.65 py

35/50=0.7

Not fit 5 10 qy

15/50=0.3

px

35/50=0.7qx

15/30=0.3N=50

21

3. Discrimination3.2. Indices of discrimination based on the correlation

3.2.1. Correlation coefficient Φ

Page 22: Items analysis

• There is a high correlation between the item and the criterion. That is, those subjects who hit the item usually pass the psychometrics exam.

0.6 0.7*0.70.52

0.7*0.3*0.7*0.3xy x y

x x y y

p p p

p q p q

22

3. Discrimination3.2. Indices of discrimination based on the correlation

3.2.1. Correlation coefficient Φ

Page 23: Items analysis

3. Discrimination3.2. Indices of discrimination based on the correlation

3.2.2. Point-biserial correlation• When the item is a dichotomous variable and the test

score is continuous.

= Mean in the test of participants that answered the item correctly. = Mean of the test. SX = Standard deviation of the test.

p = Proportion of participants that answered the item correctly. q = proportion of participants that answered the item incorrectly.

• Remove the item score from the test score.23

q

p

S

XXr

X

T

pb

1

1X

TX

Page 24: Items analysis

Example. The following table shows the responses of 5 subjects to 4 items. Calculate the point-biserial correlation of the second item.

Items

Participants 1 2 3 4

A 0 1 0 1

B 1 1 0 1

C 1 1 1 1

D 0 0 0 1

E 1 1 1 0

24

3. Discrimination3.2. Indices of discrimination based on the correlation

3.2.2. Point-biserial correlation

Page 25: Items analysis

Items Total

Participants 1 2 3 4 X (X-i) (X-i)2

A 0 1 0 1 2 1 1

B 1 1 0 1 3 2 4

C 1 1 1 1 4 3 9

D 0 0 0 1 1 1 1

E 1 1 1 0 3 2 4

∑ 9 19

25

3. Discrimination3.2. Indices of discrimination based on the correlation

3.2.2. Point-biserial correlation

Page 26: Items analysis

• Participants who answered correctly the item are A, B, C and E; so their mean is:

• The total mean is:

• The standard deviation of the test is:

26

3. Discrimination3.2. Indices of discrimination based on the correlation

3.2.2. Point-biserial correlation

24

23211

X

8.15

9TX

75.056.08.15

19 222

XN

XSX

Page 27: Items analysis

27

3. Discrimination3.2. Indices of discrimination based on the correlation

3.2.2. Point-biserial correlation

8.05

4p

2.05

1q

54.02.0

8.0

75.0

8.121

q

p

S

XXr

X

T

pb

Page 28: Items analysis

• When both item and test score are inherently continuous variables, although one is dichotomized (the item).

y = height in the normal curve corresponding to the typical score that leaves beneath a probability equal to p (see table).

• We can find values greater than 1, especially when one of the variables is not normal.

• Example. Based on the table of the previous example, calculate the biserial correlation of item 3. 28

3. Discrimination3.2. Indices of discrimination based on the correlation

3.2.3. Biserial correlation

y

p

S

XXr

X

T

b

1

Page 29: Items analysis

Items Total

Participants 1 2 3 4 X (X-i) (X-i)2

A 0 1 0 1 2 2 4

B 1 1 0 1 3 3 9

C 1 1 1 1 4 3 9

D 0 0 0 1 1 1 1

E 1 1 1 0 3 2 4

∑ 11 27

29

3. Discrimination3.2. Indices of discrimination based on the correlation

3.2.3. Biserial correlation

Page 30: Items analysis

• Participants who answered correctly the item are C and E; so their mean is:

• The total mean is:

• The standard deviation of the test is:

30

3. Discrimination3.2. Indices of discrimination based on the correlation

3.2.3. Biserial correlation

5.22

231

X

2.25

11TX

75.056.084.44.52.25

27 222

XN

XSX

Page 31: Items analysis

31

3. Discrimination3.2. Indices of discrimination based on the correlation

3.2.3. Biserial correlation

4.05

2p

41.002.1*4.03863.0

4.0

75.0

2.25.21

q

p

S

XXr

X

T

pb

Because the value p=0.4 does not appear in the first column of the table, we look for its complement have to be looked up (0.6), which is associated with an y=0.3863.

Page 32: Items analysis

3. Discrimination3.3. Discrimination in attitude items

• There are no right or wrong answers but the participant must be placed in the continuum established based on the degree of the measured attribute.

• Correlation between item scores and test scores.– Because items are not dichotomous, Pearson

correlation coefficient is used.• That coefficient can be interpreted as a Homogeneity Index

(HI). It indicates how much the item is measuring the same dimension or attitude as the rest of the items of the scale.

32

Page 33: Items analysis

• Items in which HI is below 0.20 should be eliminated.• Correction: deduct from the total score the item score or apply

the formula below:

2 2 2 2

( )

[ ( ) ][ ( ) ]

sample size

sum of the subjects scores in J element

sum of the subjects scores in the scale

correlation between scores obtained by subject

jxj x

jx

N JX J X COV JXR

S SN J J N X X

N

J

X

R

s

in J element and in the scale

( ) 2 2 2

jx x jj x j

x j jx x j

R S SR

S S R S S

33

3. Discrimination3.3. Discrimination in attitude items

Page 34: Items analysis

Example. The table below presents the answers of 5 people to 4 attitudes items. Calculate the discrimination of item 4 by Pearson correlation.

Items Total XT X4XT X24 X2

T

Subjects X1 X2 X3 X4

A 2 4 4 3 13 39 9 169

B 3 4 3 5 15 75 25 225

C 5 2 4 3 14 42 9 196

D 3 5 2 4 14 56 16 196

E 4 5 2 5 16 80 25 256

20 72 292 84 104234

3. Discrimination3.3. Discrimination in attitude items

Page 35: Items analysis

• The correlation or IH between item 4 and total score of the test will be:

• Inflated result because item 4 score is included in total score. Correction:– Standard deviation both for item 4 and total score:

2 2 2 2 2 2

5*292 20*720.88

[ ( ) ][ ( ) ] [5*84 20 ][5*1042 72 ]

jx

N JX J XR

N J J N X X

2 2 2 2 22

4

2 2 2 2 22

3 5 3 4 5(4) 0.80 0.89

5

13 15 14 14 16(14.4) 1.04 1.02

5

x

xT

S

S

( ) 2 2

0.88*1.02 0.890.01

1.04 0.80 2*0.88*1.02*0.892

jx x jj x j

x j jx x j

R S SR

S S R S S

35

3. Discrimination3.3. Discrimination in attitude items

Page 36: Items analysis

• The big difference when applying the correction is due to the small number of items that we have used in the example.– As the number of items increases, that effect

decreases because the influence of item scores on the total score is getting smaller. With more than 25 items, the result is very close.

36

3. Discrimination3.3. Discrimination in attitude items

Page 37: Items analysis

• Other procedure:– Useful but less efficient than the previous because it does not use the entire sample.– Determine whether the item mean for the subjects with higher scores on the total

test is statistically higher than the mean of those with lower scores. It is common to use 25% or 27% of subjects with best and worst scores.

– Once the groups are identified, we calculate if the mean difference is statistically significant by Student T test.

– Ho: means in both groups are equal.

37

3. Discrimination3.3. Discrimination in attitude items

2

)1()1( 22

lu

ljluju

ljuj

nn

SnSn

XXT

Page 38: Items analysis

= mean of the scores obtained in the item by the 25% of the participants that obtained the highest scores in the test.

= mean of the scores obtained in the item by the 25% of the participants that obtained the lowest scores in the test.

= variance of the scores obtained in the item by the 25% of the participants that obtained the highest scores in the test.

= variance of the scores obtained in the item by the 25% of the participants that obtained the lowest scores in the test.

nu and nl = number of participants in the upper and the lower group respectively.

– Conclusions:• T≤T(α,nu+nl-2) – Null hypothesis is accepted. There are not statistical differences

between means. The item does not discriminate adequately.• T≤T(α,nu+nl-2) – Null hypothesis is rejected. There are statistical differences between

means. The item discriminates adequately.– Student T test is used when the scores in the item and the scale are distributed

normally, and their variances are equal. If some of these assumptions are violated, a non-parametric test should be used (e.g., Mann-Whitney U).

38

3. Discrimination3.3. Discrimination in attitude items

2ujS

ujX

ljX

2ljS

Page 39: Items analysis

Exercise: using the data presented in the last example, calculate Student T test for item 2 (α=0.05).

• To calculate the discrimination of item 2 by Student T Test, we have to do groups with extreme scores. Because of didactic reasons, we are going to use just 2 participants to form those groups.

Participants X2

Upper group E (16) 5

B (15) 4

Participants X2

Lower group A (13) 4

C (14) 2

39

3. Discrimination3.3. Discrimination in attitude items

Page 40: Items analysis

Participants X2

Upper group E (16) 5 25

B (15) 4 16

∑ 9 41

Participants X2

Lower group A (13) 4 16

C (14) 2 4

∑ 6 20

40

3. Discrimination3.3. Discrimination in attitude items

22X

22X

32

6

5.42

9

l

ljlj

u

ujuj

n

XX

n

XX

Page 41: Items analysis

41

3. Discrimination3.3. Discrimination in attitude items

9.1

21

21

2221)12(25.0)12(

35.4

2

)1()1( 22

lu

ljluju

ljuj

nn

SnSn

XXT

One tail: T(α,nu+nl-2) = T(0.05, 2+2-2) = T(0.05, 2) = 2.92

1.9 < 2.92 – Null hypothesis is accepted. There are not statistical differences between means. The item does not discriminate adequately.

191032

20

25.025.205.205.42

41

222

2

222

2

lj

l

ljlj

uj

u

ujuj

Xn

XS

Xn

XS

Page 42: Items analysis

• Relation between test variability and item discrimination:

• If the test is composed by dichotomous items:

• To maximize the discriminative ability of one test, we have to consider together both the difficulty (pj) and the discrimination (rjx) of its items.– It is achieved when discrimination is maximun (rjx=1) and the difficulty is

medium (pj=0.5).

1

Standard deviation of the test

Standard deviation of the item

Discrimination index of item j

n

x j jxj

x

j

jx

S S r

S

S

r

2 2 2

1 1

;n n

x j j jx x j j jxj j

S p q r S p q r

42

3. Discrimination3.4. Factors that affect the discrimination

3.4.1. Variability

Page 43: Items analysis

An item reaches its maximum discriminative power when its difficulty is medium.

43

3. Discrimination3.4. Factors that affect the discrimination

3.4.2. Item difficultyIt

em d

iscr

imin

atio

n

Item difficulty

Page 44: Items analysis

• When we are constructing a test, usually we try to measure one single construct (unidimensionality).

• In multidimensional tests, item discrimination should be estimated considering only the items that are associated with each dimension.

44

3. Discrimination3.4. Factors that affect the discrimination

3.4.3. Dimensionality of the test

Page 45: Items analysis

• If discrimination is defined as the correlation between scores obtained by participants in the item and the test, then reliability and discrimination are closely related.

• It is posible to express the Cronbach alpha coefficient from the discrimination of items:

• Small values in item discrimination are typically associated with unreliable tests.

2 2

1 1

22

1

1 11 1

n n

j jj j

nx

j jxj

S Sn n

n S nS r

45

3. Discrimination3.4. Factors that affect the discrimination

3.4.4. Test reliability

Page 46: Items analysis

As the mean discrimination of the test increases, so does the reliability coefficient.

46

3. Discrimination3.4. Factors that affect the discrimination

3.4.4. Test reliabilityR

elia

bilit

y co

effic

ient

KR

21

Mean discrimination

Page 47: Items analysis

4. Indices of reliability and validity of the items4.1. Reliability index

• To quantify the degree in which an item is measuring accurately the attribute of interest.

Sj = Standard deviation of the scores in the item.

Dj = Discrimination index of the item.

• When any correlation coefficient is used to calculate the discrimination of items,

47

jjDSRI

jxjrSRI 22XSRI pqS j 2

Page 48: Items analysis

4. Indices of reliability and validity of the items4.1. Reliability index

• To the extent that we select items with higher RI, the better the reliability of the test will be.

• Highest possible value of RI = 1.

• Example: Having the information presented in the table below, calculate the RI of item 4.

48

p rbp

Item 4 0.47 0.5

Page 49: Items analysis

4. Indices of reliability and validity of the items4.1. Reliability index

49

5.025.0

53.047.011

25.053.0*47.0

25.05.0*5.0

2

2

jj

j

jxj

SS

pq

pqS

rSRI

Page 50: Items analysis

• The validity of an item involves the correlation of the scores obtained by a sample of participants in the item with the scores obtained by the same subjects in any external criterion of our interest.– It serves to determine the degree in which each item of one test

contributes successfully to make predictions about that external criterion.

• In the case that the criterion is a continuous variable and the item is a dichotomous variable, we are going to use the point- biserial correlation; but it is not necessary to substract from the total score of the external criterion the item score because it is not included.

50

4. Indices of reliability and validity of the items4.2. Validity index

pbjyjrSVI

jyjrSVI

Page 51: Items analysis

• Test validity (rxy) can be expressed in connexion with the VI of the items. The higher VI of the items are, the more optimized the validity of the test will be.

• This formula allows us to see how the validity of the test can be estimated from the discrimination index of each item (rjx), their validity indexes (rjy) and their difficulty indexes ( ).

51

4. Indices of reliability and validity of the items4.2. Validity index

RI

VI

rS

rSr

jxj

jyjxy

jjj qpS 2

Page 52: Items analysis

• Paradox in the selection of items: if we want to select items to maximize the reliability of the test we have to choose those items with a high discrimination index (rjx), but this would lead us to reduce the validity of the test (rxy) because it increases as validity indexes (VI) are high and reliability indexes (RI) are low.

52

4. Indices of reliability and validity of the items4.2. Validity index

Page 53: Items analysis

Example. The table below presents the scores of 5 participants in a test with 3 items.

Calculate the validity index of the test (rxy).53

4. Indices of reliability and validity of the items4.2. Validity index

Participants Item 1 Item 2 Item 3

A 0 0 1

B 1 1 1

C 1 0 0

D 1 1 1

E 1 1 1

rjy 0.2 0.4 0.6

Page 54: Items analysis

54

4. Indices of reliability and validity of the items4.2. Validity index

4.016.016.02.0*8.05

1*

5

4

49.024.024.04.0*6.05

2*

5

3

4.016.016.02.0*8.05

1*

5

4

75.0685.0

516.0

25.0*4.099.0*49.025.0*4.0

6.0*4.04.0*49.02.0*4.0

323

222

12

1

22

SS

SS

SS

SSqpS

rS

rSr

jjjjj

jxj

jyjxy

Page 55: Items analysis

55

4. Indices of reliability and validity of the items4.2. Validity index

1 2 3 X (X-it1) (X-it2) (X-it3) (X-it1)2 (X-it2)2 (X-it3)2

A 0 0 1 1 1 1 0 1 1 0

B 1 1 1 3 2 2 2 4 4 4

C 1 0 0 1 0 1 1 0 1 1

D 1 1 1 3 2 2 2 4 4 4

E 1 1 1 3 2 2 2 4 4 4

rjy 0.2 0.4 0.6 Σ=7 Σ=8 Σ=7 Σ=13 Σ=14 Σ=13

Page 56: Items analysis

56

4. Indices of reliability and validity of the items4.2. Validity index

2.08.011

8.05

4

8.064.096.16.24.15

13

4.15

7

5.14

6

4

2202

25.02*125.048.0

1.0

2.0

8.0

8.0

4.15.1

2

1

1

1

pq

p

S

X

X

q

p

S

XXr

X

T

X

T

pb

Page 57: Items analysis

57

4. Indices of reliability and validity of the items4.2. Validity index

4.06.011

6.05

3

49.024.056.28.26.15

14

6.15

8

23

222

99.05.149.0

4.0

4.0

6.0

49.0

6.12

2

1

1

2

pq

p

S

X

X

q

p

S

XXr

X

T

X

T

pb

Page 58: Items analysis

58

4. Indices of reliability and validity of the items4.2. Validity index

2.08.011

8.05

4

8.064.096.16.24.15

13

4.15

7

5.14

6

4

2220

25.02*125.048.0

1.0

2.0

8.0

8.0

4.15.1

2

1

1

3

pq

p

S

X

X

q

p

S

XXr

X

T

X

T

pb

Page 59: Items analysis

5. Analysis of distractors• It involves investigating in the distribution of subjects across the

wrong alternatives (distractors), in order to detect possible reasons for the low discrimination of any item or see that some alternatives are not selected by anyone, for example.

• In this analysis, the first step implies:– To check that all the incorrect options are chosen by a minimum number

of subjects. If possible, they should be equiprobable.• Criteria: each distractor have to be selected by at least the 10% of the sample

and there is not many difference between them.

– That performance on the test of subjects who have selected each incorrect alternative is less than the performance of subjects that have selected the correct one.

– It is expected that as the skill level of subjects increases, the percentage of those who select incorrect alternatives decrease and vice versa.

59

Page 60: Items analysis

5. Analysis of distractors5.1. Equiprobability of distractors

• Distractors are equiprobable if they are selected by a minimum of participants and if they are equally attractive to those who do not know the correct answer.

• Χ2 Test:

Ei = Expected (theoretical) frequency.

Oi = Observed frequency.

60

k

j i

ii

E

OE

1

22

Page 61: Items analysis

5. Analysis of distractors5.1. Equiprobability of distractors

• Degrees of freedom: K -1 (K = number of incorrect alternatives).

• Ho: Ei = Oi (in the participants that do not know the correct answer, the election of any distractor is equally attractive).

• Conclusion: – → The null hypothesis is accepted. The distractors are equally

attractive.– → The null hypothesis is rejected. The distractors are not equally

attractive.

61

2)1,(

2 kO

2)1,(

2 kO

Page 62: Items analysis

Example. Determine if the incorrect alternatives are equally attractive (α=0.05).

A B* C

Number of answers 136 142 92

62

5. Analysis of distractors5.1. Equiprobability of distractors

Page 63: Items analysis

To be equiprobable, each distractor should be selected by 114 participants.

63

5. Analysis of distractors5.1. Equiprobability of distractors

1142

228

2

92136

49.8114

968

114

484484

114

2222

114

)92114()136114(

22

22

1

22

i

k

j i

ii

E

E

OE

Page 64: Items analysis

8.49>3.84 → The null hypothesis is rejected. Incorrect alternatives are not equally attractive to all subjects, although they met the criterion of being selected by a minimum of 10% of the total sample (N).

64

5. Analysis of distractors5.1. Equiprobability of distractors

84.32)1,05.0(

2)12,05.0(

2)1,( k

3792

37136

37100

10*370%10

37092142136

N

Page 65: Items analysis

5. Analysis of distractors5.2. Discriminative power of distractors

• It is expected from a good distractor that its correlation with test scores is negative.

• To quantify the discriminative power of incorrect alternatives we use the correlation. Depending on the kind of variable, we will use biserial, biserial-point, phi or Pearson.

65

Page 66: Items analysis

• Example. Answer of 5 subjects to 4 items. Brackets show the alternatives selected by each subject and the correct alternative with an asterisk. Calculate discrimination of distractor b in item 3.

Items Total

Subjects 1(a*) 2(b*) 3(a*) 4(c*) X (X-i)

A 0 (b) 1 0 (b) 1 2 2

B 1 1 0 (b) 1 3 3

C 1 1 1 1 4 3

D 0 (c) 0 (a) 0 (b) 1 1 1

e 1 1 1 0 (b) 3 2

66

5. Analysis of distractors5.2. Discriminative power of distractors

Page 67: Items analysis

• Subjets who have selected alternative b (the incorrect one) in item 3 have been A, B and D. The mean of these subjects in the test after eliminating the analyzed item score is:

• Total mean of the test resting from the scores obtained by subjects, the score of item 3:

• Standard deviación of scores corresponding to (X-i):

• The proportion of subjects that have hit the item is 2/5=0.4; and the proportion of subjects that have failed is 3/5=0.6.

2 3 12

3AX

2 3 3 1 22.2

5T iX

2 2 2 2 222 3 3 1 2

(2.2) 0.56 0.755T iS

67

5. Analysis of distractors5.2. Discriminative power of distractors

Page 68: Items analysis

• The point-biserial correlation between the incorrect alternative “b” and test scores discounting the item score is:

– As the incorrect alternative in the score of these score subjects in the item is 0, no need to remove anything from the total test.

• The distractor discriminates in the opposite direction than the correct alternative. It's a good distractor.

2 2.2 0.40.22

0.75 0.6A T i

bpX i

X X pr

S q

68

5. Analysis of distractors5.2. Discriminative power of distractors

Page 69: Items analysis

• Visual inspection of the distribution of subject answers to the various alternatives.

– Proportion of subjects that have selected each option: p– Test mean of subjects that have selected each alterative:

mean– Discrimination index of all options: rbp

A B C*

Skill level High 20 25 55

Low 40 35 25

Statistics p 0.28 0.5 0.22

Mean 5 10 9

rbp -0.20 0.18 0.29

69

5. Analysis of distractors5.2. Discriminative power of distractors

Page 70: Items analysis

• Positive discrimination index: the correct alternative is mostly chosen by competent subjects.

• Distractor A:– Has been selected by an acceptable minimum of

subjects (28%) and is selected by subjects less competent in a higher proportion.

– The test mean of subjects who have selected it is less than the test mean of subjects that have selected the correct alternative (consistent with its negative discrimination index). 70

5. Analysis of distractors5.2. Discriminative power of distractors

Page 71: Items analysis

• Distractor B should be revised:– It is chosen as correct by the subjects with better

scores in the test.– It has been the most selected (50%), its

discrimination is positive, and the mean of the subjects that have selected it is higher than one of subjects who have chosen the correct alternative.

71

5. Analysis of distractors5.2. Discriminative power of distractors

Page 72: Items analysis

• In distractors analysis we still can go further and use statistical inference. The test mean of subjects that choose the correct alternative should be higher than the test mean of subjects that have chosen each distractor.– ANOVA. IV or factor: each item with as many levels

as answer alternatives. DV: the raw score of subjects in the test.

72

5. Analysis of distractors5.2. Discriminative power of distractors