
Cognitive engineering for learning environments

M.C. Desmarais

Polytechnique Montreal

Cognitive Informatics (Informatique cognitive), UQAM, June 10, 2015



Cognitive engineering

Knowledge structures

Mapping items to latent skills

Representativeness of a model

Conclusion



Problem statement



Knowledge diagnosis



Exerciser (practice environment)



Mapping items to skills: Example 1

Problem VEUPS1a: Compute cosine of CAB. 9% success rate (N=246)

Problem VEUPS3: Compute cosine of CAD. 54% success rate (N=675)




Problem GPCER2a: Compute area. 44% success rate (N=281)

Problem GPCER2b: Compute area. 79% success rate (N=841)










Skills modeling (study 1)

Objectives of study 1:

• Determine the best-performing skills diagnosis model

• Compare an approach based on latent traits with an approach based on observable features only



A Bayesian Network example

BN example from Vomlel (2004)

[Figure: Bayesian network with a layer of latent nodes above a layer of item nodes. Latent nodes: ACMI, ACIM, ACD, CL, MT, CIM, AD, CD, CMI, ACL, SB, CP, HV1, MMT1 to MMT4, MAD, MSB, MC. Item nodes: X1 to X20.]



Network topologies

[Figure: network topologies, with a layer of latent nodes above a layer of item nodes]



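Graphical representation of an IRT model

[Figure: a single latent node θ pointing to the item nodes X1, X2, ..., Xn]

• IRT: a single node (dimension/skill) to predict the outcome of items X1, X2, ..., Xn.

• Logistic function determines probability of success:

$$P(X_i \mid \theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}$$

• Estimation of ability based on:

$$\hat{\theta} = \arg\max_{\theta} P(\theta \mid X), \qquad P(\theta \mid X) = P(\theta \mid X_1, X_2, \ldots, X_n) \propto \prod_{i=1}^{n} P(X_i \mid \theta)$$

(the proportionality holds under a uniform prior on θ)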
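The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the estimation procedure used in the study: the item parameters are hypothetical, and the grid search over θ (bounds and step count included) is an assumption chosen for simplicity.

```python
import math

def p_correct(theta, a, b):
    """Logistic item response function: P(X_i = 1 | theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses, items):
    """Sum of log P(X_i | theta) over the observed responses."""
    total = 0.0
    for x, (a, b) in zip(responses, items):
        p = p_correct(theta, a, b)
        total += math.log(p if x == 1 else 1.0 - p)
    return total

def estimate_theta(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Grid-search estimate of theta (assumption: flat prior, bounded grid)."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))

# Hypothetical item parameters: (discrimination a_i, difficulty b_i).
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]
theta_hat = estimate_theta([1, 1, 0], items)
```

With all items answered correctly the likelihood keeps increasing with θ, so the estimate sits at the upper grid bound; a prior on θ is the usual way to avoid that.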






Item to item approach

[Figure: item-only network in which the target node Xk is linked to the other item nodes X1, X2, ..., Xn]

• One network for each observable node

• Naive Bayes and simple posterior probability

• $\hat{X}_k = \arg\max_{X_k \in \{0,1\}} P(X_k \mid X)$, where $P(X_k \mid X) \propto P(X_k) \prod_{X_i \in X} P(X_i \mid X_k)$

• Conditional probabilities replace the logistic function of IRT. They are directly obtained from frequency tables since all nodes are observable.
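Since every node is observable, the conditional probabilities reduce to counting. The sketch below illustrates this on toy data; the data, the function names, and the add-one (Laplace) smoothing are assumptions of the example, not details taken from the study.

```python
def train_naive_bayes(data, target, predictors):
    """Estimate P(X_k) and P(X_i = 1 | X_k) from a 0/1 response table.

    `data` holds one row per respondent, indexed by item.
    Add-one smoothing is an assumption of this sketch.
    """
    counts = {0: 0, 1: 0}
    ones = {(i, v): 0 for i in predictors for v in (0, 1)}
    for row in data:
        v = row[target]
        counts[v] += 1
        for i in predictors:
            ones[(i, v)] += row[i]  # successes on item i when X_k = v
    n = len(data)
    prior = {v: (counts[v] + 1) / (n + 2) for v in (0, 1)}
    p_one = {key: (ones[key] + 1) / (counts[key[1]] + 2) for key in ones}
    return prior, p_one

def predict(prior, p_one, observed):
    """Return P(X_k = 1 | observed); `observed` maps item index to 0/1."""
    score = {}
    for v in (0, 1):
        s = prior[v]
        for i, x in observed.items():
            p = p_one[(i, v)]
            s *= p if x == 1 else 1.0 - p
        score[v] = s
    return score[1] / (score[0] + score[1])

# Toy data (hypothetical): columns are items 0, 1, 2; item 2 is the target X_k.
data = [[1, 1, 1], [1, 0, 1], [0, 0, 0], [0, 1, 0], [1, 1, 1], [0, 0, 0]]
prior, p_one = train_naive_bayes(data, target=2, predictors=[0, 1])
p_success = predict(prior, p_one, {0: 1, 1: 1})
```

Predicting success on the target then amounts to checking whether `p_success` exceeds 0.5.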



TAN: Tree-Augmented Network

[Figure: item network in which the target node Xk is linked to the leaf nodes X1, X2, X3, X4, with a tree structure over the leaf nodes]

• A Naive Bayesian Network with a tree structure over leaf nodes.

• Each leaf node can have at most two parents: Xk and some other leaf node.

• Follows the usual Bayesian Network semantics: $P(X) = \prod_{X_i \in X} P(X_i \mid X_{pa(X_i)})$
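The factorization can be made concrete with a small hand-built network. The structure and probability values below are hypothetical and only illustrate how the joint probability is assembled from one conditional table per node.

```python
def joint_probability(assignment, parents, cpt):
    """P(X) as the product of P(X_i | parents of X_i).

    assignment: node -> 0/1
    parents:    node -> tuple of parent nodes (empty for the root)
    cpt:        node -> {tuple of parent values: P(node = 1 | parents)}
    """
    prob = 1.0
    for node, value in assignment.items():
        parent_values = tuple(assignment[p] for p in parents[node])
        p_one = cpt[node][parent_values]
        prob *= p_one if value == 1 else 1.0 - p_one
    return prob

# Hypothetical TAN: root Xk, leaves X1 and X2, plus a tree edge X1 -> X2,
# so X2 has two parents (Xk and X1).
parents = {"Xk": (), "X1": ("Xk",), "X2": ("Xk", "X1")}
cpt = {
    "Xk": {(): 0.6},
    "X1": {(0,): 0.2, (1,): 0.8},
    "X2": {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.5, (1, 1): 0.9},
}
p = joint_probability({"Xk": 1, "X1": 1, "X2": 1}, parents, cpt)
```

Dropping the X1 -> X2 edge (so that X2 depends on Xk only) turns the same code into the Naive Bayes case.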



Performance comparison

• 1 model for single latent trait
  • IRT: Item Response Theory

• 3 models for item to item
  • NB: Naive Bayes
  • TAN: Tree Augmented Network
  • BNC: Bayesian Network Classifier, variant of TAN with K2 algorithm



Simulation methodology

The simulation consists of providing a subset of observed nodes and predicting the outcome of all other nodes

• N-folds: 10 to 20 folds with test sample size from 10 to 100
• Choice of 4–5 predictors (other items) based on correlation with target
• Measure of:
  • AUC (Area Under the ROC Curve)
  • Accuracy at 0.5 cutoff
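Both measures can be computed directly from predicted probabilities and observed outcomes. The sketch below uses the pairwise definition of AUC (the share of positive/negative pairs ranked correctly, ties counting one half); the scores and labels are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def accuracy(scores, labels, cutoff=0.5):
    """Share of outcomes matched by the prediction thresholded at `cutoff`."""
    hits = sum((s >= cutoff) == (y == 1) for s, y in zip(scores, labels))
    return hits / len(labels)

# Hypothetical predicted success probabilities and observed outcomes.
scores = [0.9, 0.8, 0.35, 0.6, 0.2]
labels = [1, 1, 1, 0, 0]
```

AUC depends only on how the predictions are ranked, whereas accuracy depends on where they fall relative to the cutoff, which is why the two measures can disagree on the same data.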



ROC: Receiver Operating Characteristic Curve

(from Tape, T.G. Interpreting Diagnostic Tests, http://gim.unmc.edu/dxtests/roc3.htm)



Data sets

1. College mathematics: 60 items on algebra and functions, trigonometry, geometry, matrices, and calculus; 246 respondents newly registered in engineering

2. Fraction algebra: 20 items on basic fraction algebra rules; 171 pupils, 10-12 years old

3. LSAT: 5 items from Law School Admission Test; 1000 respondents (higher average: 76%)

4. UNIX: 34 items on UNIX shell commands; 48 respondents (wide ranging scores)



AUC (Area under ROC Curve) performance

                                              AoV significance level
                 TAN    BNC    NB     IRT     All    TAN-IRT   w/o IRT
Coll. math       0.77   0.76   0.75   0.74    ***    ***       **
Frac. algebra    0.90   0.90   0.88   0.85    ***    ***       **
LSAT             0.59   0.59   0.58   0.57    -      -         -
UNIX             0.96   0.96   0.95   0.91    ***    ***       -

*** p < 0.001, ** p < 0.01, * p < 0.05, - p > 0.05

N.B. 0.91 → 0.96 = 56% error reduction (error goes from 0.09 to 0.04)

0.85 → 0.90 = 33% error reduction (error goes from 0.15 to 0.10)




Accuracy results

Accuracy at 0.5 cutoff

                                              AoV significance level
                 TAN    BNC    NB     IRT     All    TAN-IRT   w/o IRT
Coll. math       0.64   0.64   0.63   0.65    -      -         -
Frac. algebra    0.70   0.70   0.68   0.71    -      -         -
LSAT             0.83   0.83   0.83   0.83    -      -         -
UNIX             0.93   0.94   0.91   0.86    ***    ***       ***

*** p < 0.001, ** p < 0.01, * p < 0.05, - p > 0.05



Discussion

• Item to Item models either outperform or match the single skill IRT model

• Large differences between data sets
  • Small size favours item-to-item Bayesian models

• TAN/BNC slightly better than NB



Item to Item vs. Latent models: Advantages

Advantages of Item to Item models:

• Good performance
  • Still needs comparison to multidimensional IRT and other more sophisticated models

• Does have sound cognitive foundations (cf. Knowledge Spaces of Falmagne and Doignon, 1985)

• No knowledge engineering at the modeling phase
  • KE postponed to the skills assessment phase



Item to Item vs. Latent models: Drawbacks

Drawbacks of Item to Item models:

• May perform better, but does not replace knowledge engineering for didactic purposes

• Adding a new item requires learning with old items
  • Actually a big drawback
  • IRT avoids this problem (parameter estimation is not relative to other items)
  • Could create item types, but unreliable and falls into knowledge engineering issues










Problem statement



Knowledge diagnosis



Four Q-matrices: Variations on Tatsuoka’s fraction algebra item set

        Skills of
        QM 1      QM 2          QM 3      QM 4
Item    1 2 3     1 2 3 4 5     1 2 3     1 2 3
1       1 1 0     1 1 1 1 0     0 1 0     1 1 0
2       1 0 1     1 1 1 1 1     0 0 1     1 0 1
3       1 0 1     0 0 1 0 0     0 0 1     0 1 0
4       1 0 0     1 1 1 1 0     1 0 0     1 0 0
5       1 1 0     1 1 1 1 0     0 1 0     1 0 0
6       1 1 0     1 1 0 0 0     0 1 0     0 0 1
7       1 0 1     1 0 1 1 1     0 0 1     1 0 1
8       1 0 1     1 0 1 0 0     0 0 1     0 1 1
9       1 0 0     1 0 1 1 0     1 0 0     1 0 0
10      1 0 0     1 1 1 1 0     1 0 0     1 0 1
11      1 1 0     1 1 1 1 0     0 1 0     1 0 0



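Data driven approaches

• Start with test data (rows: students, columns: items):

$$R = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

• Define a Q-matrix (rows: items, columns: skills):

$$Q = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}$$

• Assess skills (rows: students, columns: skills):

$$S = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}$$

What we expect: $R = S \circ Q^{T}$

$$R = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}$$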
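One common reading of the expected-response step is conjunctive: a student answers an item correctly only if they hold every skill the Q-matrix assigns to that item. The sketch below assumes that rule, which the slide does not spell out, so its output is only an illustration and may differ from the expected matrix shown above. The function name is hypothetical.

```python
def ideal_responses(S, Q):
    """Conjunctive rule (an assumption of this sketch):
    R[s][i] = 1 iff student s holds every skill that item i requires."""
    return [
        [int(all(held >= need for held, need in zip(student, item))) for item in Q]
        for student in S
    ]

# Skills matrix (students x skills) and Q-matrix (items x skills) from the slide.
S = [[1, 1, 1], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
Q = [[1, 1, 1], [0, 0, 1], [1, 0, 0]]
R_ideal = ideal_responses(S, Q)
```

Comparing such an ideal matrix with the observed responses is what gives a refinement method its error signal.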



Detecting perturbations: Synthetic data

Introduce perturbations in Q-matrix and assess suggested changes

[Figure: two panels, "True Positives (synth.)" and "False Positives (synth.)". x-axis: number of perturbations (2 to 10); y-axis: average frequency (0 to 7). Series: Total, Chiu (2013), de la Torre (2008), ALS.]



Detecting perturbations: Real data

Introduce perturbations in Q-matrix and assess suggested changes

[Figure: two panels, "True Positives (real)" and "False Positives (real)". x-axis: number of perturbations (2 to 10); y-axis: average frequency (0 to 7). Series: Total, Chiu (2013), de la Torre (2008), ALS.]



Can we combine methods?



Three methods

1. MinRSS: minimizes the residual sum of squares (RSS) between the real responses and the ideal responses

2. MaxDiff: maximizes the difference in the probabilities of a correct response to an item between examinees who possess all the skills required for a correct response to that item and examinees who do not

3. ALS: given a Q-matrix, find the skills matrix that minimizes the sum of squared errors, then alternate to find a new Q-matrix, and so on.



Partition tree

• The combination of methods is based on a partition tree algorithm

• Factors retained
  • Number of skills per row
  • Number of skills per column
  • Stickiness: persistence of false positives/negatives for a given Q-matrix



Partition tree example

node), split, n, deviance, yval

* denotes terminal node

1) root 43213 10583.7900 0.5712633

2) minrss< 0.5 22733 5146.0780 0.3462807

4) alsc< 0.5 13937 2561.8270 0.2427352 *

5) alsc>=0.5 8796 2198.0590 0.5103456 *

3) minrss>=0.5 20480 3009.7720 0.8209961

6) alsc>=1.5 1359 216.9595 0.1994113 *

7) alsc< 1.5 19121 2230.4200 0.8651744

14) alsc< 0.5 3452 720.6475 0.7030707 *

15) alsc>=0.5 15669 1399.0780 0.9008871 *
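The listing above can be read as nested threshold tests on the outputs of two techniques. The sketch below hard-codes those splits and leaf values; reading `yval` as the estimated probability that the Q-matrix cell equals 1 is an assumption of this example, and `predict_cell` is a hypothetical name.

```python
def predict_cell(minrss, alsc):
    """Walk the printed tree and return the value (yval) of the matching leaf."""
    if minrss < 0.5:                      # node 2
        return 0.2427352 if alsc < 0.5 else 0.5103456   # nodes 4 and 5
    if alsc >= 1.5:                       # node 6
        return 0.1994113
    return 0.7030707 if alsc < 0.5 else 0.9008871       # nodes 14 and 15
```

For instance, when MinRSS proposes 1 and the ALS variable equals 1, the tree lands in node 15 and returns about 0.90, the highest leaf value in the listing.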



Process

[Figure: flow diagram in three phases (data generation, test synthetic, test real) with the following numbered steps]

1. QMi, followed by permutations (1000)
2. Permuted QMs (ground truth)
3. Synthetic test outcome data with DINA model (400 records)
4. Refinements with three techniques, after perturbations (one per cell)
5. Partition trees (3 types), learned with the ground truth labels
6. Fraction data set, followed by perturbations (one per cell)
7. Refinements with partition trees and the three techniques
8. Comparison with original QMi
9. Refinements with partition trees and the three techniques
10. Comparison with ground truth

Key principles:

• Training of the partition tree is done over synthetic data

• This is how the influence of factors such as stickiness and skills per row and column is assessed



Partition tree training data

         Prediction                            Skills per    Stickiness
Truth    (1) MinRSS  (2) MaxDiff  (3) ALSC     row   col     (1) MinRSS  (2) MaxDiff  (3) ALSC
1        1           na           1            0     5       0.00        0.00         0.09
1        1           1            2            1     7       0.00        0.00         0.09
1        1           1            2            1     7       0.00        0.00         0.09
0        0           1            1            3     7       0.04        0.00         0.03
0        0           0            1            2     7       0.04        0.00         0.03
0        0           0            1            2     7       0.04        0.00         0.03



Test data

• Real data from Tatsuoka: 536 respondents
• Q-matrices from different authors
  • 20 × 8
  • 13 × 5
  • 15 × 3
• Common denominator of 11 items








Cases

                      Perturbation                 Refinement
                      Value before   Value after   Value proposed   Outcome
Perturbed cell
(1)                   0              1             0                correct (TP)
(2)                   1              0             1                correct (TP)
(3)                   0              1             1                wrong (FN)
(4)                   1              0             0                wrong (FN)
Non-perturbed cell
(5)                   0              0             0                correct (TN)*
(6)                   1              1             1                correct (TN)*
(7)                   0              0             1                wrong (FP)
(8)                   1              1             0                wrong (FP)

* ignored
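The eight cases reduce to two questions: was the cell perturbed, and does the proposed value restore the value the cell had before perturbation? A minimal sketch, with a hypothetical function name:

```python
def classify(before, after, proposed):
    """Label one Q-matrix cell from its value before perturbation, after
    perturbation, and as proposed by the refinement method."""
    perturbed = before != after
    restored = proposed == before
    if perturbed:
        return "TP" if restored else "FN"
    return "TN" if restored else "FP"

# The eight cases of the table, in order (1) to (8).
cases = [(0, 1, 0), (1, 0, 1), (0, 1, 1), (1, 0, 0),
         (0, 0, 0), (1, 1, 1), (0, 0, 1), (1, 1, 0)]
outcomes = [classify(*case) for case in cases]
```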


Measures

• Recovering perturbed cells
  • 1-cell: TP or FN (recall)
  • 0-cell: TN or FP (precision)

• Harmonic mean (F-score)

$$\text{F-score} = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}} = 2 \times \frac{Acc_{\neg P} \times Acc_{P}}{Acc_{\neg P} + Acc_{P}}$$
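As a worked illustration of the formula, the sketch below computes the F-score from outcome counts, taking Acc_P as the accuracy on perturbed cells and Acc_¬P as the accuracy on non-perturbed cells. The counts and the function names are hypothetical.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def f_score_from_counts(tp, fn, tn, fp):
    """F-score from outcome counts over perturbed and non-perturbed cells."""
    acc_p = tp / (tp + fn)        # accuracy on perturbed cells
    acc_not_p = tn / (tn + fp)    # accuracy on non-perturbed cells
    return f_score(acc_not_p, acc_p)

# Hypothetical counts: 3 of 4 perturbed cells recovered,
# 8 of 10 non-perturbed cells left untouched.
example = f_score_from_counts(tp=3, fn=1, tn=8, fp=2)
```

Because the harmonic mean is pulled toward the smaller of its two terms, a method cannot score well by excelling on only one kind of cell.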



Results for Synthetic Data: F-score

        Technique                     Partition tree
QM      MinRSS   MaxDiff   ALSC       (1)     (2)     (3)
1       0.88     0.51      0.58       0.88    0.90    0.97
2       0.13     0.35      0.42       0.68    0.69    0.90
3       0.96     0.34      0.83       0.97    0.97    1.00
4       0.93     0.52      0.58       0.93    0.94    0.98
Mean    0.72     0.43      0.60       0.87    0.87    0.96




Results on Real Data: F-score

        Technique                     Partition tree
QM      MinRSS   MaxDiff   ALSC       (1)     (2)     (3)
1       0.42     0.27      0.54       0.42    0.37    0.63
2       0.50     0.17      0.37       0.73    0.74    0.77
3       0.38     0.16      0.39       0.64    0.86    0.83
4       0.48     0.20      0.42       0.48    0.50    0.56
Mean    0.41     0.23      0.38       0.57    0.62    0.70




Late breaking results!

Boosting brings another ≈ 10% improvement

Principles of boosting:
• compute the residual error (fit) of each record (observation)
• assign weights to records according to residual error
• resample with weights as a probability, or use them as factors, to re-estimate model parameters
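The weighting and resampling steps can be sketched as follows. This is a generic reweighting scheme under simple assumptions (weights proportional to the absolute residual, sampling with replacement); it is not the specific boosting variant behind the reported results, and the records and residuals are hypothetical.

```python
import random

def boosting_weights(residuals):
    """Turn per-record residual errors into sampling weights that sum to 1."""
    magnitudes = [abs(r) for r in residuals]
    total = sum(magnitudes)
    if total == 0:
        return [1.0 / len(residuals)] * len(residuals)
    return [m / total for m in magnitudes]

def resample(records, weights, rng):
    """Draw a same-size sample with replacement, favouring poorly fitted records."""
    return rng.choices(records, weights=weights, k=len(records))

records = ["r1", "r2", "r3", "r4"]
residuals = [0.1, 0.4, 0.0, 0.5]   # hypothetical residual errors
weights = boosting_weights(residuals)
sample = resample(records, weights, random.Random(0))
```

The model is then re-estimated on `sample`, and the cycle repeats so that later rounds concentrate on the records the earlier rounds fitted worst.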



Conclusion

• Major improvements obtained
  • 86% over synthetic data
  • 55% over real data
  • better reliability (systematically better than the best method, while no method is systematically the best)

• Some limits
  • Single set of 11 questions
  • Static data



Boosting: Results on Real Data, F-score (not yet validated)

        Technique                     Boosting
QM      MinRSS   MaxDiff   ALSC
1       0.42     0.27      0.54       0.65    0.72    0.98
2       0.50     0.17      0.37       0.60    0.81    0.88
3       0.23     0.27      0.18       0.64    0.82    0.98
4       0.48     0.20      0.42       0.55    0.72    0.99
Mean    0.41     0.23      0.38       0.61    0.77    0.96











How do we determine whether a model is representative of the phenomena behind the data?

Representativeness ⟺ Best performance?



How do we know a model fits the data?

Standard answer:
• Pick the model with the highest predictive performance
• Use person or item fit measures (given the ground truth)

Alternative answer:
• Use performance signatures
• Use parameter signatures (Pardos et al.)



Parameter signature (Rosenberg-Kima and Pardos)

Key idea: draw a likelihood map of the parameters given the data and compare



Performance signatures

Key idea: find the closest model in the performance space

Assumptions:

• performance space is stable across conditions of
  • sample size and characteristic
  • parameter space
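Finding the closest model can be sketched as a nearest-neighbour search over performance vectors. The signature values below are hypothetical, and the choice of Euclidean distance is an assumption of this example rather than the metric used in the talk.

```python
import math

def closest_model(target, signatures):
    """Return the model whose performance vector is nearest to `target`."""
    return min(signatures, key=lambda name: math.dist(signatures[name], target))

# Hypothetical signatures: accuracy differences (in percent) obtained by three
# prediction techniques on data generated from each candidate model.
signatures = {
    "DINA": [5.0, -2.0, 1.0],
    "IRT": [-3.0, 6.0, 0.0],
    "POKS": [0.0, 1.0, 4.0],
}
# Performance vector observed on a real data set (hypothetical values).
best = closest_model([4.0, -1.0, 0.5], signatures)
```

The stability assumptions listed above matter here: if a model's signature shifted with sample size or parameter values, the nearest neighbour would no longer identify the generating model reliably.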



Performance of models across models’ data

[Figure: one panel per synthetic data set (DINA, DINO, IRT.Rasch, NMF.Add, NMF.Con, POKS, Random). x-axis: prediction technique (Expected, POKS, IRT, NMF.con, DINA, NMF.add, DINO); y-axis: percent accuracy difference from expected value (−30 to 30). Each block represents a synthetic generated data set.]



Performance of models across models’ data

[Figure: one panel per real data set (ECPE, Fraction, Fraction1, Fraction2.1, Fraction2.2, Fraction2.3, Vomlel). x-axis: prediction technique (Expected, POKS, IRT, NMF.con, DINA, NMF.add, DINO); y-axis: percent accuracy difference from expected value (−10 to 10). Each block represents a real data set.]



Real vs. synthetic comparison

[Figure: the synthetic data set panels (DINA, DINO, IRT.Rasch, NMF.Add, NMF.Con, POKS, Random; y-axis from −30 to 30) shown together with the real data set panels (ECPE, Fraction, Fraction1, Fraction2.1, Fraction2.2, Fraction2.3, Vomlel; y-axis from −10 to 10). x-axis: prediction technique; y-axis: percent accuracy difference from expected value.]







Conclusion



A surge of numerical and statistical approaches to cognitive engineering

• Abundance of educational data
• Wealth of techniques and tools for simulation and data analysis
• Emergence of a numerical and statistical paradigm in cognitive engineering



Questions?

