A MACHINE LEARNING APPROACH FOR AUTOMATIC STUDENT MODEL DISCOVERY

Nan Li, Noboru Matsuda, William Cohen, and Kenneth Koedinger

Computer Science Department

Carnegie Mellon University

STUDENT MODEL

• A set of knowledge components (KCs)
• Encoded in intelligent tutors to model how students solve problems
  • Example: What to do next on problems like 3x=12
• A key factor behind instructional decisions in automated tutoring systems

STUDENT MODEL CONSTRUCTION

• Traditional Methods
  • Structured interviews
  • Think-aloud protocols
  • Rational analysis
  • Require expert input. Highly subjective.
• Previous Automated Methods
  • Learning factor analysis (LFA)
  • Within the search space of human-provided factors.
• Proposed Approach
  • Use a machine-learning agent, SimStudent, to acquire knowledge
  • 1 production rule acquired => 1 KC in student model (Q matrix)
  • Independent of human-provided factors.

A BRIEF REVIEW OF SIMSTUDENT

• A machine-learning agent that
  • acquires production rules from
  • examples & problem-solving experience
  • given a set of feature predicates & functions

PRODUCTION RULES

Skill divide (e.g. -3x = 6)
What:
  Left side (-3x)
  Right side (6)
When:
  Left side (-3x) does not have constant term
=>
How:
  Get-coefficient (-3) of left side (-3x)
  Divide both sides with the coefficient

• Each production rule is associated with one KC
• Each step (-3x = 6) is labeled with one KC, decided by the production applied to that step
• Original model required strong domain-specific operators, like Get-coefficient
• Does not differentiate important distinctions in learning (e.g., -x=3 vs -3x = 6)
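The What/When/How structure of the divide rule, and the way a step gets its KC label from the rule that applies to it, can be sketched roughly as follows. This is a minimal illustration, not SimStudent's actual representation: the `ProductionRule` class, the helper names, and the regular expression used for the "no constant term" test are all assumptions made for this example.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Callable, List, Optional, Tuple
import re


@dataclass
class ProductionRule:
    kc: str                           # KC this rule contributes to the student model
    when: Callable[[str, str], bool]  # "When": test on the (left, right) sides
    how: Callable[[str, str], str]    # "How": operators that produce the next equation


def split_sides(equation: str) -> Tuple[str, str]:
    """'What': pick out the left and right sides of an equation like '-3x = 6'."""
    left, right = (side.strip() for side in equation.split("="))
    return left, right


def no_constant_term(left: str, right: str) -> bool:
    # Hypothetical predicate: the left side is a bare 'ax' term with an explicit integer a.
    return re.fullmatch(r"-?\d+x", left) is not None


def divide_by_coefficient(left: str, right: str) -> str:
    coeff = int(left[:-1])                        # stands in for Get-coefficient
    return f"x = {Fraction(int(right), coeff)}"   # divide both sides by it


DIVIDE = ProductionRule("divide", no_constant_term, divide_by_coefficient)


def label_step(equation: str, rules: List[ProductionRule]) -> Optional[str]:
    """Return the KC of the first rule whose 'When' part accepts the step."""
    left, right = split_sides(equation)
    for rule in rules:
        if rule.when(left, right):
            return rule.kc
    return None
```

With these definitions, `label_step("-3x = 6", [DIVIDE])` yields the KC name `divide`, while a step such as `x + 2 = 5` gets no label from this single rule.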

DEEP FEATURE LEARNING

• Expert vs Novice (Chi et al., 1981)
  • Example: What’s the coefficient of -3x?
  • Expert uses deep functional features to reply -3
  • Novice may use shallow perceptual features to reply 3
• Model deep feature learning using machine learning techniques
• Integrate acquired knowledge into SimStudent learning
• Remove dependence on strong operators & split KCs into finer grain sizes

FEATURE RECOGNITION AS PCFG INDUCTION

• Underlying structure in the problem => Grammar
• Feature => Non-terminal symbol in a grammar rule
• Feature learning task => Grammar induction
• Student errors => Incorrect parsing

LEARNING PROBLEM

• Input is a set of feature recognition records consisting of
  • An original problem (e.g. -3x)
  • The feature to be recognized (e.g. -3 in -3x)
• Output
  • A probabilistic context free grammar (PCFG)
  • A non-terminal symbol in a grammar rule that represents the target feature

A TWO-STEP PCFG LEARNING ALGORITHM

• Greedy Structure Hypothesizer:
  • Hypothesizes grammar rules in a bottom-up fashion
  • Creates non-terminal symbols for frequently occurring sequences
  • E.g. - and 3, SignedNumber and Variable
• Viterbi Training Phase:
  • Refines rule probabilities
  • Occur more frequently => Higher probabilities
  • Generalizes Inside-Outside Algorithm (Lari & Young, 1990)
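The bottom-up idea behind the structure hypothesizer (turn frequently occurring adjacent symbols into a new non-terminal, then repeat) can be illustrated with a much simplified pair-merging loop. This is only a sketch in the spirit of the slide, not the published algorithm: the function names, the `min_count` threshold, and the generated non-terminal names `N1`, `N2` are assumptions, and the inputs are assumed to be already tagged with categories such as `Minus`, `Number`, and `Variable`.

```python
from collections import Counter
from typing import Dict, List, Tuple


def merge(seq: List[str], pair: Tuple[str, str], name: str) -> List[str]:
    """Replace every non-overlapping occurrence of `pair` in `seq` with `name`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(name)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def hypothesize_rules(sequences: List[List[str]],
                      min_count: int = 2) -> Dict[str, Tuple[str, str]]:
    """Greedily create a non-terminal for the most frequent adjacent pair
    until no pair occurs at least `min_count` times."""
    seqs = [list(s) for s in sequences]
    rules: Dict[str, Tuple[str, str]] = {}
    while True:
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < min_count:
            break
        name = f"N{len(rules) + 1}"   # hypothetical non-terminal name
        rules[name] = pair
        seqs = [merge(seq, pair, name) for seq in seqs]
    return rules


examples = [
    ["Minus", "Number", "Variable"],   # e.g. -3x
    ["Minus", "Number", "Variable"],   # e.g. -5y
    ["Minus", "Number"],               # e.g. -4
]
rules = hypothesize_rules(examples)
```

On these three inputs the loop first groups `Minus` with `Number` (playing the role of SignedNumber on the slide) and then groups that new symbol with `Variable`.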

EXAMPLE OF PRODUCTION RULES BEFORE AND AFTER INTEGRATION

• Extend the “What” Part in Production Rule

Original:

Skill divide (e.g. -3x = 6)
What:
  Left side (-3x)
  Right side (6)
When:
  Left side (-3x) does not have constant term
=>
How:
  Get coefficient (-3) of left side (-3x)
  Divide both sides with the coefficient (-3)

Extended:

Skill divide (e.g. -3x = 6)
What:
  Left side (-3, -3x)
  Right side (6)
When:
  Left side (-3x) does not have constant term
=>
How:
  Divide both sides with the coefficient (-3)

• Fewer operators
• Eliminate need for domain-specific operators

EXPERIMENT METHOD

• SimStudent vs. Human-generated model
• Code real student data
  • 71 students used a Carnegie Learning Algebra I Tutor on equation solving
• SimStudent:
  • Tutored by a Carnegie Learning Algebra I Tutor
  • Coded each step by the applicable production rule
  • Used human-generated coding in case of no applicable production
• Human-generated model:
  • Coded manually based on expertise

HUMAN-GENERATED VS SIMSTUDENT KCS

| | Human-generated Model | SimStudent | Comment |
|---|---|---|---|
| Total # of KCs | 12 | 21 | |
| # of Basic Arithmetic Operation KCs | 4 | 13 | Split into finer grain sizes based on different problem forms |
| # of Typein KCs | 4 | 4 | Approximately the same |
| # of Other Transformation Operation KCs (e.g. combine like terms) | 4 | 4 | Approximately the same |

HOW WELL TWO MODELS FIT WITH REAL STUDENT DATA

• Used Additive Factor Model (AFM)
  • An instance of logistic regression that
  • Uses each student, each KC and KC by opportunity interaction as independent variables
  • To predict probabilities of a student making an error on a specific step
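As a rough sketch of what the AFM computes, the log-odds of a correct step is usually written as a student term plus, for each KC on the step, a KC term and a KC-by-opportunity term. The function below follows that usual formulation; the parameter names (`theta`, `beta`, `gamma`) and the dictionary layout are assumptions for this example, and the fitted values would come from a logistic regression over the coded student data.

```python
import math
from typing import Dict, Sequence


def afm_error_probability(theta: float,
                          beta: Dict[str, float],
                          gamma: Dict[str, float],
                          kcs: Sequence[str],
                          opportunities: Dict[str, int]) -> float:
    """Probability that a student makes an error on a step.

    theta          student proficiency term
    beta[k]        easiness term of KC k
    gamma[k]       learning-rate term of KC k (KC by opportunity interaction)
    kcs            KCs the step is labeled with (its row of the Q matrix)
    opportunities  prior practice opportunities the student had on each KC
    """
    logit = theta + sum(beta[k] + gamma[k] * opportunities.get(k, 0) for k in kcs)
    p_correct = 1.0 / (1.0 + math.exp(-logit))
    return 1.0 - p_correct
```

Because the KC labels enter only through `kcs`, swapping the human-generated labels for SimStudent's labels changes the predictions without changing the model form, which is what makes the two KC models directly comparable.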

AN EXAMPLE OF SPLIT IN DIVISION

• Human-generated Model
  • divide: Ax=B & -x=A
• SimStudent
  • simSt-divide: Ax=B
  • simSt-divide-1: -x=A

| KC | Ax=B | -x=A |
|---|---|---|
| divide | 1 1 1 1 1 1 1 | 1 1 1 |
| simSt-divide | 1 1 1 1 1 1 1 | 0 0 0 |
| simSt-divide-1 | 0 0 0 0 0 0 0 | 1 1 1 |
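The indicator rows above can be reproduced by labeling each step under both KC models and turning the labels into 0/1 columns. The snippet below is an illustrative sketch only: the ten left-hand sides are invented to match the 7/3 split in the example, and the labeling functions are simplified stand-ins for the two models.

```python
import re
from typing import Callable, Dict, List


def human_kc(left: str) -> str:
    # The human-generated model uses one KC for both Ax=B and -x=A.
    return "divide"


def simstudent_kc(left: str) -> str:
    # The SimStudent model gives the -x=A form its own KC.
    return "simSt-divide-1" if re.fullmatch(r"-[a-z]", left) else "simSt-divide"


def q_columns(steps: List[str], label: Callable[[str], str]) -> Dict[str, List[int]]:
    """Build one 0/1 indicator list per KC, in step order."""
    kcs = sorted({label(s) for s in steps})
    return {kc: [int(label(s) == kc) for s in steps] for kc in kcs}


# Hypothetical left-hand sides: seven of the form Ax, three of the form -x.
steps = ["3x", "-3x", "2x", "5x", "-4x", "7x", "6x", "-x", "-y", "-x"]
human = q_columns(steps, human_kc)
simst = q_columns(steps, simstudent_kc)
```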

PRODUCTION RULES FOR DIVISION

Skill simSt-divide (e.g. -3x = 6)
What:
  Left side (-3, -3x)
  Right side (6)
When:
  Left side (-3x) does not have constant term
How:
  Divide both sides with the coefficient (-3)

Skill simSt-divide-1 (e.g. -x = 3)
What:
  Left side (-x)
  Right side (3)
When:
  Left side (-x) is of the form -v
How:
  Generate one (1)
  Divide both sides with -1

AN EXAMPLE WITHOUT SPLIT IN DIVIDE TYPEIN

• Human-generated Model
  • divide-typein
• SimStudent
  • simSt-divide-typein

| KC | Steps |
|---|---|
| divide-typein | 1 1 1 1 1 1 1 1 1 |
| simSt-divide-typein | 1 1 1 1 1 1 1 1 1 |

SIMSTUDENT VS SIMSTUDENT + FEATURE LEARNING

• SimStudent
  • Needs strong operators
  • Constructs student models similar to human-generated model
• Extended SimStudent
  • Only requires weak operators
  • Splits KCs into finer grain sizes based on different parse trees
• Does Extended SimStudent produce a KC model that better fits student learning data?

RESULTS

| | Human-generated Model | SimStudent |
|---|---|---|
| AIC | 6529 | 6448 |
| 3-Fold Cross Validation RMSE | 0.4034 | 0.3997 |

Significance Test

• SimStudent outperforms the human-generated model in 4260 out of 6494 steps (p < 0.001)
• SimStudent outperforms the human-generated model across 20 runs of cross validation (p < 0.001)
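For readers less familiar with the two fit measures, the helpers below show how they are conventionally defined: AIC penalizes the log-likelihood by the number of parameters, and RMSE compares predicted probabilities with observed 0/1 outcomes. The function names and argument layout are assumptions for this sketch; lower is better for both measures.

```python
import math
from typing import Sequence


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean squared error between predictions and observed outcomes."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Differences taken from the reported table values.
aic_gain = 6529 - 6448          # AIC points in favor of the SimStudent model
rmse_gain = 0.4034 - 0.3997     # cross-validated RMSE reduction
```

The AIC gap matters here because the SimStudent model has more KCs (21 vs 12) and therefore more parameters, yet it still ends up with the lower AIC.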

SUMMARY

• Presented an innovative application of a machine-learning agent, SimStudent, for automatic discovery of student models.
• Showed that a SimStudent-generated student model was a better predictor of real student learning behavior than a human-generated model.

FUTURE STUDIES

• Test generality in other datasets in DataShop
• Apply this proposed approach in other domains
  • Stoichiometry
  • Fraction addition

AN EXAMPLE IN ALGEBRA


A COMPUTATIONAL MODEL OF DEEP FEATURE LEARNING

• Extended a PCFG Learning Algorithm (Li et al., 2009)
  • Feature Learning
  • Stronger Prior Knowledge: Transfer Learning Using Prior Knowledge

FEATURE LEARNING

• Build most probable parse trees
  • For all observation sequences
• Select a non-terminal symbol that
  • Matches the most training records as the target feature
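The selection step can be pictured as a vote: for each training record, find the non-terminals whose subtree covers exactly the feature to be recognized, then keep the symbol that wins most often. The sketch below assumes parse trees are given as nested tuples of the form `(label, child, ...)` with string leaves; that encoding, the function names, and the two sample trees are illustrative assumptions.

```python
from collections import Counter
from typing import List, Tuple, Union

Tree = Union[str, tuple]   # a leaf token, or (label, child, child, ...)


def text(tree: Tree) -> str:
    """Concatenate the leaf tokens covered by a subtree."""
    if isinstance(tree, str):
        return tree
    return "".join(text(child) for child in tree[1:])


def spans(tree: Tree) -> List[Tuple[str, str]]:
    """List (non-terminal label, covered text) for every internal node."""
    if isinstance(tree, str):
        return []
    found = []
    for child in tree[1:]:
        found.extend(spans(child))
    found.append((tree[0], text(tree)))
    return found


def select_feature_symbol(records: List[Tuple[Tree, str]]) -> str:
    """Pick the non-terminal that matches the target feature in the most records."""
    votes = Counter()
    for tree, target in records:
        votes.update({label for label, covered in spans(tree) if covered == target})
    return votes.most_common(1)[0][0]


records = [
    (("Expression",
      ("SignedNumber", ("Minus", "-"), ("Number", "3")),
      ("Variable", "x")), "-3"),
    (("Expression",
      ("SignedNumber", ("Number", "5")),
      ("Variable", "y")), "5"),
]
symbol = select_feature_symbol(records)
```

In the second record both `SignedNumber` and `Number` cover the target `5`, but only `SignedNumber` also matches `-3` in the first record, so it collects the most votes.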

TRANSFER LEARNING USING PRIOR KNOWLEDGE

• GSH Phase:
  • Build parse trees based on some previously acquired grammar rules
  • Then call the original GSH
• Viterbi Training:
  • Add rule frequency in previous task to the current task
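Adding rule frequencies from a previous task before normalizing can be sketched as below. Rule probabilities are taken here as relative frequencies among rules with the same left-hand side, which is one common way to estimate PCFG probabilities from counts; the rule encoding, function name, and the example counts are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional, Tuple

Rule = Tuple[str, Tuple[str, ...]]   # (left-hand side, right-hand side symbols)


def rule_probabilities(current: Counter,
                       previous: Optional[Counter] = None) -> Dict[Rule, float]:
    """Normalize rule counts per left-hand side, optionally adding the
    counts carried over from a previously learned task."""
    total = Counter(current)
    if previous:
        total.update(previous)          # add previous-task frequencies
    lhs_totals = defaultdict(int)
    for (lhs, _), n in total.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in total.items()}


signed = ("SignedNumber", ("Minus", "Number"))
unsigned = ("SignedNumber", ("Number",))

current = Counter({signed: 1, unsigned: 1})   # hypothetical counts in the new task
previous = Counter({signed: 1})               # hypothetical counts from an earlier task

without_transfer = rule_probabilities(current)
with_transfer = rule_probabilities(current, previous)
```

In this toy case the two rules start out equally likely, and the carried-over count shifts probability toward the rule that was already useful in the earlier task.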
