using inverse planning for personalized feedback · after feedback reliable improvement, but no...

Using Inverse Planning for Personalized Feedback

Anna N. Rafferty Computer Science Department, Carleton College

Rachel A. Jansen Thomas L. Griffiths

Department of Psychology, University of California, Berkeley

Using Data for Personalization

Algorithm?

Provide experience X

Outline

• Inverse planning: Diagnosing misunderstandings about equation solving

• Developing personalized feedback based on diagnosis

• Testing effectiveness of personalized feedback

• Future directions

Interpreting Equation Solving: Bayesian Inverse Planning

⇥ = space of possible understandings

p(✓ | equations)

Algebra skills (𝜃1)


Representing Understanding: Θ

Conceptual Mal-rules 1+3x => 4x 3(2+5x) => 6+5x

Arithmetic 1+5.9x+3.2x => 1+8.1x -3+5+x => -2+x

Planning 3x+5x+4 = 2 => 3x+4 = -5x+2

e.g., Sleeman, 1984; Payne & Squibb, 1990; Koedinger & MacLaren,1997

✓ 2 ⇥: 6-dimensional vector of parameters related to skill

Bayesian Inverse Planning

p(✓ | equations) / p(✓)p(equations | ✓)




{Prior

Prior: Encode information about what misunderstandings are common

Likelihood

{p(✓ | equations) / p(✓)p(equations | ✓)




Likelihood: What is the probability of the observed data if the learner has a particular understanding?

{Prior Likelihood

{p(✓ | equations) / p(✓)p(equations | ✓)



Generative Model of Equation Solving: Markov Decision Processes

2 + 3x = 6 3x = 6 + 2 3x = 8

Move 2 to right side

Combine 6 and 2

Divide both sides by 3

...

𝜃 affects what actions are considered and transition probabilities for actions.

How are Actions Chosen?

Assume a noisily optimal policy:p(a | s) / exp(✓� ·Q(s, a))

Q(s, a) =X

s02S

p(s0|s, a) R(s, a) + �

X

a02A

p(a0|s0)Q(s0, a0)

!Long term expected value:

2 + 3x = 6 3x = 6 + 2 3x = 8

Move 2 to right side

Combine 6 and 2

Divide both sides by 3

...

Inverse Planning Overview

5 + 9 = 6.0x + 2.0x + 10.0[1 + 1 + 7.0x]5 + 9 = 6.0x + 2.0x + 10 + 10 + 70.0x

5 + 9 = 6.0x + 2.0x + 20 + 70.0x

14 = 6.0x + 2.0x + 20 + 70.0x

5 + 9 = 6.0x + 2.0x + 10.0[1 + 1 + 7.0x]5 + 9 = 6.0x + 2.0x + 10 + 10 + 70.0x

5 + 9 = 6.0x + 2.0x + 20 + 70.0x

14 = 6.0x + 2.0x + 20 + 70.0x

14 = 76.0x + 2.0x + 20.014 = 78.0x + 20.014 +�20 = 78.0x

�7 = 78.0x

� 778

= 1x

Arithmetic error parameter

Action planning parameter

Distributive property error parameter

Move term error parameter

...

Representation of understanding

(Θ)

Model of equation solving as a

(parameterized) MDP

Infer posterior probability over Θ

(MCMC)

0 0.5 10

0.5

1

Sign Error Parameter

Pro

bab

ilit

y

Value0 1

0

0.5

1

Combine Only Like Parameter

Pro

bab

ilit

y

Value0 0.5 1

0

0.5

1

Reciprocal Error Parameter

Pro

bab

ilit

y

Value0 0.5 1

0

0.5

1

Distributive Error Parameter

Pro

bab

ilit

y

Value0 2 4

0

0.5

1

Planning Parameter

Pro

bab

ilit

yValue

0 0.5 10

0.5

1

Arithmetic Error Parameter

Pro

bab

ilit

y

Value

Arithmetic Error Parameter

Prob

abili

ty

Value

Output for One Learner

How do we turn this into a feedback activity?

0 0.5 1

Value

0

0.5

Pro

bab

ilit

y

Move

0 1

Value

0

0.5

Pro

bab

ilit

yCombine

0 0.5 1

Value

0

0.5

Pro

bab

ilit

y

Divide

0 0.5 1

Value

0

0.5

Pro

bab

ilit

y

Distributive

0 2 4

Value

0

0.5

Pro

bab

ilit

y

Planning

0 0.5 1

Value

0

0.5

Pro

bab

ilit

y

Arithmetic

Feedback Activities

Overview of skills and assessment Text explanation and video from Khan Academy

Targeted practice with fading scaffolding

Testing Personalized Feedback

Session 1: Website Problem

Solving and Multiple Choice Test

Session 3: Website Problem

Solving and Multiple Choice Test

Session 2: Feedback Activity

Results: Changes in Performance Across Sessions

Targeted Feedback Random Feedback0

6

12

18

24

Sco

re

Accuracy Improvements by Time and Condition

Before Feedback

After Feedback


6

12

18

24

Sco

re


Before Feedback

After Feedback


6

12

18

24

Sco

re


Before Feedback

After Feedback


6

12

18

24

Sco

re


Before Feedback

After Feedback

Reliable improvement, but no difference in amount of improvement across conditions.

Performance Based on Proficiency Level of Feedback Skill

Skill level > 0.85 Skill level < 0.850

6

12

18

24

Sco

re

Accuracy Improvements by Time and Level of Skill

Before Feedback

After Feedback

Performance Change for Participants with Varying Skill Levels

Random Feedback Targeted Feedback0

6

12

18

24

Sco

re

Accuracy Improvements by Time and Conditionfor Participants with Some Mastered and Some Unmastered Skills

Before Feedback

After Feedback

Reliable difference in amount of improvement by condition.

Contributions and Next Steps

• Personalization using inverse planning is helpful for learners who struggle with only some skills

• Provides an applied metric assessing the algorithm

• Next steps:

• Greater specificity and more interactivity in feedback

• Longer term interventions

Thank you!

Acknowledgements: Thank you to students Jonathan Brodie and Sam Vinitsky for programming contributions.

Funding: This work is supported by NSF grant number DRL-1420732.

Contact: Anna Rafferty, [email protected]

Skill Proficiencies by Participant

0 1 2 3 4 5

Number of skills with proficiency < 0.85

0

0.2

0.4

0.6

0.8

1

Pro

po

rtio

n o

f p

arti

cip

ants

Markov Decision Processesa1 a2 a3

s1 s2 s3 ...

Actions:- move a term - multiply or divide by a constant - combine two terms - distribute a coefficient - stop solving

using inverse planning for personalized feedback · after feedback reliable improvement, but no...

Documents