Temporal Models for Predicting Student Dropout in Massive Open Online Courses
Fei Mi, Dit-Yan Yeung
Hong Kong University of Science and Technology (HKUST)
[email protected] ([email protected])
November 14, 2015
Fei Mi, Dit-Yan Yeung (HKUST) ICDM ASSESS 2015 November, 14th, 2015 1 / 17
Outline
1 Background and Motivation
2 Temporal Models
3 Experiments
4 Conclusion
Overview

What can we do?
- Performance evaluation (Peer Grading)
- Help students engage and perform better (Dropout Prediction)
- Build a personalized platform (Recommendation)
Motivation of our work

1 High attrition rates are common on MOOC platforms (60% to 80%)
2 Current methods: SVM, Logistic Regression
  - Activity features (lecture video, discussion forum)
  - Static models
Contribution of our work

1 A sequence labeling perspective

[Figure: each of Weeks 1 to t contributes an activity feature vector (x_1, x_2, ..., x_t) paired with a dropout label (y_1, y_2, ..., y_t)]

2 Compare different temporal machine learning models
  - Input-output Hidden Markov Model (IOHMM)
  - Recurrent Neural Network (RNN)
  - RNN with long short-term memory (LSTM) cells
How to capture temporal information?

Sliding window structures (NLP tasks):
1 Features are aggregated using a sliding window structure
2 The temporal span is fixed by the sliding window

Temporal models:
1 Learn from the previous inputs and the current input
2 A temporal pathway allows a "memory" of the previous inputs to persist in the internal state
3 Flexible temporal span, learned from the data
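The fixed-span limitation of the sliding-window approach can be sketched in a few lines; the window length `w` and the mean aggregation here are illustrative assumptions, not the exact feature construction used in the paper.

```python
import numpy as np

def window_features(x, w):
    """Aggregate each week's feature vector with the previous w-1 weeks.

    x: (T, d) array of weekly activity features.
    Returns a (T, d) array where row t is the mean of rows max(0, t-w+1)..t.
    The temporal span is fixed by w, unlike a temporal model that learns it.
    """
    T = len(x)
    out = np.zeros_like(x, dtype=float)
    for t in range(T):
        out[t] = x[max(0, t - w + 1):t + 1].mean(axis=0)
    return out

# Three weeks of a single feature; a width-2 window smooths adjacent weeks.
feats = window_features(np.array([[4.0], [0.0], [2.0]]), w=2)
# rows: [4.0], [2.0], [1.0]
```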
Input-output Hidden Markov Model (IOHMM)

- Originated from the HMM
- Learns to map input sequences to output sequences

h_t = A h_{t-1} + B x_t + N(0, Q)
y_t = C h_t + N(0, R)        (1)

[Figure: IOHMM structure, with input features x, hidden states h, and dropout labels y unrolled over time]
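Equation (1) is a linear-Gaussian state update. A minimal sketch of one forward (sampling) step follows; the scalar dimensions and the values of A, B, C, Q, R are toy assumptions for illustration, not the paper's fitted parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-dimensional parameters (illustrative only).
A = np.array([[0.9]])   # hidden-state transition
B = np.array([[0.5]])   # input-to-state map
C = np.array([[1.0]])   # state-to-output map
Q = np.array([[0.01]])  # state noise covariance
R = np.array([[0.01]])  # output noise covariance

def iohmm_step(h_prev, x_t):
    """One step of Eq. (1): h_t = A h_{t-1} + B x_t + N(0, Q);
    y_t = C h_t + N(0, R)."""
    h_t = A @ h_prev + B @ x_t + rng.multivariate_normal(np.zeros(1), Q)
    y_t = C @ h_t + rng.multivariate_normal(np.zeros(1), R)
    return h_t, y_t

h, y = iohmm_step(np.array([0.0]), np.array([1.0]))
```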
Vanilla Recurrent Neural Network (Vanilla RNN)

An RNN allows the network connections to form cycles.

h_t = H(W_1 x_t + W_2 h_{t-1} + b_h)
y_t = F(W_3 h_t + b_y)        (2)

[Figure: Left: Vanilla RNN structure; Right: Vanilla RNN unfolded]
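A minimal sketch of one step of Eq. (2); the slide leaves H and F abstract, so tanh and a logistic sigmoid (a natural choice for a binary dropout label) are assumptions here, as are the toy dimensions and random weights.

```python
import numpy as np

d_in, d_h = 7, 4  # toy sizes: 7 weekly activity features, 4 hidden units
rng = np.random.default_rng(1)
W1 = rng.normal(0, 0.1, (d_h, d_in))
W2 = rng.normal(0, 0.1, (d_h, d_h))
W3 = rng.normal(0, 0.1, (1, d_h))
b_h = np.zeros(d_h)
b_y = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_step(h_prev, x_t):
    """Eq. (2): h_t = H(W_1 x_t + W_2 h_{t-1} + b_h); y_t = F(W_3 h_t + b_y)."""
    h_t = np.tanh(W1 @ x_t + W2 @ h_prev + b_h)
    y_t = sigmoid(W3 @ h_t + b_y)  # per-week dropout probability
    return h_t, y_t

# Run over a sequence of (random) weekly activity vectors.
h = np.zeros(d_h)
for x_t in rng.normal(size=(5, d_in)):
    h, y = rnn_step(h, x_t)
```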
Drawbacks of RNN

1 The influence of an input either decays or blows up as it cycles through the recurrent connection
2 Vanishing gradient problem
3 The range of temporal context that can be accessed in practice is usually quite limited
4 The dynamic state of a regular RNN is short-term memory
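The decay/blow-up in point 1 can be seen numerically: in a linearized view (ignoring the nonlinearity, purely for illustration), an input's contribution after k steps scales like the recurrent weight's magnitude to the power k.

```python
import numpy as np

def influence_norm(w_scale, steps):
    """Norm of an input's contribution after `steps` recurrent applications
    of W = w_scale * I (linearized RNN: the tanh is ignored, so this only
    illustrates the geometric decay/growth of the influence)."""
    W = w_scale * np.eye(4)
    v = np.ones(4)
    for _ in range(steps):
        v = W @ v
    return float(np.linalg.norm(v))

weak = influence_norm(0.5, 10)    # ~0.002: the influence has vanished
strong = influence_norm(1.5, 10)  # ~115: the influence has blown up
```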
Long Short-Term Memory Cell (LSTM)

Hochreiter & Schmidhuber (1997) solved the problem of getting an RNN to remember things for a long time.

1 Information gets into a cell whenever the "input" gate is on
2 Information stays in the cell as long as the "forget" gate is closed
3 Information can be read from the cell by turning the "output" gate on
Update Functions of LSTM

i_t = σ(W_xi x_t + W_hi h_{t-1} + W_ci c_{t-1} + b_i)
f_t = σ(W_xf x_t + W_hf h_{t-1} + W_cf c_{t-1} + b_f)
c_t = f_t ⊗ c_{t-1} + i_t ⊗ tanh(W_xc x_t + W_hc h_{t-1} + b_c)
o_t = σ(W_xo x_t + W_ho h_{t-1} + W_co c_{t-1} + b_o)
h_t = o_t ⊗ tanh(c_t)        (3)

(σ is the logistic sigmoid; ⊗ denotes elementwise multiplication)
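A direct transcription of one application of Eq. (3); the dimensions and random weights are toy assumptions, and the peephole wiring (the W_c* terms, including c_{t-1} in the output gate) follows the equations as printed on the slide.

```python
import numpy as np

d_x, d_h = 3, 2  # toy sizes: 3 input features, 2 memory cells
rng = np.random.default_rng(3)

def p(*shape):
    """Small random weight matrix (illustrative initialization)."""
    return rng.normal(0, 0.1, shape)

# Weights of Eq. (3); W_ci, W_cf, W_co are the "peephole" connections
# from the cell state (full matrices here, for clarity).
Wxi, Whi, Wci, bi = p(d_h, d_x), p(d_h, d_h), p(d_h, d_h), np.zeros(d_h)
Wxf, Whf, Wcf, bf = p(d_h, d_x), p(d_h, d_h), p(d_h, d_h), np.zeros(d_h)
Wxc, Whc, bc      = p(d_h, d_x), p(d_h, d_h), np.zeros(d_h)
Wxo, Who, Wco, bo = p(d_h, d_x), p(d_h, d_h), p(d_h, d_h), np.zeros(d_h)

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev):
    """One application of Eq. (3); '*' is the elementwise product (⊗)."""
    i_t = sigma(Wxi @ x_t + Whi @ h_prev + Wci @ c_prev + bi)   # input gate
    f_t = sigma(Wxf @ x_t + Whf @ h_prev + Wcf @ c_prev + bf)   # forget gate
    c_t = f_t * c_prev + i_t * np.tanh(Wxc @ x_t + Whc @ h_prev + bc)
    o_t = sigma(Wxo @ x_t + Who @ h_prev + Wco @ c_prev + bo)   # output gate
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

h, c = lstm_step(np.ones(d_x), np.zeros(d_h), np.zeros(d_h))
```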
Hybrid of LSTM Memory Cells and RNN (LSTM Network)

[Figure: Left: Hybrid of LSTM and RNN (LSTM network); Right: LSTM network unfolded]
Datasets for Dropout Prediction

1 "Science of Gastronomy", six-week course (Coursera): 85394 → 39877
2 "Introduction to Java Programming", ten-week course (edX): 46972 → 27629
Dropout Definitions

Three definitions capture different contexts of the student status in a course:

DEF1 Participation in the final week: whether a student will stay to the end of the course [Yang et al. 2013, Ramesh et al. 2014, He et al. 2015]
DEF2 Last week of engagement: whether the current week is the last week the student has activities [Amnueypornsakul et al. 2014, Kloft et al. 2014, Sinha et al. 2014, Sharkey and Sanders 2014, Taylor et al. 2014]
DEF3 Participation in the next week: whether a student has activities in the coming week

An illustrative example for DEF1-DEF3:

Time     | Week 1           | Week 2 | Week 3           | Week 4 | Week 5
Features | [7,34,9,2,0,7,5] | Zeros  | [6,3,12,4,1,8,3] | Zeros  | Zeros
DEF1     | 1                | 1      | 1                | 1      | 1
DEF2     | 0                | 0      | 1                | 1      | null
DEF3     | 1                | 0      | 1                | 1      | null
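The three definitions can be made concrete with a small labeling function. The encoding below (1 = dropout, 0 = not, None = undefined, matching "null") is an assumption read off the illustrative example rather than stated explicitly on the slide.

```python
def dropout_labels(active, defn):
    """Weekly dropout labels under DEF1-DEF3 (1 = dropout, 0 = not,
    None = undefined).

    active: list of booleans, whether the student had any activity that week.
    Semantics read off the slide's example:
      DEF1: 1 in every week iff the student is inactive in the final week.
      DEF2: 1 from the student's last active week onward; None in the final week.
      DEF3: 1 in week t iff the student is inactive in week t+1; None in week T.
    """
    T = len(active)
    if defn == 1:
        d = 0 if active[-1] else 1
        return [d] * T
    if defn == 2:
        last = max((t for t in range(T) if active[t]), default=-1)
        return [1 if t >= last else 0 for t in range(T - 1)] + [None]
    if defn == 3:
        return [0 if active[t + 1] else 1 for t in range(T - 1)] + [None]
    raise ValueError("defn must be 1, 2 or 3")

# The example student: active in Weeks 1 and 3, zeros elsewhere.
active = [True, False, True, False, False]
# dropout_labels(active, 1) reproduces the DEF1 row, and so on.
```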
Model Performance Comparison

[Figure: AUC scores (0.5 to 1) per week for all models (LSTM Network, Vanilla RNN, IOHMM 1, IOHMM 2, Nonlinear SVM, Logistic Regression) under DEF1, DEF2, and DEF3; top row: Coursera course (weeks 1 to 5); bottom row: edX course (weeks 1 to 9)]

1 The LSTM network performs consistently best
2 The IOHMMs perform worst
3 Baselines ≈ vanilla RNN; not consistent across the two datasets
Take-home Message

1 A temporal perspective on the dropout prediction problem
2 The effectiveness of the RNN and the LSTM network
3 Try not to "drop out" of the MOOC courses you are taking