Temporal Models for Predicting Student Dropout in Massive Open Online Courses
Fei Mi, Dit-Yan Yeung
Hong Kong University of Science and Technology (HKUST)
[email protected] ([email protected])
November 14, 2015
Fei Mi, Dit-Yan Yeung (HKUST) ICDM ASSESS 2015 November, 14th, 2015 1 / 17
Outline
1 Background and Motivation
2 Temporal Models
3 Experiments
4 Conclusion
Overview

What can we do?
- Performance evaluation (Peer Grading)
- Help students engage and perform better (Dropout Prediction)
- Build a personalized platform (Recommendation)
Motivation of our work

1 High attrition rates are common on MOOC platforms (60% to 80%)
2 Current methods: SVM, Logistic Regression
  - Activity features (lecture video, discussion forum)
  - Static models
Contribution of our work

1 A sequence labeling perspective

[Figure: each of Weeks 1 to t contributes an activity feature vector (x_1, x_2, ..., x_t) paired with a dropout label (y_1, y_2, ..., y_t)]

2 Compare different temporal machine learning models
  - Input-output Hidden Markov Model (IOHMM)
  - Recurrent Neural Network (RNN)
  - RNN with long short-term memory (LSTM) cells
How to capture temporal information?

Sliding window structures (NLP tasks):
1 Features are aggregated using a sliding window structure
2 The temporal span is fixed by the sliding window

Temporal models:
1 Learn from the previous inputs and the current input
2 A temporal pathway allows a "memory" of the previous inputs to persist in the internal state
3 Flexible temporal span, learned from the data
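The fixed-span limitation of the sliding-window approach can be sketched in a few lines; the window length `w` and the mean aggregation here are illustrative assumptions, not the exact feature construction used in the paper.

```python
import numpy as np

def window_features(x, w):
    """Aggregate each week's feature vector with the previous w-1 weeks.

    x: (T, d) array of weekly activity features.
    Returns a (T, d) array where row t is the mean of rows max(0, t-w+1)..t.
    The temporal span is fixed by w, unlike a temporal model that learns it.
    """
    T = len(x)
    out = np.zeros_like(x, dtype=float)
    for t in range(T):
        out[t] = x[max(0, t - w + 1):t + 1].mean(axis=0)
    return out

# Three weeks of a single feature; a width-2 window smooths adjacent weeks.
feats = window_features(np.array([[4.0], [0.0], [2.0]]), w=2)
# rows: [4.0], [2.0], [1.0]
```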
Input-output Hidden Markov Model (IOHMM)

- Originated from the HMM
- Learns to map input sequences to output sequences

h_t = A h_{t-1} + B x_t + N(0, Q)
y_t = C h_t + N(0, R)        (1)

[Figure: IOHMM structure, with input features x, hidden states h, and dropout labels y unrolled over time]
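Equation (1) is a linear-Gaussian state update. A minimal sketch of one forward (sampling) step follows; the scalar dimensions and the values of A, B, C, Q, R are toy assumptions for illustration, not the paper's fitted parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-dimensional parameters (illustrative only).
A = np.array([[0.9]])   # hidden-state transition
B = np.array([[0.5]])   # input-to-state map
C = np.array([[1.0]])   # state-to-output map
Q = np.array([[0.01]])  # state noise covariance
R = np.array([[0.01]])  # output noise covariance

def iohmm_step(h_prev, x_t):
    """One step of Eq. (1): h_t = A h_{t-1} + B x_t + N(0, Q);
    y_t = C h_t + N(0, R)."""
    h_t = A @ h_prev + B @ x_t + rng.multivariate_normal(np.zeros(1), Q)
    y_t = C @ h_t + rng.multivariate_normal(np.zeros(1), R)
    return h_t, y_t

h, y = iohmm_step(np.array([0.0]), np.array([1.0]))
```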
Vanilla Recurrent Neural Network (Vanilla RNN)

An RNN allows the network connections to form cycles.

h_t = H(W_1 x_t + W_2 h_{t-1} + b_h)
y_t = F(W_3 h_t + b_y)        (2)

[Figure: Left: Vanilla RNN structure; Right: Vanilla RNN unfolded]
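A minimal sketch of one step of Eq. (2); the slide leaves H and F abstract, so tanh and a logistic sigmoid (a natural choice for a binary dropout label) are assumptions here, as are the toy dimensions and random weights.

```python
import numpy as np

d_in, d_h = 7, 4  # toy sizes: 7 weekly activity features, 4 hidden units
rng = np.random.default_rng(1)
W1 = rng.normal(0, 0.1, (d_h, d_in))
W2 = rng.normal(0, 0.1, (d_h, d_h))
W3 = rng.normal(0, 0.1, (1, d_h))
b_h = np.zeros(d_h)
b_y = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_step(h_prev, x_t):
    """Eq. (2): h_t = H(W_1 x_t + W_2 h_{t-1} + b_h); y_t = F(W_3 h_t + b_y)."""
    h_t = np.tanh(W1 @ x_t + W2 @ h_prev + b_h)
    y_t = sigmoid(W3 @ h_t + b_y)  # per-week dropout probability
    return h_t, y_t

# Run over a sequence of (random) weekly activity vectors.
h = np.zeros(d_h)
for x_t in rng.normal(size=(5, d_in)):
    h, y = rnn_step(h, x_t)
```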
Drawbacks of RNN

1 The influence of an input either decays or blows up as it cycles through the recurrent connection
2 Vanishing gradient problem
3 The range of temporal context that can be accessed in practice is usually quite limited
4 The dynamic state of a regular RNN is short-term memory
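The decay/blow-up in point 1 can be seen numerically: in a linearized view (ignoring the nonlinearity, purely for illustration), an input's contribution after k steps scales like the recurrent weight's magnitude to the power k.

```python
import numpy as np

def influence_norm(w_scale, steps):
    """Norm of an input's contribution after `steps` recurrent applications
    of W = w_scale * I (linearized RNN: the tanh is ignored, so this only
    illustrates the geometric decay/growth of the influence)."""
    W = w_scale * np.eye(4)
    v = np.ones(4)
    for _ in range(steps):
        v = W @ v
    return float(np.linalg.norm(v))

weak = influence_norm(0.5, 10)    # ~0.002: the influence has vanished
strong = influence_norm(1.5, 10)  # ~115: the influence has blown up
```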
Long Short-Term Memory Cell (LSTM)

Hochreiter & Schmidhuber (1997) solved the problem of getting an RNN to remember things for a long time.

1 Information gets into a cell whenever the "input" gate is on
2 Information stays in the cell as long as the "forget" gate is closed
3 Information can be read from the cell by turning the "output" gate on
Update Functions of LSTM

i_t = σ(W_xi x_t + W_hi h_{t-1} + W_ci c_{t-1} + b_i)
f_t = σ(W_xf x_t + W_hf h_{t-1} + W_cf c_{t-1} + b_f)
c_t = f_t ⊗ c_{t-1} + i_t ⊗ tanh(W_xc x_t + W_hc h_{t-1} + b_c)
o_t = σ(W_xo x_t + W_ho h_{t-1} + W_co c_{t-1} + b_o)
h_t = o_t ⊗ tanh(c_t)        (3)

(σ is the logistic sigmoid; ⊗ denotes elementwise multiplication)
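A direct transcription of one application of Eq. (3); the dimensions and random weights are toy assumptions, and the peephole wiring (the W_c* terms, including c_{t-1} in the output gate) follows the equations as printed on the slide.

```python
import numpy as np

d_x, d_h = 3, 2  # toy sizes: 3 input features, 2 memory cells
rng = np.random.default_rng(3)

def p(*shape):
    """Small random weight matrix (illustrative initialization)."""
    return rng.normal(0, 0.1, shape)

# Weights of Eq. (3); W_ci, W_cf, W_co are the "peephole" connections
# from the cell state (full matrices here, for clarity).
Wxi, Whi, Wci, bi = p(d_h, d_x), p(d_h, d_h), p(d_h, d_h), np.zeros(d_h)
Wxf, Whf, Wcf, bf = p(d_h, d_x), p(d_h, d_h), p(d_h, d_h), np.zeros(d_h)
Wxc, Whc, bc      = p(d_h, d_x), p(d_h, d_h), np.zeros(d_h)
Wxo, Who, Wco, bo = p(d_h, d_x), p(d_h, d_h), p(d_h, d_h), np.zeros(d_h)

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev):
    """One application of Eq. (3); '*' is the elementwise product (⊗)."""
    i_t = sigma(Wxi @ x_t + Whi @ h_prev + Wci @ c_prev + bi)   # input gate
    f_t = sigma(Wxf @ x_t + Whf @ h_prev + Wcf @ c_prev + bf)   # forget gate
    c_t = f_t * c_prev + i_t * np.tanh(Wxc @ x_t + Whc @ h_prev + bc)
    o_t = sigma(Wxo @ x_t + Who @ h_prev + Wco @ c_prev + bo)   # output gate
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

h, c = lstm_step(np.ones(d_x), np.zeros(d_h), np.zeros(d_h))
```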
Hybrid of LSTM Memory Cells and RNN (LSTM Network)

[Figure: Left: Hybrid of LSTM and RNN (LSTM network); Right: LSTM network unfolded]
Datasets for Dropout Prediction

1 "Science of Gastronomy", six-week course (Coursera): 85394 → 39877
2 "Introduction to Java Programming", ten-week course (edX): 46972 → 27629
Dropout Definitions

Three definitions capture different contexts of the student status in a course:

DEF1 Participation in the final week: whether a student will stay to the end of the course [Yang et al. 2013, Ramesh et al. 2014, He et al. 2015]
DEF2 Last week of engagement: whether the current week is the last week the student has activities [Amnueypornsakul et al. 2014, Kloft et al. 2014, Sinha et al. 2014, Sharkey and Sanders 2014, Taylor et al. 2014]
DEF3 Participation in the next week: whether a student has activities in the coming week

An illustrative example for DEF1-DEF3:

Time     | Week 1           | Week 2 | Week 3           | Week 4 | Week 5
Features | [7,34,9,2,0,7,5] | Zeros  | [6,3,12,4,1,8,3] | Zeros  | Zeros
DEF1     | 1                | 1      | 1                | 1      | 1
DEF2     | 0                | 0      | 1                | 1      | null
DEF3     | 1                | 0      | 1                | 1      | null
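The three definitions can be made concrete with a small labeling function. The encoding below (1 = dropout, 0 = not, None = undefined, matching "null") is an assumption read off the illustrative example rather than stated explicitly on the slide.

```python
def dropout_labels(active, defn):
    """Weekly dropout labels under DEF1-DEF3 (1 = dropout, 0 = not,
    None = undefined).

    active: list of booleans, whether the student had any activity that week.
    Semantics read off the slide's example:
      DEF1: 1 in every week iff the student is inactive in the final week.
      DEF2: 1 from the student's last active week onward; None in the final week.
      DEF3: 1 in week t iff the student is inactive in week t+1; None in week T.
    """
    T = len(active)
    if defn == 1:
        d = 0 if active[-1] else 1
        return [d] * T
    if defn == 2:
        last = max((t for t in range(T) if active[t]), default=-1)
        return [1 if t >= last else 0 for t in range(T - 1)] + [None]
    if defn == 3:
        return [0 if active[t + 1] else 1 for t in range(T - 1)] + [None]
    raise ValueError("defn must be 1, 2 or 3")

# The example student: active in Weeks 1 and 3, zeros elsewhere.
active = [True, False, True, False, False]
# dropout_labels(active, 1) reproduces the DEF1 row, and so on.
```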
Model Performance Comparison

[Figure: AUC scores (0.5 to 1) per week for all models (LSTM Network, Vanilla RNN, IOHMM 1, IOHMM 2, Nonlinear SVM, Logistic Regression) under DEF1, DEF2, and DEF3; top row: Coursera course (weeks 1 to 5); bottom row: edX course (weeks 1 to 9)]

1 The LSTM network performs consistently best
2 The IOHMMs perform worst
3 Baselines ≈ vanilla RNN; not consistent across the two datasets
Take-home Message

1 A temporal perspective on the dropout prediction problem
2 The effectiveness of the RNN and the LSTM network
3 Try not to "drop out" of the MOOC courses you are taking