DESCRIPTION
Improving retention: predicting at-risk students by analysing clicking behaviour in a virtual learning environment. Presentation from 'InFocus: Learner analytics and big data', a CDE technology symposium held at Senate House on 10 December 2013. Conducted by Annika Wolff, Knowledge Media Institute, Open University. Audio of the session and more details can be found at www.cde.london.ac.uk.
TRANSCRIPT
Improving retention: predicting at-risk students by analysing clicking behaviour in a virtual learning environment
Annika Wolff and Zdenek Zdrahal
10th December 2013
Student retention
• Struggling students don’t always ask for help – they drop out of the module or fail, and then don’t progress further
• When timely help is offered, this can make the difference between success and failure.
• It can be hard to know who’s in trouble and where to direct resources
Open University context
[Diagram labels: students, tutors]
Distance learning:
• Content through VLE
• Contact mediated through VLE – how to tell if students are struggling?
Solution: develop predictive models from student data
Data sources and data sets
• VLE: learning content, forums, quizzes, …
• Assessment: ongoing assessments, final exam
• Demographic: age, gender, previous study, …
Typical VLE clicks
[Chart: typical VLE clicks; series: Students, Tutors; horizontal axis 1 to 47, vertical axis 0 to 3000]
VLE activity (prior to TMA1)
• No VLE activity: 317 students
• 1-20 clicks: 609 students
• 21-80 clicks: 943 students
• 81-150 clicks: 621 students
• 151-300 clicks: 803 students
• 301-600 clicks: 516 students
• > 600 clicks: 355 students
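As an illustration of how raw click counts could be grouped into the bands listed above, here is a minimal Python sketch. The function names are hypothetical and not taken from the presentation; only the band edges and the student counts come from the slide.

```python
from bisect import bisect_left

# Upper edges of the click bands on the slide; anything above 600 is "> 600".
UPPER_EDGES = [0, 20, 80, 150, 300, 600]
LABELS = ["No VLE activity", "1-20", "21-80", "81-150",
          "151-300", "301-600", "> 600"]

# Student counts per band as reported on the slide.
SLIDE_COUNTS = dict(zip(LABELS, [317, 609, 943, 621, 803, 516, 355]))


def click_band(clicks: int) -> str:
    """Return the band label for a non-negative click count."""
    if clicks < 0:
        raise ValueError("click count cannot be negative")
    return LABELS[bisect_left(UPPER_EDGES, clicks)]


def band_shares(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-band student counts into fractions of all students."""
    total = sum(counts.values())
    return {band: n / total for band, n in counts.items()}


if __name__ == "__main__":
    for band, share in band_shares(SLIDE_COUNTS).items():
        print(f"{band:>16}: {share:.1%}")
```

The slide's counts sum to 4164 students, so the 317 students with no VLE activity before TMA1 are roughly 7.6% of the cohort.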
Problem specification
• Given:
– Demographic data at the start (may include information about student’s previous modules studied at the OU and his/her objectives)
– Assessments (TMAs) as they are available during the module
– VLE activities between TMAs
– Conditions student must satisfy to pass the module
• Goal:
– Identify students at risk of failing the module as early as possible so that OU intervention is meaningful.
Comments on problem specification
• OU intervention is meaningful if the cost of the intervention is lower than the expected gain from retaining the student.
• Modelling the problem:
[Timeline diagram: "We are here" marks the present; history we know, future we can estimate … and we can influence!]
• How can we estimate the future? … Based on the student’s history and on properties of upcoming parts of the module known from previous presentations.
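The cost criterion stated above can be written as a simple decision rule. The sketch below is only one possible reading: the presentation does not say how the expected gain is computed, so the decomposition into a failure probability, a chance that the intervention rescues the student, and a value of retaining the student is an assumption, as are all parameter names.

```python
def expected_gain(p_fail: float, p_rescue: float, retention_value: float) -> float:
    """Expected gain from intervening (assumed decomposition, not from the slides).

    p_fail          -- estimated probability that the student fails without help
    p_rescue        -- assumed probability that the intervention prevents failure
    retention_value -- assumed value of retaining the student, in cost units
    """
    return p_fail * p_rescue * retention_value


def intervention_is_meaningful(cost: float, p_fail: float,
                               p_rescue: float, retention_value: float) -> bool:
    """Intervene only if the cost is lower than the expected gain."""
    return cost < expected_gain(p_fail, p_rescue, retention_value)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(intervention_is_meaningful(50, p_fail=0.9, p_rescue=0.3, retention_value=1000))
    print(intervention_is_meaningful(50, p_fail=0.05, p_rescue=0.3, retention_value=1000))
```

Under this reading, a better estimate of the failure probability directly changes which students are worth contacting.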
Prediction at TMA1
– Why? TMA1 is a good predictor of success or failure
– There is still enough time to intervene
[Timeline diagram: "We are here"; history we know, future we can affect]
Building a classifier
[Diagram: training instances labelled Pass or Fail are used to build a classifier, which labels new instances PASS or FAIL. Example decision node: "Assessment 1 score?" with branches >40% and <40%]
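To make the idea of building a classifier from labelled training instances concrete, here is a minimal Python sketch of a one-node decision tree (a decision stump) on a single score. It is not the authors' model; the training data is invented for illustration, and the function names are hypothetical.

```python
def fit_stump(scores: list[float], labels: list[str]) -> float:
    """Pick the threshold t that minimises training errors for the rule
    'score >= t -> Pass, otherwise Fail'. Candidates are the observed scores."""
    best_t, best_err = None, None
    for t in sorted(set(scores)):
        err = sum(("Pass" if s >= t else "Fail") != y
                  for s, y in zip(scores, labels))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def predict(score: float, threshold: float) -> str:
    """Classify a new instance with the learned threshold."""
    return "Pass" if score >= threshold else "Fail"


if __name__ == "__main__":
    # Invented training instances: assessment 1 score and final outcome.
    train_scores = [15, 30, 38, 42, 55, 70, 85]
    train_labels = ["Fail", "Fail", "Fail", "Pass", "Pass", "Pass", "Pass"]
    t = fit_stump(train_scores, train_labels)
    print("learned threshold:", t)
    print("new instance with score 35:", predict(35, t))
```

A full decision tree repeats this kind of split recursively on several attributes; the stump shows only the first question in the diagram.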
Decision Tree – first results (no demographics)
Performance drop (VLE+TMA)
Final outcome
Naïve Bayes network
[Network nodes: Sex, Education, N/C, VLE, TMA1]
• Education:
– No formal qualif.
– Lower than A level
– A level
– HE qualif.
– Postgraduate qualif.
• VLE:
– No engagement
– 1-20 clicks
– 21-100 clicks
– 101-800 clicks
• N/C:
– New student
– Continuing student
• Sex:
– Female
– Male
Goal: calculate the probability of failing at TMA1
• either by not submitting TMA1,
• or by submitting with score < 40.
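The calculation a naïve Bayes model performs for this goal can be sketched in plain Python. This is a generic categorical naïve Bayes with Laplace smoothing, not the authors' implementation; the training rows are invented, and only two of the attributes (VLE and N/C) are used to keep the example short.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels, alpha=1.0):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(lambda: defaultdict(Counter))  # attr -> class -> value -> n
    values = defaultdict(set)                                  # attr -> seen values
    for row, y in zip(rows, labels):
        for attr, v in row.items():
            value_counts[attr][y][v] += 1
            values[attr].add(v)
    return class_counts, value_counts, values, alpha


def predict_proba(model, row):
    """Return P(class | row), assuming attributes are independent given the class."""
    class_counts, value_counts, values, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for y, n_y in class_counts.items():
        p = n_y / total  # prior
        for attr, v in row.items():
            # Laplace-smoothed conditional P(attr = v | class = y)
            p *= (value_counts[attr][y][v] + alpha) / (n_y + alpha * len(values[attr]))
        scores[y] = p
    norm = sum(scores.values())
    return {y: p / norm for y, p in scores.items()}


if __name__ == "__main__":
    # Invented students: VLE click band, new/continuing, and TMA1 outcome.
    rows = [
        {"vle": "none", "nc": "new"}, {"vle": "none", "nc": "new"},
        {"vle": "1-20", "nc": "new"}, {"vle": "1-20", "nc": "continuing"},
        {"vle": "21-100", "nc": "new"}, {"vle": "101-800", "nc": "continuing"},
        {"vle": "101-800", "nc": "new"}, {"vle": "101-800", "nc": "continuing"},
    ]
    labels = ["Fail", "Fail", "Fail", "Pass", "Pass", "Pass", "Pass", "Pass"]
    model = fit_naive_bayes(rows, labels)
    print(predict_proba(model, {"vle": "none", "nc": "new"}))
    print(predict_proba(model, {"vle": "101-800", "nc": "continuing"}))
```

On this toy data the estimated probability of failing TMA1 is much higher for a new student with no VLE engagement than for a continuing student in the top click band, which is the pattern the network is meant to capture.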
Predicting final result from TMA1
[Diagram labels: TMA1, TMA2, TMA7, Final result, VLE; outcomes: Pass/Distinction, Fail; TMA1 branches: TMA1 >= 40, TMA1 < 40]
Prior probabilities: P(Success) = 0.807, P(Fail) = 0.193
Posterior probabilities:
P(Success|TMA1) = 0.858, P(Fail|TMA1) = 0.142
P(Success|~TMA1) = 0.093, P(Fail|~TMA1) = 0.907
Bayes minimum error classifier: if a student fails TMA1, he/she is likely to fail the final result.
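The probabilities on this slide can be checked against each other with the law of total probability, and the minimum error rule follows directly from them. The Python sketch below assumes that "TMA1" in the conditionals means passing TMA1 (score >= 40) and "~TMA1" means failing it; the function names are hypothetical.

```python
P_SUCCESS = 0.807            # prior P(Success) from the slide
P_SUCCESS_IF_PASS = 0.858    # P(Success | TMA1), read as "passed TMA1"
P_SUCCESS_IF_FAIL = 0.093    # P(Success | ~TMA1), read as "failed TMA1"


def implied_p_pass_tma1(p_success, p_success_if_pass, p_success_if_fail):
    """Solve P(S) = P(S|T) * p + P(S|~T) * (1 - p) for p = P(passed TMA1)."""
    return (p_success - p_success_if_fail) / (p_success_if_pass - p_success_if_fail)


def minimum_error_label(p_success_given_evidence):
    """Bayes minimum error rule: choose the more probable outcome."""
    return "Success" if p_success_given_evidence >= 0.5 else "Fail"


def minimum_error_rate(p_pass_tma1, p_success_if_pass, p_success_if_fail):
    """Overall error rate when the more probable outcome is predicted in each branch."""
    err_if_pass = min(p_success_if_pass, 1 - p_success_if_pass)
    err_if_fail = min(p_success_if_fail, 1 - p_success_if_fail)
    return p_pass_tma1 * err_if_pass + (1 - p_pass_tma1) * err_if_fail


if __name__ == "__main__":
    p = implied_p_pass_tma1(P_SUCCESS, P_SUCCESS_IF_PASS, P_SUCCESS_IF_FAIL)
    print("implied P(passed TMA1):", round(p, 3))
    print("passed TMA1 ->", minimum_error_label(P_SUCCESS_IF_PASS))
    print("failed TMA1 ->", minimum_error_label(P_SUCCESS_IF_FAIL))
    print("error rate:", round(minimum_error_rate(p, P_SUCCESS_IF_PASS, P_SUCCESS_IF_FAIL), 3))
```

Under this reading, the slide's figures imply that about 93% of students pass TMA1, and predicting the final result from TMA1 alone gives an error rate of about 0.139, compared with 0.193 for always predicting success.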
[Chart: P(Fail|TMA1-score) and P(Pass/Dist|TMA1-score) by TMA1 score band (0-39, 40-59, 60-69, 70-79, 80-100); vertical axis 0 to 1; series: Fail, Pass/Dist]
Predicting final result from TMA1
[Diagram labels: Sex, Education, N/C, VLE, TMA1, TMA2, TMA7, Final result; outcomes: Pass/Distinction, Fail; TMA1 branches: TMA1 >= 40, TMA1 < 40]
Demo Case 1
• Demographic data
– Student fits certain demographic profile of gender, educational background etc.
• Without VLE (network nodes: Sex, Education, N/C, TMA1): probability of failing at TMA1 = 18.5%
• With VLE (network nodes: Sex, Education, N/C, VLE, TMA1):

Clicks      Probability   Nr of students
0           64%           4
1-20        44%           3
21-100      26%           5
101-800     6.3%          14
Demo Case 2
• Demographic data
– Different demographic profile to previous slide
• Without VLE (network nodes: Sex, Education, N/C, TMA1): probability of failing at TMA1 = 7.7%
• With VLE (network nodes: Sex, Education, N/C, VLE, TMA1):

Clicks      Probability   Nr of students
0           39%           35
1-20        22%           74
21-100      11.2%         178
101-800     2.4%          461
TMA1? … it might be too late!
Can we predict TMA1 from VLE activities 1 week before the TMA1 deadline? How about 2, 3, … weeks?
[Timeline diagram: "We are here"; history, future we can affect]
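One way to set up the question of predicting one, two or three weeks before the TMA1 deadline is to restrict the VLE feature to clicks made before a cutoff. The Python sketch below assumes a hypothetical data layout (clicks per module day, with days numbered from the module start); the presentation does not describe its actual feature extraction.

```python
def clicks_before_cutoff(daily_clicks: dict[int, int],
                         deadline_day: int, weeks_before: int) -> int:
    """Total clicks made strictly before the cutoff day.

    daily_clicks -- hypothetical mapping: module day number -> clicks that day
    deadline_day -- day number of the TMA1 deadline
    weeks_before -- how many weeks ahead of the deadline the prediction is made
    """
    cutoff = deadline_day - 7 * weeks_before
    return sum(n for day, n in daily_clicks.items() if day < cutoff)


if __name__ == "__main__":
    # Invented activity for one student; TMA1 deadline on day 21.
    activity = {0: 5, 3: 2, 10: 7, 20: 1}
    for weeks in (1, 2, 3):
        print(weeks, "week(s) before:", clicks_before_cutoff(activity, 21, weeks))
```

The earlier the cutoff, the less activity is available as evidence, which is why earlier prediction is harder.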
[Dashboard annotations: predicted to fail; has not engaged with VLE; average score < 40]
Dashboard and Chart
[Chart annotations: at least one TMA below 40; has not submitted TMA5]
However: has not engaged with VLE, average score = 81.71 !!!
Dashboard – new design
Conclusions
• In a distance learning context, VLE data is a valuable source for prediction
• Prediction improves as a module progresses, but later predictions come too late to intervene!
• We need to optimise methods for early prediction