
Improving retention: predicting at-risk students by analysing clicking behaviour in a virtual learning environment

Annika Wolff and Zdenek Zdrahal

10th December 2013

Student retention

• Struggling students don’t always ask for help – they drop out of the module, or fail and then don’t progress further

• When timely help is offered, this can make the difference between success and failure.

• It can be hard to know who’s in trouble and where to direct resources

Open University context

[Diagram labels: students, tutors]

Distance learning:
• Content through VLE
• Contact mediated through VLE – how to tell if students are struggling?

Solution: develop predictive models from student data

Data sources and data sets

• VLE: learning content, forums, quizzes, …
• Assessment: ongoing assessments, final exam
• Demographic: age, gender, previous study, …

Typical VLE clicks

[Chart: VLE clicks; x-axis ticks 1 to 47, y-axis 0 to 3000; series: Students, Tutors]

VLE activity (prior to TMA1)
• No VLE activity: 317 students
• 1-20 clicks: 609 students
• 21-80 clicks: 943 students
• 81-150 clicks: 621 students
• 151-300 clicks: 803 students
• 301-600 clicks: 516 students
• > 600 clicks: 355 students
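As a rough illustration, the band counts above can be turned into shares of the cohort. The counts are taken from the slide; the variable names are illustrative only.

```python
# Students per VLE-click band before TMA1 (counts as listed on the slide).
band_counts = {
    "No VLE activity": 317,
    "1-20 clicks": 609,
    "21-80 clicks": 943,
    "81-150 clicks": 621,
    "151-300 clicks": 803,
    "301-600 clicks": 516,
    "> 600 clicks": 355,
}

total_students = sum(band_counts.values())

# Share of the cohort in each band, as a percentage.
band_shares = {band: 100 * n / total_students for band, n in band_counts.items()}

for band, share in band_shares.items():
    print(f"{band:16s} {band_counts[band]:4d} students  {share:5.1f}%")
```

The bands sum to 4164 students, of whom about 7.6% had no VLE activity before TMA1.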

Problem specification

• Given:
– Demographic data at the start (may include information about the student’s previous modules studied at the OU and his/her objectives)
– Assessments (TMAs) as they become available during the module
– VLE activities between TMAs
– Conditions the student must satisfy to pass the module

• Goal:
– Identify students at risk of failing the module as early as possible, so that OU intervention is meaningful.

Comments on problem specification

• OU intervention is meaningful if the cost of the intervention is lower than the expected gain from retaining the student.

• Modelling the problem:

[Timeline diagram: "We are here" marks the present; before it lies the history we know, after it the future we can estimate … and we can influence!]

How can we estimate the future? … Based on the student’s history and on properties of the upcoming parts of the module, known from previous presentations.
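The criterion stated on this slide (an intervention is meaningful when its cost is lower than the expected gain) can be sketched as a simple check. The slides do not say how the expected gain is computed; the sketch below assumes it is the estimated risk of failure, times the chance that the intervention retains the student, times the value of a retained student. All parameter names and numbers are hypothetical.

```python
def intervention_is_meaningful(cost, p_fail, p_retained_if_helped, value_of_retention):
    """Return True when the intervention costs less than its expected gain.

    Assumption (not from the slides): expected gain is modelled as
    p_fail * p_retained_if_helped * value_of_retention.
    """
    expected_gain = p_fail * p_retained_if_helped * value_of_retention
    return cost < expected_gain


# Hypothetical figures, for illustration only.
high_risk = intervention_is_meaningful(
    cost=50, p_fail=0.6, p_retained_if_helped=0.3, value_of_retention=1000
)
low_risk = intervention_is_meaningful(
    cost=50, p_fail=0.05, p_retained_if_helped=0.3, value_of_retention=1000
)
print(high_risk, low_risk)
```

Under these assumptions the same intervention pays off for the high-risk student but not for the low-risk one, which is why an estimate of the risk is needed before resources are directed.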

Prediction at TMA1

– Why? TMA1 is a good predictor of success or failure

– There is still enough time to intervene

[Timeline diagram: "We are here" separates the history we know from the future we can affect]

Building a classifier

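[Diagram: training instances labelled PASS or FAIL are used to build a classifier, which then labels new instances as Pass or Fail. Example decision node: "Assessment 1 score?" with branches >40% and <40%]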
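A minimal sketch of the idea on this slide: learn a single decision node (a one-level decision tree) on the Assessment 1 score from labelled training instances, then apply it to new instances. The training data below is invented for illustration, and the rule "Pass if score >= threshold" is an assumption; this is not the classifier used in the study.

```python
def fit_stump(scores, labels):
    """Pick the threshold that misclassifies the fewest training instances.

    Rule being fitted: predict "Pass" if score >= threshold, else "Fail".
    """
    best_threshold, best_errors = None, len(scores) + 1
    for threshold in sorted(set(scores)):
        errors = sum(
            ("Pass" if s >= threshold else "Fail") != y
            for s, y in zip(scores, labels)
        )
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def predict(score, threshold):
    """Label a new instance with the fitted rule."""
    return "Pass" if score >= threshold else "Fail"


# Hypothetical training instances: Assessment 1 score and module outcome.
train_scores = [15, 30, 38, 45, 62, 80]
train_labels = ["Fail", "Fail", "Fail", "Pass", "Pass", "Pass"]

threshold = fit_stump(train_scores, train_labels)
new_predictions = [predict(s, threshold) for s in (20, 50)]
print(threshold, new_predictions)
```

A full decision tree repeats this split search recursively over several attributes (for example TMA scores and VLE activity) rather than stopping at one node.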

Decision Tree – first results (no demographics)

[Figure: decision tree; labels: Performance drop (VLE+TMA), Final outcome]

Naïve Bayes network

[Network diagram; nodes: Sex, Education, N/C, VLE, TMA1]

• Education:
– No formal qualification
– Lower than A level
– A level
– HE qualification
– Postgraduate qualification

• VLE:
– No engagement
– 1-20 clicks
– 21-100 clicks
– 101-800 clicks

• N/C:
– New student
– Continuing student

• Sex:
– Female
– Male

Goal: Calculate the probability of failing at TMA1
• either by not submitting TMA1,
• or by submitting with a score < 40.
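The calculation named in the goal can be sketched with a small categorical naive Bayes model, in which TMA1 (pass or fail) is the class and the other nodes are evidence. This is a generic textbook formulation with Laplace smoothing, not necessarily the model built by the authors; the training rows, field names and smoothing constant are all hypothetical.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(rows, target):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)  # (feature, class) -> counts of values
    values_seen = defaultdict(set)       # feature -> distinct values observed
    for row in rows:
        for feature, value in row.items():
            if feature == target:
                continue
            value_counts[(feature, row[target])][value] += 1
            values_seen[feature].add(value)
    return class_counts, value_counts, values_seen


def posterior(model, evidence, alpha=1.0):
    """P(class | evidence), assuming features are independent given the class."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total  # prior P(class)
        for feature, value in evidence.items():
            count = value_counts[(feature, cls)][value]
            # Laplace-smoothed estimate of P(value | class)
            score *= (count + alpha) / (n_cls + alpha * len(values_seen[feature]))
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}


# Hypothetical training rows (invented for illustration).
rows = [
    {"vle": "0", "nc": "new", "tma1": "fail"},
    {"vle": "0", "nc": "new", "tma1": "fail"},
    {"vle": "1-20", "nc": "continuing", "tma1": "fail"},
    {"vle": "21-100", "nc": "new", "tma1": "pass"},
    {"vle": "101-800", "nc": "continuing", "tma1": "pass"},
    {"vle": "101-800", "nc": "continuing", "tma1": "pass"},
    {"vle": "101-800", "nc": "new", "tma1": "pass"},
]

model = fit_naive_bayes(rows, target="tma1")
no_engagement = posterior(model, {"vle": "0"})
high_engagement = posterior(model, {"vle": "101-800"})
print(no_engagement["fail"], high_engagement["fail"])
```

Evidence can be supplied for any subset of the nodes, which is how a "without VLE" and a "with VLE" estimate can be obtained from the same model.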

Predicting final result from TMA1

[Diagram nodes: TMA1, TMA2, TMA7, Final result. TMA1 states: TMA1 >= 40, TMA1 < 40. Final result states: Pass/Distinction, Fail]

Prior probabilities: P(Success) = 0.807, P(Fail) = 0.193

Posterior probabilities (TMA1: TMA1 >= 40; ~TMA1: TMA1 < 40):
P(Success|TMA1) = 0.858, P(Fail|TMA1) = 0.142
P(Success|~TMA1) = 0.093, P(Fail|~TMA1) = 0.907

Bayes minimum error classifier: if a student fails TMA1, he/she is likely to get Fail as the final result.
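The minimum error rule can be worked through with the figures on this slide. The sketch below picks the more probable final result for each TMA1 outcome, then uses the law of total probability to recover the share of students who pass TMA1. That share is not stated on the slide; it is derived here on the assumption that the rounded figures are mutually consistent.

```python
# Figures quoted on the slide.
p_success = 0.807
posterior = {
    "TMA1": {"Success": 0.858, "Fail": 0.142},   # passed TMA1
    "~TMA1": {"Success": 0.093, "Fail": 0.907},  # failed TMA1
}


def classify(tma1_outcome):
    """Minimum error rule: pick the final result with the highest posterior."""
    probs = posterior[tma1_outcome]
    return max(probs, key=probs.get)


# Law of total probability:
# P(Success) = P(Success|TMA1) P(TMA1) + P(Success|~TMA1) (1 - P(TMA1))
p_tma1 = (p_success - posterior["~TMA1"]["Success"]) / (
    posterior["TMA1"]["Success"] - posterior["~TMA1"]["Success"]
)

# Expected error of the rule: in each branch the prediction is wrong
# with probability 1 - (largest posterior in that branch).
error_rate = (
    p_tma1 * (1 - max(posterior["TMA1"].values()))
    + (1 - p_tma1) * (1 - max(posterior["~TMA1"].values()))
)

print(classify("TMA1"), classify("~TMA1"), round(p_tma1, 3), round(error_rate, 3))
```

With these figures the implied P(TMA1) is roughly 0.93 and the rule's expected error is roughly 0.14, against 0.193 for always predicting Success.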

[Chart: P(Fail|TMA1-score) and P(Pass/Dist|TMA1-score) on a scale of 0 to 1, for TMA1 score bands 0-39, 40-59, 60-69, 70-79, 80-100; series: Fail, Pass/Dist]

[Diagram repeated as the full network, with nodes Sex, Education, N/C, VLE, TMA1, TMA2, TMA7 and Final result; the prior and posterior probabilities are the same as above]

Demo Case 1

• Demographic data
– Student fits certain demographic profile of gender, educational background etc.

Without VLE (network nodes: Sex, Education, N/C, TMA1):
Probability of failing at TMA1 = 18.5%

With VLE (network nodes: Sex, Education, N/C, VLE, TMA1):

Clicks     Probability   Nr of students
0          64%           4
1-20       44%           3
21-100     26%           5
101-800    6.3%          14

Demo Case 2

• Demographic data
– Different demographic profile to previous slide

Without VLE (network nodes: Sex, Education, N/C, TMA1):
Probability of failing at TMA1 = 7.7%

With VLE (network nodes: Sex, Education, N/C, VLE, TMA1):

Clicks     Probability   Nr of students
0          39%           35
1-20       22%           74
21-100     11.2%         178
101-800    2.4%          461
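One way to read the two demo cases side by side is to express each with-VLE probability relative to the without-VLE figure for the same profile. The percentages below are copied from the slides; the ratio itself is an illustrative summary and does not appear in the presentation.

```python
# Probability (in %) of failing at TMA1, as given in the two demo cases.
demo_cases = {
    "Demo Case 1": {
        "without_vle": 18.5,
        "with_vle": {"0": 64.0, "1-20": 44.0, "21-100": 26.0, "101-800": 6.3},
    },
    "Demo Case 2": {
        "without_vle": 7.7,
        "with_vle": {"0": 39.0, "1-20": 22.0, "21-100": 11.2, "101-800": 2.4},
    },
}

# Ratio of the with-VLE probability to the without-VLE baseline, per click band.
risk_ratios = {
    case: {band: p / data["without_vle"] for band, p in data["with_vle"].items()}
    for case, data in demo_cases.items()
}

for case, ratios in risk_ratios.items():
    for band, ratio in ratios.items():
        print(f"{case}  clicks {band:8s} ratio {ratio:4.2f}")
```

In both profiles, students with no clicks are several times more likely to fail at TMA1 than the demographic baseline suggests, and students in the 101-800 band are well below it.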

TMA1? … it might be too late!

Can we predict TMA1 from VLE activities 1 week before the TMA1 deadline? How about 2, 3, … weeks?
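To ask this question of the data, the VLE evidence has to be restricted to what was known at the chosen point in time. The sketch below assumes a hypothetical per-student record of clicks per week and a deadline expressed as a week number. It sums the clicks up to a cutoff k weeks before the deadline and maps the total onto the click bands listed for the VLE node. The slides do not cover totals above 800, so lumping them into the top band is an assumption.

```python
def clicks_before_cutoff(weekly_clicks, deadline_week, weeks_before):
    """Total clicks in all weeks up to (deadline_week - weeks_before), inclusive."""
    cutoff = deadline_week - weeks_before
    return sum(clicks for week, clicks in weekly_clicks.items() if week <= cutoff)


def vle_band(clicks):
    """Map a click total onto the VLE bands used in the network."""
    if clicks == 0:
        return "No engagement"
    if clicks <= 20:
        return "1-20 clicks"
    if clicks <= 100:
        return "21-100 clicks"
    # Assumption: totals above 800 are treated like the top band.
    return "101-800 clicks"


# Hypothetical student: week number -> clicks in that week.
weekly_clicks = {1: 0, 2: 5, 3: 10, 4: 30, 5: 80}
deadline_week = 5  # hypothetical TMA1 deadline

bands_by_lead_time = {
    k: vle_band(clicks_before_cutoff(weekly_clicks, deadline_week, k))
    for k in (1, 2, 3, 4)
}
print(bands_by_lead_time)
```

The earlier the cutoff, the less activity is visible, so the same student can fall into a lower band; this is the trade-off between early prediction and informative evidence.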

[Timeline diagram: "We are here" separates the history from the future we can affect]

Dashboard and Chart

[Figure annotations: predicted to fail; has not engaged with VLE; average score < 40]

However

[Figure annotations: at least one TMA below 40; has not submitted TMA5; has not engaged with VLE; average score = 81.71 !!!]

Dashboard – new design

Conclusions

• In a distance learning context, VLE data is a valuable source of evidence for prediction

• Prediction improves as a module progresses, but by then it is too late to intervene!

• We need to optimise methods for early prediction