Analysis of social interactions and prediction of assignment grades in a Massive Open Online Course

Pedro Manuel Moreno Marcos

Universidad Carlos III de Madrid

eMadrid Seminar on ‘OERs & Smart Education’

UNED, Madrid, 24th November 2017

INDEX

1. INTRODUCTION

2. RELATED WORK

3. FORUM DASHBOARD

4. JAVA PROGRAMMING MOOC: CASE STUDY

5. ASSIGNMENT PREDICTION: METHODOLOGY

6. ASSIGNMENT PREDICTION: RESULTS

7. CONCLUSIONS AND FUTURE WORK

INTRODUCTION: CONTEXT

Greller, W., & Drachsler, H. (2012). Translating learning into numbers: A generic framework for learning analytics. Journal of Educational Technology & Society, 15(3), 42-57.

Prediction Visualizations

INTRODUCTION: MOTIVATION

• BENEFITS

– Teachers: Improve learning processes. Support students.

– Learners: Self-reflection

• Use of dashboards to display information

• Importance of timing considerations

INTRODUCTION: OBJECTIVES

• Design a web application with different visualizations of forum interactions

• Draw conclusions about learners’ behaviour in a real MOOC

• Analyze how assignment grades can be anticipated and which factors affect the predictive power

RELATED WORK: VISUALIZATIONS

• Objective: present visual results to stakeholders

• Examples: ANALYSE (Open edX) / edX Insights

• Lack of visualizations related to the forum activity

RELATED WORK: PREDICTION IN EDUCATION

• Two types: future prediction / detection

• Course completion

• Students’ behaviors: motivations, problems, etc.

• Scores

– ASSISTments

– Peer-review activities

[Figure: Distribution of predictor variables in MOOCs. Horizontal bar chart; x-axis: number of articles (0 to 25); y-axis: type of variables (Others, Platform use, Forum-related, Exercises-related, Video-related, Demographic).]

RELATED WORK: PREDICTION IN MOOCs

• Systematic review

• Search query: predict(ion) AND MOOC(s)

• 35 analysed papers

[Figure: Distribution of prediction parameters in MOOCs. Horizontal bar chart; x-axis: number of articles (0 to 12); y-axis: prediction parameters (Others, Student engagement/personality, Value/interest of items, Forum posts classification, Scores prediction, Drop-out, Certificate earners).]

FORUM DASHBOARD: FIRST FUNCTIONALITIES

• Basic Statistics

– Number of messages, votes, response times, etc.

• Participation

– Number of learners, top contributors, etc.

• Messages with the most responses/votes

FORUM DASHBOARD: COURSE ABILITIES

• Definition of abilities

– Flat or hierarchical structure

– JavaScript (D3)

• Visualize which abilities appear most often

FORUM DASHBOARD: SENTIMENT ANALYSIS (I)

• Determine if a message is positive, negative or neutral

• Algorithm:

– Based on dictionaries

– Use emoticons

– Consider negations
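The three ingredients listed above (dictionaries, emoticons, negations) can be combined in a few lines. The following is a minimal sketch, not the algorithm used in the dashboard: the word lists, the emoticon set and the rule that a negation only affects the word immediately after it are all assumptions chosen for illustration.

```python
import re

# Tiny illustrative lexicons (assumptions; a real system would load full dictionaries).
POSITIVE = {"good", "great", "thanks", "helpful", "clear"}
NEGATIVE = {"bad", "confusing", "wrong", "error", "difficult"}
EMOTICONS = {":)": 1, ":-)": 1, ":D": 1, ":(": -1, ":-(": -1}
NEGATIONS = {"not", "no", "never"}


def classify(message: str) -> str:
    """Label a message as 'positive', 'negative' or 'neutral'."""
    score = 0
    # Emoticons are matched on whitespace-separated tokens.
    for token in message.split():
        score += EMOTICONS.get(token, 0)
    # Words are lowercase alphabetic tokens; a negation flips the polarity
    # of the word that immediately follows it.
    negate = False
    for word in re.findall(r"[a-z']+", message.lower()):
        if word in NEGATIONS:
            negate = True
            continue
        polarity = (word in POSITIVE) - (word in NEGATIVE)
        if polarity:
            score += -polarity if negate else polarity
        negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With these toy lexicons, a message such as "This is not clear" is labelled negative because the negation flips the positive word, while a message containing no lexicon word and no emoticon stays neutral.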

FORUM DASHBOARD: SENTIMENT ANALYSIS (II)

APPROACH

• Two main categories:

– Supervised (machine learning based)

• 8 types of indicators, including votes, length, responses, etc.

– Unsupervised (lexicon based)

METRICS

• Accuracy

• AUC (Area Under the Curve)

Method           AUC     Accuracy
Dictionaries     71/78   74/78
SentiWordNet     65/75   66/77
Logistic Reg.    68/84   70/81
SVM              70/77   72/72
Decision Trees   64/74   69/74
Random Forest    71/82   72/74
Naïve Bayes      66/85   57/79

Results expressed in %
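For reference, the two metrics can be computed as follows. This is a minimal sketch for the binary case only (label 1 versus label 0); the slides do not state how AUC was obtained for the positive/negative/neutral setting, so nothing here should be read as the evaluation procedure behind the table. The function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def auc(y_true, scores):
    """AUC for binary labels (1 = positive, 0 = negative).

    Computed as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Accuracy depends on a fixed decision threshold, whereas AUC only depends on how the scores rank positive examples relative to negative ones, which is why the two columns can disagree for the same method.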

JAVA PROGRAMMING MOOC: CASE STUDY

• Introduction to Programming with Java – Part I: Starting to Program in Java

• 5 weeks

• Instructor-led

• Typically 14 days for each assignment

• Passing grade: 60%

• Evaluation:

– 5 graded tests (Ti)

– 2 programming assignments (Pi)

JAVA PROGRAMMING MOOC: FORUM USE

• 13,302 messages

• Activity rises around critical dates

JAVA PROGRAMMING MOOC: MESSAGES

MORE RESPONSES

• Cover varied issues:

– Technical questions

– Course-related questions

MORE VOTES

• Provide answers to questions related to course concepts

• Top three messages belong to the first week

JAVA PROGRAMMING MOOC: SENTIMENTS

• 5,292 positives

• 2,934 negatives

• 5,076 neutral

• 64.33% positive (among non-neutral messages)

• Higher positivity at the beginning

• Decrease near the deadlines of programming tasks
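The counts above are consistent with the 13,302 forum messages reported earlier, and the 64.33% figure corresponds to the share of positive messages among those with a polarity (positive or negative). A short check of the arithmetic, using only the numbers on the slide (variable names are illustrative):

```python
positives, negatives, neutral = 5292, 2934, 5076

total = positives + negatives + neutral   # all classified messages
polar = positives + negatives             # messages with a polarity

share_of_polar = 100 * positives / polar  # positive share, neutral excluded
share_of_all = 100 * positives / total    # positive share over all messages

print(f"total messages: {total}")
print(f"positive among polar messages: {share_of_polar:.2f}%")
print(f"positive among all messages: {share_of_all:.2f}%")
```

The total is 13,302 and the positive share among polar messages is 64.33%; over all messages, including neutral ones, the positive share is 39.78%.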

JAVA PROGRAMMING MOOC: ABILITIES

• Analysis based on 42 abilities: method, casting, calculator, array.

• Analysis based on 10 relevant terms: array, loop, certificate, deadline

ASSIGNMENT PREDICTION: DATA COLLECTION

SOURCE OF DATA

• Data provided by edX

• Database data:

– Course structure

– State of course components per learner

– Forum interactions

• Instructor dashboard:

– Grade report

SAMPLE SELECTION

• 95,555 enrolled users

• Two filters:

– Consider only participants in the forum

– Exclude unenrolled users

• Result: 4,358 learners
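The two filters can be sketched as a simple selection over learner records. This is an illustration under stated assumptions, not the actual extraction code: the `Learner` fields are hypothetical, and "participant in the forum" is assumed here to mean at least one forum message, which the slides do not define.

```python
from dataclasses import dataclass


@dataclass
class Learner:
    user_id: int
    forum_posts: int   # hypothetical field: number of forum messages written
    unenrolled: bool   # hypothetical field: True if the user left the course


def select_sample(learners):
    """Keep learners who took part in the forum and are still enrolled."""
    return [l for l in learners if l.forum_posts > 0 and not l.unenrolled]
```

For example, given four learners of whom one never posted and one unenrolled, `select_sample` returns the remaining two.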

ASSIGNMENT PREDICTION: VARIABLES AND TECHNIQUES

TYPES OF VARIABLES

• Forum

• Exercises

• Video

• Previous grades

TECHNIQUES

• Regression (RG)

• Support Vector Machines (SVM)

• Decision Trees (DT)

• Random Forest (RF)

METRIC

• Root Mean Squared Error (RMSE)
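RMSE is the square root of the mean of the squared differences between predicted and actual values, so lower is better and 0 means a perfect prediction. A minimal sketch follows; the function name is illustrative, and the example assumes grades expressed on a 0 to 1 scale, which the slides do not state explicitly.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error between true and predicted grades."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because errors are squared before averaging, a few large mispredictions raise RMSE more than many small ones.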

ASSIGNMENT PREDICTION: PREDICTIVE POWER IN COURSE ASSIGNMENTS

• Model A: Exercises and video variables

• Model B: Model A + previous grades

Results expressed in RMSE

Model    Method  T1    T2    T3    T4    T5    P3    P5    FG
Model A  Best    0.26  0.21  0.20  0.18  0.16  0.25  0.20  0.14
Model A  Worst   0.34  0.28  0.26  0.22  0.18  0.31  0.27  0.16
Model B  Best    0.26  0.20  0.18  0.15  0.13  0.24  0.19  -
Model B  Worst   0.34  0.26  0.23  0.20  0.17  0.32  0.26  -

ASSIGNMENT PREDICTION: EFFECT OF FORUM-RELATED VARIABLES

• Model C: Forum variables

• Model D: Model C + exercises and videos

• Model E: Model D + previous grades

Model    Method  T1    T2    T3    T4    T5    P3    P5    FG
Model C  Best    0.41  0.36  0.33  0.31  0.27  0.34  0.24  0.25
Model C  Worst   0.46  0.40  0.35  0.33  0.30  0.36  0.28  0.28
Model D  Best    0.25  0.21  0.20  0.18  0.16  0.25  0.20  0.14
Model D  Worst   0.34  0.28  0.26  0.23  0.19  0.32  0.28  0.17
Model E  Best    0.25  0.20  0.18  0.15  0.13  0.24  0.19  -
Model E  Worst   0.34  0.26  0.23  0.20  0.17  0.32  0.26  -
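Models A to E differ only in which groups of variables they are given. One way to make that explicit is to describe each model as a set of variable groups and select columns accordingly. This is a sketch for illustration: the `<group>.<name>` column naming convention and the example column names are hypothetical, not taken from the study.

```python
# Variable groups named on the slides.
FORUM, EXERCISES, VIDEO, PREVIOUS_GRADES = (
    "forum", "exercises", "video", "previous_grades"
)

# Composition of each model as described on the slides.
MODELS = {
    "A": {EXERCISES, VIDEO},
    "B": {EXERCISES, VIDEO, PREVIOUS_GRADES},
    "C": {FORUM},
    "D": {FORUM, EXERCISES, VIDEO},
    "E": {FORUM, EXERCISES, VIDEO, PREVIOUS_GRADES},
}


def select_features(row: dict, model: str) -> dict:
    """Keep only the columns whose group belongs to the given model.

    Column names are assumed to look like '<group>.<name>'
    (a hypothetical convention used only in this sketch).
    """
    groups = MODELS[model]
    return {k: v for k, v in row.items() if k.split(".", 1)[0] in groups}
```

Written this way, the nesting is easy to verify: Model D is the union of Models A and C, and Models B and E each add only the previous grades to Models A and D respectively.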

ASSIGNMENT PREDICTION: CLOSED-ENDED VS. OPEN-ENDED QUESTIONS

Assignment      Forum (Model C)   Problems and video (Model A)   Problems, video and grades (Model B)
Test 3          0.33              0.20                           0.18
Peer-review 3   0.34              0.25                           0.24
Test 5          0.27              0.16                           0.13
Peer-review 5   0.25              0.20                           0.19

• No differences in Model C

• Statistically significant difference in Models A and B (p < 0.05)

ASSIGNMENT PREDICTION: EFFECT OF VARIABLES FROM PREVIOUS WEEKS

• Model F: Model A + data from previous weeks

• Assignments → Non-cumulative

• Final Grade → Cumulative

• Factors:

– Independence

– Engagement over time

Grades prediction using data from previous weeks

ASSIGNMENT PREDICTION: STABILISATION OF PREDICTIVE POWER IN A DAY-BY-DAY ANALYSIS

• Threshold lies between days 7 and 9

• Trade-off between anticipation and predictive power

Evolution of the predictive power day-by-day

CONCLUSIONS: FORUM ACTIVITY

• Acceptable functioning

• Deadlines alter learners’ behaviors and thus forum activity

• Low participation

• Higher activity in some concepts: arrays, loops or casting

• Different valid approaches for sentiment analysis

CONCLUSIONS: ASSIGNMENT PREDICTION

1) Early assignments are harder to predict

2) Algorithms are less important than data

3) Previous grades always enhance models

4) Forum-related variables have low predictive power

5) Closed-ended assignments can be predicted better

6) Previous interactions make models worse

7) Data from the nearest previous week have a stronger relationship with current grades

8) Interactions from the current week become relevant after 7 days

LIMITATIONS AND FUTURE WORK: FORUM ACTIVITY

LIMITATIONS

• Limited evaluation of the usability

• Applicability depends on the context

• Lack of labelled data

• Subjectivity of the labelling process

FUTURE WORK

• Incorporate data from new courses

• Automatic detection of abilities

• Improve training set for sentiment analysis

LIMITATIONS AND FUTURE WORK: ASSIGNMENT PREDICTION

LIMITATIONS

• Data restrictions

• Sample selection criteria

• Applicability depending on context

FUTURE WORK

• Use courses with more comprehensive traces

• Comparison with other learners

• Assess applicability

• Differentiate learners who fail

• Put models into practice

• Analyse possible interventions

PUBLICATIONS SENT

• P.M. Moreno-Marcos, C. Alario-Hoyos, P.J. Muñoz-Merino and C. Delgado Kloos. Prediction in MOOCs: A review and future research directions. IEEE Transactions on Learning Technologies.

• P.M. Moreno-Marcos, C. Alario-Hoyos, P.J. Muñoz-Merino, I. Estévez-Ayres and C. Delgado Kloos. Sentiment Analysis in MOOCs: A case study. EDUCON Conference 2018.

• P.M. Moreno-Marcos, P.J. Muñoz-Merino, C. Alario-Hoyos, I. Estévez-Ayres and C. Delgado Kloos. Analysing the predictive power for anticipating assignment grades in a Massive Open Online Course. Behaviour & Information Technology.
