
Performance Analysis for Blended MOOCs on IITBombayX

Submitted in partial fulfillment of the requirements

of the degree of

Master of Technology

by

Rahul Dev Parashar

(Roll No. 13305R006)

Supervisor:

Prof. Deepak B Phatak

Department of Computer Science and Engineering

Indian Institute of Technology Bombay

2015


Declaration of Authorship

I declare that this written submission represents my ideas in my own words and where others’ ideas or words have been included, I have adequately cited and referenced the original sources. I also declare that I have adhered to all principles of academic honesty and integrity and have not misrepresented or fabricated or falsified any idea/data/fact/source in my submission. I understand that any violation of the above will be cause for disciplinary action by the Institute and can also evoke penal action from the sources which have thus not been properly cited or from whom proper permission has not been taken when needed.

Signature: ......................................

Rahul Dev Parashar

13305R006

Date: ...... October 2015


Abstract

Multiple institutes are partnering with IIT Bombay to offer blended MOOCs. Students will study the online course on IITBombayX and will also study the same course in the regular way at their institute. The final grade will be based on the students’ composite performance in the online assessments and in the tests/exams at the institute. In the blended model it is important to understand the learning and performance of each student. This performance analysis is important for providing a better learning experience in the web-based interaction of students. Given the large number of students in each MOOC course, manual analysis is not possible, so an automated system is needed. The objective is to design and implement a system that permits performance analysis of students from the different participating institutions. Using this system, a teacher from such an institute will be able to compare the performance of local students with that of other students, compare the performance of students in local assessments and online assessments, and view event log analytics to compare the learning habits of students.


Contents

Declaration of Authorship

Abstract

List of Figures

1 Introduction

1.1 MOOCs (Massive Open Online Courses)

1.1.1 Advantages

1.1.2 Disadvantages and Challenges

1.2 Blended Learning

1.2.1 Benefits over MOOCs

1.2.2 Challenges

1.3 Problem Statement

2 Literature Survey

2.1 IITBombayX Architecture

2.2 Various Data Modules

2.2.1 Student Information and Progress Data

2.2.2 Discussion Forum Data

2.2.3 Tracking Logs Data

2.3 Events in Tracking Logs (Student Engagement)

2.4 Student Performance


3 Proposed Approach and Prototype

3.1 Problem Statement

3.2 Proposed Method and Prototype

3.2.1 System Architecture

3.2.2 Available Data

3.2.3 Data Cleaning (Prototype)

3.2.4 Data Analytics

3.3 Results

4 Observations and Future Work

4.1 Some Observations

4.2 Stage-2 Work

5 Conclusion


List of Figures

2.1 Sample log for video interaction event

3.1 Architecture of data analytic system

3.2 Various data modules

3.3 Typical analytic report (*Source: edX Insights)

4.1 Invalidated JSON objects


Chapter 1

Introduction

1.1 MOOCs (Massive Open Online Courses)

Massive open online courses (MOOCs) are a model for delivering learning content online to anyone who wants to take a course, with the flexibility to view and access content anytime, anywhere. In addition to traditional course materials such as lectures, reading materials, exams and class discussions, MOOCs also provide discussion forums for interacting with instructors, teaching assistants and other participants. Over the past few years, MOOCs have emerged as a popular mode of distance learning.

MOOCs have some signature characteristics, including lectures formatted as short videos combined with formative quizzes; automated assessment and/or peer and self-assessment; and an online forum for peer support and discussion.

1.1.1 Advantages

MOOCs are delivered by top-tier institutions, not just to a few hundred students in a campus lecture hall, but free via the Internet to thousands or even millions of learners around the world. Typically, students watch short video lectures and complete assignments that are graded either by machines or by other students. In this way a lone professor can support a large class. Since there is no hard time limit on accessing content, students can learn anytime, anywhere.


1.1.2 Disadvantages and Challenges

One of the biggest challenges is how to teach thousands of students effectively at the same time when each student’s learning style and capabilities are different. Because the class is large, the teacher may not be aware of the students’ learning styles, and the effectiveness of learning can then be low.

1.2 Blended Learning

Blended learning is an education program in which a student learns a course partly through a MOOC and partly by studying the same course in the regular way at their institute.

1.2.1 Benefits over MOOCs

Blended learning is more effective than purely face-to-face or purely online classes in the sense that it provides a collaborative learning experience. Purely classroom-based learning may be limited by the teacher’s ability to deliver the material, while purely MOOC-based learning depends heavily on the participant’s self-learning and motivation. In blended learning, students with special talents or interests outside the available curriculum can use educational technology to advance their skills, and students who have difficulty with the material can seek help from either the class teacher or the discussion forums. This collaborative model therefore overcomes the limitations of purely classroom-based or purely MOOC-based learning.

1.2.2 Challenges

For the best use of the blended model, MOOC and classroom learning must be in sync. Matching the course content of a MOOC can be challenging, because faculty from various institutions might follow different syllabi, either as per their college curriculum or their own interest. Also, because lectures are recorded, students can fall behind on the material; they may even watch several weeks’ worth of videos in one sitting.


1.3 Problem Statement

Since it is not possible to keep track of every student and their learning behavior manually, an automated system is needed that can create a timeline for each student according to their engagement with the course. This analysis can help in the self-regulated learning environment of MOOCs.

Further, a blended MOOC requires understanding students’ performance in both the classroom and the MOOC environment, and how the two can be brought to best use. Performance analysis of students can help in achieving better outcomes. In this report, performance analysis is done on students from partnering institutes offering blended MOOCs with IITBombayX.

Objective: The purpose is to design and implement a system that permits performance analysis of students from the different participating institutions. Using this system, a teacher from such an institute will be able to:

• Compare performance of local students with that of other students.

• Compare performance of students in local assessments and online assessments.

• View the event logs analytics to compare learning habits of students.

Certain parameters will be used for the performance analysis. Some typical characteristics/questions are the following:

• How many students solve questions before going through the study material? This can help in understanding a student’s knowledge of a particular module.

• What are the grades of students in a particular class in comparison with other students?

• Is there any relation between performance on the MOOC and in classroom learning (provided the teacher has submitted classroom grades on IITBombayX)? If there is a large difference in a student’s performance in one of the modes, the teacher can look for the reasons behind it and help the student.

• Is there any other feedback that might help for better learning?


Chapter 2

Literature Survey

2.1 IITBombayX Architecture

Open edX is a web-based platform for creating, delivering and analyzing online courses. IITBombayX uses the Open edX architecture. In addition, IITBombayX provides support for blended learning: a separate authentication process is used in the blended model as a wrapper around Open edX.

Open edX Components:

• CMS (Content Management System): This component provides the authoring tools for course content. It is a Django application that uses MongoDB (NoSQL) for content management.

• LMS (Learning Management System): The part of Open edX that students interact with. It displays content and runs quizzes and interactive applications. Its subcomponents include the Wiki and the Discussion Forum.

• Event Tracking: Tracks events for any interaction with the system. Events are captured and stored with nested data structures in order to take full advantage of schemaless data storage systems. These event logs are stored as JSON objects.

• edX Insights and Analytics: Insights is a development version of a Python, Mongo and Django framework for creating simple, pluggable analytics based on streaming events. It does not include analysis of every event in the logs.


2.2 Various Data Modules

IITBombayX data is stored in various data modules. For ease of storage and interaction with the data, different designs are used. The following data modules each store related data.

2.2.1 Student Information and Progress Data

General information about students and their progress is stored in a MySQL database. This can be termed summary information about students. edX Insights makes use of this data to give simple analytics. Information about assignments, quizzes and exams is stored here.

2.2.2 Discussion Forum Data

edX discussion data is stored as collections of JSON documents in a MongoDB database. It gives information about students’ interactions with other students. Comment threads are used to analyze this data.

2.2.3 Tracking Logs Data

Whenever a student interacts with the course, every action is stored in the logs and classified by event type. For example, whenever a student clicks on a video to play or pause it, the event is stored in the logs with enough information to analyze it. Events are emitted by the server, the browser or the mobile device to capture information about interactions with the courseware and the Instructor Dashboard in the LMS, and are stored as JSON documents.
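Since each tracking log record is a JSON document, reading one is mostly a matter of parsing it and pulling out the common fields. The sketch below is in Python rather than the Java used for the prototype. The field names (`username`, `event_type`, `event_source`, `time`, `page`, `context.course_id`) follow the Open edX tracking-log format as we understand it, and the sample record, including its course id and URL, is illustrative rather than taken from IITBombayX data. It also assumes, as is common for browser-emitted events, that the `event` field may arrive as a JSON-encoded string rather than a nested object.

```python
import json
from typing import Any


def parse_log_line(line: str) -> dict[str, Any]:
    """Parse one tracking-log line and return its common fields.

    Raises json.JSONDecodeError if the line itself is not valid JSON.
    """
    record = json.loads(line)
    event = record.get("event", {})
    # Browser-emitted events often carry their payload as a JSON string.
    if isinstance(event, str):
        try:
            event = json.loads(event)
        except json.JSONDecodeError:
            event = {"raw": event}  # keep the unparsed payload
    context = record.get("context") or {}
    return {
        "username": record.get("username"),
        "event_type": record.get("event_type"),
        "event_source": record.get("event_source"),
        "time": record.get("time"),
        "course_id": context.get("course_id"),
        "page": record.get("page"),
        "event": event,
    }


# Illustrative record with hypothetical values.
sample = json.dumps({
    "username": "student1",
    "event_type": "play_video",
    "event_source": "browser",
    "time": "2015-07-01T10:15:30.123456+00:00",
    "page": "https://example.org/courses/X/courseware/week1/",
    "context": {"course_id": "course-v1:IITBombayX+CS101+2015_T1", "user_id": 42},
    "event": json.dumps({"id": "video-1", "currentTime": 12.5, "code": "html5"}),
})

parsed = parse_log_line(sample)
print(parsed["event_type"], parsed["course_id"], parsed["event"]["currentTime"])
```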

2.3 Events in Tracking Logs (Student Engagement)

Tracking logs can be classified by the event type for which they are generated. Events comprise some common fields, fields related to student activity, and fields related to course team activity. Here is a sample log for a video event:


Figure 2.1: Sample log for video interaction event

These logs can be analyzed by checking the events from which they were emitted. Each log has some common fields and some fields related to the particular event. Some of the fields in the logs are described below.

• Common Fields: Fields that are common to the schema definitions of all logs.

– Context: Contains the course ID, org ID, path (the URL that generated the event) and user ID fields.

– Event: Contains the details specific to the event for which this log was created. The various events are listed later.

– Event Source: Identifies whether the event was emitted by the server, the browser or a mobile device.

– Event Type: Identifies the type of event that was emitted, and hence whether it was caused by a student or by a course team member.

– Page: The URL of the page the user was visiting when the event was emitted.

– Time: The UTC time at which the event was emitted.

– UserName: The username of the user who caused the event to be emitted.

• Student Events

– Enrollment Events: Activities such as activation and deactivation of a course enrollment.

– Navigational Events: Events such as page close, go to position and jump to discussion.


– Video Interaction Events: The hide transcript, load video, pause video, play video, seek video, show transcript, speed change video and stop video events.

– Textbook Interaction Events: Events for interaction with PDFs and other text material provided.

– Problem Interaction Events: Interactions with problems in quizzes and exams. Some typical events are problem check, problem graded, problem save, problem show, save problem success and show answer.

– Discussion Forum Events: Events generated when a comment is created, a response is given or a new thread is created in the discussion forums.

• Course Team Events: Events emitted when a teacher or administrator interacts with the system.
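Grouping raw event names into the categories above can be done with a lookup table plus a few prefix rules. The event names below follow those documented for Open edX tracking logs as far as we know, but the mapping, the prefixes and the `classify_event` helper are illustrative assumptions that should be checked against the platform version in use. Unknown names are labelled "undocumented" rather than dropped, so that they can be studied later.

```python
# Hypothetical mapping from Open edX event names to the categories above.
VIDEO = {"hide_transcript", "load_video", "pause_video", "play_video",
         "seek_video", "show_transcript", "speed_change_video", "stop_video"}
PROBLEM = {"problem_check", "problem_graded", "problem_save", "problem_show",
           "save_problem_success", "showanswer"}
NAVIGATION = {"page_close", "seq_goto", "seq_next", "seq_prev"}

# Prefix rules for namespaced event names (assumed prefixes).
PREFIXES = [
    ("edx.course.enrollment.", "enrollment"),
    ("edx.forum.", "forum"),
    ("textbook.", "textbook"),
]


def classify_event(event_type: str) -> str:
    """Return the category of an event name, or 'undocumented' if it is unknown."""
    if event_type in VIDEO:
        return "video"
    if event_type in PROBLEM:
        return "problem"
    if event_type in NAVIGATION:
        return "navigation"
    for prefix, category in PREFIXES:
        if event_type.startswith(prefix):
            return category
    return "undocumented"


for name in ["play_video", "problem_check", "edx.forum.thread.created", "foo_bar"]:
    print(name, "->", classify_event(name))
```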

2.4 Student Performance

Performance of students can be measured from their responses to quizzes, exams, etc. There are two kinds of content from which performance can be analyzed.

• Graded Content

Graded content contributes toward a student’s final score. A student’s overall score is calculated by applying the given weightage to each quiz, assignment, exam, etc.

• Ungraded Content

Ungraded content does not contribute toward a student’s final score, but it can be used to understand the student’s learning ability and improvement in learning for that course.
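The weighted overall score for graded content can be written as a short function. The assignment types, weights and scores below are hypothetical. The sketch assumes that each graded item has an earned and a maximum score, that items of the same type are averaged, and that the per-type weights sum to 1. Ungraded items are excluded from the total.

```python
from collections import defaultdict

# Hypothetical grading policy: weight of each assignment type (sums to 1.0).
WEIGHTS = {"quiz": 0.3, "assignment": 0.3, "exam": 0.4}


def overall_score(items: list[dict]) -> float:
    """Weighted overall score in [0, 1] computed from graded items only.

    Each item: {"type": str, "earned": float, "possible": float, "graded": bool}.
    A type with no graded items contributes 0.
    """
    per_type = defaultdict(list)
    for item in items:
        if item["graded"] and item["possible"] > 0:
            per_type[item["type"]].append(item["earned"] / item["possible"])
    total = 0.0
    for kind, weight in WEIGHTS.items():
        scores = per_type.get(kind)
        if scores:
            total += weight * sum(scores) / len(scores)
    return total


items = [
    {"type": "quiz", "earned": 8, "possible": 10, "graded": True},
    {"type": "quiz", "earned": 6, "possible": 10, "graded": True},
    {"type": "assignment", "earned": 45, "possible": 50, "graded": True},
    {"type": "exam", "earned": 70, "possible": 100, "graded": True},
    {"type": "quiz", "earned": 0, "possible": 10, "graded": False},  # ungraded practice
]
print(round(overall_score(items), 3))  # prints 0.76
```

Here the quiz average is 0.7, the assignment 0.9 and the exam 0.7, giving 0.3 × 0.7 + 0.3 × 0.9 + 0.4 × 0.7 = 0.76; the ungraded practice quiz has no effect.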


Chapter 3

Proposed Approach and Prototype

3.1 Problem Statement

To understand the learning behavior of students, their performance in the course must be analyzed properly. Because MOOCs are offered to very large numbers of students, a teacher cannot keep a manual check on this. It is also noted that each student’s learning style is different, so students must be taught in different ways. Some students might grasp concepts very quickly and want to improve their learning by solving challenging problems. On the other hand, some students find it difficult to learn even the provided material, so other resources must be suggested to them.

In the blended model, a teacher who provides classroom learning might want to know how online learning is helping their students and how the students are performing in the online course. This analysis can give the teacher insight into the learning behavior of their students.

To cater to this need, performance analysis of students must be done and provided to teachers for both the online course and the classroom course. IITBombayX uses Open edX Insights to monitor activity in a course. Insights provides basic information about course progress, students’ responses to quizzes and assignments, and other basic details about students. However, edX Insights does not capture all the events that are generated in the log files. The components that edX Insights shows are:


• Course Enrollment

This category gives information about student enrollment, demographics, geography, etc.

• Student Engagement

This category gives information about which quizzes a student has answered, which choices were selected, their assignment submission status and their interaction with videos. Only general video information is used, such as how long students watched and which sections were watched again.

• Student Performance

Students’ answers are recorded for graded and ungraded quizzes and assignments. Based on these, various reports are generated, such as how many students answered a question correctly.

However, edX Insights does not provide a complete analysis of students’ performance or a timeline of their interaction with the system. Such a timeline is essential for showing students’ learning styles, for example which students watch videos first and then attempt quizzes, and which do it the other way around. An automated system is therefore needed that can create a timeline for each student according to their engagement with the course, and also analyze their performance.
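Whether a student watches videos first or attempts quizzes first can be estimated by comparing, per student and per module, the time of the first video event with the time of the first problem event. The Python sketch below assumes that events have already been reduced to (username, module, category, ISO timestamp) tuples, for example by the log cleaning step; the module key and the category labels are hypothetical.

```python
from datetime import datetime


def first_activity_order(events):
    """Map (username, module) -> 'video_first', 'problem_first' or 'only_one'.

    events: iterable of (username, module, category, iso_timestamp), where
    category is 'video' or 'problem'; other categories are ignored.
    """
    first = {}  # (user, module, category) -> earliest datetime seen
    for user, module, category, ts in events:
        if category not in ("video", "problem"):
            continue
        t = datetime.fromisoformat(ts)
        key = (user, module, category)
        if key not in first or t < first[key]:
            first[key] = t
    result = {}
    for user, module in {(u, m) for u, m, _ in first}:
        v = first.get((user, module, "video"))
        p = first.get((user, module, "problem"))
        if v is None or p is None:
            result[(user, module)] = "only_one"  # only one kind of activity seen
        else:
            result[(user, module)] = "video_first" if v <= p else "problem_first"
    return result


events = [
    ("alice", "week1", "video", "2015-07-01T10:00:00"),
    ("alice", "week1", "problem", "2015-07-01T10:30:00"),
    ("bob", "week1", "problem", "2015-07-01T09:00:00"),
    ("bob", "week1", "video", "2015-07-01T09:20:00"),
    ("carol", "week1", "problem", "2015-07-02T08:00:00"),
]
order = first_activity_order(events)
print(sorted(order.items()))
```

Counting the "problem_first" entries per module then gives the number of students who solve questions before going through the study material.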

As IITBombayX provides blended courses, proper wrappers must be provided over this system to give these reports to the classroom faculty. In addition, to better analyze the performance of their students, comparisons can be made among students of both blended and non-blended courses.
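A first-cut comparison between groups of students, such as blended versus non-blended or one institute versus the rest, can start from a per-group count, mean and median of final scores. The group labels and scores below are hypothetical. A real comparison would also need to account for group sizes and course differences before drawing conclusions.

```python
import statistics
from collections import defaultdict


def summarize_by_group(records):
    """records: iterable of (group_label, score). Returns {group: (n, mean, median)}."""
    groups = defaultdict(list)
    for group, score in records:
        groups[group].append(score)
    return {
        g: (len(s), statistics.mean(s), statistics.median(s))
        for g, s in sorted(groups.items())
    }


records = [("blended", 72), ("blended", 81), ("blended", 64),
           ("non-blended", 58), ("non-blended", 70), ("non-blended", 49),
           ("non-blended", 66)]
summary = summarize_by_group(records)
for group, (n, mean, median) in summary.items():
    print(f"{group:12s} n={n} mean={mean:.1f} median={median}")
```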

In Stage-1, the tracking logs are cleaned and preprocessed. This data is stored in MySQL tables for prototyping. As explained earlier, the logs must be preprocessed before a proper analysis is possible. Once the data is in place, performance analysis based on particular characteristics can begin. This work will be done in Stage-2.


3.2 Proposed Method and Prototype

3.2.1 System Architecture

The architecture diagram of the model used for data analytics is shown in Figure 3.1.

Figure 3.1: Architecture of data analytic system

The diagram explains how the analysis will be done. The steps are explained in detail below.

• Data: Initially, the data is spread over the various modules explained earlier. Of these, only the tracking logs are unstructured, so preprocessing and cleaning are done only for the logs.

• ETL: The tracking logs are cleaned and preprocessed by event type using a Java program.


• Storage: The data is then moved to MySQL tables, from where it can be used for analytics.

• Data Analytics for non-blended MOOCs: The available data can now be used to analyze student performance.

• Data Analytics for blended MOOCs: The analysis results for non-blended MOOCs can be wrapped using the authentication of the blended model. After filtering, reports can be shown to faculty, students, etc. on a web-based platform.
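At the data level, wrapping the results with the blended model’s authentication amounts to filtering each report down to the students a logged-in faculty member is allowed to see. The Python sketch below assumes hypothetical mappings from students to institutes and from faculty accounts to institutes; the actual IITBombayX blended-model authentication is not described in this report and is not modelled here.

```python
# Hypothetical lookup tables; in practice these would come from the
# blended-model authentication layer, which is not modelled here.
STUDENT_INSTITUTE = {"s1": "INST_A", "s2": "INST_B", "s3": "INST_A"}
FACULTY_INSTITUTE = {"prof_x": "INST_A", "prof_y": "INST_B"}


def report_for_faculty(faculty_id, rows):
    """Keep only report rows whose student belongs to the faculty's institute.

    rows: list of dicts with at least a 'student' key.
    Raises PermissionError for an unknown faculty account.
    """
    institute = FACULTY_INSTITUTE.get(faculty_id)
    if institute is None:
        raise PermissionError(f"unknown faculty account: {faculty_id}")
    return [r for r in rows if STUDENT_INSTITUTE.get(r["student"]) == institute]


rows = [{"student": "s1", "score": 78},
        {"student": "s2", "score": 55},
        {"student": "s3", "score": 90}]
print(report_for_faculty("prof_x", rows))
```

Filtering at the point where the report is produced keeps the analytics code itself unchanged between the blended and non-blended cases.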

3.2.2 Available Data

For a detailed analysis of student performance, the tracking logs provide a lot of useful information. However, the tracking log format is semi-structured. To understand the patterns and classify these logs by event, they must be preprocessed and cleaned. Once they are classified by event, the data is easier to analyze. The steps followed in the method are:

1. Identify the various data modules that can be used in performance analysis.

2. Clean the data.

3. Analyze the data and generate reports on students’ performance.

As detailed earlier, IITBombayX data is held in various data modules. For performance analysis, the following data modules are considered. They are stored in different ways; the use and storage of each module are explained below.

1. Student Info and Progress Data

This section explains how stateful data for students is stored internally. It contains general information about students, such as their name, username, email ID and geographical details. It also stores students’ progress in the course. Data for students is presented in these categories:

• User Data: Basic information about the user.


Figure 3.2: Various data modules

• Courseware Progress Data: Information about what material a student has covered and what their responses to various modules were.

• Certificate Data: Information such as the final grade, certificate status, etc.

This data is stored in MySQL tables. edX Insights uses this data to produce various reports, such as how many students are registered in a course. This data is smaller than the tracking log data and can be used to generate reports in less time. It will be used for student performance analysis as required. Some reports produced from this data by edX Insights are shown later.

2. Course Content Data

Course content data can be used to get information about course modules, such as how many quizzes/exams there are in the whole course or in each module, and other related information. It also shows how many videos, quizzes and assignments have been released up to a given date. This data is stored in JSON files.
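Counting how many videos, problems, etc. had been released by a given date is a small aggregation over the course structure. The record layout below, a list of blocks each with a `category` and an ISO `start` date, is a hypothetical simplification of the course content JSON, not its actual schema.

```python
from collections import Counter
from datetime import date


def released_counts(blocks, as_of):
    """Count blocks per category whose start date is on or before as_of."""
    return Counter(
        b["category"] for b in blocks
        if date.fromisoformat(b["start"]) <= as_of
    )


# Hypothetical, simplified course structure.
blocks = [
    {"category": "video", "start": "2015-07-01"},
    {"category": "video", "start": "2015-07-08"},
    {"category": "problem", "start": "2015-07-01"},
    {"category": "problem", "start": "2015-07-15"},
]
counts = released_counts(blocks, date(2015, 7, 10))
print(counts)
```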

3. Discussion Forums Data

IITBombayX discussion forum data is stored as collections of JSON documents in a MongoDB database. The primary collection that holds all of the discussion posts written by users is “contents”. Comments and comment threads are created for discussions. In addition to these collections, events are also emitted to track specific user activities, and these are stored in the tracking logs described next.

Wiki data is stored in SQL files: one file gives information about the articles added to the wiki and the other about modifications made to those articles. However, this data will not be used for performance analysis.
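Documents in the “contents” collection can be tallied per author to give a rough measure of forum participation. The Python sketch below works on plain dictionaries so that it runs without a database. It assumes, as in the Open edX forum schema as we understand it, that each document has a `_type` of `CommentThread` or `Comment` and an `author_username`; these field names should be verified against the actual dump. With `pymongo`, the same dictionaries would come from iterating over `collection.find()`.

```python
from collections import Counter, defaultdict


def forum_activity(documents):
    """Return {username: Counter({'CommentThread': n, 'Comment': m})}."""
    activity = defaultdict(Counter)
    for doc in documents:
        kind = doc.get("_type")
        user = doc.get("author_username")
        if kind in ("CommentThread", "Comment") and user:
            activity[user][kind] += 1
    return dict(activity)


# Hypothetical documents from the "contents" collection.
docs = [
    {"_type": "CommentThread", "author_username": "alice", "title": "Doubt in week 1"},
    {"_type": "Comment", "author_username": "bob"},
    {"_type": "Comment", "author_username": "alice"},
    {"_type": "Comment", "author_username": "bob"},
]
activity = forum_activity(docs)
for user, counts in sorted(activity.items()):
    print(user, counts["CommentThread"], "threads,", counts["Comment"], "comments")
```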

4. Tracking Logs

Tracking logs store every activity or interaction with the system. These logs can be classified by event; the events have already been discussed in Section 2.3. Various approaches are used to understand and use this data for analysis. Open edX provides edX Insights to do basic analysis and gather the resulting reports. For this analysis, the logs are processed using HDFS, the required information is stored in MySQL, and the generated MySQL data is then used for analysis.

However, Insights captures only the information it requires; currently, mainly video-based events are captured. Other events, such as problem interaction and course interaction events, must also be captured for a proper performance analysis of students. For this purpose, the log data is cleaned and preprocessed. The next section explains how this data can be classified by event for maximum benefit.

3.2.3 Data Cleaning (Prototype)

A Java program has been written to split these logs by event type. Separate modules are created for the various events. The logs are processed one by one and their JSON objects are parsed. After the event type is identified, the corresponding object is used to store that information. Once processing of a log is done, it is stored in the database. For the prototype, MySQL is used to store this information, and each event is given a unique ID. Along with storing this data in the respective event tables, a summary of each log is written as a user session. The objects created and the MySQL schema are detailed below.


• Objects Created

Various objects are created to capture the events. Once an event is identified, its data is stored in these objects, which are then stored in SQL tables. The objects used are: CourseProblems, CourseQuizzes, CourseVideos, EventCourseInteract, EventEnrollment, EventForumInteract, EventProbInteract, EventVideoInteract, StudentCourseEnrolment, StudentCourseGrade, UserSession.

These objects have various fields to store the required information about the particular event. After the data is stored in these objects, they are passed to the respective MySQL tables.

• Tables Created

Once the data is stored in objects based on events, it is transferred to MySQL tables. Some tables and their columns are shown in Table 3.1.

A summary of the results captured from the prototype is shown in Section 3.3.
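The cleaning and loading step can be sketched as a loop that parses each log line, records the lines that fail to parse, and inserts recognized events into a per-event table. This is a Python sketch, not the Java prototype. It uses the standard-library `sqlite3` module with an in-memory database as a stand-in for MySQL, and the `video_events` table is a hypothetical, heavily reduced version of `EventVideoInteract` that keeps only a few columns. The event names and the course id in the sample lines are illustrative.

```python
import json
import sqlite3

# Hypothetical subset of video event names routed to the video table.
VIDEO_EVENTS = {"load_video", "play_video", "pause_video", "seek_video",
                "speed_change_video", "stop_video"}

# Reduced, hypothetical version of EventVideoInteract; eventId is a unique id per event.
SCHEMA = """CREATE TABLE IF NOT EXISTS video_events (
    eventId INTEGER PRIMARY KEY AUTOINCREMENT,
    lmsUserId TEXT, courseId TEXT, eventName TEXT,
    videoId TEXT, createDateTime TEXT)"""


def load_logs(lines, conn):
    """Parse log lines, store video events, return (stored_count, bad_line_numbers)."""
    conn.execute(SCHEMA)
    stored, bad = 0, []
    for lineno, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
            event = record.get("event") or {}
            if isinstance(event, str):  # browser events carry a JSON string
                event = json.loads(event)
            if not isinstance(event, dict):
                event = {}
            context = record.get("context") or {}
            if not isinstance(context, dict):
                context = {}
        except (json.JSONDecodeError, AttributeError):
            bad.append(lineno)  # keep malformed lines for later inspection
            continue
        if record.get("event_type") not in VIDEO_EVENTS:
            continue  # other event types would be routed to their own tables
        conn.execute(
            "INSERT INTO video_events "
            "(lmsUserId, courseId, eventName, videoId, createDateTime) "
            "VALUES (?, ?, ?, ?, ?)",
            (context.get("user_id"), context.get("course_id"),
             record["event_type"], event.get("id"), record.get("time")),
        )
        stored += 1
    conn.commit()
    return stored, bad


lines = [
    json.dumps({"event_type": "play_video", "time": "2015-07-01T10:00:00",
                "context": {"user_id": 42, "course_id": "X/CS101/2015"},
                "event": json.dumps({"id": "video-1", "currentTime": 3.0})}),
    '{"event_type": "play_video", "event": ',  # truncated, malformed line
    json.dumps({"event_type": "problem_check", "event": {}}),
]
conn = sqlite3.connect(":memory:")
stored, bad = load_logs(lines, conn)
rows = conn.execute("SELECT lmsUserId, eventName, videoId FROM video_events").fetchall()
print(stored, bad, rows)
```

Recording the numbers of malformed lines, rather than silently skipping them, is what makes it possible to trace invalid JSON back to the modules that emitted it.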

3.2.4 Data Analytics

The data stored in the various modules can be used to analyze a student’s performance. In the actual implementation, the data generated from the logs will be moved to HDFS. This will make it possible to build the timeline of each student, which can then be used to infer the student’s learning style.
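Building a student’s timeline is essentially a matter of grouping parsed events by user and sorting each group by timestamp. The Python sketch below does this in memory on (username, ISO timestamp, event type) tuples. At the scale of the full tracking logs this grouping would instead run in the HDFS-based setting planned above, so treat the sketch only as an illustration of the logic.

```python
from collections import defaultdict
from datetime import datetime


def build_timelines(events):
    """Group (username, iso_time, event_type) tuples into per-user, time-ordered lists."""
    timelines = defaultdict(list)
    for user, ts, event_type in events:
        timelines[user].append((datetime.fromisoformat(ts), event_type))
    for entries in timelines.values():
        entries.sort(key=lambda e: e[0])  # chronological order
    return dict(timelines)


events = [
    ("alice", "2015-07-01T10:30:00", "problem_check"),
    ("alice", "2015-07-01T10:00:00", "play_video"),
    ("bob", "2015-07-02T09:00:00", "seq_goto"),
    ("alice", "2015-07-01T10:10:00", "pause_video"),
]
timelines = build_timelines(events)
for user, entries in sorted(timelines.items()):
    print(user, [name for _, name in entries])
```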

In this way, an analytics system can be built for the non-blended model. To use it for the blended model, the results must be wrapped using the authentication mechanism provided for the blended model, so that each faculty member sees results only for their own students. When the performances of students from different institutes are compared, this comparison can also be shown to them. A typical example is the weekly student engagement chart shown in Figure 3.3, which displays the number of students who engaged in different activities over time in a particular course.
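A weekly engagement chart of this kind reduces to counting distinct students per activity category per week. The Python sketch below assumes that events are already (username, ISO timestamp, category) tuples and groups them by ISO calendar week; the week boundaries that edX Insights actually uses may differ.

```python
from collections import defaultdict
from datetime import datetime


def weekly_engagement(events):
    """Return {(iso_year, iso_week, category): number of distinct students}."""
    students = defaultdict(set)
    for user, ts, category in events:
        year, week, _ = datetime.fromisoformat(ts).isocalendar()
        students[(year, week, category)].add(user)
    return {key: len(users) for key, users in sorted(students.items())}


events = [
    ("alice", "2015-07-01T10:00:00", "video"),
    ("alice", "2015-07-02T11:00:00", "video"),
    ("bob", "2015-07-03T09:00:00", "video"),
    ("bob", "2015-07-03T09:30:00", "problem"),
    ("alice", "2015-07-08T10:00:00", "problem"),
]
weekly = weekly_engagement(events)
for (year, week, category), n in weekly.items():
    print(year, week, category, n)
```

Using sets of usernames means a student who watches several videos in one week is still counted once for that week.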


Table 3.1: Tables created in MySQL for various events

Table name | Columns

Course | courseId, lmsName, orgName, courseName, courseTitle, authorUserId, currConcepts, prevConcepts, courseLang, minPrice, suggestedPrice, countryCode, endDate, startDate

CourseForums | forumId, lmsName, orgName, courseName, courseRun, commentSysId, commentType, anonymousMode, lmsAuthorId, lmsAuthorName, createDateTime, lastModDateTime, upVoteCount, totVoteCount, commentCount, threadType, title, commentableSysId, endorsed, closed, visible

CourseProblems | problemId, lmsName, orgName, courseName, chapterSysName, sessionSysName, quizSysName, quizTitle, quizType, quizWeight, noOfAttemptsAllowed, quizMaxMarks, hintAvailable, correctChoice

CourseVideos | videoId, lmsName, orgName, courseName, chapterSysName, videoSysName, videoUTubeId, videoDownload, videoTrackDownLoad, videoTitle, videoUTubeId075, videoUTubeId125, videoUTubeId15, videolength

CourseWiki | wikiId, lmsName, orgName, courseName, wikiSlug, lmsWikiId, createdDate, lastModDate, lastRevId, ownerId, groupId, groupRead, groupWrite, otherRead, otherWrite

EventCourseInteract | eventId, lmsName, orgName, courseName, courseRun, lmsUserId, eventName, eventNo, moduleType, moduleSysName, moduleTitle, chapterSysName, chapterTitle, createDateTime, modDateTime, oldPosition, curPosition, source

EventForumInterect | eventId, lmsName, orgName, courseName, eventName, commentThreadId, lmsUserId, queryText, noOfResults

EventProbInteract | eventId, lmsName, orgName, courseName, lmsUserId, eventName, eventNo, quizzSysName, quizzTitle, chapterSysName, chapterTitle, hintAvailable, hintMode, inputType, responseType, variantId, oldScore, newScore, maxGrade, attempts, maxAttempts, choice, success, source, probSubTime, done, createDateTime, lastModDateTime, courseRun

EventVideoInteract | eventId, sessionSysName, lmsName, orgName, courseName, courseRun, lmsUserId, eventName, eventNo, videoSysName, videoTitle, chapterSysName, chapterTitle, oldSeekTime, currSeekTime, videoNavigType, oldSpeed, currSpeed, source, createDateTime, lastModDateTime
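As an illustration of how a raw tracking-log record could be loaded into one of these tables, the sketch below maps a video event to a subset of the EventVideoInteract columns. It is a minimal sketch, not the mapping used in this work: the input field names (event_type, event_source, time, context.course_id, context.user_id, and the id, old_time, new_time, type, old_speed and new_speed keys of the event payload) are assumptions based on the public edX tracking-log format, the "course-v1:ORG+COURSE+RUN" course-id layout is assumed, and the function name to_video_row and its lms_name default are hypothetical.

```python
import json
from typing import Optional

# Assumed set of edX video event names routed to EventVideoInteract;
# the exact set used in this work is not listed.
VIDEO_EVENTS = {"play_video", "pause_video", "seek_video", "speed_change_video"}


def to_video_row(raw_line: str, lms_name: str = "IITBombayX") -> Optional[dict]:
    """Map one raw tracking-log line to a subset of EventVideoInteract columns.

    Returns None for lines that are not video events. Input field names are
    assumptions based on the public edX tracking-log format.
    """
    log = json.loads(raw_line)
    if log.get("event_type") not in VIDEO_EVENTS:
        return None
    context = log.get("context", {})
    event = log.get("event", {})
    # Browser-emitted events carry their payload as a JSON-encoded string.
    if isinstance(event, str):
        event = json.loads(event) if event else {}
    # Assumes "course-v1:ORG+COURSE+RUN" course ids; missing parts become "".
    parts = context.get("course_id", "").split(":", 1)[-1].split("+")
    org, course, run = (parts + ["", "", ""])[:3]
    return {
        "lmsName": lms_name,
        "orgName": org,
        "courseName": course,
        "courseRun": run,
        "lmsUserId": context.get("user_id"),
        "eventName": log["event_type"],
        "videoSysName": event.get("id"),
        "oldSeekTime": event.get("old_time"),
        "currSeekTime": event.get("new_time"),
        "videoNavigType": event.get("type"),
        "oldSpeed": event.get("old_speed"),
        "currSpeed": event.get("new_speed"),
        "source": log.get("event_source"),
        "createDateTime": log.get("time"),
    }
```

Keeping one small mapping function per target table means a change in the log format only touches the function for the affected table.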




Figure 3.3: Typical analytic report (*Source: edX Insights)

3.3 Results

A large volume of tracking-log data has been processed. The tracking logs generated for IITBombayX were used for the analysis, covering records from June to September. A summary is given below.

• Summary of processed logs: Around 20 million IITBombayX tracking logs were processed. JSON objects that were not properly formatted were recorded. Some events that are not documented were also found. For now, these logs are classified based on their behavior and properties; they are also recorded for further analysis and will be reported.

• Data in MySQL: A backup of the MySQL database was taken as an SQL dump and given to various IITBombayX teams for experimentation.
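The summary of processed logs (the total count, the malformed JSON objects that were recorded, and the undocumented events) can be produced in a single streaming pass over the logs. The sketch below is a minimal illustration under stated assumptions: summarize_logs and DOCUMENTED_EVENTS are hypothetical names, the event name is assumed to sit in an event_type field, and the documented-event set is a tiny placeholder for the full list in the edX documentation.

```python
import json
from collections import Counter
from typing import Iterable

# Hypothetical placeholder for the event names listed in the edX
# documentation; the real list is much longer.
DOCUMENTED_EVENTS = {"play_video", "pause_video", "seek_video", "problem_check"}


def summarize_logs(lines: Iterable[str]) -> Counter:
    """Count total, malformed and undocumented log lines in one pass."""
    summary = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines entirely
        summary["total"] += 1
        try:
            log = json.loads(line)
        except json.JSONDecodeError:
            summary["malformed"] += 1  # not readable by the JSON parser
            continue
        if not isinstance(log, dict):
            summary["malformed"] += 1  # valid JSON, but not a log object
            continue
        if log.get("event_type") not in DOCUMENTED_EVENTS:
            summary["undocumented"] += 1
    return summary
```

Because the function reads one line at a time, it can be given an open file handle directly (for example, `with open(path) as f: summarize_logs(f)`), so millions of records never need to be held in memory at once.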



Chapter 4

Observations and Future Work

4.1 Some Observations

Preprocessing the tracking logs reveals some interesting facts about them. A few observations made during log preprocessing are:

• A few logs are not properly structured, and their format cannot be read by a JSON parser. This is caused by minor bugs in some of the code modules that generate these logs. Since we have all the information required to find the exact location of the error, it can be used to fix such coding issues. On average, there are around 2 such cases in every 2,500 logs (about 0.08%). One such log is shown below.

Figure 4.1: Invalid JSON objects
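When such a line is found, it helps to record where parsing failed, along with the overall rate of failures. The sketch below, with hypothetical helper names parse_error and malformed_rate, uses the position reported by Python's json.JSONDecodeError. This only locates the fault within the log line; tracing it back to the generating module, as described above, relies on the information carried inside the log itself.

```python
import json
from typing import Optional


def parse_error(line: str) -> Optional[str]:
    """Describe why a log line is not valid JSON, or return None if it is."""
    try:
        json.loads(line)
    except json.JSONDecodeError as err:
        # err.pos is the character offset at which parsing failed; show
        # a short window of the line around it for manual inspection.
        start = max(err.pos - 20, 0)
        return f"{err.msg} at char {err.pos}: ...{line[start:err.pos + 20]}..."
    return None


def malformed_rate(malformed: int, total: int) -> float:
    """Percentage of log lines that failed to parse."""
    return 100.0 * malformed / total


print(malformed_rate(2, 2500))  # 2 in 2,500 logs -> 0.08
```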




• Some logs contain events that are not defined. The documentation lists event names and their fields, but these event and field names are not found anywhere in it. These results can also be used to understand why such events are generated, so that the code can be modified accordingly or the newly found events can be documented. A few events of this kind are listed below. To make them easier to understand and to store them in tables, they have been classified into some predefined event types.

– These events are of the navigational type:

goto position, dashboard, jsi18n, i18n.js, jump to discussion, progress, view courses, logout, how it works, calculate, jump to vertical

– These events are of the video interaction type:

save user state, transcript translation, transcript download, /transcript/translation, /transcript/download

– These events are of the discussion forum type:

users, reply, upvote, flagAbuse, follow, unfollow, upload
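The grouping above amounts to a lookup table from event name to predefined event type. A minimal sketch follows. The category lists are copied from the text; the underscore-to-space normalization is an assumption (the raw logs may spell these names with underscores, e.g. goto_position), and classify_event is a hypothetical name.

```python
from typing import Optional

# Undocumented event names as listed in Section 4.1, grouped into the
# predefined event types used when storing them in tables.
UNDOCUMENTED_EVENT_TYPES = {
    "navigational": [
        "goto position", "dashboard", "jsi18n", "i18n.js", "jump to discussion",
        "progress", "view courses", "logout", "how it works", "calculate",
        "jump to vertical",
    ],
    "video interaction": [
        "save user state", "transcript translation", "transcript download",
        "/transcript/translation", "/transcript/download",
    ],
    "discussion forum": [
        "users", "reply", "upvote", "flagAbuse", "follow", "unfollow", "upload",
    ],
}

# Invert into a name -> type dictionary for constant-time lookups.
EVENT_TO_TYPE = {
    name: event_type
    for event_type, names in UNDOCUMENTED_EVENT_TYPES.items()
    for name in names
}


def classify_event(name: str) -> Optional[str]:
    """Return the predefined type of an undocumented event, or None."""
    # Assumption: raw names may use underscores where the list uses spaces.
    return EVENT_TO_TYPE.get(name) or EVENT_TO_TYPE.get(name.replace("_", " "))
```

Events that return None are neither documented nor in the list above, and can be set aside for the further analysis mentioned in Section 3.3.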

4.2 Stage-2 Work

• One pending task in tracking-log cleaning is to complete the preprocessing of discussion forum events, because the discussion forum can yield many more interesting results. A student's interaction with the discussion forum indicates an active interest in learning, and if a student correctly answers questions on the forum for some topic, it shows a good command of that topic. One of the challenges in performance analysis, preprocessing the tracking logs, has now been addressed. Using this together with other data modules, student performance can be analyzed more thoroughly. Based on this analysis, reports will be generated for teachers, which can then be used to make blended learning more useful.

• Before proceeding with any analysis, it is essential to study the characteristics and measures to be used for it. As discussed in Section 1.3, these measures will be identified. Some points that will be considered in Stage-2 are:




– How can students' timelines be used to learn about their learning styles?

– Based on students' interaction with study material and quizzes, and on the marks they obtained, their knowledge of a particular module can be identified.

– What kind of reports need to be generated for teachers?

– Can students be given suggestions for improving their learning?

Various parameters need to be decided for a better model. Of the possible options, the best ones will be considered and implemented in Stage-2.
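For the point about identifying a student's knowledge of a module from quiz interaction and marks, one possible starting point is the EventProbInteract table from Table 3.1. The sketch below is only an illustration of one candidate measure, since the actual measures are left to Stage-2: it keeps the latest submission per student and problem and reports the share of marks obtained per (lmsUserId, chapterSysName). The choice of the latest attempt, the plain marks ratio, ISO-8601 createDateTime strings, and the function name module_scores are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def module_scores(rows: Iterable[dict]) -> Dict[Tuple[str, str], float]:
    """Estimate each student's command of a chapter from quiz results.

    Each row is a dict with EventProbInteract columns. Only the latest
    submission per (student, chapter, problem) is kept, and the estimate is
    sum(newScore) / sum(maxGrade) over that chapter's problems.
    """
    latest: Dict[Tuple[str, str, str], dict] = {}
    for row in rows:
        key = (row["lmsUserId"], row["chapterSysName"], row["quizzSysName"])
        # ISO-8601 timestamps compare correctly as plain strings.
        if key not in latest or row["createDateTime"] > latest[key]["createDateTime"]:
            latest[key] = row

    obtained: Dict[Tuple[str, str], float] = defaultdict(float)
    possible: Dict[Tuple[str, str], float] = defaultdict(float)
    for (user, chapter, _problem), row in latest.items():
        obtained[(user, chapter)] += row["newScore"]
        possible[(user, chapter)] += row["maxGrade"]

    # Skip chapters with no gradable marks to avoid dividing by zero.
    return {key: obtained[key] / possible[key] for key in possible if possible[key] > 0}
```

A per-chapter ratio like this could feed the teacher reports mentioned above, while other measures (attempt counts, hint usage, time between attempts) remain candidates for comparison.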



Chapter 5

Conclusion

It has been shown that proper preprocessing of tracking-log data is necessary for good performance analysis. Without preprocessing, the logs cannot be processed on the fly: they are very large, and preprocessing is essential to make use of them. Combining tracking-log data with other data models, such as students' general data and discussion forum data, can give a clear picture of students' learning.

It is also noted that these performance reports can genuinely help students in their self-learning. Constant feedback on students' progress will also help faculty learn more about students' learning and performance, so that they can provide a better learning experience.
