Edx Log Data Analysis

A Research & Development Report

Submitted in partial fulfillment of requirements for the degree of

Master of Technology

by

Rajeev Kumar Gautam
Roll No: 13305R007

under the guidance of

Prof. Deepak B. Phatak

Department of Computer Science and Engineering
Indian Institute of Technology, Bombay

May, 2015


Contents

1 Introduction

2 Experimental Setup
2.1 Tools Used
2.2 Data Used

3 Question wise analysis of students
3.1 Quiz-2
3.2 Quiz-3
3.3 Quiz-4
3.4 Quiz-5
3.5 Quiz-final
3.6 Quiz-final assignment

4 Attempt wise analysis of students
4.1 Quiz-2
4.2 Quiz-3
4.3 Quiz-4
4.4 Quiz-5
4.5 Quiz-final

5 Video played in exam
5.1
5.2
5.3

6 Enrollment and Unenrollment in week-2
6.1

7 Code and Commands
7.1 Code
7.2 Commands
7.2.1 For analysing the Data
7.2.2 For plotting the Data
7.2.3 For Running Hadoop and Hive Server

8 Analysis Results
8.1 Question wise analysis of students
8.2 Attempt wise analysis of students
8.3 Video played in exam
8.4 Enrollment and Unenrollment in week-2

9 Future Work


Chapter 1

Introduction

Data analysis is important for improving the quality of online education. In traditional teaching, the teacher can keep track of a small class directly; in online teaching, log records are how we keep track of students. To get valuable information from these log records, we have to analyse them. The logs record all the activity of students while studying, taking exams, etc.

We analyse the log data of the students of CS101.2x, a course offered by IIT Bombay on the edX platform. The course ran for 6 weeks; we do not have the first week's data, so we analyse the data of the last 5 weeks. In that data we focus mainly on the quizzes and the final exam. In the analysis we find the number of students who answered each specific question correctly or incorrectly. We also look at how many times a video was played during an examination, for example when a student had entered a wrong answer or had forgotten a concept. This helps to distinguish the weak students from the good ones, so that the instructor can help weak students in time on the specific topics where they are lagging. Another analysis looks at how many students left the course and how many joined it.

Hadoop is used for data storage, and queries are run through the Hive interface. RStudio is used to plot the data in graphical form. Python is used to parse the log data and convert it into CSV format so that Hive queries can be run on it easily. sed is used to clean the parsed data.


Chapter 2

Experimental Setup

2.1 Tools Used

Hadoop
Hive
RStudio
Python
sed (stream editor)
LaTeX

2.2 Data Used

The data used in the analysis were the edX log data of students, in JSON format. We decode that data into a more convenient format, CSV. We split the data into tables according to the query, and perform the desired operations with the help of Hadoop and Hive: MapReduce operations run on Hadoop, and Hive queries run against the data stored in Hadoop.
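As an illustration of the decoding step, the sketch below turns one JSON log line into one CSV row. It is a minimal example, not the script used for the report (that is listed in Chapter 7): the field names follow that script, while the sample record, its values and the helper names log_line_to_row and rows_to_csv are invented for illustration.

```python
import csv
import io
import json

# Hypothetical log line: field names follow the parsing script in Chapter 7,
# all values are invented for illustration.
SAMPLE_LINE = json.dumps({
    "username": "student1",
    "event_type": "problem_check",
    "context": {"course_id": "EXAMPLE/CS101.2x/RUN",
                "module": {"display_name": "Q1"}},
    "event": {"success": "correct", "attempts": 1},
})


def log_line_to_row(line):
    """Return [username, question, status, attempts, course_id], or None
    if the line does not describe a graded problem submission."""
    data = json.loads(line)
    context = data.get("context", {})
    event = data.get("event")
    # Skip records without a module or whose payload is not a JSON object.
    if "module" not in context or not isinstance(event, dict):
        return None
    if "success" not in event or "attempts" not in event:
        return None
    return [data["username"], context["module"]["display_name"],
            event["success"], event["attempts"], context["course_id"]]


def rows_to_csv(rows):
    """Serialise rows to comma-separated text with the csv module."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


row = log_line_to_row(SAMPLE_LINE)
print(rows_to_csv([row]), end="")
```

Using the csv module instead of joining strings with commas keeps the output valid even if a display name happens to contain a comma.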


Chapter 3

Question wise analysis of students

3.1 Quiz-2

Number of students who answered each question correctly in any attempt, and number of students who made wrong attempts on each question.

Figure 3.1: quiz-2
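The per-question counts plotted in this chapter can be sketched as follows. This is a Python illustration of the counting logic, not the Hive queries actually used (those are listed in Chapter 7); the row layout (name, question, status) mirrors the csq tables, and the sample rows and the function name question_wise_counts are invented.

```python
from collections import defaultdict

# Invented sample rows: (student name, question, status)
ROWS = [
    ("alice", "Q1", "incorrect"),
    ("alice", "Q1", "correct"),
    ("bob", "Q1", "incorrect"),
    ("bob", "Q2", "correct"),
    ("carol", "Q2", "correct"),
    ("carol", "Q2", "correct"),  # repeated row, the student is counted once
]


def question_wise_counts(rows):
    """Map each question to a pair:
    (students correct in any attempt, students with a wrong attempt)."""
    correct = defaultdict(set)
    incorrect = defaultdict(set)
    for name, question, status in rows:
        if status == "correct":
            correct[question].add(name)
        elif status == "incorrect":
            incorrect[question].add(name)
    questions = sorted(set(correct) | set(incorrect))
    return {q: (len(correct[q]), len(incorrect[q])) for q in questions}


counts = question_wise_counts(ROWS)
for question, (n_correct, n_incorrect) in counts.items():
    print(question, n_correct, n_incorrect)
```

Sets are used so that each student is counted once per question, which is the same effect as count(DISTINCT name) in Hive. A student who first fails and later succeeds appears in both counts.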


3.2 Quiz-3

Number of students who answered each question correctly in any attempt, and number of students who made wrong attempts on each question.

Figure 3.2: quiz-3


3.3 Quiz-4

Number of students who answered each question correctly in any attempt, and number of students who made wrong attempts on each question.

Figure 3.3: quiz-4


3.4 Quiz-5

Number of students who answered each question correctly in any attempt, and number of students who made wrong attempts on each question.

Figure 3.4: quiz-5


3.5 Quiz-final

Number of students who answered each question correctly in any attempt, and number of students who made wrong attempts on each question.

Figure 3.5: quiz-final


3.6 Quiz-final assignment

Number of students who answered each question correctly in any attempt, and number of students who made wrong attempts on each question.

Figure 3.6: quiz-final assignment


Chapter 4

Attempt wise analysis of students

4.1 Quiz-2

Number of students who answered the questions correctly in 1, 2 and 3 attempts.

Figure 4.1: quiz-2
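The attempt-wise counts can be sketched in the same spirit. The example below is an illustration only and assumes rows of the form (name, question, status, attempt number), similar to the csq5 table queried in Chapter 7; the sample rows and the function name attempt_wise_counts are invented.

```python
from collections import Counter

# Invented sample rows: (student name, question, status, attempt number)
ROWS = [
    ("alice", "Q1", "correct", 1),
    ("bob", "Q1", "incorrect", 1),
    ("bob", "Q1", "correct", 2),
    ("carol", "Q1", "incorrect", 1),
    ("carol", "Q1", "incorrect", 2),
    ("carol", "Q1", "correct", 3),
    ("dave", "Q1", "correct", 1),
]


def attempt_wise_counts(rows, question):
    """Count distinct students by the attempt number on which they
    first answered the given question correctly."""
    first_correct = {}
    for name, q, status, attempt in rows:
        if q == question and status == "correct":
            previous = first_correct.get(name, attempt)
            first_correct[name] = min(previous, attempt)
    return Counter(first_correct.values())


counts = attempt_wise_counts(ROWS, "Q1")
for attempt in (1, 2, 3):
    print(attempt, counts[attempt])
```

Each student contributes to exactly one attempt number, so the three counts add up to the number of students who eventually answered the question correctly.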


4.2 Quiz-3

Number of students who answered the questions correctly in 1, 2 and 3 attempts.

Figure 4.2: quiz-3


4.3 Quiz-4

Number of students who answered the questions correctly in 1, 2 and 3 attempts.

Figure 4.3: quiz-4


4.4 Quiz-5

Number of students who answered the questions correctly in 1, 2 and 3 attempts.

Figure 4.4: quiz-5


4.5 Quiz-final

Number of students who answered the questions correctly in 1, 2 and 3 attempts.

Figure 4.5: quiz-final


Chapter 5

Video played in exam

5.1

Number of students who attempted the exam and number of students who played a video on that day.

Figure 5.1: Play Video


5.2

Number of students who gave a wrong answer and watched a video while attempting the quiz.

Figure 5.2: Play Video


5.3

Videos played per student in the exam.

Figure 5.3: Play Video
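The video-related quantities in this chapter can be sketched as below. This is an illustration under stated assumptions, not the commands actually used: it assumes that video plays appear in the log with the event type play_video, and the sample events, the set of students with wrong answers and both function names are invented.

```python
from collections import Counter

PLAY_EVENT = "play_video"  # assumed event type name for a video play

# Invented sample events recorded on an exam day: (student name, event type)
EVENTS = [
    ("alice", "play_video"),
    ("alice", "play_video"),
    ("bob", "problem_check"),
    ("carol", "play_video"),
]

# Invented set of students with at least one incorrect answer in the exam
WRONG = {"alice", "bob"}


def plays_per_student(events):
    """Number of video plays for each student who played any video."""
    return Counter(name for name, etype in events if etype == PLAY_EVENT)


def wrong_and_watched(events, wrong_students):
    """Students who gave a wrong answer and also played a video."""
    watched = set(plays_per_student(events))
    return watched & set(wrong_students)


plays = plays_per_student(EVENTS)
both = wrong_and_watched(EVENTS, WRONG)
print(dict(plays), sorted(both))
```

The set intersection plays the role of matching one list of usernames against another, which the report does with grep on the cleaned CSV files.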


Chapter 6

Enrollment and Unenrollment in week-2

6.1

Number of students who enrolled and unenrolled in week-2.

Figure 6.1: enrollment
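A sketch of the enrollment count follows. It assumes rows of the form (username, event type), as produced by the second script in Chapter 7, and it assumes that enrollment and unenrollment are logged under the event types edx.course.enrollment.activated and edx.course.enrollment.deactivated; the sample rows and the function name enrollment_counts are invented.

```python
ENROLL = "edx.course.enrollment.activated"      # assumed event type name
UNENROLL = "edx.course.enrollment.deactivated"  # assumed event type name

# Invented sample rows for one week: (student name, event type)
ROWS = [
    ("alice", ENROLL),
    ("bob", ENROLL),
    ("bob", UNENROLL),
    ("carol", "play_video"),
    ("alice", ENROLL),  # repeated event, the student is counted once
]


def enrollment_counts(rows):
    """Return (distinct students enrolled, distinct students unenrolled)."""
    enrolled = {name for name, etype in rows if etype == ENROLL}
    unenrolled = {name for name, etype in rows if etype == UNENROLL}
    return len(enrolled), len(unenrolled)


n_enrolled, n_unenrolled = enrollment_counts(ROWS)
print(n_enrolled, n_unenrolled)
```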


Chapter 7

Code and Commands

7.1 Code

These are the Python scripts that we used to clean the data.

import json

# First script: write one CSV row per graded problem submission:
# username, question display name, success, attempts, course_id
with open('tmpQopRMw') as f, open('rfile', 'w') as fo:
    for line in f:
        data = json.loads(line)
        ct = data["context"]
        event = data["event"]
        # Keep only records that belong to a module and whose event
        # payload is a JSON object (a string payload has no keys)
        if "module" in ct and isinstance(event, dict):
            if "success" in event and "attempts" in event:
                s = (data["username"] + ','
                     + str(ct["module"]["display_name"]) + ','
                     + str(event["success"]) + ','
                     + str(event["attempts"]) + ','
                     + str(ct["course_id"]) + '\n')
                fo.write(s)

## + ',' + data["event_type"]

import json

# Second script: write one CSV row per event that has an event type:
# username, event_type, course_id
with open('tmpQopRMw') as f, open('rfile', 'w') as fo:
    for line in f:
        data = json.loads(line)
        ct = data["context"]
        if "event_type" in data:
            s = (data["username"] + ',' + str(data["event_type"])
                 + ',' + str(ct["course_id"]) + '\n')
            fo.write(s)

7.2 Commands

7.2.1 For analysing the Data

create table csq2 (name string, qs string, status string, course_id string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";

select DISTINCT name from csq2 where qs = "Q15" and status = "incorrect";

select DISTINCT name from csq5 where qs = "Q20" and status = "correct" and atmpt = 1;

select count(DISTINCT name) from csq5 where qs = "Q1" and status = "correct" and atmpt = 1;

LOAD DATA LOCAL INPATH '/home/rajeev/Downloads/work/csq4' OVERWRITE INTO TABLE csq3;

sed -n '/play/p' ./csp > csv

cat csv | awk '!seen[$0]++' > csq5

sort -u -t',' -k1,1 csp > total


awk -F "," '{print $1}' total > a1

grep -f a1 a2 | wc -l

grep -f a1 a2 > v1

grep -f c1 v1 | wc -l

grep incorrect total > c1

grep -f i1 v1 | wc -l
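Note that grep -f treats every line of the pattern file as a regular expression and matches it anywhere in a line, so a username that is a substring of another username would also be counted. The sketch below shows an exact-match alternative in Python, together with the order-preserving duplicate removal done by the awk command. It is an illustration only; the helper names dedupe_lines and count_rows_for_users and the sample lines are invented.

```python
def dedupe_lines(lines):
    """Remove repeated lines, keeping the first occurrence of each
    (the same effect as awk '!seen[$0]++')."""
    seen = set()
    unique = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            unique.append(line)
    return unique


def count_rows_for_users(usernames, rows):
    """Count CSV rows whose first field is exactly one of the usernames
    (an exact-match analogue of: grep -f a1 a2 | wc -l)."""
    wanted = set(usernames)
    return sum(1 for row in rows if row.split(",", 1)[0] in wanted)


# Invented sample lines: "raj" is a substring of "rajeev"
lines = ["raj,play_video", "raj,play_video", "rajeev,play_video"]
unique = dedupe_lines(lines)
n = count_rows_for_users(["raj"], unique)
print(unique, n)
```

With the sample lines, a plain grep for the pattern raj would match both remaining rows, while the exact comparison on the first field counts only one.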

7.2.2 For plotting the Data

plot <- read.csv("~/Downloads/work/plot", sep=",", header=TRUE)

barplot(as.matrix(plot))

barplot(as.matrix(plot), beside=TRUE)

barplot(as.matrix(plot), beside=TRUE, xlab="NUMBER OF QUESTIONS", ylab="NUMBER OF STUDENTS", col=c("darkblue","red"))

legend("topright", lty=1, col=c("darkblue","red"), legend=c("incorrect", "correct"))

barplot(as.matrix(plot), beside=TRUE, main="Quiz - FINAL Analysis", xlab="NUMBER OF QUESTIONS", ylab="NUMBER OF STUDENTS", col=c("darkblue","red"))
legend("topright", lty=1, col=c("darkblue","red"), legend=c("incorrect", "correct"))


7.2.3 For Running Hadoop and Hive Server

/usr/local/hadoop/sbin/start-all.sh

export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
export PATH=$PATH:$HADOOP_HOME/bin

/usr/local/hive/bin/hive


Chapter 8

Analysis Results

8.1 Question wise analysis of students

From the figures in Chapter 3 we can see that as the course progresses, the number of wrong attempts in exams also increases. A possible reason is that students become careless or do not focus well on the course.

8.2 Attempt wise analysis of students

From the figures in Chapter 4 we can see that as the course progresses, the number of students who answer correctly in one attempt decreases.

8.3 Video played in exam

From the figures in Chapter 5 we can see that as the course progresses, the number of videos played per student during exams reduces. The number of wrong answers given after watching a video also reduces.

8.4 Enrollment and Unenrollment in week-2

Figure 6.1 shows the analysis of enrollment and unenrollment in week-2.


Chapter 9

Future Work

We can perform real-time analysis with the use of HUE (Hadoop User Experience). This can provide a wider variety of queries, so that decisions can be taken as early as possible.
