WEB FORUM MINING BASED ON USER SATISFACTION

By: Suresh Pokharel
Information and Communications Technologies
Asian Institute of Technology

Committee:
Dr. Sumanta Guha (Chairperson)
Prof. Phan Minh Dung
Assoc. Prof. Tapio J. Erke

May 2010

Agenda

Introduction

Problem Statement

Objectives

Methodology

Implementation

Results and Discussion

Conclusion and Future Work

Demonstration

Introduction

Internet forum or message board

Online discussion site

Asynchronous

People participating in an Internet forum may cultivate social bonds, and interest groups for a topic may form from the discussions.

Figure 1: Organization of Threads

Table 1: An example of a thread

Title: Software for Ubuntu.

| Post No. | Post | User | Category |
|---|---|---|---|
| 1 | I am really new to Linux, where can I find software for Ubuntu? | avacomputers | Question |
| 2 | Applications>Add/remove http://www.getdeb.net/ If you are using Ubuntu Firefox, you can also use http://allmyapps.com/ | danielrmt | Answer |
| 3 | http://www.getdeb.net | theozzlives | Answer |
| 4 | something else I was reading about is a port from OzOs (http://www.cafelinux.org/OzOs/) called apt:foo | devildoc5 | Answer |
| 5 | or the easy way >.> CLICK SYSTEM > PREFERENCES > SYNAPTIC PACKAGE MANAGER there you can search all kinds of games, software! anything you like, thousands to choose from! They download and install easily onto your system | Codix121 | Answer |
| 6 | Thanks GUys. | avacomputers | Answer |

Roles: avacomputers is the questioner (posts 1 and 6); the other users are repliers. Post 6 is a questioner post.


Problem Statement

Lots of forums... Which forum may have the solution?

Oops, I don't want to test all forums...


Objectives

To categorize a post as a question post or an answer post.

To classify a thread as answered or unanswered based on questioner’s satisfaction and forum features.

To predict a solution post based on interaction and satisfaction of questioner.


Methodology: Framework of Study

Figure 4: Framework of Study

Stages shown in the figure:

Abstraction of threads from forum

Abstract posts

Classify post: question or not

Derive questioner post features

Classification of thread

Predict solution post

Methodology: Sentence Classification

Figure 5: Sentence Classification (example)

Labeled Sequential Patterns (LSPs)

An LSP p has the form <LHS, c>, where LHS is a sequence <a1, ..., am>, each ai is called an "item", and c is a class label (question or non-question).

A sequence A = <a, b, c, d, e, f, g, h> has a subsequence B = <b, d, e, g>; we say that A contains B. An LSP p1 is contained by p2 if the sequence p1.LHS is contained by p2.LHS and p1.c = p2.c.

Example:

t1 = (<a, d, e, f>, Q)

t2 = (<a, f, e, f>, Q)

t3 = (<d, a, f>, NQ)

1) LSP p1 = (<a, e, f>, Q) is contained in t1 and t2.

sup(p1) = 2/3 = 66.7%, conf(p1) = (2/3)/(2/3) = 100%

2) LSP p2 = (<a, f>, Q) is contained in t1 and t2; its LHS is also contained in t3.

sup(p2) = 2/3 = 66.7%, conf(p2) = (2/3)/(3/3) = 66.7%
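The support and confidence figures in the LSP example can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the function names `contains` and `support_and_confidence` are illustrative, and confidence is taken to be the support of the full pattern divided by the support of its LHS, which matches the worked numbers on the slide.

```python
from fractions import Fraction


def contains(sequence, pattern):
    """True if `pattern` is a (not necessarily contiguous) subsequence of `sequence`."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in pattern)


def support_and_confidence(lhs, label, database):
    """Support and confidence of the LSP (lhs, label) over labeled sequences."""
    # Labels of every sequence that contains the LHS, whatever its class.
    lhs_labels = [c for seq, c in database if contains(seq, lhs)]
    # Sequences that contain the LHS and also carry the pattern's label.
    full_matches = sum(1 for c in lhs_labels if c == label)
    support = Fraction(full_matches, len(database))
    confidence = Fraction(full_matches, len(lhs_labels)) if lhs_labels else Fraction(0)
    return support, confidence


# The three labeled sequences from the slide.
database = [
    (("a", "d", "e", "f"), "Q"),
    (("a", "f", "e", "f"), "Q"),
    (("d", "a", "f"), "NQ"),
]

sup_p1, conf_p1 = support_and_confidence(("a", "e", "f"), "Q", database)
sup_p2, conf_p2 = support_and_confidence(("a", "f"), "Q", database)
```

Here `sup_p1` and `sup_p2` both come out as 2/3, `conf_p1` as 1 and `conf_p2` as 2/3, in line with the percentages above.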

Mining LSPs

Word length of sequence: 4

Minimum support is set at 0.01% and minimum confidence at 95%.

Converting to features

LSP

5W1H word

Auxiliary verb

Question mark

The corresponding feature is set to 1 if a sentence includes an LSP or a question mark, or starts with a 5W1H word or an auxiliary verb.
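A minimal sketch of the feature conversion described above, written in Python for brevity (the slides name Java as the implementation language). The word lists, the example pattern, and the choice to match LSPs over lowercase word tokens are all assumptions made for illustration; the slides do not give the actual lists or mined patterns.

```python
import re

# Illustrative word lists (assumptions, not taken from the slides).
FIVE_W_ONE_H = {"who", "what", "when", "where", "why", "how"}
AUXILIARY_VERBS = {
    "is", "are", "was", "were", "do", "does", "did", "can", "could",
    "will", "would", "should", "may", "might", "have", "has", "had",
}


def is_subsequence(pattern, tokens):
    """True if `pattern` occurs in `tokens` in order, gaps allowed."""
    it = iter(tokens)
    return all(item in it for item in pattern)


def sentence_features(sentence, lsps):
    """Map a sentence to the four binary features: LSP, 5W1H, Aux, QM."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    first = tokens[0] if tokens else ""
    return {
        "lsp": int(any(is_subsequence(p, tokens) for p in lsps)),
        "5w1h": int(first in FIVE_W_ONE_H),   # sentence starts with a 5W1H word
        "aux": int(first in AUXILIARY_VERBS),  # sentence starts with an auxiliary verb
        "qm": int("?" in sentence),            # sentence includes a question mark
    }


# Hypothetical mined pattern, used only to exercise the function.
example_lsps = [("where", "can", "i")]
features = sentence_features(
    "I am really new to Linux, where can I find software for Ubuntu?",
    example_lsps,
)
```

The resulting dictionary is the kind of binary vector that would then be passed to the classifier.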

Methodology: Thread Classification

Figure 6: Classification of Thread

Methodology: Questioner Post Classification

Derive Features:

Satisfied Phrase: satisfaction phrases, happy emoticons

Unsatisfied Phrase: dissatisfaction phrases

Question Present: found by question detection

Presence of More Posts: whether more posts follow the current questioner post

Diagram labels: OP = Original Post, R = Reply, QP = Questioner Post (the current post); the replies after QP are the "More Posts".

Satisfied Post Length

Presence of Emoticons: a happy emoticon indicates a mood of satisfaction; an unhappy emoticon indicates a mood of dissatisfaction

Original Post (request for solution)

Figure 7: Classification of Questioner Post

Flow shown in the figure:

Questioner posts

Extract features

Convert to binary input features

SVM application

Output: Class 1 (Dissatisfied) or Class 0 (Satisfied)
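The "convert to binary input features" step can be sketched as follows, in Python for brevity. The feature order follows the abbreviations used later in Table 5 (SW, UW, Ques, MP, WC, HE, UE, OP). The phrase lists, emoticon lists, the word-count threshold, and the use of a question mark as a stand-in for the question detector are hypothetical choices for illustration only; the slides do not specify them.

```python
# Hypothetical phrase and emoticon lists (not from the slides).
SATISFIED_PHRASES = ("thanks", "thank you", "it works", "solved")
UNSATISFIED_PHRASES = ("not working", "doesn't work", "still", "no luck")
HAPPY_EMOTICONS = (":)", ":-)", ":D")
UNHAPPY_EMOTICONS = (":(", ":-(")
SHORT_POST_WORDS = 20  # hypothetical threshold for the word-count feature


def questioner_post_features(text, is_original_post, has_more_posts):
    """Binary vector in the order SW, UW, Ques, MP, WC, HE, UE, OP."""
    lower = text.lower()
    return [
        int(any(p in lower for p in SATISFIED_PHRASES)),    # SW: satisfied words
        int(any(p in lower for p in UNSATISFIED_PHRASES)),  # UW: unsatisfied words
        int("?" in text),                                   # Ques: stand-in for question detection
        int(has_more_posts),                                # MP: more posts follow
        int(len(text.split()) <= SHORT_POST_WORDS),         # WC: short post (assumed direction)
        int(any(e in text for e in HAPPY_EMOTICONS)),       # HE: happy emoticon
        int(any(e in text for e in UNHAPPY_EMOTICONS)),     # UE: unhappy emoticon
        int(is_original_post),                              # OP: original post
    ]


closing_vector = questioner_post_features("Thanks GUys.", False, False)
opening_vector = questioner_post_features(
    "I am really new to Linux, where can I find software for Ubuntu?", True, True
)
```

Vectors like these would be the input to the SVM, which in the slides assigns Class 0 (Satisfied) or Class 1 (Dissatisfied).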

Methodology: Predict Solution Posts

Find Solution Post:

Presence of Quote

Presence of User Name

Solution posts may lie between the questioner's posts.

Diagram labels: QP = Questioner Post, R = Reply, SP = Solution Post
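One way the cues above (quote, user name, position between questioner posts) might be combined is sketched below in Python. This is an illustrative heuristic, not the method from the thesis: the `Post` structure, the `quoted_users` field, and the fallback rule are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Post:
    number: int
    user: str
    text: str
    quoted_users: tuple = ()  # users whose posts are quoted in this post (hypothetical field)


def candidate_solution_posts(thread, satisfied_post):
    """Replies before the satisfied questioner post, preferring quoted or named repliers."""
    questioner = satisfied_post.user
    earlier = [
        p for p in thread
        if p.number < satisfied_post.number and p.user != questioner
    ]
    text = satisfied_post.text.lower()
    # Cue 1 and 2: the satisfied post quotes a replier or mentions a replier's user name.
    referenced = [
        p for p in earlier
        if p.user in satisfied_post.quoted_users or p.user.lower() in text
    ]
    if referenced:
        return referenced
    # Fallback: replies lying between the previous questioner post and the satisfied post.
    last_questioner_post = max(
        (p.number for p in thread
         if p.user == questioner and p.number < satisfied_post.number),
        default=0,
    )
    return [p for p in earlier if p.number > last_questioner_post]


thread = [
    Post(1, "avacomputers", "where can I find software for Ubuntu?"),
    Post(2, "danielrmt", "Applications>Add/remove"),
    Post(3, "theozzlives", "http://www.getdeb.net"),
    Post(4, "avacomputers", "Thanks danielrmt, that worked."),  # invented closing post
]
named = candidate_solution_posts(thread, thread[3])
```

With the invented closing post, `named` contains only post 2, because its author is mentioned by name; without any quote or name, the function falls back to all replies between the questioner's posts.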


Implementation

Dataset

Forum: Ubuntu (http://ubuntuforums.org/)

Sentence classification: datasets of 100, 200 and 300 from 3000 sentences

Questioner post classification: 250 posts from 79 threads

Manual evaluation: 100 threads, evaluated by two teams of 5 people each

Tools and Language

POS tagging, tokenization, sentence detection: OpenNLP

Classifier: Support Vector Machine (LibSVM, SMO)

Model: the SVM classifier model is trained using LibSVM

Language: Java


Result and Discussion: Sentence Classification Comparison

Table 3: Accuracy of Sentence Classification by using LSP+ by Class

| Features | Accuracy | Recall | Precision | F-Measure |
|---|---|---|---|---|
| WH word (5W1H) | 0.59 | 0.59 | 0.77 | 0.67 |
| Question Mark (QM) | 0.86 | 0.86 | 0.88 | 0.87 |
| Auxiliary Verb (Aux) | 0.66 | 0.66 | 0.78 | 0.72 |
| 5W1H + QM + Aux | 0.88 | 0.88 | 0.89 | 0.88 |
| Labeled Sequential Pattern (LSP) | 0.94 | 0.94 | 0.94 | 0.94 |
| QM + Aux + LSP + 5W1H (LSP+) | 0.96 | 0.96 | 0.96 | 0.96 |

All results were obtained from 10-fold cross-validation.
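The F-Measure column can be checked against the Precision and Recall columns. The sketch below assumes the balanced F-measure (harmonic mean of precision and recall); the slides do not state the formula, but the reported values are consistent with it.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rows from the sentence classification table: (precision, recall).
wh_word = round(f_measure(0.77, 0.59), 2)        # 5W1H row
question_mark = round(f_measure(0.88, 0.86), 2)  # QM row
```

For the 5W1H row this gives 0.67 and for the question mark row 0.87, matching the table.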

Result and Discussion: QP Classification Comparison

Table 5: Questioner Post Classification Comparison using Different Features

| Features | Precision | Recall | F-Measure |
|---|---|---|---|
| Satisfied Words (SW) | 0.78 | 0.78 | 0.78 |
| Unsatisfied Words (UW) | 0.72 | 0.72 | 0.72 |
| Question (Ques) | 0.84 | 0.85 | 0.84 |
| More Post (MP) | 0.83 | 0.83 | 0.83 |
| Word Count (WC) | 0.70 | 0.58 | 0.63 |
| Happy Emoticon (HE) | 0.62 | 0.55 | 0.58 |
| Unhappy Emoticon (UE) | 0.76 | 0.56 | 0.64 |
| Original Post (OP) | 0.83 | 0.76 | 0.79 |
| Ques + MP | 0.84 | 0.84 | 0.84 |
| Ques + OP | 0.86 | 0.86 | 0.86 |
| MP + UE + OP | 0.88 | 0.88 | 0.88 |
| Ques + MP + OP | 0.85 | 0.85 | 0.85 |
| SW + UW + Ques | 0.83 | 0.82 | 0.83 |
| SW + UW + Ques + MP + WC + HE + UE + OP | 0.91 | 0.91 | 0.91 |

Result and Discussion: Comparison with Team's Evaluation

Table 6: Comparison of System Result with Manual Evaluation for thread classification

| Performance | Accuracy | Recall | Precision | F-Measure |
|---|---|---|---|---|
| System with Team A | 0.84 | 0.79 | 0.82 | 0.80 |
| System with Team B | 0.79 | 0.73 | 0.78 | 0.75 |
| Average | 0.81 | 0.76 | 0.80 | 0.78 |

Table 7: System Accuracy for Prediction of Solution Posts

| Accuracy | Recall | Precision | F-Measure |
|---|---|---|---|
| System Accuracy with Team A | 0.45 | 0.65 | 0.54 |
| System Accuracy with Team B | 0.43 | 0.65 | 0.52 |
| Average System Accuracy | 0.44 | 0.65 | 0.53 |


Conclusion and Future Work: Conclusion

Finding answered threads in a web forum is achieved by tracing user satisfaction.

Threads and sentences are classified by deriving different features.

System performance increases when different features are combined.

Conclusion and Future Work: Future Work

The system can be used for query-based ranking of threads.

It can be used for extracting answer sentences with better accuracy.

The performance of the system can be increased by incorporating semantics.