WEB FORUM MINING BASED ON USER SATISFACTION PAGE 1
WEB FORUM MINING BASED ON USER SATISFACTION
By: Suresh Pokharel
Information and Communications Technologies
Asian Institute of Technology
Committee:
Dr. Sumanta Guha (Chairperson)
Prof. Phan Minh Dung
Assoc. Prof. Tapio J. Erke
May 2010
Agenda
Introduction
Problem Statement
Objectives
Methodology
Implementation
Results and Discussion
Conclusion and Future Work
Demonstration

Introduction
Internet-Forum or Message Board
Online Discussion Site
Asynchronous
People participating in an Internet forum may cultivate social bonds, and interest groups for a topic may form from the discussions.
Figure 1: Organization of Threads
Table 1: An example of a thread
Title: Software for Ubuntu.
Post No. | Post | User | Category
1 | I am really new to Linux, where can I find software for Ubuntu? | avacomputers | Question
2 | Applications>Add/remove http://www.getdeb.net/ If you are using Ubuntu Firefox, you can also use http://allmyapps.com/ | danielrmt | Answer
3 | http://www.getdeb.net | theozzlives | Answer
4 | something else I was reading about is a port from OzOs (http://www.cafelinux.org/OzOs/) called apt:foo | devildoc5 | Answer
5 | or the easy way >.> CLICK SYSTEM > PREFERENCES > SYNAPTIC PACKAGE MANAGER there you can search all kinds of games, software! anything you like, thousands to choose from! They download and install easily onto your system | Codix121 | Answer
6 | Thanks GUys. | avacomputers | Answer
Annotations: post 1 is by the questioner, posts 2 to 5 are by repliers, and post 6 is by the questioner again (a questioner post).
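The thread in Table 1 can be represented in a few lines of code. The sketch below is only an illustration: the `Post` class and the `questioner_posts` helper are hypothetical names, not part of the original system (which was written in Java), and the rule "a questioner post is any later post by the user who opened the thread" is an assumption drawn from the annotations on the table.

```python
from dataclasses import dataclass


@dataclass
class Post:
    number: int  # position of the post in the thread, starting at 1
    user: str
    text: str


def questioner_posts(posts):
    """Return the follow-up posts written by the user who opened the thread."""
    questioner = posts[0].user
    return [p for p in posts[1:] if p.user == questioner]


# Posts from Table 1, with the longer texts shortened.
thread = [
    Post(1, "avacomputers", "I am really new to Linux, where can I find software for Ubuntu?"),
    Post(2, "danielrmt", "Applications>Add/remove http://www.getdeb.net/"),
    Post(3, "theozzlives", "http://www.getdeb.net"),
    Post(4, "devildoc5", "a port from OzOs called apt:foo"),
    Post(5, "Codix121", "SYSTEM > PREFERENCES > SYNAPTIC PACKAGE MANAGER"),
    Post(6, "avacomputers", "Thanks GUys."),
]

for post in questioner_posts(thread):
    print(post.number, post.text)
```

For this thread the only questioner post is post 6, which is the post the later classification steps would inspect for signs of satisfaction.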
Problem Statement
Which forum may have a solution?
Lots of forums…
Oops
I don’t want to test all forums…
Objectives
To categorize a post as a question post or an answer post.
To classify a thread as answered or unanswered based on the questioner’s satisfaction and forum features.
To predict a solution post based on the interaction and satisfaction of the questioner.
Methodology: Framework of Study
Figure 4: Framework of Study
Stages shown in the figure:
Abstraction of Threads from Forum
Abstract Posts
Classify Post: Question or Not
Derive Questioner Post Features
Classification of Thread
Predict Solution Post
Methodology: Sentence Classification
Figure 5: Sentence Classification
Methodology: Sentence Classification
Labeled Sequential Patterns (LSPs)
An LSP p has the form <LHS, c>, where LHS is a sequence <a1, ..., am>, each ai is called an “item”, and c is a class label (question/non-question).
A sequence A contains a sequence B if B is a subsequence of A. For example, A = <a, b, c, d, e, f, g, h> contains B = <b, d, e, g>.
An LSP p1 is contained by p2 if the sequence p1.LHS is contained by p2.LHS and p1.c = p2.c.
Example:
t1 = (<a, d, e, f>, Q)
t2 = (<a, f, e, f>, Q)
t3 = (<d, a, f>, NQ)
1) LSP p1 = (<a, e, f>, Q) is contained in t1 and t2: sup(p1) = 2/3 = 66.7%, conf(p1) = (2/3)/(2/3) = 100%
2) LSP p2 = (<a, f>, Q): sup(p2) = 2/3 = 66.7%, conf(p2) = (2/3)/(3/3) = 66.7%
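The support and confidence figures in the LSP example can be checked with a short script. This is a minimal sketch, assuming that support is the fraction of labeled sequences that contain the pattern with the same class, and that confidence divides this by the fraction of sequences that contain the LHS regardless of class, which is what the worked numbers on the slide imply. The function names are illustrative only.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support_and_confidence(lhs, label, database):
    """Return (support, confidence) of the LSP <lhs, label> over the database."""
    total = len(database)
    # Class labels of every tuple whose sequence contains the LHS.
    matching_labels = [c for seq, c in database if is_subsequence(lhs, seq)]
    support = sum(1 for c in matching_labels if c == label) / total
    lhs_support = len(matching_labels) / total
    confidence = support / lhs_support if lhs_support else 0.0
    return support, confidence


database = [
    (["a", "d", "e", "f"], "Q"),   # t1
    (["a", "f", "e", "f"], "Q"),   # t2
    (["d", "a", "f"], "NQ"),       # t3
]

for name, lhs in [("p1", ["a", "e", "f"]), ("p2", ["a", "f"])]:
    sup, conf = support_and_confidence(lhs, "Q", database)
    print(f"{name}: sup = {sup:.1%}, conf = {conf:.1%}")
```

Pattern p2 has lower confidence than p1 because its LHS also occurs in t3, which is labeled NQ.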
Methodology: Sentence Classification
Mining LSPs
Word length of sequence: 4
Minimum support is set at 0.01% and minimum confidence at 95%
Converting to features
Features: LSP, 5W1H word, auxiliary verb, question mark
The corresponding feature is set to 1 if a sentence includes an LSP or a question mark, or starts with a 5W1H word or an auxiliary verb.
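The conversion of a sentence into the four binary features might look like the sketch below. Several details are assumptions rather than facts from the slides: the word lists are short illustrative samples, the auxiliary-verb feature is taken to mean "the sentence starts with an auxiliary verb", and the single LSP used in the example is hypothetical and defined over plain lowercase words.

```python
# Illustrative word lists (assumptions, not the lists used in the original system).
FIVE_W_ONE_H = {"what", "when", "where", "who", "why", "how"}
AUXILIARY_VERBS = {"is", "are", "was", "were", "do", "does", "did",
                   "can", "could", "will", "would", "should", "may", "might"}


def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def sentence_features(sentence, lsps):
    """Return [LSP, 5W1H, Aux, QM] as 0/1 values for one sentence."""
    # Separate the question mark so it becomes its own token.
    tokens = sentence.lower().replace("?", " ? ").split()
    first = tokens[0] if tokens else ""
    return [
        int(any(is_subsequence(p, tokens) for p in lsps)),  # contains a mined LSP
        int(first in FIVE_W_ONE_H),                         # starts with a 5W1H word
        int(first in AUXILIARY_VERBS),                      # starts with an auxiliary verb
        int("?" in tokens),                                 # contains a question mark
    ]


# Hypothetical mined pattern, used only to exercise the LSP feature.
example_lsps = [("where", "can", "i", "find")]

print(sentence_features("Where can I find software for Ubuntu?", example_lsps))
print(sentence_features("Thanks GUys.", example_lsps))
```

The resulting vectors are the kind of binary input a classifier such as an SVM can be trained on.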
Methodology: Thread Classification
Figure 6: Classification of Thread
Methodology: Questioner Post Classification
Derive Features:
Satisfied Phrase
Satisfaction phrase
Happy emoticons
Unsatisfied Phrase
Dissatisfaction phrase
Methodology: Questioner Post Classification
Question Present
Question Detection
Presence of More Posts
Figure: thread sequence OP, R, QP, R, R, where OP is the original post, R is a reply, QP is the questioner post (the current post), and the replies after QP are the “more posts”.
Methodology: Questioner Post Classification
Derive Features:
Satisfied Post Length
Presence of Emoticons
Happy emoticon: mood of satisfaction
Unhappy emoticon: mood of dissatisfaction
Methodology: Questioner Post Classification
Derive Features:
Original Post
Request for Solution
Methodology: Questioner Post Classification
Derive Features:
Questioner Posts
Extract Features
Convert to Binary Input Features
SVM Application
Class 1: Dissatisfied
Class 0: Satisfied
Figure 7: Classification of Questioner Post
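The "extract features, convert to binary input" step for a questioner post could be sketched as follows. The feature abbreviations follow those used in the results (SW, UW, Ques, MP, WC, HE, UE). Everything else is an assumption made for illustration: the phrase and emoticon lists, the 15-word threshold for a "short" post, and the use of a question mark as a stand-in for the sentence classifier. The Original Post (OP) feature is left out because the slides do not say how it is computed, and the SVM training step is not shown.

```python
# Hypothetical lexicons and threshold; the original lists are not given in the slides.
SATISFIED_PHRASES = ("thanks", "thank you", "it works", "solved")
UNSATISFIED_PHRASES = ("still not", "doesn't work", "did not work", "no luck")
HAPPY_EMOTICONS = (":)", ":-)", ":D")
UNHAPPY_EMOTICONS = (":(", ":-(")
SHORT_POST_MAX_WORDS = 15


def questioner_post_features(text, has_later_posts):
    """Map one questioner post to a dictionary of 0/1 features."""
    lowered = text.lower()
    return {
        "SW": int(any(p in lowered for p in SATISFIED_PHRASES)),
        "UW": int(any(p in lowered for p in UNSATISFIED_PHRASES)),
        "Ques": int("?" in text),  # stand-in for the question classifier
        "MP": int(has_later_posts),  # more posts follow this one in the thread
        "WC": int(len(text.split()) <= SHORT_POST_MAX_WORDS),
        "HE": int(any(e in text for e in HAPPY_EMOTICONS)),
        "UE": int(any(e in text for e in UNHAPPY_EMOTICONS)),
    }


def to_vector(features):
    """Fix the feature order so every post yields a comparable input vector."""
    order = ("SW", "UW", "Ques", "MP", "WC", "HE", "UE")
    return [features[name] for name in order]


print(to_vector(questioner_post_features("Thanks GUys.", has_later_posts=False)))
print(to_vector(questioner_post_features(
    "It still doesn't work :( any other idea?", has_later_posts=True)))
```

Each vector would then be passed to the trained classifier, which assigns class 0 (satisfied) or class 1 (dissatisfied).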
Methodology: Predict Solution Posts
Find Solution Post
Presence of Quote
Presence of User Name
Methodology: Predict Solution Posts
A solution post may lie between the questioner posts.
Figure: thread sequence QP, R, R, SP, with the label “Solution Posts”.
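One way to combine the three cues above (quote, user name, position between questioner posts) is sketched below. The slides do not state how the cues are prioritized, so the order used here (quote first, then user name, then position) is an assumption, as are the `Post` class, the `quoted_post` field, and the function name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    number: int
    user: str
    text: str
    quoted_post: Optional[int] = None  # number of the post this one quotes, if any


def candidate_solution_posts(posts, satisfied_number):
    """Return numbers of replies that may hold the solution acknowledged by the
    satisfied questioner post with the given number."""
    questioner = posts[0].user
    satisfied = next(p for p in posts if p.number == satisfied_number)
    replies = [p for p in posts
               if p.number < satisfied.number and p.user != questioner]

    # Cue 1: the satisfied post quotes an earlier reply.
    quoted = [p.number for p in replies if p.number == satisfied.quoted_post]
    if quoted:
        return quoted

    # Cue 2: the satisfied post mentions a replier by user name.
    named = [p.number for p in replies
             if p.user.lower() in satisfied.text.lower()]
    if named:
        return named

    # Cue 3: fall back to replies between the previous questioner post
    # and the satisfied post.
    previous = max((p.number for p in posts
                    if p.user == questioner and p.number < satisfied.number),
                   default=0)
    return [p.number for p in replies if p.number > previous]


thread = [
    Post(1, "avacomputers", "Where can I find software for Ubuntu?"),
    Post(2, "danielrmt", "Applications>Add/remove"),
    Post(3, "theozzlives", "http://www.getdeb.net"),
    Post(4, "avacomputers", "Thanks GUys."),
]
print(candidate_solution_posts(thread, satisfied_number=4))
```

With no quote and no user name in the final post, every reply since the question remains a candidate, which is consistent with the modest precision such a heuristic would be expected to have.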
Implementation
Dataset
Forum: Ubuntu (http://ubuntuforums.org/)
Sentence classification: datasets of 100, 200 and 300 from 3000 sentences
Questioner post classification: 250 posts from 79 threads
Manual evaluation: 100 threads, evaluated by two teams of 5 people each
Tools and Language
POS tagging, tokenization, sentence detection: OpenNLP
Classifier: Support Vector Machine (LibSVM, SMO)
Model: the SVM classifier model is trained using LibSVM
Language: Java
Result and Discussion: Sentence Classification Comparison
Table 3: Accuracy of Sentence Classification by using LSP+ by Class
Features | Accuracy | Recall | Precision | F-Measure
WH word (5W1H) | 0.59 | 0.59 | 0.77 | 0.67
Question Mark (QM) | 0.86 | 0.86 | 0.88 | 0.87
Auxiliary Verb (Aux) | 0.66 | 0.66 | 0.78 | 0.72
5W1H + QM + Aux | 0.88 | 0.88 | 0.89 | 0.88
Labeled Sequential Pattern (LSP) | 0.94 | 0.94 | 0.94 | 0.94
QM+Aux+LSP+5W1H (LSP+) | 0.96 | 0.96 | 0.96 | 0.96
All results were obtained from 10-fold cross-validation.
Result and Discussion: QP Classification Comparison
Table 5: Questioner Post Classification Comparison using Different Features
Features | Precision | Recall | F-Measure
Satisfied Words (SW) | 0.78 | 0.78 | 0.78
Unsatisfied Words (UW) | 0.72 | 0.72 | 0.72
Question (Ques) | 0.84 | 0.85 | 0.84
More Post (MP) | 0.83 | 0.83 | 0.83
Word Count (WC) | 0.70 | 0.58 | 0.63
Happy Emoticon (HE) | 0.62 | 0.55 | 0.58
Unhappy Emoticon (UE) | 0.76 | 0.56 | 0.64
Original Post (OP) | 0.83 | 0.76 | 0.79
Ques+MP | 0.84 | 0.84 | 0.84
Ques+OP | 0.86 | 0.86 | 0.86
MP+UE+OP | 0.88 | 0.88 | 0.88
Ques+MP+OP | 0.85 | 0.85 | 0.85
SW+UW+Ques | 0.83 | 0.82 | 0.83
SW+UW+Ques+MP+WC+HE+UE+OP | 0.91 | 0.91 | 0.91
Result and Discussion: Comparison with Team’s Evaluation
Table 6: Comparison of System Result with Manual Evaluation for thread classification
Performance | Accuracy | Recall | Precision | F-Measure
System with Team A | 0.84 | 0.79 | 0.82 | 0.80
System with Team B | 0.79 | 0.73 | 0.78 | 0.75
Average | 0.81 | 0.76 | 0.80 | 0.78
Table 7: System Accuracy for Prediction of Solution Posts
Accuracy | Recall | Precision | F-Measure
System Accuracy with Team A | 0.45 | 0.65 | 0.54
System Accuracy with Team B | 0.43 | 0.65 | 0.52
Average System Accuracy | 0.44 | 0.65 | 0.53
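The F-measure columns in these tables agree with the usual harmonic mean of precision and recall. The slides do not say which variant of the F-measure was used, so the balanced F1 form below is an assumption. It reproduces the Table 6 rows from their precision and recall values.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs taken from Table 6.
rows = {
    "System with Team A": (0.82, 0.79),
    "System with Team B": (0.78, 0.73),
}
for name, (precision, recall) in rows.items():
    print(f"{name}: F = {f_measure(precision, recall):.2f}")
```

Because the published precision and recall are already rounded to two decimals, a recomputed F-measure can differ from a published one by 0.01, as it does for the first row of Table 7.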
Conclusion and Future Work: Conclusion
Finding answered threads in a web forum is achieved by tracing user satisfaction.
Threads and sentences are classified by deriving different features.
System performance increases when different features are combined.
Conclusion and Future Work: Future Work
The system can be used for query-based ranking of threads.
It can be used for extracting answered sentences with better accuracy.
System performance can be increased by incorporating semantics.