CoCQA: Co-Training Over Questions and Answers with an Application to Predicting Question Subjectivity Orientation
Baoli Li, Yandong Liu, and Eugene Agichtein
Emory University
Community Question Answering
- An effective way of seeking information from other users
- Can be searched for resolved questions
Community Question Answering (CQA)
- Yahoo! Answers
- Users
  - Asker: post questions
  - Answerer: post answers
  - Voter: vote for existing answers
- Questions
  - Subject
  - Detail
- Answers
  - Answer text
  - Votes
- Archive: millions of questions and answers
Lifecycle of a Question in CQA
[Flowchart: the user chooses a category, composes the question, and opens it. Other users post answers, which receive positive (+) and negative (-) votes. The asker examines the answers. Find the answer? Yes: close the question, choose the best answers, give ratings. No: the question is closed by the system and the best answer is chosen by voters.]
Problem Statement
- How can we exploit structure of CQA to improve question classification?
- Case Study: Question Subjectivity Prediction
  - Subjective questions: seek answers containing private states such as personal opinion, judgment, and experience
  - Objective questions: are expected to be answered with reliable or authoritative information
Example Questions
- Subjective: Has anyone got one of those home blood pressure monitors? and if so what make is it and do you think they are worth getting?
- Objective: What is the difference between chemotherapy and radiation treatments?
Motivation
- Guiding the CQA engine to process questions more intelligently
- Some Applications
  - Ranking/filtering answers
  - Improving question archive search
  - Evaluating answers provided by users
  - Inferring user intent
Challenges
- Some challenges in analyzing real online questions:
  - Typically complex and subjective
  - Can be ill-phrased and vague
  - Not enough annotated data
Key Observations
Can we utilize the inherent structure of the CQA interactions, and use the unlimited amounts of unlabeled data to improve classification performance?
Natural Approach: Co-Training
- Introduced by: Combining labeled and unlabeled data with co-training, Blum and Mitchell, 1998
- Two views of the data
  - E.g.: content and hyperlinks in web pages
  - Provide complementary information for each other
- Iteratively construct additional labeled data
  - Can often significantly improve accuracy
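The co-training idea on this slide can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the keyword-counting `train` and `score` helpers stand in for a real base classifier, the loop stops after a fixed number of rounds rather than by validation, and every name and data item here is hypothetical.

```python
from collections import Counter

def train(examples, view):
    """Toy base classifier: token weight = (# positive docs) - (# negative docs)."""
    weights = Counter()
    for views, label in examples:
        for tok in set(views[view]):
            weights[tok] += label          # label is +1 or -1
    return weights

def score(weights, tokens):
    """Signed score: the sign is the predicted label, the magnitude the confidence."""
    return sum(weights[tok] for tok in set(tokens))

def co_train(labeled, unlabeled, k=1, max_iters=10):
    """Let each view's classifier label its k most confident pool examples per round."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_iters):
        if not pool:
            break
        added = False
        for view in (0, 1):                # view 0 and view 1 of each example
            weights = train(labeled, view)
            scored = [(score(weights, ex[view]), ex) for ex in pool]
            # Skip examples this view has no opinion on; most confident first.
            scored = [(s, ex) for s, ex in scored if s != 0]
            scored.sort(key=lambda pair: abs(pair[0]), reverse=True)
            for s, ex in scored[:k]:
                labeled.append((ex, 1 if s > 0 else -1))
                pool.remove(ex)
                added = True
        if not added:                      # neither view could label anything new
            break
    return labeled

# Hypothetical two-view examples: (view-0 tokens, view-1 tokens); +1 / -1 are the two classes.
labeled = [
    ((("do", "you", "think"), ("i", "love", "mine")), 1),
    ((("what", "is", "difference"), ("according", "to", "studies")), -1),
]
unlabeled = [
    (("do", "you", "like"), ("my", "mom", "finds")),
    (("what", "is", "capital"), ("according", "to", "records")),
    (("anyone", "recommend"), ("i", "love", "it")),
]
result = co_train(labeled, unlabeled, k=1)
```

In this toy run the third pool example shares no tokens with the labeled data in view 0, so only the view-1 classifier can label it, which is the kind of complementary information the two views are meant to provide.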
Questions and Answers: Two Views
- Example:
  - Q: Has anyone got one of those home blood pressure monitors? and if so what make is it and do you think they are worth getting?
  - A: My mom has one as she is diabetic so its important for her to monitor it she finds it useful.
- Answers usually match/fit question
  - My mom… she finds…
- Askers can usually identify matching answers by selecting the “best answer”
CoCQA: A Co-Training Framework over Questions and Answers
[Diagram: labeled data trains two classifiers, C_Q on the question view (Q) and C_A on the answer view (A). Both classify the unlabeled data, newly labeled (+/-) examples join the labeled data, and validation on held-out training data determines when to stop.]
Details of CoCQA implementation
- Base classifier
  - LibSVM
- Term Frequency as Term Weight
  - Also tried Binary, TF*IDF
- Select top K examples with highest confidence
  - Margin value in SVM
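The selection step might look like the sketch below. It assumes that "confidence" means the absolute value of the SVM decision value (the signed margin) and that its sign gives the predicted class; the `select_top_k` helper and the margin values are hypothetical, and in practice the decision values would come from the base classifier.

```python
def select_top_k(margins, k):
    """Return (index, predicted_label) for the k examples with the largest |margin|."""
    ranked = sorted(range(len(margins)), key=lambda i: abs(margins[i]), reverse=True)
    return [(i, 1 if margins[i] >= 0 else -1) for i in ranked[:k]]

# Hypothetical decision values for five unlabeled examples.
margins = [0.2, -1.7, 0.9, -0.1, 1.3]
print(select_top_k(margins, k=2))   # [(1, -1), (4, 1)]
```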
Feature Set
- Character 3-grams
  - has, any, nyo, yon, one…
- Words
  - Has, anyone, got, mom, she, finds…
- Word with Character 3-grams
- Word n-grams (n<=3, i.e. W_i, W_i W_{i+1}, W_i W_{i+1} W_{i+2})
  - Has anyone got, anyone got one, she finds it…
- Word and POS n-gram (n<=3, i.e. W_i, W_i W_{i+1}, W_i POS_{i+1}, POS_i W_{i+1}, POS_i POS_{i+1}, etc.)
  - NP VBP, She PRP, VBP finds…
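A minimal sketch of the first two feature families follows. The tokenizer, the lowercasing, and the choice to take character 3-grams inside each word (which reproduces the slide's "has, any, nyo, yon, one" example) are assumptions; the POS-based features are left out because they need a part-of-speech tagger.

```python
import re

def tokenize(text):
    """Lowercase and split into simple word tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def char_3grams(text):
    """Character 3-grams taken inside each word; words under 3 characters yield none."""
    grams = []
    for word in tokenize(text):
        grams.extend(word[i:i + 3] for i in range(len(word) - 2))
    return grams

def word_ngrams(text, max_n=3):
    """All word n-grams for n = 1..max_n, shorter n-grams first."""
    words = tokenize(text)
    return [" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)]

print(char_3grams("Has anyone"))       # ['has', 'any', 'nyo', 'yon', 'one']
print(word_ngrams("Has anyone got"))
```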
Overview of Experimental Setup
- Datasets
  - From Yahoo! Answers
  - Manually labeled data by Amazon Mechanical Turk
- Metrics
- Compare CoCQA to a state-of-the-art semi-supervised method
Dataset
- 1,000 Labeled Questions from Yahoo! Answers
  - 5 categories (Arts, Education, Science, Health, and Sports)
  - 200 questions from each category
- 10,000 Unlabeled Questions from Yahoo! Answers
  - 2,000 questions from each category
- Data available at http://ir.mathcs.emory.edu/shared
Manual Labeling
- Annotated using Amazon’s Mechanical Turk service
  - Each question was judged by 5 Mechanical Turk workers
  - 25 questions included in each HIT task
  - Worker needs to pass the qualification test
  - Majority vote to derive gold standard
- Discarded small fraction (22 out of 1000) of nonsensical questions such as “Upward Soccer Shorts?” and “1+1=?fdgdgdfg” by manual inspection
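Deriving the gold label by majority vote can be sketched as below. The `majority_label` helper and the label strings are hypothetical; with 5 judgments and two classes a tie cannot occur, so no tie-breaking rule is needed in this setting.

```python
from collections import Counter

def majority_label(votes):
    """Return the label chosen by most workers for one question."""
    label, _ = Counter(votes).most_common(1)[0]
    return label

# Hypothetical judgments from five workers on a single question.
votes = ["subjective", "objective", "subjective", "subjective", "objective"]
print(majority_label(votes))   # subjective
```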
Example HIT task
Subjectivity Statistics by Category
[Chart: objective vs. subjective questions by category]
Evaluation Metric
- Macro-Averaged F-1
  - Prediction performance on both subjective questions and objective questions is equally important
- F-1: $F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
  - Averaged over subjective and objective classes
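As a worked illustration of the metric, the sketch below computes per-class F-1 and averages it over the two classes. The helper names and the six-question example are hypothetical; the example shows why a classifier that always predicts the majority class scores poorly under macro-averaging.

```python
def f1_for_class(gold, pred, cls):
    """F-1 = 2 * P * R / (P + R) for one class; 0.0 when there are no true positives."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(gold, pred, classes=("subjective", "objective")):
    """Unweighted mean of the per-class F-1 scores."""
    return sum(f1_for_class(gold, pred, c) for c in classes) / len(classes)

# Hypothetical data: 4 subjective and 2 objective questions, all predicted subjective.
gold = ["subjective"] * 4 + ["objective"] * 2
pred = ["subjective"] * 6
print(round(macro_f1(gold, pred), 3))   # 0.4
```

Here the subjective class reaches F-1 = 0.8 (precision 4/6, recall 1) while the objective class gets 0, so the macro average is 0.4 even though two thirds of the predictions are correct.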
Experimental Settings
- 5-fold cross-validation
- Methods Compared:
  - Supervised: LibSVM (Chang and Lin, 2001)
  - Generalized Expectation (GE): (Mann and McCallum, 2007)
  - CoCQA: our method
    - Base classifier: LibSVM
    - View 1: question text; View 2: answer text
F1 for Supervised Learning

| Features | Char 3-gram | Word | Word + Char 3-gram | Word POS n-gram (n<=3) |
| --- | --- | --- | --- | --- |
| question | 0.700 | 0.717 | 0.694 | 0.720 |
| best_ans | 0.587 | 0.597 | 0.578 | 0.565 |
| q_bestans | 0.681 | 0.695 | 0.662 | 0.712 |

Naïve (majority class) baseline: 0.398

F1 with different sets of features
Semi Supervised Learning: Adding unlabeled data

| Method | Question | Question + Best Answer |
| --- | --- | --- |
| Supervised | 0.717 | 0.695 |
| GE | 0.712 (-0.7%) | 0.717 (+3.2%) |
| CoCQA | 0.731 (+1.9%) | 0.745 (+7.2%) |

Comparison between Supervised, GE and CoCQA
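The percentages in this comparison read as relative changes over the supervised baseline with the same features. A small sketch of that arithmetic follows; the `relative_change` helper is hypothetical, and it works from the rounded F1 values shown, so a result can differ slightly from the reported figure (the CoCQA question-only case comes out near +2.0% rather than +1.9%, presumably because the reported value used unrounded scores).

```python
def relative_change(new, baseline):
    """Percentage change of `new` relative to `baseline`."""
    return 100.0 * (new - baseline) / baseline

# Supervised baselines and semi-supervised scores as reported, by feature setting.
supervised = {"question": 0.717, "question + best answer": 0.695}
scores = {
    "GE": {"question": 0.712, "question + best answer": 0.717},
    "CoCQA": {"question": 0.731, "question + best answer": 0.745},
}
for method, by_feature in scores.items():
    for feature, f1 in by_feature.items():
        print(method, feature, f"{relative_change(f1, supervised[feature]):+.1f}%")
```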
CoCQA with varying K (# new examples added in each iteration)
[Plot: F1 (y-axis, 0.64 to 0.76) against K, the # of labeled examples added on each co-training iteration (x-axis, 20 to 200). Series: CoCQA (Question and Best Answer), Supervised Q_bestans, CoCQA (Question and All Answers), Supervised Q_allans.]
CoCQA for varying # iterations
[Plot: F1 (left axis, 0.71 to 0.75) and total # unlabeled examples added (right axis, 0 to 3500) against # co-training iterations. Series: CoCQA (Question + Best Answer), Supervised, Total # Unlabeled.]
CoCQA for varying amount of labeled data
[Plot: F1 (y-axis, 0.52 to 0.72) against # of labeled data used (x-axis, 50 to 400). Series: CoCQA (Question + Best Answer), Supervised Q_Best Ans.]
Conclusions and Future Work
- Problem: Non-topical text classification in CQA
- CoCQA: a co-training framework that can exploit information from both question and answers
- Case study: subjectivity classification for real questions in CQA
- We plan to explore:
  - more sophisticated features;
  - related variants of semi-supervised learning;
  - other applications (Sentiment classification)
Performance of Subjective vs. Objective classes
- Subjective class: 80%
- Objective class: 60%
Related work
- Some related work:
  - Question Classification: (Zhang and Lee, 2003) (Tri et al., 2006)
  - Sentiment Analysis: (Pang and Lee, 2004) (Yu and Hatzivassiloglou, 2003) (Somasundaran et al., 2007)
Important words for Subjective, Objective classes by Information Gain