Predicting Long-Term Impact of CQA Posts: A Comprehensive Viewpoint
Yuan Yao
Joint work with Hanghang Tong, Feng Xu, and Jian Lu
KDD 2014, Aug 24-27
Roadmap
Background and Motivations
Modeling Multi-aspect
Computation Speedup
Empirical Evaluations
Conclusions
CQA
Long-Term Impact
Q: How many users will find it beneficial?
What is the Long-Term Impact of a Q/A post?
Challenges
Q: Why not off-the-shelf data mining algorithms?
Challenge 1: Multi-aspect
C1.1. Coupling between questions and answers
C1.2. Feature non-linearity
C1.3. Posts dynamically arrive
Challenge 2: Efficiency
C1.1 Coupling
Strong positive correlation! [Yao+@ASONAM’14]
[Diagram: (1) question features Fq → predicted question impact Yq; (2) answer features Fa → predicted answer impact Ya; (3) voting consistency between the two predictions]
[Yao+@ASONAM’14] Y Yao, H Tong, T Xie, L Akoglu, F Xu, J Lu. Joint Voting Prediction for Questions and Answers in CQA. ASONAM 2014.
C1.1 Coupling
LIP-M: [Yao+@ASONAM’14]
[Equation terms: question prediction (1), answer prediction (2), voting consistency (3), regularization]
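The LIP-M equation itself is not in this transcript. As a rough illustration of how two prediction terms can be coupled through a consistency term, the sketch below fits two ridge regressions jointly. The exact objective, the answer-to-question matrix `M`, and the parameters `theta` and `lam` are assumptions made for this example, not the authors' formulation.

```python
import numpy as np

def fit_coupled_ridge(Fq, yq, Fa, ya, M, theta=0.1, lam=1.0):
    """Jointly fit question and answer weights with a consistency penalty.

    Assumed objective (illustrative only):
      ||Fq wq - yq||^2 + ||Fa wa - ya||^2
      + theta * ||M Fq wq - Fa wa||^2 + lam * (||wq||^2 + ||wa||^2)
    M[i, j] = 1 if answer i belongs to question j (hypothetical encoding).
    """
    nq, dq = Fq.shape
    na, da = Fa.shape
    # Stack both problems into one least-squares system over w = [wq; wa].
    A = np.block([[Fq, np.zeros((nq, da))],
                  [np.zeros((na, dq)), Fa]])
    b = np.concatenate([yq, ya])
    # Consistency term: predicted question impact vs. predicted answer impact.
    C = np.hstack([M @ Fq, -Fa])
    H = A.T @ A + theta * (C.T @ C) + lam * np.eye(dq + da)
    w = np.linalg.solve(H, A.T @ b)
    return w[:dq], w[dq:]

rng = np.random.default_rng(0)
Fq = rng.normal(size=(4, 3)); yq = rng.normal(size=4)
Fa = rng.normal(size=(6, 2)); ya = rng.normal(size=6)
M = np.zeros((6, 4))
M[np.arange(6), [0, 0, 1, 2, 3, 3]] = 1.0   # answer -> question map
wq, wa = fit_coupled_ridge(Fq, yq, Fa, ya, M, theta=0.5)
```

With `theta=0` the two problems decouple into independent ridge regressions; larger `theta` pulls the two sets of predictions toward each other.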
C1.2 Non-linearity
The kernel trick (e.g., SVM)
Mercer kernel
Kernel matrix as new feature matrix
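To make "kernel matrix as new feature matrix" concrete, here is a minimal sketch using a Gaussian (RBF) kernel. The choice of kernel and the `gamma` value are assumptions for illustration; the slides do not say which kernel was used.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=0.5):
    """Gaussian kernel k(x, z) = exp(-gamma * ||x - z||^2) for all pairs."""
    sq = (X ** 2).sum(axis=1)[:, None] + (Z ** 2).sum(axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))   # 6 examples, 3 raw features
K = rbf_kernel(X, X)          # 6 x 6: row i is the new feature vector of example i
```

A Mercer kernel yields a symmetric positive semi-definite `K`, so a linear model trained on the rows of `K` behaves like a non-linear model in the original feature space.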
C1.3 Dynamics
Solution: recursive least squares regression [Haykin2005]
[Diagram: existing examples (x_1, y_1), (x_2, y_2), …, (x_t, y_t) give the current model Model_t; the current model plus new examples (x_{t+1}, y_{t+1}) gives the new model Model_{t+1}]
[Haykin2005] S Haykin. Adaptive filter theory. 2005.
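A minimal sketch of textbook recursive least squares with ridge regularization, showing how Model_t is updated to Model_{t+1} from one new example without refitting. The class name and the `lam` parameter are hypothetical; this is the generic linear form, not the coupled model from the talk.

```python
import numpy as np

class RecursiveRidge:
    """Recursive least squares; matches batch ridge regression after each update."""

    def __init__(self, dim, lam=1.0):
        self.w = np.zeros(dim)
        self.P = np.eye(dim) / lam   # running inverse of (X^T X + lam * I)

    def update(self, x, y):
        Px = self.P @ x
        k = Px / (1.0 + x @ Px)              # gain vector (Sherman-Morrison)
        self.w = self.w + k * (y - x @ self.w)
        self.P = self.P - np.outer(k, Px)    # rank-one update of the inverse

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=50)

model = RecursiveRidge(dim=4, lam=1.0)
for x_t, y_t in zip(X, y):
    model.update(x_t, y_t)
```

Each update costs O(d^2) for d features, instead of refitting on all examples seen so far.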
This Paper
Q1: how to comprehensively capture the multiple aspects (coupling, non-linearity, and dynamics) in one algorithm?
Q2: how to make the long-term impact prediction algorithm efficient?
Modeling Non-linearity
Basic idea: kernelize LIP-M
Details - LIP-KM:
[Equation terms: question prediction (1), answer prediction (2), voting consistency (3), regularization]
Closed-form solution:
Complexity: O(n^3)
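The closed-form LIP-KM solution is not in this transcript. For intuition about where the O(n^3) cost comes from, the sketch below shows the closed form of plain kernel ridge regression (LIP-K in the talk's naming), which solves an n x n linear system. The RBF kernel, `gamma`, and `lam` are assumptions for the example.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    sq = (X ** 2).sum(axis=1)[:, None] + (Z ** 2).sum(axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_kernel_ridge(X, y, lam=1e-3, gamma=1.0):
    """Closed form: alpha = (K + lam * I)^{-1} y, an O(n^3) dense solve."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict_kernel_ridge(X_train, alpha, X_new, gamma=1.0):
    return rbf_kernel(X_new, X_train, gamma) @ alpha

X = np.linspace(-3.0, 3.0, 40)[:, None]
y = np.sin(X[:, 0])                      # a target that is non-linear in x
alpha = fit_kernel_ridge(X, y)
y_hat = predict_kernel_ridge(X, alpha, X)
```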
Modeling Dynamics
Basic idea: recursively update LIP-KM
Details - LIP-KIM:
Complexity: O(n^3) -> O(n^2) (matrix inverse lemma)
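One standard way to get from O(n^3) to O(n^2) per new example is to grow the inverse of the regularized kernel matrix with a block-inverse (Schur complement) identity, which is closely related to the matrix inverse lemma. The sketch below shows this for a single kernel matrix; the function name and test data are hypothetical, and the LIP-KIM update itself is not in the transcript.

```python
import numpy as np

def grow_inverse(A_inv, k_new, c_new):
    """Given inv(A), return inv([[A, k_new], [k_new^T, c_new]]) in O(n^2)."""
    n = A_inv.shape[0]
    u = A_inv @ k_new            # O(n^2) matrix-vector product
    s = c_new - k_new @ u        # scalar Schur complement
    out = np.empty((n + 1, n + 1))
    out[:n, :n] = A_inv + np.outer(u, u) / s
    out[:n, n] = -u / s
    out[n, :n] = -u / s
    out[n, n] = 1.0 / s
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
A_full = np.exp(-0.5 * sq) + 0.1 * np.eye(8)   # K + lam * I over 8 examples

A_inv_old = np.linalg.inv(A_full[:7, :7])       # model built on the first 7
A_inv_new = grow_inverse(A_inv_old, A_full[:7, 7], A_full[7, 7])
```

The new inverse is obtained from the old one with matrix-vector products and one outer product, so no O(n^3) factorization is repeated.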
Approximation Method (1)
Basic idea: compress the kernel matrix
Details - LIP-KIMA:
1) Separate decomposition
2) Make decomposition reusable
3) Apply decomposition on LIP-KIM
Techniques: Nyström approximation; eigen-decomposition; SVD on X1 with eigen-decomposition
Complexity: O(n^2) -> O(n)
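As a hedged illustration of the Nyström approximation named on this slide, the sketch below compresses an n x n kernel matrix into an n x m factor built from m landmark examples. The RBF kernel, the uniform landmark sampling, and the `rcond` cutoff are assumptions for the example; how LIP-KIMA reuses the decomposition is not shown here.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    sq = (X ** 2).sum(axis=1)[:, None] + (Z ** 2).sum(axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_factors(X, landmark_idx, gamma=1.0):
    """Return (C, W_pinv) such that K is approximated by C @ W_pinv @ C.T."""
    L = X[landmark_idx]
    C = rbf_kernel(X, L, gamma)                             # n x m
    W_pinv = np.linalg.pinv(rbf_kernel(L, L, gamma), rcond=1e-10)  # m x m
    return C, W_pinv

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(60, 2))
idx = rng.choice(60, size=10, replace=False)   # hypothetical landmark choice
C, W_pinv = nystrom_factors(X, idx)

K_exact = rbf_kernel(X, X)          # formed here only to check the approximation
K_approx = C @ W_pinv @ C.T
rel_err = np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact)
```

Only `C` and `W_pinv` need to be stored, which is O(nm) memory rather than O(n^2) when the number of landmarks m is fixed.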
Approximation Method (2)
Basic idea: filter less informative examples
Details - LIP-KIMAA:
Complexity: O(n) -> <O(n)
[Diagram: existing examples (x_1, y_1), (x_2, y_2), …, (x_t, y_t) give the current model Model_t; new examples (x_{t+1}, y_{t+1}) pass through a filtering step before being combined with the current model into the new model Model_{t+1}]
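The transcript does not state the filtering criterion. One plausible rule, assumed here purely for illustration, is to skip a new example when the current model already predicts it within a tolerance `tau`, so that only surprising examples trigger an update. The sketch applies that rule to a linear recursive ridge model; the class name and `tau` are hypothetical.

```python
import numpy as np

class FilteredRecursiveRidge:
    """Recursive ridge that ignores examples the current model already fits."""

    def __init__(self, dim, lam=1e-3, tau=0.05):
        self.w = np.zeros(dim)
        self.P = np.eye(dim) / lam
        self.tau = tau               # assumed filtering threshold
        self.used = 0                # number of examples that caused an update

    def update(self, x, y):
        err = y - x @ self.w
        if abs(err) < self.tau:      # low prediction error: treat as uninformative
            return False
        Px = self.P @ x
        k = Px / (1.0 + x @ Px)
        self.w = self.w + k * err
        self.P = self.P - np.outer(k, Px)
        self.used += 1
        return True

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(200, 3))
y = X @ w_true                       # noiseless targets for a clear demonstration

model = FilteredRecursiveRidge(dim=3)
for x_t, y_t in zip(X, y):
    model.update(x_t, y_t)
```

Because skipped examples cost only one prediction, the total update work grows more slowly than the number of arriving examples once the model has stabilized.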
Summary
Method family built on Ridge Regression along three aspects (coupling, non-linearity, dynamics):
K: Non-linearity, I: Dynamics, M: Coupling, A: Approximation
One aspect: LIP-K (Kernel Ridge Regression), LIP-I (Recursive Ridge Regression), LIP-M (CoPs)
Two aspects: LIP-KI (Recursive Kernel Ridge Regression), LIP-KM O(n^3), LIP-IM
Three aspects: LIP-KIM O(n^2)
With approximation: LIP-KIMA O(n), LIP-KIMAA <O(n)
Experiment Setup
Datasets (http://blog.stackoverflow.com/category/cc-wiki-dump/): Stack Overflow, Mathematics Stack Exchange
Features: content (bag-of-words) and contextual features
[Timeline, ordered by time: training set (initial set, then incremental set), followed by the test set]
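The timeline above can be sketched as a time-ordered split. The field name `created` and the split fractions are hypothetical; the slides do not give the actual proportions.

```python
def temporal_split(posts, init_frac=0.5, incr_frac=0.3):
    """Split posts by creation time into initial, incremental, and test sets."""
    ordered = sorted(posts, key=lambda p: p["created"])
    n = len(ordered)
    n_init = round(n * init_frac)
    n_incr = round(n * incr_frac)
    initial = ordered[:n_init]                      # trains the first model
    incremental = ordered[n_init:n_init + n_incr]   # arrives later, updates the model
    test = ordered[n_init + n_incr:]                # latest posts, held out
    return initial, incremental, test

posts = [{"id": i, "created": t} for i, t in enumerate([7, 2, 9, 0, 5, 3, 8, 1, 6, 4])]
initial, incremental, test = temporal_split(posts)
```

Splitting by time rather than at random keeps the evaluation from training on posts that arrive after the test posts.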
Evaluation Objectives
O1: Effectiveness. How accurate are the proposed algorithms for long-term impact prediction?
O2: Efficiency. How scalable are the proposed algorithms?
Effectiveness Results
Comparisons with existing models.
[Figure: our methods are marked; an arrow indicates the better direction]
Efficiency Results
The speed comparisons.
[Figure: our methods are marked, with LIP-KIMAA labeled sub-linear; an arrow indicates the better direction]
Quality-Speed Trade-off
[Figure: our methods are marked; an arrow indicates the better direction]
Conclusions
A family of algorithms for long-term impact prediction of CQA posts
Q1: how to capture coupling, non-linearity, and dynamics?
A1: voting consistency + kernel trick + recursive updating
Q2: how to make the algorithms scalable?
A2: approximation methods
Empirical Evaluations
Effectiveness: up to 35.8% improvement
Efficiency: up to 390x speedup and sub-linear scalability