Automatically Facilitating Discussion in Online Health Forums
Kishaloy Halder, Min-Yen Kan, Lahari Poddar and Kazunari Sugiyama
25 Apr 2019 NExT++ Workshop 1
Online Discussion Forums
Learning from the community’s collective wisdom
Users ask questions, share anecdotal observations
Others reply with relevant information or personal opinions
Thread
Users
Challenges
Large number of users, threads, discussion topics
Semi-structured posts
Long trails of posts inside threads
Continuous influx of threads
How do we help users navigate through the online discussion forums efficiently?
Solutions Challenges
Matching Users Interest [CIKM 2017]
Handling Newly Created Threads
Identifying Helpful Posts
<< NOW
Kishaloy Halder, Min-Yen Kan, Kazunari Sugiyama, "Health Forum Thread Recommendation using Interest Aware Topic Model", 2017, CIKM.
[Figure: Users and Items/Threads, with "?" marking an unobserved user–thread interaction]
A recommendation system tries to find a match between users and items, given past interactions
Potential Benefits
Better visibility of threads
Improved quality of responses
Reduced latency for responses
Sample Posts
Post Text: “... is there anyone experiencing lower back pain while standing up after sitting for a while?...”
“... My lower back and hips pains a lot. It even hurts to walk somedays ...”
Lexically similar posts
.. yet associated with different conditions
And with substantially different treatments
Incomplete reporting of conditions
Inter Condition Similarity
Condition Dependent Treatments for Symptoms
Symptom | Treatments (ALS) | Treatments (Parkinson’s Disease)
Fatigue | Wheelchair (powered), Non-invasive Ventilator, Modafinil | Modafinil, Amantadine, Methylphenidate
Stiffness/Spasticity | Massage Therapy, Baclofen | Ropinirole, Rasagiline
Associated Condition
ALS (Amyotrophic Lateral Sclerosis)
Parkinson’s Disease
Sample Posts
Post Text: “... is there anyone experiencing lower back pain while standing up after sitting for a while?...”
“... My lower back and hips pains a lot. It even hurts to walk somedays ...”
Post Text “Have been suffering from back pain for couple of years now and it’s getting worse now I can barely put weight on the right leg...”
“I get a lot of cramps in the leg. The only remedy is to stand up and put the body weight on it...”
Lexically similar posts, as they are from the same condition
• Common affected body parts
… yet associated with different symptoms
Intra Condition Similarity
Associated Symptom: Back pain; Cramps in leg
Sample Posts in ALS forum
A Two-stage Probabilistic Framework
Stage 1, Inter Condition Similarity: Interest Aware Topic Model
Stage 2, Intra Condition Similarity: Jointly Normalized Collaborative Topic Regression
Interest Aware Topic Model (IATM)
A generic topic model that involves:
• Self reported interests (conditions; e.g., ALS, Diabetes)
• Latent topics within the interest (symptoms; e.g., tremor)
Similar signals observed in other social platforms
• Quora (Politics, Sports, Movies)
• StackOverflow (Java, OS)
Reported interests can be incomplete, but some interests can be inferred from users’ interaction with other threads
∴ IATM is a generative model:
Inputs: User profile, thread document, user-reported conditions
Outputs:
– User-interest-topic distribution
– Thread-interest-topic distribution
[Figure: user u1 interacts with threads t1 to t4, whose conditions are (Diabetes, Pancreatitis), (Diabetes), (Diabetes, Arthritis), and (Diabetes, Hypothyroidism); the unreported condition of u1 is marked "Diabetes?"]
Generative Story of IATM
D: documents
w: word
N_d: words in doc d
i_d: set of interests for doc d
i: interest for word w
z: topic for word w
θ_u (U.I.Z): User–Interest-Topic Distribution
θ_v (V.I.Z): Thread–Interest-Topic Distribution
User 123:
Interest: weight topic1 topic2
Diabetes: 0.8 0.3 0.7
High BP: 0.2 0.4 0.6
Thread 456:
Interest: weight topic1 topic2
Diabetes: 0.3 0.2 0.8
Insomnia: 0.7 0.9 0.1
Addresses Inter Condition similarity
Identifies distribution of conditions for every word, per thread
φ (I.Z): Word-Interest-Topic Distribution
Jointly Normalized Collaborative Topic Regression (JNCTR)
• Enhanced version of Collaborative Topic Regression (CTR) [Wang et al., 2011]
• CTR only considers the latent topic distribution for the item (thread)
• Unable to capture users’ distribution
JNCTR captures both user and thread latent distributions
• Red component introduced to CTR
[Figure: JNCTR graphical model]
r: Rating
u: User vector (plate over U Users)
v: Thread vector (plate over V Threads)
Өu: User–topic distribution
Өv: Thread-topic distribution
λu, λv: Regularization
Addresses Intra Condition similarity
Adjusts the IATM discovered thread-, user-topic distributions within an interest (condition)
Dataset
Health Boards (HBD)
• Excluding generic categories: Family, Support, Healthcare, General.
Retained top (tf-idf based) 8,000 words
Dataset | # Users | # Threads | # Posts | Avg. Posts / Thread | # Distinct Conditions | Avg. Conditions / User
HBD | 127,903 | 155,863 | 716,744 | 4.6 | 235 | 4.01
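The vocabulary step ("top (tf-idf based) 8,000 words") can be illustrated with a small sketch. The slides do not say how per-document tf-idf scores were aggregated into one score per word, so the choice below (a word's best tf-idf in any document) and the helper name `top_tfidf_vocabulary` are assumptions made only for illustration.

```python
import math
from collections import Counter

def top_tfidf_vocabulary(docs, vocab_size):
    """Keep the vocab_size words with the highest tf-idf score.

    Assumption: a word is scored by its maximum tf-idf over all documents;
    the talk does not state the aggregation that was actually used.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each word.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    best = {}
    for tokens in tokenized:
        counts = Counter(tokens)
        for word, count in counts.items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[word])
            best[word] = max(best.get(word, 0.0), tf * idf)
    # Sort by score (descending); ties are broken alphabetically.
    ranked = sorted(best, key=lambda w: (-best[w], w))
    return ranked[:vocab_size]

docs = [
    "my insulin dose and my glucose levels",
    "my tremors and my shaking hands",
    "my back pain and my leg cramps",
]
vocab = top_tfidf_vocabulary(docs, vocab_size=4)
```

Words that occur in every document (here "my" and "and") get an idf of zero and fall out of the vocabulary, which is the usual motivation for a tf-idf cut-off.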
Baseline Comparison
Signals (table columns): User–Thread Interaction, User Docs, Thread Docs, User reported Conditions
Baselines
Non-negative Matrix Factorization (NMF) [Lee ’01]: √
Author–Topic Model (AT) [Rosen-Zvi ’04]: √ √
CTR [Wang ’11]: √ √
Context Aware Recommender (CAR) [Rendle ’12]: √ √
Our Methods
IATM: √ √ √
AT + JNCTR: √ √ √
IATM + JNCTR: √ √ √ √
Experimental Results – In-Matrix Prediction
In Matrix ≡ Every user or thread in test occurs at least once in training.
[Figure: Users × Threads interaction matrix with entries √, X, and ? (entries to be predicted)]
IATM + JNCTR consistently outperforms others
User–Thread interaction is the strongest signal
• Both the Author–Topic model and IATM suffer in its absence
Experimental Results – Out of Matrix Prediction
Newly created Threads: Unseen Threads in training
[Figure: Users × Threads matrix in which the columns of the new threads contain only "?"]
Not possible to predict without additional signals about the thread
• NMF, CAR do not work
Method MRR
Author–Topic 0.025
CTR 0.131
IATM 0.094
AT + JNCTR 0.164
IATM + JNCTR 0.221
Newly joined Users: Unseen Users in training
[Figure: Users × Threads matrix in which the rows of the new users contain only "?"]
Not possible to predict without additional signals about the user
• NMF, CAR, CTR do not work
Method MRR
Author–Topic 0.062
IATM 0.109
AT + JNCTR 0.110
IATM + JNCTR 0.146
NMF based models suffer from cold-start
Our model alleviates the problem using additional context
Predicting Missing Conditions
Many people do not report their conditions on the websites
• Around 13% of patients report no condition at all
However, they participate regularly in the forums
[Figure: distribution of # reported conditions]
Can we predict the missing conditions from the User-interest distribution learnt by IATM?
Yes!
Simulation Experiment
• Omit 1-3 conditions for each user
• How much can we recover?
# held-out Conditions Perfect Recall
1 0.64
2 0.45
3 0.39
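The simulation can be sketched as follows. This is only an illustration: the slides do not define "perfect recall", so the code assumes it means that every held-out condition appears among the top-k unreported conditions (with k equal to the number held out). The dictionaries standing in for the IATM user-interest distribution are made-up toy values.

```python
def predict_missing_conditions(interest_dist, reported, k):
    """Return the k most probable conditions that the user has not reported."""
    candidates = {c: p for c, p in interest_dist.items() if c not in reported}
    ranked = sorted(candidates, key=lambda c: (-candidates[c], c))
    return ranked[:k]

def perfect_recall(users):
    """Fraction of users whose held-out conditions are all recovered.

    Assumption: 'perfect recall' means the top-k predictions equal the
    held-out set, with k the number of held-out conditions.
    """
    hits = 0
    for user in users:
        held_out = set(user["held_out"])
        predicted = predict_missing_conditions(
            user["interest_dist"], user["reported"], k=len(held_out)
        )
        hits += set(predicted) == held_out
    return hits / len(users)

# Toy user-interest distributions (hypothetical values, not from the talk).
users = [
    {"interest_dist": {"Diabetes": 0.5, "Arthritis": 0.3, "Insomnia": 0.2},
     "reported": {"Diabetes"}, "held_out": ["Arthritis"]},
    {"interest_dist": {"ALS": 0.6, "Insomnia": 0.3, "Diabetes": 0.1},
     "reported": {"ALS"}, "held_out": ["Diabetes"]},
]
score = perfect_recall(users)
```

In this toy run the first user's held-out condition is recovered and the second user's is not, so `score` is 0.5.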
Making sense of topic models: Topics from 5 Diseases
ALS | Parkinson’s Disease | Diabetes | Cancer | Heart Disorders
als | neurologist | carb | cancer | rate
reflexes | pd | sugar | chemo | beating
muscle | tremors | insulin | radiation | bp
amyotrophic | scan | glucose | cells | fast
nervous | shaking | levels | prayers | palpitations
irregular | facial | eat | god | atrial
brain | control | diet | luck | verapamil
For terminal diseases, people often participate for emotional support
“Would love to talk to anyone with ovarian cancer .. really believe faith can play a huge role in recovering and also positive attitude .. I wish this disease didn’t exist”
Solutions
Matching Users Interest
Handling Newly Created Threads [XMLC 2017]
Identifying Helpful Posts
<< NOW
Kishaloy Halder, Lahari Poddar, Min-Yen Kan, "Cold Start Thread Recommendation as Extreme Multi-label Classification", 2018, XMLC for Social Media, WWW.
Cold Start Thread Recommendation
New content is created continuously in Web 2.0
• Threads in discussion forums
• Questions in community question answering platforms
• Social Media posts
Task: Recommend new threads to potentially interested users in order to get them answered
In recommendation, this is known as cold-start
Cold Start Item Recommendation
Typically user and item are represented as vectors in latent factor models
• ith User → u_i
• jth Item → v_j
Predicted recommendation is obtained by
r_ij = u_i · v_j^T
For a new item j = 4:
• v_4 is randomly initialized
• Its rating cannot be predicted for any user
[Figure: Interaction Graph and Interaction Matrix]
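A minimal sketch of why a latent factor model fails on a new item: the score is a dot product, and for an item with no interactions the item vector is just a random draw. The factor values and the helper name `predict_rating` are hypothetical.

```python
import random

def predict_rating(user_vec, item_vec):
    """r_ij = u_i . v_j^T for one user and one item."""
    return sum(u * v for u, v in zip(user_vec, item_vec))

# Hypothetical learned factors for one user and three existing items.
user = [0.9, 0.1]
items = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.5, 0.5]}
scores = {j: predict_rating(user, v) for j, v in items.items()}

# A new item (j = 4) has no interactions, so its vector is only a random draw.
rng_a, rng_b = random.Random(0), random.Random(1)
new_item_a = [rng_a.gauss(0.0, 1.0) for _ in user]
new_item_b = [rng_b.gauss(0.0, 1.0) for _ in user]

# The two "predictions" differ only because of the seed, not because of data.
score_a = predict_rating(user, new_item_a)
score_b = predict_rating(user, new_item_b)
```

For the existing items the scores reflect what was learned; for item 4 the score changes with the random seed, which is the cold-start problem in one line.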
Revisiting Cold Start as eXtreme Multi-Label Classification (XMLC)
In the absence of interaction history for a newly created thread, traditional recommendation systems break
Need to use the textual content of a thread in order to find potentially interested users
Can be viewed as an Extreme Multi-Label Text Classification problem
• Existing users → Class labels
• Out-of-matrix thread recommendation → multi-label classification
[Figure: a new thread is the input to XMLC; the output is a binary vector over users (1, 0, 1, …, 0)]
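The reformulation above can be made concrete with a small sketch of the training target: each existing user is one class label, and a thread's target is a multi-hot vector over users. The function name and user ids are hypothetical.

```python
def thread_to_label_vector(participants, all_users):
    """Multi-hot target: 1 if the user took part in the thread, else 0."""
    index = {user: i for i, user in enumerate(all_users)}
    labels = [0] * len(all_users)
    for user in participants:
        labels[index[user]] = 1
    return labels

all_users = ["u1", "u2", "u3", "u4"]   # existing users act as class labels
labels = thread_to_label_vector({"u1", "u3"}, all_users)
```

A text classifier trained on such targets can score users for a thread it has never seen, because it only needs the thread's text at prediction time.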
Extreme Multi-label Text Classification
Large dimensionality in the number of labels
• Thousands or more
• Typically used for tag prediction – Wiki pages, ecommerce products
Multi-Label Classification Models
• Embedding based Method: SLEEC (NIPS ’15)
• Tree based Method: FastXML (KDD ’14)
• Deep Learning Method: XML-CNN (SIGIR ’17) (State-of-the-art for XMLC)
Our Approach
We propose a neural network that predicts, from the set of forum community users, the subset interested in a new thread
Textual content is encoded to a lower dimensional space
• Word embedding: maps words to vectors
• Bi-directional GRUs: encode the sequence of words
A universal encoding of a post text might not be enough… different users have different interests/expertise
“I have been recommended to undergo tracheotomy and put in a PEG. I am wondering how many days I’ll have to stay in the hospital? Will I have a hard time adjusting afterwards? Does the hose need to be connected while transferring? Will the equipments take up a lot of room? How do you call for help?..”
The post contains diverse questions: different parts can be answered by different users
Method: Cluster Sensitive Attention
Attention mechanism: effective in capturing important parts of the text
• Gives weights to words of post
• Post encoding: weighted sum of word encodings
Separate attention for every user: … but not scalable due to huge number of parameters
Hypothesis: Clusters of users exist who are interested in similar items
Cluster sensitive attention on textual content
• N users, K clusters where K << N
• K attention layers
Each attention layer captures cluster-specific preferences
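The idea of K attention layers over one shared set of word encodings can be sketched in plain Python. This is an illustration under assumptions, not the model from the talk: attention scores are taken as a dot product with one query vector per cluster, and the word encodings and query vectors are toy values (in the talk the encodings come from the Bi-directional GRU).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(word_encodings, query):
    """One attention layer: weights over words, then a weighted sum."""
    weights = softmax([dot(h, query) for h in word_encodings])
    dim = len(word_encodings[0])
    encoding = [sum(w * h[d] for w, h in zip(weights, word_encodings))
                for d in range(dim)]
    return weights, encoding

def cluster_sensitive_encodings(word_encodings, cluster_queries):
    """K attention layers, one per user cluster, over the same word encodings."""
    return [attend(word_encodings, q) for q in cluster_queries]

# Toy word encodings for a three-word post.
words = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Hypothetical learned query vectors for K = 2 clusters.
queries = [[2.0, 0.0], [0.0, 2.0]]
per_cluster = cluster_sensitive_encodings(words, queries)
```

Each cluster ends up with its own weighting of the same post: the first cluster attends mostly to the first word, the second cluster mostly to the second. The parameter count grows with K rather than with the number of users.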
Overall Architecture
Experiments - Datasets
4 forum datasets across multiple domains
• Online Health Forum: Epilepsy, ALS, Fibromyalgia
• StackOverflow: removed the code snippets
Metrics: Recall@M, nDCG@M, MRR
Dataset | # users | # threads | Avg. # words in thread | Avg. # users per thread | Sparsity
Epilepsy | 1506 | 2056 | 147 | 7.39 | 99.49%
ALS | 3182 | 8083 | 148 | 9.85 | 99.69%
Fibromyalgia | 5669 | 10,270 | 203 | 9.02 | 99.84%
StackOverflow | 69,631 | 34,172 | 93 | 6.81 | 99.99%
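For readers unfamiliar with the three metrics, here is a minimal sketch for a single thread, assuming binary relevance (a user either replied or did not) and the common log2 discount for nDCG. MRR is the mean of the reciprocal rank over all test threads. The user ids are hypothetical.

```python
import math

def recall_at_m(ranked, relevant, m):
    """Share of the relevant users found in the top m of the ranking."""
    return len(set(ranked[:m]) & relevant) / len(relevant)

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant user, or 0 if none is ranked."""
    for rank, user in enumerate(ranked, start=1):
        if user in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_m(ranked, relevant, m):
    """Binary-relevance nDCG with a log2 discount."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, user in enumerate(ranked[:m], start=1)
              if user in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(m, len(relevant)) + 1))
    return dcg / ideal

ranked = ["u7", "u2", "u9", "u4"]   # users ordered by predicted score
relevant = {"u2", "u4"}             # users who actually replied
recall = recall_at_m(ranked, relevant, m=2)
rr = reciprocal_rank(ranked, relevant)
ndcg = ndcg_at_m(ranked, relevant, m=4)
```

Here one of the two relevant users is in the top 2 and the first relevant user sits at rank 2, so both `recall` and `rr` are 0.5.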
Experiments – Results (MRR)
Dataset CVAE CTR CNN-Kim XML-CNN BiGRU-2 Our Model
Epilepsy 0.159 0.443 0.536 0.551 0.631 0.671
ALS 0.201 0.275 0.270 0.293 0.297 0.306
Fibromyalgia 0.304 0.435 0.669 0.668 0.740 0.773
StackOverflow 0.003 0.032 0.025 0.029 0.047 0.050
Our model outperforms the baselines in all cases
StackOverflow is most challenging
• Highest sparsity
• Shorter threads
• Larger number of target users
Experiments – Results (Recall@M)
Dataset | Rec@M | CVAE | CTR | CNN-Kim | XML-CNN | BiGRU-2 | Our Model
Epilepsy | 30 | 21.14 | 43.83 | 44.63 | 49.08 | 50.99 | 51.21
Epilepsy | 50 | 29.62 | 50.86 | 52.45 | 53.69 | 59.47 | 59.80
Epilepsy | 100 | 42.44 | 59.93 | 65.77 | 63.67 | 68.23 | 69.37
ALS | 30 | 17.04 | 25.00 | 22.15 | 23.56 | 30.18 | 31.84
ALS | 50 | 24.07 | 32.46 | 31.27 | 30.61 | 36.32 | 36.55
ALS | 100 | 35.77 | 44.14 | 43.82 | 43.14 | 48.12 | 49.78
Fibromyalgia | 30 | 32.83 | 54.39 | 58.04 | 61.83 | 62.39 | 63.06
Fibromyalgia | 50 | 42.43 | 63.91 | 67.83 | 68.92 | 69.17 | 72.04
Fibromyalgia | 100 | 55.02 | 72.31 | 76.37 | 75.74 | 77.98 | 78.19
StackOverflow | 30 | 0.16 | 2.73 | 1.84 | 2.42 | 2.94 | 2.80
StackOverflow | 50 | 0.31 | 4.02 | 2.74 | 3.43 | 4.03 | 4.11
StackOverflow | 100 | 0.69 | 6.36 | 4.43 | 5.35 | 6.09 | 6.33
Our model outperforms baselines in most cases
Scores at smaller M are not important: by common practice, new content is targeted at a much larger audience
Cluster sensitive attention is effective
Solutions
Matching Users Interest
Handling Newly Created Threads
Identifying Helpful Posts [NAACL ’19]
<< NOW
Kishaloy Halder, Min-Yen Kan, Kazunari Sugiyama, "Predicting Helpful Posts in Open-Ended Discussion Forums: A Neural Architecture", 2019, NAACL
Discussion Forum ≠ Community Question Answering
CQA mostly receives factoid-based questions
• Single correct answer
In contrast, in discussion forums, the thread opening post is not always a question
• Personal Anecdotes, Asking for recommendations
• Multiple correct answers
Threads are more subjective (open-ended) in discussion forums
Predicting Helpful Posts
Task: Given the post text, identify whether a post is ‘helpful’ to users
• interested in the textual content of the posts only (not social media-style features, e.g., user profile, followers, etc.)
Helpfulness: decided by user feedback
• “upvote”, “like”, “mark as helpful”, “highlight”
Motivation:
• Early detection of helpful posts can aid the recommendation process
• Can also help in summarizing long running threads
Discussion Thread
Order | Post Text | Helpful?
1 | How to do X? |
2 | Do you really need X? | No
3 | Sorry, new here. | No
4 | Sure, follow these steps… | Yes!
5 | I can tell you about Y. | No
Notify users interested in X
Our Approach
Hypothesis: A post would be helpful if it is
• relevant to the original post and
• introduces some novel information compared to past posts in the same thread
Order | Post Text | Relevant? | Novel? | Helpful?
Original Post | I was working yesterday .. and my back was bent over and when I got up I felt like I strained my back but now my mind is linking it to my kidney.. | | |
1 | I have this and my doc has told me it’s muscular and physio might help.. | Yes | Yes | Yes
2 | Kidney pain is usually constant and doesn’t change when you move, or get better when you change position, from how I understand it .. you’ll be fine :) | Yes | Yes | Yes
3 | If it happens only when you move there is a big chance it’s a muscle spasm, this happens after some physical activities. | Yes | No | No
Sample thread from Reddit
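The relevance/novelty hypothesis can be illustrated with a deliberately simple stand-in. The talk uses learned neural encoders; the sketch below swaps in bag-of-words cosine similarity, which is an assumption made only so the example runs without any model. The function names and the example posts are hypothetical.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words representation (stand-in for a learned text encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[w] * b[w] for w in a if w in b)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def relevance_and_novelty(target, original, past_posts, k):
    """Relevance to the original post; novelty against the last k past posts."""
    t = bow(target)
    relevance = cosine(t, bow(original))
    context = past_posts[-k:] if k > 0 else []
    max_overlap = max((cosine(t, bow(p)) for p in context), default=0.0)
    return relevance, 1.0 - max_overlap

original = "my back hurts when i move"
past = ["it is probably a muscle strain"]
repeat_scores = relevance_and_novelty(
    "it is probably a muscle strain", original, past, k=1)
fresh_scores = relevance_and_novelty(
    "kidney pain does not change when you move", original, past, k=1)
```

A post that repeats an earlier reply gets a novelty close to zero, while a post that shares words with the original post but not with earlier replies scores as both relevant and novel. The parameter `k` plays the role of the context length: only the last k past posts are compared against.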
Post Helpfulness Prediction Model: Neural Architecture
[Figure: neural architecture. Inputs: Target Post, Original Post, and the Past k posts in the same thread. Components: Text Encoder, Sequence Encoder, Relevance, Novelty, Concatenate, Fully Connected Layer, Helpfulness output]
Post content is never used directly to avoid popularity bias
Trained with binary cross-entropy loss
End-to-end trainable
Experiments - Datasets
Reddit: a generic discussion forum
• Public dumps available
• Created two datasets to understand modeling capabilities
• Reddit_10+: threads having more than 10 posts
• Reddit_3+: threads with >= 3 posts
Coursera: MOOC discussion forum on online lectures
• Android Apps
• Matrix
Travel Stack Exchange
• Questions are mainly subjective
Data splitting: 80-10-10 for train, dev, and test sets
Dataset | # Posts | # Threads | Avg. # Posts / Thread | Avg. # Words / Post
Reddit_10+ | 200,006 | 9,744 | 20.52 | 29.45
Reddit_3+ | 200,016 | 28,763 | 6.95 | 30.58
Android Apps | 11,643 | 2,077 | 5.60 | 56.53
Matrix | 19,159 | 2,484 | 4.08 | 65.30
Travel | 30,116 | 10,250 | 2.93 | 163.43
Experiments - Baselines
• BiLSTM (Sun et al., ’17): Bidirectional LSTM encoders on post text.
• Stacked LSTM (Liu et al., ’16): a stack of 2 LSTM layer encoders on the post text.
• LSTM with Attention (Rocktäschel et al., ’16): LSTM with hierarchical attention.
• Answer Sentence Selection (Yu et al., ’14): a CNN model pioneered in TREC QA.
Ablation Study
• Only the relevance component
• Only the novelty component
Ground Truth Label for Helpfulness
User feedback in forms of “upvote”, “like”, “mark as helpful”
80th percentile vote count as the threshold
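The labeling rule can be sketched as below. Two details are assumptions, since the slide gives only "80th percentile vote count as the threshold": the percentile is computed with the nearest-rank method, and a post counts as helpful when its votes are strictly above the threshold. The vote counts are made up.

```python
def percentile_threshold(votes, percentile):
    """Nearest-rank percentile of the vote counts (integer arithmetic)."""
    ordered = sorted(votes)
    # Ceiling of percentile * n / 100 without floating point.
    rank = max(1, -(-percentile * len(ordered) // 100))
    return ordered[rank - 1]

def label_helpful(votes, percentile=80):
    """Label a post helpful (1) when its votes exceed the threshold.

    Assumption: strictly greater than; the slide does not say whether
    posts exactly at the threshold count as helpful.
    """
    threshold = percentile_threshold(votes, percentile)
    return [int(v > threshold) for v in votes]

votes = [0, 1, 1, 2, 2, 3, 4, 5, 9, 30]   # hypothetical vote counts
threshold = percentile_threshold(votes, 80)
labels = label_helpful(votes)
```

With these ten posts the threshold is 5 votes, so only the posts with 9 and 30 votes are labeled helpful, which also shows why the positive class is the minority in this task.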
Results
Model | Reddit_10+ (P R F1) | Reddit_3+ (P R F1) | Android Apps (P R F1) | Matrix (P R F1) | Travel (P R F1)
BiLSTM | 0.23 0.23 0.23 | 0.23 0.22 0.22 | 0.36 0.32 0.34 | 0.29 0.35 0.32 | 0.28 0.31 0.29
Stacked LSTM | 0.24 0.21 0.22 | 0.23 0.20 0.21 | 0.34 0.29 0.31 | 0.32 0.29 0.31 | 0.23 0.26 0.25
LSTM with Attention | 0.24 0.21 0.23 | 0.24 0.21 0.22 | 0.34 0.27 0.30 | 0.30 0.36 0.33 | 0.25 0.26 0.25
Answer Sentence Selection | 0.28 0.27 0.27 | 0.31 0.32 0.32 | 0.28 0.21 0.24 | 0.33 0.34 0.33 | 0.30 0.31 0.31
Our Model (Relevance only) | 0.30 0.30 0.30 | 0.32 0.34 0.33 | 0.31 0.35 0.33 | 0.38 0.31 0.34 | 0.35 0.30 0.32
Our Model (Novelty only) | 0.53 0.38 0.44 | 0.42 0.27 0.33 | 0.33 0.24 0.28 | 0.43 0.27 0.33 | 0.47 0.27 0.34
Our Model (full) | 0.48 0.53 0.51 | 0.41 0.39 0.40 | 0.35 0.40 0.38 | 0.37 0.37 0.37 | 0.37 0.31 0.34
• A challenging task from a text-only perspective
• Our model outperforms the state-of-the-art text classification models
• Ablation study shows that considering the original post or past posts helps compared to the vanilla models
Effect of Context Length
Let’s vary the context length of a particular post!
With context length = k, novelty is computed against the past k posts (if they exist)
Keeping the entire context in long threads (e.g., a thread with 40 posts) is infeasible for humans
• Set context length from 1 to 18 for Reddit_3+
Longer context improves accuracy in general
The accuracy improves sharply from context length 1 to 11
From length 11 to 18, the improvement is positive but the rate is lower
A trade-off exists between training time and accuracy
Conclusion
Pioneered techniques to help discussion forum users navigate threads
Thread Visibility
• Matching Users Interest
• Handling Newly Created Threads
Thread Helpfulness
• Identifying Helpful Posts
Applicable to many domains: e-health, MOOCs, and generic discussion forums
Questions? Ask Kishaloy! >> (he’s on the market!)
Backup Slides
Pipeline Overview
IATM Learning
• Gibbs sampling
• For doc d
  • i_d[] ← possible conditions for the doc
  • For word w
    • a ← author of word w
    • P ← 0, with one entry P[i][z] per interest i and topic z
    • P[i][z] ← (0.5 * P(z|a, i) + 0.5 * P(z|d, i)) * P(w|i, z)
    • Sample i, z from P[]
• U.I.Z ← User-Interest-Topic distribution
• V.I.Z ← Thread-Interest-Topic distribution
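The per-word update on this slide can be sketched as follows. This is a simplified illustration: a real Gibbs sampler would derive P(z|a, i), P(z|d, i) and P(w|i, z) from smoothed count tables that change after every draw, whereas the sketch takes them as fixed lookup tables. The function name, the table layout, and all values are assumptions.

```python
import random

def sample_interest_topic(word, author, doc, interests, n_topics,
                          p_z_user, p_z_thread, p_w, rng):
    """Sample (interest, topic) for one word using the slide's update rule.

    p_z_user[(author, i)][z]  stands for P(z | a, i)
    p_z_thread[(doc, i)][z]   stands for P(z | d, i)
    p_w[(i, z)][word]         stands for P(w | i, z)
    """
    pairs, weights = [], []
    for i in interests:
        for z in range(n_topics):
            # Equal mix of the author's and the thread's topic preference.
            mix = 0.5 * p_z_user[(author, i)][z] + 0.5 * p_z_thread[(doc, i)][z]
            pairs.append((i, z))
            weights.append(mix * p_w[(i, z)].get(word, 0.0))
    # random.choices normalizes the weights; they must not all be zero.
    return rng.choices(pairs, weights=weights, k=1)[0]

interests = ["Diabetes", "Insomnia"]   # i_d: candidate conditions of the doc
uniform = [0.5, 0.5]
p_z_user = {("u1", i): uniform for i in interests}
p_z_thread = {("t1", i): uniform for i in interests}
# Toy word distributions: only (Diabetes, topic 0) can emit "insulin".
p_w = {("Diabetes", 0): {"insulin": 0.9}, ("Diabetes", 1): {},
       ("Insomnia", 0): {}, ("Insomnia", 1): {}}

sample = sample_interest_topic("insulin", "u1", "t1", interests, 2,
                               p_z_user, p_z_thread, p_w, random.Random(0))
```

Because only one (interest, topic) pair gives the word a non-zero probability in this toy setup, the draw is forced to ("Diabetes", 0). With realistic tables the draw is stochastic and the counts are updated after each sample.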
JNCTR – Learning
• We develop an EM style algorithm
• Maximize the likelihood function $\mathcal{L}$
• $c_{ij}$ is the confidence parameter for $r_{ij}$
• Where $a > b > 0$
JNCTR – Learning…
• $\partial \mathcal{L} / \partial u_i = 0$, $\partial \mathcal{L} / \partial v_j = 0$ ⇒
• Prediction:
• Recommendation:
  • Recommend items to user with high predicted ratings
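The equations of these two learning slides did not survive in the transcript. As a hedged reference point only, the following is a CTR-style objective [Wang et al., 2011] in which the user vector is also tied to its topic distribution, which is one plausible reading of "JNCTR captures both user and thread latent distributions". It is an assumption, not the authors' stated formula. Here U and V denote the matrices whose columns are the user and thread vectors, and the topic-model word-likelihood term is left out.

```latex
% Assumed objective (CTR with an added user-side topic prior).
\mathcal{L} = -\frac{\lambda_u}{2}\sum_i (u_i - \theta_{u_i})^\top (u_i - \theta_{u_i})
              -\frac{\lambda_v}{2}\sum_j (v_j - \theta_{v_j})^\top (v_j - \theta_{v_j})
              -\sum_{i,j} \frac{c_{ij}}{2}\,(r_{ij} - u_i^\top v_j)^2

% Setting the gradients to zero gives ridge-regression style updates,
% with C_i = diag(c_{i1}, c_{i2}, \dots) and R_i = (r_{i1}, r_{i2}, \dots)^\top:
u_i \leftarrow (V C_i V^\top + \lambda_u I)^{-1} (V C_i R_i + \lambda_u \theta_{u_i})
v_j \leftarrow (U C_j U^\top + \lambda_v I)^{-1} (U C_j R_j + \lambda_v \theta_{v_j})

% Assumed prediction for a user-thread pair, as in latent factor models:
\hat{r}_{ij} = u_i^\top v_j
```

Under this reading, dropping the $\theta_{u_i}$ term recovers the original CTR user update, which matches the statement that CTR is unable to capture the users' distribution.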
Author Topic Model
Modeling Users’ Evolving Interests for Discussion Forum Thread Recommendation
Users’ Evolving Interests
• Users’ interests keep evolving over time [RecSys 2016, WSDM 2017]
• Items evolve as well in certain domains – Movies, Books
• Not so much in discussion forums
80th percentile Movie lifetime = 5.7 years
80th percentile Thread lifetime = 16 days
Trend Aware Recommendation System
• Hypothesis: A user interacts with a thread if it is relevant to her recent (or long term) interests, and introduces something new with respect to her past interacted threads