
Automatically Facilitating Discussion in Online Health Forums

Kishaloy Halder, Min-Yen Kan, Lahari Poddar and Kazunari Sugiyama

25 Apr 2019 NExT++ Workshop

Online Discussion Forums

Learning from the community’s collective wisdom

Users ask questions, share anecdotal observations

Others reply with relevant information or personal opinions




Challenges

Large number of users, threads, discussion topics

Semi-structured posts

Long trails of posts inside threads

Continuous influx of threads

How do we help users navigate through the online discussion forums efficiently?


Solutions Challenges

Matching Users Interest [CIKM 2017]

Handling Newly Created Threads

Identifying Helpful Posts


Kishaloy Halder, Min-Yen Kan, Kazunari Sugiyama, "Health Forum Thread Recommendation using Interest Aware Topic Model", 2017, CIKM.


[Figure: users (1–4) linked to items/threads by past interactions, with one unknown user–thread link marked "?"]

A recommendation system tries to find a match between users and items, given past interactions

Potential Benefits

Better visibility of threads

Improved quality of responses

Reduced latency for responses


Sample Posts

Post Text: “... is there anyone experiencing lower back pain while standing up after sitting for a while? ...”
Associated Condition: ALS (Amyotrophic Lateral Sclerosis)

Post Text: “... My lower back and hips pains a lot. It even hurts to walk some days ...”
Associated Condition: Parkinson’s Disease

Lexically similar posts

… yet associated with different conditions

And with substantially different treatments

Incomplete reporting of conditions

Inter Condition Similarity

Condition Dependent Treatments for Symptoms

Symptom                Treatments (ALS)                                            Treatments (Parkinson’s Disease)
Fatigue                Wheelchair (powered), Non-invasive Ventilator, Modafinil    Modafinil, Amantadine, Methylphenidate
Stiffness/Spasticity   Massage Therapy, Baclofen                                   Ropinirole, Rasagiline


Sample Posts

Sample Posts in ALS forum

Post Text: “Have been suffering from back pain for couple of years now and it’s getting worse now I can barely put weight on the right leg...”
Associated Symptom: Back pain

Post Text: “I get a lot of cramps in the leg. The only remedy is to stand up and put the body weight on it...”
Associated Symptom: Cramps in leg

Lexically similar posts, as they are from the same condition
• Common affected body parts

… yet associated with different symptoms

Intra Condition Similarity



A Two-stage Probabilistic Framework

Inter Condition Similarity → Interest Aware Topic Model

Intra Condition Similarity → Jointly Normalized Collaborative Topic Regression


Interest Aware Topic Model (IATM)

A generic topic model that involves:
• Self reported interests (conditions; e.g., ALS, Diabetes)
• Latent topics within the interest (symptoms; e.g., tremor)

Similar signals observed in other social platforms
• Quora (Politics, Sports, Movies)
• StackOverflow (Java, OS)

Reported interests can be incomplete:
But some interests can be inferred from users’ interaction with other threads

∴ IATM is a generative model:

Inputs: User profile, thread document, user-reported conditions

Outputs:
– User-interest-topic distribution
– Thread-interest-topic distribution

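[Figure: user u1, whose own condition is unreported ("Diabetes?"), interacting with threads t1–t4 that are labeled "Diabetes, Pancreatitis", "Diabetes", "Diabetes, Arthritis", and "Diabetes, Hypothyroidism"]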


Generative Story of IATM

Notation (plate diagram):
• D: documents
• w: word
• N_d: words in doc d
• i_d: set of interests for doc d
• i: interest for word w
• z: topic for word w
• θ_u (U.I.Z): User–Interest–Topic Distribution
• θ_v (V.I.Z): Thread–Interest–Topic Distribution
• φ (I.Z): Word–Interest–Topic Distribution

User 123:

Interest    Weight   topic1   topic2
Diabetes    0.8      0.3      0.7
High BP     0.2      0.4      0.6

Thread 456:

Interest    Weight   topic1   topic2
Diabetes    0.3      0.2      0.8
Insomnia    0.7      0.9      0.1

Addresses Inter Condition similarity

Identifies distribution of conditions for every word, per thread
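The "User 123" example can be read as a two-level distribution: a weight per interest, then a topic distribution inside each interest. The sketch below multiplies the two levels into a joint probability over (interest, topic) pairs. Reading the first number of each row as the interest weight is an assumption based on the rows summing to 1, and the names `user_123` and `joint` are illustrative, not part of IATM.

```python
# Illustrative only: (interest weight, topic probabilities) per interest,
# copied from the "User 123" example on the slide.
user_123 = {
    "Diabetes": (0.8, [0.3, 0.7]),
    "High BP": (0.2, [0.4, 0.6]),
}


def joint(dist):
    """Return P(interest, topic) = P(interest) * P(topic | interest)."""
    return {
        (interest, f"topic{k + 1}"): weight * p
        for interest, (weight, topics) in dist.items()
        for k, p in enumerate(topics)
    }


user_joint = joint(user_123)
total = sum(user_joint.values())  # a valid joint distribution sums to 1
```

Under this reading, the largest share of User 123's probability mass sits on topic2 within Diabetes.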



Jointly Normalized Collaborative Topic Regression (JNCTR)

• Enhanced version of Collaborative Topic Regression (CTR) [Wang et al., 2011]

• CTR only considers the latent topic distribution for the item (thread)
• Unable to capture users’ distribution

JNCTR captures both user and thread latent distributions

• The component shown in red in the model diagram is the addition to CTR

[Figure: JNCTR graphical model. Legend: r = rating; u = user vector (plate over U users) with regularization λ_u; v = thread vector (plate over V threads) with regularization λ_v; θ_u = user–topic distribution; θ_v = thread–topic distribution]

Addresses Intra Condition similarity

Adjusts the IATM discovered thread-, user-topic distributions within an interest (condition)
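As a rough sketch of what "adjusting" the topic distributions can mean, the code below writes a CTR-style objective in which each thread vector is pulled toward its topic distribution and the ratings are fit by dot products. The user-side pull toward θ_u is an assumption made by analogy with CTR's item-side prior; the slides do not give the exact JNCTR likelihood, and the function name and toy numbers are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def jnctr_style_objective(R, C, U, V, theta_u, theta_v, lam_u, lam_v):
    """CTR-style loss: confidence-weighted squared rating error plus
    penalties that keep user/thread vectors close to their topic
    distributions. ASSUMPTION: the user-side term mirrors the thread side."""
    fit = 0.5 * sum(
        C[i][j] * (R[i][j] - dot(U[i], V[j])) ** 2
        for i in range(len(U))
        for j in range(len(V))
    )
    user_reg = 0.5 * lam_u * sum(sq_dist(u, t) for u, t in zip(U, theta_u))
    thread_reg = 0.5 * lam_v * sum(sq_dist(v, t) for v, t in zip(V, theta_v))
    return fit + user_reg + thread_reg


# Toy data: 2 users, 3 threads, 2 topics.
theta_u = [[0.7, 0.3], [0.2, 0.8]]
theta_v = [[0.5, 0.5], [0.9, 0.1], [0.1, 0.9]]
R = [[1, 0, 0], [0, 0, 1]]
# Higher confidence for observed interactions than for unobserved ones.
C = [[1.0 if r else 0.01 for r in row] for row in R]

# Starting at the topic distributions, both penalty terms are zero.
loss_at_prior = jnctr_style_objective(R, C, theta_u, theta_v, theta_u, theta_v, 1.0, 1.0)
```

Minimizing such an objective moves the vectors away from the topic-model output only where the observed ratings demand it.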


Dataset

Health Boards (HBD)
• Excluding generic categories: Family, Support, Healthcare, General.

Retained top (tf-idf based) 8,000 words

Dataset   # Users   # Threads   # Posts   Avg. Posts / Thread   # Distinct Conditions   Avg. Conditions / User
HBD       127,903   155,863     716,744   4.6                   235                     4.01


Baseline Comparison

Signals available to a method: User–Thread Interaction, User Docs, Thread Docs, User-reported Conditions. Each √ marks one signal the method uses.

Baselines
• Non-negative Matrix Factorization (NMF) [Lee ’01]: √
• Author–Topic Model (AT) [Rosen-Zvi ’04]: √ √
• CTR [Wang ’11]: √ √
• Context Aware Recommender (CAR) [Rendle ’12]: √ √

Our Methods
• IATM: √ √ √
• AT + JNCTR: √ √ √
• IATM + JNCTR: √ √ √ √


Experimental Results – In-Matrix Prediction

In Matrix ≡ Every user or thread in test occurs at least once in training.

[Figure: users × threads interaction matrix with observed (√), absent (X), and to-be-predicted (?) entries]

IATM + JNCTR consistently outperforms others

User–Thread interaction is the strongest signal

• Both the Author–Topic model and IATM suffer in its absence


Experimental Results – Out of Matrix Prediction

Newly created Threads: Unseen Threads in training

[Figure: users × threads interaction matrix in which the columns for new threads contain only "?" entries]

Not possible to predict without additional signals about the thread
- NMF, CAR do not work

Method          MRR
Author–Topic    0.025
CTR             0.131
IATM            0.094
AT + JNCTR      0.164
IATM + JNCTR    0.221

Newly joined Users: Unseen Users in training

[Figure: users × threads interaction matrix in which the rows for new users contain only "?" entries]

Not possible to predict without additional signals about the user
- NMF, CAR, CTR do not work

Method          MRR
Author–Topic    0.062
IATM            0.109
AT + JNCTR      0.110
IATM + JNCTR    0.146

NMF based models suffer from cold-start

Our models alleviate the problem using additional context


Predicting Missing Conditions

Many people do not report their conditions in the websites
• Around 13% patients report no condition at all

However, they participate regularly in the forums

[Figure: distribution of # reported conditions]

Can we predict the missing conditions from User-interest distribution learnt by IATM?

Yes!

Simulation Experiment
• Omit 1-3 conditions for each user
• How much can we recover?

# held-out Conditions   Perfect Recall
1                       0.64
2                       0.45
3                       0.39
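The simulation can be sketched as follows: hide some of a user's conditions, rank the unreported conditions by the user-interest probabilities, and check whether the top guesses match what was hidden. This sketch assumes "perfect recall" means the fraction of users for whom every held-out condition is recovered; that definition, the function names, and the toy probabilities are assumptions, not taken from the slides. A faithful experiment would also learn the distribution with the held-out conditions hidden.

```python
def recover_conditions(interest_probs, reported, n_missing):
    """Guess n_missing conditions: the most probable interests that the
    user has not already reported."""
    candidates = [c for c in interest_probs if c not in reported]
    candidates.sort(key=lambda c: interest_probs[c], reverse=True)
    return set(candidates[:n_missing])


def perfect_recall(users, n_held_out):
    """Fraction of users whose held-out conditions are all recovered."""
    hits = 0
    for interest_probs, true_conditions in users:
        # Deterministic hold-out for the example: first conditions by name.
        held_out = set(sorted(true_conditions)[:n_held_out])
        reported = set(true_conditions) - held_out
        guess = recover_conditions(interest_probs, reported, n_held_out)
        hits += guess == held_out
    return hits / len(users)


# Hypothetical user-interest distributions and true condition sets.
users = [
    ({"Diabetes": 0.6, "Arthritis": 0.3, "Insomnia": 0.1}, {"Diabetes", "Arthritis"}),
    ({"Diabetes": 0.2, "Arthritis": 0.1, "Insomnia": 0.7}, {"Arthritis", "Diabetes"}),
]
score = perfect_recall(users, n_held_out=1)
```

In the toy data, the first user's hidden condition is recovered and the second user's is not.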


Making sense of topic models: Topics from 5 Diseases

ALS          Parkinson’s Disease   Diabetes   Cancer      Heart Disorders
als          neurologist           carb       cancer      rate
reflexes     pd                    sugar      chemo       beating
muscle       tremors               insulin    radiation   bp
amyotrophic  scan                  glucose    cells       fast
nervous      shaking               levels     prayers     palpitations
irregular    facial                eat        god         atrial
brain        control               diet       luck        verapamil

For terminal diseases, people often participate for emotional support

“Would love to talk to anyone with ovarian cancer .. really believe faith can play a huge role in recovering and also positive attitude .. I wish this disease didn’t exist”


Solutions

Matching Users Interest

Handling Newly Created Threads [XMLC 2018]

Identifying Helpful Posts



Kishaloy Halder, Lahari Poddar, Min-Yen Kan, "Cold Start Thread Recommendation as Extreme Multi-label Classification", 2018, XMLC for Social Media, WWW.


Cold Start Thread Recommendation

New contents created continuously in Web 2.0

• Threads in discussion forums
• Questions in community question answering platforms
• Social Media posts

Task: Recommend new threads to potentially interested users in order to get them answered

In recommendation, this is known as cold-start




Cold Start Item Recommendation

Typically user and item are represented as vectors in latent factor models

• i-th User → $u_i$

• j-th Item → $v_j$

Predicted recommendation is obtained by

$r_{ij} = u_i \cdot v_j^{T}$

For a new item j=4:
• $v_{j=4}$ is randomly initialized
• Its rating can not be predicted for any user

[Figure: two panels, "Interaction Graph" and "Interaction Matrix"]
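The cold-start failure can be seen in a few lines. The sketch below scores user–item pairs with a dot product of latent vectors; all vectors are made-up numbers standing in for trained factors. The new item's vector is just a random initialization, so its scores change with the random draw and carry no information.

```python
import random


def predict(u, v):
    """Predicted rating r_ij as the dot product of user and item vectors."""
    return sum(a * b for a, b in zip(u, v))


random.seed(0)

# Hypothetical learned vectors for three users and three seen items.
users = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
items = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.4]]

# A new item has no interactions, so training never updates its vector.
# Two different random initializations stand in for that untrained state.
new_item_a = [random.gauss(0, 0.1) for _ in range(2)]
new_item_b = [random.gauss(0, 0.1) for _ in range(2)]

seen_scores = [predict(users[0], v) for v in items]
cold_scores_a = [predict(u, new_item_a) for u in users]
cold_scores_b = [predict(u, new_item_b) for u in users]
```

Scores for seen items reflect the learned vectors, while the two sets of cold-start scores disagree with each other purely because of initialization.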


Revisiting Cold Start as eXtreme Multi-Label Classification (XMLC)

In absence of interaction history for a newly created thread, traditional recommendation systems break

Need to use the textual content of a thread in order to find potentially interested users

Can be viewed as an Extreme Multi-Label Text Classification problem
• Existing users → Class labels
• Out-of-matrix thread recommendation → multi-label classification

[Figure: a new thread (input) is passed to the XMLC model, which outputs a binary vector over users (1, 0, 1, …, 0)]
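To make the reformulation concrete, the sketch below builds the training targets: each thread becomes a multi-hot vector with one position per existing user, set to 1 if that user participated. The function name and the toy thread data are illustrative assumptions.

```python
def build_label_vectors(thread_participants, all_users):
    """Turn each thread's participant set into a multi-hot vector over users."""
    index = {user: k for k, user in enumerate(all_users)}
    labels = {}
    for thread, participants in thread_participants.items():
        row = [0] * len(all_users)
        for user in participants:
            row[index[user]] = 1
        labels[thread] = row
    return labels


# Hypothetical forum with four users and two training threads.
all_users = ["u1", "u2", "u3", "u4"]
thread_participants = {"t1": {"u1", "u3"}, "t2": {"u2"}}
labels = build_label_vectors(thread_participants, all_users)
```

A text classifier trained on (thread text, label vector) pairs can then score every user for a thread it has never seen.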


Extreme Multi-label Text Classification

Large dimensionality in the number of labels
• Thousands or more
• Typically used for tag prediction – Wiki pages, ecommerce products

Multi-Label Classification Models

• Embedding based Method: SLEEC (NIPS ’15)
• Tree based Method: FastXML (KDD ’14)
• Deep Learning Method: XML-CNN (SIGIR ’17) (State-of-the-art for XMLC)


Our Approach

Propose an NN to predict the subset of users interested in a new thread from the set of forum community users

Textual content is encoded to a lower dimensional space
• Word embedding: maps words to vectors
• Bi-directional GRUs: encodes sequence of words

A universal encoding of a post text might not be enough… different users have different interests/expertise


“I have been recommended to undergo tracheotomy and put in a PEG. I am wondering how many days I’ll have to stay in the hospital? Will I have a hard time adjusting afterwards? Does the hose need to be connected while transferring? Will the equipments take up a lot of room? How do you call for help?..”


The post contains diverse questions – different parts can be answered by different users


Method: Cluster Sensitive Attention

Attention mechanism: effective in capturing important parts of the text
• Gives weights to words of post
• Post encoding: weighted sum of word encodings

Separate attention for every user: … but not scalable due to huge number of parameters

Hypothesis: Clusters of users exist who are interested in similar items

Cluster sensitive attention on textual content
• N users, K clusters where K ≪ N
• K attention layers

Each attention layer captures cluster-specific preferences
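A minimal sketch of K attention layers over one post follows. Each cluster owns a query vector, words are weighted by a softmax over query–word dot products, and the weighted sum gives one post encoding per cluster. Dot-product scoring with a single query vector per cluster is an assumption; the slides only state that there are K attention layers, and all numbers are toy values.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cluster_attention(word_encodings, cluster_queries):
    """Return one post encoding and one weight vector per cluster."""
    dim = len(word_encodings[0])
    post_encodings, all_weights = [], []
    for query in cluster_queries:
        weights = softmax([dot(query, h) for h in word_encodings])
        encoding = [
            sum(w * h[d] for w, h in zip(weights, word_encodings))
            for d in range(dim)
        ]
        post_encodings.append(encoding)
        all_weights.append(weights)
    return post_encodings, all_weights


# Three word encodings (e.g., Bi-GRU states) and K = 2 cluster queries.
words = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
queries = [[5.0, 0.0], [0.0, 5.0]]
encodings, weights = cluster_attention(words, queries)
```

The parameter count grows with K rather than with the number of users, which is the scalability argument made above.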


Overall Architecture


Experiments - Datasets

4 forum datasets across multiple domains
• Online Health Forum: Epilepsy, ALS, Fibromyalgia
• StackOverflow: removed the code snippets

Metrics: Recall@M, nDCG@M, MRR

Dataset         # users   # threads   Avg. # words / thread   Avg. # users / thread   Sparsity
Epilepsy        1,506     2,056       147                     7.39                    99.49%
ALS             3,182     8,083       148                     9.85                    99.69%
Fibromyalgia    5,669     10,270      203                     9.02                    99.84%
StackOverflow   69,631    34,172      93                      6.81                    99.99%
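For reference, the three ranking metrics named on this slide can be computed per thread as sketched below, using their standard binary-relevance definitions. The exact variants used in the experiments are not stated on the slides, and the ranking and relevant set are toy values. MRR is the mean of `reciprocal_rank` over all test threads.

```python
import math


def recall_at_m(ranked, relevant, m):
    """Share of the relevant users that appear in the top-m ranking."""
    return len(set(ranked[:m]) & relevant) / len(relevant)


def ndcg_at_m(ranked, relevant, m):
    """Binary-relevance nDCG with a log2 position discount."""
    dcg = sum(
        1 / math.log2(pos + 2)
        for pos, user in enumerate(ranked[:m])
        if user in relevant
    )
    ideal = sum(1 / math.log2(pos + 2) for pos in range(min(m, len(relevant))))
    return dcg / ideal


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant user, or 0 if none is ranked."""
    for pos, user in enumerate(ranked, start=1):
        if user in relevant:
            return 1 / pos
    return 0.0


ranked = ["u4", "u1", "u3", "u2"]  # hypothetical model ranking for one thread
relevant = {"u1", "u2"}            # users who actually replied

recall = recall_at_m(ranked, relevant, 2)
ndcg = ndcg_at_m(ranked, relevant, 4)
rr = reciprocal_rank(ranked, relevant)
```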


Experiments – Results (MRR)

Dataset         CVAE    CTR     CNN-Kim   XML-CNN   BiGRU-2   Our Model
Epilepsy        0.159   0.443   0.536     0.551     0.631     0.671
ALS             0.201   0.275   0.270     0.293     0.297     0.306
Fibromyalgia    0.304   0.435   0.669     0.668     0.740     0.773
StackOverflow   0.003   0.032   0.025     0.029     0.047     0.050

Our model outperforms the baselines in all cases

StackOverflow is most challenging
• Highest sparsity
• Shorter threads
• Larger number of target users


Experiments – Results (Recall@M)

Dataset         Rec@M   CVAE    CTR     CNN-Kim   XML-CNN   BiGRU-2   Our Model
Epilepsy        30      21.14   43.83   44.63     49.08     50.99     51.21
                50      29.62   50.86   52.45     53.69     59.47     59.80
                100     42.44   59.93   65.77     63.67     68.23     69.37
ALS             30      17.04   25.00   22.15     23.56     30.18     31.84
                50      24.07   32.46   31.27     30.61     36.32     36.55
                100     35.77   44.14   43.82     43.14     48.12     49.78
Fibromyalgia    30      32.83   54.39   58.04     61.83     62.39     63.06
                50      42.43   63.91   67.83     68.92     69.17     72.04
                100     55.02   72.31   76.37     75.74     77.98     78.19
StackOverflow   30      0.16    2.73    1.84      2.42      2.94      2.80
                50      0.31    4.02    2.74      3.43      4.03      4.11
                100     0.69    6.36    4.43      5.35      6.09      6.33

Our model outperforms baselines in most cases

Scores at smaller M are not important:

New content is targeted to a much larger audience by common practice

Cluster sensitive attention is effective


Solutions

Matching Users Interest

Handling Newly Created Threads

Identifying Helpful Posts [NAACL ’19]



Kishaloy Halder, Min-Yen Kan, Kazunari Sugiyama, "Predicting Helpful Posts in Open-Ended Discussion Forums: A Neural Architecture", 2019, NAACL


Discussion Forum ≠ Community Question Answering

CQA mostly receives factoid based questions
• Single correct answer

In contrast, in discussion forums, the thread opening post is not always a question
• Personal Anecdotes, Asking for recommendations
• Multiple correct answers

Threads are more subjective (open-ended) in discussion forums



Predicting Helpful Posts

Task: Given post text, identify whether the post is ‘helpful’ to the user
• Interested in the textual content of the posts only (not social media-style features, e.g., user profile, followers, etc.)

Helpfulness: decided by user feedback
• “upvote”, “like”, “mark as helpful”, “highlight”

Motivation:
• Early detection of helpful posts can aid the recommendation process
• Can also help in summarizing long running threads

Discussion Thread

Order   Post Text                     Helpful?
1       How to do X?
2       Do you really need X?         No
3       Sorry, new here.              No
4       Sure, follow these steps…     Yes!
5       I can tell you about Y.       No

Notify users interested in X


Our Approach

Hypothesis: A post would be helpful if it is
• relevant to the original post and
• introduces some novel information compared to past posts in the same thread

Sample thread from Reddit

Original Post: “I was working yesterday .. and my back was bent over and when I got up I felt like I strained my back but now my mind is linking it to my kidney..”

Post 1 (Relevant? Yes, Novel? Yes, Helpful? Yes): “I have this and my doc has told me it’s muscular and physio might help..”

Post 2 (Relevant? Yes, Novel? Yes, Helpful? Yes): “Kidney pain is usually constant and doesn’t change when you move, or get better when you change position, from how I understand it .. you’ll be fine :)”

Post 3 (Relevant? Yes, Novel? No, Helpful? No): “If it happens only when you move there is a big chance it’s a muscle spasm, this happens after some physical activities.”
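The hypothesis can be illustrated with a simple lexical stand-in. The sketch below is not the neural model from the talk: it swaps in bag-of-words cosine similarity, scoring relevance as similarity to the original post and novelty as one minus the highest similarity to any earlier reply. The function names and the short example strings are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased whitespace tokens as a bag of words."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[w] * b[w] for w in a if w in b)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return num / den if den else 0.0


def relevance_and_novelty(target, original, past_posts):
    """Relevance: similarity to the original post.
    Novelty: 1 minus the highest similarity to any earlier reply."""
    t = bow(target)
    relevance = cosine(t, bow(original))
    overlap = max((cosine(t, bow(p)) for p in past_posts), default=0.0)
    return relevance, 1.0 - overlap


original = "my back hurts when i move"
past = ["it is a muscle spasm"]

# A reply that repeats an earlier post verbatim has no novelty.
rel_repeat, nov_repeat = relevance_and_novelty("it is a muscle spasm", original, past)
# The first reply in a thread has nothing to be redundant with.
rel_first, nov_first = relevance_and_novelty("back pain can be muscular", original, [])
```

A learned model replaces these hand-built scores with encodings, but the two signals play the same roles.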


Neural Architecture

[Figure: Post Helpfulness Prediction Model. Inputs: the target post, the original post, and the past k posts in the same thread (Past 1 … Past k). Components: text encoder, sequence encoder, relevance and novelty units (each marked "X"), concatenation, fully connected layer, and the helpfulness output]

Post content is never used directly to avoid popularity bias

Trained with binary cross-entropy loss

End-to-end trainable


Experiments - Datasets

Reddit: a generic discussion forum
• Public dumps available
• Created two datasets to understand modeling capabilities
  • Reddit_10+: threads with more than 10 posts
  • Reddit_3+: threads with >= 3 posts

Coursera: MOOC discussion forum on online lectures
• Android Apps
• Matrix

Travel Stack Exchange
• Questions are mainly subjective

Data splitting: 80-10-10 for train, dev, and test sets

Dataset        # Posts   # Threads   Avg. # Posts / Thread   Avg. # Words / Post
Reddit_10+     200,006   9,744       20.52                   29.45
Reddit_3+      200,016   28,763      6.95                    30.58
Android Apps   11,643    2,077       5.60                    56.53
Matrix         19,159    2,484       4.08                    65.30
Travel         30,116    10,250      2.93                    163.43

Experiments - Baselines

• BiLSTM (Sun et al., ’17): Bidirectional LSTM encoders on post text.
• Stacked LSTM (Liu et al., ’16): a stack of 2 LSTM layer encoders on the post text.
• LSTM with Attention (Rocktäschel et al., ’16): LSTM with hierarchical attention.
• Answer Sentence Selection (Yu et al., ’14): a CNN model pioneered in TREC QA.

Ablation Study

• Only the relevance component
• Only the novelty component

Ground Truth Label for Helpfulness

User feedback in forms of “upvote”, “like”, “mark as helpful”

80th percentile vote count as the threshold
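The labeling rule can be sketched as below. It assumes a nearest-rank percentile over a single pool of vote counts and treats posts at or above the threshold as helpful; whether the threshold is computed per dataset or per thread, and whether ties count as helpful, is not stated on the slide. The function name and vote counts are illustrative.

```python
def helpful_labels(vote_counts, percentile=80):
    """Label posts whose vote count reaches the given percentile as helpful.

    Uses the nearest-rank percentile; ties at the threshold count as helpful.
    """
    ordered = sorted(vote_counts)
    rank = max(1, -(-percentile * len(ordered) // 100))  # ceil(p * n / 100)
    threshold = ordered[rank - 1]
    return threshold, [int(v >= threshold) for v in vote_counts]


# Hypothetical vote counts for ten posts.
votes = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
threshold, labels = helpful_labels(votes)
```

With a percentile cutoff, the positive class stays a minority regardless of how vote counts are scaled in a given forum.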


Results

                             Reddit_10+         Reddit_3+          Android Apps       Matrix             Travel
Model                        P     R     F1     P     R     F1     P     R     F1     P     R     F1     P     R     F1
BiLSTM                       0.23  0.23  0.23   0.23  0.22  0.22   0.36  0.32  0.34   0.29  0.35  0.32   0.28  0.31  0.29
Stacked LSTM                 0.24  0.21  0.22   0.23  0.20  0.21   0.34  0.29  0.31   0.32  0.29  0.31   0.23  0.26  0.25
LSTM with Attention          0.24  0.21  0.23   0.24  0.21  0.22   0.34  0.27  0.30   0.30  0.36  0.33   0.25  0.26  0.25
Answer Sentence Selection    0.28  0.27  0.27   0.31  0.32  0.32   0.28  0.21  0.24   0.33  0.34  0.33   0.30  0.31  0.31
Our Model (Relevance only)   0.30  0.30  0.30   0.32  0.34  0.33   0.31  0.35  0.33   0.38  0.31  0.34   0.35  0.30  0.32
Our Model (Novelty only)     0.53  0.38  0.44   0.42  0.27  0.33   0.33  0.24  0.28   0.43  0.27  0.33   0.47  0.27  0.34
Our Model (full)             0.48  0.53  0.51   0.41  0.39  0.40   0.35  0.40  0.38   0.37  0.37  0.37   0.37  0.31  0.34

• A challenging task from text-only perspective
• Our model outperforms the state-of-the-art text classification models
• Ablation study shows that considering the original post or past posts helps compared to the vanilla models


Effect of Context Length

Let’s vary the context length of a particular post!

Context length = k: novelty would be computed against the past k posts (if existing)

Keeping the entire context in long threads (e.g., thread with 40 posts) is infeasible for humans
• Set context length from 1-18 for Reddit_3+

Longer context improves accuracy in general

The accuracy improves sharply from context length 1-11

From length 11-18, the improvement is positive but the rate is lower

A trade-off exists between training time and accuracy


Conclusion

Pioneered techniques to help discussion forum users navigate threads

Thread Visibility
• Matching Users Interest
• Handling Newly Created Threads

Thread Helpfulness
• Identifying Helpful Posts

Applicable to many domains: e-health, MOOCs, and generic discussion forums


Questions? Ask Kishaloy! (He’s on the market!)


Back up Slides


Pipeline Overview


IATM Learning

• Gibbs sampling
• For doc d
  • i_d[] ← possible conditions for the doc
  • For word w
    • a ← author of word w
    • P ← 0, with size |i_d| × Z
    • P[i][z] ← (0.5 * P(z|a, i) + 0.5 * P(z|d, i)) * P(w|i, z)
    • Sample i, z from P[]
• U.I.Z ← User-Interest-Topic distribution
• V.I.Z ← Thread-Interest-Topic distribution
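The per-word update can be sketched as one sampling step. The code below assumes the three conditional probabilities are already available as lookup tables; in an actual Gibbs sampler they would be derived from count tables, with smoothing and with the current word's assignment removed, which is omitted here. The table layout, function name, and toy numbers are assumptions.

```python
import random


def sample_interest_topic(word, author, doc, doc_interests, n_topics,
                          p_topic_user, p_topic_doc, p_word, rng):
    """Sample (interest, topic) for one word using the slide's update:
    P[i][z] = (0.5 * P(z|a,i) + 0.5 * P(z|d,i)) * P(w|i,z)."""
    pairs, weights = [], []
    for i in doc_interests:
        for z in range(n_topics):
            mix = 0.5 * p_topic_user[author][i][z] + 0.5 * p_topic_doc[doc][i][z]
            pairs.append((i, z))
            weights.append(mix * p_word[i][z][word])
    # random.choices normalizes the unnormalized weights internally.
    return rng.choices(pairs, weights=weights, k=1)[0]


# Toy tables: one interest, two topics; "insulin" only occurs under topic 0.
p_topic_user = {"a1": {"Diabetes": [0.5, 0.5]}}
p_topic_doc = {"d1": {"Diabetes": [0.5, 0.5]}}
p_word = {"Diabetes": [{"insulin": 1.0}, {"insulin": 0.0}]}

rng = random.Random(0)
sample = sample_interest_topic(
    "insulin", "a1", "d1", ["Diabetes"], 2,
    p_topic_user, p_topic_doc, p_word, rng,
)
```

Restricting the loop to `doc_interests` is what ties each word to one of the conditions listed for its document.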


JNCTR – Learning

• We develop an EM style algorithm
• Maximize the likelihood function $\mathcal{L}$

• $c_{ij}$ is the confidence parameter for $r_{ij}$

• Where $a > b > 0$


JNCTR – Learning…

• $\frac{\partial \mathcal{L}}{\partial u_i} = 0$, $\frac{\partial \mathcal{L}}{\partial v_j} = 0$ ⇒

• Prediction:

• Recommendation:
  • Recommend items to user with high predicted ratings


Author Topic Model


Modeling Users’ Evolving Interests for Discussion Forum Thread Recommendation


Users’ Evolving Interests

• Users’ interests keep evolving over time [RecSys 2016, WSDM 2017]
• Items evolve as well in certain domains – Movies, Books
• Not so much in discussion forums

80th percentile movie lifetime = 5.7 years
80th percentile thread lifetime = 16 days


Trend Aware Recommendation System

• Hypothesis: A user interacts with a thread if it is relevant to her recent (or long term) interests, and introduces something new with respect to her past interacted threads

