Post on 13-Apr-2017


Subconscious Crowdsourcing: A Feasible Data Collection Mechanism for Mental

Disorder Detection on Social Media

Chun-Hao Chang, Elvis Saravia, and Yi-Shin Chen

Institute of Information Systems and Applications

National Tsing Hua University

Hsinchu, Taiwan 30013, R.O.C.

Email: { ccha97u, ellfae, yishin}@gmail.com


Introduction

➔ One in three people report symptoms sufficient to meet the criteria for at least one form of mental disorder at some point in their lives.

➔ 16% of people in the US suffer from some form of mental disorder. Mental disorders are the leading cause of disability worldwide.

➔ Problem: The majority of cases remain undetected, and diagnosis is difficult.

➔ Solution: Social networks provide a venue for mental disorder research.

Source: Wikipedia

Background

Bipolar Disorder:

- Unstable and impulsive emotions
- Cycling between mania and depression

Borderline Personality Disorder:

- Unstable and impulsive emotions
- Impaired social interactions


Motivation

➔ Open access to patients' data on social websites.

➔ Build a real-time mental health assessment tool to assist in diagnosis.


Related Work

➔ Predicting Depression via Social Media - Microsoft (De Choudhury, M., Gamon, M., Counts, S., & Horvitz, E. (2013), ICWSM)

1. Collected data using a crowdsourcing platform, Amazon Mechanical Turk.
2. Purchased Twitter data.
3. Predicted depression before diagnosis.

➔ Quantifying Mental Health Signals in Twitter - Johns Hopkins University (Coppersmith, G., Dredze, M., & Harman, C. (2014))

1. Automatically collected patients by keyword matching (e.g., “I was diagnosed with X”).
2. Predicted 4 different kinds of mental disorders.

Limitation: The data is not easily accessible or reproducible.
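
The keyword-matching idea mentioned above can be illustrated with a small regular expression. This is only a sketch: the pattern, the disorder list, and the function name `find_self_reported_diagnosis` are assumptions made for illustration and are not taken from the cited work.

```python
import re

# Hypothetical pattern for self-reported diagnoses. The list of disorders is
# an illustrative assumption, not the list used in the cited work.
DIAGNOSIS_PATTERN = re.compile(
    r"\bI (?:was|have been|am) diagnosed with "
    r"(bipolar(?: disorder)?|borderline personality disorder|bpd|depression|ptsd)\b",
    re.IGNORECASE,
)


def find_self_reported_diagnosis(tweet):
    """Return the lowercased disorder named in a self-report, or None."""
    match = DIAGNOSIS_PATTERN.search(tweet)
    return match.group(1).lower() if match else None


if __name__ == "__main__":
    print(find_self_reported_diagnosis("I was diagnosed with bipolar disorder last year"))
    print(find_self_reported_diagnosis("Reading an article about depression"))
```

A tweet that merely mentions a disorder without the self-report phrase is not matched, which is the point of this kind of filter.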


Challenges

➔ How to identify online patients?

➔ How to efficiently collect patients' data?

➔ How to avoid selection bias: is the predictive model detecting patients with mental illnesses or just people who talk about them?


Objectives

➔ To build predictive models for the purpose of mental disorder detection.

➔ To extract features which alleviate the selection bias problem.

➔ To standardize features for mental disorder detection.


Methodology


Data Collection

➔ Subconscious crowdsourcing: a reliable and efficient mechanism for gathering patients' data. The community is the key element.

Therapist

Patients


Preprocessing

➔ Twitter accounts with more than 100 posts

➔ Accounts whose posts were more than 50% hyperlinks were also removed

Purpose: Getting rid of spam accounts.
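
A minimal sketch of this filtering step is given below. It rests on two assumptions that the slide does not state explicitly: that accounts with more than 100 posts are the ones kept, and that "more than 50% hyperlinks" means more than half of an account's posts contain a URL. The names `keep_account`, `filter_accounts`, `MIN_POSTS`, and `MAX_LINK_RATIO` are hypothetical.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")

MIN_POSTS = 100        # assumed reading: keep accounts with more than 100 posts
MAX_LINK_RATIO = 0.5   # assumed reading: drop accounts where over 50% of posts contain a link


def link_ratio(tweets):
    """Fraction of tweets that contain at least one hyperlink."""
    if not tweets:
        return 0.0
    with_links = sum(1 for tweet in tweets if URL_PATTERN.search(tweet))
    return with_links / len(tweets)


def keep_account(tweets):
    """Apply both thresholds to one account's list of tweets."""
    return len(tweets) > MIN_POSTS and link_ratio(tweets) <= MAX_LINK_RATIO


def filter_accounts(accounts):
    """accounts maps a user name to that user's list of tweet texts."""
    return {user: tweets for user, tweets in accounts.items() if keep_account(tweets)}


if __name__ == "__main__":
    accounts = {
        "regular": ["just a thought"] * 150,
        "spammy": ["buy now http://example.com"] * 150,
        "inactive": ["hello"] * 10,
    }
    print(sorted(filter_accounts(accounts)))
```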


Feature Extraction

➔ Overall, we are interested in linguistic and behavioral features.

➔ Information that reveals a user’s personality and behavior: emotion transition, social interactions, age, gender, etc.

➔ TF-IDF, LIWC, and Pattern of Life Features
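
As a rough illustration of the TF-IDF features, the sketch below computes TF-IDF weights over unigrams and bigrams using only the standard library. The whitespace tokenization, the weighting formula (relative term frequency times log of inverse document frequency), and the function names are assumptions; the authors' exact variant is not given in the slides.

```python
import math
from collections import Counter


def ngrams(tokens):
    """Return the unigrams of a token list followed by its bigrams."""
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams


def tfidf(documents):
    """Compute one {term: weight} dict per document.

    Each document is a string, for example all tweets of one user joined
    together. Weight = (count / total terms) * log(n_docs / doc_freq).
    """
    term_lists = [ngrams(doc.lower().split()) for doc in documents]
    doc_freq = Counter()
    for terms in term_lists:
        doc_freq.update(set(terms))  # count each term once per document
    n_docs = len(documents)
    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


if __name__ == "__main__":
    vectors = tfidf(["feel better today", "feel sad today", "self harm"])
    print(sorted(vectors[0].items(), key=lambda item: -item[1])[:3])
```

Terms that occur in every document receive a weight of zero under this formula, so the highest-weighted terms are those that distinguish one user from the rest.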


Features

➔ TF-IDF Model:
◆ Unigrams and bigrams

➔ LIWC (Linguistic Inquiry and Word Count):
◆ Thoughts, feelings, personality, and motivation

➔ Pattern of Life:
◆ Emotional scores, age, and gender
◆ Polarity features (negative ratio, positive ratio, positive combo, negative combo, and flips ratio)
◆ Social features (tweeting frequency, mention ratio, frequent mentions, and unique mentions)
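
Some of the polarity and social features could be computed along the following lines. The slides do not define these features, so every definition here is an assumption: per-tweet polarity labels (1, -1, 0) are taken as given from some sentiment classifier, a "flip" is taken to be a pair of consecutive tweets with opposite polarity, and the mention ratio is taken to be the fraction of tweets containing at least one @-mention. The combo features and tweeting frequency are left out because their definitions cannot be recovered from the slides.

```python
import re

MENTION_PATTERN = re.compile(r"@(\w+)")


def polarity_features(polarities):
    """polarities: per-tweet labels in chronological order (1, -1, or 0)."""
    n = len(polarities)
    if n == 0:
        return {"positive_ratio": 0.0, "negative_ratio": 0.0, "flips_ratio": 0.0}
    positive = sum(1 for p in polarities if p > 0)
    negative = sum(1 for p in polarities if p < 0)
    # Assumed definition: a flip is two consecutive tweets of opposite polarity.
    flips = sum(1 for a, b in zip(polarities, polarities[1:]) if a * b < 0)
    return {
        "positive_ratio": positive / n,
        "negative_ratio": negative / n,
        "flips_ratio": flips / (n - 1) if n > 1 else 0.0,
    }


def social_features(tweets):
    """Mention-based features over one user's tweets."""
    n = len(tweets)
    mentions = [name.lower() for t in tweets for name in MENTION_PATTERN.findall(t)]
    tweets_with_mention = sum(1 for t in tweets if MENTION_PATTERN.search(t))
    return {
        "mention_ratio": tweets_with_mention / n if n else 0.0,
        "unique_mentions": len(set(mentions)),
    }


if __name__ == "__main__":
    print(polarity_features([1, -1, 1, 0]))
    print(social_features(["hi @Bob", "@bob @amy hey", "nothing here"]))
```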


Experiments: Data

Group               Users    Tweets    Average Tweets per User
Random Samples        548    796957    1454.3
Bipolar Patients      278    347774    1250.99
BPD Patients          203    225774    1112.19
Bipolar Experts        11     14056    1611.67
BPD Experts             9     19696    1790.55


Experiments: Evaluation

➔ Three predictive models (Random Forest) for each mental disorder
◆ Pattern of Life Model
◆ TF-IDF Model
◆ LIWC Model

➔ Three experiments
◆ 10-Fold Cross Validation Test
◆ Selection Bias Test
◆ Limited Data Test
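
The 10-fold cross-validation procedure can be sketched as follows, using only the standard library. The classifier is deliberately left as a caller-supplied function (`train_and_score`, a hypothetical name); a Random Forest from a library such as scikit-learn could be plugged in there. The shuffling seed and the way folds are formed are assumptions, not details from the slides.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Every k-th shuffled index goes to the same fold, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(samples, labels, train_and_score, k=10):
    """Average the score returned by train_and_score over k folds.

    train_and_score(train_x, train_y, test_x, test_y) must fit a model on
    the training split and return a numeric score on the test split.
    """
    scores = []
    for train, test in k_fold_indices(len(samples), k):
        scores.append(train_and_score(
            [samples[i] for i in train], [labels[i] for i in train],
            [samples[i] for i in test], [labels[i] for i in test],
        ))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    def majority_baseline(train_x, train_y, test_x, test_y):
        # Toy scorer: predict the most common training label.
        guess = max(set(train_y), key=train_y.count)
        return sum(1 for y in test_y if y == guess) / len(test_y)

    labels = [1] * 30 + [0] * 20
    print(cross_validate(list(range(50)), labels, majority_baseline))
```

Each sample appears in exactly one test fold, so every user is scored once by a model that never saw that user during training.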


10-Fold Cross Validation

Pattern of Life 0.90

LIWC 0.91

TF-IDF 0.96

Pattern of Life 0.91

LIWC 0.90

TF-IDF 0.96

Selection Bias Test

Is the model detecting users who suffer from a mental disorder or just users who talk about it?

Bipolar            BPD
mentalhealth       dbt
meds               feeling
blog               borderline
therapy            helps
anxiety            self harm
thoughts           psychiatrist
feel better        cpn
electroboyusa      disorder
health             bpdchat
bipolarblogger     depression

Top TF-IDF terms

Data Limitation

What if a user has only a few tweets?


Conclusion

➔ We proposed an efficient and accessible mechanism for collecting patients' data.

➔ We improved the Pattern of Life Model to produce better predictions.

➔ We addressed the selection bias problem, which had not been addressed previously.

Future work: Support more mental illnesses


Demonstration
