On the Ground Validation of Online Diagnosis with Twitter and Medical Records

2015/6/8 (Mon.) Chang Wei-Yuan @ MakeLab Lab Meeting
Todd Bodnar, Victoria Barclay (WWW '14)


Posted on 30-Jul-2015




Outline

- Introduction
- Data Collection
- Feature Signals
- Meta Classifier
- Conclusion
- Thought



Introduction

- Social media has been used as a data source for tracking disease.
  - However, most analyses are based on models of population-level disease rates.

- This paper develops a novel system for social-media-based disease detection at the individual level.


Introduction

- How do you track a disease?
  - Traditional systems: the doctor reports it
  - Self-reporting systems: the patient reports it
  - Data-mining systems: tweets report it

- Goal: extend data-mining systems to achieve high accuracy


Data Collection

- 104 students with Twitter accounts
- 35 diagnosed with influenza
- Study period: August 2012 to May 2013


Data Collection

- 52,301 tweets, 1,609 of them posted while sick
- 194,835 friends and followers
- 31,103,713 tweets from friends or followers
- 17/35 users who used Twitter while sick explicitly mentioned their illness


Feature Signals

- Automated text classification
  - Expert keyword selection
  - Machine keyword selection
- Anomaly detection
- Network classification



Automated text classification

- We consider diagnosis based on the content of a user's tweets.

- Expert keyword selection
  - A set of keywords that are possible signals of influenza is defined by hand:
    { flu, influenza, sick, cough, cold, medicine, fever, … }
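A minimal sketch of expert-keyword matching, assuming each tweet is lowercased and split into word tokens and a tweet counts as a hit if any token is in the keyword set. The keyword set below is only the visible part of the slide's list (the rest is elided), and the tokenizer and the `flag_user` threshold rule are hypothetical illustrations, not the authors' classifier.

```python
import re

# Visible subset of the expert keyword list; the full list is elided on the slide.
EXPERT_KEYWORDS = {"flu", "influenza", "sick", "cough", "cold", "medicine", "fever"}

# Hypothetical tokenizer: runs of lowercase letters and apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase a tweet and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def keyword_hits(tweets, keywords=EXPERT_KEYWORDS):
    """Count the tweets that contain at least one keyword as a whole token."""
    return sum(1 for tweet in tweets if keywords.intersection(tokenize(tweet)))


def flag_user(tweets, threshold=1):
    """Hypothetical decision rule: flag a user when at least `threshold`
    of their tweets contain an expert keyword."""
    return keyword_hits(tweets) >= threshold


print(keyword_hits(["Stuck in bed with a fever", "Great game tonight!"]))  # prints 1
```

Whole-token matching avoids hits inside longer words, but a hand-chosen list still produces false positives (for example "cold" about the weather), which is one motivation for the data-driven keyword selection on the next slide.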


Automated text classification

- Machine keyword selection
  - We try selecting keywords algorithmically by first finding the 12,393 most common keywords in the data set,
  - then ranking them by information gain for predicting influenza.
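The two-step selection can be sketched as follows, assuming each "document" is the set of tokens from one labeled unit (a user or a tweet; the slide does not say which) and the feature is simply whether a word occurs in it. Information gain here is the label entropy minus the entropy conditioned on that binary feature; counting frequency by document frequency is an assumption.

```python
import math
from collections import Counter


def entropy(pos, neg):
    """Binary entropy (bits) of a set with `pos` positive and `neg` negative examples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, labels, word):
    """Information gain of the binary feature 'word occurs in the document'
    for predicting the binary label (True = influenza)."""
    # 2x2 contingency counts keyed by (word present?, label).
    counts = {(present, label): 0 for present in (True, False) for label in (True, False)}
    for tokens, label in zip(docs, labels):
        counts[(word in tokens, bool(label))] += 1
    total = len(docs)
    base = entropy(counts[(True, True)] + counts[(False, True)],
                   counts[(True, False)] + counts[(False, False)])
    conditional = 0.0
    for present in (True, False):
        pos, neg = counts[(present, True)], counts[(present, False)]
        if pos + neg:
            conditional += (pos + neg) / total * entropy(pos, neg)
    return base - conditional


def rank_keywords(docs, labels, vocab_size):
    """Keep the `vocab_size` most frequent words (by document frequency),
    then sort them by information gain, highest first."""
    doc_freq = Counter(word for tokens in docs for word in set(tokens))
    vocab = [word for word, _ in doc_freq.most_common(vocab_size)]
    return sorted(vocab, key=lambda w: information_gain(docs, labels, w), reverse=True)
```

On the study data the first step would correspond to `vocab_size=12393`. Note that information gain rewards any word that separates the classes, so words typical of healthy users can rank as highly as illness words.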


Anomaly Detection

- In addition to affecting the content of individuals' tweets, illness is also likely to affect the rate at which they tweet.

- To detect this, we perform one-dimensional anomaly detection on each user's monthly tweeting rate.
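The exact anomaly-detection procedure appeared in a figure that is not part of this transcript, so the sketch below swaps in a simple two-sided z-score test as a stand-in: a month is anomalous when its tweet count lies more than `k` standard deviations from that user's own mean. The threshold `k=2.0` and the monthly counts in the example are made up for illustration.

```python
import statistics


def monthly_anomalies(monthly_counts, k=2.0):
    """Return the indices of months whose tweet count lies more than `k`
    population standard deviations from the user's own mean (a z-score test).
    Two-sided, because illness might raise or lower the tweeting rate."""
    if len(monthly_counts) < 2:
        return []
    mean = statistics.mean(monthly_counts)
    std = statistics.pstdev(monthly_counts)
    if std == 0:
        return []  # perfectly constant activity: nothing stands out
    return [i for i, count in enumerate(monthly_counts) if abs(count - mean) / std > k]


# Made-up counts for ten months (e.g. August 2012 to May 2013); month 6 is a sharp drop.
counts = [40, 42, 38, 41, 39, 40, 5, 40, 43, 41]
print(monthly_anomalies(counts))  # prints [6]
```

Comparing each user against their own history, rather than against other users, keeps very active and very quiet accounts on the same footing.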


Network Classification

- Even if a user is not currently active on Twitter, users in her social network may give clues to her health status.

- Accounts that follow a user are referred to as her 'followers,' and accounts that a user follows are referred to as her 'friends.'

- We take all text that a user's friends or followers tweeted and perform keyword analysis on it.
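A minimal sketch of the network signal, assuming the friend and follower lists are plain dictionaries from account name to neighbour names and that the keyword analysis is the same whole-token matching as in the expert-keyword step. The score (share of neighbours' tweets containing an illness keyword), the keyword subset, and the function name `network_signal` are hypothetical; the slide does not say how the neighbours' text was scored or thresholded.

```python
import re

# Hypothetical keyword subset; the slide does not list the keywords used here.
NETWORK_KEYWORDS = {"flu", "influenza", "sick", "cough", "fever"}
TOKEN_RE = re.compile(r"[a-z']+")


def network_signal(user, friends, followers, tweets_by_user, keywords=NETWORK_KEYWORDS):
    """Fraction of the tweets posted by a user's friends and followers that
    contain at least one illness keyword (0.0 if they posted nothing)."""
    neighbours = set(friends.get(user, ())) | set(followers.get(user, ()))
    neighbours.discard(user)  # never count the user's own tweets
    total = hits = 0
    for neighbour in neighbours:
        for text in tweets_by_user.get(neighbour, ()):
            total += 1
            if keywords.intersection(TOKEN_RE.findall(text.lower())):
                hits += 1
    return hits / total if total else 0.0


friends = {"alice": ["bob"]}             # accounts alice follows
followers = {"alice": ["carol", "bob"]}  # accounts following alice
tweets = {"bob": ["I have the flu", "nice day"], "carol": ["cough cough", "lunch"]}
print(network_signal("alice", friends, followers, tweets))  # prints 0.5
```

Taking the union of friends and followers means an account that is both (here `bob`) is counted once. With roughly 31 million neighbour tweets in the study, a real implementation would precompute per-account keyword counts instead of rescanning text for every user.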


Meta-classifiers

- So far we have considered five separate methods for detecting illness based on a user's Twitter activity:
  - hand-chosen keyword analysis
  - data-mined keyword analysis
  - anomaly detection
  - network analysis


Meta-classifiers

- Aggregating multiple classifiers with a 'meta-classifier' has been shown to be an effective method for increasing classification accuracy.
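The meta-classifier's details were on a slide whose content is not in this transcript, so the sketch below uses one simple aggregation scheme, a weighted majority vote over the base classifiers' yes/no decisions, purely to illustrate the idea. The classifier names and weights in the example are hypothetical; a learned combiner (for example a classifier trained on the base outputs) is an equally plausible alternative.

```python
from typing import Mapping, Optional


def meta_classify(votes: Mapping[str, bool],
                  weights: Optional[Mapping[str, float]] = None) -> bool:
    """Combine base-classifier decisions by (optionally weighted) majority vote.
    Returns True ("ill") only if the positive votes carry more than half of the
    total weight; a tie counts as "not ill". Unlisted classifiers get weight 1."""
    if not votes:
        raise ValueError("need at least one base-classifier vote")
    weights = weights or {}
    total = sum(weights.get(name, 1.0) for name in votes)
    positive = sum(weights.get(name, 1.0) for name, vote in votes.items() if vote)
    return positive > total / 2


votes = {"expert_keywords": True, "mined_keywords": True, "anomaly": False, "network": False}
print(meta_classify(votes))                           # prints False (2 of 4 is a tie)
print(meta_classify(votes, {"mined_keywords": 2.0}))  # prints True (3 of 5)
```

Weighting lets a more reliable base classifier break ties; in practice such weights would be fitted on held-out labeled users rather than set by hand.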


Conclusion

- In this paper, we have shown that it is possible to diagnose an individual from her social media data with high accuracy.
  - We combined long-term Twitter data with medical records.
  - We were able to find a signal of disease in most Twitter users who were sick.


Thought


Thanks for listening.
2015/6/8 (Mon.) @ MakeLab Lab Meeting