On the Ground Validation of Online Diagnosis with Twitter and Medical Records

Post on 30-Jul-2015


On the Ground Validation of Online Diagnosis with Twitter and Medical Records

2015/6/8 (Mon.)

Chang Wei-Yuan @ MakeLab Lab Meeting

Todd Bodnar, Victoria Barclay

WWW ’14

Outline

- Introduction
- Data Collection
- Feature Signals
- Meta Classifier
- Conclusion
- Thought

Introduction

- Social media has been considered as a data source for tracking disease.
  - However, most analyses are based on models of population-level disease rates.
- This paper develops a novel system for social-media-based disease detection at the individual level.

Introduction

- How do you track a disease?
  - Traditional systems
    - The doctor reports it
  - Self-reporting systems
    - The patient reports it
  - Data mining systems
    - Tweets report it
- Goal: extend data mining systems with high accuracy

Data Collection

- 104 students with Twitter accounts
- 35 diagnosed with influenza
- Study period: August 2012 - May 2013

Data Collection

- 52,301 tweets, 1,609 of them posted while sick
- 194,835 friends and followers
- 31,103,713 tweets from friends or followers
- 17 of 35 users who used Twitter while sick explicitly mentioned illness

Feature Signals

- Automated text classification
  - Expert keyword selection
  - Machine keyword selection
- Anomaly detection
- Network classification

Automated text classification

- We consider diagnosis based on the content of a user’s tweets.
- Expert keyword selection
  - A set of keywords that may signal influenza is defined by hand.
  - { flu, influenza, sick, cough, cold, medicine, fever, … }
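
A minimal sketch of what hand-picked keyword matching could look like. The keyword set below contains only the terms visible on the slide (the original list is truncated), and the function names and the `min_hits` threshold are hypothetical, not taken from the paper.

```python
import re

# Only the keywords shown on the slide; the full expert list is not known here.
EXPERT_KEYWORDS = {"flu", "influenza", "sick", "cough", "cold", "medicine", "fever"}


def keyword_hits(tweet, keywords=EXPERT_KEYWORDS):
    """Return the keywords that appear as whole words in a tweet."""
    tokens = set(re.findall(r"[a-z']+", tweet.lower()))
    return tokens & set(keywords)


def flag_user(tweets, keywords=EXPERT_KEYWORDS, min_hits=1):
    """Flag a user when at least `min_hits` tweets contain a keyword.

    `min_hits` is an illustrative threshold, not a value from the paper.
    """
    hits = sum(1 for tweet in tweets if keyword_hits(tweet, keywords))
    return hits >= min_hits
```

Whole-word matching is used so that, for example, "cold" does not match inside "scolded"; a real system would also need to handle negation and figurative use ("sick of homework").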

Automated text classification

- Machine keyword selection
  - We select keywords algorithmically by first finding the 12,393 most common keywords in the data set.
  - We then rank them by their information gain for predicting influenza.
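
The ranking step can be sketched as follows, assuming each document is represented as a set of tokens with a binary sick/healthy label. The function names and data layout are assumptions for illustration; only the vocabulary size (12,393) and the use of information gain come from the slide.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, word):
    """Entropy reduction in `labels` from knowing whether `word` is present.

    docs: list of token sets; labels: parallel list of class labels.
    """
    with_word = [y for d, y in zip(docs, labels) if word in d]
    without = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    conditional = (len(with_word) / n) * entropy(with_word) \
        + (len(without) / n) * entropy(without)
    return entropy(labels) - conditional


def rank_keywords(docs, labels, top_n_common=12393):
    """Keep the most common words, then sort them by information gain."""
    counts = Counter(word for doc in docs for word in doc)
    vocab = [word for word, _ in counts.most_common(top_n_common)]
    return sorted(vocab, key=lambda w: information_gain(docs, labels, w),
                  reverse=True)
```

A word that appears in every document carries no information about the label and ends up at the bottom of the ranking, while a word that appears only in "sick" documents rises to the top.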


Anomaly Detection

- In addition to affecting the content of individuals’ tweets, illness likely also affects the rate at which individuals tweet.
- To detect this, we perform one-dimensional anomaly detection on each user’s monthly tweeting rate.
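
The exact detection procedure is not preserved in this transcript. As a stand-in, the sketch below uses a leave-one-out z-score on a user's monthly tweet counts; the function name, the z-score approach, and the `threshold` value are all assumptions, not the paper's method.

```python
from statistics import mean, stdev


def anomalous_months(monthly_counts, threshold=2.0):
    """Return indices of months whose tweet count is unusual for this user.

    Each month is compared with the mean and standard deviation of the
    user's other months (leave-one-out z-score). `threshold` is an
    illustrative cut-off.
    """
    flagged = []
    for i, count in enumerate(monthly_counts):
        rest = monthly_counts[:i] + monthly_counts[i + 1:]
        if len(rest) < 2:
            continue  # not enough history to estimate spread
        spread = stdev(rest)
        if spread == 0:
            # Constant baseline: any different value counts as anomalous.
            if count != rest[0]:
                flagged.append(i)
            continue
        if abs(count - mean(rest)) / spread > threshold:
            flagged.append(i)
    return flagged
```

For a ten-month study period, `monthly_counts` would be a list of ten integers per user. Leaving the tested month out of the baseline keeps one extreme month from inflating the spread it is compared against.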


Network Classification

- Even if a user is not currently active on Twitter, users in her social network may give clues to her health status.
- Accounts that follow a user are referred to as her ‘followers,’ and accounts that a user follows are referred to as her ‘friends.’
- We consider all text that a user’s friends or followers tweeted and perform keyword analysis on it.
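
One simple way to turn network text into a feature is the share of friends' and followers' tweets that mention an illness keyword. This is a sketch under assumptions: the keyword set reuses the terms shown on the expert-keyword slide, and the function name and the choice of a rate as the feature are hypothetical.

```python
import re

# Keywords visible on the expert-keyword slide; the full list is not known here.
KEYWORDS = {"flu", "influenza", "sick", "cough", "cold", "medicine", "fever"}


def network_keyword_rate(network_tweets, keywords=KEYWORDS):
    """Fraction of tweets from a user's friends/followers containing a keyword."""
    if not network_tweets:
        return 0.0
    keyword_set = set(keywords)
    hits = 0
    for tweet in network_tweets:
        tokens = set(re.findall(r"[a-z']+", tweet.lower()))
        if tokens & keyword_set:
            hits += 1
    return hits / len(network_tweets)
```

The resulting rate could be compared across months or fed to a downstream classifier, so a user who is silent while ill can still produce a signal through messages such as "hope you get over the flu soon".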


Meta-classifiers

- So far we have considered five separate methods for detecting illness based on a user’s Twitter activity:
  - hand-chosen keyword analysis
  - data-mined keyword analysis
  - anomaly detection
  - network analysis

Meta-classifiers

- Aggregating multiple classifiers with a ‘meta classifier’ has been shown to be an effective method for increasing classification accuracy.
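
The transcript does not say which meta classifier was used. The sketch below shows two generic aggregation schemes, majority voting and weighted voting, over the base signals; the function names, signal names, and weights are hypothetical.

```python
def majority_vote(predictions):
    """predictions: dict mapping classifier name -> bool ("sick" vote).

    Returns True only when strictly more than half of the classifiers vote sick.
    """
    votes = list(predictions.values())
    return sum(votes) > len(votes) / 2


def weighted_vote(predictions, weights, threshold=0.5):
    """Weighted vote: each classifier contributes its weight when it votes sick.

    `weights` might reflect each classifier's validation accuracy; the
    values used here are illustrative only.
    """
    total = sum(weights[name] for name in predictions)
    if total <= 0:
        return False
    score = sum(weights[name] for name, vote in predictions.items() if vote)
    return score / total >= threshold
```

Usage, with made-up weights:

```python
preds = {"expert_keywords": True, "mined_keywords": True,
         "anomaly": False, "network": False}
weights = {"expert_keywords": 0.4, "mined_keywords": 0.3,
           "anomaly": 0.2, "network": 0.1}
majority_vote(preds)           # tie, so not a strict majority
weighted_vote(preds, weights)  # the two keyword signals carry most of the weight
```

Weighting lets a reliable signal outvote several weak ones, which plain majority voting cannot do.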


Conclusion

- In this paper, we have shown that it is possible to diagnose an individual from her social media data with high accuracy.
  - Combined long-term Twitter data with medical records
  - Able to find a signal of disease in most Twitter users who were sick

Thought

Thanks for listening.

2015/6/8 (Mon.) @ MakeLab Lab Meeting

v123582@gmail.com
