Surveillance of social media: Big data analytics

Presented by: Thomas Otto (Manager Business Intelligence) Dr. Mehnaz Adnan (Senior Scientist Health Intelligence) Institute of Environmental Science & Research Ltd. Credits: ESR – Dr. Mehnaz Adnan: Health Intelligence Analytics on Tweets ESR - Franco Andrews: SAP Data Integration, Modelling, Analytics and Visualisation ESR IT: Infrastructure / Firewall Soltius NZ - Erik Roelofs: Connection Module and SAP Data Services Syndromic Surveillance of Social Media - Big Data Analytics © ESR 2015

Upload: health-informatics-new-zealand

Post on 15-Apr-2017


TRANSCRIPT

Page 1: Surveillance of social media: Big data analytics

Presented by: Thomas Otto (Manager Business Intelligence)

Dr. Mehnaz Adnan (Senior Scientist Health Intelligence) Institute of Environmental Science & Research Ltd.

Credits:

ESR - Dr. Mehnaz Adnan: Health Intelligence Analytics on Tweets

ESR - Franco Andrews: SAP Data Integration, Modelling, Analytics and Visualisation

ESR IT: Infrastructure / Firewall

Soltius NZ - Erik Roelofs: Connection Module and SAP Data Services

Syndromic Surveillance of Social Media - Big Data Analytics

© ESR 2015

Page 2: Surveillance of social media: Big data analytics

Problem statement and hypothesis

• Individuals disclose a lot of personal information on Social Media channels (e.g. Facebook, Twitter)

• There is a lot of Social Media Data (SMD) out there, and:
  • It is very noisy
  • It is not verified
  • It needs to be curated (checked by a clinician)

• Personal information contains location, names and self-diagnosed syndromes

• SMD could be used to feed an early warning surveillance system

Page 3: Surveillance of social media: Big data analytics

Related work

1. How to exploit twitter for public health monitoring (http://goo.gl/sOx9xo)

2. Digital disease detection—harnessing the Web for public health surveillance. (http://goo.gl/fxwoJT)

3. Influenza forecasting with Google flu trends. (http://goo.gl/z7GZco)

1) Denecke, K., Krieck, M., Otrusina, L., Smrz, P., Dolog, P., Nejdl, W., & Velasco, E. (2013). How to exploit twitter for public health monitoring. Methods Inf Med, 52(4), 326-339.

2) Brownstein, J. S., Freifeld, C. C., & Madoff, L. C. (2009). Digital disease detection—harnessing the Web for public health surveillance. New England Journal of Medicine, 360(21), 2153-2157.

3) Dugas, A. F., Jalalpour, M., Gel, Y., Levin, S., Torcaso, F., Igusa, T., & Rothman, R. E. (2013). Influenza forecasting with Google flu trends. PLoS ONE, 8(2), e56176.

Page 4: Surveillance of social media: Big data analytics

What is Social Media?

Social media refers to the means of interactions among people in which they create, share, and/or exchange information and ideas in virtual communities and networks¹.

1 Tufts University, Boston, U.S.A.

2 Social Media Examiner: 2014 Social Media Marketing Industry Report

Page 5: Surveillance of social media: Big data analytics

What is Twitter?

Some Twitter statistics

• 1 billion users registered

• 255 million users per month

• 100 million users per day ³

1 Wikipedia

2 PEW Research Centre: January 2014

3 DMR: March 2014

4 Twitter Terms of Service as of 24/7/14

Page 6: Surveillance of social media: Big data analytics

Overview

• This is a proof of concept (POC)

• The POC is not yet used for surveillance or to monitor actual diseases

• This POC is an experimental application at ESR to understand the validity of the approach

Page 7: Surveillance of social media: Big data analytics

Method of data collection

[Diagram. Recoverable labels: Commercial / Government clients; Future work: Machine Learning (ML), Artificial Intelligence (AI) etc.]

Page 8: Surveillance of social media: Big data analytics

Study Design

• There was a measles outbreak in 2014

• From our Twitter data mart, we extracted the subset of tweets containing the keyword ‘measles’ for the period Jan 2014 to Dec 2015

• From EpiSurv, a national New Zealand surveillance system, we extracted the number of confirmed measles cases for the same period

• We performed quantitative data analysis on both data sets
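The slides do not say which quantitative analysis was run on the two data sets. One plausible first step, shown here as a minimal sketch only, is to bucket both series by ISO week and compute a Pearson correlation. The function names and the handful of dates in the usage note are hypothetical placeholders, not ESR data or results.

```python
from collections import Counter
from datetime import date
from math import sqrt


def weekly_counts(dates):
    """Count events per (ISO year, ISO week)."""
    return Counter(tuple(d.isocalendar()[:2]) for d in dates)


def align(tweet_dates, case_dates):
    """Return the sorted weeks plus tweet and case counts for each week.

    Weeks that appear in only one series get a count of 0 in the other.
    """
    tweets = weekly_counts(tweet_dates)
    cases = weekly_counts(case_dates)
    weeks = sorted(set(tweets) | set(cases))
    return weeks, [tweets[w] for w in weeks], [cases[w] for w in weeks]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Placeholder dates for illustration only (NOT the study's data).
example_tweets = [date(2014, 1, 6), date(2014, 1, 7), date(2014, 1, 14),
                  date(2014, 1, 20), date(2014, 1, 21), date(2014, 1, 22)]
example_cases = [date(2014, 1, 8), date(2014, 1, 15),
                 date(2014, 1, 23), date(2014, 1, 24)]

weeks, tweet_series, case_series = align(example_tweets, example_cases)
print(weeks, tweet_series, case_series, round(pearson(tweet_series, case_series), 3))
```

A correlation on raw weekly counts says nothing about which series leads the other; shifting one series by a week or two before calling `pearson` would be the natural next experiment for an early warning use case.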

Page 9: Surveillance of social media: Big data analytics

Results

Page 10: Surveillance of social media: Big data analytics

Limitations

• Number of tweets collected for measles: 1408

• Data curation was based on a single keyword

• Use of the free Twitter API 1.1 (limits volume and timeliness)

Page 11: Surveillance of social media: Big data analytics

Social Media (Twitter) – Visualisation / Front-End

[Screenshot of the front-end. Callouts: Select keyword (Measles); Zoom into WLG; Basic stats by location; Measles Tweets; Current, active keywords]

Page 12: Surveillance of social media: Big data analytics

Conclusion

• We believe that Social Media Data (SMD) is a relevant source of information

• Storage is potentially challenging (it has aspects of Big Data)

• Cleansing (it needs to be curated)
  • A mixed approach between machine automation and human verification (e.g. by a clinician)

• Curated SMD will be the source for downstream analytics and early warning systems (syndromic surveillance)
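The mixed approach of machine automation and human verification named in the conclusion could be organised as a simple triage queue. The sketch below is an assumption about how such a workflow might look, not a description of the ESR proof of concept. The class name, the noise heuristics (links and retweets) and the status labels are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical noise heuristics: tweets with links or retweets are
# treated as news or reposts rather than self-reports.
NOISE_MARKERS = ("http://", "https://", "rt @")


@dataclass
class CurationQueue:
    keyword: str
    rejected: list = field(default_factory=list)
    needs_review: list = field(default_factory=list)
    verified: list = field(default_factory=list)

    def triage(self, tweet_text):
        """Machine step: drop obvious noise, queue the rest for a clinician."""
        text = tweet_text.lower()
        if self.keyword not in text or any(m in text for m in NOISE_MARKERS):
            self.rejected.append(tweet_text)
            return "rejected"
        self.needs_review.append(tweet_text)
        return "needs_review"

    def clinician_decision(self, tweet_text, is_real_case):
        """Human step: a clinician accepts or rejects a queued tweet.

        Raises ValueError if the tweet is not waiting for review.
        """
        self.needs_review.remove(tweet_text)
        target = self.verified if is_real_case else self.rejected
        target.append(tweet_text)


queue = CurationQueue(keyword="measles")
print(queue.triage("I think my son has measles, spots everywhere"))
print(queue.triage("RT @news: measles outbreak declared http://example.com"))
```

The machine step only ever rejects or defers; nothing reaches the `verified` list without a human decision, which keeps the automated part from vouching for unverified data.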

Page 13: Surveillance of social media: Big data analytics

Future work

• Potentially use a Twitter data aggregator, or a paid Twitter API connection (higher volumes, better timeliness)

• Extend the Linguistic Analytical Module by applying:
  • Machine Learning (sentiment analysis, linear regression analysis etc.)
  • Prediction

• Evaluate the DeepDive engine from Stanford University (http://deepdive.stanford.edu/)

• Develop an ontology of syndromic keywords related to specific diseases (e.g. spots, rash, itching for measles)
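The proposed ontology of syndromic keywords could start as a plain mapping from disease to symptom terms. The following is a minimal sketch under that assumption. Only the measles entry (spots, rash, itching) comes from the slide; the structure, the function names and the tokenisation rule are hypothetical.

```python
import re

# Hypothetical ontology structure. Only the measles keywords are taken
# from the slide; further diseases would be added by domain experts.
SYNDROME_ONTOLOGY = {
    "measles": {"measles", "spots", "rash", "itching"},
}


def tokenize(text):
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def match_diseases(tweet_text, ontology=SYNDROME_ONTOLOGY):
    """Return {disease: sorted matched keywords} for every disease with a hit."""
    tokens = tokenize(tweet_text)
    hits = {}
    for disease, keywords in ontology.items():
        matched = sorted(tokens & keywords)
        if matched:
            hits[disease] = matched
    return hits


print(match_diseases("Woke up with a rash and itching all over"))
```

A keyword hit marks a tweet as a candidate only. Terms such as "rash" are not specific to one disease, so matches would still need curation before they feed any analysis.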

Page 14: Surveillance of social media: Big data analytics

[Diagram: confidence that a tweet is a real event (0 % to 100 %) plotted against time. Unverified Twitter data: ? %]

Enrich Twitter data set with other, verified data (counts, location, time):

• Verify with Health Line data

• Verify with Health Stats data

• Verify with Lab Information Data

• Verify with Sentinel data

• Verify with National Surveillance Database

PRO: Nature scientific journal: There is a close correlation between the rates of doctor visits for flu symptoms, and the use of flu-like search terms. (NZ Herald 23/7/14)

CON: Researchers from Harvard University state: Google Flu Tracker has overestimated for 100 of the 108 weeks starting from August 2011. (Source: motherboard.vice.com)

VISION - Big Data complements traditional methods Calibrate social media data with verified and trusted data to identify valid tweets

Risk Opportunity

• Social Media Data is validated and a trusted source of information

• May be used for indicative, early warnings of potential outbreaks?

[Diagram labels: Google Flu Tracker ??; NZ; 50 %]
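The slide does not define how confidence in a tweet would be calculated. As one illustration of calibrating against verified data, the sketch below scores a tweet signal by the fraction of verified sources that report a case at the same location in the same week. Equal weighting of sources, the `Signal` type, the function name and the example entries are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    location: str
    week: tuple  # (ISO year, ISO week)


def corroboration_score(tweet_signal, verified_sources):
    """Fraction of verified sources that report the same location and week.

    verified_sources maps a source name to a set of Signal objects.
    Every source counts equally, which is an assumption of this sketch.
    """
    if not verified_sources:
        return 0.0
    agreeing = sum(1 for signals in verified_sources.values()
                   if tweet_signal in signals)
    return agreeing / len(verified_sources)


# Placeholder entries for illustration only (NOT real surveillance data).
tweet = Signal("Wellington", (2014, 23))
sources = {
    "Health Line": {Signal("Wellington", (2014, 23))},
    "Health Stats": set(),
    "Lab Information": {Signal("Wellington", (2014, 23))},
    "Sentinel": {Signal("Auckland", (2014, 23))},
    "National Surveillance Database": set(),
}
print(corroboration_score(tweet, sources))
```

In practice the sources differ in delay and reliability, so a weighted score or a time window wider than one week may fit better than this exact match.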

Page 15: Surveillance of social media: Big data analytics

Some Social Media exploration tools

• Try the tools below first and see what benefit they offer (this is a random selection and does not rate or recommend any of the tools in particular)

• PlusOne Social (http://plusonesocial.com)

• Microsoft Excel Twitter add-in (http://goo.gl/WBaXt5)

• Others
  • http://www.razorsocial.com/free-twitter-analytics/
  • http://www.socialmediaexaminer.com/6-twitter-analytics-tools/
