lag models with social response outcomes · 2020-02-02 · social media data can be matched to...

25
Michigan SAS Conference June 7, 2017 Lag Models with Social Response Outcomes David J. Corliss, PhD Director, Peace-Work

Upload: others

Post on 13-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lag Models with Social Response Outcomes · 2020-02-02 · Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling Different social

Michigan SAS ConferenceJune 7, 2017

Lag Models with Social Response Outcomes

David J. Corliss, PhD

Director, Peace-Work

Page 2: Lag Models with Social Response Outcomes · 2020-02-02 · Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling Different social

OUTLINE

Lag Models

Social Media Data

Linking Solar Storms & Tweets

Hate Tweets and ViolenceGoogle Trends

Summary

Page 3: Lag Models with Social Response Outcomes · 2020-02-02 · Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling Different social

Lag Models – An Example

Page 4: Lag Models with Social Response Outcomes · 2020-02-02 · Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling Different social

Creating Lagged Variables%macro lag(var1,var2,lag_n);data lag_data;

set source_data;

x1_lag0 = &var1;x2_lag0 = &var2;

%do i = 1 %to &lag_n; %let j = %eval(&i - 1);x1_lag&i = lag(x1_lag&j);x2_lag&i = lag(x2_lag&j);

%end;run;%mend;

%lag(gdp,unemployment,3);

Page 5: Lag Models with Social Response Outcomes · 2020-02-02 · Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling Different social

A Model With Lagged Variables

proc reg data=ht_data;model count = count_tminus1 forecl_change_rate

PCI rtw_state Poverty_pct_tminus2 ;run;

Page 6: Lag Models with Social Response Outcomes · 2020-02-02 · Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling Different social

Social Media Data

Different Social Media channels have unique characteristics

Twitter: Short duration and highly reactive

Facebook: medium term, extended conversation threads

Google Trends: May be more predictive. The Social Media equivalent of Durable Goods.

YouTube – the #2 Social Media channel - and #3 Instagram: –no effective means to mine images at this time

Page 7: Lag Models with Social Response Outcomes · 2020-02-02 · Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling Different social

Data Source: Geomagnetic World Data Center / Kyoto

Solar StormDST < (μ - 2σ) = -38.2

Calm

Solar Eruptions and Twitter Data

Page 8: Lag Models with Social Response Outcomes · 2020-02-02 · Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling Different social

#techfail: Tweets & Astrophysics

Solar Storms(DST < 2σ below long-term baseline)

#techfail Count(Tweets)

Page 9: Lag Models with Social Response Outcomes · 2020-02-02 · Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling Different social

#techfail: Tweets & Astrophysics

Time Window: 2 to 4 days following a storm

16.0% of all days follow a solar storm

Top #techfail days: 57.9% follow a solar storm

Burst of #techfail tweets 3.6 times more likely

Statistically significant at > 99% confidence

Page 10: Lag Models with Social Response Outcomes · 2020-02-02 · Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling Different social

Mining Twitter Feeds: A Process

1. Register with Twitter as a developer to get access

2. API to deploy search terms and receive tweets

3. Parse plain text tweet stream into a SAS dataset

4. Exploratory Data Analysis to find best search terms

5. High-volume search using results from #4

Page 11: Lag Models with Social Response Outcomes · 2020-02-02 · Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling Different social

1. Register With Twitter

https://dev.twitter.com/oauth/overview

Page 12: Lag Models with Social Response Outcomes · 2020-02-02 · Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling Different social

2. Twitter API: Obtain Tokens

1. consumerKey: Twitter User ID

2. consumerSecret: Twitter login password

3. Bearer Token: Used in SAS code to access tweets

Page 13: Lag Models with Social Response Outcomes · 2020-02-02 · Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling Different social

2. Twitter API: SAS Code/* Code from Isabel Litton and Rebecca Ottesen, California Polytechnic State University. Paper: "%GrabTweet: A SAS® Macro to Read JSON Formatted Tweets", WUSS 2013. Modified by D. Corliss 2017 to parameterize TYPE */

%MACRO grab_tweet_100(search_term, type, target_amount);

/* Location of Token File */filename auth "C:\dcorliss\SAS\Twitter\token.txt";

/* Location of Output File */filename twtOut "C:\dcorliss\SAS\Twitter\Tweets.txt";

/* Set search parameters: COUNT = and TYPE = Popular, Recent, or Mixed */

%IF &target_amount < 100 %THEN %LET num_tweet = %NRSTR(&count=)&target_amount;

%ELSE %LET num_tweet = %NRSTR(&count=100);%LET type = %NRSTR(&result_type=&type);

Page 14: Lag Models with Social Response Outcomes · 2020-02-02 · Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling Different social

Raw Twitter Text Output:

3. Parse Plain Text Tweet Stream

Page 15: Lag Models with Social Response Outcomes · 2020-02-02 · Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling Different social

3. Parse Plain Text Tweet Stream/* Code from Isabel Litton and Rebecca Ottesen, California Polytechnic State University. Paper: "%GrabTweet: A SAS® Macro to Read JSON Formatted Tweets", WUSS 2013 */

INPUT @'"created_at":' created_at1@'"id":' tweet_id@'"text":' text@'"user":{"id":' user_id@'"name":' name@'"screen_name":' screen_name@'"location":' location@'"created_at":' created_at2@'"lang":' lang@'"contributors":null,' retweeted @@;

Page 16: Lag Models with Social Response Outcomes · 2020-02-02 · Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling Different social

4. Exploring Search Terms**** Mining Twitter for Hate Speech ****;

%grab_tweet_100(Sharia America hate, recent, 100);

Page 17: Lag Models with Social Response Outcomes · 2020-02-02 · Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling Different social

5. High-Volume Search

1. Litton-Ottesen macro to access Twitter-imposed

limit of 100 tweets at a time

2. Macro wrapper to repeat until specified count,

going backwards from end of previous iteration

3. Data transformation required for time series

dataset

Page 18: Lag Models with Social Response Outcomes · 2020-02-02 · Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling Different social

Analytic Results

Page 19: Lag Models with Social Response Outcomes · 2020-02-02 · Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling Different social

Analytic Results

Page 20: Lag Models with Social Response Outcomes · 2020-02-02 · Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling Different social

Analytic Results

Page 21: Lag Models with Social Response Outcomes · 2020-02-02 · Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling Different social

Analytic Results

Page 22: Lag Models with Social Response Outcomes · 2020-02-02 · Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling Different social

Analytic Results

Olathe, KansasAttack

2/22/2017

Page 23: Lag Models with Social Response Outcomes · 2020-02-02 · Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling Different social

Google Trends: A Leading Indicator?

Google TrendsSearch Term: Sharia

Page 24: Lag Models with Social Response Outcomes · 2020-02-02 · Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling Different social

Summary

Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling

Different social channels have different characteristics, including lead or lag times

Analytics methods optimized for big data methods may be necessary

Twitter data can be mined and analyzed at high volume

Analysis of Hate Crimes with social media data shows promise but better crime data is needed.

Page 25: Lag Models with Social Response Outcomes · 2020-02-02 · Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling Different social

Questions

David J. Corliss

[email protected]

www.peace-work.org