lag models with social response outcomes · 2020-02-02 · social media data can be matched to...
TRANSCRIPT
Michigan SAS ConferenceJune 7, 2017
Lag Models with Social Response Outcomes
David J. Corliss, PhD
Director, Peace-Work
OUTLINE
Lag Models
Social Media Data
Linking Solar Storms & Tweets
Hate Tweets and ViolenceGoogle Trends
Summary
Lag Models – An Example
Creating Lagged Variables%macro lag(var1,var2,lag_n);data lag_data;
set source_data;
x1_lag0 = &var1;x2_lag0 = &var2;
%do i = 1 %to &lag_n; %let j = %eval(&i - 1);x1_lag&i = lag(x1_lag&j);x2_lag&i = lag(x2_lag&j);
%end;run;%mend;
%lag(gdp,unemployment,3);
A Model With Lagged Variables
proc reg data=ht_data;model count = count_tminus1 forecl_change_rate
PCI rtw_state Poverty_pct_tminus2 ;run;
Social Media Data
Different Social Media channels have unique characteristics
Twitter: Short duration and highly reactive
Facebook: medium term, extended conversation threads
Google Trends: May be more predictive. The Social Media equivalent of Durable Goods.
YouTube – the #2 Social Media channel - and #3 Instagram: –no effective means to mine images at this time
Data Source: Geomagnetic World Data Center / Kyoto
Solar StormDST < (μ - 2σ) = -38.2
Calm
Solar Eruptions and Twitter Data
#techfail: Tweets & Astrophysics
Solar Storms(DST < 2σ below long-term baseline)
#techfail Count(Tweets)
#techfail: Tweets & Astrophysics
Time Window: 2 to 4 days following a storm
16.0% of all days follow a solar storm
Top #techfail days: 57.9% follow a solar storm
Burst of #techfail tweets 3.6 times more likely
Statistically significant at > 99% confidence
Mining Twitter Feeds: A Process
1. Register with Twitter as a developer to get access
2. API to deploy search terms and receive tweets
3. Parse plain text tweet stream into a SAS dataset
4. Exploratory Data Analysis to find best search terms
5. High-volume search using results from #4
1. Register With Twitter
https://dev.twitter.com/oauth/overview
2. Twitter API: Obtain Tokens
1. consumerKey: Twitter User ID
2. consumerSecret: Twitter login password
3. Bearer Token: Used in SAS code to access tweets
2. Twitter API: SAS Code/* Code from Isabel Litton and Rebecca Ottesen, California Polytechnic State University. Paper: "%GrabTweet: A SAS® Macro to Read JSON Formatted Tweets", WUSS 2013. Modified by D. Corliss 2017 to parameterize TYPE */
%MACRO grab_tweet_100(search_term, type, target_amount);
/* Location of Token File */filename auth "C:\dcorliss\SAS\Twitter\token.txt";
/* Location of Output File */filename twtOut "C:\dcorliss\SAS\Twitter\Tweets.txt";
/* Set search parameters: COUNT = and TYPE = Popular, Recent, or Mixed */
%IF &target_amount < 100 %THEN %LET num_tweet = %NRSTR(&count=)&target_amount;
%ELSE %LET num_tweet = %NRSTR(&count=100);%LET type = %NRSTR(&result_type=&type);
Raw Twitter Text Output:
3. Parse Plain Text Tweet Stream
3. Parse Plain Text Tweet Stream/* Code from Isabel Litton and Rebecca Ottesen, California Polytechnic State University. Paper: "%GrabTweet: A SAS® Macro to Read JSON Formatted Tweets", WUSS 2013 */
INPUT @'"created_at":' created_at1@'"id":' tweet_id@'"text":' text@'"user":{"id":' user_id@'"name":' name@'"screen_name":' screen_name@'"location":' location@'"created_at":' created_at2@'"lang":' lang@'"contributors":null,' retweeted @@;
4. Exploring Search Terms**** Mining Twitter for Hate Speech ****;
%grab_tweet_100(Sharia America hate, recent, 100);
5. High-Volume Search
1. Litton-Ottesen macro to access Twitter-imposed
limit of 100 tweets at a time
2. Macro wrapper to repeat until specified count,
going backwards from end of previous iteration
3. Data transformation required for time series
dataset
Analytic Results
Analytic Results
Analytic Results
Analytic Results
Analytic Results
Olathe, KansasAttack
2/22/2017
Google Trends: A Leading Indicator?
Google TrendsSearch Term: Sharia
Summary
Social media data can be matched to event data for Time Series Analysis, including lead / lag modeling
Different social channels have different characteristics, including lead or lag times
Analytics methods optimized for big data methods may be necessary
Twitter data can be mined and analyzed at high volume
Analysis of Hate Crimes with social media data shows promise but better crime data is needed.