From Twitter API to Social Science Paper. Presentation for the ICOS Big Data Boot Camp. Todd Schifeling, 5/22/14


Page 1

From Twitter API to Social Science Paper

Presentation for the ICOS Big Data Boot Camp

Todd Schifeling, 5/22/14

Page 2

Outline

I. Collecting Twitter Data with a Snowball

II. Motivation for Collecting the Data
    i. Big Data-Social Science Divide
    ii. Possible Solutions

Page 3

Snowballing Twitter Data

Procedure:
• starting point
• network search
• selection principle

NOTES ON SNOWBALLING TWITTER DATA

Page 4

Snowballing Twitter Data

Procedure:
• starting point: Scratchtruck
• network search: friends
• selection principle: self-description matches 2 dictionaries
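The three-part procedure above can be sketched as a breadth-first snowball. This is a minimal illustration, not the presenter's code. The callables `get_friends` and `get_description` are hypothetical stand-ins for the Twitter calls, the two term sets are invented examples (the slides do not list the actual dictionaries), and "matches 2 dictionaries" is assumed to mean at least one term from each.

```python
import re

# Hypothetical dictionaries; the actual terms are not given in the slides.
FOOD_TERMS = {"taco", "bbq", "gourmet", "cupcake"}
VEHICLE_TERMS = {"truck", "cart", "mobile"}


def matches_both(description):
    """Assumed selection rule: at least one term from each dictionary."""
    words = set(re.findall(r"[a-z0-9]+", description.lower()))
    return bool(words & FOOD_TERMS) and bool(words & VEHICLE_TERMS)


def snowball(seed, get_friends, get_description, max_waves=3):
    """Start at `seed`, search friends, keep accounts that pass the rule.

    Only selected accounts are expanded in the next wave.
    """
    selected = {seed}   # the starting point is taken as given
    seen = {seed}       # accounts already examined ("already done")
    frontier = [seed]
    for _ in range(max_waves):
        next_frontier = []
        for user in frontier:
            for friend in get_friends(user):
                if friend in seen:
                    continue
                seen.add(friend)
                if matches_both(get_description(friend)):
                    selected.add(friend)
                    next_frontier.append(friend)
        frontier = next_frontier
    return selected


# Toy in-memory network standing in for live API calls.
toy_friends = {"scratchtruck": ["a", "b"], "a": ["c"], "b": ["d"], "c": []}
toy_bios = {
    "a": "Gourmet taco truck in LA",
    "b": "I love dogs",
    "c": "BBQ cart, downtown",
    "d": "Cupcake truck",
}
result = snowball(
    "scratchtruck",
    lambda u: toy_friends.get(u, []),
    lambda u: toy_bios.get(u, ""),
)
print(sorted(result))
```

In the toy network, account "d" would pass the rule but is never reached, because its only path runs through the unselected account "b". A snowball of this kind only sees what is reachable through selected accounts.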


Page 5

Twitter Data Calls

• friends.ids returns friendship ties (from, to)
  – 5000 per call at one minute per call = 5000 friendship ties per minute (but only one user per minute)

• users.lookup returns user info (name, description, location, last tweet, etc.)
  – 100 per call at six seconds per call = 1000 users per minute

More info at https://dev.twitter.com/docs/api/1.1
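The per-call figures above translate directly into collection time. The sketch below is a rough estimator under stated assumptions: the page sizes and call intervals are the ones on the slide, every account costs at least one friends.ids call, and accounts with more than 5000 friends need one extra call per additional 5000. The function names are illustrative only.

```python
import math

FRIENDS_PER_CALL = 5000        # friends.ids ties per call (from the slide)
SECONDS_PER_FRIENDS_CALL = 60  # one call per minute (from the slide)
USERS_PER_LOOKUP = 100         # users.lookup profiles per call (from the slide)
SECONDS_PER_LOOKUP_CALL = 6    # one call per six seconds (from the slide)


def friends_ids_seconds(friend_counts):
    """Seconds needed to fetch the friend lists of the given accounts.

    Assumes at least one call per account, plus one call per extra
    5000 friends for very large accounts.
    """
    calls = sum(max(1, math.ceil(n / FRIENDS_PER_CALL)) for n in friend_counts)
    return calls * SECONDS_PER_FRIENDS_CALL


def users_lookup_seconds(n_users):
    """Seconds needed to look up `n_users` profiles in batches of 100."""
    return math.ceil(n_users / USERS_PER_LOOKUP) * SECONDS_PER_LOOKUP_CALL


print(friends_ids_seconds([3002]))            # one account, one call
print(friends_ids_seconds([3002, 12000, 0]))  # 1 + 3 + 1 calls
print(users_lookup_seconds(3002))             # 31 batches of up to 100
```

Under these assumptions the friend lists are the bottleneck: looking up a few thousand profiles takes minutes, while fetching friend lists costs at least one minute per account.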


Page 6

Snowballing Twitter Data

Results:

Step  Time                  Possible  Already Done  Selected  Collected  Friends
1     1 min                 1         0             1         1          3002
2     1 hr 42 mins          3002      0             91        88         106769
3     3 days 4 hrs 24 mins  67764     2383          4359      4324       2511143
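A short calculation makes the yield of each step easier to read. The column meanings are not defined on the slide, so this sketch assumes that "Possible" counts candidate accounts, "Already Done" counts candidates examined in earlier steps, "Selected" counts new accounts passing the selection principle, and "Friends" counts ties retrieved from the "Collected" accounts.

```python
# Rows copied from the results table, one dict per snowball step.
steps = [
    {"step": 1, "possible": 1, "already_done": 0,
     "selected": 1, "collected": 1, "friends": 3002},
    {"step": 2, "possible": 3002, "already_done": 0,
     "selected": 91, "collected": 88, "friends": 106769},
    {"step": 3, "possible": 67764, "already_done": 2383,
     "selected": 4359, "collected": 4324, "friends": 2511143},
]


def summarize(row):
    """Selection rate among new candidates and mean friends per collected account."""
    new_candidates = row["possible"] - row["already_done"]  # assumed reading
    return {
        "step": row["step"],
        "selection_rate": row["selected"] / new_candidates,
        "mean_friends": row["friends"] / row["collected"],
    }


summaries = [summarize(row) for row in steps]
for s in summaries:
    print(f"step {s['step']}: {s['selection_rate']:.1%} selected, "
          f"{s['mean_friends']:.0f} friends per collected account")
```

On this reading, roughly 3% of candidates are selected at step 2 and roughly 7% at step 3.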

Page 7

Workflow for Food Trucks Paper

• Get Twitter data on possible trucks
• Identify trucks
• Get idiosyncratic trucks from Twitter via in-degree
• Match trucks to cities
• Get additional data (demographics, chains, microbreweries, weather, etc.)
• Regressions!

Co-author: Daphne Demetry, Northwestern University
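The in-degree step in the workflow above can be illustrated with the (from, to) ties that friends.ids returns. The slides do not spell out the rule, so this sketch assumes one plausible reading: an account that missed the dictionary match but is followed by several identified trucks is flagged for review. The function names and the threshold are hypothetical.

```python
from collections import Counter


def in_degree(ties, sources=None):
    """Count incoming ties per account.

    `ties` is an iterable of (from, to) pairs. If `sources` is given,
    only ties that originate from those accounts are counted.
    """
    counts = Counter()
    for src, dst in ties:
        if sources is None or src in sources:
            counts[dst] += 1
    return counts


def candidates_by_in_degree(ties, known, min_in_degree):
    """Accounts outside `known` that at least `min_in_degree` known accounts follow."""
    counts = in_degree(ties, sources=known)
    return sorted(
        user for user, k in counts.items()
        if user not in known and k >= min_in_degree
    )


# Toy ties: t1..t3 are identified trucks, x and y are unclassified accounts.
ties = [
    ("t1", "x"), ("t2", "x"), ("t3", "x"),
    ("t1", "y"), ("z", "y"),
    ("t1", "t2"),
]
known_trucks = {"t1", "t2", "t3"}
flagged = candidates_by_in_degree(ties, known_trucks, min_in_degree=2)
print(flagged)
```

Here "x" is flagged because three known trucks follow it, while "y" is not: only one of its two followers is a known truck.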


Page 8

Now We're Doing Social Science!

Page 9

But Why Collect Twitter Data on Gourmet Food Trucks?


Page 15

How Well Do They Mesh?

SURVEYING THE DIVIDE

             Social Science          Big Data
Measurement  fidelity        IDEAL   large unobtrusive N
Sampling     random          CHASM   digital breadcrumbs
Causality    realism         CHASM   description

Page 16

The Fallout

Page 17

A Possible Way Forward

Identify populations that simultaneously inhabit both offline and online worlds…

…which links sampling frames to available breadcrumbs, and ‘real’ to digital phenomena

POSSIBLE SOLUTIONS

Page 18

A Typology of Examples that Cross the Offline/Online Divide

1. Offline activities that are more common online or are difficult to observe offline:
   – rare or deviant subcultures
   – bullying, deception, and other bad behaviors

Page 19

A Typology of Examples that Cross the Offline/Online Divide

2. Offline activities with a significant online share:
   – dating markets
   – reviews of restaurants, books, movies, consumer goods, etc.
   – neighborhood activism

Page 20

A Typology of Examples that Cross the Offline/Online Divide

3. Offline activities that are also born online:
   – crowdsourcing projects
   – modern political ads
   – start-ups

Page 21

Why the Case of Gourmet Food Trucks Bridges Offline and Online

• A new organizational form

• Twitter is crucial to the operations of the trucks

• Golden breadcrumbs get left behind

Page 22

Comparison of Twitter Data to Standard Organizational Data

• Advantages: user-generated data, unfiltered by a mediating data collector; digital breadcrumbs track organizational activity; relational data

• Disadvantages: less systematic comparison across organizations; have to clean and validate data yourself