AAPOR - comparing found data from social media and made data from surveys
DESCRIPTION
This presentation was given at the 2014 AAPOR conference and covers specific ways in which "big data" from social media differs from data acquired through surveys.
TRANSCRIPT
![Page 1: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/1.jpg)
"When Are Big Data Methods Trustworthy for Social Measurement?"
Cliff Lampe (@clifflampe), Josh Pasek, Lauren Guggenheim, Fred Conrad
University of Michigan
Michael Schober
The New School for Social Research
![Page 2: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/2.jpg)
Presenting on “Big Data”
• Cliff Lampe
– University of Michigan School of Information
– Social Scientist who uses some Big Data techniques
– NOT A REAL DATA SCIENTIST
– Background in survey research
![Page 3: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/3.jpg)
Mostly publish in Computer Science conferences
![Page 4: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/4.jpg)
CHI – Computer Human Interaction
KDD – Knowledge Discovery and Data Mining
WSDM – Web Search and Data Mining
![Page 5: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/5.jpg)
Ironically Data-Free Presentation
Today we are presenting on methodological issues of Big Social Data and surveys. Not presenting new data.
First we describe Big Data and Big Social Data as terms.
Then we describe methodological considerations at the intersection of surveys and Big Social Data
![Page 6: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/6.jpg)
There have been many hyperbolic claims about Big Data
Is Big Data going to replace other forms of social measurement, or is it too flawed to survive? (HINT: Neither)
![Page 7: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/7.jpg)
What is Big Data?
![Page 8: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/8.jpg)
Big Data started in the physical sciences
![Page 9: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/9.jpg)
Big Data is increasingly being applied to social science questions
![Page 10: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/10.jpg)
What counts as “big”?
LHC: .001% of sensors lead to 25 petabytes annually.
Wikipedia: 17 terabytes
Twitter: ~ 10 GB/day
How many observations needed to count as “big”?
Note: 100 million records is not all that big.
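To put the figures on this slide on a common scale, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes) and that the Twitter rate of ~10 GB/day holds constant for a full year; neither assumption comes from the slides, and the Wikipedia figure is a total size rather than an annual rate.

```python
# Rough scale comparison using only the figures quoted on the slide.
# Assumptions (not from the slides): decimal units, constant daily rate.
GB = 10**9
TB = 10**12
PB = 10**15

lhc_per_year = 25 * PB             # "25 petabytes annually"
wikipedia_total = 17 * TB          # "17 terabytes" (a total, not a rate)
twitter_per_year = 10 * GB * 365   # "~10 GB/day", extrapolated to one year

print(f"Twitter per year: {twitter_per_year / TB:.2f} TB")
print(f"LHC / Twitter (annual): {lhc_per_year / twitter_per_year:,.0f}x")
print(f"Wikipedia / Twitter-year: {wikipedia_total / twitter_per_year:.1f}x")
```

Under these assumptions a year of Twitter text is a few terabytes, several thousand times smaller than the LHC figure, which is one way to read the slide's point that record counts alone do not make data "big".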
![Page 11: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/11.jpg)
Almost nobody who uses these techniques would use the term “big data”. Similar to surveys vs. polls.
Big Data is shorthand for a variety of techniques that include:
- Data capture
- Data storage
- Data analytics
- Search and Retrieval
![Page 12: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/12.jpg)
Challenges in “Big Data”
Capture
Curation
Storage
Search
Sharing
Transfer
Analysis
Visualization
Related terms:
Computational social science, data science, information access and retrieval, Web-scale data, data mining, machine learning, non-reactive data
![Page 13: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/13.jpg)
Big Social Data: large data sets about humans that are collected from social interactions captured online, primarily in social media sites.
![Page 14: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/14.jpg)
What are the characteristics of surveys and Big Social Data that define when they are complementary, supplementary, or orthogonal?
![Page 15: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/15.jpg)
Bob Groves
“Three Eras of Survey Research”
Mick Couper
“Is the Sky Falling? New Technology, Changing Media, and the Future of Surveys”
![Page 16: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/16.jpg)
Survey Research
80+ years of research and practice
Sampling procedures
Question design
Estimating precision of statistics
Practices in reducing survey error
Attempt to represent the population of interest with a sample
![Page 17: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/17.jpg)
Research Questions
• Do we see big social data and survey data telling us the same things about society? When and why might this happen?
• How do survey data and big social data compare on important dimensions?
• In what ways are the two fundamentally different from each other?
• How are their uses different from one another?
![Page 18: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/18.jpg)
Highlighting 3 Areas of Concern
How participants understand the activity of responding or posting
Different motivations and communicative dynamics
Nature of the data
Different structure, users, and data properties
Practical, ethical, and analytic considerations
![Page 19: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/19.jpg)
Participants’ Understanding
![Page 20: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/20.jpg)
Participants’ Understanding
– Posting initiative or motivation
– Informed consent
– Ability to opt out
– Prior considerations
– User identity
– Perceived audience and social desirability
– Time pressure/synchrony
– Respondent burden
![Page 21: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/21.jpg)
Participants’ Understanding
• Nature of perceived audience
– Survey: Interviewer, Organization, others in HH
– BSD: Groups of friends, acquaintances, public
• Social Desirability
– Survey: Avoid negative evaluations from researcher
– BSD: Manage impressions for their audience
• Scale of data
• Face threatening topics
![Page 22: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/22.jpg)
Participants’ Understanding
• Identity of user
– Survey: Kept anonymous
– BSD: User-created persona. Multiple users on a single account, multiple accounts for one user, corporate users, etc.
• Prior Considerations
– Survey: May not have thought about issue
– BSD: Have thought about it, maybe not deeply
• Being asked vs caring to post
![Page 23: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/23.jpg)
Nature of the Data
![Page 24: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/24.jpg)
Nature of the Data
– Population coverage
– Sampled units
– Sampling
– Sample size
– Temporal properties
– Relevance to research topic
– Granularity of possible analyses
– Data structure
– Auxiliary information
![Page 25: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/25.jpg)
Nature of the Data
• Sampling
– Surveys: Representative of population of interest (via probability sampling)
– BSD: Users/messages not the full population. User accounts are not always users. Frequency of posting among users varies
• Sample Size
– Surveys: Balance between large enough to make inference and low cost
– BSD: More users and posts than surveys. Limited by access/storage.
• Can size help overcome sampling/representativeness problems?
• The aggregation of SM does not necessarily map on to collection of individual users in survey research
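The question of whether size can overcome representativeness problems can be illustrated with a toy simulation. This is only a sketch under invented assumptions: the population size, the true share holding an opinion, and the posting probabilities (`post_prob_yes`, `post_prob_no`) are hypothetical numbers chosen for illustration, not values from the presentation or from any real platform.

```python
import random

def simulate(seed=0, population_size=200_000, true_share=0.5,
             survey_n=1_000, post_prob_yes=0.6, post_prob_no=0.3):
    """Compare a small random sample with a large self-selected one.

    All parameter values are hypothetical and exist only for illustration.
    """
    rng = random.Random(seed)
    # 1 = holds the opinion, 0 = does not.
    population = [1 if rng.random() < true_share else 0
                  for _ in range(population_size)]

    # "Made" data: a simple random sample drawn from the whole population.
    survey = rng.sample(population, survey_n)

    # "Found" data: whoever chose to post. People holding the opinion are
    # assumed to post more often, so selection depends on the outcome.
    found = [x for x in population
             if rng.random() < (post_prob_yes if x else post_prob_no)]

    return {
        "truth": sum(population) / population_size,
        "survey_estimate": sum(survey) / survey_n,
        "found_estimate": sum(found) / len(found),
        "found_n": len(found),
    }

result = simulate()
for name, value in result.items():
    print(name, value)
```

In this setup the self-selected data set is far larger than the survey sample, yet its estimate stays off target because adding more self-selected posts does not change who chooses to post. Real social media data may be more or less biased than this toy case.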
![Page 26: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/26.jpg)
Nature of the Data
• Temporal properties:
– Surveys: Memory retrieval, measurement at discrete moments
– BSD: Posting on recent events, continuously
• Auxiliary data:
– Surveys: Paradata (# calls, behavior during interview)
– BSD: Geolocation, system activity, profile info
![Page 27: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/27.jpg)
Practical, Ethical and Analytic Considerations
![Page 28: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/28.jpg)
Practical, Ethical, and Analytic Considerations
– Established research communities
– Consent to research/IRB
– Perception of research among public
– Costs to researchers
– Data ownership
– Adjustments for non-representativeness
– Stability of data source and adjustments
– Updating models in changing environment
– Users and impact
![Page 29: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/29.jpg)
Practical Considerations
• Adjustments for non-representativeness
– Surveys: Well developed, weighting
– BSD: No standard use, depends on style of analysis, may not be done if using certain techniques
• Ethical issues
– Surveys: Explicit consent, regulated by gov’t/IRB
– BSD: Unaware of terms in user agreement, inconsistently regulated by IRBs
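As one concrete instance of the weighting adjustments this slide credits to survey practice, here is a minimal post-stratification sketch. The slides do not say which weighting method is meant, so this is an assumed example; the group labels, counts, and population shares are made up for illustration.

```python
from collections import Counter

def poststratified_mean(respondents, population_shares):
    """Weight each respondent by population share / sample share of their group.

    respondents: list of (group, value) pairs.
    population_shares: dict mapping group -> known share of the population.
    """
    n = len(respondents)
    counts = Counter(group for group, _ in respondents)
    weights = {g: population_shares[g] / (counts[g] / n) for g in counts}
    total_weight = sum(weights[g] for g, _ in respondents)
    return sum(weights[g] * v for g, v in respondents) / total_weight

# Hypothetical sample: younger people are over-represented (80 of 100).
sample = ([("young", 1)] * 60 + [("young", 0)] * 20
          + [("old", 1)] * 5 + [("old", 0)] * 15)
shares = {"young": 0.4, "old": 0.6}   # assumed known population shares

unweighted = sum(v for _, v in sample) / len(sample)
weighted = poststratified_mean(sample, shares)
print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

The adjustment relies on knowing each respondent's group and the group's true population share. That auxiliary information is often missing for social media accounts, which may be one reason no standard adjustment exists for BSD.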
![Page 30: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/30.jpg)
Practical Considerations
• Perception of research/Legitimacy
– Surveys: fatigue, falling response rates, confusion about legitimacy
– BSD: not considered while posting, but concerns over surveillance
![Page 31: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/31.jpg)
YOU’RE SLOW AND EXPENSIVE!
YOU AREN’T REPRESENTATIVE!
![Page 32: AAPOR - comparing found data from social media and made data from surveys](https://reader037.vdocument.in/reader037/viewer/2022110302/54953203b47959fa0a8b4603/html5/thumbnails/32.jpg)
Conclusion
We need to stop arguing about the wrong things.
We need a systematic agenda of research looking at the intersection of surveys and Big Social Data.
@clifflampe