Some things to talk about
• Social and political polarization
• A cool dynamic network simulation (which we haven’t done yet)
• Statistical cutoffs and p-values (work of Wald, Berger, …)
• Survey weighting and poststratification

Andrew Gelman
Departments of Statistics and Political Science, Columbia University
7 Feb 2009

Also: Tian Zheng, Thomas DiPrete, Julien Teitler, Jiehua Chen, Tyler McCormick, Rozlyn Redd, Juli Simon Thomas, Delia Baldassarri, David Park, Yu-Sung Su, Matt Salganik, Duncan Watts, Sharad Goel

Studying social and political polarization

• Questions from sociology
• Questions from political science
• Sources of data
• Statistical challenges

Questions from sociology
• The “degree distribution”
• Characteristics of “the social network”
• Homophily
• Quantifying segregation
• Knowing and trusting

Questions from political science
• Polarization of Democrats and Republicans
• Polarization of political discourse
• How are people swayed by news media, talk radio, each other, …
• Geographic polarization
• Polarization and the perception of polarization

Sources of data
• Complete data on small social networks (schools, monks, …)
• Very sparse data on large social networks (Framingham, …)
• Complete data on other networks (scientific coauthors, …)
• Other network datasets (email, Facebook, …)
• From random sample surveys
  • Questions about close contacts (GSS 1985/2004, NES 2000)
  • Questions about acquaintances (“How many X’s do you know?”)

Statistical challenges: Misconceptions of others
• Examples
  • Name
  • Disease status
  • Sexual preference
  • Political leanings
• Challenge/opportunity: attributed and perceived attributes
• Appearance vs. reality
• How large is the “footprint” of a group?

Statistical challenges: Learning about small and large groups
• 1500 respondents × 750 acquaintances ≈ 1 million
  • Potential to learn about small groups
  • Potential to learn about people you can’t interview
• Difficulty with large groups
  • For example, “How many Democrats do you know?”
  • #known is too high to quickly estimate
  • Potential solution: look at subnetworks
• “Cube model” (individuals × groups × subnetworks)
  • Need main effects and two-way interactions
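
To make the "1500 × 750" arithmetic concrete, here is a minimal sketch of how answers to "How many X's do you know?" could be turned into a group-size estimate. It assumes the simple scale-up ratio (total reported ties to the group divided by total network size, times the population), which is an illustrative choice and not necessarily the model used in this work. The function name, the four respondents, and the population figure are all made up.

```python
def scale_up_estimate(known_in_group, network_sizes, population):
    """Simple scale-up ratio: share of all reported ties that point to the group,
    multiplied by the population size. Illustrative assumption, not the talk's model."""
    if len(known_in_group) != len(network_sizes):
        raise ValueError("need one network size per respondent")
    total_known = sum(known_in_group)
    total_ties = sum(network_sizes)
    return population * total_known / total_ties


# How many people the survey indirectly "reaches" (the slide rounds this to 1 million).
respondents = 1500
avg_acquaintances = 750
reach = respondents * avg_acquaintances

# Hypothetical toy data: four respondents, their reported ties to group X,
# and their (assumed known) total numbers of acquaintances.
known = [2, 0, 1, 3]
degrees = [800, 500, 700, 1000]
estimate = scale_up_estimate(known, degrees, population=300_000_000)
print(reach, estimate)
```

The ratio only works when respondents can actually count their ties to the group, which is the difficulty with large groups noted above.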

Statistical challenges: Network structure
• Social network is patterned
  • Sex, age, ethnicity, SES, location
  • Names, occupations, attitudes
• Correct for non-uniform patterns by using a mix of names
• Estimate non-uniform patterns using a conditional probability matrix for ages
• Overdispersion to model unexplained variation
• Can’t do much with triangles, 4-cycles, etc.
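
A small simulation may help show what "overdispersion" means for count responses. The sketch below assumes a gamma-Poisson (negative binomial) setup: under the plain Poisson model every respondent shares one rate, while under the overdispersed model each respondent gets their own rate. The mean of 3 and the shape of 1.5 are arbitrary illustration values, not estimates from any survey.

```python
import math
import random
from statistics import mean, pvariance


def poisson_draw(rng, rate):
    """Draw one Poisson count (Knuth's multiplication method, fine for small rates)."""
    threshold = math.exp(-rate)
    count, running_product = 0, 1.0
    while True:
        running_product *= rng.random()
        if running_product <= threshold:
            return count
        count += 1


def simulate_counts(rng, n, mean_known, shape=None):
    """Simulate n answers to "How many X's do you know?".

    shape=None: every respondent shares one rate (plain Poisson).
    shape=a:    each respondent's rate is gamma-distributed with mean mean_known,
                which yields negative-binomial (overdispersed) counts.
    """
    counts = []
    for _ in range(n):
        if shape is None:
            rate = mean_known
        else:
            rate = rng.gammavariate(shape, mean_known / shape)
        counts.append(poisson_draw(rng, rate))
    return counts


rng = random.Random(0)
poisson_counts = simulate_counts(rng, 20000, mean_known=3.0)
overdispersed_counts = simulate_counts(rng, 20000, mean_known=3.0, shape=1.5)

# Variance-to-mean ratio: close to 1 for Poisson, larger when rates vary by person.
poisson_ratio = pvariance(poisson_counts) / mean(poisson_counts)
overdispersed_ratio = pvariance(overdispersed_counts) / mean(overdispersed_counts)
print(round(poisson_ratio, 2), round(overdispersed_ratio, 2))
```

The extra variance stands in for structure the model does not explain, such as some respondents being far more connected to a group than their sex, age, or location would predict.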

Statistical challenges: Recall bias
• Some people are easier to recall than others
  • David, Olga, Sharad
  • For some sets of names, can be quantified: Nicole/Christine/Michael
• Sliding definitions
  • Who are your friends?
  • Estimates of average #known range from 300 to 750 to …
  • Estimates of average #trusted range from 1.5 to 15 to 150

Statistical challenges: Returning to the social science questions
• Polarization as political segregation in the social network
• Comparing polarization to perceived polarization
• Answering conjectures such as: People in big cities know more people but trust fewer people
• Getting geography back in the picture



Forming Voting Blocs and Coalitions as a Prisoner's Dilemma: A Possible Theoretical Explanation for Political Instability

Dynamic network model for political coalitions
• Mathematics of coalitions
  • Forming a coalition helps the subgroup (or they wouldn’t do it)
  • But it hurts the general population (negative externality)
• Coalitions are inherently unstable
  • Coalitions of coalitions
  • Opportunistic acts of secession, poaching, and dissolution
• The simulation I want to do:
  • Set up a political setting: “agents” with attributes and locations
  • Payoff function for agents
  • Locally optimal moves
  • Scheduling
  • Implementation
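
The "helps the subgroup, hurts the general population" point can be checked by brute force in a tiny example. The sketch below is not the planned simulation. It assumes one particular payoff, voting power defined as the probability that a voter's ballot is decisive when everyone votes by independent coin flip, and one particular bloc rule, where a bloc casts all its votes with its internal majority. The nine-voter setup and the function names are illustrative.

```python
from itertools import product


def outcome(votes, blocs):
    """Overall majority outcome after each bloc votes as a unit with its internal majority."""
    cast = list(votes)
    for bloc in blocs:
        yes = sum(votes[i] for i in bloc)
        bloc_vote = 1 if 2 * yes > len(bloc) else 0
        for i in bloc:
            cast[i] = bloc_vote
    return 1 if 2 * sum(cast) > len(cast) else 0


def voting_power(n, blocs):
    """Probability that each voter is decisive, enumerating all 2**n coin-flip profiles."""
    decisive = [0] * n
    for votes in product((0, 1), repeat=n):
        base = outcome(votes, blocs)
        for i in range(n):
            flipped = list(votes)
            flipped[i] = 1 - flipped[i]
            if outcome(flipped, blocs) != base:
                decisive[i] += 1
    return [d / 2 ** n for d in decisive]


# Nine voters; odd sizes avoid ties. Compare no blocs with one bloc of three.
no_bloc = voting_power(9, blocs=[])
one_bloc = voting_power(9, blocs=[(0, 1, 2)])

print("no blocs:         ", no_bloc[0])
print("bloc member:      ", one_bloc[0])
print("non-member:       ", one_bloc[8])
print("average with bloc:", sum(one_bloc) / 9)
```

Under these assumptions a bloc member's power rises above the no-bloc level, a non-member's power falls below it, and the average over all nine voters falls too, which is the prisoner's-dilemma structure named in the title.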



Statistical cutoffs and p-values

• Setting a cutoff for selecting patterns for further study
• Old problem in statistics: Neyman, Wald, Berger, …
• Also of interest to biologists!
• Some different goals:
  • Finding patterns that are “statistically significant”
  • Classifying into those to study further, and those to set aside
• Mathematical framework: distribution of a “score”
• Solution depends upon:
  • Distribution of the score among “uninteresting” cases
  • Distribution of the score among “interesting” cases
  • Number of uninteresting and interesting cases
  • Cost of follow-up of uninteresting cases
  • Cost of follow-up of interesting cases
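
The five ingredients listed above are enough to compute a cutoff once specific forms are assumed. The sketch below assumes normal score distributions with equal variance (uninteresting cases centered at 0, interesting cases at 2), and it reads the two "costs" as a net loss for following up an uninteresting case and a net gain for following up an interesting one. All numbers and names are hypothetical, chosen only to show the mechanics.

```python
import math
from statistics import NormalDist


def optimal_cutoff(n_null, n_real, loss_null, gain_real, effect):
    """Score above which expected gain from follow-up exceeds expected loss.

    Assumes scores are N(0, 1) for uninteresting cases and N(effect, 1) for
    interesting ones, so the likelihood ratio is exp(effect * s - effect**2 / 2).
    Follow up when n_real * gain_real * f_real(s) > n_null * loss_null * f_null(s).
    """
    ratio = (n_null * loss_null) / (n_real * gain_real)
    return effect / 2 + math.log(ratio) / effect


null_dist = NormalDist(0, 1)
real_dist = NormalDist(2, 1)

# Hypothetical inputs: 9000 uninteresting cases, 1000 interesting ones,
# and a correct follow-up is worth three times the cost of a wasted one.
cutoff = optimal_cutoff(n_null=9000, n_real=1000, loss_null=1.0, gain_real=3.0, effect=2.0)

# One-sided p-value that this cutoff corresponds to under the null distribution.
implied_p_value = 1 - null_dist.cdf(cutoff)
print(cutoff, implied_p_value)
```

In this setup the implied p-value threshold is a consequence of the counts and costs; changing any of the five inputs moves the cutoff, so no single conventional level falls out of the framework.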



Survey weighting and poststratification

• General framework for adjusting for differences between sample and population
• Population estimate = average over poststratification cells
• You might have to model:
  • The survey response
  • Size of poststratification cells
  • Probabilities of selection
• Respondent-driven sampling example:
  • Cells determined by “gregariousness” and “distance”
  • Could approximate correlations using clustering
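
As a worked instance of "population estimate = average over poststratification cells", the sketch below weights each cell's estimate by that cell's population size. The two cells, their sizes, and the response rates are made-up numbers, and the cell estimates are taken as given, whereas in practice they might come from a model of the survey response.

```python
def poststratify(cell_estimates, cell_sizes):
    """Population estimate: average of cell estimates weighted by population cell size."""
    total = sum(cell_sizes.values())
    return sum(cell_sizes[cell] * cell_estimates[cell] for cell in cell_sizes) / total


# Hypothetical example: the sample over-represents the "old" cell.
cell_estimates = {"young": 0.40, "old": 0.60}   # estimated response within each cell
sample_counts = {"young": 20, "old": 80}        # respondents per cell
population_sizes = {"young": 600, "old": 400}   # known (or modeled) population per cell

raw_sample_mean = poststratify(cell_estimates, sample_counts)
poststratified = poststratify(cell_estimates, population_sizes)
print(raw_sample_mean, poststratified)
```

Weighting by the sample counts reproduces the unadjusted sample mean, so the gap between the two numbers shows how much the adjustment corrects for the sample's imbalance.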
