Applying Geostatistical Methods to Lattice Data: An Initial
Examination of U.S. Presidential Elections in Iowa
A.C. Thomas
Statistics 225
December 14, 2004
Sources/Guides
• Main source: “Hierarchical Models”, chapters 2 and 3 (geostatistical and spatial data)
• Data sources: http://www.sos.state.ia.us/elections/results/ (1996/2000), http://www.cnn.com/ (2004)
• Special thanks: Brad Carlin (UMN), Andy Gelman (Columbia), Paul Edlefsen (Harvard)
• GeoR: P.J. Ribeiro and P.J. Diggle
Motivation
• In this course, we have learned three different methods of examining spatial data (depending on relevant conditions), which are to some extent interchangeable
• Often, we may not have the tools to examine a data set using one method (e.g. the shortcomings of R in manipulating lattice data)
• In this case, we will compare the effectiveness of a geostatistical method applied to lattice data with that of a lattice method, using cross-validation on the data itself
Interrelationship
• Geostats and kriging: using variograms and distance relationships to predict quantities across distances
• Lattices: using neighbour relationships to predict quantities across distances
• Direct similarities: some weighting schemes across distances directly resemble covariograms
Why election data?
• Why not?
• Spatial organization is well understood and constant in time (county borders have not changed across data sets) and built into R (maps library)
• While specific challengers change over time, parties are relatively constant, as are other control variables
• Ramifications are germane to the functioning of society (and the insatiable appetite of news junkies and policy wonks)
Questions:
• For this data set, does a geostatistical approximation produce a result comparable in error to a lattice model?
• If so, can we use fitted information from one election to predict the complete results of the next one? (And how much are we off?)
Chosen model: Iowa
Why Iowa?
• 99 counties which have roughly equal area, removing a possible nuisance (and are rectilinear, so easier to draw)
• Swing state, with a rough vote balance over time
• Not too big, not too small in either population or size
Simplification: No third parties
• For now, considering only the votes for Democrat and Republican candidates in presidential elections from 1996-2004
• Not so bad in 2000/2004, when independent vote was about 3% of total
• Worse in 1996: Perot’s successful campaign drew a lot of votes, up to 10% of the total
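The two-party simplification can be sketched in a few lines. This is only an illustration: the function name and the county figures are hypothetical, not taken from the Iowa data.

```python
def two_party_share(rep_votes, dem_votes):
    """Republican share of the two-party vote; all other candidates are ignored."""
    total = rep_votes + dem_votes
    if total == 0:
        raise ValueError("no two-party votes recorded")
    return rep_votes / total


# Hypothetical county: 4,000 Republican, 5,000 Democratic, 1,000 third-party votes.
# The 1,000 third-party votes are simply dropped from the denominator.
share = two_party_share(4000, 5000)
```

The larger the third-party vote, the more this share differs from the share of all ballots cast, which is why 1996 is the weakest year for the simplification.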
Iowa in 1996 (Dole, Clinton)
Iowa in 2000 (Bush, Gore)
Iowa in 2004 (Bush, Kerry)
Initial impressions
• There seems to be a tendency to vote more Republican the further west we look
• (Observation, courtesy Matt Anthony: as we go east, we hit Illinois, a Democratic core.)
• What is the population distribution by county over time?
Iowa’s total voters, 1996
Iowa’s total voters, 2000
Iowa’s total voters, 2004
Quick-and-dirty non-spatial analysis
• Question: how does population size correlate with the Democratic vote?
• Correlation between blue vote and “total” vote:
• 1996: ρ = 0.18
• 2000: ρ = 0.30
• 2004: ρ = 0.29
• So population would appear to be an important covariate.
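A correlation of this kind can be computed as below. This is a minimal sketch assuming the ordinary Pearson coefficient between each county's Democratic share and its total vote; the slides do not say which coefficient was used, and the function name is hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 long")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

With the county data, `x` would hold the 99 Democratic shares and `y` the 99 total vote counts for one election.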
Geostatistical analysis
• Locations: centroids of each county (obtained through centroid.polygon function in maps library of R)
• Data: Republican percentage of vote (arbitrarily chosen, not necessarily personal political affiliation)
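For readers without R at hand, a county centroid can be sketched with the standard area-weighted (shoelace) formula. This is an assumption about what a polygon centroid routine computes in general, not a description of how the maps library implements it; the function name is hypothetical.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...].

    Vertices may run clockwise or counter-clockwise; the last vertex is
    joined back to the first automatically.
    """
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)
```

For Iowa's near-rectangular counties the result is close to the simple average of the corner points, so the choice of centroid definition should matter little here.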
Initial data plots: Unaltered
Initial fitting
• Semivariogram appears to increase without bound, suggesting nonstationarity
• Plan: use Universal Kriging with this semivariogram
• Problem: the trend appears to be a power law with power greater than 2 (impossible to fit with conventional definitions)
• Possible solutions: a) remove trend from data. b) don’t care.
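The empirical semivariogram behind these observations can be sketched as follows. It is a minimal binned estimator, assuming planar coordinates and equal-width distance bins; it is not the geoR routine, and the names are hypothetical.

```python
import math
from collections import defaultdict


def empirical_semivariogram(coords, values, bin_width):
    """Binned empirical semivariogram.

    Returns {bin midpoint: mean of 0.5 * (z_i - z_j)^2 over pairs in that bin}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            b = int(d // bin_width)
            sums[b] += 0.5 * (values[i] - values[j]) ** 2
            counts[b] += 1
    return {(b + 0.5) * bin_width: sums[b] / counts[b] for b in sorted(counts)}


# Toy example: three points on a line whose values follow a pure linear trend.
gamma = empirical_semivariogram([(0, 0), (1, 0), (2, 0)], [0.0, 1.0, 2.0], 1.0)
```

In the toy example the semivariance is 0.5 at distance 1 and 2.0 at distance 2, that is, half the squared distance. A noise-free linear trend always produces this quadratic growth, which is one reason a fitted power near 2 points to an unmodelled trend rather than a stationary process.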
Plan A: Remove trend from data
• What it does: lets us remove known spatial dependence, look at other trends
• Initial look: major discrepancies.
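Removing a trend can be sketched as an ordinary least-squares fit of a polynomial surface in the centroid coordinates, followed by subtraction. The sketch below assumes a full second-degree surface (the degree used later for universal kriging); the slides do not state the exact form used in Plan A, and all names are hypothetical.

```python
def design_row(x, y):
    """Terms of a second-degree trend surface: 1, x, y, x^2, xy, y^2."""
    return [1.0, x, y, x * x, x * y, y * y]


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few or collinear locations")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def detrend(coords, values):
    """Fit the quadratic surface by least squares; return (coefficients, residuals)."""
    rows = [design_row(x, y) for x, y in coords]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * t for b, t in zip(beta, row)) for row in rows]
    residuals = [v - f for v, f in zip(values, fitted)]
    return beta, residuals
```

The residuals are what a semivariogram would then be fitted to. Adding the fitted surface back to kriged residuals returns predictions to the original scale.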
Plan B: Don’t care.
• The goodness of fit only tails off at the end
• Preliminary results show the other option to be extremely inaccurate due to noise levels in residual data
Second trend removed, data centered
Exploratory Kriging
Meaningful Kriging
• Since we want to test the predictive power of this method, we should test it on our current data through cross-validation
• Key: remove one point, then use the semivariogram fitted to the remaining points to interpolate the value at the removed centroid; repeat for each county
• Then, return trend to data and compare with original values
• Use universal kriging with second-degree trend
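The leave-one-out loop itself is independent of the predictor. The sketch below makes two loud assumptions: inverse-distance weighting stands in for universal kriging (it is not kriging), and the vote errors are taken to be predicted minus actual share, multiplied by each county's two-party total. All names are hypothetical.

```python
import math


def idw_predict(train_coords, train_values, target, power=2.0):
    """Inverse-distance-weighted mean: a simple stand-in for a kriging predictor."""
    num = 0.0
    den = 0.0
    for (x, y), v in zip(train_coords, train_values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v
        w = d ** -power
        num += w * v
        den += w
    return num / den


def leave_one_out(coords, shares, totals, predict=idw_predict):
    """Hold out each county in turn; return (net vote error, total absolute vote error)."""
    net = 0.0
    absolute = 0.0
    for i in range(len(coords)):
        train_coords = coords[:i] + coords[i + 1:]
        train_shares = shares[:i] + shares[i + 1:]
        predicted = predict(train_coords, train_shares, coords[i])
        diff = (predicted - shares[i]) * totals[i]  # error in votes for this county
        net += diff
        absolute += abs(diff)
    return net, absolute
```

Any predictor with the same three-argument signature can be passed as `predict`, so a kriging routine could be swapped in without changing the loop. The net error lets county errors cancel, while the absolute error does not, which is why the two figures reported for each year differ so much.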
1996 Redux – Predicted Values
• In total, the prediction gives Dole 9,726 more votes than he actually received.
• Absolute error: 43,526
• Total 2-party votes: 1,112,902
Fitting variograms between models
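• For all three years, a power model was the appropriate choice: $\gamma(t) = \tau^2 + \sigma^2 t^{\phi}$
• 1996: $\sigma^2$ = 9.24e-4, $\phi$ = 1.98, $\tau^2$ = 0.031
• 2000: $\sigma^2$ = 9.93e-4, $\phi$ = 2.00, $\tau^2$ = 0
• 2004: $\sigma^2$ = 1.16e-3, $\phi$ = 2.00, $\tau^2$ = 0.025
• All roughly identical, even with different total averages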
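The fitted model can be evaluated directly. The sketch below reads it as a nugget plus a scaled power of distance; the parameter names and the assignment of the three reported 1996 values to scale, exponent and nugget are an interpretation of the slide, not something it states explicitly.

```python
def power_variogram(t, scale, exponent, nugget):
    """Power semivariogram: nugget + scale * t**exponent for t > 0, and 0 at t = 0."""
    if t < 0:
        raise ValueError("distance must be non-negative")
    if t == 0:
        return 0.0
    return nugget + scale * t ** exponent


# 1996 values as reported on the slide (assignment to parameters is assumed).
gamma_1996 = [power_variogram(t, 9.24e-4, 1.98, 0.031) for t in (0.5, 1.0, 2.0)]
```

A power model has no sill, so the semivariance keeps growing with distance. That is consistent with the unbounded empirical semivariogram noted earlier.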
2000 Predicted
• Prediction: Bush gets 26,000 more votes
• Absolute error: 181,880
• Total Bush/Gore votes: 1,272,890
2004 Prediction
• Prediction: Bush gets 32,094 more votes
• Absolute difference: 74,458
• Total votes: 1,479,702
“Naïve Neighbour”
• For a baseline comparison, take the simplest (stupidest) lattice cross-validation test – “ask your neighbour”, trivial SAR weights
• Predicted value at a county is simply the mean of its border-sharing neighbours (data is Republican percentage of vote)
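The "ask your neighbour" rule is short enough to sketch in full. The adjacency structure here is a hypothetical three-county strip, not the Iowa county map, and the function name is made up for illustration.

```python
def naive_neighbour_predictions(values, neighbours):
    """Predict each county's value as the unweighted mean of its neighbours' values."""
    predictions = {}
    for county, adjacent in neighbours.items():
        if not adjacent:
            raise ValueError(f"{county} has no neighbours to ask")
        predictions[county] = sum(values[n] for n in adjacent) / len(adjacent)
    return predictions


# Hypothetical strip of three counties: A borders B, B borders A and C, C borders B.
shares = {"A": 0.4, "B": 0.5, "C": 0.7}
borders = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
predicted = naive_neighbour_predictions(shares, borders)
```

Because a county's own value never enters its prediction, this rule is already a leave-one-out cross-validation, which makes it directly comparable to the kriging test.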
“NN” 1996
• Dole: 10,819 more predicted
• Total deviation: 40,923
“NN” 2000
• Bush gets 28,535 extra in prediction
• Total deviation: 59,670
“NN” 2004
• Bush gets 37,175 more
• Total deviation: 76,926
Cross-validation summary
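| Year | Geostat error | NN error | Geostat total error | NN total error | Voting pop. |
|------|---------------|----------|---------------------|----------------|-------------|
| 1996 | 9,726 | 10,819 | 43,526 | 40,923 | 1,112,902 |
| 2000 | 26,000 | 28,535 | 61,485 | 59,670 | 1,272,890 |
| 2004 | 32,094 | 37,175 | 74,458 | 76,926 | 1,479,702 |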
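To put the summary on a common scale, the total errors can be divided by the voting population. The figures below are copied from the summary; the dictionary layout and function name are hypothetical.

```python
# Total absolute errors and two-party voting population, from the summary above.
summary = {
    1996: {"geostat_total": 43526, "nn_total": 40923, "voters": 1112902},
    2000: {"geostat_total": 61485, "nn_total": 59670, "voters": 1272890},
    2004: {"geostat_total": 74458, "nn_total": 76926, "voters": 1479702},
}


def relative_errors(table):
    """Return {year: (geostat share, NN share)} of the two-party vote misallocated."""
    return {
        year: (row["geostat_total"] / row["voters"], row["nn_total"] / row["voters"])
        for year, row in table.items()
    }


shares_off = relative_errors(summary)
```

On this scale both methods misallocate roughly 4 to 5 percent of the two-party vote in each year, and neither is consistently ahead of the other.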
Conclusions
• Data is definitely not stationary, even after removing trends
• Good kriging is about as effective as “naïve neighbour”, both without covariates
• Prediction with these tools at this simple level is not yet accurate enough
• Each method overpredicts the Republican vote
• Fitted variogram parameters for each year are very close
Future Developments and Unanswered Questions – New!
• I’ve since introduced universal co-kriging with population, past voting behavior and second-degree spatial dependences using the gstat package.
• Needed: data from the last 4 elections, conveniently packaged. Other prediction using spatial methods.