Statistical Bases for Map Reconstructions and Comparisons
Jerry Platt
May 2005
2
Preliminaries
3
Outline
• Motivation
– Do Different Maps “Differ”?
• Methods
– Singular-Value Decomposition
– Multidimensional Scaling and PCA
– Mantel Permutation Test
– Procrustean Fit and Permutation Test
– Bidimensional Regression
• Working Example
– Locational Attributes of Eight URSB Campuses
4
Motivation
• Comparing Maps Over Time
– Accuracy of a 14th Century Map
– Leader Image Change in Great Britain
– Where IS Wall Street, post-9/11?
• Comparing Maps Among Sub-samples
– Things People Fear, M v. F
– Face-to-Face Comparisons
• Comparing Maps Across Attributes
– Competitive Positioning of Firms
– Chinese Provinces & Human Dev. Indices
5
Accuracy of a 14th Century Map
http://www.geog.ucsb.edu/~tobler/publications/pdf_docs/geog_analysis/Bi_Dim_Reg.pdf
6
http://www.mori.com/pubinfo/rmw/two-triangulation-models.pdf
7
http://igeographer.lib.indstate.edu/pohl.pdf
8
http://www.analytictech.com/borgatti/papers/borgatti%2002%20-%20A%20statistical%20method%20for%20comparing.pdf
Things People Fear, F v. M
9
http://www.multid.se/references/Chem%20Intell%20Lab%20Syst%2072,%20123%20(2004).pdf
Face-to-Face Comparisons
10
http://www.gsoresearch.com/page2/map.htm
11
12
Methods
Eigen-Analysis and Singular-Value Decomposition
Multidimensional Scaling & Principal Comps.
Mantel Permutation Test
Procrustean Fit and Permutation Test
Bidimensional Regression
13
Eigen-analysis
• C = an NxN variance-covariance matrix
• Find the N solutions to $C\ell = \lambda\ell$
$\lambda$ = the N Eigenvalues, with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N$
$\ell$ = the N associated Eigenvectors
• C = LDL’, where
L = matrix of $\ell$s
D = diagonal matrix of $\lambda$s
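As a rough illustration of the decomposition C = LDL’, the sketch below builds a covariance matrix from made-up random data and checks that the eigenvectors and eigenvalues rebuild it. The data, the seed, and the variable names are assumptions for illustration only; NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4))          # illustrative data: 50 observations, N = 4 variables
C = np.cov(data, rowvar=False)           # N x N variance-covariance matrix

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]        # reorder so the largest eigenvalue comes first
eigvals, L = eigvals[order], eigvecs[:, order]
D = np.diag(eigvals)

C_rebuilt = L @ D @ L.T                  # C = L D L'
print(np.allclose(C, C_rebuilt))
```

Sorting is done by hand here because `eigh` reports eigenvalues smallest-first, the reverse of the ordering used on the slide.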
14
Singular Value Decomposition
• Every NxP matrix A has an SVD
• A = U D V’
• Columns of U = Eigenvectors of AA’
• Entries in Diagonal Matrix D = Singular Values
= SQRT of the nonzero Eigenvalues of either AA’ or A’A
• Columns of V = Eigenvectors of A’A
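The link between singular values and eigenvalues can be checked numerically. This is a minimal sketch on a made-up 6x3 matrix (the matrix and seed are assumptions), using the thin SVD so that D is PxP.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 3))              # illustrative N x P matrix (N = 6, P = 3)

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U diag(s) V'
A_rebuilt = U @ np.diag(s) @ Vt

# Eigenvalues of A'A, sorted descending, should equal the squared singular values
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
# AA' is N x N: its P largest eigenvalues match, the remaining N - P are ~0
eig_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]

print(np.allclose(A, A_rebuilt))
print(np.allclose(s**2, eig_AtA), np.allclose(s**2, eig_AAt[:3]))
```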
15
SVD
16
Principal Component Analysis
• A is a column-centered data matrix
• A = U D V’
• V’ = Row-wise Principal Components
• D² ~ Proportional to variance explained
• UD = Principal Component Scores
• DV’ = Principal Axes
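A small sketch of PCA through the SVD follows, on synthetic data (the data, the mixing matrix, and the names are assumptions). The variance shares are computed from the squared singular values, and the scores UD are cross-checked against the eigenvalues of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
raw = rng.normal(size=(40, 3)) @ np.array([[2.0, 0.5, 0.0],
                                           [0.0, 1.0, 0.3],
                                           [0.0, 0.0, 0.2]])
A = raw - raw.mean(axis=0)               # column-centered data matrix

U, d, Vt = np.linalg.svd(A, full_matrices=False)
scores = U * d                           # UD = principal component scores
explained = d**2 / np.sum(d**2)          # share of total variance per component

# Cross-check: variances of the scores equal eigenvalues of the covariance matrix
cov_eigs = np.sort(np.linalg.eigvalsh(np.cov(A, rowvar=False)))[::-1]
print(np.allclose(scores.var(axis=0, ddof=1), cov_eigs))
print(np.allclose(scores, A @ Vt.T))     # scores are also A projected onto V
```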
17
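Multidimensional Scaling
• A is an NxN dissimilarity matrix
• $B = -\tfrac{1}{2}\left(I - \tfrac{1}{N}\, i i'\right) A^{2} \left(I - \tfrac{1}{N}\, i i'\right)$, with $A^{2}$ the element-wise squared dissimilarities and $i$ an N-vector of ones
• B = U D V’
• B = XX’, where X = U D^(1/2)
• Limit X to 2 Columns → Coordinates for 2d MDS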
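The MDS steps can be sketched as classical (Torgerson) scaling: double-center the squared dissimilarities, decompose, and keep two columns. The function name `classical_mds` and the five test points are hypothetical; the check only confirms that planar distances are reproduced, not that the original orientation is recovered.

```python
import numpy as np

def classical_mds(dist, k=2):
    """Classical MDS: k-dimensional coordinates from an N x N distance matrix."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix I - (1/N) ii'
    B = -0.5 * J @ (dist ** 2) @ J               # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)         # B is symmetric, so eigh applies
    top = np.argsort(eigvals)[::-1][:k]          # indices of the k largest eigenvalues
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))   # X = U D^(1/2)

# Hypothetical planar points, used only to check that distances are recovered
pts = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0], [1.0, 2.0]])
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

X = classical_mds(dist, k=2)
dist_hat = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(np.allclose(dist, dist_hat))
```

The recovered coordinates are unique only up to rotation, reflection, and translation, which is why a Procrustean fit is the natural follow-up when comparing them with true coordinates.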
18
A Random Permutation Test
Given Dissimilarity Matrices A and B:
N! Permutations
37! = 1.4E+43
8! = 40,320
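One common permutation test for two dissimilarity matrices is the Mantel test listed in the Methods outline. The sketch below is an assumption-laden illustration rather than the procedure used for the slides: it takes the Pearson correlation of upper-triangle entries as the test statistic, samples random permutations instead of enumerating all N!, and uses made-up data. The name `mantel_test` is hypothetical.

```python
import numpy as np

def mantel_test(A, B, n_perm=999, seed=0):
    """One-sided Mantel permutation test on two symmetric N x N dissimilarity matrices."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    iu = np.triu_indices(n, k=1)                 # off-diagonal upper triangle
    r_obs = np.corrcoef(A[iu], B[iu])[0, 1]
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        B_perm = B[np.ix_(p, p)]                 # permute rows and columns together
        if np.corrcoef(A[iu], B_perm[iu])[0, 1] >= r_obs:
            count += 1
    p_value = (count + 1) / (n_perm + 1)         # observed arrangement counts as one permutation
    return r_obs, p_value

# Made-up example: B is A plus small symmetric noise, for N = 8 objects
rng = np.random.default_rng(6)
pts = rng.normal(size=(8, 2))
A = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
noise = rng.normal(scale=0.05, size=A.shape)
noise = (noise + noise.T) / 2.0
np.fill_diagonal(noise, 0.0)
B = A + noise

r_obs, p_value = mantel_test(A, B, n_perm=999)
print(r_obs, p_value)
```

With N = 8 the full set of 40,320 permutations could be enumerated exactly; sampling is used here only to keep the sketch short.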
19
Permutation Tests
Permute List & rerun
Observed Test Statistic: TS = 25 (# Correct of 37 SB.)
Is 25 Significantly > 18.5?
Ho: TS = 18.5
HA: TS > 18.5
P = .069; P > .05, so Do Not Reject Ho
20
21
http://www.entrenet.com/~groedmed/greekm/mythproc.html
22
http://www.zoo.utoronto.ca/jackson/pro2.html
Centering & Scaling
Mirror Reflection
Rotation & Dilation to Min ∑(ε²)
23
Procrustean Analysis
• Two NxP data configurations, X and Y
• X’Y = U D V’
• H = UV’
• OLS Min SSE = tr[(XH-Y)’(XH-Y)]
= tr(XX’) + tr(YY’) – 2tr(D)
= tr(XX’) + tr(YY’) – 2tr(VDV’)
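The rotation step can be sketched numerically. Assuming the orthogonal matrix is taken as H = UV’ from the SVD of X’Y, the code below checks that the trace formula for the minimum SSE agrees with a direct computation. The configurations are synthetic and the function name is hypothetical; centering is done by hand and no dilation is fitted.

```python
import numpy as np

def procrustes_rotation(X, Y):
    """Orthogonal H minimising tr[(XH - Y)'(XH - Y)], via the SVD of X'Y."""
    U, d, Vt = np.linalg.svd(X.T @ Y)            # X'Y = U D V'
    H = U @ Vt                                   # H = U V'
    sse = np.trace(X.T @ X) + np.trace(Y.T @ Y) - 2.0 * d.sum()
    return H, sse

rng = np.random.default_rng(3)
X = rng.normal(size=(8, 2))
X -= X.mean(axis=0)                              # centre the configuration
angle = np.deg2rad(30.0)
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])
Y = X @ R + rng.normal(scale=0.01, size=X.shape) # rotated copy of X plus small noise
Y -= Y.mean(axis=0)

H, sse = procrustes_rotation(X, Y)
direct = np.sum((X @ H - Y) ** 2)                # SSE computed directly
print(np.isclose(sse, direct), np.allclose(H.T @ H, np.eye(2)))
```

Because H is only constrained to be orthogonal, it may include a mirror reflection as well as a rotation.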
24
OLS Regression
• $Y = X\beta + \varepsilon$
• Y = Xb + e
• X = UDV’
• $b = V_r D^{-1} U_r' Y$, where r = first r columns (N>P)
• $b = (X'X)^{-1} X'Y$
• $E[b] = V_r V_r' \beta$
• Estimated Y values = $U_r U_r' Y$
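The equivalence of the SVD and normal-equation forms of b can be verified on simulated data. This is a minimal sketch assuming X has full column rank (so r = P); the coefficients, noise level, and seed are made up.

```python
import numpy as np

rng = np.random.default_rng(4)
N, P = 30, 3
X = rng.normal(size=(N, P))
beta = np.array([1.5, -2.0, 0.5])                # hypothetical true coefficients
Y = X @ beta + rng.normal(scale=0.1, size=N)

U, d, Vt = np.linalg.svd(X, full_matrices=False) # thin SVD: U is N x P
b_svd = Vt.T @ ((U.T @ Y) / d)                   # b = V D^{-1} U'Y
b_normal = np.linalg.solve(X.T @ X, X.T @ Y)     # b = (X'X)^{-1} X'Y
Y_hat = U @ (U.T @ Y)                            # fitted values = U U'Y

print(np.allclose(b_svd, b_normal), np.allclose(Y_hat, X @ b_svd))
```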
25
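Bidimensional Regression
• (Y,X) = Coordinate pair in 2d Map 1
$Y = \alpha_0 + \beta_0 X$
• (A,B) = Coordinate pair in 2d Map 2
$\begin{pmatrix} A \\ B \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} + \begin{pmatrix} \beta_1 & -\beta_2 \\ \beta_2 & \beta_1 \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \end{pmatrix}$
so that $E[A] = \alpha_1 + \beta_1 X - \beta_2 Y$ and $E[B] = \alpha_2 + \beta_2 X + \beta_1 Y$
$\alpha_1$ = Horizontal Translation
$\alpha_2$ = Vertical Translation
$\phi$ = Scale Transformation = $\sqrt{\beta_1^2 + \beta_2^2}$
$\theta$ = Angle Transformation = $\tan^{-1}(\beta_2 / \beta_1)$, $+\,180^\circ$ iff $\beta_1 < 0$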
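A least-squares fit of this kind of model can be sketched as follows. The code assumes the usual Euclidean form, E[A] = a1 + b1·X − b2·Y and E[B] = a2 + b2·X + b1·Y, with a1, a2 the translations; the function name, the stacked design matrix, and the test coordinates are illustrative assumptions. `arctan2` is used so that the 180° adjustment for a negative b1 happens automatically.

```python
import numpy as np

def bidimensional_regression(X, Y, A, B):
    """Least-squares fit of the Euclidean model mapping (X, Y) onto (A, B)."""
    n = len(X)
    ones, zeros = np.ones(n), np.zeros(n)
    # Rows for A: a1 + b1*X - b2*Y ; rows for B: a2 + b2*X + b1*Y
    design = np.vstack([np.column_stack([ones, zeros, X, -Y]),
                        np.column_stack([zeros, ones, Y, X])])
    target = np.concatenate([A, B])
    (a1, a2, b1, b2), *_ = np.linalg.lstsq(design, target, rcond=None)
    scale = np.hypot(b1, b2)                     # SQRT(b1^2 + b2^2)
    angle = np.degrees(np.arctan2(b2, b1))       # arctan2 covers the b1 < 0 case
    return a1, a2, b1, b2, scale, angle

# Hypothetical Map 1 coordinates; Map 2 = Map 1 rotated 40 degrees, scaled 1.5, shifted (2, -1)
X = np.array([0.0, 1.0, 2.0, 0.5, 3.0, 1.5])
Y = np.array([0.0, 2.0, 1.0, 3.0, 0.5, 1.5])
t = np.deg2rad(40.0)
A = 2.0 + 1.5 * (np.cos(t) * X - np.sin(t) * Y)
B = -1.0 + 1.5 * (np.sin(t) * X + np.cos(t) * Y)

a1, a2, b1, b2, scale, angle = bidimensional_regression(X, Y, A, B)
print(a1, a2, scale, angle)
```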
26
Although r = 1, the maps differ in location, scale, and angle of rotation around origin (0,0)
Horizontal & Vertical Translation
Angle of rotation around origin (0,0)
Scale transform, with scale factor < 1 if contraction, & > 1 if expansion
27
Working Example
• Eight URSB Campuses
– RD, BK, TO, RC, SA, RV, SD, TA
• Data Sources
– Locations
– Housing Attributes
– Tapestry Attributes
• Data Analyses
28
Eight URSB Campuses
29
87.5 miles
88.1 miles
30
31
…
32
33
EXAMPLE: Eight URSB Campuses
34
35
[Map: campus locations labeled SD, TA, RD, RV, RC, BK, TO, SA]
36
… and if DISTANCES available, but COORDINATES Unavailable?
• Treat Distance Matrix as Dissimilarity Matrix
• Apply Multidimensional Scaling
• Apply the two-dimension solution “as if” it represents latitude and longitude coordinates
37
Distance Estimates Vary
… But Not “Significantly”
38
MDS Representation
Input = D (8x8); Output = 2d
39
[Map: campus locations labeled SD, TA, RD, RV, RC, BK, TO, SA]
Errors “appear” to be quite small … BUT is there a way to test if errors are “STAT SIGNIF”?
40
Mantel Test
41
Procrustean Test: MDS Map Recreation
CONCLUDE: Near-perfect Map Recreation
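A permutation version of the Procrustean fit, in the spirit of the PROTEST procedure, can be sketched as below. This is not the computation behind the slide's conclusion: the statistic (sum of singular values of X’Y after centering and unit-norm scaling), the row-shuffling scheme, and the synthetic coordinates are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def standardise(Z):
    """Centre a configuration and scale it to unit sum of squares."""
    Z = Z - Z.mean(axis=0)
    return Z / np.linalg.norm(Z)

def protest(X, Y, n_perm=999, seed=0):
    """Procrustes permutation test: returns (m2, p_value)."""
    rng = np.random.default_rng(seed)
    Xs, Ys = standardise(X), standardise(Y)

    def fit(Yp):
        # Sum of singular values of X'Y: larger means a closer Procrustean fit
        return np.linalg.svd(Xs.T @ Yp, compute_uv=False).sum()

    t_obs = fit(Ys)
    hits = 0
    for _ in range(n_perm):
        rows = rng.permutation(len(Ys))          # shuffle which row of Y pairs with each row of X
        if fit(Ys[rows]) >= t_obs:
            hits += 1
    m2 = 1.0 - t_obs ** 2                        # symmetric Procrustes residual, 0 = perfect fit
    return m2, (hits + 1) / (n_perm + 1)

rng = np.random.default_rng(5)
X = rng.normal(size=(8, 2))                      # hypothetical coordinates for 8 locations
angle = np.deg2rad(25.0)
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])
Y = 2.0 * X @ R + rng.normal(scale=0.02, size=X.shape)   # rotated, dilated, slightly noisy copy

m2, p_value = protest(X, Y)
print(m2, p_value)
```

A residual m2 near zero together with a small p-value would be read as a near-perfect recreation that is unlikely to arise from a random pairing of locations.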
42
Driving Distances
Do these differ “significantly” from linear distances?
STATISTICAL PRACTICAL
43
DriveD = Driving Distances, Eight URSB Locations
Multidimensional Scaling, with 2-dimension solution
44
[Map: campus locations labeled SD, TA, RD, RV, RC, BK, TO, SA]
45
46
Bidimensional Regression: AB on YX
47
PROTEST Comparison
Bidimensional Regression
Procrustean Rotation
48
Housing
49
Tapestry (ESRI)
50
Map Coordinates as Explanatory Variables in Linear Models
51
Incremental Tests
So Map Coordinates seem sufficient as predictors
52
Proxy Measures of lat-long in Linear Model
Translations & Transforms
Reduce 8
And ↑ R²
53
Robust criterion would help here:
Min (Med(ε²))
54
Bidimensional Regression
Is There a Linear Relationship Between Housing and Tapestry Data?
r = 0.5449
Must Standardize Data
55
56
It’s Still a 3-d World
57