Post on 30-Dec-2015
Spatial Dependency Modeling Using Spatial Auto-Regression
Mete Celik 1,3, Baris M. Kazar 4, Shashi Shekhar 1,3, Daniel Boley 1, David J. Lilja 1,2
1 CSE Department @ University of Minnesota, Twin Cities
2 ECE Department @ University of Minnesota, Twin Cities
3 Army High Performance Computing Research Center
4 Oracle USA
07/08/2006 Spatial Dependency Modeling Using SAR 2
Outline of Today’s Talk
• Motivation & Background
• Problem Definition
• Related Work & Contributions
• Proposed Approach
• Experimental Evaluation
• Conclusion & Future Work
Motivation
• Widespread use of spatial databases → mining spatial patterns
  – The 1855 Asiatic Cholera in London [Griffith]
• Fair Lending [NYT, R. Nader]
  – Correlation of bank locations with loan activity in poor neighborhoods
• Retail Outlets [NYT, Walmart, McDonald's, etc.]
  – Determining locations of stores by relating neighborhood maps with customer databases
• Crime Hot Spot Analysis [NYT, NIJ CML]
  – Explaining clusters of sexual assaults by locating addresses of sex offenders
• Ecology [Uygar]
  – Explaining locations of bird nests based on structural environmental variables
Spatial Auto-correlation (SA)
• Random distributed data (no SA): a spatial distribution that satisfies the assumptions of classical data analysis
• Cluster distributed data: a spatial distribution that does NOT satisfy the assumptions of classical data analysis
[Figure: two panels, "Random Nest Locations" (pixel property with independent identical distribution) and "Cluster Nest Locations" (pixel property with spatial autocorrelation)]
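Execution Trace
4-neighborhood of site (i, j): neighbors(i, j) are the sites (p, q) equal to (i-1, j) NORTH, (i, j+1) EAST, (i+1, j) SOUTH, or (i, j-1) WEST, whenever that site lies inside the grid.
• W allows other neighborhood definitions
  – distance based
  – 8-neighbors
Space + 4-neighborhood (4-by-4 grid, sites numbered row by row):
 1  2  3  4
 5  6  7  8
 9 10 11 12
13 14 15 16
Given:
• Spatial framework
• Attributes
Binary W (16-by-16; the slide highlights the 6th row):
0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0
1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0
0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0
1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0
0 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0
0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 0
0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0
0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0
0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0
0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1
0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0
0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1
0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0
Row-normalized W (each row divided by its number of neighbors; the slide highlights the 6th row):
0 1/2 0 0 1/2 0 0 0 0 0 0 0 0 0 0 0
1/3 0 1/3 0 0 1/3 0 0 0 0 0 0 0 0 0 0
0 1/3 0 1/3 0 0 1/3 0 0 0 0 0 0 0 0 0
0 0 1/2 0 0 0 0 1/2 0 0 0 0 0 0 0 0
1/3 0 0 0 0 1/3 0 0 1/3 0 0 0 0 0 0 0
0 1/4 0 0 1/4 0 1/4 0 0 1/4 0 0 0 0 0 0
0 0 1/4 0 0 1/4 0 1/4 0 0 1/4 0 0 0 0 0
0 0 0 1/3 0 0 1/3 0 0 0 0 1/3 0 0 0 0
0 0 0 0 1/3 0 0 0 0 1/3 0 0 1/3 0 0 0
0 0 0 0 0 1/4 0 0 1/4 0 1/4 0 0 1/4 0 0
0 0 0 0 0 0 1/4 0 0 1/4 0 1/4 0 0 1/4 0
0 0 0 0 0 0 0 1/3 0 0 1/3 0 0 0 0 1/3
0 0 0 0 0 0 0 0 1/2 0 0 0 0 1/2 0 0
0 0 0 0 0 0 0 0 0 1/3 0 0 1/3 0 1/3 0
0 0 0 0 0 0 0 0 0 0 1/3 0 0 1/3 0 1/3
0 0 0 0 0 0 0 0 0 0 0 1/2 0 0 1/2 0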
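Both matrices above follow mechanically from the grid and the 4-neighbor rule. A minimal NumPy sketch, assuming sites are numbered row by row as in the 4-by-4 example; the helper names `grid_neighborhood` and `row_normalize` are hypothetical and not taken from the authors' Matlab code.

```python
import numpy as np


def grid_neighborhood(n_rows, n_cols):
    """Binary 4-neighbor matrix W for a grid whose sites are numbered row by row."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for i in range(n_rows):
        for j in range(n_cols):
            p = i * n_cols + j  # 0-based index of site (i, j)
            # NORTH, EAST, SOUTH, WEST neighbors, kept only if inside the grid
            for a, b in ((i - 1, j), (i, j + 1), (i + 1, j), (i, j - 1)):
                if 0 <= a < n_rows and 0 <= b < n_cols:
                    W[p, a * n_cols + b] = 1.0
    return W


def row_normalize(W):
    """Divide each row by its row sum, i.e. by the number of neighbors of that site."""
    return W / W.sum(axis=1, keepdims=True)


W_bin = grid_neighborhood(4, 4)
W_row = row_normalize(W_bin)

# Site 6 (0-based row 5): its neighbors and their row-normalized weights.
nbrs = np.nonzero(W_bin[5])[0]
print(nbrs + 1)          # 1-based site numbers of the neighbors of site 6
print(W_row[5, nbrs])    # each weight is 1/4 because site 6 has four neighbors
```

Row-normalizing makes W row-stochastic, so every row of W_row sums to 1 while the binary W stays symmetric.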
• Linear regression → SAR
  – Linear regression: $y = x\beta + \varepsilon$
  – SAR: $y = \rho W y + x\beta + \varepsilon$
• The spatial auto-regression (SAR) model has higher accuracy and removes the IID assumption of linear regression
SDM Provides Better Model!
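Moving $\rho W y$ to the left-hand side gives $(I - \rho W)y = x\beta + \varepsilon$, so data following the SAR model can be drawn by solving one linear system. The NumPy sketch below is illustrative only: the grid size, ρ, β, the intercept-plus-one-covariate design, and the helper names `rook_weights` and `simulate_sar` are assumptions chosen for the example.

```python
import numpy as np


def rook_weights(side):
    """Row-normalized 4-neighbor matrix W for a side-by-side grid (sites numbered row by row)."""
    n = side * side
    W = np.zeros((n, n))
    for p in range(n):
        i, j = divmod(p, side)
        for a, b in ((i - 1, j), (i, j + 1), (i + 1, j), (i, j - 1)):
            if 0 <= a < side and 0 <= b < side:
                W[p, a * side + b] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def simulate_sar(W, x, beta, rho, sigma=1.0, seed=0):
    """Draw y from y = rho W y + x beta + eps by solving (I - rho W) y = x beta + eps."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    eps = sigma * rng.standard_normal(n)  # IID N(0, sigma^2) errors
    y = np.linalg.solve(np.eye(n) - rho * W, x @ beta + eps)
    return y, eps


side, rho = 20, 0.8                      # hypothetical example settings
W = rook_weights(side)
n = side * side
rng = np.random.default_rng(1)
x = np.column_stack([np.ones(n), rng.standard_normal(n)])  # intercept + one covariate
beta = np.array([1.0, 2.0])
y, eps = simulate_sar(W, x, beta, rho)

# With rho = 0 the SAR model reduces to linear regression: y = x beta + eps.
y_lr, eps_lr = simulate_sar(W, x, beta, 0.0)
```

The larger ρ is, the more each y value is pulled toward the average of its neighbors, which produces the clustered patterns that violate the IID assumption.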
Data Structures in SAR Model
• Vectors: y, β, ε
• Matrices: W, x
• W is a large matrix
$y = \rho W y + x\beta + \varepsilon$, where y is n-by-1, ρ is 1-by-1 (a scalar), W is n-by-n, x is n-by-k, β is k-by-1, and ε is n-by-1.
Computational Challenge
• Maximum-likelihood estimation = MINimizing the negative log-likelihood, i.e., the ML function (Theorem 1):
$$\min_{|\rho|<1}\left\{-\frac{2}{n}\ln|I-\rho W| + \ln\left[\frac{1}{n}(Ay)^T B^T B\,(Ay)\right]\right\},\qquad A = I-\rho W,\quad B = I - x(x^Tx)^{-1}x^T$$
  – The first term is the log-det term; the second is the SSE term
  – ρ: the spatial auto-regression (auto-correlation) parameter; W: the n-by-n neighborhood matrix over the spatial framework
• Solving the SAR model
  – ρ = 0 → least-squares problem
  – β = 0, ε = 0 → eigenvalue problem ($Wy = \rho^{-1}y$)
  – General case → computationally expensive due to the log-det term in the ML function
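For small problems the ML function can be evaluated directly. In this sketch the log-det term uses the eigenvalues $\lambda_i$ of W, computed once, through $\ln|I-\rho W| = \sum_i \ln(1-\rho\lambda_i)$ (the exact, eigenvalue-based approach); the SSE term is the residual of a least-squares fit of $(I-\rho W)y$ on x, which equals $B(Ay)$; and ρ is located by golden section search on [0, 0.999]. The synthetic data, grid size, and helper names (`rook_weights`, `ml_function`, `golden_section`) are hypothetical.

```python
import numpy as np


def rook_weights(side):
    """Row-normalized 4-neighbor matrix W for a side-by-side grid."""
    n = side * side
    W = np.zeros((n, n))
    for p in range(n):
        i, j = divmod(p, side)
        for a, b in ((i - 1, j), (i, j + 1), (i + 1, j), (i, j - 1)):
            if 0 <= a < side and 0 <= b < side:
                W[p, a * side + b] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def ml_function(rho, eigs, W, x, y):
    """-(2/n) ln|I - rho W| + ln[(1/n) (Ay)^T B^T B (Ay)], with A = I - rho W."""
    n = y.size
    log_det = np.sum(np.log(1.0 - rho * eigs))         # exact, eigenvalue based
    Ay = y - rho * (W @ y)
    beta_hat, *_ = np.linalg.lstsq(x, Ay, rcond=None)  # least squares for fixed rho
    resid = Ay - x @ beta_hat                          # equals B (Ay)
    return -2.0 / n * log_det + np.log(resid @ resid / n)


def golden_section(f, lo, hi, tol=1e-6):
    """Minimize a unimodal scalar function f on [lo, hi]."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                    # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                          # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2.0


# Hypothetical synthetic data: 20-by-20 grid, true rho = 0.7, beta = (1, 2).
side, rho_true = 20, 0.7
W = rook_weights(side)
n = side * side
rng = np.random.default_rng(0)
x = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = np.linalg.solve(np.eye(n) - rho_true * W,
                    x @ np.array([1.0, 2.0]) + rng.standard_normal(n))

eigs = np.linalg.eigvals(W).real       # real, since W is similar to a symmetric matrix


def f(r):
    return ml_function(r, eigs, W, x, y)


rho_hat = golden_section(f, 0.0, 0.999)
print(f"estimated rho = {rho_hat:.3f}")
```

Computing all n eigenvalues costs O(n³), which is exactly the step the approximate methods try to avoid for large n.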
Problem Statement
Given:
• A spatial framework S consisting of sites {s1, …, sq} for an underlying geographic space G
• A collection of explanatory functions $f_{x_k}: S \to R_k$, k = 1, …, K, where $R_k$ is the range of possible values of the k-th explanatory function
• A dependent function $f_y: S \to R_y$
• A family F (the SAR equation) of learning-model functions mapping $R_1 \times \dots \times R_K \to R_y$
• A neighborhood relationship (4- or 8-neighbor) on the spatial framework
Find:
• The SAR parameter ρ and the regression coefficient vector β to a desired precision, while saving log-det computations
Problem Statement – Cont’d
Objective: algebraic error ranking of approximate SAR model solutions.
Constraints:
• S is a multi-dimensional Euclidean space
• The values of the explanatory variables x and of the dependent function (observed variable) y may not be independent of those at nearby spatial sites, i.e., spatial autocorrelation exists
• The domains of x and y are the real numbers
• The SAR parameter ρ varies in the range [0, 1)
• The error is normally distributed with zero mean and unit standard deviation, i.e., $\varepsilon \sim N(0, \sigma^2 I)$ IID with σ = 1
• The neighborhood matrix W is sparse
Related Work
Exact Estimate
Matrix Exponential Specification [Pace00]
Graph Theory [Pace00]
Taylor Series [Martin93, Kazar04, Shekhar04]
Chebyshev Poly. [Pace02, Kazar04, Shekhar04]
NORTHSTAR [Kazar05-06]
Semiparametric Estimates [Pace02]
Characteristic Poly. [Smirnov01]
Double Bounded Likelihood Estimator [Pace04]
Upper & Lower Bounds via Div&Conq [Pace03]
SAR Local Estimation [Pace03]
Gauss-Lanczos [Bai, Golub98, Kazar05-06]
Matrix Exponential Specification [LeSage00]
MCMC [Barry99, LeSage00]
None
Maximum Likelihood
Bayesian
Eigenvalue-based 1-D Surface Partitioning [Li96, Kazar03-04]
Direct Sparse Matrix Algorithms [Pace97, Kazar05]
Contributions
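• A new approximate SAR model solution: the Gauss-Lanczos approximation method
  – Key idea: do not compute all of the eigenvalues of W
• Error ranking of approximate SAR model solutions
[Equation garbled in the transcript: an error-ranking expression in terms of f(ρ|y) and its derivative with respect to ρ]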
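Gauss-Lanczos Approximation
$$\ln|I-\rho W| = \mathrm{tr}\big(\ln(I-\rho W)\big) \approx \frac{n}{m}\sum_{i=1}^{m} r_i^T \ln(I-\rho W)\, r_i$$
where the $r_i$ are m random vectors of unit length.
• The log-det is approximated by transforming the eigenvalue problem into the evaluation of quadratic forms $r_i^T \ln(I-\rho W)\, r_i$.
• Finally, Gauss-type quadrature rules are applied to these quadratic forms using the Lanczos procedure.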
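The trace identity and the randomized quadratic-form estimate can be checked numerically on a small grid. This sketch forms ln(I − ρW) exactly through an eigendecomposition, which is only feasible for small n and is precisely the cost GL avoids. It uses the symmetrized matrix $D^{-1/2} B D^{-1/2}$ (B binary, D the diagonal matrix of neighbor counts), which is similar to the row-normalized W and therefore has the same log-det. Rademacher (±1) vectors scaled to unit length are an assumption, as are the helper names `sym_rook_weights` and `logdet_from_quadratic_forms`.

```python
import numpy as np


def sym_rook_weights(side):
    """Symmetrized 4-neighbor matrix D^{-1/2} B D^{-1/2} (same eigenvalues as row-normalized W)."""
    n = side * side
    B = np.zeros((n, n))
    for p in range(n):
        i, j = divmod(p, side)
        for a, b in ((i - 1, j), (i, j + 1), (i + 1, j), (i, j - 1)):
            if 0 <= a < side and 0 <= b < side:
                B[p, a * side + b] = 1.0
    d = 1.0 / np.sqrt(B.sum(axis=1))
    return d[:, None] * B * d[None, :]


def logdet_from_quadratic_forms(W_sym, rho, m, seed=0):
    """Estimate ln|I - rho W| = tr(ln(I - rho W)) as (n/m) * sum_i r_i^T ln(I - rho W) r_i."""
    n = W_sym.shape[0]
    lam, Q = np.linalg.eigh(np.eye(n) - rho * W_sym)
    log_A = (Q * np.log(lam)) @ Q.T              # exact matrix logarithm (small n only)
    rng = np.random.default_rng(seed)
    R = rng.choice([-1.0, 1.0], size=(n, m)) / np.sqrt(n)  # m unit-length random vectors
    estimate = n / m * np.einsum("ik,ij,jk->", R, log_A, R)
    exact = np.sum(np.log(lam))                  # tr(ln A) = sum of ln(eigenvalues)
    return estimate, exact


est, exact = logdet_from_quadratic_forms(sym_rook_weights(10), rho=0.5, m=400)
print(f"estimate = {est:.3f}, exact = {exact:.3f}")
```

The estimate is unbiased for the trace; its spread shrinks roughly like $1/\sqrt{m}$, which is why many random vectors are averaged.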
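How Does the GL Method Work?
$$T_r = \begin{pmatrix} \alpha_1 & \beta_1 & & 0\\ \beta_1 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_{r-1}\\ 0 & & \beta_{r-1} & \alpha_r \end{pmatrix}$$
• GL (Algorithm 3.2) is repeated m times (m = 400 in our experiments)
• Parameter r (the number of Lanczos steps, i.e., the order of $T_r$) varies between 5 and 8 in our experiments
• For large problem sizes, the quality of the solution is not very sensitive to m and r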
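Each quadratic form can be approximated from r Lanczos steps: the Gauss rule gives $v^T f(A)\, v \approx \|v\|^2 e_1^T f(T_r)\, e_1$, whose nodes are the eigenvalues of $T_r$ and whose weights are the squared first components of its eigenvectors. A minimal sketch, assuming $A = I - \rho W$ with the symmetrized 4-neighbor matrix $D^{-1/2} B D^{-1/2}$ and random ±1 start vectors; it performs no reorthogonalization, the helper names are hypothetical, and it is not the authors' Algorithm 3.2.

```python
import numpy as np


def sym_rook_weights(side):
    """Symmetrized 4-neighbor matrix D^{-1/2} B D^{-1/2} for a side-by-side grid."""
    n = side * side
    B = np.zeros((n, n))
    for p in range(n):
        i, j = divmod(p, side)
        for a, b in ((i - 1, j), (i, j + 1), (i + 1, j), (i, j - 1)):
            if 0 <= a < side and 0 <= b < side:
                B[p, a * side + b] = 1.0
    d = 1.0 / np.sqrt(B.sum(axis=1))
    return d[:, None] * B * d[None, :]


def lanczos(A, v, r):
    """r Lanczos steps on symmetric A from v; returns the diagonal and off-diagonal of T_r."""
    q = v / np.linalg.norm(v)
    q_prev = np.zeros_like(q)
    alphas, betas = [], []
    beta = 0.0
    for k in range(r):
        w = A @ q - beta * q_prev
        alpha = q @ w
        w -= alpha * q
        alphas.append(alpha)
        if k == r - 1:
            break
        beta = np.linalg.norm(w)
        if beta < 1e-12:               # Krylov space exhausted: T_k is already exact
            break
        betas.append(beta)
        q_prev, q = q, w / beta
    return np.array(alphas), np.array(betas)


def gauss_quadrature_log(A, v, r):
    """Gauss rule: v^T ln(A) v ~= ||v||^2 * sum_j tau_j^2 ln(theta_j)."""
    alphas, betas = lanczos(A, v, r)
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    theta, S = np.linalg.eigh(T)       # nodes: eigenvalues of T_r
    tau_sq = S[0, :] ** 2              # weights: squared first eigenvector components
    return (v @ v) * np.sum(tau_sq * np.log(theta))


def gauss_lanczos_logdet(W_sym, rho, m=400, r=8, seed=0):
    """Average m Gauss-Lanczos quadratic forms with random +-1 start vectors."""
    n = W_sym.shape[0]
    A = np.eye(n) - rho * W_sym
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(m):
        v = rng.choice([-1.0, 1.0], size=n)   # E[v v^T] = I, so E[v^T ln(A) v] = tr(ln A)
        total += gauss_quadrature_log(A, v, r)
    return total / m


W_sym = sym_rook_weights(10)
approx = gauss_lanczos_logdet(W_sym, rho=0.5)
exact = np.linalg.slogdet(np.eye(100) - 0.5 * W_sym)[1]
print(f"GL = {approx:.3f}, exact = {exact:.3f}")
```

Only matrix-vector products with A are needed, so a sparse W keeps each Lanczos step cheap; unnormalized ±1 vectors with a 1/m factor are equivalent to unit vectors with the n/m factor.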
Taylor's Series Approximation
• The log-det term is expressed as a Taylor series; the trace is the sum of the eigenvalues, and W is the symmetrized neighborhood matrix:
$$\ln|I-\rho W| \approx -\sum_{k=1}^{q} \frac{\rho^k\,\mathrm{tr}(W^k)}{k}$$
[Figure: stage diagram. Inputs W̃, W, ρ, x, y; Taylor's series expansion applied to ln|I−ρW|; golden section search and calculation of the ML function, giving ρ̂, β̂, σ̂² and the best-fit ML function value. SSE stage (Stage C), similar to Stages A & B: one dense matrix (n-by-n) and vector (n-by-1) multiplication, 2 dense matrix (n-by-k) and vector (n-by-1) multiplications, 3 vector (n-by-1) dot products, and a scalar operation.]
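Because $\ln(1-\rho\lambda) = -\sum_{k\ge 1}(\rho\lambda)^k/k$ whenever $|\rho\lambda| < 1$, summing over the eigenvalues of W and truncating after q terms gives the series above. A NumPy sketch using dense matrix powers as on the slide; the symmetrized 4-neighbor W, the grid size, and the helper names are assumptions.

```python
import numpy as np


def sym_rook_weights(side):
    """Symmetrized 4-neighbor matrix D^{-1/2} B D^{-1/2} for a side-by-side grid."""
    n = side * side
    B = np.zeros((n, n))
    for p in range(n):
        i, j = divmod(p, side)
        for a, b in ((i - 1, j), (i, j + 1), (i + 1, j), (i, j - 1)):
            if 0 <= a < side and 0 <= b < side:
                B[p, a * side + b] = 1.0
    d = 1.0 / np.sqrt(B.sum(axis=1))
    return d[:, None] * B * d[None, :]


def taylor_logdet(W_sym, rho, q):
    """ln|I - rho W| ~= -sum_{k=1}^{q} rho^k tr(W^k) / k (truncated Taylor series)."""
    n = W_sym.shape[0]
    W_k = np.eye(n)
    total = 0.0
    for k in range(1, q + 1):
        W_k = W_k @ W_sym              # one dense n-by-n product per term: W^k
        total -= rho ** k * np.trace(W_k) / k
    return total


W_sym = sym_rook_weights(10)
exact = np.linalg.slogdet(np.eye(100) - 0.5 * W_sym)[1]
for q in (2, 5, 10, 20):
    print(q, taylor_logdet(W_sym, 0.5, q), exact)
```

The terms shrink roughly like $\rho^k/k$, so as ρ approaches 1 (strong spatial autocorrelation) many more terms are needed for the same accuracy.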
Chebyshev Polynomial Approximation
• The log-det term is expressed in terms of Chebyshev polynomials: the trace is the sum of the eigenvalues, the T's are matrix polynomials, and the c's are the Chebyshev polynomial coefficients:
$$\ln|I-\rho W| \approx \sum_{k=1}^{q} c_k(\rho)\,\mathrm{tr}\big(T_{k-1}(W)\big) - \frac{n}{2}\,c_1(\rho)$$
[Figure: stage diagram. Inputs W, W̃, ρ, x, y; Chebyshev polynomial approximation applied to ln|I−ρW| (Chebyshev coefficients c_j(ρ), q−1 dense n-by-n matrix-matrix multiplications, trace of an n-by-n dense matrix); golden section search and calculation of the ML function, giving ρ̂, β̂, σ̂² and the best-fit ML function value. SSE stage (Stage C), similar to Stages A & B: one dense matrix (n-by-n) and vector (n-by-1) multiplication, 2 dense matrix (n-by-k) and vector (n-by-1) multiplications, 3 vector (n-by-1) dot products, and a scalar operation.]
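One way to obtain the coefficients $c_k(\rho)$ is to interpolate $f(\lambda) = \ln(1-\rho\lambda)$ at Chebyshev nodes on [−1, 1], where the eigenvalues of the symmetrized W lie; the matrix polynomials then follow the recurrence $T_0 = I$, $T_1 = W$, $T_{k+1} = 2WT_k - T_{k-1}$. A sketch under those assumptions; the node-based coefficient formula is the standard discrete Chebyshev fit, not necessarily the authors' choice, and the helper names are hypothetical.

```python
import numpy as np


def sym_rook_weights(side):
    """Symmetrized 4-neighbor matrix D^{-1/2} B D^{-1/2}; its eigenvalues lie in [-1, 1]."""
    n = side * side
    B = np.zeros((n, n))
    for p in range(n):
        i, j = divmod(p, side)
        for a, b in ((i - 1, j), (i, j + 1), (i + 1, j), (i, j - 1)):
            if 0 <= a < side and 0 <= b < side:
                B[p, a * side + b] = 1.0
    d = 1.0 / np.sqrt(B.sum(axis=1))
    return d[:, None] * B * d[None, :]


def chebyshev_logdet(W_sym, rho, q):
    """ln|I - rho W| ~= sum_{k=1}^{q} c_k(rho) tr(T_{k-1}(W)) - (n/2) c_1(rho)."""
    n = W_sym.shape[0]
    theta = np.pi * (np.arange(1, q + 1) - 0.5) / q   # Chebyshev nodes x_j = cos(theta_j)
    f_nodes = np.log(1.0 - rho * np.cos(theta))      # f(x) = ln(1 - rho x) at the nodes
    # c[k] = (2/q) sum_j f(x_j) T_k(x_j), using T_k(cos t) = cos(k t); c[0] is c_1 above
    c = 2.0 / q * np.cos(np.outer(np.arange(q), theta)) @ f_nodes
    traces = np.empty(q)
    T_prev, T_cur = np.eye(n), W_sym.copy()          # T_0(W) = I, T_1(W) = W
    traces[0] = n                                    # tr(T_0(W)) = tr(I) = n
    if q > 1:
        traces[1] = np.trace(T_cur)
    for k in range(2, q):
        # one dense n-by-n matrix-matrix product per step: T_k = 2 W T_{k-1} - T_{k-2}
        T_prev, T_cur = T_cur, 2.0 * W_sym @ T_cur - T_prev
        traces[k] = np.trace(T_cur)
    return c @ traces - 0.5 * n * c[0]


W_sym = sym_rook_weights(10)
exact = np.linalg.slogdet(np.eye(100) - 0.5 * W_sym)[1]
print(chebyshev_logdet(W_sym, 0.5, 12), exact)
```

Only the scalar coefficients depend on ρ, so the traces can be computed once and reused at every step of the golden section search.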
Experiment Design
Factor Name | Parameter Domain
Problem Size (n) | 400, 1600, 2500 observation points
Neighborhood Structure | 2-D with 4-neighbors
Candidates | Exact approach (eigenvalue based); Taylor's series approximation; Chebyshev polynomial approximation; Gauss-Lanczos approximation
Dataset | Synthetic dataset for ρ = 0.1, 0.2, …, 0.9
SAR Parameter ρ | [0, 1)
Programming Language | Matlab
Exact and Approximate Values of Log-det
• GL gives a better approximation as spatial autocorrelation increases
Absolute Relative Error of Approximations
• The absolute relative error of the approximation goes down as spatial autocorrelation increases (GL mean error 0.9%, GL max error 1.78%)
Conclusions
• GL is slightly more expensive than Taylor series and Chebyshev polynomials.
• GL gives better approximations when spatial autocorrelation is high and the problem size is large.
• GL quality depends on the number of iterations, the initial Lanczos vector, and the random number generator.
• No need to compute all eigenvalues.
Acknowledgments
• AHPCRC
• Minnesota Supercomputing Institute (MSI)
• Spatial Database Group Members
• ARCTiC Labs Group Members
• Dr. Dan Boley
• Dr. Sanjay Chawla
• Dr. Vipin Kumar
• Dr. James LeSage
• Dr. Kelley Pace
• Dr. Pen-Chung Yew
THANK YOU VERY MUCH
Q/A